# Spotify Recommender Model

Using Word2Vec, a variety of different recommender model options are explored.

1. **Content-Based**
> Predicts based on what a user has listened to in the past.
> Uses features of songs to find similar songs.

2. **Collaborative**
> Predicts based on what other listeners like
> Focuses on what songs other users liked who also liked a chosen song. 


## Word2Vec
In both types of recommender models, a 'vectorized' representation of a song is used to find similar songs.  Word2Vec was originally intended to do what it's name implies; convert Words to Vectors.  Here, we will use that intended functionality to convert Songs to Vectors.

### Embeddings
Word2Vec is a process that uses vectorized words to predict other words.  It does this by ingesting a series of documents, parsing out the words, vectorizing the words and then using the vector representations to predict other words.  The vectors are built in such a way that each word has a unique vector that is based on its usage in the documents.  The result is a vector space filled with words where related words have vectors that are similar.  This vector space is referred to an an **embedding**.  This embedding is used in two common word prediction tasks: Skip-Gram and Continuous Bag of Words.

> **Skip-Gram** <br>
> The Skip-Gram model asks for a single word and then predicts words surrounding the word.

> **Bag-of-Words** <br>
> The bag-of-words model asks for a series of words and will return the missing word.


### Making a Playlist
What does this have to do with playlists?  Good question.  If we can consider a Song as a Word and a Playlist as related words or sentences, the applicability is evident.

To make a playlist, we simply convert Song to Vectors and then find new songs by finding other songs with similar vectors.  To schieve this, we can use the Bag-of-Words or Skip-Gram approach as mentioned above.  Provide a song, a Skip-Gram model can supply a playlist.  Provide a list of songs, and Bag-of-Words model can give you the next song.

### That Was Easy!
Not so fast!  How we build the embedding of Songs will have an impact on how the new songs are predicted.  When we make the embedding, what are we giving as a song?  The title of the song?  The genre?  The artist? What are we providing as documents?  Playlists?  Albums? These choices will provide different results.  Three embedding options are explored:

<a name='index'></a>
### <a href=#1>1. Embeddings from Playlists - Song ID - Unsupervised</a>
> Here, we will take data from Spotify that included 1M playlists and the songs in each playlist.  We'll use the Word2Vec process supplying playlists as documents and each song's unique id is used as the word. <br><br>
After the embedding is created, we can skip the creation of building and training a BOW or Skip-Gram model.  All we need to do is find vectors that are similar to a song or a list of songs.  

### <a href=#2>2. Embeddings from Playlists - Song ID - BOW</a>
> We can use the same embedding to create a BOW model.


### <a href=#3>3. Embeddings from Playlists - Song ID - Skip-Gram</a>
> Let's use the embedding from the playlists and use Word2Vec to create a Skip-Gram model.


### <a href=#4>4. Embeddings from Song Features - Unsupervised</a>
> Here, we can take a break from Word2Vec and get very basic.  We create our own vectors based on Spotify accoustive feature data.  We have a series of fields available for all of our songs that numerically represent various characteristics of the songs; dancebaility, loudness, temp, key, energy, etc.



<br>

**References:**

https://www.analyticsvidhya.com/blog/2019/07/how-to-build-recommendation-system-word2vec-python/

https://towardsdatascience.com/using-word2vec-for-music-recommendations-bb9649ac2484

https://towardsdatascience.com/how-to-build-a-simple-song-recommender-296fcbc8c85



### Import libraries

In [3]:
# Basic Imports
import warnings;
warnings.filterwarnings('ignore')

import os
import sys
import pandas as pd
import numpy as np
import seaborn as sns
import time
import random
import matplotlib.pyplot as plt
%matplotlib inline


from gensim.models import Word2Vec
from gensim import utils
import gensim.models
from gensim.models import KeyedVectors

# For the Spotify Dataset
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Table, Column, Integer, String, Float, MetaData, and_, or_, func
from sqlalchemy import create_engine
import sqlite3
from sqlalchemy.orm import sessionmaker
from sqlalchemy import exc

sys.path.append('../../')
from spotify_api import get_spotify_data, get_tracks, get_artists, get_audiofeatures
from spotify_database import get_session, display_time
from spotify_utils import Table_Generator, List_Generator, pickle_load, pickle_save

# !pip install ipywidgets 
# !jupyter nbextension enable --py widgetsnbextension
# !jupyter labextension install @jupyter-widgets/jupyterlab-manager

# %%capture
from tqdm import tqdm_notebook as tqdm

### Set Data Path Variables

In [4]:
data_path = '../../data/SpotifyDataSet'
db_path = '../../data/SpotifyDataSet/spotify_songs.db'

# Get sesion
session = get_session(db_path)
engine = create_engine('sqlite:///' + db_path)

# Get Songs class
Playlists = getattr(get_session, "Playlists")
Artists = getattr(get_session, "Artists")
Tracks = getattr(get_session, "Tracks")

<a name='1'></a>
## 1. Embeddings from Playlists using Song ID - Unsupervised
<a href=#index>back to index</a>


The baseline model will use embeddings to find similarities between songs.  The embeddings are built from playlists, where the playlist serves as a sentence made up of songs.

Similarities between songs are determined by their cosine distance with other songs.

To speed up to building of the enbedding, an extract is made from the database which will serve as documents for the embedding.  Each 'sentence' is a playlist and each 'word' is a song in the playlist.

From the DB, the following view is created which is subseqntly extracted as a CSV file:<br>

`CREATE VIEW playlist_tracks_uris AS SELECT t.playlist_id, group_concat(t.track_uri, ' ')  FROM playlists t GROUP BY  t.playlist_id;`


In [5]:
# Iterator that yields the songs for each playlist in the CSV file
class Playlist_URIs(object):
    """
    Playlist generator that yileds the track uris in a playlist.
    Yields one playlist at a time.
    """
    def __init__(self,
                 filename:str=os.path.join(data_path,'playlist_tracks.csv'),
                 name:str=None,
                 iters:int=None):
        self.filename     = filename
        self.length       = len(open(self.filename).readlines())
        self.name         = name
        self.count        = 0
        self.iters        = iters
        print("Creating Playlist Track Listing Generator:")
        print("\tlength     : ", self.length)
    
    def __iter__(self):
        
        self.count += 1
        progbar = tqdm(total=self.length, desc="{}:{}/{}".format(self.name, self.count, self.iters+1))
        
        for line in open(self.filename, 'r'):
            progbar.update(1)
            yield line.split('\t')[1].split(' ') # tab-delimited CSV file
            
        progbar.close()    
        

In [93]:
# Create a generator object to use while building the embedding
corpus_file = 'playlist_tracks.csv'
corpus_filepath = os.path.join(data_path, corpus_file)

iters = 5
playlists_gen = Playlist_URIs(filename=corpus_filepath,
                              name="Building Vectors",
                              iters=iters) 

Creating Playlist Track Listing Generator:
	length     :  999001


In [9]:
# Build a gensim BOW model including a word embedding
model_BOW = gensim.models.Word2Vec(sentences=playlists_gen,
                               workers = 8,    # number of processors
                               sg = 0,         # 1=skip-gram, 0=CBOW
                               iter=iters      # training iterations - default=5
                              )

# save the model
model_filepath = os.path.join(data_path, 'playlists_BOW.model')
model_BOW.save(model_filepath)

# Save the embedding
kv_filepath = os.path.join(data_path, 'playlists.embedding')
model_BOW.wv.save(kv_filepath)

# about 5 minutes per iteration with 4 processors // 2 min per iteration with 8

HBox(children=(IntProgress(value=0, description='Building Vectors:1/6', max=999001, style=ProgressStyle(descri…




HBox(children=(IntProgress(value=0, description='Building Vectors:2/6', max=999001, style=ProgressStyle(descri…




HBox(children=(IntProgress(value=0, description='Building Vectors:3/6', max=999001, style=ProgressStyle(descri…




HBox(children=(IntProgress(value=0, description='Building Vectors:4/6', max=999001, style=ProgressStyle(descri…




HBox(children=(IntProgress(value=0, description='Building Vectors:5/6', max=999001, style=ProgressStyle(descri…




HBox(children=(IntProgress(value=0, description='Building Vectors:6/6', max=999001, style=ProgressStyle(descri…




In [94]:
# Build a gensim Skip-Gram model including a word embedding
model_SG = gensim.models.Word2Vec(sentences=playlists_gen,
                               workers = 8,    # number of processors
                               sg = 1,         # 1=skip-gram, 0=CBOW
                               iter=iters      # training iterations - default=5
                              )

# save the model
model_filepath = os.path.join(data_path, 'playlists_SG.model')
model_SG.save(model_filepath)

# NOTE: No need to save the embedding again, it is the same as in the BOW model

# about 5 minutes per iteration with 4 processors // 2 min per iteration with 8

HBox(children=(IntProgress(value=0, description='Building Vectors:1/6', max=999001, style=ProgressStyle(descri…




HBox(children=(IntProgress(value=0, description='Building Vectors:2/6', max=999001, style=ProgressStyle(descri…




HBox(children=(IntProgress(value=0, description='Building Vectors:3/6', max=999001, style=ProgressStyle(descri…




HBox(children=(IntProgress(value=0, description='Building Vectors:4/6', max=999001, style=ProgressStyle(descri…




HBox(children=(IntProgress(value=0, description='Building Vectors:5/6', max=999001, style=ProgressStyle(descri…




HBox(children=(IntProgress(value=0, description='Building Vectors:6/6', max=999001, style=ProgressStyle(descri…




### Model Attributes
Now, we have created an embedding and an associated model which follows the BOW approach.

#### wv
> This object essentially contains the mapping between words and embeddings. It can be used directly to query the embeddings in various ways. 

#### vocabulary
> This object represents the vocabulary (sometimes called Dictionary in gensim) of the model. 

## Test the model
Get 10 'words' from the embedding's vocabulary.

In [4]:
# reload saved embedding
kv_filepath = os.path.join(data_path, 'playlists.embedding')
embedding = KeyedVectors.load(kv_filepath, mmap='r')

In [25]:
# display list of random track along with a count of appearances
songids = list(embedding.wv.vocab.keys())
idxs_rnd = [random.randint(0,len(songids)) for x in range(10)]

for i in idxs_rnd:
    print("Track: {}".format(songids[i]))
    print("\t",embedding.wv.vocab[songids[i]])

Track: spotify:track:54Ouh41CWAFULtChxDc2P3
	 Vocab(count:21, index:224838, sample_int:4294967296)
Track: spotify:track:5Q0O7LxPqvce6JoMxHXjvK
	 Vocab(count:6, index:499762, sample_int:4294967296)
Track: spotify:track:7crRxRrHCeD1e2ZzSmWCsl
	 Vocab(count:5, index:561638, sample_int:4294967296)
Track: spotify:track:2G2eB4sPEZGVTx7UdLUSNm
	 Vocab(count:16, index:264870, sample_int:4294967296)
Track: spotify:track:22E817F7gP14umxr9pf1Xn
	 Vocab(count:7, index:472415, sample_int:4294967296)
Track: spotify:track:6R68z0ptyhTSzmqnUY5hhx
	 Vocab(count:8, index:424215, sample_int:4294967296)
Track: spotify:track:3f0fcyzTkduyD7mgosWmu4
	 Vocab(count:8, index:420451, sample_int:4294967296)
Track: spotify:track:1bRatBxvVgdpirSYhHzu5j
	 Vocab(count:8, index:437915, sample_int:4294967296)
Track: spotify:track:2vyGuWMSfyvD8zdEu2YXE1
	 Vocab(count:23, index:210151, sample_int:4294967296)
Track: spotify:track:0cunAi5FhuVTMMIE9XfqfP
	 Vocab(count:224, index:36974, sample_int:4294967296)


In [34]:
songids[0]

'spotify:track:2d7LPtieXdIYzf7yHPooWd'

In [38]:
# Show the vector value for the first word in in the list above
rnd = random.randint(0, len(embedding.wv.vocab))
print(songids[idxs_rnd[0]])
display(embedding.wv.get_vector(songids[idxs_rnd[0]]))

spotify:track:54Ouh41CWAFULtChxDc2P3


memmap([ 0.1272708 ,  0.05054647,  0.165743  , -0.14787427, -0.04797522,
         0.12914099,  0.11278312, -0.06526133,  0.28049994,  0.13390438,
         0.13675198,  0.2215344 , -0.2629792 , -0.45036826,  0.1126852 ,
        -0.21523614, -0.00472977,  0.08885465, -0.20052877, -0.12271424,
        -0.03128362,  0.01141706,  0.00753152, -0.24122466,  0.10402884,
        -0.03048653, -0.11512107, -0.3331953 , -0.04206612,  0.21890008,
        -0.28543273,  0.13576445,  0.31936753, -0.5335075 ,  0.27800167,
         0.448359  ,  0.2260995 , -0.32358688, -0.05028354, -0.05638677,
        -0.07736696, -0.1543951 ,  0.15390296, -0.18399835, -0.3167476 ,
         0.24209599,  0.20233868, -0.19944245, -0.03791945, -0.23465024,
        -0.20722246, -0.14768681,  0.41862273,  0.6273913 ,  0.04115644,
        -0.07902044,  0.37934414, -0.1239025 , -0.42826465, -0.18111324,
        -0.04696696,  0.6651888 , -0.15092582,  0.03998433,  0.07693795,
        -0.05004881, -0.15606925,  0.03041912,  0.2

### Make a Playlist
Find the ID of a specific song of your choice.  Here, we use the Spotify API to get the song we are interested in.

In [39]:
# Get a song as a seed for a test playlist
db_track = display_time(session.query(Playlists.track_name, 
                                Playlists.track_uri,
                                Playlists.artist_uri,
                                Playlists.album_uri).filter(Playlists.track_name=="Free Fallin'").distinct().first)

print("Artist:    {}".format(get_artists([db_track.artist_uri])[0]['name']))
print("Track:     {}".format(db_track.track_name))
print("Track URI: {}".format(db_track.track_uri))

Time to Execute: 0.05 seconds
Setting credentials
token():INFO:   Getting initial token
token():INFO:   Token refreshed
Artist:    Tom Petty
Track:     Free Fallin'
Track URI: spotify:track:5tVA6TkbaAH9QMITTQRrNv


### Use the Word2Vec embedding to find similar songs
Using the Word2Vec function `similar_by_word()`

In [41]:
# Find similar songs 
playlist = np.array(embedding.similar_by_word(db_track.track_uri, topn=10, restrict_vocab=None))
playlist

array([['spotify:track:7gSQv1OHpkIoAdUiRLdmI6', '0.8741670846939087'],
       ['spotify:track:17S4XrLvF5jlGvGCJHgF51', '0.7749687433242798'],
       ['spotify:track:7MooGz4ZPE4bNxjFegR6Jx', '0.7716947793960571'],
       ['spotify:track:5xS9hkTGfxqXyxX6wWWTt4', '0.7716909646987915'],
       ['spotify:track:7MRyJPksH3G2cXHN8UKYzP', '0.7674297094345093'],
       ['spotify:track:43btz2xjMKpcmjkuRsvxyg', '0.7614291906356812'],
       ['spotify:track:67eX1ovaHyVPUinMHeUtIM', '0.7145407199859619'],
       ['spotify:track:2HsjJJL4DhPCzMlnaGv7ap', '0.6921212673187256'],
       ['spotify:track:5RsUlxLto4NZbhJpqJbHfN', '0.6734362244606018'],
       ['spotify:track:5tVA6TkbaAH9QMITTQRrNv\n', '0.672991156578064']],
      dtype='<U37')

In [74]:
# Get the similar songs from Spotify to show their details, including preview link (if available)
sp_playlist = get_tracks(playlist[:,0])

for t in sp_playlist:
    print("Artist       : ", t['artists'][0]['name'])
    print("Track        : ", t['name'])
    print("Track Preview: \n", t['preview_url'] )
    print()

Artist       :  Tom Petty
Track        :  I Won't Back Down
Track Preview:  None

Artist       :  John Mellencamp
Track        :  Jack & Diane
Track Preview:  None

Artist       :  Tom Petty and the Heartbreakers
Track        :  American Girl
Track Preview:  https://p.scdn.co/mp3-preview/36d69a9c5b7a78b378f349e319ca49075993717b?cid=72413f75d4db4ec79c6caaf02523959e

Artist       :  Tom Petty and the Heartbreakers
Track        :  Mary Jane's Last Dance
Track Preview:  None

Artist       :  Tom Petty and the Heartbreakers
Track        :  Learning To Fly
Track Preview:  None

Artist       :  Tom Petty
Track        :  You Don't Know How It Feels
Track Preview:  https://p.scdn.co/mp3-preview/920e1367344e020100e499744890d45f0f8729ce?cid=72413f75d4db4ec79c6caaf02523959e

Artist       :  John Mellencamp
Track        :  Small Town
Track Preview:  None

Artist       :  John Mellencamp
Track        :  Hurts So Good
Track Preview:  None

Artist       :  Bryan Adams
Track        :  Summer Of '69
Tra

### Results:
Based on the playlists that were supplied when buidling the embedding, the list of Top 10 most similar songs is presented above.  Several other Tom Petty songs appeared as well as other artists that are similar in genre, attitude, etc.

This playlist is based on the top songs that other users placed in playlists that include the song we selected as our seed.  This is a good example of 'collaborative filtering' as it uses preferences from others to recommend songs.

<a name='2'></a>
## 2. Embeddings from Playlists - Song ID - BOW
<a href=#index>back to index</a>

Now, let's use the model that was created when we initially established our embedding.  The BOW model can be used to predict a song from a song or a list of supplied songs.

Let's use the seed song "Free Fallin'" again and see what is returned.

In [44]:
model_filepath = os.path.join(data_path, 'playlists_BOW.model')
BOW_model = Word2Vec.load(model_filepath)

In [91]:
recommended_songs = np.array(BOW_model.predict_output_word([db_track.track_uri], topn=10))
sp_tracks = get_tracks(recommended_songs[:,0])

for track in sp_tracks:
    print("\nArtist: {}".format(track['artists'][0]['name']))
    print("Track Name: {}".format(track['name']))
    print("Preview: {}".format(track['preview_url']))


Artist: Tom Petty
Track Name: I Won't Back Down
Preview: None

Artist: Tom Petty
Track Name: Free Fallin'
Preview: None

Artist: Tom Petty
Track Name: You Don't Know How It Feels
Preview: https://p.scdn.co/mp3-preview/920e1367344e020100e499744890d45f0f8729ce?cid=72413f75d4db4ec79c6caaf02523959e

Artist: Tom Petty and the Heartbreakers
Track Name: Learning To Fly
Preview: None

Artist: Tom Petty
Track Name: Wildflowers
Preview: https://p.scdn.co/mp3-preview/4f8fc91bd4a98e7c27863b08aa24a57ffaf654c1?cid=72413f75d4db4ec79c6caaf02523959e

Artist: Tom Petty and the Heartbreakers
Track Name: American Girl
Preview: https://p.scdn.co/mp3-preview/36d69a9c5b7a78b378f349e319ca49075993717b?cid=72413f75d4db4ec79c6caaf02523959e

Artist: Tom Petty and the Heartbreakers
Track Name: Mary Jane's Last Dance
Preview: None

Artist: Tom Petty
Track Name: Runnin' Down A Dream
Preview: None

Artist: Tom Petty and the Heartbreakers
Track Name: Refugee
Preview: None

Artist: Tom Petty and the Heartbreakers
Trac

### Result:
Most of the songs are the same, but the BOW model only returned songs from Tom Petty where our basic embedding-only apprach found other artists as well.

<a name='3'></a>
## 3. Embeddings from Playlists - Song ID - Skip-Gram
<a href=#index>back to index</a>

Here, we create a playlist from the Skip-Gram model.  For consistency, we will use the same song, "Free Fallin'", again to see if our results differ.

In [95]:
# load the skip-gram model we previously created
model_filepath = os.path.join(data_path, 'playlists_SG.model')
SG_model = Word2Vec.load(model_filepath)

In [96]:
recommended_songs = np.array(SG_model.predict_output_word([db_track.track_uri], topn=10))
sp_tracks = get_tracks(recommended_songs[:,0])

for track in sp_tracks:
    print("\nArtist: {}".format(track['artists'][0]['name']))
    print("Track Name: {}".format(track['name']))
    print("Preview: {}".format(track['preview_url']))


Artist: Tom Petty
Track Name: Free Fallin'
Preview: None

Artist: Tom Petty
Track Name: I Won't Back Down
Preview: None

Artist: Tom Petty and the Heartbreakers
Track Name: Learning To Fly
Preview: None

Artist: Tom Petty and the Heartbreakers
Track Name: Mary Jane's Last Dance
Preview: None

Artist: Tom Petty and the Heartbreakers
Track Name: Into The Great Wide Open
Preview: None

Artist: Tom Petty
Track Name: You Don't Know How It Feels
Preview: https://p.scdn.co/mp3-preview/920e1367344e020100e499744890d45f0f8729ce?cid=72413f75d4db4ec79c6caaf02523959e

Artist: Tom Petty
Track Name: Runnin' Down A Dream
Preview: None

Artist: Tom Petty and the Heartbreakers
Track Name: American Girl
Preview: https://p.scdn.co/mp3-preview/36d69a9c5b7a78b378f349e319ca49075993717b?cid=72413f75d4db4ec79c6caaf02523959e

Artist: John Mellencamp
Track Name: Jack & Diane
Preview: None

Artist: Tom Petty and the Heartbreakers
Track Name: Refugee
Preview: None


### Result:
This list looks much like our original playlist based on the embeddings, but it isn't the same.  Here, different artists are suggested.

<a name='4'></a>
## 4. Embeddings from Song Features - Unsupervised
<a href=#index>back to index</a>

Here, we can take a break from Word2Vec and get very basic.  We create our own vectors based on Spotify accoustive feature data.  We have a series of fields available for all of our songs that numerically represent various characteristics of the songs; dancebaility, loudness, temp, key, energy, etc.

In [97]:
# fetch all db tracks
db_tracks = display_time(session.query(Tracks).all)
session.close()

Time to Execute: 75.28 seconds


In [98]:
# create a Pandas dataframe
df_all_tracks = pd.DataFrame([x.__dict__ for x in db_tracks]).drop('_sa_instance_state', axis=1).set_index(['track_uri'])
df_all_tracks.head()

Unnamed: 0_level_0,loudness,energy,artist_uri,track_popularity,duration_ms,tempo,liveness,acousticness,mode,key,danceability,time_signature,valence,instrumentalness,speechiness
track_uri,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
spotify:track:2d7LPtieXdIYzf7yHPooWd,-9.649,0.157,spotify:artist:0MeLMJJcouYXCymQSHPn8g,65,242564,108.13,0.0816,0.974,1,11,0.467,4,0.277,1e-06,0.0336
spotify:track:0y4TKcc7p2H6P0GJlt01EI,-13.367,0.207,spotify:artist:7w0qj2HiAPIeUcoPogvOZ6,36,253933,93.778,0.0773,0.961,1,10,0.312,4,0.278,0.00818,0.0347
spotify:track:6q4c1vPRZREh7nw3wG7Ixz,-14.214,0.159,spotify:artist:32ogthv0BdaSMPml02X9YB,54,103920,85.462,0.083,0.991,1,9,0.412,4,0.0389,0.772,0.0278
spotify:track:54KFQB6N4pn926IUUYZGzK,-15.399,0.122,spotify:artist:32ogthv0BdaSMPml02X9YB,72,371320,148.658,0.094,0.885,1,9,0.264,4,0.0735,0.349,0.0349
spotify:track:0NeJjNlprGfZpeX2LQuN6c,-10.866,0.179,spotify:artist:3qnGvpP8Yth1AqSBMqON5x,75,238560,128.128,0.17,0.689,1,8,0.658,4,0.191,0.0,0.0448


In [99]:
# define features that we will use to create our custom vectors
vector_features= [
    'acousticness',
    'danceability',
    'duration_ms',
    'energy',
    'instrumentalness',
    'key',
    'liveness',
    'loudness',
    'mode',
    'speechiness',
    'tempo',
    'time_signature',
    'valence'
]

In [100]:
# Create a dataframe with the vectors for simplicity
drop_cols = set(df_all_tracks.columns) - set(vector_features)
df = df_all_tracks.drop(drop_cols, axis=1)
df.head()

Unnamed: 0_level_0,loudness,energy,duration_ms,tempo,liveness,acousticness,mode,key,danceability,time_signature,valence,instrumentalness,speechiness
track_uri,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
spotify:track:2d7LPtieXdIYzf7yHPooWd,-9.649,0.157,242564,108.13,0.0816,0.974,1,11,0.467,4,0.277,1e-06,0.0336
spotify:track:0y4TKcc7p2H6P0GJlt01EI,-13.367,0.207,253933,93.778,0.0773,0.961,1,10,0.312,4,0.278,0.00818,0.0347
spotify:track:6q4c1vPRZREh7nw3wG7Ixz,-14.214,0.159,103920,85.462,0.083,0.991,1,9,0.412,4,0.0389,0.772,0.0278
spotify:track:54KFQB6N4pn926IUUYZGzK,-15.399,0.122,371320,148.658,0.094,0.885,1,9,0.264,4,0.0735,0.349,0.0349
spotify:track:0NeJjNlprGfZpeX2LQuN6c,-10.866,0.179,238560,128.128,0.17,0.689,1,8,0.658,4,0.191,0.0,0.0448


In [101]:
# create a KeyedVectors object from the Word2Vec library
# This will allow us to use the built-in Word2Vec functions
accoustic_vectors = KeyedVectors(len(vector_features))

# weights are the vectors for each track
weights = np.array(df)

# entities are the trackuris
entities = np.array(df.index)

# add the vectors to the dataset
accoustic_vectors.add(entities, weights)

In [116]:
seed_uri = db_track.track_uri
playlist = np.array(accoustic_vectors.most_similar(seed_uri, topn=10))[:,0]
# playlist = np.array(accoustic_vectors.similar_by_word(seed_uri, topn=10, restrict_vocab=None))[:,0]
seed_track = get_tracks([seed_uri])[0]
sp_playlist = get_tracks(playlist)

In [117]:
print("Playlist Seed:")
print("\tArtist       : ", seed_track['artists'][0]['name'])
print("\tTrack        : ", seed_track['name'])
print("\tTrack Preview: ", seed_track['preview_url'] )
print()
for t in sp_playlist:
    print("Artist       : ", t['artists'][0]['name'])
    print("Track        : ", t['name'])
    print("Track Preview: ", t['preview_url'] )
    print() 

Playlist Seed:
	Artist       :  Tom Petty
	Track        :  Free Fallin'
	Track Preview:  None

Artist       :  Grateful Dead
Track        :  I Second That Emotion (Jerry Garcia/Bob Weir David Letterman Show October 1989)
Track Preview:  https://p.scdn.co/mp3-preview/e4fd06bbdd507c528979040c4dc4250118336232?cid=72413f75d4db4ec79c6caaf02523959e

Artist       :  Orup
Track        :  Stockholm
Track Preview:  https://p.scdn.co/mp3-preview/0c3ba81e7c01001afc2b96b3552635950e67f7ab?cid=72413f75d4db4ec79c6caaf02523959e

Artist       :  General Midi
Track        :  Milton
Track Preview:  https://p.scdn.co/mp3-preview/226b45f8a48d27590c3b19883c6ba82760f28340?cid=72413f75d4db4ec79c6caaf02523959e

Artist       :  Steelism
Track        :  Lewis & Clark
Track Preview:  None

Artist       :  Denny Schneidemesser
Track        :  Far Beyond The Stars
Track Preview:  https://p.scdn.co/mp3-preview/528371a04ce56449d46df8ae1b6a03c6ab5276ea?cid=72413f75d4db4ec79c6caaf02523959e

Artist       :  Alex Vans
Tra

### Result:
This does not look so great.  It looks nothing like our previous lists and the songs are not as similar to the seed song as I would like.  What happened?

I can tell you what happened.  This approach takes independent songs and creates vectors from the songs features.  These vectors have nothing to do with the playlists and have no other relationship to other vectors other than their mathmatical distance.  Why doesn't this work?  Because 2 totally different songs can have a combination of feature values that result in the same/similar vector magnitude but not have any commonality other than that.  

This approach is more like the 'content-based recommender' approach described in the intoduction.  