# Spotify Recommender Model

This notebook uses Word2Vec to explore two types of recommender model.

1. **Content-Based**
> Predicts based on what a user has listened to in the past.
> Uses features of songs to find similar songs.

2. **Collaborative**
> Predicts based on what other listeners like.
> Focuses on which songs were liked by other users who also liked a chosen song.


## Word2Vec
In both types of recommender model, a 'vectorized' representation of a song is used to find similar songs: for a given song, we look for other songs whose vectors are close to it. Collaborative recommendations require one more step, because a song's vector is not determined by the song's properties but by its appearance in playlists.

The selection approach is the same for both models (find other songs with similar vector values); only the way the vector values are calculated differs.
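
To make "similar vector values" concrete, here is a minimal sketch of ranking songs by cosine similarity, which is the usual closeness measure for Word2Vec vectors. The song names and the 3-dimensional vectors are made up for illustration; real vectors come from the trained model and have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, vectors, topn=2):
    """Rank every other key in `vectors` by cosine similarity to `query`."""
    scores = [(key, cosine_similarity(vectors[query], vec))
              for key, vec in vectors.items() if key != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical song vectors (illustrative values only, not model output)
toy_vectors = {
    "song_a": [0.9, 0.1, 0.0],
    "song_b": [0.8, 0.2, 0.1],
    "song_c": [0.0, 0.1, 0.9],
    "song_d": [0.1, 0.9, 0.1],
}

neighbours = most_similar("song_a", toy_vectors, topn=2)
print(neighbours)  # song_b ranks first, then song_d
```

The same ranking step applies to both model types; only the origin of the vectors changes.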

References:

https://www.analyticsvidhya.com/blog/2019/07/how-to-build-recommendation-system-word2vec-python/

https://towardsdatascience.com/using-word2vec-for-music-recommendations-bb9649ac2484

https://towardsdatascience.com/how-to-build-a-simple-song-recommender-296fcbc8c85



In [2]:
# Basic Imports
import warnings
warnings.filterwarnings('ignore')

import os
import sys
import pandas as pd
import numpy as np
import seaborn as sns
import time
import random
import matplotlib.pyplot as plt
%matplotlib inline


from gensim.models import Word2Vec
from gensim import utils
import gensim.models


In [3]:
# For the Spotify Dataset
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Table, Column, Integer, String, Float, MetaData, and_, or_, func
from sqlalchemy import create_engine
import sqlite3
from sqlalchemy.orm import sessionmaker
from sqlalchemy import exc

sys.path.append('../../')
from spotify_api import get_spotify_data, get_tracks, get_artists, get_audiofeatures
from spotify_database import get_session, display_time
from spotify_utils import Table_Generator, List_Generator, pickle_load, pickle_save

In [4]:
# !pip install ipywidgets 
# !jupyter nbextension enable --py widgetsnbextension
# !jupyter labextension install @jupyter-widgets/jupyterlab-manager

# %%capture
from tqdm import tqdm_notebook as tqdm

In [5]:
data_path = '../../data/SpotifyDataSet'
db_path = '../../data/SpotifyDataSet/spotify_songs.db'

# Get session
session = get_session(db_path)
engine = create_engine('sqlite:///' + db_path)

# Get the table classes (Playlists, Artists, Tracks)
Playlists = getattr(get_session, "Playlists")
Artists = getattr(get_session, "Artists")
Tracks = getattr(get_session, "Tracks")

## Building Vectors for Spotify Songs

In [205]:
# Iterator that yields playlists
class Playlist_URIs(object):
    """
    Playlist generator that yields the track URIs in a playlist.
    Yields one playlist at a time.
    """
    def __init__(self,
                 filename:str='/Volumes/DataHub/SpotifyDataSet/playlist_tracks.csv', 
                 name:str=None,
                 iters:int=None):
        self.filename     = filename
        # Count the lines once for the progress bar, closing the file afterwards
        with open(self.filename) as f:
            self.length   = sum(1 for _ in f)
        self.name         = name
        self.count        = 0
        self.iters        = iters
        print("Creating Playlist Track Listing Generator:")
        print("\tlength     : ", self.length)
    
    def __iter__(self):
        
        self.count += 1
        progbar = tqdm(total=self.length, desc="{}:{}/{}".format(self.name, self.count, self.iters))
        
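        # Word2Vec expects an iterable of token lists, e.g.
        # sentences = [["cat", "say", "meow"], ["dog", "say", "woof"]]
        with open(self.filename, 'r') as f:
            for line in f:
                progbar.update(1)
                # Second CSV field holds the space-separated track URIs;
                # strip() drops the newline if that field ends the line
                yield line.split(',')[1].strip().split(' ')
            
        progbar.close()    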
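
The reason for wrapping the file in a class with `__iter__`, rather than using a plain generator, is that Word2Vec reads the corpus more than once (one pass to build the vocabulary, then one pass per training iteration). The sketch below illustrates the difference on a tiny temporary file. The class name `PlaylistLines`, the `uri:*` tokens, and the two-column layout are assumptions made for the example, not the real dataset.

```python
import os
import tempfile

class PlaylistLines:
    """Re-iterable: every call to __iter__ reopens the file and starts over."""
    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        with open(self.filename, "r") as f:
            for line in f:
                # Assumed layout: "<playlist id>,<space-separated track URIs>"
                yield line.rstrip("\n").split(",")[1].split(" ")

# Write a tiny made-up corpus to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("0,uri:a uri:b uri:c\n1,uri:b uri:d\n")
    tmp_path = tmp.name

playlists = PlaylistLines(tmp_path)
first_pass = list(playlists)
second_pass = list(playlists)        # a second full pass works

one_shot = (p for p in first_pass)   # a plain generator, by contrast
list(one_shot)                       # consumes it
exhausted = list(one_shot)           # nothing is left for a second pass

os.remove(tmp_path)
print(first_pass == second_pass, exhausted)
```

A plain generator would be consumed by the vocabulary pass, leaving nothing for training.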
        

In [207]:
!pwd

/Users/markmcdonald/Library/Mobile Documents/com~apple~CloudDocs/Harvard/CSCI_E109a/Homework/project/CS109a_finalproject_group20/mark/model


In [209]:
corpus_file = 'playlist_tracks.csv'
corpus_filepath = os.path.join(data_path, corpus_file)

iters = 5
playlists_gen = Playlist_URIs(filename=corpus_filepath,
                              name="Building Vectors",
                              iters=iters) 

model = gensim.models.Word2Vec(sentences=playlists_gen,
                               workers = 8,    # number of processors
                               sg = 0,         # 1=skip-gram, 0=CBOW
                               iter=iters      # training iterations - default=5
                              )

# about 5 minutes per iteration with 4 processors // 2 min per iteration with 8

Creating Playlist Track Listing Generator:
	length     :  999001
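
The `sg` flag above switches between the two Word2Vec training objectives. As a rough sketch of the difference: CBOW predicts a track from its surrounding tracks in the playlist, while skip-gram predicts each surrounding track from the track itself. The helper names and track labels below are hypothetical, and gensim's actual implementation differs in details such as window sampling and frequent-word subsampling.

```python
def cbow_examples(playlist, window=2):
    """Yield (context_tracks, target_track) pairs from one playlist."""
    for i, target in enumerate(playlist):
        left = playlist[max(0, i - window):i]
        right = playlist[i + 1:i + 1 + window]
        yield left + right, target

def skipgram_examples(playlist, window=2):
    """Yield (target_track, context_track) pairs from one playlist."""
    for context, target in cbow_examples(playlist, window):
        for neighbour in context:
            yield target, neighbour

# Hypothetical playlist of four tracks
toy_playlist = ["t1", "t2", "t3", "t4"]

cbow_pairs = list(cbow_examples(toy_playlist, window=1))
skipgram_pairs = list(skipgram_examples(toy_playlist, window=1))

for context, target in cbow_pairs:
    print(context, "->", target)
```

Either way, tracks that share playlist neighbours end up with similar vectors, which is what makes the embedding usable for collaborative recommendations.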



#### Test the model

In [210]:
for i, word in enumerate(model.wv.vocab):
    if i == 20:
        break
    print(word)

spotify:track:2d7LPtieXdIYzf7yHPooWd
spotify:track:0y4TKcc7p2H6P0GJlt01EI
spotify:track:6q4c1vPRZREh7nw3wG7Ixz
spotify:track:54KFQB6N4pn926IUUYZGzK
spotify:track:0NeJjNlprGfZpeX2LQuN6c
spotify:track:2kuFVY6hWX6yavTiWHE3SQ
spotify:track:66mmvchQ4C3LnPzq4DiAI3
spotify:track:4gFxywaJejXWxo0NjlWzgg
spotify:track:6wQSrFnJYm3evLsavFeCVT
spotify:track:3ZjnFYlal0fXN6t61wdxhl
spotify:track:617EQMgzYFe2THz093j68m
spotify:track:6Hki3HcbeU2c4T72lJjyZ5
spotify:track:5j9iuo3tMmQIfnEEQOOjxh
spotify:track:03gNt9LmzZOZcnjKFKrFMl
spotify:track:163F4SPkExGwITk7VKz5sp
spotify:track:6zQyu8L8yUuJkl6LbQ6iKU
spotify:track:47yVYUPLE1K5QhAS4NX1Uo
spotify:track:3KIIwkf6lNwJqLcx6GUIzr
spotify:track:62tW8h5l2UrgqvdbhZaXL1
spotify:track:5R22QnkOWbSVSWmV1DT8iz


In [None]:
# you can save the whole model - not necessary - will be large and includes things we don't need
# model.save("playlists_1.model")
# model = Word2Vec.load("playlists_1.model")

In [9]:
# Only the vectors are needed, so save and load just the KeyedVectors instead of the whole model
from gensim.models import KeyedVectors
kv_filepath = os.path.join(data_path, 'playlists_BOW1')
# model.wv.save(kv_filepath)
model_v = KeyedVectors.load(kv_filepath, mmap='r')


In [10]:
type(model_v)

gensim.models.keyedvectors.Word2VecKeyedVectors

### Model Attributes
#### wv
> This object essentially contains the mapping between words and embeddings. After training, it can be used directly to query those embeddings in various ways. See the module level docstring for examples.

#### vocabulary
> This object represents the vocabulary (sometimes called Dictionary in gensim) of the model. Besides keeping track of all unique words, this object provides extra functionality, such as constructing a huffman tree (frequent words are closer to the root), or discarding extremely rare words.
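
As an illustration of "discarding extremely rare words", the sketch below builds a track-to-index vocabulary and drops tracks that appear fewer than `min_count` times. This is a simplified stand-in, not gensim's code: the function name `build_vocab` and the toy playlists are assumptions. In gensim the threshold is the `min_count` parameter of `Word2Vec`, which defaults to 5, so tracks appearing in very few playlists never receive a vector.

```python
from collections import Counter

def build_vocab(playlists, min_count=2):
    """Map each sufficiently frequent track to an index, most frequent first."""
    counts = Counter(uri for playlist in playlists for uri in playlist)
    kept = [uri for uri, n in counts.most_common() if n >= min_count]
    return {uri: index for index, uri in enumerate(kept)}

# Hypothetical playlists: "a" appears 3 times, "b" twice, "c" and "d" once
toy_playlists = [["a", "b", "c"], ["a", "b"], ["a", "d"]]

vocab = build_vocab(toy_playlists, min_count=2)
print(vocab)  # only "a" and "b" survive the threshold
```

A lookup for a track outside the vocabulary fails, so it is worth checking membership before querying for similar tracks.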

In [230]:
rv = display_time(session.query(Playlists.track_name, 
                                Playlists.track_uri,
                                Playlists.artist_uri,
                                Playlists.album_uri).filter(Playlists.track_name=="Free Fallin'").distinct().first)


Time to Execute: 0.0 seconds


In [231]:
get_artists([rv.artist_uri])[0]['name']


'Tom Petty'

In [234]:
playlist = np.array(model_v.similar_by_word(rv.track_uri, topn=10, restrict_vocab=None))[:,0]
sp_playlist = get_tracks(playlist)

ERR: Unable to access Spotify data:  ERR:400


In [235]:
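# get_tracks returned a bool here rather than a list of tracks (see ERR:400
# above), which raised "TypeError: 'bool' object is not iterable"; guard first
if isinstance(sp_playlist, bool):
    print("get_tracks returned no track data")
else:
    for t in sp_playlist:
        print("Artist       : ", t['artists'][0]['name'])
        print("Track        : ", t['name'])
        print("Track Preview: ", t['preview_url'])
        print()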