This notebook explores two ways to make song recommendations:
1. Using Spotify audio features such as 'acousticness' and 'tempo'
2. Using song lyrics

Using the Spotify audio features alone does not seem to produce very good recommendations.

Using song lyrics with Latent Dirichlet Allocation (LDA), however, gives somewhat better results.

TODO: Explore song lyrics in conjunction with images of sound waves.

In [1]:
import itertools
import os

import lyricsgenius as lg
import numpy as np
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer  # alternative: TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

from spotify import SpotifyService
from record_collection import RecordCollection
import utils

pd.set_option('display.max_columns', 50)

In [2]:
spotifyService = SpotifyService(scope='user-read-recently-played')
recordCollection = RecordCollection()
recordCollection.build_collection(spotifyService)

In [3]:
tracks = recordCollection.tracks

In [4]:
len(tracks)

713

In [5]:
_audio_features = spotifyService.get_audio_features(tracks)

In [6]:
def get_track_name_and_artist(track_uri):
    r = spotifyService.spotipyClient.track(track_uri)
    return r['name'], r['album']['artists'][0]['name']
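`get_track_name_and_artist` makes one `spotipyClient.track()` request per track, so the next cell issues 713 requests. Below is a minimal sketch of a batched alternative, assuming `spotifyService.spotipyClient` is a `spotipy.Spotify` instance: spotipy's `tracks()` accepts up to 50 track IDs or URIs per call, which brings the lookup down to 15 requests. `chunked` and `get_names_and_artists` are hypothetical helper names. Like the original, it takes the first *album* artist; `track["artists"][0]["name"]` would give the track's own primary artist instead.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def chunked(items: List[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def get_names_and_artists(client, track_uris: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each track URI to (track name, first album artist), 50 tracks per request.

    Assumption: `client` behaves like spotipy.Spotify, whose tracks() returns
    {"tracks": [...]} in the same order as the requested IDs.
    """
    uris = list(track_uris)
    result: Dict[str, Tuple[str, str]] = {}
    for batch in chunked(uris):
        response = client.tracks(batch)
        for uri, track in zip(batch, response["tracks"]):
            if track is None:  # unknown IDs come back as null
                continue
            result[uri] = (track["name"], track["album"]["artists"][0]["name"])
    return result
```

Usage would look like `names = get_names_and_artists(spotifyService.spotipyClient, audio_features_df.index)`, followed by mapping the index through `names`.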

In [7]:
audio_features_df = pd.DataFrame.from_dict(_audio_features, orient='index')
audio_features_df['track_name'], audio_features_df['artist'] = zip(*audio_features_df.reset_index()['index'].map(get_track_name_and_artist))
audio_features_df.head(2)

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | track_name | artist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| spotify:track:3tn5eJxxrG6RmKQcxRaCXG | 0.318 | 0.646 | 5 | -7.15 | 0 | 0.0392 | 0.00521 | 0.172 | 0.0679 | 0.172 | 125.986 | audio_features | 3tn5eJxxrG6RmKQcxRaCXG | spotify:track:3tn5eJxxrG6RmKQcxRaCXG | https://api.spotify.com/v1/tracks/3tn5eJxxrG6R... | https://api.spotify.com/v1/audio-analysis/3tn5... | 382800 | 4 | Who's Your Money On? (Plastic House) | Inhaler |
| spotify:track:2sJHJITMt2SYuim7KvK8yG | 0.602 | 0.849 | 9 | -7.033 | 1 | 0.033 | 0.017 | 0.000166 | 0.185 | 0.69 | 114.974 | audio_features | 2sJHJITMt2SYuim7KvK8yG | spotify:track:2sJHJITMt2SYuim7KvK8yG | https://api.spotify.com/v1/tracks/2sJHJITMt2SY... | https://api.spotify.com/v1/audio-analysis/2sJH... | 223897 | 4 | Ice Cream Sundae | Inhaler |


In [8]:
key_categories = pd.get_dummies(audio_features_df['key'].map(str), prefix='key', drop_first=True)
time_signature_categories = pd.get_dummies(audio_features_df['time_signature'], prefix='time_signature', drop_first=True)

audio_features_df_encoded = pd.concat([audio_features_df, key_categories, time_signature_categories], axis=1)
audio_features_df_encoded.head(2)

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | track_name | artist | key_1 | key_10 | key_11 | key_2 | key_3 | key_4 | key_5 | key_6 | key_7 | key_8 | key_9 | time_signature_3 | time_signature_4 | time_signature_5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| spotify:track:3tn5eJxxrG6RmKQcxRaCXG | 0.318 | 0.646 | 5 | -7.15 | 0 | 0.0392 | 0.00521 | 0.172 | 0.0679 | 0.172 | 125.986 | audio_features | 3tn5eJxxrG6RmKQcxRaCXG | spotify:track:3tn5eJxxrG6RmKQcxRaCXG | https://api.spotify.com/v1/tracks/3tn5eJxxrG6R... | https://api.spotify.com/v1/audio-analysis/3tn5... | 382800 | 4 | Who's Your Money On? (Plastic House) | Inhaler | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| spotify:track:2sJHJITMt2SYuim7KvK8yG | 0.602 | 0.849 | 9 | -7.033 | 1 | 0.033 | 0.017 | 0.000166 | 0.185 | 0.69 | 114.974 | audio_features | 2sJHJITMt2SYuim7KvK8yG | spotify:track:2sJHJITMt2SYuim7KvK8yG | https://api.spotify.com/v1/tracks/2sJHJITMt2SY... | https://api.spotify.com/v1/audio-analysis/2sJH... | 223897 | 4 | Ice Cream Sundae | Inhaler | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
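The dummy columns above appear in the order `key_1, key_10, key_11, key_2, ...` because `key` is mapped to `str` before encoding, so pandas sorts the categories lexically, and `drop_first=True` then drops the lexically smallest key as the baseline. A small sketch with illustrative values (not taken from the collection) shows the difference between string and integer encoding:

```python
import pandas as pd

keys = pd.Series([0, 2, 10, 5])  # illustrative pitch-class values

# Mapping to str first (as in the cell above) sorts categories lexically: '0' < '10' < '2' < '5'.
as_str = pd.get_dummies(keys.map(str), prefix="key", drop_first=True)

# Encoding the integers directly sorts them numerically: 0 < 2 < 5 < 10.
as_int = pd.get_dummies(keys, prefix="key", drop_first=True)

print(as_str.columns.tolist())
print(as_int.columns.tolist())
```

The column order does not change the similarity results, since cosine similarity is unaffected by a consistent permutation of features; it only affects how the table reads.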


### 1. Get similar songs using Spotify audio features

In [9]:
AUDIO_FEATURES = list(itertools.chain(['danceability', 'energy', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo'], key_categories.columns.tolist(), time_signature_categories.columns.tolist()))

X, y, z = audio_features_df_encoded[AUDIO_FEATURES], audio_features_df_encoded['track_name'], audio_features_df_encoded['artist']

In [10]:
standardScaler = StandardScaler()
X_scaled = standardScaler.fit_transform(X)

In [11]:
similarity_matrix = cosine_similarity(X_scaled)
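`cosine_similarity` normalises each row to unit length and takes dot products, so two tracks score highly when their standardised feature vectors point in the same direction. A from-scratch sketch of that computation (the function name is hypothetical) is below. One observation from `AUDIO_FEATURES`: 14 of its 24 columns are key and time-signature dummies, and after `StandardScaler` each dummy carries as much weight as, say, `tempo`, which may be one reason these recommendations are weak.

```python
import numpy as np


def cosine_similarity_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of X (assumes no all-zero rows)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)  # Euclidean length of each row
    X_unit = X / norms                                # scale every row to unit length
    return X_unit @ X_unit.T                          # dot products of unit vectors


X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = cosine_similarity_matrix(X)
print(S.round(3))
```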

In [15]:
utils.get_top_n_similar_tracks(y, z, similarity_matrix, "80s Mercedes", "Maren Morris", 5)

[('Brave Face', 'Frank Turner'),
 ('Never Change', 'Picture This'),
 ('My Church', 'Maren Morris'),
 ('Ice Cream Sundae', 'Inhaler'),
 ('Violet', 'Sea Girls')]
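`utils.get_top_n_similar_tracks` is not shown in this notebook. A hypothetical sketch of what such a helper could do: locate the query track's row, sort that row of the similarity matrix in descending order, skip the track itself, and return the first `n` (track, artist) pairs. The name `get_top_n_similar_tracks_sketch` is illustrative only, not the author's implementation.

```python
from typing import List, Sequence, Tuple

import numpy as np


def get_top_n_similar_tracks_sketch(track_names: Sequence[str], artists: Sequence[str],
                                    similarity: np.ndarray, track_name: str, artist: str,
                                    n: int = 5) -> List[Tuple[str, str]]:
    """Return the n (track, artist) pairs most similar to the query track."""
    pairs = list(zip(track_names, artists))
    idx = pairs.index((track_name, artist))      # raises ValueError if the track is missing
    order = np.argsort(similarity[idx])[::-1]    # column indices, most similar first
    ranked = [pairs[i] for i in order if i != idx]  # drop the query track itself
    return ranked[:n]


names = ["a", "b", "c", "d"]
artists = ["x", "x", "y", "z"]
sim = np.array([[1.0, 0.2, 0.9, 0.5],
                [0.2, 1.0, 0.1, 0.3],
                [0.9, 0.1, 1.0, 0.4],
                [0.5, 0.3, 0.4, 1.0]])
print(get_top_n_similar_tracks_sketch(names, artists, sim, "a", "x", 2))
```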

### 2. Get similar songs using lyrics

In [12]:
utils.get_song_lyrics(title="Californication", artist="Red Hot Chili Peppers")[:10]

Searching for "Californication" by Red Hot Chili Peppers...
Done.


['californication',
 'psychic',
 'spy',
 'china',
 'try',
 'steal',
 'mind',
 'elation',
 'little',
 'girl']
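`utils.get_song_lyrics` returns lowercased tokens with stop words removed. Its exact cleaning steps are not shown; the sketch below is one hypothetical version using only the standard library, with a deliberately tiny stop-word list. Tokens such as 'spy' and 'girl' above suggest the real helper may also lemmatise (NLTK's `WordNetLemmatizer` is one option), which this sketch does not do.

```python
import re
from typing import Iterable, List

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"from", "your", "the", "and", "you"}


def clean_lyrics(text: str, stop_words: Iterable[str] = STOP_WORDS, min_len: int = 3) -> List[str]:
    """Lowercase, keep alphabetic runs, drop stop words and tokens shorter than min_len."""
    stop = set(stop_words)
    tokens = re.findall(r"[a-z]+", text.lower())  # "mind's" splits into "mind" and "s"
    return [t for t in tokens if len(t) >= min_len and t not in stop]


print(clean_lyrics("Spies from the city try to steal your mind's calm"))
```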

In [13]:
TRACK_NAMES = audio_features_df_encoded['track_name'].values
ARTISTS = audio_features_df_encoded['artist'].values

In [14]:
%%time
lyrics_df = utils.build_lyrics_df(TRACK_NAMES[:10], ARTISTS[:10])

Searching for "Who's Your Money On? (Plastic House)" by Inhaler...
Done.
Searching for "Ice Cream Sundae" by Inhaler...
Done.
Searching for "Habits" by Plested...
Done.
Searching for "Nervous" by Maren Morris...
Done.
Searching for "Feathered Indians" by Tyler Childers...
Done.
Searching for "Zuckerpunch" by The Snuts...
Done.
Searching for "Don't Leave Me This Way (feat. Teddy Pendergrass)" by Harold Melvin & The Blue Notes...
Done.
Searching for "To Lose My Life" by White Lies...
Done.
Searching for "Tokyo" by White Lies...
Done.
Searching for "Molly" by Jordy Searcy...
Done.
Wall time: 10.1 s
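At roughly 1 s per track (10.1 s for 10 tracks), fetching lyrics for all 713 tracks would take about 12 minutes, and every rerun repeats the Genius requests. A minimal sketch of a JSON disk cache, assuming some `fetch(title, artist)` function that returns lyrics or `None`; `cached_lyrics` and the cache file name are hypothetical.

```python
import json
import os
from typing import Callable, Dict, Optional


def cached_lyrics(cache_path: str,
                  fetch: Callable[[str, str], Optional[str]]) -> Callable[[str, str], Optional[str]]:
    """Wrap fetch(title, artist) so each result is stored in, and reused from, a JSON file."""
    cache: Dict[str, Optional[str]] = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)

    def lookup(title: str, artist: str) -> Optional[str]:
        key = f"{artist}\t{title}"
        if key not in cache:  # only hit the network for unseen tracks
            cache[key] = fetch(title, artist)
            with open(cache_path, "w", encoding="utf-8") as f:
                json.dump(cache, f)
        return cache[key]

    return lookup
```

Writing the file after every new entry keeps progress if a long run is interrupted, at the cost of rewriting the JSON each time.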


#### CountVectorizer or TfidfVectorizer

In [15]:
clean_corpus = lyrics_df['lyrics'].values.tolist()

In [17]:
cv = CountVectorizer(tokenizer=lambda doc: doc, lowercase=False)
corpus_transformed = cv.fit_transform(clean_corpus)
vocab = cv.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
corpus_transformed.shape

(10, 389)

#### LDA

In [18]:
lda_model = LatentDirichletAllocation(n_components=5, max_iter=20)
topics = lda_model.fit_transform(corpus_transformed)
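`fit_transform` returns one row per song holding its topic proportions, and each row sums to 1; the similarity step below compares these topic mixtures. The cell above does not set `random_state`, so the topics (and therefore the recommended tracks) can change between runs. A sketch on a toy count matrix (illustrative values) showing the row sums and that a fixed seed makes the fit repeatable:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy document-term count matrix: 4 documents x 5 vocabulary terms (illustrative values).
counts = np.array([
    [3, 2, 0, 0, 1],
    [2, 3, 1, 0, 0],
    [0, 0, 2, 3, 2],
    [0, 1, 3, 2, 3],
])

lda = LatentDirichletAllocation(n_components=2, max_iter=20, random_state=0)
doc_topics = lda.fit_transform(counts)  # shape (n_documents, n_topics)

print(doc_topics.shape)
print(doc_topics.sum(axis=1))  # each document's topic proportions sum to 1
```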

In [19]:
utils.get_top_n_topic_words(lda_model, vocab, 10)

Topic 1: ['leave' 'way' 'love' 'baby' 'tell' 'house' 'stay' 'money' 'come']
Topic 2: ['let' 'time' 'die' 'grow' 'old' 'say' 'lose' 'nervous' 'love']
Topic 3: ['going' 'to' 'break' 'easy' 'ice' 'cream' 'habit' 'sundae' 'get']
Topic 4: ['find' 'right' 'wrong' 'new' 'tokyo' 'love' 'york' 'different' 'jupiter']
Topic 5: ['molly' 'see' 'think' 'bad' 'wish' 'feel' 'want' 'heart' 'ask']
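Each topic above lists 9 words although `n=10` was requested, which suggests an off-by-one in `utils.get_top_n_topic_words` (for example, a slice like `argsort()[:-n:-1]` yields only `n - 1` indices). `utils` is not shown, so this is a guess. A sketch that reads the top `n` words per topic from the fitted model's `components_` (the function name is hypothetical):

```python
from typing import List, Sequence

import numpy as np


def top_n_topic_words(components: np.ndarray, vocab: Sequence[str], n: int) -> List[List[str]]:
    """Return the n highest-weighted words for each topic (one row of components per topic)."""
    vocab_arr = np.asarray(vocab)
    top = []
    for topic_weights in components:
        idx = np.argsort(topic_weights)[::-1][:n]  # indices of the n largest weights, descending
        top.append(vocab_arr[idx].tolist())
    return top


toy_components = np.array([[0.1, 5.0, 3.0], [4.0, 0.2, 1.0]])
print(top_n_topic_words(toy_components, ["a", "b", "c"], 2))
```

In the notebook this would be called as `top_n_topic_words(lda_model.components_, vocab, 10)`.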


#### Similarity

In [20]:
sim_matrix = cosine_similarity(topics)

In [21]:
sim_matrix.shape

(10, 10)

In [22]:
# helper: map each (track_name, artist) pair to its row index
track_artist_idx = {pair: idx for idx, pair in enumerate(zip(TRACK_NAMES, ARTISTS))}

In [25]:
utils.get_top_n_similar_tracks(TRACK_NAMES[:10], ARTISTS[:10], sim_matrix, "Ice Cream Sundae", "Inhaler", 5)

[('Habits', 'Plested'),
 ('Zuckerpunch', 'The Snuts'),
 ('Nervous', 'Maren Morris'),
 ("Who's Your Money On? (Plastic House)", 'Inhaler'),
 ('Molly', 'Jordy Searcy')]

In [27]:
GENIUS_ACCESS_TOKEN = os.environ.get('GENIUS_ACCESS_TOKEN')
genius = lg.Genius(GENIUS_ACCESS_TOKEN,
                   skip_non_songs=True, 
                   excluded_terms=["(Remix)", "(Live)"], 
                   remove_section_headers=True)
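As the output below shows, the scraped `song.lyrics` text can begin with a page header such as "Zuckerpunch Lyrics" that is not part of the song, and `genius.search_song` returns `None` when no match is found. A hypothetical clean-up helper, assuming the header has the form `"<title> Lyrics"`:

```python
def strip_genius_header(lyrics: str, title: str) -> str:
    """Drop a leading '<title> Lyrics' header if the scraped text starts with one."""
    header = f"{title} Lyrics"
    if lyrics.startswith(header):
        lyrics = lyrics[len(header):].lstrip("\n")
    return lyrics


print(strip_genius_header("Zuckerpunch Lyrics\nRobotic, hypnotic", "Zuckerpunch"))
```

Usage would look like `song = genius.search_song(title, artist)` followed by `strip_genius_header(song.lyrics, song.title)` only when `song is not None`.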

In [28]:
song = genius.search_song("Zuckerpunch", 'The Snuts')
print(song.lyrics)

Searching for "Zuckerpunch" by The Snuts...
Done.
Zuckerpunch Lyrics
Robotic, hypnotic, Amazonian shopping
For the masses, and your cache says you need a little more plastic
In your life, bro, is that right, no?
All your clothes are from sweatshops
And their savour is your labour, social handcuff enslavement
Your search history is no mystery, rim-shot, kick, clap, hold
From the South Beach to the cold East, the Middle East and Rome
It's "Tweet this, snap that," all the things that make you go "Hmm"
A Zuckerpunch and a cold ass lunch, don't speak 'til you've been told

So walk with me down memory lane
When the phone in your pocket only had a few games
I said, "Yes, Mister, please help me believe, don't patronise, no"
Is this clickbait my new safe space? Sit back and do what you're told

So walk with me down memory lane
When the phone in your pocket only had a few games
One snake, two snakes, now it's your brain
Throw me the keys to your cuffs, and then I'll be on my way
Walk with me dow