This notebook explores two ways to make song recommendations:
1. Using Spotify labels such as 'acousticness' and 'tempo'
2. Using song lyrics

Using the Spotify labels alone doesn't seem to produce very good recommendations.

Topic modelling the song lyrics with LDA (Latent Dirichlet Allocation) gives somewhat better results.

TODO: Explore song lyrics in conjunction with images of sound waves.

In [51]:
import itertools
import os
from importlib import reload  # needed for the reload(utils) calls further down

import lyricsgenius as lg
import numpy as np
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer  # or TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

from spotify import SpotifyService
from record_collection import RecordCollection
import utils

pd.set_option('display.max_columns', 50)

In [2]:
spotifyService = SpotifyService(scope='user-read-recently-played')
recordCollection = RecordCollection(spotifyService)

In [3]:
tracks = recordCollection.tracks
tracks[1]

Track(track_id='2sJHJITMt2SYuim7KvK8yG', track_uri='spotify:track:2sJHJITMt2SYuim7KvK8yG', track_name='Ice Cream Sundae', artist_name='Inhaler', artist_id='6lyMYewq2SuTFIXgiv7OxH', artist_uri='spotify:artist:6lyMYewq2SuTFIXgiv7OxH', popularity=45, duration_ms=223896)

In [4]:
len(tracks)

712

In [5]:
_audio_features = spotifyService.get_audio_features(tracks)

In [6]:
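def get_track_name_and_artist(track_uri):
    """Look up a track on Spotify and return (track name, artist name)."""
    r = spotifyService.spotipyClient.track(track_uri)
    # Takes the first artist credited on the album, which can differ from the
    # track's own first artist (r['artists'][0]) on compilations.
    return r['name'], r['album']['artists'][0]['name']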

In [7]:
audio_features_df = pd.DataFrame.from_dict(_audio_features, orient='index')
audio_features_df['track_name'], audio_features_df['artist'] = zip(*audio_features_df.reset_index()['index'].map(get_track_name_and_artist))
audio_features_df.head(2)

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature,track_name,artist
spotify:track:3tn5eJxxrG6RmKQcxRaCXG,0.318,0.646,5,-7.15,0,0.0392,0.00521,0.172,0.0679,0.172,125.986,audio_features,3tn5eJxxrG6RmKQcxRaCXG,spotify:track:3tn5eJxxrG6RmKQcxRaCXG,https://api.spotify.com/v1/tracks/3tn5eJxxrG6R...,https://api.spotify.com/v1/audio-analysis/3tn5...,382800,4,Who's Your Money On? (Plastic House),Inhaler
spotify:track:2sJHJITMt2SYuim7KvK8yG,0.602,0.849,9,-7.033,1,0.033,0.017,0.000166,0.185,0.69,114.974,audio_features,2sJHJITMt2SYuim7KvK8yG,spotify:track:2sJHJITMt2SYuim7KvK8yG,https://api.spotify.com/v1/tracks/2sJHJITMt2SY...,https://api.spotify.com/v1/audio-analysis/2sJH...,223897,4,Ice Cream Sundae,Inhaler


In [8]:
key_categories = pd.get_dummies(audio_features_df['key'].map(str), prefix='key', drop_first=True)
time_signature_categories = pd.get_dummies(audio_features_df['time_signature'], prefix='time_signature', drop_first=True)

audio_features_df_encoded = pd.concat([audio_features_df, key_categories, time_signature_categories], axis=1)
audio_features_df_encoded.head(2)

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature,track_name,artist,key_1,key_10,key_11,key_2,key_3,key_4,key_5,key_6,key_7,key_8,key_9,time_signature_3,time_signature_4,time_signature_5
spotify:track:3tn5eJxxrG6RmKQcxRaCXG,0.318,0.646,5,-7.15,0,0.0392,0.00521,0.172,0.0679,0.172,125.986,audio_features,3tn5eJxxrG6RmKQcxRaCXG,spotify:track:3tn5eJxxrG6RmKQcxRaCXG,https://api.spotify.com/v1/tracks/3tn5eJxxrG6R...,https://api.spotify.com/v1/audio-analysis/3tn5...,382800,4,Who's Your Money On? (Plastic House),Inhaler,0,0,0,0,0,0,1,0,0,0,0,0,1,0
spotify:track:2sJHJITMt2SYuim7KvK8yG,0.602,0.849,9,-7.033,1,0.033,0.017,0.000166,0.185,0.69,114.974,audio_features,2sJHJITMt2SYuim7KvK8yG,spotify:track:2sJHJITMt2SYuim7KvK8yG,https://api.spotify.com/v1/tracks/2sJHJITMt2SY...,https://api.spotify.com/v1/audio-analysis/2sJH...,223897,4,Ice Cream Sundae,Inhaler,0,0,0,0,0,0,0,0,0,0,1,0,1,0
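
The column order above (`key_1`, `key_10`, `key_11`, `key_2`, ...) follows from mapping the keys to strings before encoding, and `drop_first=True` removes the first category so that key 0 is represented by all zeros. A minimal sketch on made-up data (the `toy_` names are hypothetical and not part of the notebook):

```python
import pandas as pd

# Made-up key values; in the notebook these come from audio_features_df['key'].
toy_keys = pd.Series([0, 1, 2, 10])

# Mapping to str makes the categories sort lexicographically: '0', '1', '10', '2'.
toy_dummies = pd.get_dummies(toy_keys.map(str), prefix='key', drop_first=True)

toy_columns = toy_dummies.columns.tolist()
# The dropped category ('0') is encoded as a row with no active indicator.
toy_key0_active = int(toy_dummies.iloc[0].sum())
```

Dropping one category avoids perfectly collinear indicator columns, at the cost of making the dropped key implicit.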


### 1. Get similar songs using Spotify labels

In [9]:
AUDIO_FEATURES = list(itertools.chain(
    ['danceability', 'energy', 'loudness', 'mode', 'speechiness', 'acousticness',
     'instrumentalness', 'liveness', 'valence', 'tempo'],
    key_categories.columns.tolist(),
    time_signature_categories.columns.tolist(),
))

X, y, z = audio_features_df_encoded[AUDIO_FEATURES], audio_features_df_encoded['track_name'], audio_features_df_encoded['artist']

In [10]:
standardScaler = StandardScaler()
X_scaled = standardScaler.fit_transform(X)

In [11]:
similarity_matrix = cosine_similarity(X_scaled)

In [15]:
utils.get_top_n_similar_tracks(y, z, similarity_matrix, "80s Mercedes", "Maren Morris", 5)

[('Brave Face', 'Frank Turner'),
 ('Never Change', 'Picture This'),
 ('My Church', 'Maren Morris'),
 ('Ice Cream Sundae', 'Inhaler'),
 ('Violet', 'Sea Girls')]
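
`utils.get_top_n_similar_tracks` is not shown in this notebook. The sketch below is one way such a lookup could work, assuming it finds the query row by (track name, artist) and ranks the remaining rows by cosine similarity. The function name `top_n_similar` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler


def top_n_similar(track_names, artists, similarity, track_name, artist, n):
    """Return the n (track, artist) pairs most similar to the query track."""
    pairs = list(zip(track_names, artists))
    query_idx = pairs.index((track_name, artist))  # ValueError if not present
    # Rank all rows by descending similarity, then drop the query track itself.
    order = np.argsort(-similarity[query_idx], kind="stable")
    ranked = [i for i in order if i != query_idx]
    return [pairs[i] for i in ranked[:n]]


# Toy feature matrix: rows are tracks, columns are two made-up audio features.
toy_features = np.array([[0.9, 120.0], [0.8, 118.0], [0.1, 70.0], [0.2, 75.0]])
toy_names = ["A", "B", "C", "D"]
toy_artists = ["x", "x", "y", "y"]

# Same pipeline as above: standardise the features, then compare rows.
toy_scaled = StandardScaler().fit_transform(toy_features)
toy_similarity = cosine_similarity(toy_scaled)
result = top_n_similar(toy_names, toy_artists, toy_similarity, "A", "x", 2)
```

Standardising first matters here because cosine similarity on the raw values would be dominated by large-magnitude columns such as tempo.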

### 2. Get similar songs using lyrics

In [49]:
reload(utils)

<module 'utils' from 'C:\\Users\\caitr\\coding\\spotifcation\\utils.py'>

In [42]:
utils.get_song_lyrics(title="Californication", artist="Red Hot Chilli Peppers")[:10]

Searching for "Californication" by Red Hot Chilli Peppers...
Done.


['californication',
 'psychic',
 'spy',
 'china',
 'try',
 'steal',
 'mind',
 'elation',
 'little',
 'girl']
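
The output above suggests that `utils.get_song_lyrics` returns a list of lowercased tokens with common words removed. Its implementation is not shown here, so the following is only a rough sketch of that kind of cleaning, assuming a simple regex tokenizer and scikit-learn's built-in English stop-word list. The real helper may well lemmatize or filter differently, and `tokenize_lyrics` is a hypothetical name.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS


def tokenize_lyrics(raw_lyrics):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", raw_lyrics.lower())
    return [t for t in tokens if len(t) > 1 and t not in ENGLISH_STOP_WORDS]


# Made-up sentence standing in for raw lyrics fetched from Genius.
toy_tokens = tokenize_lyrics("The quick brown fox is jumping over the lazy dogs")
```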

In [43]:
TRACK_NAMES = audio_features_df_encoded['track_name'].values
ARTISTS = audio_features_df_encoded['artist'].values

In [50]:
%%time
lyrics_df = utils.build_lyrics_df(TRACK_NAMES, ARTISTS)

Searching for "Who's Your Money On? (Plastic House)" by Inhaler...
Done.
Searching for "Ice Cream Sundae" by Inhaler...
Done.
Searching for "Habits" by Plested...
Done.
Searching for "Nervous" by Maren Morris...
Done.
Searching for "Feathered Indians" by Tyler Childers...
Done.
Searching for "Zuckerpunch" by The Snuts...
Done.
Searching for "Don't Leave Me This Way (feat. Teddy Pendergrass)" by Harold Melvin & The Blue Notes...
Done.
Searching for "To Lose My Life" by White Lies...
Done.
Searching for "Tokyo" by White Lies...
Done.
Searching for "Molly" by Jordy Searcy...
Done.
Searching for "Seventeen Going Under" by Sam Fender...
Done.
Searching for "Rabbit Hole" by Jake Bugg...
Done.
Searching for "Rock 'n' Roll Raver" by NOISY...
Done.
Searching for "Run Away to Mars" by TALK...
Done.
Searching for "Happier Than Ever" by Billie Eilish...
Done.
Searching for "1963" by Catching Mangoes...
Done.
Searching for "Black Horse And The Cherry Tree" by KT Tunstall...
Done.
Searching for "Ice

#### CountVectorizer or TfidfVectorizer

In [52]:
clean_corpus = lyrics_df['lyrics'].values.tolist()

In [53]:
cv = CountVectorizer(tokenizer=lambda doc: doc, lowercase=False)
corpus_transformed = cv.fit_transform(clean_corpus)
vocab = cv.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
corpus_transformed.shape

(701, 36478)
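
To make the choice in the heading concrete, here is a small sketch comparing the two vectorizers on pre-tokenized documents, which is the form `lyrics_df['lyrics']` appears to take. The toy corpus and the `identity` helper are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Each document is already a list of tokens, as in clean_corpus.
toy_corpus = [["love", "love", "night"], ["night", "city", "city"], ["love", "city"]]


def identity(doc):
    """Pass pre-tokenized documents straight through."""
    return doc


# token_pattern=None silences the warning that the pattern is unused
# when a custom tokenizer is supplied.
count_vec = CountVectorizer(tokenizer=identity, lowercase=False, token_pattern=None)
tfidf_vec = TfidfVectorizer(tokenizer=identity, lowercase=False, token_pattern=None)

counts = count_vec.fit_transform(toy_corpus)    # raw term counts
weights = tfidf_vec.fit_transform(toy_corpus)   # counts reweighted by rarity

# TF-IDF rows are L2-normalised by default.
row_norms = np.linalg.norm(weights.toarray(), axis=1)
```

LDA models word counts, so raw counts from `CountVectorizer` are the more conventional input for the next step; TF-IDF weights are usually a better fit for methods that compare documents directly.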

#### LDA

In [64]:
lda_model = LatentDirichletAllocation(n_components=5, max_iter=20)
topics = lda_model.fit_transform(corpus_transformed)

In [65]:
reload(utils)

<module 'utils' from 'C:\\Users\\caitr\\coding\\spotifcation\\utils.py'>

In [67]:
utils.get_top_n_topic_words(lda_model, vocab, 10)

Topic 1: ['like' 'know' 'come' 'time' 'people' 'go' 'day' 'life' 'thing']
Topic 2: ['\t' 'tony' 'jordan' '\t\t' 'look' 'manny' 'like' 'day' 'get']
Topic 3: ['oh' 'know' 'love' 'like' 'get' 'let' 'yeah' 'time' 'wanna']
Topic 4: ['man' 'craig' 'look' 'like' 'bloom' 'vincent' 'know' 'hand' 'venkman']
Topic 5: ['2018' 'feat' '2017' '🇪' '🇮' 'm.' 'new' 'january' 'ep']


#### Similarity

In [68]:
sim_matrix = cosine_similarity(topics)

In [69]:
sim_matrix.shape

(701, 701)
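
Each row of `topics` holds one song's topic mixture, so the similarity matrix compares songs by how they spread across the five topics rather than by shared words. A small sketch with made-up mixtures (the `toy_` names are hypothetical):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Three made-up songs described by their weights on three topics.
toy_topics = np.array([
    [0.9, 0.1, 0.0],   # mostly topic 1
    [0.8, 0.2, 0.0],   # also mostly topic 1
    [0.0, 0.1, 0.9],   # mostly topic 3
])

toy_sim = cosine_similarity(toy_topics)
```

With only a handful of topics, many songs can end up with nearly identical mixtures, so their similarities cluster close to 1 and the ranking among them may be somewhat arbitrary.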

In [72]:
# helpers
# artists = lyrics_df['artist'].values
# track_names = lyrics_df['track_name'].values
# Map each (track name, artist) pair to its row index.
track_artist_idx = {pair: idx for idx, pair in enumerate(zip(TRACK_NAMES, ARTISTS))}

In [75]:
utils.get_top_n_similar_tracks(TRACK_NAMES, ARTISTS, sim_matrix, "Let Me Hold You", "Netsky", 5)

[('Fire In Bone', 'The Killers'),
 ("I'll Never Love Again - Film Version", 'Lady Gaga'),
 ('Holy Water', 'Freya Ridings'),
 ('Gloria', 'Laura Branigan'),
 ('Life In The City', 'The Lumineers')]

In [76]:
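# `genius` is not defined in the cells shown; presumably a lyricsgenius client
# created earlier with lg.Genius(<access token>).
song = genius.search_song("Let Me Hold You", 'Netsky')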
print(song.lyrics)

Searching for "Let Me Hold You" by Netsky...
Done.
Let Me Hold You Lyrics
How you doin'? How you been?
Life got heavy, didn't it?
Don't be shy, speak your mind
I'm not perfect, nor are you
But we both got things we been through
So I'll do all I can, I'll try to understand
'Cause I wanna love strangers like neighbours
I wanna be unafraid
So when the tears roll like waterfalls
You can call my name

And when the weight of the world's on your shoulders
I will hold ya, I will hold you
And when the nights only seem to get colder
I will hold ya, I will hold ya
So let me hold you, ooh-ooh
Let me hold you, ooh-ooh
Let me hold you, ooh-ooh
Let me hold you, let me hold you

We look different, you and me
But our magic's underneath
Different skin, same within
And we could do with healing
But love's my favorite feeling
So I'll give all of me if that is what you need

And when the weight of the world's on your shoulders
I will hold ya, I will hold you
And when the nights only seem to get colder
I wil

In [77]:
song = genius.search_song('Fire In Bone', 'The Killers')
print(song.lyrics)

Searching for "Fire In Bone" by The Killers...
Done.
Fire In Bone Lyrics
(Fire in bone)
I felt fire in bone
(Fire in bone)
I felt fire in bone
(Fire in bone)

I felt cast out
I felt eighty-sixed
I felt darkness
But I felt fire in bone
I felt no good
I felt lowdown
And I felt alone
I felt unknown
I felt fire in bone
I felt fire in bone
(Wrongly accused and disowned)
I felt fire in bone

I felt big time
I felt major league
I felt weird sin
I felt fire in bone
I felt ripped off
Run down, wrung out
Empty, unseen
And preyed upon

I felt fire in bone
I felt fire in bone

When I came back empty-handed
You were waiting in the road
And you fell on my neck
And you took me back home
After all that I took from you
After all that I put you through
Here I am
I felt fire in bone

I felt washed up
I felt tempest-tossed and seasick
But I felt fire in bone
I felt mislead
I felt chained up, unclean, unknown
I felt unknown

I felt fire in bone
I felt fire in bone

And they say no one's gonna save you
You 