# Song Similarity and Recommender

One of my primary goals with this project is to prototype a song recommendation mechanism that can auto-generate a playlist of similar songs or return the most similar song based on an input song.

In this notebook, I assess the quality of song recommendations from applying cosine and euclidean distances to averaged Effnet and Pandora Mule embeddings. 

Similar to previous notebooks, this is an exploratory exercise meant to educate me about my database and how far I can go with this methodology.

In [73]:
#Imports
from typing import Union
import sqlite3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
from itertools import combinations
import warnings
from umap import UMAP
from hdbscan import HDBSCAN
from nomic import atlas, AtlasProject
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
from sklearn.metrics.pairwise import cosine_distances, cosine_similarity, euclidean_distances

warnings.filterwarnings('ignore')
pd.set_option('max_colwidth', 100)
plt.style.use("ggplot")

In [2]:
#Imports from project package called project_tools
from project_tools.utils import adapt_array, convert_array, table_loader, camelot_convert

In [3]:
#Register these functions with sqlite3 so that I can store and read numpy arrays
sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

#Connect to db
conn = sqlite3.connect("../jaage.db", detect_types= sqlite3.PARSE_DECLTYPES)
cur = conn.cursor()

Show tables

In [4]:
cur.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = cur.fetchall()
tables

[('effnet_embeddings',),
 ('tonal_features',),
 ('lowlevel_features',),
 ('rhythm_features',),
 ('approachability_2c_effnet_discogs_1_activations',),
 ('danceability_effnet_discogs_1_activations',),
 ('engagement_2c_effnet_discogs_1_activations',),
 ('genre_electronic_effnet_discogs_1_activations',),
 ('mood_acoustic_effnet_discogs_1_activations',),
 ('mood_aggressive_effnet_discogs_1_activations',),
 ('mood_happy_effnet_discogs_1_activations',),
 ('mood_party_effnet_discogs_1_activations',),
 ('mood_sad_effnet_discogs_1_activations',),
 ('mtg_jamendo_genre_effnet_discogs_1_activations',),
 ('mtg_jamendo_moodtheme_effnet_discogs_1_activations',),
 ('mtg_jamendo_top50tags_effnet_discogs_1_activations',),
 ('timbre_effnet_discogs_1_activations',),
 ('lowlevel_barkbands_mean_tbl',),
 ('lowlevel_barkbands_stdev_tbl',),
 ('lowlevel_erbbands_mean_tbl',),
 ('lowlevel_erbbands_stdev_tbl',),
 ('lowlevel_gfcc_mean_tbl',),
 ('lowlevel_melbands_mean_tbl',),
 ('lowlevel_melbands_stdev_tbl',),
 ('lo

**Data Querying**

Load in the tags metadata and drop songs longer than 10 minutes.

In [40]:
tags = table_loader(conn, "SELECT * FROM TAGS", apply_function=None)
tags = tags[tags.length <=60*10]
tags.head()

sid,length,gain,codec,file_name,bpm,initialkey,title,album,artist,date,genre,label
b806881a54bdbf9dd93a290716adf191,287.393372,-9.89514,pcm_s16le,04 House Of Love_PN.wav,119.0,6A,House Of Love_PN,,,,,
46e54d2ab920a088b77382e04877141b,311.251892,-11.281836,pcm_s16le,Alex Virgo - Rough N' Ready Edits - 06 A.T.S - Baa Daa Laa (Alex Virgo's Rough n Ready edit)_PN.wav,128.0,1A,Baa Daa Laa (Alex Virgo's Rough n Ready edit),Rough N' Ready Edits,A.T.S,2020.0,,
a204ddef5763df6d8f7677701fe9d96f,415.114746,-9.958479,pcm_s16le,01 Protostar_PN.wav,117.0,5A,Protostar,Planetary Groove,FROM BEYOND,2020.0,,
960097894e83c5810a9c649f17a4e551,321.108765,-12.223524,pcm_s16le,Cristal - Drink My Soul (Running Hot Edit)_PN.wav,120.0,5A,Drink My Soul (Running Hot Edit)_PN,,Cristal,,,
a3c1f277aa0110ffc418bf5fa3aa16aa,378.276276,-12.410757,pcm_s16le,Maya - Lait De Coco ( Les Yeux Orange Edit)_PN.wav,109.0,6A,Lait De Coco ( Les Yeux Orange Edit)_PN,,Maya,2016.0,,


Load in the effnet genres and embeddings data.

In [8]:
genres = table_loader(conn, "SELECT * FROM effnet_genres", apply_function=lambda x:x[0].mean())
keep_gcols = np.load("../keep_genre_cols.pkl", allow_pickle=True).tolist()
genres = genres[keep_gcols]

effnet_embeddings = table_loader(conn, "SELECT * FROM effnet_embeddings", 
                                 apply_function = lambda x:x[0].mean(axis=0))
effnet_embeddings = effnet_embeddings.effnet_embedding.apply(pd.Series)

Load in the Pandora Mule embeddings

In [9]:
mule_embeddings = pd.read_pickle("../music-audio-representations/pandora_embeddings.pkl")

Load in the key and scale data and convert to [camelot scale](https://mixedinkey.com/harmonic-mixing-guide/)

In [78]:
tonal = pd.read_sql_query("SELECT sid, tonal_chords_key, tonal_chords_scale FROM tonal_features", con = conn).set_index("sid")
tonal.head()

sid,tonal_chords_key,tonal_chords_scale
b806881a54bdbf9dd93a290716adf191,G,minor
46e54d2ab920a088b77382e04877141b,B,major
a204ddef5763df6d8f7677701fe9d96f,C,major
960097894e83c5810a9c649f17a4e551,C,minor
a3c1f277aa0110ffc418bf5fa3aa16aa,G,minor


In [79]:
keys = tonal.tonal_chords_key + tonal.tonal_chords_scale
keys = keys.apply(camelot_convert).rename("tonal_keys")
tags = tags.join(keys)

Query the two embeddings dataframes with the metadata.

In [41]:
effnet_embeddings = effnet_embeddings.loc[tags.index]
mule_embeddings = mule_embeddings.loc[[i for i in tags.index if i in mule_embeddings.index]]

**Distance Matrices**

I precompute the distance matrices so I don't have to recompute distance scores every time I want to derive the similarity between two songs.

There are four distance matrices. One for each combination of embeddings and distance function.

Remember that the cosine scores represent distance, not similarity, so a lower score means the songs are more similar.
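To make the distance-versus-similarity point concrete, here is a minimal sketch in plain Python. The three toy vectors are made up and have nothing to do with my embeddings. It assumes the usual definition, which scikit-learn's `cosine_distances` also follows: cosine distance is one minus cosine similarity.

```python
import math
from typing import Sequence


def cosine_similarity_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance is 1 - cosine similarity, so lower means more similar."""
    return 1.0 - cosine_similarity_1d(a, b)


def euclidean_distance_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance, which also reacts to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors, invented for illustration only
song_a = [1.0, 2.0, 3.0]
song_b = [2.0, 4.0, 6.0]   # same direction as song_a, twice the magnitude
song_c = [3.0, -1.0, 0.5]  # points in a different direction

print("cosine    a-b:", cosine_distance_1d(song_a, song_b))
print("euclidean a-b:", euclidean_distance_1d(song_a, song_b))
print("cosine    a-c:", cosine_distance_1d(song_a, song_c))
```

The a-b pair is the interesting one: cosine distance treats the two vectors as essentially identical because they point the same way, while euclidean distance still separates them by magnitude. That is one reason the two metrics can rank neighbours differently.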

In [42]:
effnet_euclidean = euclidean_distances(effnet_embeddings)
effnet_euclidean = pd.DataFrame(index=effnet_embeddings.index, 
                                columns=effnet_embeddings.index,
                               data = effnet_euclidean)

effnet_cosine = cosine_distances(effnet_embeddings)
effnet_cosine = pd.DataFrame(index=effnet_embeddings.index, 
                                columns=effnet_embeddings.index,
                               data = effnet_cosine)


mule_euclidean = euclidean_distances(mule_embeddings)
mule_euclidean = pd.DataFrame(index=mule_embeddings.index, 
                                columns=mule_embeddings.index,
                               data = mule_euclidean)


mule_cosine = cosine_distances(mule_embeddings)
mule_cosine = pd.DataFrame(index=mule_embeddings.index, 
                                columns=mule_embeddings.index,
                               data = mule_cosine)

## Song Similarity


This has the potential to be one of the more fruitful tasks in this project. This has obvious usefulness if I'm compiling a playlist of similar songs and I don't want to trudge through 100s of songs to find the right one. If I ever build a real-time song recommendation app that outputs similar songs to the current one I'm playing, this analysis will form the foundation of that program.

As always it's important to keep in the back of my mind the limitations of the averaged embeddings in capturing the full essence of a song.

**Ear Test**

In each section of this notebook I'll be assessing the quality of similarity scores and similar songs using my domain experience accrued from my DJ career.

### Original vs Remixes

I am especially curious to see how this methodology appraises the similarity of a song and one of its remixes. In other words, how much does remixing a song change it from the original version?

Let's test this out with a few examples.

####  [Enjoy Your Life - Oby Onyioha](https://www.youtube.com/watch?v=0FE_BEpxRzo&ab_channel=SoundwayRecords)

This wonderful Nigerian boogie gem is one of my absolute favorites and has been since I first got into DJing. Have a listen by clicking the hyperlinked title.

<img src="https://f4.bcbits.com/img/a2002161335_16.jpg" width=200 height=200 />


Below are two edits that are faster and more energetic than the original, and they are actually fairly similar to one another. I like these edits because they inject some electricity into a groovy melody, which makes the song suitable for a fast-paced club environment.

Remixes:

- [Partner Disco Edit](https://soundcloud.com/partnermusic/enjoy-your-life-partner-disco-edit)

- [Ben Gomori's New Lease Of Life Edit](https://www.youtube.com/watch?v=Hkq639FprNU&ab_channel=Gazzz696)

Grab the ids of the songs and collect them in a list

In [14]:
original_id = tags[tags.title == "Enjoy Your Life"].index[0]
partner_edit_id = 'd07d662459e5182bfbaf4eb201331f58'
ben_gomori_id = '488b1bb866c6d2c8d40f69ea44a78faf'

enjoy_ids = [original_id, partner_edit_id, ben_gomori_id]

This function outputs a dataframe that displays the distance score for every combination of song id.

In [30]:
def compare_songs(ids:list, dist_matrix:pd.DataFrame) -> pd.DataFrame:
    
    """
    This function returns a dataframe of the distance scores for each pair of songs, along with
    the metadata information of each song.
    
    Params:
    
    ids: List of song ids
    
    dist_matrix: Precomputed distance matrix — one of the four embeddings&distance function combinations
    
    Output:
    
    A dataframe of 5 columns and n rows equal to number of pair combinations made from the ids.
    
    """
    
    #Initialize the output list
    output = []
    

    #Make combinations of ids
    combos = combinations(ids, 2)
    
    
    #Iterate over combinations
    for i1, i2 in combos:

        #Grab the similarity score using the provided dist_matrix
        score = dist_matrix.loc[i1, i2]
        
        #Grab title and artist information 
        title1 = tags.loc[i1, "title"]
        title2 = tags.loc[i2, "title"]
        artist1 = tags.loc[i1, "artist"]
        artist2 = tags.loc[i2, "artist"]
        
        #Append to the output list
        output.append([title1, artist1, title2, artist2, score])
        
    #Collect data into a dataframe sorted by score ascending
    df = pd.DataFrame(output, columns = ["title1", "artist1",
                                             "title2", "artist2", 
                                             "score"]).sort_values(by = "score")
    return df

Effnet Embeddings & Euclidean Distance

In [31]:
compare_songs(enjoy_ids, effnet_euclidean)

Unnamed: 0,title1,artist1,title2,artist2,score
2,Enjoy Your Life (PARTNER DISCO EDIT),Oby Onyioha,Enjoy Your Life (Ben Gomori's New Lease Of Life Edit),"Oby Onyioha, Ben Gomori",2.343352
0,Enjoy Your Life,Oby Onyioha,Enjoy Your Life (PARTNER DISCO EDIT),Oby Onyioha,3.433256
1,Enjoy Your Life,Oby Onyioha,Enjoy Your Life (Ben Gomori's New Lease Of Life Edit),"Oby Onyioha, Ben Gomori",3.85545


Effnet Embeddings & Cosine Distance

In [22]:
compare_songs(enjoy_ids, effnet_cosine)

Unnamed: 0,title1,artist1,title2,artist2,score
2,Enjoy Your Life (PARTNER DISCO EDIT),Oby Onyioha,Enjoy Your Life (Ben Gomori's New Lease Of Life Edit),"Oby Onyioha, Ben Gomori",0.169025
0,Enjoy Your Life,Oby Onyioha,Enjoy Your Life (PARTNER DISCO EDIT),Oby Onyioha,0.442819
1,Enjoy Your Life,Oby Onyioha,Enjoy Your Life (Ben Gomori's New Lease Of Life Edit),"Oby Onyioha, Ben Gomori",0.48302


Mule Embeddings & Euclidean Distance

In [23]:
compare_songs(enjoy_ids, mule_euclidean)

Unnamed: 0,title1,artist1,title2,artist2,score
2,Enjoy Your Life (PARTNER DISCO EDIT),Oby Onyioha,Enjoy Your Life (Ben Gomori's New Lease Of Life Edit),"Oby Onyioha, Ben Gomori",7.179798
0,Enjoy Your Life,Oby Onyioha,Enjoy Your Life (PARTNER DISCO EDIT),Oby Onyioha,9.310528
1,Enjoy Your Life,Oby Onyioha,Enjoy Your Life (Ben Gomori's New Lease Of Life Edit),"Oby Onyioha, Ben Gomori",9.622669


Mule Embeddings & Cosine Distance

In [24]:
compare_songs(enjoy_ids, mule_cosine)

Unnamed: 0,title1,artist1,title2,artist2,score
2,Enjoy Your Life (PARTNER DISCO EDIT),Oby Onyioha,Enjoy Your Life (Ben Gomori's New Lease Of Life Edit),"Oby Onyioha, Ben Gomori",0.350808
1,Enjoy Your Life,Oby Onyioha,Enjoy Your Life (Ben Gomori's New Lease Of Life Edit),"Oby Onyioha, Ben Gomori",0.595406
0,Enjoy Your Life,Oby Onyioha,Enjoy Your Life (PARTNER DISCO EDIT),Oby Onyioha,0.606712


I'm not surprised to see that the two remixes are more similar to each other than to the original; if you give them a listen, you'll likely reach the same conclusion.

While it's hard to assess the euclidean scores without the proper context, I am curious as to why the Pandora embeddings produce less similar cosine scores than their effnet counterparts. I do think the effnet cosine scores for the remixes are appropriate.
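One way to give a raw distance some context is to ask where it falls among all pairwise distances in the library. Below is a minimal sketch of that idea in plain Python on made-up 2-D points; `pairwise_distances` and `percentile_of` are hypothetical helper names, not part of this project. In the notebook, the analogous step would be to take the upper triangle of a matrix such as `effnet_euclidean` and locate a score within it.

```python
import math
from bisect import bisect_left
from itertools import combinations
from typing import List, Sequence


def pairwise_distances(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Sorted euclidean distances for every unordered pair of vectors."""
    return sorted(math.dist(u, v) for u, v in combinations(vectors, 2))


def percentile_of(score: float, sorted_scores: List[float]) -> float:
    """Percentage of pairwise distances that are strictly smaller than score."""
    return 100.0 * bisect_left(sorted_scores, score) / len(sorted_scores)


# Toy "embeddings", invented for illustration only
toy_vectors = [[0, 0], [0, 1], [3, 4], [6, 8], [10, 0]]
all_scores = pairwise_distances(toy_vectors)

# A low percentile means the pair is closer than most pairs in the collection
print(percentile_of(4.5, all_scores))
```

A score in the bottom few percent would suggest that a pair is unusually close for this collection, which is easier to interpret than the raw number.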

#### [Lotus72D - Ze Roberto](https://www.youtube.com/watch?v=c910QHoeZ3Q&ab_channel=DeejayAlexPaz)

Lotus 72D is a sumptuous Brazilian samba track that sits on the top shelf of my DJ collection.

<img src="https://f4.bcbits.com/img/a3968505998_16.jpg" width=200 height=200 />


Here are four remixes that are more varied than the two "Enjoy Your Life" edits.


Remixes:

- [Leslie Lello Edit](https://www.youtube.com/watch?v=IPi1y3WBEfk&ab_channel=LeslieLello)

- [DJ Zinco Edit](https://www.youtube.com/watch?v=C4Ny_xRS_OQ&ab_channel=DjZinco)

- [Bernardo Pinheiro Edit](https://soundcloud.com/bernardopinheiro/ze-roberto-lotus-72d-bernardo-pinheiro-edit)

- [Wurzelholz Edit](https://soundcloud.com/lisztomania/premiere-wurzelholz-ze-roberto-villes-et-fleurs-3)

Collect song ids

In [32]:
original_id = "0c6b1ccd4818e876d5e65fdee752e790"
lello_id = 'd62e1676940560cdfb1212a0a329c6ff'
zinco_id = '8703870dec527b72ecb6986c9859072a'
bernardo_pinheiro_id = '942f7fe5897b42d6f9bac95457303a39'
wurzelholz_id = '1a4d85802e3e6a6761e70a61ef1d5cf3'

lotus_ids = [original_id, lello_id, zinco_id, bernardo_pinheiro_id, wurzelholz_id]

Effnet & Euclidean

In [33]:
compare_songs(lotus_ids, effnet_euclidean)

Unnamed: 0,title1,artist1,title2,artist2,score
9,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,Ze Roberto (Original Mix),Wurzelholz,2.177342
0,Lotus 72D (Original),Zé Roberto,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,2.519627
7,Lotus 72 (DJ Zinco Edit),Ze Roberto,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,2.761502
5,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,3.142235
6,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,Ze Roberto (Original Mix),Wurzelholz,3.471438
2,Lotus 72D (Original),Zé Roberto,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,3.480601
8,Lotus 72 (DJ Zinco Edit),Ze Roberto,Ze Roberto (Original Mix),Wurzelholz,3.575589
4,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,Lotus 72 (DJ Zinco Edit),Ze Roberto,3.613783
3,Lotus 72D (Original),Zé Roberto,Ze Roberto (Original Mix),Wurzelholz,3.663074
1,Lotus 72D (Original),Zé Roberto,Lotus 72 (DJ Zinco Edit),Ze Roberto,3.787559


In [34]:
compare_songs(lotus_ids, effnet_cosine)

Unnamed: 0,title1,artist1,title2,artist2,score
0,Lotus 72D (Original),Zé Roberto,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,0.21308
9,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,Ze Roberto (Original Mix),Wurzelholz,0.240237
7,Lotus 72 (DJ Zinco Edit),Ze Roberto,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,0.314538
5,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,0.423896
4,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,Lotus 72 (DJ Zinco Edit),Ze Roberto,0.464147
2,Lotus 72D (Original),Zé Roberto,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,0.478943
1,Lotus 72D (Original),Zé Roberto,Lotus 72 (DJ Zinco Edit),Ze Roberto,0.479748
6,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,Ze Roberto (Original Mix),Wurzelholz,0.482267
3,Lotus 72D (Original),Zé Roberto,Ze Roberto (Original Mix),Wurzelholz,0.497812
8,Lotus 72 (DJ Zinco Edit),Ze Roberto,Ze Roberto (Original Mix),Wurzelholz,0.506239


In [35]:
compare_songs(lotus_ids, mule_euclidean)

Unnamed: 0,title1,artist1,title2,artist2,score
0,Lotus 72D (Original),Zé Roberto,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,4.510644
7,Lotus 72 (DJ Zinco Edit),Ze Roberto,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,4.756836
4,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,Lotus 72 (DJ Zinco Edit),Ze Roberto,5.691854
5,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,5.8103
1,Lotus 72D (Original),Zé Roberto,Lotus 72 (DJ Zinco Edit),Ze Roberto,5.938774
2,Lotus 72D (Original),Zé Roberto,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,6.117986
9,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,Ze Roberto (Original Mix),Wurzelholz,9.061983
8,Lotus 72 (DJ Zinco Edit),Ze Roberto,Ze Roberto (Original Mix),Wurzelholz,9.240207
6,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,Ze Roberto (Original Mix),Wurzelholz,9.372065
3,Lotus 72D (Original),Zé Roberto,Ze Roberto (Original Mix),Wurzelholz,9.436915


In [36]:
compare_songs(lotus_ids, mule_cosine)

Unnamed: 0,title1,artist1,title2,artist2,score
0,Lotus 72D (Original),Zé Roberto,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,0.162822
7,Lotus 72 (DJ Zinco Edit),Ze Roberto,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,0.174665
4,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,Lotus 72 (DJ Zinco Edit),Ze Roberto,0.240912
5,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,0.268645
1,Lotus 72D (Original),Zé Roberto,Lotus 72 (DJ Zinco Edit),Ze Roberto,0.276393
2,Lotus 72D (Original),Zé Roberto,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,0.319198
8,Lotus 72 (DJ Zinco Edit),Ze Roberto,Ze Roberto (Original Mix),Wurzelholz,0.627806
9,Lotus 72D (Bernardo Pinheiro Edit),Zé Roberto,Ze Roberto (Original Mix),Wurzelholz,0.646767
6,Lotus 72d (Leslie Lello Re-Groove),Ze Roberto,Ze Roberto (Original Mix),Wurzelholz,0.659285
3,Lotus 72D (Original),Zé Roberto,Ze Roberto (Original Mix),Wurzelholz,0.710224


### Top 5 Similar Songs

Let's go from assessing the scores for a group of similar songs to assessing the recommended songs given an input song.

`most_similar_songs` is a function that returns the n most similar songs for a chosen song.

In [49]:
def most_similar_songs(ref_id:str, dist_matrix:pd.DataFrame, n:int = 5) -> pd.DataFrame:
    
    """
    This function takes a song id and distance matrix, queries the distance matrix with that song id
    and then returns the n most similar songs to the input song.
    
    Params:
    
    ref_id: Short for reference id, a str representing a song's unique id that's used to query a distance matrix
    
    dist_matrix: Precomputed distance matrix — one of the four embeddings&distance function combinations
    
    n: Defaults to 5, the number of most similar songs to return
    
    """
    
    #Grabs all the distance scores for the input song
    dists = dist_matrix[ref_id]
    
    #Sorts the distances in ascending order and grabs the n+1 most similar scores 
    #because I drop the reference song later
    
    top_scores = dists.nsmallest(n = n+1).rename("score")
    
    #Concatenate the top scores with the metadata for the output and drop the reference song id
    frame = pd.concat([tags[["artist", "title"]], top_scores],axis = 1, join = "inner").drop(ref_id)
    
    #Sort by score column in ascending order
    return frame.sort_values(by = "score")

The first song in this section is a touch up of a [classic house banger](https://www.youtube.com/watch?v=aNwhGeu9Jv0&pp=ygUSYmlnIGZ1biBpbm5lciBjaXR5) by Detroit outfit Inner City. The HBR remix is what I like to see in an edit/remix/rework in that it polishes the tune while reinforcing the drums.

I chose this song because I want to start off with something not so complex in terms of song composition.

The ideal goal here is to retrieve other (presumably house) songs that match the vibe and energy of this one.

**Song1: [Big Fun HBR Extended City Remix - Inner City](https://www.youtube.com/watch?v=osxUR8Whf2A&ab_channel=BMG1999MusicGroup)**


<img src="https://f4.bcbits.com/img/a1395136794_16.jpg" width=200 height=200 />

In [50]:
big_fun_id = '82acd6f97f46af0579b94ea28db31151'

Show the top 5 most similar songs to Big Fun HBR Remix

Effnet & Euclidean

In [53]:
most_similar_songs(big_fun_id, effnet_euclidean)

sid,artist,title,score
309ed5f8812e8a26906a79af72ebe728,Azari & III,Reckless (With Your Love) (Steve Lawler Remix),1.477532
f2bf3d1f2743546c91c2d3ca753584f7,Dimitri From Paris & DJ Rocca,Days of a Better Paradise,1.489283
2518d845aed4fca9cd4e68ea85496f59,Cheb Runner,Law Kan (RCCR Remix),1.627394
6a4015904ca708aae0bb4cc3332c1122,2 In A Room,A Passing Thought (12 Remix),1.655594
d51ea51e5b360c4f6b0131dfbb074a48,2 In A Room,A Passing Thought (LP Mix),1.673257


Effnet & Cosine

In [54]:
most_similar_songs(big_fun_id, effnet_cosine)

sid,artist,title,score
309ed5f8812e8a26906a79af72ebe728,Azari & III,Reckless (With Your Love) (Steve Lawler Remix),0.077974
f2bf3d1f2743546c91c2d3ca753584f7,Dimitri From Paris & DJ Rocca,Days of a Better Paradise,0.092292
8ffb02945ae97b27083370f2f4116442,Andy Ash,Drums 4 Acid,0.113116
558230f7ee0ab452a7e407dec4691509,Dagfest,Chasing (Original Mix),0.118583
7ec228a634d82638744136a33c03b7d3,Marshall Jefferson,Move Your Body (Mack The Producer remix),0.119364


On first take, I have to say I'm pleased with the results. The most similar song in both effnet tables has a distinctly classic house feel, courtesy of the remixer who transformed the original, which was released in 2009.

Reckless (With Your Love) (Steve Lawler Remix) certainly belongs in the same set list as the reference track.

Days Of A Better Paradise probably wouldn't make my top 30 most similar songs if I did this task manually; it's much more synthy and its energy lags behind Big Fun's.

The two "A Passing Thought" tracks in the euclidean dataframe are acceptable inclusions given their strong late 80s house identity. "Law Kan (RCCR Remix)" is a curious inclusion because, even though it's been given a radical house-flavored transformation, it possesses melodies and instrumentation different from Big Fun.


For the cosine dataframe, I think "Drums 4 Acid" and "Chasing (Original Mix)" are debatable in terms of their low cosine distance scores but aren't out of place. "Move Your Body (Mack The Producer remix)" deserves a higher ranking than 5th given that it is also a remix of a legendary classic house track.


In [60]:
most_similar_songs(big_fun_id, mule_euclidean)

sid,artist,title,score
e6c5a8d32288da13fcf90b85518bd3a5,Inner City,Paradise_(Original Mix)_PN,4.918695
62d5c22ea4d47d8b80a1f51ac9e92d23,Shan,Work It (Piano Mix),4.987339
8dd289b722e63233a89ac0dbdaedd584,Roma SZ,Husn Hai Suhana [Bomba!],5.407232
8bf264dc346615ad103e9281397723e4,Nick Garcia,Joia,5.411583
c946e57f51b5e9a703667f7553b081c8,Aroop Roy,Quem Vai Querer,5.507912


In [61]:
most_similar_songs(big_fun_id, mule_cosine)

sid,artist,title,score
e6c5a8d32288da13fcf90b85518bd3a5,Inner City,Paradise_(Original Mix)_PN,0.15912
62d5c22ea4d47d8b80a1f51ac9e92d23,Shan,Work It (Piano Mix),0.165804
f9d27ff5460b9c21665d7c2ed4d17816,Beat Foundation,Martini Attack,0.195497
8dd289b722e63233a89ac0dbdaedd584,Roma SZ,Husn Hai Suhana [Bomba!],0.196009
df93b9af75b1a70db473e3596360747b,Sunaas,2 in a Room - Do What You Want (confined Sunaas edit),0.204035


Two things jump out at me right away: the most similar song in each table is from the same artist as the reference track, and there's no overlap with the two effnet tables.

The usefulness of seeing another Inner City song as the most similar song is that it functions as a sanity check: the embeddings and distance function can actually find similar pairs of songs.


["Work It (Piano Mix)"](https://www.youtube.com/watch?v=od887TIwVaQ) and ["Husn Hai Suhana [Bomba!]"](https://www.youtube.com/watch?v=8X-8PZO79pU) appear in both tables and are worthy of being grouped with Big Fun, though "Work It" is more deserving because of its prominent old school synths.

["Martini Attack"](https://www.youtube.com/watch?v=rhNFOtO8o3g) certainly wouldn't be categorized as a false positive in my book.
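The overlap between recommendation tables can also be put into a number. Here is a minimal sketch that computes the Jaccard index (shared songs divided by all distinct songs) between the four top-5 lists shown above for Big Fun. The titles are copied from those tables; the `jaccard` helper is my own illustrative function, not part of the project package.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Top-5 titles copied from the four Big Fun tables above
effnet_euclidean_top5 = [
    "Reckless (With Your Love) (Steve Lawler Remix)",
    "Days of a Better Paradise",
    "Law Kan (RCCR Remix)",
    "A Passing Thought (12 Remix)",
    "A Passing Thought (LP Mix)",
]
effnet_cosine_top5 = [
    "Reckless (With Your Love) (Steve Lawler Remix)",
    "Days of a Better Paradise",
    "Drums 4 Acid",
    "Chasing (Original Mix)",
    "Move Your Body (Mack The Producer remix)",
]
mule_euclidean_top5 = [
    "Paradise_(Original Mix)_PN",
    "Work It (Piano Mix)",
    "Husn Hai Suhana [Bomba!]",
    "Joia",
    "Quem Vai Querer",
]
mule_cosine_top5 = [
    "Paradise_(Original Mix)_PN",
    "Work It (Piano Mix)",
    "Martini Attack",
    "Husn Hai Suhana [Bomba!]",
    "2 in a Room - Do What You Want (confined Sunaas edit)",
]

print("effnet euclidean vs cosine:", jaccard(effnet_euclidean_top5, effnet_cosine_top5))
print("mule euclidean vs cosine:  ", jaccard(mule_euclidean_top5, mule_cosine_top5))
print("effnet vs mule (all lists):",
      jaccard(effnet_euclidean_top5 + effnet_cosine_top5,
              mule_euclidean_top5 + mule_cosine_top5))
```

Within each embedding the two distance functions agree on a few songs, while the effnet and mule lists share none, which matches what I see by eye in the tables.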

After only one song, I am quite satisfied with the results. My expectations were surpassed, I have to say. I think the embeddings, in conjunction with distance metrics, can for the most part be relied on to retrieve similar songs.

**Song 2: [Stages (Prins Thomas Edit) - Don Laka](https://www.youtube.com/watch?v=WVemGiWZSps)**

For the second song, I present a song that ranks 13th in playcount (based on rekordbox's counting) because I love it that much. This is a touch up of a South African boogie classic by the legendary Don Laka.

Let's see if we can find some songs worthy of preceding or succeeding this song in a setlist.

In [66]:
stages_id = 'bcf086a353ea27536d84b3668388f935'

Show top 6 because the most similar song is a different edit of the same original song.

In [68]:
most_similar_songs(stages_id, effnet_euclidean, n= 6)

sid,artist,title,score
4625a8921ac4d1620ea190bfa60acc14,Don Laka,Stages (Dance Motif Edit),1.930395
5490ec30cafba5f87afa3bef180b45a9,Tabu Ley Rochereau,Hafi Deo (Adult Instruments & Linntronix Edit)FREE DOWNLOAD,2.20098
ee669442e5a745a8e8752ce08bb71501,Adelle First,Don't Give Up (Dub Mix)_PN,2.237833
5f4bc658d31980f47a339e83eb348d2b,Adelle First,Don't Give Up,2.269334
9868b97a596f30ade61b172bc5e0bbc9,Band of Misfits,Saffa Saphela (Edit)_PN,2.301913
a8265d96ab0a0a4c79c7faa53a265962,Om Alec Khaoli,Say You Love Me,2.387556


In [69]:
most_similar_songs(stages_id, effnet_cosine, n= 6)

sid,artist,title,score
4625a8921ac4d1620ea190bfa60acc14,Don Laka,Stages (Dance Motif Edit),0.098063
0fe15b6f047b5f1aa8acce9944bfa4b7,Bornn,Visible Love,0.113873
5490ec30cafba5f87afa3bef180b45a9,Tabu Ley Rochereau,Hafi Deo (Adult Instruments & Linntronix Edit)FREE DOWNLOAD,0.134235
5f4bc658d31980f47a339e83eb348d2b,Adelle First,Don't Give Up,0.137341
ee669442e5a745a8e8752ce08bb71501,Adelle First,Don't Give Up (Dub Mix)_PN,0.137543
fa1f1eef40907f1aabf238fe1f9c0908,TZ Junior,Sugar My Love_PN,0.143752


All the recommended songs come from fellow South African artists who were prominent in the 80s.

No criticism from me here — constructive or otherwise. 

In [71]:
most_similar_songs(stages_id, mule_euclidean, n= 6)

sid,artist,title,score
4625a8921ac4d1620ea190bfa60acc14,Don Laka,Stages (Dance Motif Edit),3.827309
fb91f9ff577a2a3b7f033f30a15a42c5,BODIE LEE,Jiggl'O,5.364482
b40959be2beccf18062b943211005746,Mystic Jungle,Get Down On It,5.37687
12a9b87b3b8a5ba3880c5a5e6b07b8b4,Amadou & Mariam,Bofou Safou,5.383849
e3ca824b3b636c9515d9d13d5ca8a9e4,Le Choc Stars Du Zaire,What Did She Say,5.512924
072d75c075b27a804ab51e7a3ae57deb,Le Choc Stars Du Zaire,Nakombe Nga,5.543299


In [72]:
most_similar_songs(stages_id, mule_cosine, n= 6)

sid,artist,title,score
4625a8921ac4d1620ea190bfa60acc14,Don Laka,Stages (Dance Motif Edit),0.085239
072d75c075b27a804ab51e7a3ae57deb,Le Choc Stars Du Zaire,Nakombe Nga,0.184512
e3ca824b3b636c9515d9d13d5ca8a9e4,Le Choc Stars Du Zaire,What Did She Say,0.186017
b40959be2beccf18062b943211005746,Mystic Jungle,Get Down On It,0.191575
fb91f9ff577a2a3b7f033f30a15a42c5,BODIE LEE,Jiggl'O,0.204021
397252a6edc577932d490ea9c05920ff,Jivaro,What Next (Dub Mix),0.208111


I think I'm starting to see how Pandora Mule embeddings differ from effnet. The effnet recommendations are all the obvious choices: distinctively South African tunes with strong synth pop and boogie notes. The Pandora recommendations, on the other hand, feature songs that aren't like-for-like matches but do feel like musical relatives on a more subtle and abstract level.

For example ["Jiggl'O"](https://www.youtube.com/watch?v=8LtXaSQ8mT4) is something I'd play after Stages when I want to up the energy of the environment.


Like with "Big Fun", I bestow my seal of approval on these results. I also appreciate that for both songs there was no overlap between the most similar effnet and Pandora songs, because the variety could come in handy when one set of embeddings isn't suitable for a task.


I could keep going on and on but I think I've had a decent taste of what I'm working with here.


## Song Recommender

Now let's see if I can generate a playlist of songs based on a starting point. That means I choose a song, retrieve its most similar song, then retrieve that song's closest relative, and repeat this process for n songs.

For this task I'm incorporating my DJ domain expertise by filtering the most similar songs based on tempo (bpm) and key. Since this playlist is sequential, I want to play songs that have similar bpms and keys to the current song. DJs typically don't change bpms by more than 5-10 points and, when possible, avoid [key clashing](https://mixedinkey.com/harmonic-mixing-guide/).


Therefore my setlist generator function pulls recommended songs that are within a given bpm window of the reference track and don't differ from its key by a lot.
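Before the real functions, here is a stripped-down sketch of the chaining idea on made-up data, just to show how the bpm window changes the outcome. The four toy songs, their bpms, and their distances are invented, and `next_song` and `greedy_setlist` are illustrative names rather than the functions defined below. Key matching is left out to keep it short.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy library, invented for illustration only
TOY_BPM: Dict[str, int] = {"a": 120, "b": 122, "c": 124, "d": 140}
TOY_DIST: Dict[Tuple[str, str], float] = {
    ("a", "b"): 0.10, ("a", "c"): 0.30, ("a", "d"): 0.05,
    ("b", "c"): 0.20, ("b", "d"): 0.40, ("c", "d"): 0.15,
}


def toy_distance(x: str, y: str) -> float:
    """Look up the symmetric toy distance between two songs."""
    return TOY_DIST[(x, y)] if (x, y) in TOY_DIST else TOY_DIST[(y, x)]


def next_song(current: str, played: Set[str], bpm_window: int = 5) -> Optional[str]:
    """Closest unplayed song whose bpm is within bpm_window of the current song."""
    candidates = [
        song for song in TOY_BPM
        if song not in played and abs(TOY_BPM[song] - TOY_BPM[current]) <= bpm_window
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda song: toy_distance(current, song))


def greedy_setlist(start: str, max_len: int, bpm_window: int = 5) -> List[str]:
    """Chain songs greedily: each pick becomes the reference for the next pick."""
    setlist = [start]
    while len(setlist) < max_len:
        pick = next_song(setlist[-1], set(setlist), bpm_window)
        if pick is None:  # nothing left inside the bpm window
            break
        setlist.append(pick)
    return setlist


print(greedy_setlist("a", 4))                 # narrow window skips the 140 bpm song
print(greedy_setlist("a", 4, bpm_window=20))  # wide window lets it in
```

With the narrow window, song "d" never gets picked even though it is the closest match to "a", because a 20 bpm jump would be unmixable. That is exactly the trade-off I'm accepting with the bpm filter.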

###  Functions

`song_recommender` returns the most similar song based on an input reference track and distance matrix. It uses a utility function called `key_matcher`, which determines the distance between a pair of camelot keys; refer to [this guide](https://mixedinkey.com/harmonic-mixing-guide/) for information on harmonic mixing and mixing in key. I plan to use 5 as my bpm window.

`setlist_generator` generates a dataframe of songs using `song_recommender`.
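The body of `key_matcher` isn't shown in this notebook, so here is a hypothetical sketch of what a Camelot compatibility check could look like. It is not necessarily the project's implementation. It assumes keys are strings such as `"6A"` (a number from 1 to 12 plus `A` or `B`) and applies the common harmonic mixing rule: stay within one step on the wheel in the same letter, or switch letter on the same number. The names `parse_camelot`, `key_matcher_sketch` and `max_steps` are my own.

```python
from typing import Tuple


def parse_camelot(key: str) -> Tuple[int, str]:
    """Split a Camelot code such as '6A' into (6, 'A')."""
    key = key.strip().upper()
    number, letter = int(key[:-1]), key[-1]
    if not 1 <= number <= 12 or letter not in ("A", "B"):
        raise ValueError(f"Not a Camelot key: {key!r}")
    return number, letter


def key_matcher_sketch(ref_key: str, other_key: str, max_steps: int = 1) -> bool:
    """Return True when other_key is harmonically close to ref_key."""
    ref_num, ref_letter = parse_camelot(ref_key)
    other_num, other_letter = parse_camelot(other_key)

    # The wheel is circular, so 12 and 1 are neighbours
    diff = abs(ref_num - other_num) % 12
    steps = min(diff, 12 - diff)

    if ref_letter == other_letter:
        return steps <= max_steps
    # Switching between A and B is only treated as safe on the same number
    return steps == 0


print(key_matcher_sketch("6A", "7A"))   # neighbouring number, same letter
print(key_matcher_sketch("12A", "1A"))  # wraps around the wheel
print(key_matcher_sketch("6A", "8A"))   # two steps away
```

Returning a boolean keeps it usable as a mask, which is how `key_matcher` is applied to `tags.tonal_keys` in `song_recommender`. Raising `max_steps` would loosen the key filter when the song pool gets too small.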


In [87]:
def song_recommender(ref_id:str, dist_matrix:pd.DataFrame, bpm_window:int= 5, excludes:Union[None, list] = None) -> str:
    
    
    """
    Returns the most similar song to a reference song, subject to query stipulations about bpm and key.
    
    Params:
    
    ref_id: Short for reference id, a str representing a song's unique id that's used to query a distance matrix
    
    dist_matrix: Precomputed distance matrix — one of the four embeddings&distance function combinations
    
    bpm_window: An integer used to filter the pool of possible recommendations so that the function 
    only returns a song of a bpm within n points of the reference track
    
    
    excludes: A way to manually exclude songs from the recommendation pool. Useful for the setlist_generator
    function when I don't want a song already on the setlist to be re-recommended.
    
    
    Returns:
    
    The id of the most similar song.
    
    """
    
    #Grab bpm and key
    song_bpm = tags.loc[ref_id, "bpm"]
    song_key = tags.loc[ref_id, "tonal_keys"]
    
    #Calculate upper and lower bpm window
    lower_bpm = song_bpm - bpm_window
    upper_bpm = song_bpm + bpm_window
    
    #Restrict the metadata to songs that are present in the distance matrix
    #(the mule matrices may cover fewer songs than tags)
    candidates = tags.loc[tags.index.intersection(dist_matrix.index)]
    
    #Initialize songs I'm going to drop from the song pool, starting with the reference song
    drops = [ref_id]
    
    #Add excluded songs to drops if they are not None
    if excludes is not None:
        drops.extend(excludes)
    
    #Song_pool is the collection of ids of the songs that fall within the bpm and key windows
    song_pool = candidates[(candidates.tonal_keys.apply(lambda x:key_matcher(song_key, x))) & 
                     (candidates.bpm.between(lower_bpm, upper_bpm))].drop(drops, errors = "ignore").index.tolist()
    
    #Fail with a clear message if no song satisfies the bpm and key filters
    if not song_pool:
        raise ValueError(f"No candidate songs within the bpm and key windows of {ref_id}")
    
    #Query the distance matrix using the reference song and song pool and grab the arg min id
    most_similar_song = dist_matrix.loc[ref_id, song_pool].idxmin()
    
    #Return most similar id
    return most_similar_song


def setlist_generator(ref_id:str, dist_matrix:pd.DataFrame, bpm_window:int= 5, n_songs:int = 10) -> pd.DataFrame:
    
    
    """
    Generates a setlist of songs based on a starting point song.
    
    Params:
    
    ref_id: Short for reference id, a str representing a song's unique id that's used to query a distance matrix
    
    dist_matrix: Precomputed distance matrix, one of the four embedding/distance-function combinations
    
    bpm_window: An integer used to filter the pool of possible recommendations so that the function 
    only returns a song whose bpm is within bpm_window points of the reference track
    
    n_songs: The number of songs for the setlist
    
    Returns:
    
    A DataFrame with the title, artist, and order of each song in the setlist.
    
    """
    
    #Initialize setlist with the id of the reference track
    setlist = [ref_id]
    
    #excludes is initialized as None but will change after the first iteration
    excludes = None
    
    #Conduct n_songs - 1 iterations, since the reference track already fills the first slot
    for _ in range(n_songs - 1): 
        #For each iteration find the most similar song to the current reference song
        
        most_similar_song = song_recommender(ref_id, dist_matrix, 
                                             bpm_window=bpm_window, excludes = excludes)
            
        #add most similar song to the setlist
        setlist.append(most_similar_song)
        
        #Overwrites excludes to be a copy of setlist so that in the next iteration 
        #the song_recommender won't rerecommend the same tracks
        excludes = setlist[:]
        
        #The upcoming song becomes the now playing song.
        ref_id = most_similar_song
        
    #Query tags dataframe using the setlist (list of ids) and the title and artist columns
    return tags.loc[setlist, ["title", "artist"]].assign(order = range(1, len(setlist) + 1))
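
To make the greedy chaining in `setlist_generator` easier to follow, here is a minimal sketch on invented toy data. `toy_tags`, `toy_dist`, and `greedy_chain` are hypothetical stand-ins that are not part of this notebook, and the key filter is left out because `key_matcher` is defined elsewhere. It only illustrates the idea: at each step, take the closest song that is inside the bpm window and not already on the list, then make that song the new reference.

```python
import pandas as pd

# Toy metadata standing in for the notebook's `tags` DataFrame (values invented).
toy_tags = pd.DataFrame(
    {"bpm": [120, 122, 124, 140, 121]},
    index=["a", "b", "c", "d", "e"],
)

# Toy symmetric distance matrix standing in for e.g. effnet_cosine (values invented).
toy_dist = pd.DataFrame(
    [
        [0.0, 0.2, 0.5, 0.1, 0.4],
        [0.2, 0.0, 0.3, 0.6, 0.7],
        [0.5, 0.3, 0.0, 0.2, 0.1],
        [0.1, 0.6, 0.2, 0.0, 0.9],
        [0.4, 0.7, 0.1, 0.9, 0.0],
    ],
    index=toy_tags.index,
    columns=toy_tags.index,
)


def greedy_chain(start, dist, tags, bpm_window=5, n_songs=4):
    """Hypothetical helper: chain nearest neighbours, honouring a bpm window."""
    chain = [start]
    current = start
    while len(chain) < n_songs:
        bpm = tags.loc[current, "bpm"]
        # Songs whose bpm is within the window around the current song
        in_window = tags.bpm.between(bpm - bpm_window, bpm + bpm_window)
        # Remove everything already on the chain (this includes the current song)
        pool = tags[in_window].index.difference(chain)
        if pool.empty:
            break  # nothing left to recommend, so stop early
        # The closest remaining song becomes the next reference
        current = dist.loc[current, pool].idxmin()
        chain.append(current)
    return chain


toy_chain = greedy_chain("a", toy_dist, toy_tags)
print(toy_chain)
```

In this toy data, song `d` has the smallest raw distance to `a`, but its bpm of 140 falls outside the window, so it never enters the chain. Because each step only looks at the current song, the chain can drift away from the starting track over time.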
    

**Song 1 Example: [You're The One - Hyas](https://www.youtube.com/watch?v=Nbb2DTEV2wA&ab_channel=Hyas-Topic)**

Here we have a real dancey synthpop italodisco-esque tune that's guaranteed to get bodies moving on the dance floor.

In [88]:
youretheone_id = '4ea861e21c9062fa9c4150421c71cc30'

Generate setlist using the Effnet cosine distance matrix

In [89]:
youretheone_setlist = setlist_generator(youretheone_id, effnet_cosine)
youretheone_setlist

| sid | title | artist | order |
|---|---|---|---|
| 4ea861e21c9062fa9c4150421c71cc30 | You're The One (Hyas Edit)_PN | Hyas | 1 |
| f2e2473be147055547243e46f0777ef8 | Perfect Love (The Exorbitant 12 inch extended mix) | Linda Jo Rizzo | 2 |
| 1b891ec736dc0097b96a9bee5e100a4c | Love Letter (Modern Sisters Remix)_PN | Fabio Monesi | 3 |
| be636b5e2145ac8680363debb21ace50 | You Believe | Alex Virgo & Benjamin Groove | 4 |
| d2559b8322a1ba03ec38864233a7bd74 | Gods From Other Space | Ilya Santana | 5 |
| aacd98f09aa0570903ff541085e98a66 | Gorilla Man (Leo af Ekenstam edit) | Condry Ziqubu | 6 |
| 6f6e063b243eeda2bca53f661733c61c | Don't Go Lose It Baby | Palavas | 7 |
| 8bf264dc346615ad103e9281397723e4 | Joia | Nick Garcia | 8 |
| 75b25c75dfa8acf4b6c727f72dc22c93 | Ben Gomori - DM Slide (Dub) | William Jorge Alves Caldeira | 9 |
| a301313483872bef85b4567279189c6f | Rugare (Faze Action dub mix) | FAZE ACTION/ZEKE MANYIKA | 10 |


"Perfect Love" belongs because it is also very synthy and energetic, with an 80s electro feel and female vocals. "Love Letter" is a spot-on inclusion as well.

When the setlist reaches ["You Believe"](https://www.youtube.com/watch?v=VvDXcxNJyAY) and ["Gods From Other Space"](https://soundcloud.com/ilya-santana/gods-from-other-space-ilya-santana-re-edit), the vibe is still very 80s electro-pop.

However, there's quite a pivot from ["Gods From Other Space"](https://soundcloud.com/ilya-santana/gods-from-other-space-ilya-santana-re-edit) to ["Gorilla Man (Leo af Ekenstam edit)"](https://soundcloud.com/sinchicollective/sinchi-edits-11-condry-ziqubu-gorilla-man-leo-af-ekenstam-edit), a galloping house edit of a South African boogie pop tune. It's a curious development that I'm not necessarily opposed to. Sometimes pivots are good for the sake of not boring the crowd with the same set of songs.

The second half of the setlist is also made up of 80s-themed tracks, but of the Brazilian and African variety.


While I probably wouldn't play these songs in this exact order, I do think this method could influence ordering at a more general level.

Overall I'm pretty happy with this result.

**Song 2 Example: [Des Promesses - Voilaaa](https://www.youtube.com/watch?v=yC_Zy-hW3YU)**

A more relaxing yet still danceable jazzy zouk tune with prominent vocals.

In [93]:
des_promesses_id = "847c9859559964e46ec8b7ffbc581cfe"

In [94]:
des_promesses_setlist = setlist_generator(des_promesses_id, effnet_cosine)
des_promesses_setlist

| sid | title | artist | order |
|---|---|---|---|
| 847c9859559964e46ec8b7ffbc581cfe | Des Promesses (Feat. Pat Kalla) | Voilaaa | 1 |
| d810cfe48a6cb790055567d026cb1d78 | Lírio Pra Xangô (REIS Edit) | Trio Mocotó | 2 |
| 91bfe8892cce4c1ef7a624a3955e964a | Djal Bai Si Camin (Déni-Shain Remix) | Antonio Dos Santos | 3 |
| d892872603301623deb935dfa179bec0 | Rio Dèjà vu (De Gama Re-Groove) | Lego Edit | 4 |
| e5fcb07ed8aeb3a5817871bcd2333acd | Rio Dèjà vu (Original) | Lego Edit | 5 |
| 012330511d01cc6f6324936d9f161d05 | Hot Burn [Daje Funk] | Les Inferno | 6 |
| dba0b248941846e9efe09f68aff26c6b | Calling From Uganda | Alexny | 7 |
| 75ac2b181cce428bc4b01d02acab332b | Cahuita (Original Mix)_PN | Alexny | 8 |
| c874b34eb8e770b25092bb3b4a4fd780 | Cahuita | Alexny | 9 |
| 2d7c779d07f7261e933481e0109f4e8a | Boogie On (Original Mix) | Alexny | 10 |


### Future work

In the [Unsupervised Learning - Dimensionality Reduction](Unsupervised%20Learning%20-%20Dimensionality%20Reduction.ipynb) notebook I use an open source embeddings tool called [Nomic](https://atlas.nomic.ai/) to generate interactive 2D embeddings from my songs. Nomic offers vector search for finding similar vectors/embeddings, so I plan to recreate this notebook's task with Nomic in the future.
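
For context, vector search boils down to a top-k nearest-neighbour lookup over embeddings. The sketch below is a brute-force NumPy version on random toy vectors. It is not Nomic's API, and `toy_embeddings`, `toy_ids`, and `top_k_similar` are hypothetical names used only for illustration. A dedicated tool would typically replace the exhaustive comparison with an index, but the query semantics are similar.

```python
import numpy as np

# Random toy vectors standing in for averaged song embeddings (values invented).
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(6, 8))
toy_ids = [f"song_{i}" for i in range(6)]


def top_k_similar(query_idx, embeddings, k=3):
    """Hypothetical helper: row positions of the k nearest vectors by cosine distance."""
    # Scale every row to unit length so a dot product equals cosine similarity
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cos_dist = 1.0 - unit @ unit[query_idx]
    # Never return the query song itself
    cos_dist[query_idx] = np.inf
    return np.argsort(cos_dist)[:k]


nearest = [toy_ids[i] for i in top_k_similar(0, toy_embeddings)]
print(nearest)
```

The bpm and key constraints used in `song_recommender` would still need to be applied on top of such a search, for example by filtering the candidate pool before or after the lookup.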