# Notebook 6: Calculating Distance to Get Album Recommendations

### Introduction

At this point, I am fairly confident that the components place similar albums close together and keep dissimilar albums apart. Now I can calculate the cosine similarity between every pair of albums and, for each album, pick the highest-scoring other album as its most similar one.
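As a reminder of what the score means, here is a minimal sketch of cosine similarity on a few made-up vectors (the vectors and the `cosine` helper are illustrative only, not part of the project data). Cosine similarity is a similarity rather than a distance: higher means closer, with 1 for vectors pointing the same way, 0 for orthogonal vectors, and -1 for opposite ones.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two 1-D vectors (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]    # same direction as a, different length
c = [-1.0, -2.0, 0.0]  # opposite direction to a
d = [0.0, 0.0, 3.0]    # orthogonal to a

print(cosine(a, b), cosine(a, c), cosine(a, d))
```

Because only direction matters, `a` and `b` score as identical even though their lengths differ.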

In [1]:
import pandas as pd
import numpy as np
import pickle
import sys

sys.setrecursionlimit(1000000) #to allow pickling

from sklearn.metrics.pairwise import cosine_similarity

In [2]:
with open('../data/components_by_album.pickle', 'rb') as read_file:
    components_by_album = pickle.load(read_file)

### Calculate Distances Between Albums Using Cosine Similarity

In [3]:
album_distances = (pd.concat([components_by_album.reset_index().iloc[:, 0], 
                             pd.DataFrame(cosine_similarity(components_by_album, 
                                                            components_by_album))], axis = 1)
                   .set_index('album_artist'))

In [4]:
album_distances

album_artist,0,1,2,3,4,5,6,7,8,9,...,982,983,984,985,986,987,988,989,990,991
album:'Sno Angel Like You artist:Howe Gelb,1.000000,0.196119,-0.126353,-0.349491,0.680758,0.002047,0.663652,-0.590387,0.018596,0.681113,...,-0.447184,0.256640,-0.036927,0.590311,0.373287,0.212593,0.348281,-0.464251,-0.161543,-0.531004
album:(After) [Live] artist:Mount Eerie,0.196119,1.000000,0.151098,-0.118240,0.385248,-0.767658,0.499389,-0.338818,0.400910,-0.359876,...,-0.608348,0.196259,0.484581,0.456130,-0.244360,0.222367,0.685000,-0.182804,0.281368,0.066080
album:1988 artist:Blueprint,-0.126353,0.151098,1.000000,0.610367,-0.609010,0.294768,0.368747,0.000299,-0.745865,0.060013,...,-0.203739,0.864132,-0.623440,-0.660930,0.634253,0.905313,0.641563,0.711170,-0.742812,0.117676
album:1991 [EP] artist:Azealia Banks,-0.349491,-0.118240,0.610367,1.000000,-0.505227,0.130452,-0.008967,0.190758,-0.578058,-0.096567,...,-0.294269,0.437646,-0.389293,-0.598179,0.493837,0.277572,0.409281,0.431441,-0.609681,-0.205330
"album:22, A Million artist:Bon Iver",0.680758,0.385248,-0.609010,-0.505227,1.000000,-0.495114,0.200953,-0.445459,0.634885,0.125029,...,-0.554505,-0.399147,0.564223,0.900792,-0.132118,-0.375139,0.101007,-0.871041,0.432412,-0.544459
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
album:uknowhatimsayin¿ artist:Danny Brown,0.212593,0.222367,0.905313,0.277572,-0.375139,0.325107,0.596381,-0.145375,-0.672521,0.276304,...,-0.212805,0.881816,-0.593259,-0.424812,0.671954,1.000000,0.652536,0.566499,-0.674262,0.035495
album:untitled unmastered. artist:Kendrick Lamar,0.348281,0.685000,0.641563,0.409281,0.101007,-0.378781,0.735278,-0.276315,-0.176588,-0.041926,...,-0.790101,0.696093,-0.029068,-0.022404,0.413872,0.652536,1.000000,0.026822,-0.370425,-0.268157
album:xx artist:The xx,-0.464251,-0.182804,0.711170,0.431441,-0.871041,0.576950,-0.051881,0.074815,-0.781519,0.134648,...,0.443031,0.585077,-0.720607,-0.665477,0.285858,0.566499,0.026822,1.000000,-0.577103,0.563069
album:Ágætis Byrjun artist:Sigur Rós,-0.161543,0.281368,-0.742812,-0.609681,0.432412,-0.666520,-0.188970,0.321298,0.925294,-0.586750,...,0.143715,-0.807199,0.913757,0.511160,-0.860985,-0.674262,-0.370425,-0.577103,1.000000,0.094650
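The integer column labels make the matrix awkward to read. One possible alternative, sketched here on a made-up three-album frame (`toy`, its album names, and its values are all hypothetical), is to reuse the row index as the column labels so any pair can be looked up by name. The row normalisation stands in for `cosine_similarity` so the sketch depends only on NumPy and pandas.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]],
    index=pd.Index(['album:A artist:X', 'album:B artist:Y', 'album:C artist:Z'],
                   name='album_artist'),
)

# Scale each row to unit length; dot products of unit rows are cosine similarities.
unit = toy.values / np.linalg.norm(toy.values, axis=1, keepdims=True)
sim = pd.DataFrame(unit @ unit.T, index=toy.index, columns=toy.index)

print(sim.round(3))
print(sim.loc['album:A artist:X', 'album:B artist:Y'])
```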


In [5]:
album_distances['combined'] = album_distances.values.tolist()

In [6]:
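# max_cosine holds the positional index of the most similar album other than the album itself.
# This relies on each album's self-similarity (1.0) sorting first, so position [1] is the runner-up.
album_distances['max_cosine'] = album_distances.combined.map(lambda x: np.argsort(x)[::-1][1])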
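Taking the second entry of the reversed argsort assumes each album ranks first against itself. That holds as long as no other album also scores 1 against it, but a duplicate or perfectly collinear row could displace it. A more defensive variant, sketched on a made-up 3×3 similarity matrix (`sim` and its values are hypothetical), masks the diagonal before taking the argmax.

```python
import numpy as np

sim = np.array([
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.1],
    [-0.2, 0.1, 1.0],
])

masked = sim.copy()
np.fill_diagonal(masked, -np.inf)   # an album can never be its own match
nearest = masked.argmax(axis=1)     # position of the most similar other album

print(nearest)
```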

### Check Recommendations

I'll take a quick look at a few albums and their closest similar albums, just to make sure things make sense.

In [7]:
album_distances['max_cosine']

album_artist
album:'Sno Angel Like You artist:Howe Gelb          535
album:(After) [Live] artist:Mount Eerie             735
album:1988 artist:Blueprint                         792
album:1991 [EP] artist:Azealia Banks                872
album:22, A Million artist:Bon Iver                 650
                                                   ... 
album:uknowhatimsayin¿ artist:Danny Brown           118
album:untitled unmastered. artist:Kendrick Lamar    824
album:xx artist:The xx                              928
album:Ágætis Byrjun artist:Sigur Rós                159
album:Ø (Disambiguation) artist:Underoath           553
Name: max_cosine, Length: 992, dtype: int64

In [8]:
album_distances['max_cosine'].iloc[[159, 792]]

album_artist
album:Centralia artist:Mountains      990
album:The Black Album artist:Jay-Z      2
Name: max_cosine, dtype: int64

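In broad strokes, the matches make sense. The closest album to Sigur Rós's Ágætis Byrjun is at position 159, Mountains' Centralia; both are very atmospheric, instrumental albums. The closest album to Blueprint's 1988 is at position 792, Jay-Z's The Black Album; both are Hip-Hop/Rap. The output above also shows that each match is mutual: Centralia points back to position 990 (Ágætis Byrjun) and The Black Album points back to position 2 (1988).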
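Reading positions such as 159 or 792 back into album names by hand is error-prone. Here is a small sketch of doing the lookup in one step, on a made-up frame that stands in for `album_distances` (the album names and `max_cosine` values below are hypothetical).

```python
import pandas as pd

toy = pd.DataFrame(
    {'max_cosine': [2, 2, 0]},
    index=pd.Index(['album:A artist:X', 'album:B artist:Y', 'album:C artist:Z'],
                   name='album_artist'),
)

# Index the array of row labels with the stored positions to get names directly.
toy['closest_album'] = toy.index.to_numpy()[toy['max_cosine'].to_numpy()]

print(toy['closest_album'])
```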

### Return Most Similar Album

These are the five most similar albums to Blueprint's 1988.

In [9]:
dist = cosine_similarity(np.array(components_by_album.loc['album:1988 artist:Blueprint', ]).reshape(1, -1), components_by_album)[0]

In [10]:
match = np.argsort(dist)[::-1][:10]

In [11]:
components_by_album.reset_index().iloc[match[1:6], 0].values

array(['album:The Black Album artist:Jay-Z',
       'album:Control System artist:Ab-Soul',
       'album:King Push - Darkest Before Dawn: The Prelude artist:Pusha T',
       'album:The Blueprint artist:Jay-Z',
       'album:Emeritus artist:Scarface'], dtype=object)

The five most similar albums to The Breeders' All Nerve include Lykke Li's Wounded Rhymes.

In [12]:
match = (np.argsort(cosine_similarity(np
                                      .array(components_by_album
                                             .loc['album:All Nerve artist:The Breeders', ])
                                      .reshape(1, -1), components_by_album)[0])[::-1][:10])

In [13]:
components_by_album.reset_index().iloc[match[1:6], 0].values

array(['album:Wounded Rhymes artist:Lykke Li',
       'album:Two Dancers artist:Wild Beasts',
       'album:Mwng artist:Super Furry Animals',
       'album:Interstate Gospel artist:Pistol Annies',
       'album:Nomad artist:Bombino'], dtype=object)

The five most similar albums to Janelle Monáe's Dirty Computer include Solange's A Seat at the Table.

In [14]:
match = (np.argsort(cosine_similarity(np
                                      .array(components_by_album
                                             .loc['album:Dirty Computer artist:Janelle Monáe', ])
                                      .reshape(1, -1), components_by_album)[0])[::-1][:10])

In [15]:
components_by_album.reset_index().iloc[match[1:6], 0].values

array(['album:Transangelic Exodus artist:Ezra Furman',
       'album:Saturn artist:nao',
       'album:Elephant artist:The White Stripes',
       'album:Yankee Hotel Foxtrot artist:Wilco',
       'album:A Seat at the Table artist:Solange'], dtype=object)
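The three lookups above repeat the same steps, so they could be wrapped in a helper. This is a sketch rather than code from the notebook: `most_similar` and the toy frame are hypothetical names, the index labels are assumed to be unique, and the cosine score is computed with NumPy so the example runs without the pickled data.

```python
import numpy as np
import pandas as pd

def most_similar(components, album, n=5):
    """Return the labels of the n rows most cosine-similar to `album`, excluding itself."""
    values = components.to_numpy(dtype=float)
    unit = values / np.linalg.norm(values, axis=1, keepdims=True)
    pos = components.index.get_loc(album)      # assumes unique labels, so this is an int
    scores = unit @ unit[pos]
    scores[pos] = -np.inf                      # never recommend the query album
    order = np.argsort(scores)[::-1][:n]       # highest similarity first
    return components.index[order].tolist()

toy = pd.DataFrame(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]],
    index=['album:A artist:X', 'album:B artist:Y', 'album:C artist:Z', 'album:D artist:W'],
)

print(most_similar(toy, 'album:A artist:X', n=2))
```

With the real data, a call along the lines of `most_similar(components_by_album, 'album:1988 artist:Blueprint')` would be expected to mirror the cells above, though that has not been run here.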