A simple content-based recommendation model that uses scikit-learn to suggest similar albums.

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from IPython.display import display, HTML

Read the .csv file into a DataFrame and strip spaces from the column names.

In [2]:
df = pd.read_csv("rym_top_5000_all_time.csv")
df.columns = df.columns.str.replace(' ', '')

Convert album names to uppercase so that lookups can be made case-insensitive by uppercasing the search term as well.

In [3]:
# Vectorized string method; same result as applying str.upper() row by row
df["Album"] = df["Album"].str.upper()

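Combine ArtistName, Genres, and Descriptors into a comma-delimited string in a new "all_tags" column. Genres is included twice so that genre tags carry double weight. Because the artist name is included too, albums by the same artist share tokens and score as more similar.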

In [4]:
tags = ['ArtistName', 'Genres', 'Descriptors']
for tag in tags:
    df[tag] = df[tag].fillna('')

def concat_tags(row):
    # Genres appears twice so its tokens are counted twice
    return row['ArtistName'] + ", " + row['Genres'] + ", " + row['Descriptors'] + ", " + row['Genres']
df["all_tags"] = df.apply(concat_tags, axis=1)
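To check the effect of repeating Genres, here is a small sketch that runs CountVectorizer on two hypothetical albums built the same way as "all_tags". The toy rows are made up for illustration; only the column names follow the notebook.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data using the notebook's column names
toy = pd.DataFrame({
    "ArtistName": ["Björk", "Kate Bush"],
    "Genres": ["Art Pop, Electronic", "Art Pop"],
    "Descriptors": ["ethereal, lush", "romantic, lush"],
})

# Same concatenation as concat_tags: Genres appears twice
toy["all_tags"] = (toy["ArtistName"] + ", " + toy["Genres"] + ", "
                   + toy["Descriptors"] + ", " + toy["Genres"])

cv = CountVectorizer()
counts = cv.fit_transform(toy["all_tags"])
vocab = cv.vocabulary_  # maps each token to its column in counts

# Genre tokens are counted twice, descriptor tokens once
print(counts[0, vocab["electronic"]])  # 2
print(counts[0, vocab["lush"]])        # 1
```

A genre shared by two albums therefore contributes four times as much to their dot product as a shared descriptor does (2 × 2 versus 1 × 1), which is what pulls same-genre albums together.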

Create a "display_tags" column that holds the genres and descriptors shown alongside each result.

In [5]:
def display_tags(row):
    return row['Genres'] + ", " + row['Descriptors']
df["display_tags"] = df.apply(display_tags, axis=1)

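CountVectorizer tokenizes each album's "all_tags" string and builds a sparse document-term matrix, "count_matrix", with one row per album and one column per token; each entry counts how many times that token appears in the album's tags. With the default settings, text is lowercased and split on punctuation and whitespace, so a multi-word tag such as "Art Pop" becomes the two tokens "art" and "pop", and the commas only act as separators.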
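To see how CountVectorizer's default tokenizer treats a tag string, the sketch below fits it on one hypothetical string. For comparison, it also shows an alternative that is not used in this notebook: a hypothetical split_tags function passed as the tokenizer, which keeps each comma-separated tag as a single token.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical tag string in the same format as all_tags
doc = ["Björk, Art Pop, Electronic, female vocals, Art Pop, Electronic"]

# Default settings: lowercase, then extract runs of 2+ word characters
default_cv = CountVectorizer()
default_cv.fit(doc)
print(sorted(default_cv.vocabulary_))
# ['art', 'björk', 'electronic', 'female', 'pop', 'vocals']

# Hypothetical alternative: one token per comma-separated tag
def split_tags(text):
    return [t.strip() for t in text.split(",") if t.strip()]

# token_pattern=None because a custom tokenizer replaces it
tag_cv = CountVectorizer(tokenizer=split_tags, token_pattern=None)
tag_cv.fit(doc)
print(sorted(tag_cv.vocabulary_))
# ['art pop', 'björk', 'electronic', 'female vocals']
```

With word-level tokens, "Art Pop" and "Progressive Pop" partly match through "pop". With tag-level tokens they do not match at all. Which behavior is preferable depends on how the recommendations should generalize.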

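cosine_similarity computes the cosine of the angle between every pair of album vectors (the rows of "count_matrix"), producing an album-by-album matrix, "cosine_sim". Because the counts are non-negative, scores range from 0 to 1. The album with the largest cosine similarity (the smallest angle) is the most similar, so results are sorted in descending order.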

In [6]:
cv = CountVectorizer()
count_matrix = cv.fit_transform(df['all_tags'])

cosine_sim = cosine_similarity(count_matrix)
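The cosine formula can be checked by hand on small vectors. This sketch uses three hypothetical count vectors over a shared four-token vocabulary and compares a manual computation with scikit-learn's cosine_similarity.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical token counts for three albums over the same vocabulary
a = np.array([2, 2, 1, 0])
b = np.array([2, 2, 0, 1])
c = np.array([0, 0, 1, 1])

def cosine(u, v):
    # cos(theta) = (u . v) / (||u|| * ||v||)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(round(cosine(a, b), 3))  # 0.889, i.e. 8 / (3 * 3)
print(round(cosine(a, c), 3))  # 0.236, i.e. 1 / (3 * sqrt(2))

# cosine_similarity computes the same value for every pair of rows at once
M = np.vstack([a, b, c])
sim = cosine_similarity(M)

# The best match for row 0 is the other row with the LARGEST score
best = max((j for j in range(len(M)) if j != 0), key=lambda j: sim[0, j])
print(best)  # 1
```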

Define helper functions that look up a row index from an album name, and look up the album name, artist name, or display tags from a row index when displaying the results.

In [7]:
def get_album_from_index(index):
    return df.at[index, 'Album']

def get_index_from_album(album):
    # Label of the first row whose Album matches; raises IndexError if none match
    return int(df.index[df['Album'] == album][0])

def get_artist_from_index(index):
    return df.at[index, 'ArtistName']

def get_tags_from_index(index):
    return df.at[index, 'display_tags']
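get_index_from_album returns the first row whose title matches, so two different albums with the same title are indistinguishable. The sketch below is one way to disambiguate them. find_album_index and its optional artist argument are hypothetical additions, not part of the notebook, and the toy rows are invented.

```python
import pandas as pd

def find_album_index(frame, album, artist=None):
    """Hypothetical helper: row label of `album`, optionally narrowed by `artist`."""
    # Album titles in the notebook are stored uppercase
    mask = frame["Album"] == album.upper()
    if artist is not None:
        mask &= frame["ArtistName"] == artist
    matches = frame.index[mask]
    if len(matches) == 0:
        raise ValueError(f"No album named {album!r}")
    return int(matches[0])

# Hypothetical toy data with a duplicated title
toy = pd.DataFrame({
    "Album": ["GREATEST HITS", "GREATEST HITS", "VESPERTINE"],
    "ArtistName": ["Artist A", "Artist B", "Björk"],
})

print(find_album_index(toy, "Greatest Hits"))                     # 0
print(find_album_index(toy, "Greatest Hits", artist="Artist B"))  # 1
```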

Return the top 10 albums most similar to "selected_album" and display each one's album name, artist name, and display tags. The example album can be replaced with any album in the list. An album that is not in the list, or is misspelled, produces an error message.

In [9]:
selected_album = "Vespertine".upper()

try:
    album_index = get_index_from_album(selected_album)
except IndexError:
    # Raised when no row has this album name
    print("Not a valid album name. Please check spelling.")
else:
    # Pair each row index with its similarity to the selected album
    suggestion = list(enumerate(cosine_sim[album_index]))
    # Highest similarity first; drop the selected album itself rather than assuming it sorts first
    sorted_albums = [s for s in sorted(suggestion, key=lambda x: x[1], reverse=True)
                     if s[0] != album_index]
    print("Input album: " + selected_album + ". Tags: " + get_tags_from_index(album_index) + "\n")
    print("Suggested Albums for " + selected_album + " are:")
    rows = []
    for idx, _score in sorted_albums[:10]:
        rows.append({"Album": get_album_from_index(idx),
                     "Artist": get_artist_from_index(idx),
                     "Tags": get_tags_from_index(idx)})
    album_sug_df = pd.DataFrame(rows, columns=["Album", "Artist", "Tags"])
    display(HTML(album_sug_df.to_html(index=False)))

Input album: VESPERTINE. Tags: Art Pop, Electronic, sensual, romantic, winter, sexual, ethereal, atmospheric, lush, introspective, female vocals, soothing

Suggested Albums for VESPERTINE are:


Album,Artist,Tags
HOMOGENIC,Björk,"Art Pop, Electronic, cold, passionate, ethereal, lush, female vocals, atmospheric, introspective, anxious, winter, romantic"
POST,Björk,"Art Pop, Electronic, eclectic, playful, passionate, female vocals, futuristic, quirky, lush, melodic, atmospheric, abstract"
IMPOSSIBLE PRINCESS,Kylie Minogue,"Art Pop, Electronic, introspective, eclectic, female vocals, atmospheric, melancholic, energetic, rhythmic, sensual, melodic, mysterious"
THE SENSUAL WORLD,Kate Bush,"Art Pop, sensual, female vocals, passionate, lush, romantic, melodic, atmospheric, poetic, love, nocturnal"
救済の技法 (KYUUSAI NO GIHOU),平沢進 [Susumu Hirasawa],"Art Pop, Progressive Pop, Electronic, epic, dense, Wall of Sound, passionate, lush, atmospheric, ethereal, orchestral, futuristic, melodic"
DEBUT,Björk,"Art Pop, House, passionate, romantic, female vocals, playful, rhythmic, eclectic, love, party, sensual, lush"
BOYS AND GIRLS,Bryan Ferry,"Sophisti-Pop, Pop Rock, Art Pop, atmospheric, romantic, male vocals, nocturnal, lush, sensual, sexual, urban, love"
THE KICK INSIDE,Kate Bush,"Art Pop, romantic, poetic, love, female vocals, sensual, melodic, passionate, warm, lush, uplifting"
VULNICURA,Björk,"Art Pop, breakup, sombre, melancholic, female vocals, concept album, serious, sad, sentimental, introspective, lush"
PANG,Caroline Polachek,"Art Pop, atmospheric, female vocals, longing, ethereal, breakup, bittersweet, introspective, love, passionate, melodic"
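The top-10 selection can also be written as a small reusable function over any similarity matrix. This is a sketch, not the notebook's code. top_n_similar is a hypothetical name, and the 4 × 4 matrix is made up. It uses NumPy's stable argsort on negated scores, so ties keep their original order, just as Python's sorted(..., reverse=True) does.

```python
import numpy as np

def top_n_similar(sim_matrix, index, n=10):
    """Hypothetical helper: (row, score) pairs for the n rows most similar
    to `index`, highest score first, excluding `index` itself."""
    scores = sim_matrix[index]
    # Descending order; a stable sort keeps tied rows in index order
    order = np.argsort(-scores, kind="stable")
    order = order[order != index][:n]
    return [(int(j), float(scores[j])) for j in order]

# Hypothetical symmetric similarity matrix for four albums
sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

print(top_n_similar(sim, 0, n=2))  # [(2, 0.9), (3, 0.5)]
```

With the notebook's objects, top_n_similar(cosine_sim, album_index) would return the ten highest-scoring rows other than the input album, ready to be passed to the get_*_from_index helpers.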
