# Writing a Simple Recommender Algorithm for Tagging Music

In a previous notebook, we scraped music tags from www.bensound.com and built a boolean-based dataset from them. Most recently, we created a correlation matrix covering every unique tag and exported it as a CSV file.

Now we are going to use these correlations to build a program that takes a tag as input and recommends the three tags that have the highest correlation with it.

## 1. Load the Data

In [44]:
import pandas as pd

In [71]:
directory = "C:/Users/maxhi/OneDrive/Uni & Work/Programming/Data Science/Music Tagging/Data"
filename = "music_tags_corr_matrix.csv"

In [27]:
music_tags = pd.read_csv(f"{directory}/{filename}", index_col = "index")

In [28]:
music_tags.head()

index,ukulele,happy,funny,advertising,upbeat,kid,kids,positive,chidren,joy,...,shangai,koto,guzheng,erhu,dizi,voice,sfx,discover,geek,holiday
ukulele,1.0,0.186441,0.153846,0.086538,0.086957,0.2,0.321429,0.11236,0.083333,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333
happy,0.186441,1.0,0.245902,0.37069,0.391304,0.103448,0.40678,0.464646,0.017241,0.266667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241
funny,0.153846,0.245902,1.0,0.101852,0.142857,0.2,0.387097,0.09375,0.055556,0.090909,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
advertising,0.086538,0.37069,0.101852,1.0,0.177966,0.038835,0.188679,0.46875,0.009901,0.133333,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
upbeat,0.086957,0.391304,0.142857,0.177966,1.0,0.073171,0.26,0.315789,0.026316,0.272727,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316
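
The functions in the next section look up a tag as a row label and read the recommendations from the column labels, so it can be worth checking that the loaded table really has the shape of a correlation matrix. The following is a minimal sketch of such a check; the helper name `looks_like_corr_matrix` and the toy values are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def looks_like_corr_matrix(df):
    """Hypothetical helper: basic sanity checks for a tag-by-tag matrix."""
    # Row labels and column labels should be the same tags in the same order
    if list(df.index) != list(df.columns):
        return False
    values = df.to_numpy(dtype=float)
    # Correlation is symmetric, and every tag correlates perfectly with itself
    symmetric = np.allclose(values, values.T)
    unit_diagonal = np.allclose(np.diag(values), 1.0)
    return bool(symmetric and unit_diagonal)

# Toy stand-in for the real matrix (made-up values)
toy = pd.DataFrame(
    [[1.0, 0.2], [0.2, 1.0]],
    index=["ukulele", "happy"],
    columns=["ukulele", "happy"],
)
print(looks_like_corr_matrix(toy))
```

On the real data, the same check would be called as `looks_like_corr_matrix(music_tags)`.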


## 2. Algorithm

First, we need a function that takes a tag as input and returns a DataFrame listing every tag alongside its correlation with the input tag.

In [29]:
def find_correlations(df, tag):
    
    # Select the row that has the tag as its index:
    # it holds one correlation per column
    row = df.loc[tag, :]
    
    # Create a df that pairs each column name with its correlation
    results_df = pd.DataFrame({"tag" : list(df.columns),
                              "correlation" : row.to_numpy()})
    
    return results_df

Now, let's use a helper function to find the n highest correlations.

In [30]:
def find_highest_correlations(corr_df, num_of_values):
    
    # Sort the input df
    corr_df_sorted = corr_df.sort_values(by = ["correlation"], ascending = False)
    
    # Skip the first row (normally the input tag itself, with a correlation of 1.0)
    # and keep the next num_of_values rows
    corr_df_sliced = corr_df_sorted.iloc[1:num_of_values+1]
    
    return corr_df_sliced
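
To make the sort-and-slice step concrete, here is a small sketch on made-up data. The tags and correlation values are hypothetical and stand in for the output of `find_correlations` for the input tag "happy". It assumes no other tag ties with the input tag at 1.0, so the input tag sorts first.

```python
import pandas as pd

# Hypothetical correlations for the input tag "happy" (made-up values)
toy_corr_df = pd.DataFrame({
    "tag": ["ukulele", "happy", "funny", "positive"],
    "correlation": [0.19, 1.0, 0.25, 0.46],
})

# Sort descending: "happy" (1.0) comes first because nothing ties with it
toy_sorted = toy_corr_df.sort_values(by="correlation", ascending=False)

# Skip the first row (the tag itself) and keep the next two rows
toy_top = toy_sorted.iloc[1:3]
print(list(toy_top["tag"]))  # ['positive', 'funny']
```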
    

Nice! Let's combine the two functions above to get the n highest correlations for a tag we give as an input.

In [31]:
def get_recommendations(df, tag, num_of_recommendations):
    
    corr_df = find_correlations(df, tag)
    
    recommendations_df = find_highest_correlations(corr_df, num_of_recommendations)
    
    print("Recommendations:", list(recommendations_df["tag"]))

Let's see which three tags our model recommends for a few inputs, starting with "happy", "sad", and "energetic".

In [45]:
get_recommendations(music_tags, "happy", 3)

Recommendations: ['fun', 'feel good', 'positive']


In [46]:
get_recommendations(music_tags, "sad", 3)

Recommendations: ['moving', 'touching', 'melancholic']


In [47]:
get_recommendations(music_tags, "energetic", 3)

Recommendations: ['energy', 'sport', 'electric']


In [48]:
get_recommendations(music_tags, "agressive", 3)

Recommendations: ['heavy', 'hard', 'extreme']


In [67]:
get_recommendations(music_tags, "anxiety", 3)

Recommendations: ['horror', 'anxiety', 'stress']
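
Note that the result for "anxiety" contains "anxiety" itself. A likely cause (an assumption, not verified against the data) is that another tag such as "horror" also has a correlation of 1.0 with "anxiety" and sorts ahead of it, so the positional slice drops that tag instead of the input tag. One possible way around this is to remove the input tag by label before ranking. The sketch below uses a hypothetical function `recommend_tags` and a made-up toy matrix.

```python
import pandas as pd

def recommend_tags(df, tag, n):
    """Hypothetical variant: return the n tags most correlated with `tag`,
    excluding `tag` itself by label rather than by position."""
    row = df.loc[tag].drop(labels=tag)
    return list(row.nlargest(n).index)

# Toy matrix in which "horror" and "anxiety" are perfectly correlated (made-up values)
tags = ["horror", "anxiety", "stress", "holiday"]
toy = pd.DataFrame(
    [[1.0, 1.0, 0.4, 0.1],
     [1.0, 1.0, 0.3, 0.2],
     [0.4, 0.3, 1.0, 0.0],
     [0.1, 0.2, 0.0, 1.0]],
    index=tags,
    columns=tags,
)
print(recommend_tags(toy, "anxiety", 2))  # ['horror', 'stress']
```

Because the input tag is dropped by name, a tie at 1.0 no longer decides which row gets skipped.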


In [68]:
import random

In [69]:
ran = random.choice(list(music_tags.columns))
print(ran)

hopeful


In [70]:
get_recommendations(music_tags, ran, 3)

Recommendations: ['inspirational', 'moving', 'slideshow']


These are some great results! The next step will be to put this notebook into a .py script where you can intuitively give inputs and receive recommendations. A more ambitious goal is to get this model running as a web application.
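
As a rough idea of what the script version could look like, here is a minimal sketch built on `argparse` from the standard library. The argument names, the helper `top_tags`, and the toy matrix are all assumptions made for illustration. A real script would load `music_tags_corr_matrix.csv` with `pd.read_csv` as in section 1 instead of using made-up values.

```python
import argparse
import pandas as pd

def parse_args(argv=None):
    """Hypothetical command-line interface: a tag plus an optional count."""
    parser = argparse.ArgumentParser(description="Recommend related music tags.")
    parser.add_argument("tag", help="input tag, for example: happy")
    parser.add_argument("-n", "--num", type=int, default=3,
                        help="number of recommendations (default: 3)")
    return parser.parse_args(argv)

def top_tags(df, tag, n):
    """Hypothetical helper: the n tags most correlated with `tag`, excluding `tag` itself."""
    return list(df.loc[tag].drop(labels=tag).nlargest(n).index)

# Toy stand-in for the matrix loaded from the CSV file (made-up values)
tags = ["happy", "positive", "funny", "sad"]
toy = pd.DataFrame(
    [[1.0, 0.5, 0.2, 0.0],
     [0.5, 1.0, 0.1, 0.0],
     [0.2, 0.1, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]],
    index=tags,
    columns=tags,
)

# An explicit argument list is passed here so the sketch runs anywhere;
# a real script would call parse_args() to read sys.argv instead
args = parse_args(["happy", "-n", "2"])
recommendations = top_tags(toy, args.tag, args.num)
print("Recommendations:", recommendations)  # Recommendations: ['positive', 'funny']
```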