This notebook recommends genres that are similar to a genre specified by the user.

First, create a DataFrame from the CSV file that contains information about genres.

In [1]:
import pandas as pd 
genre_DF = pd.read_csv("data_by_genres.csv")

In [2]:
import math

Next, pre-process the DataFrame

In [3]:
#get abs val of loudness column as all loudness values are negative 
genre_DF['loudness'] = genre_DF['loudness'].abs()
#remove rows with an empty genre ('[]', ' ' or '')
genre_DF = genre_DF[~genre_DF['genres'].isin(['[]', ' ', ''])]
#keep track of original genres
genre_df = genre_DF.copy(deep=True)
#now make genres insensitive to spaces for user input
genre_DF['genres'] = genre_DF['genres'].str.replace(' ', '')
# Restrict DF to desired characteristics for comparison
columns = ['acousticness', 'danceability', 'energy', 'instrumentalness', 'liveness', 'loudness', 'speechiness', 'tempo', 'valence', 'popularity']
c_df = genre_DF[columns] 
#normalise columns using min-max normalisation
genre_norm = (c_df-c_df.min())/(c_df.max()-c_df.min())
genre_norm.fillna(0, inplace=True) #automatically set na vals to 0
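
To see what the normalisation step does, here is a minimal sketch on a toy DataFrame. The column values and the constant `flat` column are made up for illustration; they are not taken from `data_by_genres.csv`. A constant column divides 0 by 0 and produces NaN, which is the case the `fillna(0)` call covers.

```python
import pandas as pd

# Toy data (made up for illustration); 'flat' is constant on purpose
toy = pd.DataFrame({
    "tempo": [60.0, 120.0, 180.0],
    "energy": [0.2, 0.4, 1.0],
    "flat": [5.0, 5.0, 5.0],
})

# Min-max normalisation: (x - min) / (max - min), column by column
toy_norm = (toy - toy.min()) / (toy.max() - toy.min())

# The constant column is 0/0 = NaN at this point; replace it with 0
toy_norm = toy_norm.fillna(0)

print(toy_norm)
```

After this step every column lies in the range 0 to 1, so no single characteristic (such as tempo) dominates the distance purely because of its scale.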

Define a function that returns the normalised vector for the given genre

In [4]:
def normal_genre(genre):
    given_g_norm = genre_norm[genre_DF["genres"] == genre]
    return given_g_norm

Define a simple Euclidean distance function that measures the distance to the given genre across all characteristics

In [5]:
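def euclidean_distance(row):
    #uses the global `genre` chosen by the user in the input cell
    #squeeze() turns the one-row DataFrame into a Series so the result is a scalar
    given_g_norm = normal_genre(genre).squeeze()
    v = 0
    for c in columns:
        v += (row[c] - given_g_norm[c]) ** 2
    return math.sqrt(v)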
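
The same distance can also be computed for every row at once instead of one row at a time. The following is a sketch under assumptions, not the notebook's method: the feature values, the genre labels used as the index, and the helper name `distances_to` are all hypothetical.

```python
import numpy as np
import pandas as pd

# Toy normalised features (made-up values), indexed by genre name
features = pd.DataFrame(
    {"energy": [0.0, 0.6, 1.0], "valence": [0.0, 0.8, 1.0]},
    index=["ambient", "indiepop", "hardrock"],
)

def distances_to(features, target):
    """Euclidean distance from every row of `features` to the row labelled `target`."""
    # Subtracting a row (a Series) from the DataFrame aligns on column names
    diff = features - features.loc[target]
    return np.sqrt((diff ** 2).sum(axis=1))

dist = distances_to(features, "ambient")
print(dist.sort_values())
```

Passing the target in as an argument also avoids relying on a global variable, and the target row is looked up once rather than once per row.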

Define a function that sorts genres by similarity to the input genre

In [6]:
def similarity():
    #find the distance from the input genre to every genre
    genre_n_similarity = genre_norm.apply(euclidean_distance, axis=1)
    #sort ascending: a distance of 0 means identical characteristics
    genre_n_similarity = genre_n_similarity.sort_values(ascending=True)
    return genre_n_similarity

Define a function that returns the index values of the most similar genres, up to a given number

In [7]:
def index_list(no_given):
    #skip position 0, which is the input genre itself (distance 0)
    sim_byindex = genre_n_similarity.index[1:no_given + 1]
    #slicing never overruns, so fewer labels are returned if no_given is too large
    return [int(i) for i in sim_byindex]
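
Skipping the first position assumes the input genre always sorts first. If another genre has exactly the same characteristics (also distance 0), that ordering is not guaranteed. A possible alternative, sketched here with made-up distances and a hypothetical helper name `top_n_indices`, is to remove the input genre by its index label and then take the smallest distances.

```python
import pandas as pd

# Toy distances keyed by row index (made-up values); note the tie at 0.0
toy_dist = pd.Series({10: 0.0, 11: 0.3, 12: 0.0, 13: 0.1})

def top_n_indices(distances, query_label, n):
    """Index labels of the n smallest distances, excluding the query itself."""
    # Remove the query by label, not by position
    others = distances.drop(index=query_label)
    return others.nsmallest(n).index.tolist()

print(top_n_indices(toy_dist, 10, 2))
```

Here row 10 plays the role of the input genre, so it is dropped and row 12, its exact duplicate, is still recommended.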

Define a function that asks the user how many genres to recommend

In [8]:
def select_no_genres():
    f1 = input("Please select the number of genres you would like: ")
    #isdecimal() guarantees that int() will succeed; 0 is rejected as well
    while not (f1.isdecimal() and int(f1) > 0):
        f1 = input("Please enter a positive numeric value: ")
    return int(f1)
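
Validating a numeric string has a few edge cases that are easier to see in isolation. The helper below is a hypothetical sketch (`parse_positive_int` is not part of the notebook) that separates the check from the `input()` loop so it can be tried on sample strings.

```python
def parse_positive_int(text):
    """Return text as a positive int, or None if it is not one (hypothetical helper)."""
    text = text.strip()
    # isdecimal() accepts only decimal digits, whereas isnumeric() also
    # accepts characters such as '²' that int() cannot convert
    if text.isdecimal() and int(text) > 0:
        return int(text)
    return None

for candidate in ["12", "0", "-3", "²", "abc", " 7 "]:
    print(repr(candidate), "->", parse_positive_int(candidate))
```

Only "12" and " 7 " are accepted; zero, negative numbers, superscript digits and words all return None.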

Define a function that asks the user for the genre to base recommendations on

In [9]:
def select_genre():
    genre_in = input("Please enter a genre to receive recommendations: ")
    genre_in = genre_in.replace(" ", "").lower()
    while True:
        if genre_in not in genre_DF['genres'].tolist():
            genre_in = input("Please enter a valid genre: ")
            genre_in = genre_in.replace(" ", "").lower()
        else:
            break   
    return genre_in

Finally, the user enters a genre and the number of recommendations to receive

In [11]:
genre = select_genre()
no_given = select_no_genres()

genre_n_similarity = similarity()
indexlist = index_list(no_given)
print(f"Here are {no_given} similar genres you might like: {genre_df.loc[indexlist, 'genres'].tolist()}")

Please enter a genre to receive recommendations: Art pop
Please select the number of genres you would like: 12
Here are 12 similar genres you might like: ['experimental pop', 'alternative americana', 'freak folk', 'south carolina indie', 'etherpop', 'bubblegrunge', 'gothenburg indie', 'idol', 'indie pop', 'stomp and holler', 'olympia wa indie', 'lo-fi emo']
