<b style="color:blue;">Content-Based Song Recommender System</b>

Content-based filtering recommends items by analyzing their features. It determines which features matter most when suggesting songs. For example, if a user liked a pop song in the past, the recommender system will suggest other songs from the same genre. As the user interacts with the system, it learns their preferences and tailors its suggestions accordingly.

In this notebook we use a Spotify dataset to find similar songs to recommend, using cosine similarity as the similarity measure (a sigmoid kernel is an alternative).


In [32]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

In [25]:
# Load the song dataset
df = pd.read_csv("./data/song.csv")

# Show the first 10 rows
df.head(10)

Unnamed: 0.1,Unnamed: 0,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,target,song_title,artist
0,0,0.0102,0.833,204600,0.434,0.0219,2,0.165,-8.795,1,0.431,150.062,4.0,0.286,1,Mask Off,Future
1,1,0.199,0.743,326933,0.359,0.00611,1,0.137,-10.401,1,0.0794,160.083,4.0,0.588,1,Redbone,Childish Gambino
2,2,0.0344,0.838,185707,0.412,0.000234,2,0.159,-7.148,1,0.289,75.044,4.0,0.173,1,Xanny Family,Future
3,3,0.604,0.494,199413,0.338,0.51,5,0.0922,-15.236,1,0.0261,86.468,4.0,0.23,1,Master Of None,Beach House
4,4,0.18,0.678,392893,0.561,0.512,5,0.439,-11.648,0,0.0694,174.004,4.0,0.904,1,Parallel Lines,Junior Boys
5,5,0.00479,0.804,251333,0.56,0.0,8,0.164,-6.682,1,0.185,85.023,4.0,0.264,1,Sneakin’,Drake
6,6,0.0145,0.739,241400,0.472,7e-06,1,0.207,-11.204,1,0.156,80.03,4.0,0.308,1,Childs Play,Drake
7,7,0.0202,0.266,349667,0.348,0.664,10,0.16,-11.609,0,0.0371,144.154,4.0,0.393,1,Gyöngyhajú lány,Omega
8,8,0.0481,0.603,202853,0.944,0.0,11,0.342,-3.626,0,0.347,130.035,4.0,0.398,1,I've Seen Footage,Death Grips
9,9,0.00208,0.836,226840,0.603,0.0,7,0.571,-7.792,1,0.237,99.994,4.0,0.386,1,Digital Animal,Honey Claws


Apart from the two unnamed index columns, this dataset has 16 columns:

1. Acousticness ranges from 0.0 to 1.0 and measures how likely the song is to be acoustic.
2. Danceability describes how suitable a song is for dancing.
3. Duration_ms is the duration of the track in milliseconds.
4. Energy represents a perceptual measure of intensity and activity.
5. Instrumentalness predicts whether a track contains vocals or not.
6. Loudness is the loudness of a track in decibels (dB).
7. Liveness detects the presence of an audience in the recording.
8. Speechiness detects the presence of spoken words in a track.
9. Time_signature is the estimated overall time signature of a track.
10. Key represents the key the track is in. Integers map to pitches using standard Pitch Class notation.
11. Valence ranges from 0.0 to 1.0 and describes the musical positiveness conveyed by a track.
12. Target is a label encoded as 0 or 1: 0 means the listener has not saved the song and 1 means the listener has saved it.
13. Tempo is the speed of the track in beats per minute (BPM).
14. Mode indicates the modality (major or minor) of the song.
15. Song_title is the name of the song.
16. Artist is the performer of the song.
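
The table above also contains two leftover index columns (`Unnamed: 0.1` and `Unnamed: 0`) that are not described in the list. A minimal sketch of dropping them, assuming they hold nothing beyond the saved row number; the inline CSV reuses the first two rows shown above so the sketch runs without the data file, and `raw` and `clean` are illustrative names:

```python
import io
import pandas as pd

# First two rows of the table above, inlined so no CSV file is needed.
csv_text = """Unnamed: 0.1,Unnamed: 0,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,target,song_title,artist
0,0,0.0102,0.833,204600,0.434,0.0219,2,0.165,-8.795,1,0.431,150.062,4.0,0.286,1,Mask Off,Future
1,1,0.199,0.743,326933,0.359,0.00611,1,0.137,-10.401,1,0.0794,160.083,4.0,0.588,1,Redbone,Childish Gambino
"""

raw = pd.read_csv(io.StringIO(csv_text))

# Drop every column whose name starts with "Unnamed"
# (assumption: these are row indices written out by an earlier save).
clean = raw.loc[:, ~raw.columns.str.startswith("Unnamed")]

print(clean.shape)
print(list(clean.columns))
```

The same filter could be applied to `df` right after loading; the feature scaling further down is unaffected either way because it selects its columns by name.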


In [10]:
# df.info()
# df.dtypes

In [12]:
# Feature scaling
# Before computing similarities, rescale every feature to the range [0, 1]
# so that large-valued columns such as duration_ms do not dominate.
# For scaling we use MinMaxScaler from the scikit-learn library.

feature_cols = ['acousticness', 'danceability', 'duration_ms', 'energy',
                'instrumentalness', 'key', 'liveness', 'loudness', 'mode',
                'speechiness', 'tempo', 'time_signature', 'valence']

scaler = MinMaxScaler()
# Note: fit_transform returns a NumPy array, not a DataFrame
normalized_df = scaler.fit_transform(df[feature_cols])

print(normalized_df[:2])

[[0.01024843 0.82482599 0.19073524 0.4263629  0.02243852 0.18181818
  0.15386234 0.74114059 1.         0.51444066 0.59603317 0.75
  0.26243209]
 [0.19999772 0.72041763 0.3144808  0.35008137 0.00626025 0.09090909
  0.12439486 0.69216224 1.         0.07100517 0.6544742  0.75
  0.57793565]]
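
With its default settings, `MinMaxScaler` rescales each column independently with x' = (x - min) / (max - min). A minimal NumPy sketch of that formula, using the danceability and tempo values of the first four songs in the table above; because the minimum and maximum here come from four rows rather than the full dataset, the numbers will differ from the scaler output shown above:

```python
import numpy as np

# danceability and tempo for the first four songs in the table above
X = np.array([
    [0.833, 150.062],
    [0.743, 160.083],
    [0.838,  75.044],
    [0.494,  86.468],
])

# Column-wise minimum and maximum
col_min = X.min(axis=0)
col_max = X.max(axis=0)

# Min-max scaling: every column now spans exactly [0, 1]
X_scaled = (X - col_min) / (col_max - col_min)

print(X_scaled.round(3))
```

After scaling, tempo (tens to hundreds of BPM) and danceability (0 to 1) contribute on the same scale to any similarity computed from them.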


<b>Building Recommender System using Cosine Similarity</b>

In this section, we build a content-based recommender system using a similarity measure such as cosine similarity or a sigmoid kernel.
The goal is to measure the similarity between the feature vectors of the items (songs), pick the 10 songs most similar to a given song, and recommend them to the user.

Cosine similarity is the cosine of the angle between two feature vectors. Its value indicates how closely two records are related: the closer it is to 1, the more similar they are. Because it depends only on the direction of the vectors and not on their magnitude, it is also widely used to compare text documents of unequal length.
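
As a worked illustration of the definition, cos(a, b) = a·b / (‖a‖ ‖b‖), here is a minimal NumPy sketch on made-up two-dimensional vectors. The helper name `cosine_sim` is illustrative and is not part of the notebook:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

u = [1.0, 0.0]
v = [0.0, 1.0]
w = [2.0, 0.0]

print(cosine_sim(u, w))  # 1.0: same direction, different length
print(cosine_sim(u, v))  # 0.0: perpendicular vectors
```

Since min-max scaled features are never negative, the cosine similarity between any two songs in this notebook falls between 0 and 1.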

In [31]:
# Map each song title to its row index. Keep the first row for titles that
# appear more than once, so that a lookup always returns a single index.
indices = pd.Series(df.index, index=df['song_title'])
indices = indices[~indices.index.duplicated(keep='first')]

# Pairwise cosine similarity between the scaled feature vectors of all songs
cosine = cosine_similarity(normalized_df)

def generate_recommendation(song_title, model_type=cosine):
    """
    Purpose: Recommend songs similar to a given song
    Inputs: song title and a precomputed song-by-song similarity matrix
    Output: Pandas series with the titles of the 10 most similar songs
    """
    # Row index of the requested song
    index = indices[song_title]
    # (song index, similarity score) pairs for the requested song
    score = list(enumerate(model_type[index]))
    # Sort from most to least similar
    similarity_score = sorted(score, key=lambda x: x[1], reverse=True)
    # Take the top 10, skipping the requested song itself
    top_songs_index = [i for i, _ in similarity_score if i != index][:10]
    # Titles of the top 10 recommended songs
    top_songs = df['song_title'].iloc[top_songs_index]
    return top_songs
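
The ranking step can also be done with NumPy instead of sorting a Python list. A minimal sketch on a toy 4x4 similarity matrix; the matrix values and the helper name `top_k_similar` are made up for illustration. It excludes the query song by masking its own score before sorting:

```python
import numpy as np

def top_k_similar(sim_matrix, query_index, k=10):
    """Return the indices of the k rows most similar to query_index, best first."""
    scores = np.asarray(sim_matrix, dtype=float)[query_index].copy()
    scores[query_index] = -np.inf               # never recommend the query itself
    k = min(k, len(scores) - 1)                 # cannot return more than the other rows
    order = np.argsort(-scores, kind="stable")  # descending similarity
    return order[:k]

# Toy symmetric similarity matrix for four songs (illustrative values)
toy_sim = np.array([
    [1.00, 0.90, 0.20, 0.50],
    [0.90, 1.00, 0.30, 0.40],
    [0.20, 0.30, 1.00, 0.80],
    [0.50, 0.40, 0.80, 1.00],
])

print(top_k_similar(toy_sim, query_index=0, k=2))  # [1 3]
```

The returned positions could then be passed to `df['song_title'].iloc[...]` in the same way as `top_songs_index`.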

In [26]:
print("Recommended Songs:")
# Choose a song from the dataframe
print(generate_recommendation('Imma Ride',cosine).values)

Recommended Songs:
['International Players Anthem' 'Digital Animal'
 'Bubble Butt [Radio Mix] [feat. Bruno Mars, 2 Chainz, Tyga & Mystic]'
 'Blood On the Money' 'Osaka Loop Line' 'Shabba' 'My Little Secret'
 'Primetime (feat. Miguel)' 'Started From the Bottom' 'Sanctified']


In [29]:
print("Recommended Songs:")
# Choose a song from the dataframe
print(generate_recommendation('Bouncin',cosine).values)

Recommended Songs:
['Evil Friends (feat. Danny Brown) - Jake One Remix'
 'The Buzz (feat. Big K.R.I.T., Mataya & Young Tapz) - Bonus Track'
 'The Buzz (feat. Mataya & Young Tapz)' 'Take Over Control - Radio Edit'
 'Dirt And Grime' 'Plottin' 'Spill The Wine' "Somebody's Watching Me"
 'Stoner' 'Something About You']
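
The `model_type` argument of `generate_recommendation` accepts any square song-by-song similarity matrix, so the sigmoid kernel mentioned earlier could be swapped in for cosine similarity. A minimal NumPy sketch of the kernel, k(x, y) = tanh(gamma · x·y + coef0), on a made-up feature matrix. The defaults gamma = 1 / n_features and coef0 = 1 are chosen to mirror those of scikit-learn's `sigmoid_kernel`, and the helper name `sigmoid_kernel_matrix` is illustrative:

```python
import numpy as np

def sigmoid_kernel_matrix(X, gamma=None, coef0=1.0):
    """Pairwise sigmoid kernel tanh(gamma * <x, y> + coef0) between the rows of X."""
    X = np.asarray(X, dtype=float)
    if gamma is None:
        gamma = 1.0 / X.shape[1]   # assumption: same default as scikit-learn
    return np.tanh(gamma * (X @ X.T) + coef0)

# Made-up scaled features for three songs (illustrative values)
toy_features = np.array([
    [0.0, 0.8, 0.2],
    [0.1, 0.7, 0.3],
    [0.9, 0.1, 0.6],
])

sig = sigmoid_kernel_matrix(toy_features)
print(sig.round(3))
```

With the real data, `sigmoid_kernel_matrix(normalized_df)` could be passed as the second argument, for example `generate_recommendation('Imma Ride', sig)`. Unlike cosine similarity, the sigmoid kernel does not guarantee that a song's highest score is with itself, so the query song is best excluded by index rather than by its position in the sorted list.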
