# Spotify Music Recommendation System

In this notebook we are going to use Spotify's API to fetch the data of a playlist and build a music recommendation system.


#### Steps to fetch the data

#### 1. Get the Client ID and Client Secret
1. To get started with the Spotify API, first create a Spotify account (skip this step if you already have one).
2. After creating the Spotify account, go to the [developer dashboard](https://developer.spotify.com/dashboard), fill in the required fields, and create an app.
3. After creating the app, copy the Client ID and Client Secret. We need these credentials to fetch the data.


#### 2. Get an access token
Now we use these credentials to fetch an access token by making a POST request to https://accounts.spotify.com/api/token.

In [1]:
import requests
import base64

In [2]:
CLIENT_ID = '<your client id>'
CLIENT_SECRET = '<your client secret>'

In [3]:
# Base64 encode the client id and client secret
client_credentials = f"{CLIENT_ID}:{CLIENT_SECRET}"
client_credentials_base64 = base64.b64encode(client_credentials.encode())

In [4]:
# Request the access token
token_url = 'https://accounts.spotify.com/api/token'
headers = {
    'Authorization' : f'Basic {client_credentials_base64.decode()}'
}

data = {
    'grant_type' : 'client_credentials'
}

response = requests.post(token_url, data=data, headers=headers)

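if response.status_code == 200:
    access_token = response.json()['access_token']
    print("Access token obtained")
else:
    # Stop here: the rest of the notebook needs a valid token
    raise RuntimeError(f"Error while obtaining access token: {response.status_code} {response.text}")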

Access token obtained
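The token request above follows the OAuth 2.0 client credentials flow: the `Authorization` header carries `Basic <base64(client_id:client_secret)>` and the form-encoded body carries `grant_type=client_credentials`. The sketch below builds both pieces with only the standard library, so they can be inspected without contacting Spotify. `build_token_request` is a hypothetical helper name, not part of any library, and `user`/`pass` are placeholder credentials.

```python
import base64
from urllib.parse import urlencode


def build_token_request(client_id: str, client_secret: str):
    """Hypothetical helper: build the headers and body of a client-credentials token request."""
    # Basic auth: base64-encode "client_id:client_secret"
    credentials = f"{client_id}:{client_secret}".encode()
    encoded = base64.b64encode(credentials).decode()
    headers = {
        "Authorization": f"Basic {encoded}",
        # requests sets this header itself when a dict is passed as data=
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Form-encoded body, as requests.post(..., data={...}) sends it
    body = urlencode({"grant_type": "client_credentials"})
    return headers, body


headers_example, body_example = build_token_request("user", "pass")
print(headers_example["Authorization"])  # Basic dXNlcjpwYXNz
print(body_example)                      # grant_type=client_credentials
```

The `Authorization` value is exactly what the `headers` dict in the cell above would contain for that client ID and secret.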


#### 3. Use the Spotipy library to fetch the playlist data

Now, I will write a function to fetch the music data of a given Spotify playlist. To accomplish this we use the **[Spotipy](https://spotipy.readthedocs.io/en/2.22.1/)** library, which provides access to Spotify's Web API.

This function fetches the details we want to use for recommendations and returns a pandas DataFrame.

In [5]:
# Import necessary library to fetch the data and create a dataframe
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyOAuth

In [6]:
# A function to fetch the playlist data and return a dataframe
def get_playlist_data(playlist_id, access_token):
    '''Fetch track details and audio features for a playlist and return a DataFrame.'''
    # Setting up the spotipy lib with the access token
    sp = spotipy.Spotify(auth=access_token)

    # Get the tracks from the playlist (first page of results only)
    playlist_tracks = sp.playlist_tracks(playlist_id, fields='items(track(id,name,artists,album(id,name)))')

    # Extract relevant information and store it in a list of dictionaries
    music_data = []
    for item in playlist_tracks['items']:
        track = item['track']
        if track is None:  # removed or unavailable tracks come back as None
            continue
        track_name = track['name']
        artists = ', '.join([artist['name'] for artist in track['artists']])
        album_name = track['album']['name']
        album_id = track['album']['id']
        track_id = track['id']  # None for local files

        # Get audio features for the track ({} if unavailable)
        audio_features = (sp.audio_features(track_id)[0] if track_id else None) or {}

        # Get the release date of the album
        try:
            album_info = sp.album(album_id) if album_id else None
            release_date = album_info['release_date'] if album_info else None
        except Exception:
            release_date = None

        # Get full track details (popularity, explicit flag, Spotify URL).
        # A separate name avoids overwriting the loop variable.
        try:
            track_details = (sp.track(track_id) if track_id else None) or {}
        except Exception:
            track_details = {}

        # Add additional track information to the track data
        track_data = {
            'Track Name': track_name,
            'Artists': artists,
            'Album Name': album_name,
            'Album ID': album_id,
            'Track ID': track_id,
            'Popularity': track_details.get('popularity'),
            'Release Date': release_date,
            'Duration (ms)': audio_features.get('duration_ms'),
            'Explicit': track_details.get('explicit'),
            'External URLs': track_details.get('external_urls', {}).get('spotify'),
            'Danceability': audio_features.get('danceability'),
            'Energy': audio_features.get('energy'),
            'Key': audio_features.get('key'),
            'Loudness': audio_features.get('loudness'),
            'Mode': audio_features.get('mode'),
            'Speechiness': audio_features.get('speechiness'),
            'Acousticness': audio_features.get('acousticness'),
            'Instrumentalness': audio_features.get('instrumentalness'),
            'Liveness': audio_features.get('liveness'),
            'Valence': audio_features.get('valence'),
            'Tempo': audio_features.get('tempo'),
        }

        music_data.append(track_data)

    df = pd.DataFrame(music_data)
    return df
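`sp.playlist_tracks` returns a single page of results, so the function above only sees the first page; the 75-track playlist used here evidently fits in one. For larger playlists, each page dict has a `next` field pointing at the following page, and spotipy's `sp.next(page)` fetches it. Note that the `fields` filter in the function above does not request `next`, so it would have to be added for paging to work. The sketch below, a hypothetical helper `collect_all_items`, follows those links; the fake pages in the usage example stand in for real API responses.

```python
def collect_all_items(first_page, fetch_next):
    """Hypothetical helper: gather 'items' from every page of a paged Spotify response.

    first_page -- a page dict with 'items' and 'next' keys
    fetch_next -- callable taking a page and returning the next one
                  (with spotipy this would be sp.next)
    """
    items = []
    page = first_page
    while page:
        items.extend(page['items'])
        # 'next' is None (or missing) on the last page
        page = fetch_next(page) if page.get('next') else None
    return items


# With spotipy (network call), the fields filter must include 'next':
# first = sp.playlist_tracks(playlist_id, fields='next,items(track(id,name,artists,album(id,name)))')
# items = collect_all_items(first, sp.next)

# Offline usage with fake pages standing in for API responses
fake_pages = {'page-2': {'items': [{'track': {'name': 'C'}}], 'next': None}}
first_page = {'items': [{'track': {'name': 'A'}}, {'track': {'name': 'B'}}], 'next': 'page-2'}
all_items = collect_all_items(first_page, lambda page: fake_pages[page['next']])
print([item['track']['name'] for item in all_items])  # ['A', 'B', 'C']
```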

#### Now we can use this function to fetch the data of a playlist. It takes two arguments: the first is the playlist ID and the second is the access token.

**Playlist ID**: the "37i9dQZF1DXbVhgADFy3im" part of the playlist URL https://open.spotify.com/playlist/37i9dQZF1DXbVhgADFy3im <br>
__Access token__: the token we fetched earlier
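Copying the ID by hand is error-prone because share links often carry a `?si=...` query string. A minimal sketch of pulling the ID out of a playlist URL or a `spotify:playlist:<id>` URI follows; `extract_playlist_id` is a hypothetical helper, not part of Spotipy.

```python
from urllib.parse import urlparse


def extract_playlist_id(playlist_ref: str) -> str:
    """Hypothetical helper: return the playlist ID from a Spotify URL, URI, or bare ID."""
    # URI form: spotify:playlist:<id>
    if playlist_ref.startswith('spotify:playlist:'):
        return playlist_ref.split(':')[-1]
    # URL form: https://open.spotify.com/playlist/<id>?si=...
    path = urlparse(playlist_ref).path  # drops any ?query string
    return path.rstrip('/').split('/')[-1]


print(extract_playlist_id('https://open.spotify.com/playlist/37i9dQZF1DXbVhgADFy3im'))
# 37i9dQZF1DXbVhgADFy3im
```

A bare ID passes through unchanged, since its URL path is the ID itself.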

In [7]:
playlist_id = '37i9dQZF1DXbVhgADFy3im'

# Calling the function to get the music data from the playlist
music_df = get_playlist_data(playlist_id, access_token)
music_df

| | Track Name | Artists | Album Name | Album ID | Track ID | Popularity | Release Date | Duration (ms) | Explicit | External URLs | ... | Energy | Key | Loudness | Mode | Speechiness | Acousticness | Instrumentalness | Liveness | Valence | Tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cheques | Shubh | Still Rollin | 5AivaZj0CiQJoDWqVH2pbh | 4eBvRhTJ2AcxCsbfTUjoRp | 88 | 2023-05-19 | 183757 | False | https://open.spotify.com/track/4eBvRhTJ2AcxCsb... | ... | 0.627 | 4 | -8.939 | 0 | 0.0533 | 0.2620 | 0.000012 | 0.2690 | 0.356 | 89.998 |
| 1 | Mahiye Jinna Sohna | Darshan Raval | Mahiye Jinna Sohna | 4fiPkVR8M247hQBOYLkwBq | 2ncqKdTj6dz7tWoTMMrAtq | 85 | 2023-06-22 | 181250 | False | https://open.spotify.com/track/2ncqKdTj6dz7tWo... | ... | 0.540 | 1 | -5.754 | 1 | 0.0406 | 0.7360 | 0.000032 | 0.1640 | 0.331 | 92.027 |
| 2 | True Stories | AP Dhillon, Shinda Kahlon | True Stories | 7JABHOzxcWGwmihIUq13dl | 28lvraTNIN8qiTpoIK7m8Z | 82 | 2023-06-16 | 117973 | False | https://open.spotify.com/track/28lvraTNIN8qiTp... | ... | 0.658 | 1 | -8.906 | 1 | 0.2680 | 0.2990 | 0.000074 | 0.2090 | 0.443 | 147.936 |
| 3 | Janiye (from the Netflix Film "Chor Nikal Ke B... | Vishal Mishra, Rashmeet Kaur | Janiye (from the Netflix Film "Chor Nikal Ke B... | 0kZKLq2WZQWvXvbxvK6YoC | 0645eBDehHcqfiF15hscQV | 84 | 2023-03-17 | 223390 | False | https://open.spotify.com/track/0645eBDehHcqfiF... | ... | 0.444 | 8 | -11.447 | 0 | 0.0644 | 0.4400 | 0.000000 | 0.1460 | 0.360 | 76.032 |
| 4 | Heeriye (feat. Arijit Singh) | Jasleen Royal, Arijit Singh, Dulquer Salmaan | Heeriye (feat. Arijit Singh) | 1wt2WZBZZ9GhM0AC61l7SS | 5PUXKVVVQ74C3gl5vKy9Li | 88 | 2023-07-25 | 194857 | False | https://open.spotify.com/track/5PUXKVVVQ74C3gl... | ... | 0.494 | 6 | -9.628 | 0 | 0.0256 | 0.5160 | 0.003840 | 0.0759 | 0.558 | 105.007 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70 | Kali Kali Zulfon Ke (Lo-Fi) | Madhur Sharma, Swapnil Tare, Nusrat Fateh Ali ... | Kali Kali Zulfon Ke (Lo-Fi) | 1zLlgNA1tyiabDhDwWhyPG | 3mUkVsovIuscQlSoMhmAZt | 76 | 2023-03-31 | 77253 | False | https://open.spotify.com/track/3mUkVsovIuscQlS... | ... | 0.463 | 7 | -7.075 | 1 | 0.0375 | 0.8160 | 0.000000 | 0.1550 | 0.150 | 152.353 |
| 71 | Yaar Ka Sataya Hua Hai | B Praak, Jaani, Nawazuddin Siddiqui | Yaar Ka Sataya Hua Hai | 2AqA5Q47R0kLjUb3tjX63T | 2Y2l0h051Vk4qUG2ZH7KKy | 75 | 2023-07-03 | 267692 | False | https://open.spotify.com/track/2Y2l0h051Vk4qUG... | ... | 0.734 | 6 | -4.252 | 0 | 0.0449 | 0.2710 | 0.000000 | 0.0788 | 0.523 | 130.014 |
| 72 | Into Your Arms (feat. Ava Max) | Witt Lowry, Ava Max | Into Your Arms (feat. Ava Max) | 0boA7GUGSxOMWwdnnTR0yI | 0b11D9D0hMOYCIMN3OKreM | 75 | 2018-06-08 | 186022 | True | https://open.spotify.com/track/0b11D9D0hMOYCIM... | ... | 0.788 | 3 | -4.113 | 0 | 0.2150 | 0.0134 | 0.000000 | 0.3630 | 0.228 | 170.039 |
| 73 | Moon Rise | Guru Randhawa, Sanjoy | Man Of The Moon | 0jasm0jnhQ6Y6OUYTI1NL6 | 3oWv5qDKYN7MH6FdlglMN5 | 75 | 2022-08-22 | 174057 | False | https://open.spotify.com/track/3oWv5qDKYN7MH6F... | ... | 0.713 | 3 | -6.910 | 0 | 0.1220 | 0.3650 | 0.000153 | 0.2280 | 0.597 | 92.039 |


#### As we can see, the DataFrame is ready: it has 75 rows and 21 columns.

#### Now let's check whether the data has any null values.

In [8]:
music_df.isnull().sum()

Track Name          0
Artists             0
Album Name          0
Album ID            0
Track ID            0
Popularity          0
Release Date        0
Duration (ms)       0
Explicit            0
External URLs       0
Danceability        0
Energy              0
Key                 0
Loudness            0
Mode                0
Speechiness         0
Acousticness        0
Instrumentalness    0
Liveness            0
Valence             0
Tempo               0
dtype: int64
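This playlist has no missing values, but `get_playlist_data` fills in `None` when a track has no audio features, and other playlists may contain such rows. A minimal sketch of dropping them before scaling follows; `drop_tracks_without_features` and `FEATURE_COLUMNS` are hypothetical names. The `reset_index` call matters because `content_based_recommendations` later uses a DataFrame index label as a row position in the scaled feature matrix.

```python
import pandas as pd

# Audio feature columns used later for the recommendations
FEATURE_COLUMNS = ['Danceability', 'Energy', 'Key', 'Loudness', 'Mode', 'Speechiness',
                   'Acousticness', 'Instrumentalness', 'Liveness', 'Valence', 'Tempo']


def drop_tracks_without_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows with missing audio features and renumber the index."""
    # reset_index keeps index labels aligned with row positions after dropping
    return df.dropna(subset=FEATURE_COLUMNS).reset_index(drop=True)


# Toy frame: the middle track has no audio features
toy = pd.DataFrame({'Track Name': ['A', 'B', 'C'],
                    **{col: [0.1, None, 0.3] for col in FEATURE_COLUMNS}})
cleaned = drop_tracks_without_features(toy)
print(cleaned['Track Name'].tolist())  # ['A', 'C']
```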

#### Moving further, let's build a recommendation system.

In [9]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics.pairwise import cosine_similarity
from datetime import datetime

data = music_df

#### When recommending music to users, it is important to surface the latest releases. To do this, we give more weight to recent releases in the recommendations.

Let's write a function to assign a weight to a song.
This function takes the release date of a music track as input and returns a weight calculated as follows:<br>
The weight depends on the time span since release: as the time span increases, the weight decreases.<br>
This is achieved with the formula **1 / (time_span + 1)**, where time_span is the number of days since the release date; adding 1 avoids a divide-by-zero error for songs released today. <br>
So the most recent releases get a higher weight, while older ones get a lower weight.

In [10]:
# Function to calculate weighted popularity score based on release date
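def calculate_weighted_popularity(release_date):
    # Convert the release date to a datetime object.
    # Spotify release dates can be 'YYYY-MM-DD', 'YYYY-MM' or just 'YYYY',
    # depending on the album's release_date_precision.
    for date_format in ('%Y-%m-%d', '%Y-%m', '%Y'):
        try:
            release_date = datetime.strptime(release_date, date_format)
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"Unrecognized release date: {release_date!r}")

    # Calculating the time span between release date and today's date
    time_span = datetime.now() - release_date

    # Calculate the recency weight (more recent releases have a higher weight)
    weight = 1 / (time_span.days + 1)  # Adding 1 to avoid divide by zero error
    return weight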
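To see how quickly this weight decays, the sketch below applies the same 1 / (days + 1) rule with a fixed "today" in place of `datetime.now()`, so the numbers are reproducible. `recency_weight` and the example dates are hypothetical, and only ISO `YYYY-MM-DD` dates are handled.

```python
from datetime import date


def recency_weight(release_date: str, today: date) -> float:
    """Hypothetical variant of the rule above with an explicit 'today'."""
    days = (today - date.fromisoformat(release_date)).days
    return 1 / (days + 1)


today = date(2023, 8, 1)  # hypothetical "today" for the example
for release in ['2023-08-01', '2023-07-31', '2023-07-23', '2023-07-02']:
    print(release, round(recency_weight(release, today), 4))
# 2023-08-01 1.0
# 2023-07-31 0.5
# 2023-07-23 0.1
# 2023-07-02 0.0323
```

A release from 30 days earlier already gets about 1/31 of a same-day release's weight, so after multiplying by a popularity score (0 to 100 on Spotify), the weighted score of anything a few months old is close to zero.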

#### Now let's normalize the music features using Min-Max scaling method.

In [11]:
# Normalize the music features using Min-Max scaling
scaler = MinMaxScaler()
music_features = music_df[['Danceability', 'Energy', 'Key', 'Loudness', 'Mode', 'Speechiness', 'Acousticness', 
                          'Instrumentalness', 'Liveness', 'Valence', 'Tempo']].values
music_features_scaled = scaler.fit_transform(music_features)
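Min-Max scaling maps each feature column to [0, 1] with (x - min) / (max - min). This matters here because the raw columns live on very different scales: Tempo runs from about 76 to 170 BPM in the rows shown above, Loudness is in negative dB, and Key is an integer from 0 to 11, so without scaling those columns would dominate the cosine similarity. The NumPy sketch below reproduces the transform that `MinMaxScaler` (default `feature_range=(0, 1)`) applies when fitted and applied to the same data; `min_max_scale` is a hypothetical name.

```python
import numpy as np


def min_max_scale(X) -> np.ndarray:
    """Hypothetical helper: scale each column to [0, 1] with (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # A constant column has zero range; divide by 1 instead so it maps to 0
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range


# Toy columns: Loudness (dB) and Tempo (BPM)
X = np.array([[-8.9, 90.0],
              [-5.8, 92.0],
              [-4.1, 170.0]])
print(min_max_scale(X).round(3))
# rows: [0, 0], [0.646, 0.025], [1, 1]
```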

#### We will create a hybrid recommendation system for music recommendations.  
#### The 1st approach will recommend music based on its audio features.
#### The 2nd approach will recommend music based on weighted popularity.

Let's write a function to generate music recommendations using the 1st approach, i.e., based on the music's audio features. <br>

To generate the recommendations we will use the following features:<br>
1. 'Danceability'
2. 'Energy'
3. 'Key'
4. 'Loudness'
5. 'Mode'
6. 'Speechiness'
7. 'Acousticness' 
8. 'Instrumentalness'
9. 'Liveness'
10. 'Valence'
11. 'Tempo'

This function takes two arguments: the *song name* and the *number of recommendations* to generate. The function works as follows: <br>

The function calculates the similarity scores between the audio features of the input song and all other songs in the dataset. It uses __Cosine Similarity__, a common measure in content-based filtering, computed with scikit-learn's _cosine_similarity_ function. It then returns the *num_recommendations* songs with the highest scores, excluding the input song itself.


In [12]:
# A function to get content-based recommendations based on the music features
def content_based_recommendations(input_song_name, num_recommendations=5):
    if input_song_name not in music_df['Track Name'].values:
        print(f"'{input_song_name}' not found in the dataset. Please enter a valid song name.")
        return

    # Get the index of the input song in the music DataFrame
    # (assumes a default 0..n-1 index that matches the rows of music_features_scaled)
    input_song_index = music_df[music_df['Track Name'] == input_song_name].index[0]

    # Calculate the similarity scores based on music features (cosine similarity)
    similarity_scores = cosine_similarity([music_features_scaled[input_song_index]], music_features_scaled)

    # Rank songs from most to least similar and drop the input song itself
    # (it is not guaranteed to rank first if another song has identical features)
    ranked_indices = similarity_scores[0].argsort()[::-1]
    similar_song_indices = [i for i in ranked_indices if i != input_song_index][:num_recommendations]

    # Get the details of the most similar songs based on content-based filtering
    recommendations = music_df.iloc[similar_song_indices][['Track Name', 'Artists', 'Album Name', 'Release Date', 'Popularity']]

    return recommendations
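Cosine similarity compares the direction of two feature vectors: cos(a, b) = a·b / (|a| |b|), which is 1 for vectors pointing the same way and 0 for orthogonal ones. The NumPy sketch below computes it for one query row against all rows and returns the top k, excluding the query itself, which is the same ranking step the function above performs with scikit-learn. `top_k_similar` and the toy vectors are hypothetical.

```python
import numpy as np


def top_k_similar(features: np.ndarray, query_index: int, k: int) -> list:
    """Hypothetical helper: rank rows by cosine similarity to one row, excluding that row."""
    query = features[query_index]
    dots = features @ query
    norms = np.linalg.norm(features, axis=1) * np.linalg.norm(query)
    # cos(a, b) = a.b / (|a| |b|); a zero vector gets similarity 0 instead of NaN
    sims = np.divide(dots, norms, out=np.zeros_like(dots, dtype=float), where=norms != 0)
    ranked = np.argsort(sims)[::-1]  # most similar first
    return [int(i) for i in ranked if i != query_index][:k]


features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0],
                     [0.9, 0.1]])
print(top_k_similar(features, query_index=0, k=2))  # [3, 2]
```

Row 3 (similarity about 0.994) outranks row 2 (about 0.707), and row 1 is orthogonal to the query, so it is left out when k is 2.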

#### Now, let's write a function to generate recommendations based on weighted popularity and combine it with the recommendations of the content-based filtering method using the hybrid approach

In [13]:
# A function to get hybrid recommendations based on weighted popularity
def hybrid_recommendations(input_song_name, num_recommendations=5, alpha=0.5):
    # Note: alpha is accepted but not used by this implementation
    if input_song_name not in music_df['Track Name'].values:
        print(f"'{input_song_name}' not found in the dataset. Please enter a valid song name.")
        return

    # Get content-based recommendations
    content_based_rec = content_based_recommendations(input_song_name, num_recommendations)

    # Look up the input song's row once
    song_info = music_df.loc[music_df['Track Name'] == input_song_name].iloc[0]

    # Calculate the weighted popularity score of the input song
    weighted_popularity_score = song_info['Popularity'] * calculate_weighted_popularity(song_info['Release Date'])

    # Combine content-based recommendations with the input song's data.
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
    input_song_row = pd.DataFrame([{
        'Track Name': input_song_name,
        'Artists': song_info['Artists'],
        'Album Name': song_info['Album Name'],
        'Release Date': song_info['Release Date'],
        'Popularity': weighted_popularity_score
    }])
    hybrid_recommendations = pd.concat([content_based_rec, input_song_row], ignore_index=True)

    # Sort by the Popularity column (weighted score for the input song, raw popularity for the rest)
    hybrid_recommendations = hybrid_recommendations.sort_values(by='Popularity', ascending=False)

    # Remove the input song from the recommendations
    hybrid_recommendations = hybrid_recommendations[hybrid_recommendations['Track Name'] != input_song_name]

    return hybrid_recommendations

The hybrid approach aims to provide more personalized and relevant recommendations by considering both the content similarity of songs and their weighted popularity. The function takes input_song_name as the input, representing the name of the song for which recommendations are to be generated. The function first calls the content_based_recommendations function to get content-based recommendations for the input song. The num_recommendations parameter determines the number of content-based recommendations to be retrieved.

The function retrieves the popularity score of the input song from the music_df DataFrame and calculates its weighted popularity score with the calculate_weighted_popularity function (defined earlier), based on the song's release date. The alpha parameter is meant to control the relative importance of content-based and popularity-based recommendations, but the current implementation does not use it.

The content-based recommendations obtained earlier are stored in the content_based_rec DataFrame. The function combines them with the input song's information (track name, artists, album name, and release date), storing the input song's weighted popularity score in the Popularity column. This creates a DataFrame named hybrid_recommendations that includes both the content-based recommendations and the input song's data.

The hybrid_recommendations DataFrame is then sorted in descending order of the Popularity column. Only the input song's row holds a weighted score; the recommended songs keep their raw popularity values. The input song is then removed to avoid recommending the same song, so the final list is the content-based recommendations ordered from most to least popular.
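In the function above, `alpha` is accepted but never used in the computation. One possible way to make it do what its description says is a linear blend of the two signals, `alpha * similarity + (1 - alpha) * popularity_scaled`, with popularity rescaled to [0, 1] first so both terms share a range. The sketch below is a hypothetical variant, not the notebook's method: `blend_scores` is an illustrative name and the numbers are made up.

```python
import numpy as np


def blend_scores(similarity, weighted_popularity, alpha=0.5):
    """Hypothetical hybrid score: alpha * similarity + (1 - alpha) * scaled popularity."""
    similarity = np.asarray(similarity, dtype=float)
    popularity = np.asarray(weighted_popularity, dtype=float)
    # Rescale popularity to [0, 1] so both terms share a range
    spread = popularity.max() - popularity.min()
    if spread:
        popularity_scaled = (popularity - popularity.min()) / spread
    else:
        popularity_scaled = np.zeros_like(popularity)
    return alpha * similarity + (1 - alpha) * popularity_scaled


sims = [0.98, 0.95, 0.90]  # made-up cosine similarities of three candidates
pops = [10.0, 30.0, 50.0]  # made-up weighted popularity scores
print(blend_scores(sims, pops, alpha=1.0))  # similarity only
print(blend_scores(sims, pops, alpha=0.0))  # popularity only
print(blend_scores(sims, pops, alpha=0.5))  # equal mix
```

With `alpha=1.0` the scores equal the similarities and the first candidate ranks highest; with `alpha=0.0` they become `[0, 0.5, 1]` and the third candidate ranks highest; `alpha=0.5` gives `[0.49, 0.725, 0.95]`.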

#### Now, let's test our recommendation system.

In [14]:
# Testing

input_song = 'True Stories'
recommendations = hybrid_recommendations(input_song, num_recommendations=4)

print(f"Hybrid recommended songs for '{input_song}':")
print(recommendations)

Hybrid recommended songs for 'True Stories':

| | Track Name | Artists | Album Name | Release Date | Popularity |
|---|---|---|---|---|---|
| 3 | Unholy (feat. Kim Petras) | Sam Smith, Kim Petras | Unholy (feat. Kim Petras) | 2022-09-22 | 87.0 |
| 2 | Hukum - Thalaivar Alappara (From "Jailer") | Anirudh Ravichander, Super Subu | Hukum - Thalaivar Alappara (From "Jailer") | 2023-07-17 | 85.0 |
| 0 | Baller | Shubh, Ikky | Baller | 2022-09-09 | 81.0 |
| 1 | Chorni | DIVINE, Sidhu Moose Wala | Chorni | 2023-07-07 | 79.0 |




### Credits:
[Aman Kharwal's Music Recommendation System using Python](https://thecleverprogrammer.com/2023/07/31/music-recommendation-system-using-python/)