In [17]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 20)

In [18]:
client_id = "09e4eae8c15f432f81aaad305c1a3907"
client_secret = "03e6751957224d94b8fb609f5b6c9438"
uri = "http://localhost/"

import spotipy
from spotipy.oauth2 import SpotifyOAuth

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id=client_id,
                                               client_secret=client_secret,
                                               redirect_uri=uri,
                                               scope="user-library-read"))

results = sp.current_user_saved_tracks()


Using `localhost` as redirect URI without a port. Specify a port (e.g. `localhost:8080`) to allow automatic retrieval of authentication code instead of having to copy and paste the URL your browser is redirected to.


Enter the URL you were redirected to: й


SpotifyOauthError: error: invalid_request, error_description: code must be supplied
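
The warning above comes from the redirect URI having no port, and the error follows because the pasted text contained no authorization code. The following is a minimal sketch of one way to avoid both, and to keep the credentials out of the notebook. It assumes the credentials are exported as the environment variables `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET`, and that `http://localhost:8080/` is registered as a redirect URI for the app. The port and the helper `build_auth_kwargs` are assumptions for illustration and are not part of spotipy.

```python
import os

def build_auth_kwargs(env=os.environ, port=8080):
    """Collect SpotifyOAuth keyword arguments without hard-coding secrets.

    Hypothetical helper: the port and variable names are assumptions.
    """
    required = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "client_id": env["SPOTIPY_CLIENT_ID"],
        "client_secret": env["SPOTIPY_CLIENT_SECRET"],
        # An explicit port lets spotipy pick up the authorization code itself
        "redirect_uri": f"http://localhost:{port}/",
        "scope": "user-library-read",
    }

# Usage (needs spotipy installed and the variables set):
# import spotipy
# from spotipy.oauth2 import SpotifyOAuth
# sp = spotipy.Spotify(auth_manager=SpotifyOAuth(**build_auth_kwargs()))
```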

# My own tailored playlist recommendation in Spotify

*Author: Antonina Anastasova @aanastasova*

---

## Table of Contents

- [Project Overview](#ch1)
    - [Introduction]()
    - [Building a content-based music recommendation system](#ch12)
    - [Content-based filtering](#ch13)
    - [Motivation](#ch13) # better user experience
    
- [Exploratory Data Analysis (EDA)](#ch2) # better to do more EDA, lecturer likes plots and stuff
    - [Data Familiarization](#ch21)
    - [Distribution Analysis](#ch22)
    - [Feature Correlation](#ch23)
    - [Cold Start Analysis?](#ch24)

- [Content-based filtering and Dynamic Segmentation](#ch3)
    - [Exploration of content-based filtering](#ch31)
    - [Dynamic segmentation - personal touch into my playlist](#ch32)
   
- [User Profile](#ch4)
    - [Explore Spotify data using the Spotify API: listening history, playlists, etc.](#ch41)
    - [Uncover patterns and user interactions](#ch42) # musical identity

- [Feature Engineering and Vector Extraction](#ch5)
    - [Extract audio features, metadata](#ch51)
    - [Create feature vectors for each track](#ch52)

- [Extract recommendations using Cosine Similarity Calculation]()
    - [Apply cosine similarity between personal songs data and data for recommendation]()
    - [Adjust similarity scores to cater to user segments]()
    - [Identify user-specific preferences]()

- [Playlist creation]()
    - [Create playlists using the Spotify Web API]()
    - [Add recommended playlists]()

- [Evaluation and enhancement]() # look into this

- [Conclusion]()
    
 
- [Bibliography](#bibi)

---

## Project Overview

### Introduction

This project aims to create a content-based music recommendation system. The main goal is to offer personalized music suggestions that closely match individual preferences. The motivation is to improve user satisfaction with recommendations that truly resonate.

An important step is a detailed Exploratory Data Analysis (EDA) of the dataset. This means examining how the data is distributed, how features relate to each other, and what patterns appear in how users interact with music, using both plots and analytical methods.

After that, we focus on two key things: content-based filtering and dynamic segmentation. Content-based filtering means using various characteristics of songs to find similarities between them. Dynamic segmentation, on the other hand, adds a personal touch to creating playlists that align with individual music preferences.

To better define user segments, we build my user profile with the Spotify API. This involves looking at my listening history, my own playlists, and my liked tracks. The idea is to gain insight into my personal music style, preferences, and listening patterns.

To build a recommendation playlist with content-based filtering, we create feature vectors: numeric representations that capture different aspects of each song. These vectors let us compare songs and find similarities between them.
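
As an illustration of what such a feature vector could look like, here is a minimal sketch that min-max scales a few audio columns so that no feature dominates purely because of its unit (for example `tempo` in BPM versus `danceability` in [0, 1]). The choice of columns, the scaling method, and the helper name `to_feature_vectors` are assumptions, not the final design of this project. The three demo rows are taken from the first rows of `tracks.csv`.

```python
import pandas as pd

# Assumed subset of audio features; the final project may use more columns
AUDIO_FEATURES = ["danceability", "energy", "loudness", "tempo", "valence"]

def to_feature_vectors(df, columns=AUDIO_FEATURES):
    """Min-max scale each column to [0, 1]; one row is one track's vector."""
    sub = df[columns].astype(float)
    span = (sub.max() - sub.min()).replace(0, 1.0)  # guard against constant columns
    return (sub - sub.min()) / span

demo = pd.DataFrame({
    "danceability": [0.645, 0.695, 0.434],
    "energy": [0.445, 0.263, 0.177],
    "loudness": [-13.338, -22.136, -21.18],
    "tempo": [104.851, 102.009, 130.418],
    "valence": [0.127, 0.655, 0.457],
})

vectors = to_feature_vectors(demo)
print(vectors.round(3))
```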

The next step uses cosine similarity to measure how similar my favorite songs are to the candidate songs for recommendation. We then adjust these scores based on the user segments we have identified.
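
Cosine similarity between two vectors is their dot product divided by the product of their lengths, so it compares direction rather than magnitude. Below is a minimal NumPy sketch, assuming each row is one track's feature vector. The toy vectors and the ranking by mean similarity to the liked tracks are assumptions for illustration; the segment-based adjustment is not shown.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Return a matrix whose [i, j] entry is the cosine similarity of a[i] and b[j]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_norm = np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = np.linalg.norm(b, axis=1, keepdims=True)
    a_norm[a_norm == 0] = 1.0  # all-zero vectors get similarity 0 instead of NaN
    b_norm[b_norm == 0] = 1.0
    return (a / a_norm) @ (b / b_norm).T

# Toy vectors: two liked tracks, three candidate tracks
liked = np.array([[1.0, 0.0], [0.0, 1.0]])
candidates = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]])

scores = cosine_similarity_matrix(liked, candidates)

# Rank candidates by their average similarity to all liked tracks
ranking = scores.mean(axis=0).argsort()[::-1]
```

Here the middle candidate points between both liked tracks, so it ranks first even though it is not identical to either of them.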

The final result of this project is a set of playlists. Using the Spotify Web API, we create personalized playlists filled with recommended songs that match my musical taste.

We can then evaluate and refine the recommendations by testing how well the suggested tracks align with my preferences. User feedback is also part of this process.

In essence, this project encompasses a blend of technology, analysis, and musical passion, all aimed at crafting an outstanding music recommendation experience. 

## Plan for project:

Sources:

https://towardsdatascience.com/part-iii-building-a-song-recommendation-system-with-spotify-cf76b52705e7 - use the recommendation system flow 

15-08

- Write intro
- See data
- Have a detailed plan for analysis and steps of project table of content

Maybe use this data ->
https://www.kaggle.com/datasets/yamaerenay/spotify-dataset-19212020-600k-tracks

Notes:

- Cannot do CF (collaborative filtering) as I can't access other users' data due to privacy and ethics compliance
- Find more sources for content-based filtering - PUT HERE
 https://www.kaggle.com/code/prathamsharma123/spotify-eda-recommendation-system
- Metric to evaluate model
- Vector 


16-08

- EDA and Clustering (why do we do that, to determine important features in order to build recommendation system)
- Do one homework and quiz
- EDA of data


## EDA

- Feature Correlation: use Yellowbrick, which gives us a target variable to correlate the features against

Metadata for the df - https://www.kaggle.com/code/prathamsharma123/spotify-eda-recommendation-system/notebook

In [6]:
## check data

#### Tracks data

In [7]:
tracks = pd.read_csv("tracks.csv")

In [8]:
tracks

Unnamed: 0,id,name,popularity,duration_ms,explicit,artists,id_artists,release_date,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
0,35iwgR4jXetI318WEWsa1Q,Carve,6,126903,0,['Uli'],['45tIt06XoI0Iio4LBEVpls'],1922-02-22,0.645,0.445,0,-13.338,1,0.4510,0.674,0.744000,0.151,0.127,104.851,3
1,021ht4sdgPcrDgSk7JTbKY,Capítulo 2.16 - Banquero Anarquista,0,98200,0,['Fernando Pessoa'],['14jtPCOoNZwquk5wd9DxrY'],1922-06-01,0.695,0.263,0,-22.136,1,0.9570,0.797,0.000000,0.148,0.655,102.009,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
586670,45XJsGpFTyzbzeWK8VzR8S,A Day At A Time,58,142003,0,"['Gentle Bones', 'Clara Benin']","['4jGPdu95icCKVF31CcFKbS', '5ebPSE9YI5aLeZ1Z2g...",2021-03-05,0.696,0.615,10,-6.212,1,0.0345,0.206,0.000003,0.305,0.438,90.029,4
586671,5Ocn6dZ3BJFPWh4ylwFXtn,Mar de Emociones,38,214360,0,['Afrosound'],['0i4Qda0k4nf7jnNHmSNpYv'],2015-07-01,0.686,0.723,6,-7.067,1,0.0363,0.105,0.000000,0.264,0.975,112.204,4


In [9]:
tracks.describe()

Unnamed: 0,popularity,duration_ms,explicit,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
count,586672.000000,5.866720e+05,586672.000000,586672.000000,586672.000000,586672.000000,586672.000000,586672.000000,586672.000000,586672.000000,586672.000000,586672.000000,586672.000000,586672.000000,586672.000000
mean,27.570053,2.300512e+05,0.044086,0.563594,0.542036,5.221603,-10.206067,0.658797,0.104864,0.449863,0.113451,0.213935,0.552292,118.464857,3.873382
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75%,41.000000,2.638670e+05,0.000000,0.686000,0.748000,8.000000,-6.482000,1.000000,0.076300,0.785000,0.009550,0.278000,0.769000,136.321000,4.000000
max,100.000000,5.621218e+06,1.000000,0.991000,1.000000,11.000000,5.376000,1.000000,0.971000,0.996000,1.000000,1.000000,1.000000,246.381000,5.000000


In [38]:
tracks1 = tracks[1:20]

In [39]:
tracks1

Unnamed: 0,id,name,popularity,duration_ms,explicit,artists,id_artists,release_date,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
1,021ht4sdgPcrDgSk7JTbKY,Capítulo 2.16 - Banquero Anarquista,0,98200,0,['Fernando Pessoa'],['14jtPCOoNZwquk5wd9DxrY'],1922-06-01,0.695,0.263,0,-22.136,1,0.957,0.797,0.0,0.148,0.655,102.009,1
2,07A5yehtSnoedViJAZkNnc,Vivo para Quererte - Remasterizado,0,181640,0,['Ignacio Corsini'],['5LiOoJbxVSAMkBS2fUm3X2'],1922-03-21,0.434,0.177,1,-21.18,1,0.0512,0.994,0.0218,0.212,0.457,130.418,5
3,08FmqUhxtyLTn6pAh6bk45,El Prisionero - Remasterizado,0,176907,0,['Ignacio Corsini'],['5LiOoJbxVSAMkBS2fUm3X2'],1922-03-21,0.321,0.0946,7,-27.961,1,0.0504,0.995,0.918,0.104,0.397,169.98,3
4,08y9GfoqCWfOGsKdwojr5e,Lady of the Evening,0,163080,0,['Dick Haymes'],['3BiJGZsyX9sJchTqcSA7Su'],1922,0.402,0.158,3,-16.9,0,0.039,0.989,0.13,0.311,0.196,103.22,4
5,0BRXJHRNGQ3W4v9frnSfhu,Ave Maria,0,178933,0,['Dick Haymes'],['3BiJGZsyX9sJchTqcSA7Su'],1922,0.227,0.261,5,-12.343,1,0.0382,0.994,0.247,0.0977,0.0539,118.891,4
6,0Dd9ImXtAtGwsmsAD69KZT,La Butte Rouge,0,134467,0,['Francis Marty'],['2nuMRGzeJ5jJEKlfS7rZ0W'],1922,0.51,0.355,4,-12.833,1,0.124,0.965,0.0,0.155,0.727,85.754,5
7,0IA0Hju8CAgYfV1hwhidBH,La Java,0,161427,0,['Mistinguett'],['4AxgXfD7ISvJSTObqm4aIE'],1922,0.563,0.184,4,-13.757,1,0.0512,0.993,1.6e-05,0.325,0.654,133.088,3
8,0IgI1UCz84pYeVetnl1lGP,Old Fashioned Girl,0,310073,0,['Greg Fieler'],['5nWlsH5RDgFuRAiDeOFVmf'],1922,0.488,0.475,0,-16.222,0,0.0399,0.62,0.00645,0.107,0.544,139.952,4
9,0JV4iqw2lSKJaHBQZ0e5zK,Martín Fierro - Remasterizado,0,181173,0,['Ignacio Corsini'],['5LiOoJbxVSAMkBS2fUm3X2'],1922-03-29,0.548,0.0391,6,-23.228,1,0.153,0.996,0.933,0.148,0.612,75.595,3
10,0OYGe21oScKJfanLyM7daU,Capítulo 2.8 - Banquero Anarquista,0,99100,0,['Fernando Pessoa'],['14jtPCOoNZwquk5wd9DxrY'],1922-06-01,0.676,0.235,11,-22.447,0,0.96,0.794,0.0,0.21,0.724,96.777,3


In [40]:
tracks.dtypes

id                   object
name                 object
popularity            int64
duration_ms           int64
explicit              int64
artists              object
id_artists           object
release_date         object
danceability        float64
energy              float64
key                   int64
loudness            float64
mode                  int64
speechiness         float64
acousticness        float64
instrumentalness    float64
liveness            float64
valence             float64
tempo               float64
time_signature        int64
dtype: object

In [41]:
print(tracks1.info())

TypeError: Cannot interpret '<attribute 'dtype' of 'numpy.generic' objects>' as a data type

In [20]:
# Count missing values per column
missing_values = tracks.isna().sum()

# Display only the columns that have missing values
print("Columns with missing values:")
print(missing_values[missing_values > 0])

missing_values

Columns with missing values:
name    71
dtype: int64


id                   0
name                71
popularity           0
duration_ms          0
explicit             0
artists              0
id_artists           0
release_date         0
danceability         0
energy               0
key                  0
loudness             0
mode                 0
speechiness          0
acousticness         0
instrumentalness     0
liveness             0
valence              0
tempo                0
time_signature       0
dtype: int64

In [36]:
rows_with_missing_values = tracks[tracks.isna().any(axis=1)]
rows_with_missing_values

Unnamed: 0,id,name,popularity,duration_ms,explicit,artists,id_artists,release_date,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
226336,4iH7negBYMfj2z0wDNmgdx,,28,264973,0,[''],['0LyfQWJT6nXafLPZqxe9Of'],1994-01-01,0.512,0.578,0,-12.280,0,0.0299,0.0433,0.000064,0.5160,0.692,156.465,1
510975,04d5kbLvSAIBt3pGcljdhC,,0,184293,0,[''],['0LyfQWJT6nXafLPZqxe9Of'],1922-04-01,0.426,0.285,11,-11.970,1,0.0466,0.9950,0.264000,0.2930,0.583,135.661,4
510976,05tRkgyxVdwMePGqOXMDYU,,0,191587,0,[''],['0LyfQWJT6nXafLPZqxe9Of'],1922-04-01,0.344,0.186,0,-13.495,1,0.0745,0.9950,0.000000,0.1150,0.290,79.591,1
510978,0YAMRgAQH6tkTh4sWNXr8L,,0,191573,0,[''],['0LyfQWJT6nXafLPZqxe9Of'],1922-04-01,0.316,0.257,3,-13.611,0,0.0549,0.9950,0.769000,0.5190,0.529,68.682,3
510979,1K6MQQxmFpPb66ZnaiIpHX,,0,167602,0,[''],['0LyfQWJT6nXafLPZqxe9Of'],1922-04-01,0.558,0.283,1,-12.847,1,0.0646,0.9960,0.000000,0.4530,0.608,70.379,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
517206,6OH9mz9aFbGlbf74cBwYWD,,2,209760,0,[''],['0LyfQWJT6nXafLPZqxe9Of'],1962-02-01,0.506,0.598,7,-4.672,0,0.0635,0.7710,0.000001,0.6910,0.800,91.172,4
517215,15RqFDA86slfzujSQMEX4i,,2,257280,0,[''],['0LyfQWJT6nXafLPZqxe9Of'],1962-02-01,0.612,0.615,5,-5.609,1,0.0551,0.8540,0.000023,0.0541,0.809,90.536,4
520127,0hKA9A2JPtFdg0fiMhyjQD,,6,194081,0,[''],['0LyfQWJT6nXafLPZqxe9Of'],1974-12-31,0.471,0.369,4,-12.927,0,0.1460,0.9680,0.001100,0.1410,0.766,94.063,4
525238,1kR4gIb7nGxHPI3D2ifs59,,26,289440,0,[''],['0LyfQWJT6nXafLPZqxe9Of'],1998-01-05,0.501,0.583,7,-9.460,0,0.0605,0.6900,0.003960,0.0747,0.734,138.391,4


In [42]:
tracks["release_date"] = pd.to_datetime(tracks["release_date"], format='%Y-%m-%d')
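
Some `release_date` values in the sample above contain only a year (for example `1922`). Depending on the pandas version, a strict `format='%Y-%m-%d'` may reject such values. The sketch below pads partial dates before parsing. The helper name `parse_release_date` and the choice to pad with the first month and day are assumptions for illustration.

```python
import pandas as pd

def parse_release_date(series):
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' strings, padding missing parts with 01."""
    s = series.astype(str)
    s = s.where(s.str.len() != 4, s + "-01-01")  # year only
    s = s.where(s.str.len() != 7, s + "-01")     # year and month
    return pd.to_datetime(s, format="%Y-%m-%d")

demo_dates = pd.Series(["1922", "1922-03", "1922-02-22"])
parsed = parse_release_date(demo_dates)
```

Usage on the full frame would be `tracks["release_date"] = parse_release_date(tracks["release_date"])`.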

In [44]:
tracks.head(4)

Unnamed: 0,id,name,popularity,duration_ms,explicit,artists,id_artists,release_date,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
0,35iwgR4jXetI318WEWsa1Q,Carve,6,126903,0,['Uli'],['45tIt06XoI0Iio4LBEVpls'],1922-02-22,0.645,0.445,0,-13.338,1,0.451,0.674,0.744,0.151,0.127,104.851,3
1,021ht4sdgPcrDgSk7JTbKY,Capítulo 2.16 - Banquero Anarquista,0,98200,0,['Fernando Pessoa'],['14jtPCOoNZwquk5wd9DxrY'],1922-06-01,0.695,0.263,0,-22.136,1,0.957,0.797,0.0,0.148,0.655,102.009,1
2,07A5yehtSnoedViJAZkNnc,Vivo para Quererte - Remasterizado,0,181640,0,['Ignacio Corsini'],['5LiOoJbxVSAMkBS2fUm3X2'],1922-03-21,0.434,0.177,1,-21.18,1,0.0512,0.994,0.0218,0.212,0.457,130.418,5
3,08FmqUhxtyLTn6pAh6bk45,El Prisionero - Remasterizado,0,176907,0,['Ignacio Corsini'],['5LiOoJbxVSAMkBS2fUm3X2'],1922-03-21,0.321,0.0946,7,-27.961,1,0.0504,0.995,0.918,0.104,0.397,169.98,3


In [43]:
tracks.dtypes

id                          object
name                        object
popularity                   int64
duration_ms                  int64
explicit                     int64
artists                     object
id_artists                  object
release_date        datetime64[ns]
danceability               float64
energy                     float64
key                          int64
loudness                   float64
mode                         int64
speechiness                float64
acousticness               float64
instrumentalness           float64
liveness                   float64
valence                    float64
tempo                      float64
time_signature               int64
dtype: object

In [None]:
import numpy as np
from yellowbrick.target import FeatureCorrelation

# release_date is a datetime column after the conversion above,
# so use the release year as a numeric feature instead
tracks["release_year"] = tracks["release_date"].dt.year

features = ["release_year", "danceability", "energy", "key", "loudness", "mode",
            "speechiness", "acousticness", "instrumentalness",
            "liveness", "valence", "tempo"]

# Feature matrix and target variable
X, y = tracks[features], tracks["popularity"]

# Instantiate the visualizer with the feature names as labels
visualizer = FeatureCorrelation(labels=np.array(features))

visualizer.fit(X, y)        # Fit the data to the visualizer
visualizer.show()
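
As a numeric cross-check of the plot, the same feature-to-target Pearson correlations can be computed directly with pandas. This is a minimal sketch on made-up demo values, chosen to be perfectly correlated so the result is easy to verify by eye. The helper name `feature_target_correlation` is an assumption.

```python
import pandas as pd

def feature_target_correlation(df, features, target):
    """Pearson correlation of each feature column with the target, highest first."""
    return df[features].corrwith(df[target]).sort_values(ascending=False)

# Made-up values for illustration only
demo = pd.DataFrame({
    "energy": [0.1, 0.2, 0.3, 0.4],
    "acousticness": [0.9, 0.7, 0.5, 0.3],
    "popularity": [10, 20, 30, 40],
})

corr = feature_target_correlation(demo, ["energy", "acousticness"], "popularity")
print(corr)
```

On the real data the call would be `feature_target_correlation(tracks, features, "popularity")`.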

### Spotify API: retrieving my Spotify data

In [7]:
help(sp)

Help on Spotify in module spotipy.client object:

class Spotify(builtins.object)
 |  Spotify(auth=None, requests_session=True, client_credentials_manager=None, oauth_manager=None, auth_manager=None, proxies=None, requests_timeout=5, status_forcelist=None, retries=3, status_retries=3, backoff_factor=0.3, language=None)
 |  
 |  Example usage::
 |  
 |      import spotipy
 |  
 |      urn = 'spotify:artist:3jOstUTkEu2JkjvRdBA5Gu'
 |      sp = spotipy.Spotify()
 |  
 |      artist = sp.artist(urn)
 |      print(artist)
 |  
 |      user = sp.user('plamere')
 |      print(user)
 |  
 |  Methods defined here:
 |  
 |  __del__(self)
 |      Make sure the connection (pool) gets closed
 |  
 |  __init__(self, auth=None, requests_session=True, client_credentials_manager=None, oauth_manager=None, auth_manager=None, proxies=None, requests_timeout=5, status_forcelist=None, retries=3, status_retries=3, backoff_factor=0.3, language=None)
 |      Creates a Spotify API client.
 |      
 |      :param 

In [3]:

for idx, item in enumerate(results['items']):
    track = item['track']
    print(idx, track['artists'][0]['name'], " – ", track['name'])

0 dad sports  –  dog cuddles
1 The Marías  –  Care For You
2 Amy Winehouse  –  What Is It About Men
3 BEACHPEOPLE  –  tonight
4 Milmine  –  Emerald Bay
5 RUBII  –  Jammin
6 SAULT  –  Wildfires
7 Buddy  –  Trouble On Central
8 niquo  –  Mr. Moon
9 The Butlers  –  No Good Nina
10 Grigovor  –  Довиждане
11 Martha Mudtoter  –  Pessimistic
12 Glass Animals  –  Youth
13 swim good now  –  Since U Asked (feat. Merival)
14 Chris IDH  –  Kukos
15 Frizzy P & Mr Cole  –  Blue
16 CoryaYo  –  1995
17 extremely bad man  –  Stay, Pt. 2
18 Young Bull  –  Egyptian Joint (All I Need)
19 Dirty Doering  –  Casino Aquatique
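
The listing above stops at 20 tracks because `current_user_saved_tracks` returns a single page per call. Below is a minimal sketch of collecting the whole library by paging with `limit` and `offset`. The helper name `fetch_all_saved_tracks`, the page size of 50, and the choice of fields are assumptions. Any callable that accepts `limit` and `offset` works, which also makes the helper easy to try without network access.

```python
def fetch_all_saved_tracks(fetch_page, page_size=50):
    """Page through saved tracks until an empty page is returned."""
    rows, offset = [], 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        items = page.get("items", [])
        if not items:
            break
        for item in items:
            track = item["track"]
            rows.append({
                "id": track["id"],
                "artist": track["artists"][0]["name"],  # first listed artist only
                "name": track["name"],
            })
        offset += len(items)
    return rows

# Usage with the authenticated client from the first cell:
# all_tracks = fetch_all_saved_tracks(sp.current_user_saved_tracks)
# my_tracks = pd.DataFrame(all_tracks)
```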
