# Lab | Unsupervised learning intro

### Instructions

--------------------------------------------------------------------------------------------------
It's time to cluster the songs you collected. Remember that the ultimate goal of this project is to improve artist recommendations. Clustering the songs lets the recommendation system restrict its suggestions to songs in the same cluster, that is, songs with similar audio features.

The experiments you did with the Spotify API and the Billboard web scraping will allow you to create a pipeline such that when the user enters a song, you:

1. Check whether or not the song is in the Billboard Hot 200.
2. Collect the audio features from the Spotify API.

After that, you want to send the Spotify audio features of the submitted song to the clustering model, which should return a cluster number.
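
The last step, sending a song's audio features to a clustering model and getting back a cluster number, could look roughly like the sketch below. Several things here are assumptions and are not fixed by the lab: the use of `KMeans`, the choice of `n_clusters=4`, the subset of columns in `FEATURE_COLUMNS`, and the helper names `fit_cluster_model` and `predict_cluster`. Synthetic data stands in for the real feature table so the sketch runs without the CSV or the API.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed subset of the Spotify audio-feature columns used for clustering.
FEATURE_COLUMNS = [
    "danceability", "energy", "loudness", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo",
]


def fit_cluster_model(features: pd.DataFrame, n_clusters: int = 4):
    """Scale the audio features, then fit KMeans on the scaled values."""
    model = make_pipeline(
        StandardScaler(),
        KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
    )
    model.fit(features[FEATURE_COLUMNS])
    return model


def predict_cluster(model, song_features: dict) -> int:
    """Return the cluster number for one song given as a feature dict."""
    row = pd.DataFrame([song_features])[FEATURE_COLUMNS]
    return int(model.predict(row)[0])


# Synthetic stand-in for the real feature table (hypothetical data).
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.random((200, len(FEATURE_COLUMNS))), columns=FEATURE_COLUMNS
)

model = fit_cluster_model(demo)
cluster = predict_cluster(model, demo.iloc[0].to_dict())
print(cluster)
```

Scaling before clustering matters here because `tempo` and `loudness` live on much larger ranges than the 0 to 1 features and would otherwise dominate the distance computation.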

We want to have as many songs as possible to create the clustering model, so we will add the songs you collected to a bigger dataset available on Kaggle containing 160 thousand songs.
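
Adding the collected songs to the larger dataset could be sketched as follows. It assumes both tables share an `id` column and that only the columns present in both are kept; the function name `combine_song_tables` and the tiny example frames are hypothetical.

```python
import pandas as pd


def combine_song_tables(own: pd.DataFrame, kaggle: pd.DataFrame) -> pd.DataFrame:
    """Stack two song tables on their shared columns and drop repeated track IDs."""
    shared = [column for column in own.columns if column in kaggle.columns]
    combined = pd.concat([own[shared], kaggle[shared]], ignore_index=True)
    # Keep the first occurrence of each track ID (assumes an "id" column exists).
    return combined.drop_duplicates(subset="id").reset_index(drop=True)


# Hypothetical miniature versions of the two tables.
own_songs = pd.DataFrame(
    {"id": ["a", "b"], "energy": [0.70, 0.74], "tempo": [94.9, 96.9]}
)
kaggle_songs = pd.DataFrame(
    {"id": ["b", "c"], "energy": [0.74, 0.50], "tempo": [96.9, 120.0], "year": [2019, 1985]}
)

all_songs = combine_song_tables(own_songs, kaggle_songs)
print(all_songs)
```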

--------------------------------------------------------------------------------------------------------

In [13]:
import numpy as np
import pandas as pd
from sklearn import datasets

spotify_track_features = pd.read_csv(r"C:\Users\mafal\Documents\ironhack\labs\lab-unsupervised-learning-intro\spotify_track_features.csv")
spotify_track_features

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.695,0.700,1,-1.587,0,0.0332,0.104000,0.000000,0.192,0.461,94.959,audio_features,2ikmBwZKZr0ahGcX4x8qtj,spotify:track:2ikmBwZKZr0ahGcX4x8qtj,https://api.spotify.com/v1/tracks/2ikmBwZKZr0a...,https://api.spotify.com/v1/audio-analysis/2ikm...,183296,4
1,0.593,0.741,4,-4.353,0,0.0359,0.022100,0.000000,0.393,0.460,96.978,audio_features,2MxErftY5S07dFtIdxQOSF,spotify:track:2MxErftY5S07dFtIdxQOSF,https://api.spotify.com/v1/tracks/2MxErftY5S07...,https://api.spotify.com/v1/audio-analysis/2MxE...,220670,4
2,0.660,0.765,2,-6.217,1,0.0299,0.125000,0.000956,0.235,0.681,123.051,audio_features,19meO0ADnoTjRuBMXZCdbs,spotify:track:19meO0ADnoTjRuBMXZCdbs,https://api.spotify.com/v1/tracks/19meO0ADnoTj...,https://api.spotify.com/v1/audio-analysis/19me...,175333,4
3,0.577,0.891,0,-4.672,1,0.0359,0.001230,0.000000,0.114,0.846,144.989,audio_features,3OPyobYAM5MgTm35AJV99O,spotify:track:3OPyobYAM5MgTm35AJV99O,https://api.spotify.com/v1/tracks/3OPyobYAM5Mg...,https://api.spotify.com/v1/audio-analysis/3OPy...,155707,4
4,0.531,0.693,6,-5.203,0,0.0374,0.009310,0.000003,0.119,0.555,157.960,audio_features,4nDfJDZaUVtwOSnGROb2GN,spotify:track:4nDfJDZaUVtwOSnGROb2GN,https://api.spotify.com/v1/tracks/4nDfJDZaUVtw...,https://api.spotify.com/v1/audio-analysis/4nDf...,164453,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3003,0.406,0.807,7,-3.871,1,0.0507,0.008350,0.000000,0.118,0.290,159.713,audio_features,38ODYA4I5jEhFr4xJJd1RG,spotify:track:38ODYA4I5jEhFr4xJJd1RG,https://api.spotify.com/v1/tracks/38ODYA4I5jEh...,https://api.spotify.com/v1/audio-analysis/38OD...,226861,3
3004,0.502,0.961,9,-4.389,1,0.0905,0.000075,0.000002,0.124,0.281,110.028,audio_features,3oGNDHK33fp1GMqU9e4HQ7,spotify:track:3oGNDHK33fp1GMqU9e4HQ7,https://api.spotify.com/v1/tracks/3oGNDHK33fp1...,https://api.spotify.com/v1/audio-analysis/3oGN...,234550,4
3005,0.639,0.832,1,-4.976,1,0.1180,0.003080,0.000382,0.121,0.482,119.045,audio_features,2jPqRiw1kJvxDKIibCPhHu,spotify:track:2jPqRiw1kJvxDKIibCPhHu,https://api.spotify.com/v1/tracks/2jPqRiw1kJvx...,https://api.spotify.com/v1/audio-analysis/2jPq...,166467,4
3006,0.741,0.810,11,-5.808,0,0.1650,0.002650,0.018400,0.131,0.799,132.076,audio_features,0mH0iiNINYULYFwszeqWnW,spotify:track:0mH0iiNINYULYFwszeqWnW,https://api.spotify.com/v1/tracks/0mH0iiNINYUL...,https://api.spotify.com/v1/audio-analysis/0mH0...,125455,4
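
The `Unnamed: 0` column in the output above is the saved index read back in as data. A minimal sketch of one way to avoid it, and to pick out the numeric columns that are candidates for clustering, is shown below. The in-memory CSV is a trimmed copy of the first two rows with only a few columns, used so the example runs without the file.

```python
import io

import pandas as pd

# Two rows from the table above, trimmed to a handful of columns.
sample_csv = io.StringIO(
    ",danceability,energy,loudness,tempo,id\n"
    "0,0.695,0.700,-1.587,94.959,2ikmBwZKZr0ahGcX4x8qtj\n"
    "1,0.593,0.741,-4.353,96.978,2MxErftY5S07dFtIdxQOSF\n"
)

# index_col=0 treats the unnamed first column as the index instead of as data.
features = pd.read_csv(sample_csv, index_col=0)

# Keep only numeric columns; text columns such as "id" are left out.
numeric_features = features.select_dtypes(include="number")
print(list(numeric_features.columns))
```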


In [14]:
tracks_ids = spotify_track_features['id']
tracks_ids

0       2ikmBwZKZr0ahGcX4x8qtj
1       2MxErftY5S07dFtIdxQOSF
2       19meO0ADnoTjRuBMXZCdbs
3       3OPyobYAM5MgTm35AJV99O
4       4nDfJDZaUVtwOSnGROb2GN
                 ...          
3003    38ODYA4I5jEhFr4xJJd1RG
3004    3oGNDHK33fp1GMqU9e4HQ7
3005    2jPqRiw1kJvxDKIibCPhHu
3006    0mH0iiNINYULYFwszeqWnW
3007    12dYN0rS95O37qxZh3pOLV
Name: id, Length: 3008, dtype: object

In [15]:
!pip install spotipy



In [16]:
# Testing spotipy
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Initialize SpotiPy with user credentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id="c43ae4f18c0d4b2c8b04c93649fa4b72",
                                                           client_secret="1f7865f1e7aa439e9e997bc38b591855"))
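
Credentials pasted into a notebook are easy to leak when the notebook is shared. One possible alternative, sketched below, reads them from the environment. `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are the variable names spotipy itself looks for; the helper `load_spotify_credentials` is a hypothetical name, and the spotipy call is left commented out because it needs real credentials.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret) from the environment, or raise if one is missing."""
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(
            f"Set {missing.args[0]} before starting the notebook"
        ) from None


# Usage (requires spotipy and real credentials, so it is not executed here):
# client_id, client_secret = load_spotify_credentials()
# sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
#     client_id=client_id, client_secret=client_secret))
```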

In [17]:
sp.track(spotify_track_features.iloc[0]['id'])

{'album': {'album_type': 'single',
  'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/0yb46jwm7gqbZXVXZQ8Z1e'},
    'href': 'https://api.spotify.com/v1/artists/0yb46jwm7gqbZXVXZQ8Z1e',
    'id': '0yb46jwm7gqbZXVXZQ8Z1e',
    'name': 'Bishop Briggs',
    'type': 'artist',
    'uri': 'spotify:artist:0yb46jwm7gqbZXVXZQ8Z1e'}],
  'available_markets': ['AR',
   'AU',
   'AT',
   'BE',
   'BO',
   'BR',
   'BG',
   'CA',
   'CL',
   'CO',
   'CR',
   'CY',
   'CZ',
   'DK',
   'DO',
   'DE',
   'EC',
   'EE',
   'SV',
   'FI',
   'FR',
   'GR',
   'GT',
   'HN',
   'HK',
   'HU',
   'IS',
   'IE',
   'IT',
   'LV',
   'LT',
   'LU',
   'MY',
   'MT',
   'MX',
   'NL',
   'NZ',
   'NI',
   'NO',
   'PA',
   'PY',
   'PE',
   'PH',
   'PL',
   'PT',
   'SG',
   'SK',
   'ES',
   'SE',
   'CH',
   'TW',
   'TR',
   'UY',
   'US',
   'GB',
   'AD',
   'LI',
   'MC',
   'ID',
   'JP',
   'TH',
   'VN',
   'RO',
   'IL',
   'ZA',
   'SA',
   'AE',
   'BH',
   'QA',
   'OM
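
The response from `sp.track` is a deeply nested dictionary, and most of it (such as `available_markets`) is not needed here. A minimal sketch of pulling out a few fields follows. The helper `summarize_track` is a hypothetical name, and `sample_track` is a hand-written stand-in shaped like the response above, with a placeholder title, rather than real API output.

```python
def summarize_track(track: dict) -> dict:
    """Reduce a Spotify track object to a few fields of interest."""
    return {
        "id": track.get("id"),
        "name": track.get("name"),
        "artists": [artist["name"] for artist in track.get("artists", [])],
        "album_type": track.get("album", {}).get("album_type"),
    }


# Hand-written stand-in for a track response (the title is a placeholder).
sample_track = {
    "id": "2ikmBwZKZr0ahGcX4x8qtj",
    "name": "Placeholder title",
    "artists": [{"name": "Bishop Briggs"}],
    "album": {"album_type": "single", "artists": [{"name": "Bishop Briggs"}]},
}

summary = summarize_track(sample_track)
print(summary)
```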

In [18]:
# Selecting the column of track ids (a pandas Series, not a Python list)
track_ids = spotify_track_features['id']
track_ids

0       2ikmBwZKZr0ahGcX4x8qtj
1       2MxErftY5S07dFtIdxQOSF
2       19meO0ADnoTjRuBMXZCdbs
3       3OPyobYAM5MgTm35AJV99O
4       4nDfJDZaUVtwOSnGROb2GN
                 ...          
3003    38ODYA4I5jEhFr4xJJd1RG
3004    3oGNDHK33fp1GMqU9e4HQ7
3005    2jPqRiw1kJvxDKIibCPhHu
3006    0mH0iiNINYULYFwszeqWnW
3007    12dYN0rS95O37qxZh3pOLV
Name: id, Length: 3008, dtype: object

In [22]:
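# `tracks` (a playlist response) is never defined in this notebook, so the
# original list comprehension raised a NameError. The same URIs are already
# available in the `uri` column of the features table.
track_uris = spotify_track_features['uri'].tolist()
track_uris[:5]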

In [21]:
# Collecting the metadata of all tracks in one DataFrame.
# sp.track() takes a single track ID, so passing a slice of IDs raised
# "TypeError: expected string or bytes-like object, got 'Series'".
# sp.tracks() takes a list of IDs, which the Spotify API caps at 50 per request.
batch_size = 50
frames = []

for start in range(0, len(track_ids), batch_size):
    stop = min(start + batch_size, len(track_ids))
    print(start, stop)
    batch = list(track_ids[start:stop])
    response = sp.tracks(batch)
    # IDs that Spotify cannot find come back as None, so skip them before flattening.
    frames.append(pd.json_normalize([t for t in response['tracks'] if t]))

spotify_track_data = pd.concat(frames, ignore_index=True)
spotify_track_data
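
Requesting many tracks means splitting the IDs into batches, and that logic can be kept separate from the API call so it can be checked offline. The sketch below assumes a batch size of 50 (the limit of Spotify's several-tracks endpoint). The names `fetch_in_batches` and `fake_fetch` are hypothetical, and `fake_fetch` only imitates an API call.

```python
from typing import Callable, Iterable, List


def fetch_in_batches(
    ids: Iterable[str],
    fetch: Callable[[List[str]], List[dict]],
    batch_size: int = 50,
) -> List[dict]:
    """Call `fetch` on consecutive slices of `ids` and collect all results."""
    id_list = list(ids)  # also accepts a pandas Series of IDs
    results: List[dict] = []
    for start in range(0, len(id_list), batch_size):
        results.extend(fetch(id_list[start:start + batch_size]))
    return results


# Stand-in for the real API call; it records the size of each batch it receives.
seen_batch_sizes: List[int] = []


def fake_fetch(batch: List[str]) -> List[dict]:
    seen_batch_sizes.append(len(batch))
    return [{"id": track_id} for track_id in batch]


demo_ids = [f"track{i}" for i in range(120)]
records = fetch_in_batches(demo_ids, fake_fetch)
print(seen_batch_sizes)
```

With a real client, the `fetch` argument could be something like `lambda batch: sp.tracks(batch)['tracks']`.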