# Example of Using the Spotify API

This notebook extracts relevant information about songs using the Spotify API and the Spotipy library.

## Getting access to the API

First, connect your Spotify account to Spotify for Developers [here](https://developer.spotify.com/dashboard/).

Once on the dashboard, click the "Create a Client ID" button to open the app creation form.

To access user-related data (profiles, playlists), the app must be [registered](https://developer.spotify.com/documentation/general/guides/app-settings/#register-your-app). Without registering it, only the following data is available:

| Data group | Methods | Data endpoints |
|-------|------|------|
| Albums | Get albums, get album's tracks | Album name, album type |
| Artists | Get an artist, get artist's albums, get artist's top tracks | Artist name, genres, popularity score, followers |
| Browse | Get Available Genre Seeds, Get a Browse Category, Get a Category's Playlists | Genre seeds, search seeds, category's playlists, new release track information |
| Tracks | Get a Track or Several Tracks, Get Audio Analysis for a Track, Get Audio Features for Several Tracks, Get Audio Features for a Track | Album info, artist info, available markets, popularity, track number, explicit, ISRC, Spotify URI, Spotify ID |

The [Spotify Web API Console](https://developer.spotify.com/console/) can be used to try out various methods and inspect their JSON output.

## Using Spotipy, a library for the Spotify API

[Library documentation](https://spotipy.readthedocs.io/en/latest/#)

In [2]:
#pip install spotipy
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Client ID and client secret of the app created in the Spotify dashboard
cid = "ba769ad5ef5c4081b86660038a1fffcf"
secret = "aa364ad8a78243db908111a8cdc595e7"

# Authenticate with the client credentials flow (no user data access)
client_credentials_manager = SpotifyClientCredentials(
    client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
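
Hard-coding the client ID and secret in a notebook exposes them to anyone who can read the file. One possible alternative, sketched here under assumptions, is to read them from the environment variables `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET`, the names Spotipy itself looks for. The helpers `load_credentials` and `make_client` are hypothetical names introduced for illustration only.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables."""
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None


def make_client():
    """Build a Spotipy client from the environment (needs spotipy installed)."""
    # Imported lazily so that load_credentials can be used without spotipy
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    cid, secret = load_credentials()
    manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
    return spotipy.Spotify(client_credentials_manager=manager)
```

With both variables exported in the shell that starts the notebook, `sp = make_client()` would stand in for the cell above.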

## Getting the track IDs

We collect 2000 tracks from the API.  
Two parameters of the search endpoint constrain how the results are fetched:
* limit: a maximum of 50 results is returned per query.
* offset: the index of the first result to return.
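
Since each query returns at most 50 results, collecting 2000 tracks means stepping the offset through 40 pages. The following is a minimal sketch of that arithmetic; `page_offsets` is a hypothetical helper that is not part of Spotipy, and it makes no API calls.

```python
def page_offsets(total, page_size=50):
    """Yield (offset, limit) pairs that together cover `total` results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total, page_size):
        # The last page may be shorter than page_size
        yield offset, min(page_size, total - offset)


pages = list(page_offsets(2000))
print(len(pages), pages[0], pages[-1])  # 40 (0, 50) (1950, 50)
```

Each pair could then be passed as the `offset` and `limit` arguments of a search call.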

In [3]:
# Create the lists where the results will be stored
artist_name = []
track_name = []
popularity = []
track_id = []

# Request 40 pages of 50 tracks each (offsets 0, 50, ..., 1950)
for offset in range(0, 2000, 50):
    track_results = sp.search(q='year:2020', type='track', limit=50, offset=offset)
    for t in track_results['tracks']['items']:
        artist_name.append(t['artists'][0]['name'])
        track_name.append(t['name'])
        track_id.append(t['id'])
        popularity.append(t['popularity'])

We explore the data.

In [4]:
print("number of elements", len(track_id))

number of elements 2000


We create a dataframe.

In [5]:
import pandas as pd

df_tracks = pd.DataFrame({'artist_name':artist_name,'track_name':track_name,'track_id':track_id,'popularity':popularity})
print(df_tracks.shape)
df_tracks.head()

(2000, 4)


| | artist_name | track_name | track_id | popularity |
|---|---|---|---|---|
| 0 | SZA | Good Days | 3YJJjQPAbDT7mGpX3WtQ9A | 92 |
| 1 | Hammock | Longest Year - 2020 | 51TAp7p7B3xboCl5HWmEkF | 35 |
| 2 | 24kGoldn | Mood (feat. iann dior) | 3tjFYV6RSFtuktYl3ZtYcq | 95 |
| 3 | Hammock | Dark Beyond the Blue - 2020 | 79hqdNpnWPVVxmZiDfKaIL | 40 |
| 4 | CJ | Whoopty | 5vGLcdRuSbUhD8ScwsGSdA | 93 |


Sometimes the same song is returned under different track IDs (as a single, as part of an album, etc.).

We remove the duplicates.

In [6]:
# Group the entries by artist_name and track_name and check for duplicates.
grouped = df_tracks.groupby(['artist_name','track_name'], as_index=True).size()
grouped[grouped > 1].count()

54

There are 54 artist/track pairs that appear more than once.
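
The count above is the number of artist/track pairs that occur more than once, which is not necessarily the number of rows that will be removed. The plain-Python sketch below illustrates the difference with `collections.Counter`; the sample pairs are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical (artist_name, track_name) pairs, not real data
rows = [
    ("Artist A", "Song 1"),
    ("Artist A", "Song 1"),
    ("Artist A", "Song 1"),
    ("Artist B", "Song 2"),
    ("Artist B", "Song 2"),
    ("Artist C", "Song 3"),
]

counts = Counter(rows)
# Pairs seen more than once (what the grouped count measures)
duplicated_keys = sum(1 for n in counts.values() if n > 1)
# Rows that disappear when only the first occurrence is kept
redundant_rows = sum(n - 1 for n in counts.values())
print(duplicated_keys, redundant_rows)  # 2 3
```

This matches what the notebook reports: 54 duplicated pairs, while the row count goes from 2000 to 1944, a drop of 56 rows, so a few pairs occur more than twice.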

In [7]:
df_tracks.drop_duplicates(subset=['artist_name','track_name'], inplace=True)

In [8]:
# Group again to verify that no duplicates remain
grouped_after_dropping = df_tracks.groupby(['artist_name','track_name'], as_index=True).size()
grouped_after_dropping[grouped_after_dropping > 1].count()

0

In [9]:
df_tracks.shape

(1944, 4)

## Getting the audio features

With the audio features endpoint we retrieve the features for all tracks, requesting them in batches of 100 track IDs.

In [10]:
rows = []
batchsize = 100
None_counter = 0

# Request the audio features in batches of `batchsize` track IDs
for start in range(0, len(df_tracks['track_id']), batchsize):
    # Slice by position and pass a plain list of IDs to the API
    batch = df_tracks['track_id'].iloc[start:start + batchsize].tolist()
    feature_results = sp.audio_features(batch)
    for t in feature_results:
        # Tracks without available features come back as None
        if t is None:
            None_counter += 1
        else:
            rows.append(t)

print('Number of tracks without available audio features:', None_counter)

Number of tracks without available audio features: 0
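
The counter records how many tracks lacked features, but not which ones. A possible variation, sketched below, pairs each requested ID with its result so the missing IDs can be listed. It assumes the results come back in the same order as the requested IDs. `fake_audio_features` is a hypothetical stand-in for `sp.audio_features` so the sketch runs offline, and `chunked` and `collect_features` are illustrative names.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_audio_features(track_ids):
    """Hypothetical stand-in for sp.audio_features: None for IDs starting with 'x'."""
    return [None if tid.startswith("x") else {"id": tid, "tempo": 120.0}
            for tid in track_ids]


def collect_features(track_ids, fetch, batch_size=100):
    """Return (rows, missing): feature dicts and the IDs that had none."""
    rows, missing = [], []
    for batch in chunked(track_ids, batch_size):
        # Assumes fetch() returns one entry per requested ID, in order
        for tid, features in zip(batch, fetch(batch)):
            if features is None:
                missing.append(tid)
            else:
                rows.append(features)
    return rows, missing


ids = [f"id{n}" for n in range(250)] + ["x1", "x2"]
rows_found, missing_ids = collect_features(ids, fake_audio_features)
print(len(rows_found), missing_ids)  # 250 ['x1', 'x2']
```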


We load the features into a dataframe.

In [11]:
df_audio_features = pd.DataFrame.from_dict(rows,orient='columns')
print("Data shape:", df_audio_features.shape)
df_audio_features.head()

Data shape: (1944, 18)


| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.436 | 0.655 | 1 | -8.37 | 0 | 0.0583 | 0.499 | 8e-06 | 0.688 | 0.412 | 121.002 | audio_features | 3YJJjQPAbDT7mGpX3WtQ9A | spotify:track:3YJJjQPAbDT7mGpX3WtQ9A | https://api.spotify.com/v1/tracks/3YJJjQPAbDT7... | https://api.spotify.com/v1/audio-analysis/3YJJ... | 279204 | 4 |
| 1 | 0.0692 | 0.285 | 2 | -14.854 | 1 | 0.0482 | 0.7 | 0.673 | 0.199 | 0.0312 | 78.572 | audio_features | 51TAp7p7B3xboCl5HWmEkF | spotify:track:51TAp7p7B3xboCl5HWmEkF | https://api.spotify.com/v1/tracks/51TAp7p7B3xb... | https://api.spotify.com/v1/audio-analysis/51TA... | 533707 | 4 |
| 2 | 0.7 | 0.722 | 7 | -3.558 | 0 | 0.0369 | 0.221 | 0.0 | 0.272 | 0.756 | 90.989 | audio_features | 3tjFYV6RSFtuktYl3ZtYcq | spotify:track:3tjFYV6RSFtuktYl3ZtYcq | https://api.spotify.com/v1/tracks/3tjFYV6RSFtu... | https://api.spotify.com/v1/audio-analysis/3tjF... | 140526 | 4 |
| 3 | 0.2 | 0.282 | 2 | -19.803 | 0 | 0.0468 | 0.729 | 0.887 | 0.0837 | 0.0372 | 74.381 | audio_features | 79hqdNpnWPVVxmZiDfKaIL | spotify:track:79hqdNpnWPVVxmZiDfKaIL | https://api.spotify.com/v1/tracks/79hqdNpnWPVV... | https://api.spotify.com/v1/audio-analysis/79hq... | 325680 | 3 |
| 4 | 0.711 | 0.598 | 3 | -7.621 | 0 | 0.182 | 0.19 | 0.0 | 0.15 | 0.737 | 140.175 | audio_features | 5vGLcdRuSbUhD8ScwsGSdA | spotify:track:5vGLcdRuSbUhD8ScwsGSdA | https://api.spotify.com/v1/tracks/5vGLcdRuSbUh... | https://api.spotify.com/v1/audio-analysis/5vGL... | 123429 | 4 |


In [12]:
# Merge both dataframes on the track ID
df_audio_features.rename(columns={'id': 'track_id'}, inplace=True)
df = pd.merge(df_tracks,df_audio_features,on='track_id',how='inner')
# Report the shape of the merged dataframe (4 + 18 columns, sharing track_id)
print("Data shape:", df.shape)
df.head()

Data shape: (1944, 21)


| | artist_name | track_name | track_id | popularity | danceability | energy | key | loudness | mode | speechiness | ... | instrumentalness | liveness | valence | tempo | type | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SZA | Good Days | 3YJJjQPAbDT7mGpX3WtQ9A | 92 | 0.436 | 0.655 | 1 | -8.37 | 0 | 0.0583 | ... | 8e-06 | 0.688 | 0.412 | 121.002 | audio_features | spotify:track:3YJJjQPAbDT7mGpX3WtQ9A | https://api.spotify.com/v1/tracks/3YJJjQPAbDT7... | https://api.spotify.com/v1/audio-analysis/3YJJ... | 279204 | 4 |
| 1 | Hammock | Longest Year - 2020 | 51TAp7p7B3xboCl5HWmEkF | 35 | 0.0692 | 0.285 | 2 | -14.854 | 1 | 0.0482 | ... | 0.673 | 0.199 | 0.0312 | 78.572 | audio_features | spotify:track:51TAp7p7B3xboCl5HWmEkF | https://api.spotify.com/v1/tracks/51TAp7p7B3xb... | https://api.spotify.com/v1/audio-analysis/51TA... | 533707 | 4 |
| 2 | 24kGoldn | Mood (feat. iann dior) | 3tjFYV6RSFtuktYl3ZtYcq | 95 | 0.7 | 0.722 | 7 | -3.558 | 0 | 0.0369 | ... | 0.0 | 0.272 | 0.756 | 90.989 | audio_features | spotify:track:3tjFYV6RSFtuktYl3ZtYcq | https://api.spotify.com/v1/tracks/3tjFYV6RSFtu... | https://api.spotify.com/v1/audio-analysis/3tjF... | 140526 | 4 |
| 3 | Hammock | Dark Beyond the Blue - 2020 | 79hqdNpnWPVVxmZiDfKaIL | 40 | 0.2 | 0.282 | 2 | -19.803 | 0 | 0.0468 | ... | 0.887 | 0.0837 | 0.0372 | 74.381 | audio_features | spotify:track:79hqdNpnWPVVxmZiDfKaIL | https://api.spotify.com/v1/tracks/79hqdNpnWPVV... | https://api.spotify.com/v1/audio-analysis/79hq... | 325680 | 3 |
| 4 | CJ | Whoopty | 5vGLcdRuSbUhD8ScwsGSdA | 93 | 0.711 | 0.598 | 3 | -7.621 | 0 | 0.182 | ... | 0.0 | 0.15 | 0.737 | 140.175 | audio_features | spotify:track:5vGLcdRuSbUhD8ScwsGSdA | https://api.spotify.com/v1/tracks/5vGLcdRuSbUh... | https://api.spotify.com/v1/audio-analysis/5vGL... | 123429 | 4 |
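
An inner merge silently drops tracks that have no matching audio features, and repeated keys on either side would multiply rows. As a hedged sketch on made-up miniature dataframes (not the notebook's data), pandas' `validate` argument can check that the keys are unique, and comparing lengths shows how many rows were lost.

```python
import pandas as pd

# Hypothetical miniature versions of df_tracks and df_audio_features
tracks = pd.DataFrame({"track_id": ["a", "b", "c"], "popularity": [90, 35, 70]})
features = pd.DataFrame({"track_id": ["a", "b"], "tempo": [121.0, 78.6]})

# validate="one_to_one" raises pandas.errors.MergeError if track_id repeats on either side
merged = pd.merge(tracks, features, on="track_id", how="inner",
                  validate="one_to_one")

# Tracks that had no matching features and were dropped by the inner join
dropped = len(tracks) - len(merged)
print(merged.shape, dropped)  # (2, 3) 1
```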


We save the dataframe to a CSV file.

In [13]:
df.to_csv('SpotifyAudioFeatures2020.csv')
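
By default `to_csv` also writes the dataframe index as an unnamed first column, which appears as `Unnamed: 0` when the file is read back. The sketch below shows the difference on a small made-up dataframe written to in-memory buffers; whether to pass `index=False` depends on whether the index carries useful information.

```python
import io

import pandas as pd

# Hypothetical two-row dataframe, not the notebook's data
df_small = pd.DataFrame({"track_id": ["a", "b"], "popularity": [90, 35]})

with_index = io.StringIO()
df_small.to_csv(with_index)  # default: index written as first column
without_index = io.StringIO()
df_small.to_csv(without_index, index=False)  # index omitted

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]
print(repr(header_with))     # ',track_id,popularity'
print(repr(header_without))  # 'track_id,popularity'

# Reading the first variant back yields an extra "Unnamed: 0" column
reloaded = pd.read_csv(io.StringIO(with_index.getvalue()))
print(list(reloaded.columns))
```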