# Data extraction

## Access data:

Spotipy is a Python library for interacting with the Spotify Web API. It is used to access and manipulate Spotify data: retrieving information about tracks, artists, albums, and playlists, and performing actions such as playing tracks or creating playlists.

In [1]:
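# To access Spotify data we need the credentials issued by the API: client_id and client_secret
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# api.txt holds the client ID on its first line and the client secret on its second
with open("api.txt") as f:
    secret_ls = f.readlines()

# strip() removes the trailing newline and any surrounding whitespace
client_id = secret_ls[0].strip()
secret = secret_ls[1].strip()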

In [2]:
client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager = client_credentials_manager)
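
Credential files often carry stray whitespace or a trailing blank line. The following is a minimal sketch of a reader that tolerates this, assuming the same two-line layout of `api.txt` (client ID first, secret second). The helper name `read_credentials` is hypothetical and not part of Spotipy.

```python
from pathlib import Path


def read_credentials(path):
    """Return (client_id, client_secret) from a two-line text file."""
    lines = [line.strip() for line in Path(path).read_text().splitlines()]
    # Ignore blank lines so a trailing newline does not matter
    lines = [line for line in lines if line]
    if len(lines) < 2:
        raise ValueError("expected client ID and secret on separate lines")
    return lines[0], lines[1]
```

With this helper, `client_id, secret = read_credentials("api.txt")` would feed `SpotifyClientCredentials` directly. Spotipy can also read the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables when the arguments are omitted, which keeps secrets out of the working directory.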

## Import playlists:

A set of playlists curated by Spotify was selected, one for each mood to be analyzed:

In [4]:
# 1/ happy playlist link: "Happy hits!"
happy_pl_link = "https://open.spotify.com/playlist/37i9dQZF1DXdPec7aLTmlC?si=6d0d8a354930406b"
happy_URI = happy_pl_link.split("/")[-1].split("?")[0]
# 2/ sad playlist link: "Heartache"
sad_pl_link = "https://open.spotify.com/playlist/37i9dQZF1DX46zHlTTyolY?si=c5d5c06130e84367"
sad_URI = sad_pl_link.split("/")[-1].split("?")[0]
# 3/ anger playlist link: "Walk like a badass"
ang_pl_link = "https://open.spotify.com/playlist/37i9dQZF1DX1tyCD9QhIWF?si=45e2a041dbbf4cbb"
anger_URI = ang_pl_link.split("/")[-1].split("?")[0]
# 4/ spooky playlist link: "Spooky"
spooky_pl_link = "https://open.spotify.com/playlist/37i9dQZF1DX5hol82XuK24?si=29cf60e5500b488a"
spooky_URI = spooky_pl_link.split("/")[-1].split("?")[0]
# 5/ focus playlist link: "Deep Focus"
focus_pl_link = "https://open.spotify.com/playlist/37i9dQZF1DWZeKCadgRdKQ?si=635dd64ee7fa477b"
focus_URI = focus_pl_link.split("/")[-1].split("?")[0]

playlist_URI = [happy_URI, sad_URI, anger_URI, spooky_URI, focus_URI]
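
The `split` chain relies on every link having exactly the shape `.../playlist/<id>?si=...`. Strictly speaking, the extracted values are playlist IDs rather than full `spotify:playlist:...` URIs. As a minimal sketch, the same extraction can be done with `urllib.parse`, which also handles links without a query string. The helper name `playlist_id_from_link` is hypothetical and not part of Spotipy.

```python
from urllib.parse import urlparse


def playlist_id_from_link(link):
    """Extract the playlist ID from an open.spotify.com playlist link."""
    parts = [p for p in urlparse(link).path.split("/") if p]
    # Expect a path like /playlist/<id>
    if len(parts) < 2 or parts[-2] != "playlist":
        raise ValueError(f"not a playlist link: {link}")
    return parts[-1]


happy_pl_link = "https://open.spotify.com/playlist/37i9dQZF1DXdPec7aLTmlC?si=6d0d8a354930406b"
print(playlist_id_from_link(happy_pl_link))  # 37i9dQZF1DXdPec7aLTmlC
```

Raising on an unexpected path makes a mistyped link fail early instead of producing a wrong ID that only fails later at the API call.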

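This first attempt does not work: inside the loop, each assignment such as `df['artist_name'] = ...` overwrites the whole column with a single value instead of appending a row, and because `df` starts with no rows it stays empty: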

In [6]:
'''import pandas as pd

playlist_URI = [happy_URI, sad_URI, anger_URI, spooky_URI, focus_URI]

feature_names = ['danceability', 'energy', 'key', 'loudness', 'mode',
                 'speechiness', 'acousticness', 'instrumentalness',
                 'liveness', 'valence', 'tempo', 'duration_ms']

df = pd.DataFrame()

# Loop over each playlist URI
for uri in playlist_URI:
    # Get the playlist's tracks
    results = sp.playlist_tracks(uri)
    tracks = results['items']
    # Loop over each track in the playlist
    for track in tracks:
        # Playlist name
        df['playlist_name'] = sp.playlist(uri)['name']

        # Artist name
        df['artist_name'] = track['track']['artists'][0]['name']

        # Track popularity
        df['popularity'] = track['track']['popularity']

        # Get the track's audio features
        features = sp.audio_features(track['track']['uri'])
        for feature_name in feature_names:
            df[feature_name] = features[0][feature_name]
print(df)'''

Empty DataFrame
Columns: [playlist_name, artist_name, popularity, danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, duration_ms]
Index: []
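
The empty result can be reproduced without the Spotify API. The sketch below uses made-up values: assigning a scalar to a column of a DataFrame with no rows creates the column but adds no rows, whereas collecting one dictionary per item and building the DataFrame at the end yields one row per item.

```python
import pandas as pd

# Scalar assignment: the value is broadcast over the existing index,
# and an empty DataFrame has no index entries to fill
df_demo = pd.DataFrame()
for value in [1, 2, 3]:
    df_demo["x"] = value

print(len(df_demo))           # 0
print(list(df_demo.columns))  # ['x']

# Row-wise collection: one dictionary per item, converted once at the end
rows = []
for value in [1, 2, 3]:
    rows.append({"x": value})
df_rows = pd.DataFrame(rows)

print(len(df_rows))           # 3
```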


In [5]:
import pandas as pd

playlist_URI = [happy_URI, sad_URI, anger_URI, spooky_URI, focus_URI]

feature_names = ['danceability', 'energy', 'key', 'loudness', 'mode',
                 'speechiness', 'acousticness', 'instrumentalness',
                 'liveness', 'valence', 'tempo', 'duration_ms']

# List that collects one dictionary per song
song_data = []

# Loop over each playlist URI
for uri in playlist_URI:
    # Look up the playlist name once per playlist instead of once per track
    playlist_name = sp.playlist(uri)['name']

    # Get the playlist's tracks
    results = sp.playlist_tracks(uri)
    tracks = results['items']

    # Loop over each track in the playlist
    for track in tracks:
        # Dictionary for the current song
        song_info = {}

        # Playlist name
        song_info['playlist_name'] = playlist_name

        # Song name
        song_info['song_name'] = track['track']['name']

        # Song URI
        song_info['song_uri'] = track['track']['uri']

        # Name of the first listed artist
        song_info['artist_name'] = track['track']['artists'][0]['name']

        # Song popularity
        song_info['popularity'] = track['track']['popularity']

        # Get the song's audio features
        features = sp.audio_features(track['track']['uri'])

        # Copy the selected features into the dictionary
        for feature_name in feature_names:
            song_info[feature_name] = features[0][feature_name]

        # Append the dictionary to the list of songs
        song_data.append(song_info)

# Build the DataFrame from the list of songs
df = pd.DataFrame(song_data)

# Print the DataFrame
print(df)


    playlist_name                                song_name  \
0     Happy Hits!                          I'm Good (Blue)   
1     Happy Hits!                          I Ain't Worried   
2     Happy Hits!  Dance The Night (From Barbie The Album)   
3     Happy Hits!                          Blinding Lights   
4     Happy Hits!                         Watermelon Sugar   
..            ...                                      ...   
460    Deep Focus                              remembrance   
461    Deep Focus                               Are You Up   
462    Deep Focus                                waveforms   
463    Deep Focus                         miniature things   
464    Deep Focus                      Abroad and Overseas   

                                 song_uri   artist_name  popularity  \
0    spotify:track:4uUG5RXrOk84mYEfFvj3cK  David Guetta          95   
1    spotify:track:4h9wh7iOZ0GGn8QVp4RAOB   OneRepublic          95   
2    spotify:track:1vYXt7VSjH9JIM5oRRo7vA 
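
The extraction loop reads only the first page of each playlist and requests audio features one track at a time. The Spotify Web API returns playlist items in pages of at most 100, and `audio_features` accepts up to 100 tracks per call, so longer playlists would need paging and the feature requests could be batched. The sketch below illustrates both ideas under those assumptions. `fetch_all_items`, `chunks`, and `FakeClient` are hypothetical names, and `FakeClient` is only a stand-in so the example runs without credentials.

```python
def fetch_all_items(client, playlist_id):
    """Collect every item of a playlist by following the paging links."""
    page = client.playlist_tracks(playlist_id)
    items = list(page["items"])
    while page.get("next"):
        page = client.next(page)
        items.extend(page["items"])
    return items


def chunks(seq, size=100):
    """Yield consecutive slices of at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


class FakeClient:
    """Stand-in for spotipy.Spotify, for illustration only."""

    def __init__(self, pages):
        self._pages = pages

    def playlist_tracks(self, playlist_id):
        return self._pages[0]

    def next(self, page):
        # Here "next" is a list index; the real API uses a URL
        return self._pages[page["next"]]


fake = FakeClient([
    {"items": ["a", "b"], "next": 1},
    {"items": ["c"], "next": None},
])
print(fetch_all_items(fake, "any-id"))               # ['a', 'b', 'c']
print([len(c) for c in chunks(list(range(250)))])    # [100, 100, 50]
```

With a real client, each chunk of track URIs could be passed to `sp.audio_features(chunk)`, reducing the number of requests from one per track to one per hundred tracks.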

# Preprocessing

### Mapping playlist_name values to moods:

In [6]:
df['playlist_name'].unique()

array(['Happy Hits!', 'Heartache', 'Walk Like A Badass', 'Spooky',
       'Deep Focus'], dtype=object)

In [7]:
# Replace each playlist name in the playlist_name column with its mood:
df.loc[df['playlist_name'] == 'Happy Hits!', 'playlist_name'] = 'Happy'
df.loc[df['playlist_name'] == 'Heartache', 'playlist_name'] = 'Sad'
df.loc[df['playlist_name'] == 'Walk Like A Badass', 'playlist_name'] = 'Anger'
df.loc[df['playlist_name'] == 'Spooky', 'playlist_name'] = 'Fear'
df.loc[df['playlist_name'] == 'Deep Focus', 'playlist_name'] = 'Focus'
# Rename the column to Mood:
df = df.rename({'playlist_name': 'Mood'}, axis=1)
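
The five `df.loc` assignments can also be expressed as a single dictionary lookup. This is a minimal sketch on a small made-up DataFrame, not the notebook's data; the dictionary name `mood_by_playlist` is hypothetical.

```python
import pandas as pd

mood_by_playlist = {
    "Happy Hits!": "Happy",
    "Heartache": "Sad",
    "Walk Like A Badass": "Anger",
    "Spooky": "Fear",
    "Deep Focus": "Focus",
}

demo = pd.DataFrame({"playlist_name": ["Happy Hits!", "Spooky", "Deep Focus"]})

# map() looks each value up in the dictionary; names missing from it
# become NaN, which makes an unmapped playlist easy to detect
demo["Mood"] = demo["playlist_name"].map(mood_by_playlist)
assert demo["Mood"].notna().all()

demo = demo.drop(columns="playlist_name")
print(demo["Mood"].tolist())  # ['Happy', 'Fear', 'Focus']
```

Keeping the mapping in one dictionary means a new playlist only requires one new entry.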

### General info and summary statistics of the DataFrame:

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 465 entries, 0 to 464
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Mood              465 non-null    object 
 1   song_name         465 non-null    object 
 2   song_uri          465 non-null    object 
 3   artist_name       465 non-null    object 
 4   popularity        465 non-null    int64  
 5   danceability      465 non-null    float64
 6   energy            465 non-null    float64
 7   key               465 non-null    int64  
 8   loudness          465 non-null    float64
 9   mode              465 non-null    int64  
 10  speechiness       465 non-null    float64
 11  acousticness      465 non-null    float64
 12  instrumentalness  465 non-null    float64
 13  liveness          465 non-null    float64
 14  valence           465 non-null    float64
 15  tempo             465 non-null    float64
 16  duration_ms       465 non-null    int64  
dt

In [7]:
df.describe()

| | popularity | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 450.0 | 450.0 | 450.0 | 450.0 | 450.0 | 450.0 | 450.0 | 450.0 | 450.0 | 450.0 | 450.0 | 450.0 | 450.0 |
| mean | 62.317778 | 0.556267 | 0.487121 | 5.311111 | -10.211711 | 0.648889 | 0.061014 | 0.445754 | 0.295955 | 0.157851 | 0.363042 | 113.515611 | 196423.537778 |
| std | 17.743882 | 0.150213 | 0.28645 | 3.602908 | 6.174619 | 0.477849 | 0.055972 | 0.391267 | 0.399973 | 0.117311 | 0.254685 | 29.986572 | 56893.384509 |
| min | 14.0 | 0.132 | 0.00954 | 0.0 | -32.03 | 0.0 | 0.0243 | 2.6e-05 | 0.0 | 0.0292 | 0.0309 | 35.366 | 100467.0 |
| 25% | 52.0 | 0.46025 | 0.2125 | 2.0 | -14.07825 | 0.0 | 0.03385 | 0.0361 | 2e-06 | 0.099025 | 0.14525 | 90.09475 | 160663.75 |
| 50% | 60.5 | 0.56 | 0.485 | 5.0 | -8.193 | 1.0 | 0.0402 | 0.3725 | 0.00109 | 0.111 | 0.3035 | 113.01 | 184054.5 |
| 75% | 79.0 | 0.6635 | 0.75275 | 9.0 | -5.342 | 1.0 | 0.062725 | 0.874 | 0.836 | 0.15775 | 0.53675 | 130.01225 | 224634.0 |
| max | 95.0 | 0.954 | 0.987 | 11.0 | -1.789 | 1.0 | 0.519 | 0.994 | 0.973 | 0.755 | 0.965 | 203.639 | 518747.0 |


### Null values:

In [9]:
# df.info() already shows that there are no null values, but we run a quick check:
df.isnull().sum().sum()

0

### Duplicate values:

In [10]:
# Duplicate rows are not expected in this dataset, but we check anyway:
df.duplicated().sum()

0
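
`df.duplicated()` compares entire rows, so a song that appears in two playlists would not be flagged, because its Mood value differs. A hedged sketch of a stricter check on the track URI alone, using made-up rows rather than the notebook's data:

```python
import pandas as pd

toy = pd.DataFrame({
    "Mood": ["Happy", "Sad", "Fear"],
    "song_uri": ["spotify:track:A", "spotify:track:A", "spotify:track:B"],
})

# Whole-row comparison finds nothing: the two "A" rows differ in Mood
print(toy.duplicated().sum())  # 0

# Comparing only song_uri, and keeping every occurrence, exposes the overlap
shared = toy[toy.duplicated(subset="song_uri", keep=False)]
print(len(shared))             # 2
```

Whether such shared songs should be dropped or kept depends on the analysis, since a track labelled with two moods is ambiguous as a training example.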

# Export to CSV:

In [11]:
df.to_csv('datos.csv', index=False)
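
Passing `index=False` matters when the file is read back: without it, pandas writes the row index as an unnamed first column, which reappears as `Unnamed: 0` on reload. A small sketch with made-up data and in-memory buffers instead of files:

```python
import io

import pandas as pd

toy = pd.DataFrame({"Mood": ["Happy", "Sad"], "tempo": [120.0, 80.5]})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)
print(cols_with_index)     # ['Unnamed: 0', 'Mood', 'tempo']

# index=False: only the data columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)
print(cols_without_index)  # ['Mood', 'tempo']
```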