# Data Analysis Extra - Playlists

As a last-minute extra analysis, I decided to look at the playlists that users created during the pandemic. From a different angle, they could indicate what kind of music people prefer to listen to during a pandemic, and the results can be compared with the earlier analyses.

The process is to search for playlists whose name contains a pandemic-related word (I have limited myself to Spanish), extract all of their songs, and then run the Features and Genre analysis on them.

In [1]:
# See: https://towardsdatascience.com/what-music-do-people-listen-to-during-the-coronavirus-a-report-using-data-science-1a2035d12430
# Read: https://blog.chartmetric.com/covid-19-effect-on-the-global-music-business-part-1-genre/
# https://gist.github.com/ilias1111/e503bbab0a98c20377686cc75ffad451


## Preliminary Setup

In [6]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import seaborn as sns
from datetime import datetime

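# Load the Spotify credentials. The file holds "client_id,client_secret" on its first line,
# which read_csv parses as the header row, so both values are read from the column names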
passw = pd.read_csv("pass_spotify.txt", sep = ',', encoding="utf-8")
client_id = passw.columns[0]
client_secret = passw.columns[1]

client_credentials_manager = SpotifyClientCredentials(client_id, client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
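
Reading the credentials through `pd.read_csv` works because the two values end up as column names. A minimal sketch of a more direct alternative, assuming `pass_spotify.txt` holds a single line of the form `client_id,client_secret` (the helper name `read_credentials` is hypothetical, not part of the original notebook):

```python
from pathlib import Path


def read_credentials(path):
    """Return (client_id, client_secret) from a one-line 'id,secret' file."""
    # Assumption: the first line of the file is 'client_id,client_secret'.
    first_line = Path(path).read_text(encoding="utf-8").splitlines()[0]
    client_id, client_secret = (part.strip() for part in first_line.split(",", 1))
    return client_id, client_secret
```

It would be used as `client_id, client_secret = read_credentials("pass_spotify.txt")` before building `SpotifyClientCredentials`.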

Let's look at how the Spotify API structures its information, so that we can search for a word in playlist names.

In [5]:
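# Note: the 'q=' prefix is sent as part of the query text itself (see `query=q%3D...` in the href of the output)
print(sp.search('q=fernando simón', type='playlist'))
# Candidate terms: ["covid","cuarentena", 'pandemia','covid19', "coronavirus", 'fernando simón']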

{'playlists': {'href': 'https://api.spotify.com/v1/search?query=q%3Dfernando+sim%C3%B3n&type=playlist&offset=0&limit=10', 'items': [{'collaborative': False, 'description': '', 'external_urls': {'spotify': 'https://open.spotify.com/playlist/01bawF2DM0YzefGbC54Ua1'}, 'href': 'https://api.spotify.com/v1/playlists/01bawF2DM0YzefGbC54Ua1', 'id': '01bawF2DM0YzefGbC54Ua1', 'images': [{'height': None, 'url': 'https://i.scdn.co/image/ab67706c0000bebb80c823de1e04b1fe7f2073ce', 'width': None}], 'name': 'la playlist que escucha fernando simón', 'owner': {'display_name': 'mikenuggets', 'external_urls': {'spotify': 'https://open.spotify.com/user/079077tae8iz2j9qk6431p9s2'}, 'href': 'https://api.spotify.com/v1/users/079077tae8iz2j9qk6431p9s2', 'id': '079077tae8iz2j9qk6431p9s2', 'type': 'user', 'uri': 'spotify:user:079077tae8iz2j9qk6431p9s2'}, 'primary_color': None, 'public': None, 'snapshot_id': 'Mjc0LGVlNGQ1YTY3YWIzNGZmMjhiYzBiYzcxM2MxZDBkZWY4YzBjNDhlNmU=', 'tracks': {'href': 'https://api.spotify.co
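
The response is a nested dictionary: the matches sit under `playlists` and then `items`, and each item carries its `name`, `id`, `uri` and a `tracks` object with `href` and `total`. A small sketch on a hand-built dictionary that mirrors the fields appearing in the raw search output (the `summarize_playlists` helper is hypothetical):

```python
def summarize_playlists(response):
    """Return (name, id, track_total) for each playlist in a search response."""
    return [
        (item["name"], item["id"], item["tracks"]["total"])
        for item in response["playlists"]["items"]
    ]


# Hand-built stand-in with the same nesting as the API response.
sample_response = {
    "playlists": {
        "items": [
            {
                "name": "la playlist que escucha fernando simón",
                "id": "01bawF2DM0YzefGbC54Ua1",
                "uri": "spotify:playlist:01bawF2DM0YzefGbC54Ua1",
                "tracks": {
                    "href": "https://api.spotify.com/v1/playlists/01bawF2DM0YzefGbC54Ua1/tracks",
                    "total": 271,
                },
            }
        ]
    }
}

print(summarize_playlists(sample_response))
```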

We write a complete script that searches for each word and returns only the playlists that contain it somewhere in their information.

In [9]:
for word in ["fernando simón"]:
    print(sp.search('q="{}"'.format(word), type='playlist')['playlists']['items'])


[{'collaborative': False, 'description': '', 'external_urls': {'spotify': 'https://open.spotify.com/playlist/01bawF2DM0YzefGbC54Ua1'}, 'href': 'https://api.spotify.com/v1/playlists/01bawF2DM0YzefGbC54Ua1', 'id': '01bawF2DM0YzefGbC54Ua1', 'images': [{'height': None, 'url': 'https://i.scdn.co/image/ab67706c0000bebb80c823de1e04b1fe7f2073ce', 'width': None}], 'name': 'la playlist que escucha fernando simón', 'owner': {'display_name': 'mikenuggets', 'external_urls': {'spotify': 'https://open.spotify.com/user/079077tae8iz2j9qk6431p9s2'}, 'href': 'https://api.spotify.com/v1/users/079077tae8iz2j9qk6431p9s2', 'id': '079077tae8iz2j9qk6431p9s2', 'type': 'user', 'uri': 'spotify:user:079077tae8iz2j9qk6431p9s2'}, 'primary_color': None, 'public': None, 'snapshot_id': 'Mjc0LGVlNGQ1YTY3YWIzNGZmMjhiYzBiYzcxM2MxZDBkZWY4YzBjNDhlNmU=', 'tracks': {'href': 'https://api.spotify.com/v1/playlists/01bawF2DM0YzefGbC54Ua1/tracks', 'total': 271}, 'type': 'playlist', 'uri': 'spotify:playlist:01bawF2DM0YzefGbC54Ua1'}

We now run a bulk extraction of those playlists.

In [83]:
# Takes about 1 minute to complete
# Documentation: https://developer.spotify.com/documentation/web-api/reference/playlists/get-playlist/
# ['items']["tracks"]['items'][0]["added_at"]

# Helper that extracts selected fields from each playlist in a search response
def batch_proccess(x,lista):
    for i in x['playlists']['items']:
        lista.append({"name" : i['name'], "total":i["tracks"]['total'], 'id':i['id'], "uri":i["uri"], "URL":i["tracks"]['href']})

# Build the list of playlists in blocks of 50, which is the maximum the API allows
Time1 = datetime.now()

list_of_playlists = []
for term in ["covid","pandemia","cuarentena",'covid19', "coronavirus", 'fernando simón']:
    for i in range(0,2000, 50):
        try:
            init_data = sp.search('q="{}"'.format(term), type='playlist', limit=50, offset=i)
            batch_proccess(init_data,list_of_playlists)
        except Exception as exc:
            # Report the failure instead of hiding its cause
            print("Error:", exc)
Time2 = datetime.now()

print("Execution time:", Time2 -Time1)

# Put the playlists into a DataFrame and take a look
playlists = pd.DataFrame(list_of_playlists).drop_duplicates().reset_index(drop=True)
print('Number of playlists:', playlists.count())
playlists.head(20)

Execution time: 0:00:37.114007
Number of playlists: name     5594
total    5594
id       5594
uri      5594
URL      5594
dtype: int64


Unnamed: 0,name,total,id,uri,URL
0,quarantine Vibes | Covid 19 |,45,6zjC569rXoYGOLwPUt37WW,spotify:playlist:6zjC569rXoYGOLwPUt37WW,https://api.spotify.com/v1/playlists/6zjC569rX...
1,Calma,151,37i9dQZF1DWY5LGZYBBHHz,spotify:playlist:37i9dQZF1DWY5LGZYBBHHz,https://api.spotify.com/v1/playlists/37i9dQZF1...
2,COVID-19 Quarantine Party,29,1Tn6OrkJhOYO5u6y16erqB,spotify:playlist:1Tn6OrkJhOYO5u6y16erqB,https://api.spotify.com/v1/playlists/1Tn6OrkJh...
3,Canciones Mas Virales que el Coronavirus 🦠🎶,106,4Y2h9TZNZersxSYeXujCNu,spotify:playlist:4Y2h9TZNZersxSYeXujCNu,https://api.spotify.com/v1/playlists/4Y2h9TZNZ...
4,Drunksouls COVID19 quarantine discovery: music...,49,64KUBwKGkUEqIU08HDhJ94,spotify:playlist:64KUBwKGkUEqIU08HDhJ94,https://api.spotify.com/v1/playlists/64KUBwKGk...
5,chill beats to quarantine to,124,5bPe4ZISvKWU5yTzsSO9lX,spotify:playlist:5bPe4ZISvKWU5yTzsSO9lX,https://api.spotify.com/v1/playlists/5bPe4ZISv...
6,Quarantine covid - 19,100,7qJzyN6atsM49NOWnqGbEK,spotify:playlist:7qJzyN6atsM49NOWnqGbEK,https://api.spotify.com/v1/playlists/7qJzyN6at...
7,MÁS PERREO MENOS COVID,99,1RYr3ngXNoPVtl7blr9Kaf,spotify:playlist:1RYr3ngXNoPVtl7blr9Kaf,https://api.spotify.com/v1/playlists/1RYr3ngXN...
8,CORONAVIRUS QUARANTINE PARTY COVID-19 2020,55,3r9u5bXzAYEvc0xBKkSnAx,spotify:playlist:3r9u5bXzAYEvc0xBKkSnAx,https://api.spotify.com/v1/playlists/3r9u5bXzA...
9,Plenas que no pudimos bailar por COVID,81,2U4J1vAVC8DvV3fPfrgXVy,spotify:playlist:2U4J1vAVC8DvV3fPfrgXVy,https://api.spotify.com/v1/playlists/2U4J1vAVC...
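
The extraction pages through the results with `limit` and `offset` and then drops duplicate rows. The same pattern in isolation, as a sketch: `search_page` is a hypothetical stand-in for a function that wraps `sp.search`, so the paging logic can be tried without network access. Unlike the cell above, it stops as soon as a page comes back empty and de-duplicates on the playlist `id`.

```python
def collect_playlists(search_page, terms, page_size=50, max_offset=2000):
    """Page through search results for each term, keeping one row per playlist id.

    search_page(term, limit, offset) is an assumed callable returning the
    list of playlist items for that page (empty when there are no more).
    """
    seen = {}
    for term in terms:
        for offset in range(0, max_offset, page_size):
            items = search_page(term, page_size, offset)
            if not items:
                break  # no more results for this term
            for item in items:
                if item is None:
                    continue  # skip empty entries defensively
                seen.setdefault(item["id"], {
                    "name": item["name"],
                    "total": item["tracks"]["total"],
                    "id": item["id"],
                    "uri": item["uri"],
                    "URL": item["tracks"]["href"],
                })
    return list(seen.values())


# Fake backend with three playlists; "p2" matches both terms.
FAKE = {
    "covid": [
        {"id": "p1", "name": "a", "uri": "u1", "tracks": {"total": 3, "href": "h1"}},
        {"id": "p2", "name": "b", "uri": "u2", "tracks": {"total": 5, "href": "h2"}},
    ],
    "pandemia": [
        {"id": "p2", "name": "b", "uri": "u2", "tracks": {"total": 5, "href": "h2"}},
        {"id": "p3", "name": "c", "uri": "u3", "tracks": {"total": 1, "href": "h3"}},
    ],
}


def fake_search_page(term, limit, offset):
    return FAKE[term][offset:offset + limit]


rows = collect_playlists(fake_search_page, ["covid", "pandemia"], page_size=1)
print([row["id"] for row in rows])
```

With spotipy, `search_page` could be something like `lambda term, limit, offset: sp.search(q=term, type='playlist', limit=limit, offset=offset)['playlists']['items']`.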


In [17]:
playlists_sample = playlists[0:50]
playlists_sample

Unnamed: 0,name,total,id,uri,URL
0,quarantine Vibes | Covid 19 |,45,6zjC569rXoYGOLwPUt37WW,spotify:playlist:6zjC569rXoYGOLwPUt37WW,https://api.spotify.com/v1/playlists/6zjC569rX...
1,Calma,151,37i9dQZF1DWY5LGZYBBHHz,spotify:playlist:37i9dQZF1DWY5LGZYBBHHz,https://api.spotify.com/v1/playlists/37i9dQZF1...
2,COVID-19 Quarantine Party,29,1Tn6OrkJhOYO5u6y16erqB,spotify:playlist:1Tn6OrkJhOYO5u6y16erqB,https://api.spotify.com/v1/playlists/1Tn6OrkJh...
3,Canciones Mas Virales que el Coronavirus 🦠🎶,106,4Y2h9TZNZersxSYeXujCNu,spotify:playlist:4Y2h9TZNZersxSYeXujCNu,https://api.spotify.com/v1/playlists/4Y2h9TZNZ...
4,Drunksouls COVID19 quarantine discovery: music...,49,64KUBwKGkUEqIU08HDhJ94,spotify:playlist:64KUBwKGkUEqIU08HDhJ94,https://api.spotify.com/v1/playlists/64KUBwKGk...
5,chill beats to quarantine to,124,5bPe4ZISvKWU5yTzsSO9lX,spotify:playlist:5bPe4ZISvKWU5yTzsSO9lX,https://api.spotify.com/v1/playlists/5bPe4ZISv...
6,Quarantine covid - 19,100,7qJzyN6atsM49NOWnqGbEK,spotify:playlist:7qJzyN6atsM49NOWnqGbEK,https://api.spotify.com/v1/playlists/7qJzyN6at...
7,MÁS PERREO MENOS COVID,99,1RYr3ngXNoPVtl7blr9Kaf,spotify:playlist:1RYr3ngXNoPVtl7blr9Kaf,https://api.spotify.com/v1/playlists/1RYr3ngXN...
8,CORONAVIRUS QUARANTINE PARTY COVID-19 2020,55,3r9u5bXzAYEvc0xBKkSnAx,spotify:playlist:3r9u5bXzAYEvc0xBKkSnAx,https://api.spotify.com/v1/playlists/3r9u5bXzA...
9,Plenas que no pudimos bailar por COVID,80,2U4J1vAVC8DvV3fPfrgXVy,spotify:playlist:2U4J1vAVC8DvV3fPfrgXVy,https://api.spotify.com/v1/playlists/2U4J1vAVC...


Scripts to extract the songs and artists from the playlists.

In [18]:
# IMPORTANT: the full extraction takes about 30 minutes, so we run it on a sample first

# TODO: still need to pair artists with track names and decide how to combine them in the final DataFrame
# Collect the songs from all the playlists in one place
# Takes quite a while, around XX

playlists_extract = playlists

# Extract the songs
Time1 = datetime.now()
songs = []
for uri in playlists_extract['uri']:

    length = sp.playlist_tracks(uri)['total']
    for i in range(0, length, 50):
        init_data = sp.playlist_tracks(uri, limit=50, offset=i)
        try:
            for k in init_data['items']:
                songs.append(k['track']["id"])
        except (TypeError, KeyError):
            # An entry without track data stops the rest of this page from being read
            pass

Time2 = datetime.now()
print("Execution time:", Time2 -Time1)


# Extract the artists
# Note: this loop reads playlists_sample, while the songs loop reads playlists_extract
Time1 = datetime.now()
artists = []
for uri in playlists_sample['uri']:

    length = sp.playlist_tracks(uri)['total']
    for i in range(0, length, 50):
        init_data = sp.playlist_tracks(uri, limit=50, offset=i)
        try:
            for k in init_data['items']:
                artists.append(k['track']['artists'][0]['name'])
        except (TypeError, KeyError, IndexError):
            # An entry without track or artist data stops the rest of this page from being read
            pass

Time2 = datetime.now()
print("Execution time:", Time2 -Time1)

print(len(songs))

Execution time: 0:00:23.969576
Execution time: 0:00:24.760783
8194
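
Because songs and artists are collected in two separate loops, the two lists are not guaranteed to line up entry by entry. A sketch of a single pass that keeps each track id next to its first artist and skips entries without a usable track; `fetch_page` is a hypothetical stand-in for a wrapper around `sp.playlist_tracks`, so the example runs offline:

```python
def collect_tracks(fetch_page, playlist_uris, page_size=50):
    """Return a list of (track_id, first_artist_name) pairs.

    fetch_page(uri, limit, offset) is an assumed callable returning a dict
    with 'items' and 'total', shaped like the playlist-tracks response.
    """
    pairs = []
    for uri in playlist_uris:
        offset = 0
        while True:
            page = fetch_page(uri, page_size, offset)
            for entry in page["items"]:
                track = entry.get("track")
                if not track or not track.get("id"):
                    continue  # entry without a usable track id
                artists = track.get("artists") or [{}]
                pairs.append((track["id"], artists[0].get("name")))
            offset += page_size
            if offset >= page["total"]:
                break
    return pairs


# Invented playlist with one entry whose track is missing.
FAKE_PLAYLIST = [
    {"track": {"id": "t1", "artists": [{"name": "A"}]}},
    {"track": None},
    {"track": {"id": "t2", "artists": [{"name": "B"}, {"name": "C"}]}},
]


def fake_fetch_page(uri, limit, offset):
    return {"items": FAKE_PLAYLIST[offset:offset + limit], "total": len(FAKE_PLAYLIST)}


pairs = collect_tracks(fake_fetch_page, ["spotify:playlist:demo"], page_size=2)
print(pairs)
```

Keeping the pairs together means the song list and the artist list can later be split without losing their correspondence.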


In [22]:
# Count the most repeated songs
from collections import Counter
data_playlist_songs = pd.DataFrame.from_dict(Counter(songs), orient='index').reset_index().rename(columns={"index":"spotify_id", 0:"count"})
data_playlist_songs.sort_values(by=['count'], ascending=False)

Unnamed: 0,spotify_id,count
197,6I9VzXrHxO9rA9A5euc8Ak,12
1930,3JeLRXZaxfF5HAR9DRarN6,11
211,1B75hgRqe7A4fwee3g3Wmu,11
196,2xLMifQCjDGFmkHkpNLD9h,10
1949,4OWFKyIuXMuDmdPq5ws7E5,10
...,...,...
2615,5XcZRgJv3zMhTqCyESjQrF,1
2613,56JyMaElW79S7TDWh1Zw1m,1
2612,1xOXXYh6lTW8laxlW7JP2J,1
2609,70N86eeUCGLpJm2jbucrfe,1
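
For a quick look at the top entries only, `Counter.most_common` gives the same ranking without building a DataFrame. A small sketch on made-up ids (the `demo_songs` list is invented for illustration):

```python
from collections import Counter

# Made-up track ids standing in for the `songs` list.
demo_songs = ["a", "b", "a", "c", "a", "b"]

# The two most frequent ids with their counts, highest first.
top = Counter(demo_songs).most_common(2)
print(top)
```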


In [64]:
# A preliminary test of the FEATURES and GENRE extraction runs on these lists first
test_list = songs
test_list_artist = artists

## Feature Extraction

We fetch the features of each song and store the information in a DataFrame.

In [71]:
# IMPORTANT: this cell takes a long time to run

# Function that fetches the metadata and audio features I am interested in, given a track id
def getTrackFeatures(id):
  meta = sp.track(id)
  features = sp.audio_features(id)

  # meta
  name = meta['name']
  album = meta['album']['name']
  artist = meta['album']['artists'][0]['name']
  release_date = meta['album']['release_date']
  length = meta['duration_ms']
  popularity = meta['popularity']

  # features
  acousticness = features[0]['acousticness']
  danceability = features[0]['danceability']
  energy = features[0]['energy']
  instrumentalness = features[0]['instrumentalness']
  liveness = features[0]['liveness']
  loudness = features[0]['loudness']
  speechiness = features[0]['speechiness']
  valence = features[0]['valence']
  tempo = features[0]['tempo']
  time_signature = features[0]['time_signature']
  id = features[0]['id']

  track = [name, album, artist, release_date, length, popularity,
           acousticness, danceability, energy, instrumentalness,
           liveness, loudness, speechiness, valence, tempo, time_signature, id]
  return track


# Loop to fetch the features for every song in a list
# NOTE: this takes a while depending on the number of songs, so I run a test first
import time
list_toextract = test_list

tracks = []

Time1 = datetime.now()


for track_id in list_toextract:
    time.sleep(.5)  # brief pause between requests
    track = getTrackFeatures(track_id)
    tracks.append(track)
    
Time2 = datetime.now()

print("Elapsed time:", Time2 -Time1)

# Put the information into a DataFrame
data_features = pd.DataFrame(tracks, columns = ['name', 'album', 'artist', 'release_date',
                                             'length', 'popularity','acousticness', 'danceability', 'energy',
                                             'instrumentalness', 'liveness', 'loudness',
                                             'speechiness', 'valence','tempo', 'time_signature', 'id'])

data_features_final = data_features [['name','artist','album','release_date','length', 'popularity',
                                             'acousticness', 'danceability', 'energy',
                                             'instrumentalness', 'liveness', 'loudness',
                                             'speechiness', 'valence','tempo', 'time_signature', 'id']]

data_features_final = data_features_final.rename(columns = {'id':'spotify_id'})


# Finally, store the release year of each song in a new column, which will come in handy in one of the analyses
data_features_final['release_date_year'] = pd.to_datetime(data_features_final['release_date']).dt.year
print(data_features_final.shape)
data_features_final.head(4)

# Normalize the features (min-max scaling to the range [0, 1])
features_to_normalize = ['length', 'popularity', 'loudness', 'tempo', 'speechiness']

data_features_final[features_to_normalize] = data_features_final[features_to_normalize].apply(lambda x:(x-x.min()) / (x.max()-x.min()))

data_features_final.head(2)

Elapsed time: 0:05:45.355921
(600, 18)


Unnamed: 0,name,artist,album,release_date,length,popularity,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,valence,tempo,time_signature,spotify_id,release_date_year
0,Mess Around (feat. B00sted),Enrey,Mess Around (feat. B00sted),2018-11-05,0.281041,0.0,0.0969,0.643,0.842,0.0,0.103,0.91606,0.157725,0.131,0.36218,4,2uNXNNIs1gpxVac4s9oW2s,2018
1,Show You,JJ Young,Show You,2019-09-02,0.16219,0.17,0.0651,0.584,0.779,0.0,0.0584,0.899471,0.227468,0.216,0.773158,4,3NodDyiiU5ncq4EHsdE6Tf,2019
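
The extraction loop makes two requests per track (`sp.track` and `sp.audio_features`) plus a half-second pause, which is why 600 songs take several minutes. As far as I know, spotipy's `audio_features` accepts a list of up to 100 ids and `sp.tracks` a list of up to 50, so the requests could be grouped. A sketch of the grouping logic only: `fetch_batch` is a hypothetical callable standing in for a thin wrapper around one of those calls, and the fake backend below exists just to make the example runnable offline.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(ids, fetch_batch, batch_size):
    """Call fetch_batch once per chunk and concatenate the results in order.

    fetch_batch is an assumed callable that takes a list of ids and returns
    one result per id.
    """
    results = []
    for batch in chunked(ids, batch_size):
        results.extend(fetch_batch(batch))
    return results


# Offline demo: the fake backend records the size of each call it receives.
calls = []


def fake_fetch(batch):
    calls.append(len(batch))
    return [{"id": track_id} for track_id in batch]


demo_ids = [f"id{n}" for n in range(230)]
features = fetch_in_batches(demo_ids, fake_fetch, batch_size=100)
print(len(features), calls)
```

With this grouping, 230 ids need three calls instead of 230, and the results keep the order of the input list.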


## Genre Extraction

In [75]:
# IMPORTANT: this cell takes a long time to run

# Select the final, complete list to extract
list_toextract_artist = test_list_artist

# Function to extract the artists, in batches of 50 because that is the maximum the Spotify API allows
def extract_artist(artist_list):
    count_artist= 0
    data = ['test11111111', 'tes111111', 'test111111', 0, 0]  # placeholder row used to initialize the DataFrame
    df_final = pd.DataFrame([data], columns=["id", "name", "genres", 'followers', 'popularity'])
    
    API_LIMIT = 50
    search_list = []

    for artist in artist_list:
        count_artist += 1
        search_results = sp.search(q = 'artist:"{}"'.format(artist), type="artist",  limit=API_LIMIT)
        
        count_oneartist= 0 # Counter to keep only the first search result, which I have checked is always the right one
        for t in search_results['artists']['items']:
            count_oneartist += 1
            if count_oneartist ==2:
                break
            else:
                search_list += [[t["id"], t["name"], t["genres"], t['followers']['total'], t['popularity']]]
        
        df_search = pd.DataFrame(search_list, columns=["id", "name", "genres", 'followers', 'popularity'])
        
        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
        # This creates duplicates, but they do not affect what we need
        df_final = pd.concat([df_final, df_search])
    print('Artist count:', count_artist)  
    return df_final

# Extract the data
Time1 = datetime.now()
df_genres = extract_artist(list_toextract_artist)
Time2 = datetime.now()
print("Execution time:", Time2 -Time1)

print(df_genres.shape)


# Running time depends on the number of artists; under 1 minute
# Create a copy of the genres column to keep it, and convert it to strings
df_genres['genres_together'] = df_genres['genres'].copy()
df_genres['genres_together'] = [','.join(i) if isinstance(i, list) else i for i in df_genres['genres_together']]

# The genres column is a list of genres, so I chose to split it into separate columns and keep certain columns

df_genres2 = pd.concat([df_genres['genres'].apply(pd.Series), df_genres[['genres_together',
                                                                         'id',
                                                                         "name",
                                                                         'followers',
                                                                         'popularity']]], axis = 1)
# Remove duplicated artists
df_genres3 = df_genres2.drop_duplicates(df_genres2.columns[~df_genres2.columns.isin(['id'])])

# Rename columns (only the first ten genre columns are mapped; any extra ones keep their numeric names)
df_genres4 = df_genres3.rename(columns={0:'genre1',
                   1:'genre2',
                   2:'genre3',
                   3:'genre4',
                   4:'genre5',
                   5:'genre6',
                   6:'genre7',
                   7:'genre8',
                   8:'genre9',
                   9:'genre10',
                   'genres_together':'genres_together',
                   'id':'artist_id',
                   'name':'artist',
                   'followers':'artist_followers',
                   'popularity':'artist_popularity'}).copy()
print(df_genres4.shape)
print(len(df_genres4['artist'].unique()))
df_genres4.head(10)

Artist count: 600
Execution time: 0:00:52.907718
(179926, 5)
(468, 18)
468


Unnamed: 0,genre1,genre2,genre3,genre4,genre5,genre6,genre7,genre8,genre9,genre10,10,11,12,genres_together,artist_id,artist,artist_followers,artist_popularity
0,test111111,,,,,,,,,,,,,test111111,test11111111,tes111111,0,0
0,,,,,,,,,,,,,,,3QfW1vNOZSUUE1aQeOAC8X,Enrey,1123,18
1,,,,,,,,,,,,,,,65xpK3aDoQFRyk2YV15ywx,JJ Young,99,7
2,,,,,,,,,,,,,,,2DPMosZa5TBJmERNH6oPC3,Muenster,9112,29
3,,,,,,,,,,,,,,,1sVpPtWTMrItmmy2y4CznH,Ashlii,648,16
4,,,,,,,,,,,,,,,43p4wOmaJgejZzTzENNYva,George D. Van Currant,459,22
7,,,,,,,,,,,,,,,11tDOoVVUQlHM6qYasQMkx,Limber De Coco,148,7
8,,,,,,,,,,,,,,,2oquQxGby0jWLStBm5MbCS,Joshbrappin,109,3
9,,,,,,,,,,,,,,,2xgy3yLWfWfYwooshMSZQQ,Philly Swain,12684,51
10,,,,,,,,,,,,,,,3JKndn6eZWMJ72rqQguTtq,Rpm 8ight 6ix,113,13
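
The extraction function adds the whole accumulated `search_list` to the result on every iteration, which would explain why the intermediate frame reaches 179,926 rows before de-duplication. A sketch of the same idea that searches each distinct artist name once and keeps only the first hit; `search_artist` is a hypothetical stand-in for a wrapper around `sp.search(..., type="artist")`, and the catalogue below is invented so the example runs offline:

```python
def extract_artists(artist_names, search_artist):
    """Return one row per distinct artist name, using the first search hit.

    search_artist(name) is an assumed callable returning the list of artist
    items for that name (possibly empty), shaped like the API's artist objects.
    """
    rows = []
    for name in dict.fromkeys(artist_names):  # de-duplicate, keep first-seen order
        hits = search_artist(name)
        if not hits:
            continue  # no match for this name
        top_hit = hits[0]
        rows.append({
            "id": top_hit["id"],
            "name": top_hit["name"],
            "genres": top_hit["genres"],
            "followers": top_hit["followers"]["total"],
            "popularity": top_hit["popularity"],
        })
    return rows


# Fake catalogue for an offline check (all values invented for the demo).
FAKE_ARTISTS = {
    "Artist A": [{"id": "x1", "name": "Artist A", "genres": ["pop", "latin"],
                  "followers": {"total": 10}, "popularity": 5}],
    "Artist B": [],
}

rows = extract_artists(["Artist A", "Artist B", "Artist A"], FAKE_ARTISTS.get)
print(rows)
```

Passing `rows` to `pd.DataFrame` would give the same five columns as the original function, without the placeholder row or the repeated entries.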


In [69]:
df_genres4.isnull().sum(axis = 0)

genre1               131
genre2               272
genre3               340
genre4               378
genre5               405
genre6               425
genre7               436
genre8               444
genre9               453
genre10              460
10                   462
11                   464
12                   466
genres_together        0
artist_id              0
Artist                 0
artist_followers       0
artist_popularity      0
dtype: int64
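
The null counts grow from `genre1` onward because each artist has a different number of genres, so the wide layout is mostly padding. A long layout with one row per artist and genre avoids that; a sketch with `DataFrame.explode` on invented data:

```python
import pandas as pd

# Invented artists for illustration; `genres` holds lists, as in the API response.
demo = pd.DataFrame({
    "artist": ["Artist A", "Artist B", "Artist C"],
    "genres": [["pop", "latin"], [], ["rock"]],
})

# One row per (artist, genre); an artist with an empty list gets a NaN genre.
long_genres = demo.explode("genres").rename(columns={"genres": "genre"})

# value_counts ignores the NaN rows by default.
genre_counts = long_genres["genre"].value_counts()
print(long_genres)
print(genre_counts)
```

In this shape, counting genres is a single `value_counts` call instead of a pass over thirteen columns.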

In [78]:
# Export songs list
data_glo = pd.merge(data_features_final, df_genres4, on=['artist'], how='inner')
data_glo.head(5)
data_glo.to_csv("data_ana_playlist_songs.csv", sep = ',')
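
The inner merge on `artist` silently drops songs whose artist name has no match in the genre table. A sketch, on invented data, of how `indicator=True` can show which rows would be lost before exporting:

```python
import pandas as pd

# Invented frames standing in for data_features_final and df_genres4.
songs_demo = pd.DataFrame({"name": ["s1", "s2", "s3"],
                           "artist": ["Artist A", "Artist B", "Artist Z"]})
genres_demo = pd.DataFrame({"artist": ["Artist A", "Artist B"],
                            "genre1": ["pop", "rock"]})

# A left merge keeps every song; `_merge` says whether a genre row was found.
checked = pd.merge(songs_demo, genres_demo, on="artist", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["name", "artist"]])
```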

In [79]:
# data_ana_playlist = pd.read_csv("data_ana_playlist_songs.csv", sep = ',')