# Data Analysis Extra - Playlists

After the earlier analyses, which did not give the results I expected, I decided to run a last-minute analysis of the playlists that users created during the pandemic. These may show clearer indicators of a change in the type of music between before and during the pandemic.

The process is to find the playlists whose name contains a pandemic-related word (I limited myself to Spanish) and extract all their songs, then analyze their Features and Genre.

In [392]:
# See: https://towardsdatascience.com/what-music-do-people-listen-to-during-the-coronavirus-a-report-using-data-science-1a2035d12430
# Read: https://blog.chartmetric.com/covid-19-effect-on-the-global-music-business-part-1-genre/
# https://gist.github.com/ilias1111/e503bbab0a98c20377686cc75ffad451


## Setup

In [393]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import seaborn as sns
from datetime import datetime
import time
import altair as alt

# Load Spotify credentials (the file's first line is read as the header row: client_id,client_secret)
passw = pd.read_csv("pass_spotify.txt", sep = ',', encoding="utf-8")
client_id = passw.columns[0]
client_secret = passw.columns[1]

client_credentials_manager = SpotifyClientCredentials(client_id, client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

We load simplified versions of functions from earlier Notebooks to extract features

In [394]:
##### Function to extract the song features I am interested in, given the track id
def getTrackFeatures(id):
  meta = sp.track(id)
  features = sp.audio_features(id)

  # meta
  name = meta['name']
  album = meta['album']['name']
  artist = meta['album']['artists'][0]['name']
  release_date = meta['album']['release_date']
  length = meta['duration_ms']
  popularity = meta['popularity']

  # features
  acousticness = features[0]['acousticness']
  danceability = features[0]['danceability']
  energy = features[0]['energy']
  instrumentalness = features[0]['instrumentalness']
  liveness = features[0]['liveness']
  loudness = features[0]['loudness']
  speechiness = features[0]['speechiness']
  valence = features[0]['valence']
  tempo = features[0]['tempo']
  time_signature = features[0]['time_signature']
  id = features[0]['id']

  track = [name, album, artist, release_date, length, popularity,
           acousticness, danceability, energy, instrumentalness,
           liveness, loudness, speechiness, valence, tempo, time_signature, id]
  return track


##### Loop to extract the features of every song in a list; returns a DataFrame

def extract_songs(list_toextract):
    tracks = []
    for track_id in list_toextract:
        time.sleep(.5)  # brief pause between API calls
        tracks.append(getTrackFeatures(track_id))

    # Put the info into a DataFrame
    data_features = pd.DataFrame(tracks, columns = ['name', 'album', 'artist', 'release_date',
                                                 'length', 'popularity','acousticness', 'danceability', 'energy',
                                                 'instrumentalness', 'liveness', 'loudness',
                                                 'speechiness', 'valence','tempo', 'time_signature', 'id'])

    # Reorder the columns (artist before album)
    data_features_final = data_features[['name','artist','album','release_date','length', 'popularity',
                                                 'acousticness', 'danceability', 'energy',
                                                 'instrumentalness', 'liveness', 'loudness',
                                                 'speechiness', 'valence','tempo', 'time_signature', 'id']]

    data_features_final = data_features_final.rename(columns = {'id':'spotify_id'})

    # Finally, store the release year of the song in a new column, which will be useful in one of the analyses
    data_features_final['release_date_year'] = pd.DatetimeIndex(data_features_final['release_date']).year
    print(data_features_final.shape)

    # Normalize features (min-max scaling to [0, 1] within this dataset)
    features_to_normalize = ['length', 'popularity', 'loudness', 'tempo', 'speechiness']

    data_features_final[features_to_normalize] = data_features_final[features_to_normalize].apply(lambda x:(x-x.min()) / (x.max()-x.min()))

    return data_features_final
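
One point worth keeping in mind about the normalization step: because min and max are taken from whichever dataset is passed in, the same song gets different values in different batches (in the outputs further down, "Safaera" has a normalized `length` of 1.0 in the top-15 table and 0.193178 in the top-10000 table). The following is a minimal sketch, not part of the original notebook, of scaling several datasets with shared bounds so that their means are comparable. The helper names `min_max_bounds` and `min_max_apply` and the toy numbers are hypothetical.

```python
import pandas as pd

def min_max_bounds(df, columns):
    """Return {column: (min, max)} computed from a reference DataFrame."""
    return {col: (df[col].min(), df[col].max()) for col in columns}

def min_max_apply(df, bounds):
    """Scale the given columns to [0, 1] using precomputed bounds."""
    out = df.copy()
    for col, (lo, hi) in bounds.items():
        span = hi - lo
        # A constant column would divide by zero; map it to 0.0 instead
        out[col] = 0.0 if span == 0 else (df[col] - lo) / span
    return out

# Toy data (made-up durations in ms, not taken from the notebook)
small = pd.DataFrame({"length": [180000, 240000, 300000]})
large = pd.DataFrame({"length": [60000, 180000, 300000, 660000]})

# Bounds come from the union of both datasets, so both share one scale
shared = min_max_bounds(pd.concat([small, large]), ["length"])
small_scaled = min_max_apply(small, shared)
large_scaled = min_max_apply(large, shared)
```

With shared bounds, a 180000 ms track maps to the same value in both frames, which is what a comparison of means across batches implicitly assumes.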
    



# Question 4: What kind of music is in the playlists created for the pandemic period? Has it changed?

First, we look for playlists that were created "for" the coronavirus. To do so, we look at their titles: if a title contains any of the following words, we keep the playlist: "covid", "pandemia", "cuarentena", 'covid19', "coronavirus", 'fernando simón'.

Next we run the full extraction: playlists, songs, and song features. From the total set of songs in the coronavirus playlists (there are many) we keep only the most repeated ones, in two batches, one small and one larger. The idea is to compare these two batches with the Top100 of 2019, a playlist of the most-played songs of 2019.

In [395]:
# Inspect how the Spotify API structures its response, so we can search for a word in playlist names
# Note: the 'q=' inside the string is sent as part of the query text itself (see the href in the output below)
print(sp.search('q=fernando simón', type='playlist'))
# ["covid","cuarentena", 'pandemia','covid19', "coronavirus", 'fernando simón']

{'playlists': {'href': 'https://api.spotify.com/v1/search?query=q%3Dfernando+sim%C3%B3n&type=playlist&offset=0&limit=10', 'items': [{'collaborative': False, 'description': '', 'external_urls': {'spotify': 'https://open.spotify.com/playlist/01bawF2DM0YzefGbC54Ua1'}, 'href': 'https://api.spotify.com/v1/playlists/01bawF2DM0YzefGbC54Ua1', 'id': '01bawF2DM0YzefGbC54Ua1', 'images': [{'height': None, 'url': 'https://i.scdn.co/image/ab67706c0000bebb80c823de1e04b1fe7f2073ce', 'width': None}], 'name': 'la playlist que escucha fernando simón', 'owner': {'display_name': 'mikenuggets', 'external_urls': {'spotify': 'https://open.spotify.com/user/079077tae8iz2j9qk6431p9s2'}, 'href': 'https://api.spotify.com/v1/users/079077tae8iz2j9qk6431p9s2', 'id': '079077tae8iz2j9qk6431p9s2', 'type': 'user', 'uri': 'spotify:user:079077tae8iz2j9qk6431p9s2'}, 'primary_color': None, 'public': None, 'snapshot_id': 'Mjc0LGVlNGQ1YTY3YWIzNGZmMjhiYzBiYzcxM2MxZDBkZWY4YzBjNDhlNmU=', 'tracks': {'href': 'https://api.spotify.co

We now write a fuller script that searches for the words and returns only the list of playlists that contain the word somewhere in their metadata

In [396]:
for word in ["fernando simón"]:
    print(sp.search('q="{}"'.format(word), type='playlist')['playlists']['items'])

[{'collaborative': False, 'description': '', 'external_urls': {'spotify': 'https://open.spotify.com/playlist/01bawF2DM0YzefGbC54Ua1'}, 'href': 'https://api.spotify.com/v1/playlists/01bawF2DM0YzefGbC54Ua1', 'id': '01bawF2DM0YzefGbC54Ua1', 'images': [{'height': None, 'url': 'https://i.scdn.co/image/ab67706c0000bebb80c823de1e04b1fe7f2073ce', 'width': None}], 'name': 'la playlist que escucha fernando simón', 'owner': {'display_name': 'mikenuggets', 'external_urls': {'spotify': 'https://open.spotify.com/user/079077tae8iz2j9qk6431p9s2'}, 'href': 'https://api.spotify.com/v1/users/079077tae8iz2j9qk6431p9s2', 'id': '079077tae8iz2j9qk6431p9s2', 'type': 'user', 'uri': 'spotify:user:079077tae8iz2j9qk6431p9s2'}, 'primary_color': None, 'public': None, 'snapshot_id': 'Mjc0LGVlNGQ1YTY3YWIzNGZmMjhiYzBiYzcxM2MxZDBkZWY4YzBjNDhlNmU=', 'tracks': {'href': 'https://api.spotify.com/v1/playlists/01bawF2DM0YzefGbC54Ua1/tracks', 'total': 271}, 'type': 'playlist', 'uri': 'spotify:playlist:01bawF2DM0YzefGbC54Ua1'}

We run a bulk extraction of those playlists

## Extracting the coronavirus playlists

In [397]:
# Takes around 1 minute to complete
# Documentation: https://developer.spotify.com/documentation/web-api/reference/playlists/get-playlist/
# ['items']["tracks"]['items'][0]["added_at"]

# Function to extract selected fields from each playlist
def batch_proccess(x,lista):
    for i in x['playlists']['items']:
        lista.append({"name" : i['name'], "total":i["tracks"]['total'], 'id':i['id'], "uri":i["uri"], "URL":i["tracks"]['href']})

# Get a listing of the playlists in blocks of 50, the maximum the API allows
Time1 = datetime.now()

list_of_playlists = []
for term in ["covid","pandemia","cuarentena",'covid19', "coronavirus", 'fernando simón']:
    for i in range(0,2000, 50):
        try:
            init_data = sp.search('q="{}"'.format(term), type='playlist', limit=50, offset=i)
            batch_proccess(init_data,list_of_playlists)
        except Exception:
            print("Error")
Time2 = datetime.now()

print("Execution time:", Time2 -Time1)

# Put the playlists into a DataFrame and take a look
playlists = pd.DataFrame(list_of_playlists).drop_duplicates().reset_index(drop=True)
print('Number of playlists:', playlists.count())
playlists.head(20)

Execution time: 0:00:44.717344
Number of playlists: name     5557
total    5557
id       5557
uri      5557
URL      5557
dtype: int64


Unnamed: 0,name,total,id,uri,URL
0,quarantine Vibes | Covid 19 |,45,6zjC569rXoYGOLwPUt37WW,spotify:playlist:6zjC569rXoYGOLwPUt37WW,https://api.spotify.com/v1/playlists/6zjC569rX...
1,Calma,151,37i9dQZF1DWY5LGZYBBHHz,spotify:playlist:37i9dQZF1DWY5LGZYBBHHz,https://api.spotify.com/v1/playlists/37i9dQZF1...
2,COVID-19 Quarantine Party,29,1Tn6OrkJhOYO5u6y16erqB,spotify:playlist:1Tn6OrkJhOYO5u6y16erqB,https://api.spotify.com/v1/playlists/1Tn6OrkJh...
3,Canciones Mas Virales que el Coronavirus 🦠🎶,106,4Y2h9TZNZersxSYeXujCNu,spotify:playlist:4Y2h9TZNZersxSYeXujCNu,https://api.spotify.com/v1/playlists/4Y2h9TZNZ...
4,Drunksouls COVID19 quarantine discovery: music...,49,64KUBwKGkUEqIU08HDhJ94,spotify:playlist:64KUBwKGkUEqIU08HDhJ94,https://api.spotify.com/v1/playlists/64KUBwKGk...
5,Plenas que no pudimos bailar por COVID,81,2U4J1vAVC8DvV3fPfrgXVy,spotify:playlist:2U4J1vAVC8DvV3fPfrgXVy,https://api.spotify.com/v1/playlists/2U4J1vAVC...
6,chill beats to quarantine to,124,5bPe4ZISvKWU5yTzsSO9lX,spotify:playlist:5bPe4ZISvKWU5yTzsSO9lX,https://api.spotify.com/v1/playlists/5bPe4ZISv...
7,Quarantine covid - 19,100,7qJzyN6atsM49NOWnqGbEK,spotify:playlist:7qJzyN6atsM49NOWnqGbEK,https://api.spotify.com/v1/playlists/7qJzyN6at...
8,MÁS PERREO MENOS COVID,99,1RYr3ngXNoPVtl7blr9Kaf,spotify:playlist:1RYr3ngXNoPVtl7blr9Kaf,https://api.spotify.com/v1/playlists/1RYr3ngXN...
9,CORONAVIRUS QUARANTINE PARTY COVID-19 2020,55,3r9u5bXzAYEvc0xBKkSnAx,spotify:playlist:3r9u5bXzAYEvc0xBKkSnAx,https://api.spotify.com/v1/playlists/3r9u5bXzA...
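
The listing above includes playlists such as "Calma" whose name contains none of the search words, which suggests the search also matches other playlist fields. If the goal is to keep only playlists whose title mentions the pandemic, a local filter on the name can be applied afterwards. This is a minimal sketch under that assumption; `normalize` and `name_matches` are hypothetical helpers, not part of the original notebook.

```python
import unicodedata

KEYWORDS = ["covid", "pandemia", "cuarentena", "covid19", "coronavirus", "fernando simón"]

def normalize(text):
    """Lowercase and strip accents so that 'Simón' also matches 'simon'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def name_matches(name, keywords=KEYWORDS):
    """True if the playlist name contains at least one keyword."""
    clean = normalize(name)
    return any(normalize(k) in clean for k in keywords)

# Names taken from the listing above, plus the 'fernando simón' example
names = ["MÁS PERREO MENOS COVID", "Calma", "la playlist que escucha fernando simón"]
kept = [n for n in names if name_matches(n)]
```

On the DataFrame this could be used as `playlists[playlists['name'].apply(name_matches)]`. Note that an English-only title such as "chill beats to quarantine to" would be dropped as well, since the keyword list is Spanish.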


## Extracting songs from a number of coronavirus playlists

In [398]:
# Number of playlists to extract songs from
playlists_extract = playlists[0:4000]

In [399]:
# IMPORTANT: takes around 30 min to extract everything, which is why we ran only a sample first

# TODO: STILL NEED TO JOIN ARTISTS WITH NAMES AND WORK OUT HOW TO MERGE THEM INTO THE FINAL DATAFRAME
# Extract the songs from the playlists so we have them all together
# Takes quite a long time, around XX

# Extract songs
Time1 = datetime.now()
songs = []
for uri in playlists_extract['uri']:

    lenght = sp.playlist_tracks(uri)['total']
    for i in range(0,lenght, 50):
        init_data = sp.playlist_tracks(uri, limit=50, offset=i)
        try:
            for k in init_data['items']:
                songs.append(k['track']["id"])
        except:
            pass

Time2 = datetime.now()
print("Execution time:", Time2 -Time1)

print(len(songs))

Execution time: 0:25:49.776226
467881
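
In the loop above the `try` wraps the whole page, so a single item without a track makes the rest of that page be skipped silently, and tracks whose `id` is `None` are still appended (they show up as an empty id in the ranking below). The following is a minimal sketch of a per-item check instead; `collect_track_ids` is a hypothetical helper, and the page dictionaries are toy stand-ins for what `sp.playlist_tracks(uri, limit=50, offset=i)` returns.

```python
def collect_track_ids(pages):
    """Collect track ids from an iterable of playlist-page dicts."""
    ids = []
    for page in pages:
        for item in page.get("items", []):
            track = item.get("track")
            # Some items come back with no track, or with a track that has no id
            if track and track.get("id"):
                ids.append(track["id"])
    return ids

# Toy pages mimicking the 'items' -> 'track' -> 'id' structure
pages = [
    {"items": [{"track": {"id": "a"}}, {"track": None}, {"track": {"id": "b"}}]},
    {"items": [{"track": {"id": None}}, {"track": {"id": "c"}}]},
]
ids = collect_track_ids(pages)
```

Checking each item individually keeps "b" and "c" here, whereas a page-level `try`/`except` would lose every item after the first bad one.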


We build a ranking of the most repeated songs

In [400]:
# Build a ranking of the most repeated songs
from collections import Counter
data_playlist_songs = pd.DataFrame.from_dict(Counter(songs), orient='index').reset_index().rename(columns={"index":"spotify_id", 0:"count"})
data_playlist_songs = data_playlist_songs.sort_values(by=['count'], ascending=False)
data_playlist_songs

Unnamed: 0,spotify_id,count
251,2DEZmgHKAvm41k4J3R2E9Y,466
326,0SqqAgdovOE24BzxIClpjw,451
6184,,373
400,7k4t7uLgtOxPwTpFmtJNTY,363
1634,6NfrH0ANGmgBXyxgV2PeXt,294
...,...,...
92374,5fgHGI9EBaQkggzi6Zaq30,1
92375,00Jrzl6fENQd3XEWGDtowy,1
92376,2sKj6wgBUnAQmRwDZkE6ND,1
92377,5MkZkFLnUrsMrVKQwqv5zV,1
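
The third row of the ranking has an empty id, which is why the following cells take one extra row and then filter out `None` (ending up with 99 and 9999 songs rather than 100 and 10000). As an alternative sketch, missing ids can be dropped before counting and `Counter.most_common` used to take exactly the top N. The `songs` list here is toy data, not the notebook's.

```python
from collections import Counter

# Toy list of track ids with some missing values
songs = ["a", "b", None, "a", "c", "a", None, "b"]

# Drop missing ids before counting so they never enter the ranking
counts = Counter(s for s in songs if s is not None)

# most_common(n) returns the n most frequent ids with their counts
top2 = counts.most_common(2)
top2_ids = [track_id for track_id, _ in top2]
```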


Out of curiosity, we can take a look at the top 15 songs of the quarantine. Keep in mind that the playlist listing was worldwide in scope.

In [401]:
# Select the first 15 (we take 16 because one entry is None)
list_toextract_exitos = data_playlist_songs["spotify_id"].tolist()
list_toextract_exitos = list_toextract_exitos[0:16]
print(len(list_toextract_exitos))

# Remove Nones, there is a stray one
list_toextract_exitos = list(filter(None, list_toextract_exitos)) 
print(len(list_toextract_exitos))

# Extract songs
top_songs_pandemia = extract_songs(list_toextract_exitos)
top_songs_pandemia

16
15
(15, 18)


Unnamed: 0,name,artist,album,release_date,length,popularity,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,valence,tempo,time_signature,spotify_id,release_date_year
0,Safaera,Bad Bunny,YHLQMDLG,2020-02-28,1.0,0.266667,0.0103,0.607,0.829,0.0,0.107,0.745668,1.0,0.685,0.031083,4,2DEZmgHKAvm41k4J3R2E9Y,2020
1,Yo Perreo Sola,Bad Bunny,YHLQMDLG,2020-02-28,0.104727,0.266667,0.021,0.86,0.758,6.5e-05,0.344,0.573041,0.067771,0.453,0.039431,4,0SqqAgdovOE24BzxIClpjw,2020
2,Tusa,KAROL G,Tusa,2019-11-07,0.314509,0.466667,0.295,0.803,0.715,0.000134,0.0574,0.87188,0.756024,0.574,0.079795,4,7k4t7uLgtOxPwTpFmtJNTY,2019
3,La Difícil,Bad Bunny,YHLQMDLG,2020-02-28,0.038933,0.066667,0.0861,0.685,0.848,7e-06,0.0783,0.668256,0.116867,0.761,0.860935,4,6NfrH0ANGmgBXyxgV2PeXt,2020
4,Amarillo,J Balvin,Colores,2020-03-19,0.0,0.0,0.013,0.641,0.857,0.00534,0.0695,0.48323,0.76506,0.961,0.294381,5,6zEgnpM0qYmHLDnh8WPejL,2020
5,Blinding Lights,The Weeknd,After Hours,2020-03-20,0.307815,1.0,0.00146,0.514,0.73,9.5e-05,0.0897,0.450008,0.038554,0.334,0.77304,4,0VjIjW4GlUZAMYd2vXMi3b,2020
6,Azul,J Balvin,Colores,2020-03-19,0.350691,0.333333,0.0816,0.843,0.836,0.00138,0.0532,1.0,0.067771,0.65,0.009726,4,2lCkncy6bIB0LTMT7kvrD1,2020
7,Say So,Doja Cat,Hot Pink,2019-11-07,0.583224,0.333333,0.256,0.787,0.673,4e-06,0.0904,0.665713,0.334337,0.786,0.177723,4,3Dv1eDb0MEgF93GpLXlucZ,2019
8,death bed (coffee for your head),Powfu,death bed (coffee for your head),2020-02-08,0.113502,0.666667,0.731,0.726,0.431,0.0,0.696,0.0,0.26506,0.348,0.505547,4,7eJMfftS33KTjuF7lTsMCx,2020
9,Supalonely,BENEE,STELLA & STEVE,2019-11-15,0.478358,0.333333,0.305,0.863,0.631,3e-05,0.123,0.64791,0.019277,0.817,0.356339,4,4nK5YrxbMGZstTLbvj6Gxw,2019


This confirms things we already knew, such as Bad Bunny being a huge hit, but it also surfaces curiosities such as the reappearance of old hits like "U Can't Touch This" or "Don't Stand So Close To Me", which relate to social distancing.

We continue the analysis by building batches with the top100 and top10000

In [402]:
# Take the 100 most frequent songs, then the 10000 most frequent, to extract their features
data_playlist_songs_100 = data_playlist_songs[0:100]
data_playlist_songs_100 = data_playlist_songs_100["spotify_id"].tolist()
data_playlist_songs_100 = list(filter(None, data_playlist_songs_100)) 

data_playlist_songs_10000 = data_playlist_songs[0:10000]
data_playlist_songs_10000 = data_playlist_songs_10000["spotify_id"].tolist()
data_playlist_songs_10000 = list(filter(None, data_playlist_songs_10000)) 
len(data_playlist_songs_10000)

9999

## Extracting Features for the Top100 and Top10000 songs of the pandemic playlists

We extract the features of each song and put the information into a DataFrame

In [404]:
# Run the song-extraction and normalization functions for BATCH 1: 100
# IMPORTANT!!! Execution time:
Time1 = datetime.now()

data_features_songs100 = extract_songs(data_playlist_songs_100)

Time2 = datetime.now()

# Export
data_features_songs100.to_csv("data_ana_playlist_songs100.csv", sep = ',')

print("Extraction execution time:", Time2 -Time1)
data_features_songs100.shape

(99, 18)
Extraction execution time: 0:00:57.005631


(99, 18)

We will look at the means, which in this case cannot be weighted because we do not have the total Streams

In [405]:
# Run the song-extraction and normalization functions for BATCH 2: 10000
# IMPORTANT!!! Execution time:
Time1= datetime.now()

data_features_songs10000 = extract_songs(data_playlist_songs_10000)

Time2= datetime.now()

# It took over an hour and a half to extract, so we save it to CSV
data_features_songs10000.to_csv("data_ana_playlist_songs10000.csv", sep = ',')

print("Extraction execution time 10000:", Time2 -Time1)

(9999, 18)
Extraction execution time 10000: 1:35:57.610616
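
Most of that time goes into making two API calls plus a 0.5 s pause per song. `sp.audio_features` already returns a list (the function above indexes `features[0]`), and as far as I know the underlying endpoint accepts several ids per request (up to 100 according to the Spotify Web API documentation; treat that limit as an assumption to verify). The sketch below shows only the batching part, with a hypothetical `chunks` helper; the API call itself is left as a comment.

```python
def chunks(seq, size):
    """Yield consecutive slices of seq with at most `size` elements each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

# Toy ids standing in for the list of Spotify track ids
track_ids = ["id{}".format(n) for n in range(250)]

batches = list(chunks(track_ids, 100))

# Hypothetical use with the notebook's client (not executed here):
# for batch in batches:
#     features = sp.audio_features(batch)   # one request per batch of ids
#     time.sleep(.5)
```

With 9999 songs this would mean roughly 100 requests for the audio features instead of 9999, although the track metadata would still need its own batched call.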


In [408]:
data_features_songs10000

Unnamed: 0,name,artist,album,release_date,length,popularity,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,valence,tempo,time_signature,spotify_id,release_date_year
0,Safaera,Bad Bunny,YHLQMDLG,2020-02-28,0.193178,0.87,0.01030,0.607,0.829,0.000000,0.1070,0.805253,0.404951,0.685,0.283458,4,2DEZmgHKAvm41k4J3R2E9Y,2020
1,Yo Perreo Sola,Bad Bunny,YHLQMDLG,2020-02-28,0.104320,0.87,0.02100,0.860,0.758,0.000065,0.3440,0.767371,0.053486,0.453,0.288479,4,0SqqAgdovOE24BzxIClpjw,2020
2,Tusa,KAROL G,Tusa,2019-11-07,0.125141,0.90,0.29500,0.803,0.715,0.000134,0.0574,0.832950,0.312968,0.574,0.312755,4,7k4t7uLgtOxPwTpFmtJNTY,2019
3,La Difícil,Bad Bunny,YHLQMDLG,2020-02-28,0.097790,0.84,0.08610,0.685,0.848,0.000007,0.0783,0.788266,0.071996,0.761,0.782560,4,6NfrH0ANGmgBXyxgV2PeXt,2020
4,Amarillo,J Balvin,Colores,2020-03-19,0.093926,0.83,0.01300,0.641,0.857,0.005340,0.0695,0.747663,0.316375,0.961,0.441815,5,6zEgnpM0qYmHLDnh8WPejL,2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9994,33,Polo G,THE GOAT,2020-05-15,0.102825,0.73,0.12600,0.732,0.462,0.000000,0.1200,0.591879,0.435612,0.117,0.783735,4,2iW6Hb6kfrz2K7EhPibFiq,2020
9995,Return to Oz - ARTBAT Remix,Monolink,Return to Oz (ARTBAT Remix),2019-08-09,0.326646,0.64,0.56300,0.854,0.930,0.697000,0.0815,0.684770,0.024415,0.192,0.449388,4,0gwNGHcBRYtF7mvgUczVo1,2019
9996,Stronger,Kanye West,Graduation,2007-09-11,0.205230,0.02,0.00728,0.625,0.726,0.000000,0.3180,0.677689,0.150579,0.483,0.330018,4,6C7RJEIUDqKkJRZVWdkfkH,2007
9997,Crosstown Traffic,Jimi Hendrix,Electric Ladyland,1968-10-25,0.085867,0.60,0.24800,0.486,0.963,0.000002,0.1580,0.752512,0.121054,0.533,0.384598,4,1ntxpzIUbSsizvuAy6lTYY,1968


## Extracting songs and features from the Top2019 playlist

We extract the songs and features of Spotify's TopTrack 2019, which is a specific playlist published by Spotify.

In [406]:
# Toptrack2019
songs_top19 = []
uri = 'spotify:playlist:37i9dQZF1DWTqOoGdpVCf0'
lenght = sp.playlist_tracks(uri)['total']
for i in range(0,lenght, 50):
    init_data = sp.playlist_tracks(uri, limit=50, offset=i)
    try:
        for k in init_data['items']:
            songs_top19.append(k['track']["id"])
    except:
        pass

# Run the function
Time1 = datetime.now()

data_features_songs100_2019 = extract_songs(songs_top19)

Time2 = datetime.now()

# Save to CSV so the extraction does not have to be repeated
data_features_songs100_2019.to_csv("data_ana_playlist_songs100_2019.csv", sep = ',')

print("Extraction execution time top2019:", Time2 -Time1)
print(data_features_songs100_2019.shape)
data_features_songs100_2019.head(2)

(100, 18)
Extraction execution time top2019: 0:00:57.643427
(100, 18)


Unnamed: 0,name,artist,album,release_date,length,popularity,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,valence,tempo,time_signature,spotify_id,release_date_year
0,Someone You Loved,Lewis Capaldi,Divinely Uninspired To A Hellish Extent,2019-05-17,0.28095,1.0,0.751,0.501,0.405,0.0,0.105,0.74949,0.006459,0.446,0.306367,4,7qEHsqek33rTcFNT9PFqLf,2019
1,bad guy,Billie Eilish,"WHEN WE ALL FALL ASLEEP, WHERE DO WE GO?",2019-03-29,0.330765,0.977528,0.328,0.701,0.425,0.13,0.1,0.300611,0.770601,0.562,0.505711,4,2Fxmhks0bxGSBdJ92vM42m,2019


## Stream-weighted mean calculation from another Notebook

In [283]:
data_ana_corona = pd.read_csv("data_global_coronaperiod.csv", sep = ',')

# Drop some columns we will not use
data_ana_corona = data_ana_corona.drop(columns=['Unnamed: 0', 'Position'])
print('Size after dropping unneeded columns: ', data_ana_corona.shape)

# Drop duplicate songs, they are of no use for these analyses for now
data_ana_corona = data_ana_corona.drop_duplicates(subset='spotify_id').copy()
print('Size without duplicate songs: ', data_ana_corona.shape)

# To analyze with total streams taken into account, we compute each song's weight relative to the total streams of its year
# Important: do this after removing duplicates
# Short optimized version, without warnings

data_ana_corona_2020 = data_ana_corona.loc[data_ana_corona['year'] == 2020]
data_ana_corona_2019 = data_ana_corona.loc[data_ana_corona['year'] == 2019]
data_ana_corona_2018 = data_ana_corona.loc[data_ana_corona['year'] == 2018]
data_ana_corona_2017 = data_ana_corona.loc[data_ana_corona['year'] == 2017]

data_ana_corona.loc[data_ana_corona.year == 2020,
                    'streamstotal_weights'] = data_ana_corona_2020['Streamstotal']/data_ana_corona_2020['Streamstotal'].sum()
data_ana_corona.loc[data_ana_corona.year == 2019,
                    'streamstotal_weights'] = data_ana_corona_2019['Streamstotal']/data_ana_corona_2019['Streamstotal'].sum()
data_ana_corona.loc[data_ana_corona.year == 2018,
                    'streamstotal_weights'] = data_ana_corona_2018['Streamstotal']/data_ana_corona_2018['Streamstotal'].sum()
data_ana_corona.loc[data_ana_corona.year == 2017, 
                    'streamstotal_weights'] = data_ana_corona_2017['Streamstotal']/data_ana_corona_2017['Streamstotal'].sum()

Size after dropping unneeded columns:  (48066, 36)
Size without duplicate songs:  (1581, 36)




In [389]:
# Redefine the function
def mediapon_features (feature, dataframe_period):
    media_compar = dataframe_period.loc[:, ['year', feature, 'streamstotal_weights']]
    media_compar["mediaSpecial"] = media_compar[feature] * media_compar['streamstotal_weights']
    media_compar = media_compar.groupby(['year']).sum()
    media_compar.reset_index(inplace=True)
    media_compar = media_compar.drop(columns=[feature, 'streamstotal_weights', 'year'])
    media_compar["features"] = feature
    
   
    return media_compar

'''features = ['length', 'popularity','acousticness', 'danceability', 'energy', 'instrumentalness',
         'liveness', 'loudness', 'speechiness', 'valence', 'tempo']

medias = []
for x in features:
    y = mediapon_features(x, data_ana_corona)[3:]
    medias.append(y)
medias
'''
media_lenght = mediapon_features('length', data_ana_corona)[3:]
media_popularity = mediapon_features('popularity', data_ana_corona)[3:]
media_acousticness = mediapon_features('acousticness', data_ana_corona)[3:]
media_danceability = mediapon_features('danceability', data_ana_corona)[3:]
media_instrumentalness = mediapon_features('instrumentalness', data_ana_corona)[3:]
media_liveness = mediapon_features('liveness', data_ana_corona)[3:]
media_speechiness = mediapon_features('speechiness', data_ana_corona)[3:]
media_valence = mediapon_features('valence', data_ana_corona)[3:]
media_loudness = mediapon_features('loudness', data_ana_corona)[3:]
media_tempo = mediapon_features('tempo', data_ana_corona)[3:]
data_ana_special = pd.concat([media_lenght,
           media_popularity,
           media_acousticness,
           media_danceability,
           media_instrumentalness,
           media_liveness,
           media_speechiness,
           media_valence,
           media_loudness,
           media_tempo,
          ], axis=0)

##### TODO: THIS STILL NEEDS TO BE ADDED TO THE FINAL DATAFRAME, ALL TOGETHER!!!!!
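
The two cells above amount to a per-year weighted mean: each song's weight is its streams divided by the total streams of its year, and the weighted mean of a feature is the sum of feature × weight within the year. A compact sketch of the same idea with `groupby` is shown below, on toy numbers (not the notebook's data) and using the column names from the cells above.

```python
import pandas as pd

# Toy data: two songs per year with made-up streams and valence
df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020],
    "Streamstotal": [100, 300, 50, 150],
    "valence": [0.2, 0.6, 0.4, 0.8],
})

# Weight of each song = its streams / total streams of its year
year_totals = df.groupby("year")["Streamstotal"].transform("sum")
df["streamstotal_weights"] = df["Streamstotal"] / year_totals

# Weighted mean per year = sum of (feature * weight) within each year
weighted = (df["valence"] * df["streamstotal_weights"]).groupby(df["year"]).sum()
```

Because the weights within each year sum to 1, no further division is needed. The same two lines work for any number of years without one block of code per year.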

## Playlist analysis

In [409]:
# Load the DataFrames saved during the earlier extraction
data_ana_playlist_songs100 = pd.read_csv("data_ana_playlist_songs100.csv", sep = ',')
data_ana_playlist_songs100 = data_ana_playlist_songs100.drop(columns=['Unnamed: 0', 'time_signature', 'release_date', 'release_date_year'])

data_ana_playlist_songs100_2019 = pd.read_csv("data_ana_playlist_songs100_2019.csv", sep = ',')
data_ana_playlist_songs100_2019 = data_ana_playlist_songs100_2019.drop(columns=['Unnamed: 0', 'time_signature', 'release_date','release_date_year'])

data_ana_playlist_songs10000 = pd.read_csv("data_ana_playlist_songs10000.csv", sep = ',')
data_ana_playlist_songs10000 = data_ana_playlist_songs10000.drop(columns=['Unnamed: 0', 'time_signature', 'release_date','release_date_year'])


We compute the means and export them for visual analysis in Tableau

In [412]:
# Compute means (numeric_only=True skips text columns such as name, artist and album, which recent pandas versions refuse to average)
data_ana_playlist_means_songs100 = data_ana_playlist_songs100.mean(numeric_only=True).reset_index().rename(columns={"index": "features", 0: "mediaTop100_pandemia"})
data_ana_playlist_means_songs100_2019 = data_ana_playlist_songs100_2019.mean(numeric_only=True).reset_index().rename(columns={"index": "features", 0: "mediaTop100_2019"})
data_ana_playlist_means_songs10000 = data_ana_playlist_songs10000.mean(numeric_only=True).reset_index().rename(columns={"index": "features", 0: "mediaTop10000_pandemia"})

# Merge
data_ana_playlist_means_all = pd.merge(data_ana_playlist_means_songs100, data_ana_playlist_means_songs100_2019, on='features')

# Export for analysis in Tableau
#data_ana_playlist_means_all.to_csv("tableau_graph_and_analysis/data_ana_playlist_songs_totableau.csv", sep = ',')
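
The merge above combines only the Top100 pandemic means and the Top100 2019 means. If the Top10000 means are also meant to be part of the export (an assumption, since the notebook does not say), the three frames can be chained on the `features` column. A minimal sketch with toy values follows; the column names match the cell above, but the numbers are made up.

```python
from functools import reduce
import pandas as pd

# Toy means tables with the same layout as the ones computed above
means_100 = pd.DataFrame({"features": ["energy", "valence"], "mediaTop100_pandemia": [0.70, 0.60]})
means_2019 = pd.DataFrame({"features": ["energy", "valence"], "mediaTop100_2019": [0.65, 0.55]})
means_10000 = pd.DataFrame({"features": ["energy", "valence"], "mediaTop10000_pandemia": [0.68, 0.52]})

# Merge the frames pairwise, left to right, on the shared 'features' column
frames = [means_100, means_2019, means_10000]
merged = reduce(lambda left, right: pd.merge(left, right, on="features"), frames)
```

The result has one row per feature and one column per dataset, which is a convenient shape for a side-by-side chart.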

<img src="tableau_graph_and_analysis/dashboard_playlist.png" style="width: 1000px;">

## Conclusions:

During the pandemic, the music people listened to was 