## Top Spotify Songs in 73 countries (Daily Updated)

https://www.kaggle.com/datasets/asaniczka/top-spotify-songs-in-73-countries-daily-updated

Interesting Task Ideas:
1. Identify the most popular genres of music in different countries over time.
2. Analyze the change in rankings of songs over time to identify trends and patterns.
3. Investigate whether there is a correlation between song popularity and its danceability or energy level.
4. Explore the relationship between the explicitness of songs and their popularity.
5. Analyze the relationship between song features (such as acousticness or instrumentalness) and their popularity.
6. Predict the future popularity of songs based on historical data and machine learning algorithms.
7. Compare the top songs in different countries to identify cultural music preferences.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('./universal_top_spotify_songs.csv')

df.info()       # About 40,000 rows (40,172) and 25 columns in total. The column with the most nulls is country

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40172 entries, 0 to 40171
Data columns (total 25 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   spotify_id          40172 non-null  object 
 1   name                40171 non-null  object 
 2   artists             40171 non-null  object 
 3   daily_rank          40172 non-null  int64  
 4   daily_movement      40172 non-null  int64  
 5   weekly_movement     40172 non-null  int64  
 6   country             39620 non-null  object 
 7   snapshot_date       40172 non-null  object 
 8   popularity          40172 non-null  int64  
 9   is_explicit         40172 non-null  bool   
 10  duration_ms         40172 non-null  int64  
 11  album_name          40171 non-null  object 
 12  album_release_date  40171 non-null  object 
 13  danceability        40172 non-null  float64
 14  energy              40172 non-null  float64
 15  key                 40172 non-null  int64  
 16  loud

In [10]:
df.head()
# What do I see?
# The artists column is a single string: when a song is a collaboration, it lists the
# artists separated by commas. The main artist is usually the first one.
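Because `artists` packs every credited artist into one comma-separated string, a lead-artist column can be derived by splitting it. The sketch below uses a few hand-typed sample rows rather than the real CSV, and it assumes the separator is always `", "` and that the first name is the lead artist. An artist whose own name contains a comma (for example "Tyler, The Creator") would be split incorrectly under this assumption.

```python
import pandas as pd

# Hand-typed sample rows mimicking the `artists` column (comma-separated names)
sample = pd.DataFrame({
    "name": ["Seven (feat. Latto) (Explicit Ver.)", "greedy", "PERRO NEGRO"],
    "artists": ["Jung Kook, Latto", "Tate McRae", "Bad Bunny, Feid"],
})

# Split each string on ", " into a list of artist names (NaN stays NaN)
sample["artist_list"] = sample["artists"].str.split(", ")
# Assumption: the first listed artist is the lead artist
sample["lead_artist"] = sample["artist_list"].str[0]
# Number of credited artists per song (1 means no collaboration)
sample["n_artists"] = sample["artist_list"].str.len()

print(sample[["name", "lead_artist", "n_artists"]])
```

To count appearances per individual artist rather than per lead artist, `sample.explode("artist_list")` turns each list into one row per artist.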

In [16]:
df.head()

,spotify_id,name,artists,daily_rank,daily_movement,weekly_movement,country,snapshot_date,popularity,is_explicit,...,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
0,7x9aauaA9cu6tyfpHnqDLo,Seven (feat. Latto) (Explicit Ver.),"Jung Kook, Latto",1,1,2,,2023-10-28,97,True,...,11,-4.107,1,0.0434,0.311,0.0,0.0815,0.89,124.997,4
1,2HafqoJbgXdtjwCOvNEF14,Si No Estás,iñigo quintero,2,-1,2,,2023-10-28,98,False,...,5,-8.72,1,0.0285,0.827,0.0,0.138,0.524,98.224,4
2,3rUGC1vUpkDG9CZFHMur1t,greedy,Tate McRae,3,0,-1,,2023-10-28,99,True,...,6,-3.18,0,0.0319,0.256,0.0,0.114,0.844,111.018,1
3,7iQXYTyuG13aoeHxGG28Nh,PERRO NEGRO,"Bad Bunny, Feid",4,1,3,,2023-10-28,95,True,...,5,-2.248,1,0.262,0.0887,2.2e-05,0.179,0.345,96.057,4
4,4MjDJD8cW7iVeWInc2Bdyj,MONACO,Bad Bunny,5,-1,-4,,2023-10-28,96,True,...,4,-5.009,0,0.068,0.15,0.000402,0.58,0.13,139.056,4


In [12]:
df['country'].value_counts()   

country
ZA    555
RO    553
AT    553
DO    553
GR    553
     ... 
CZ    550
JP    550
UY    550
ES    550
LU    457
Name: count, Length: 72, dtype: int64
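`value_counts()` drops missing values by default, so the 552 rows with a null `country` do not appear in the list above (72 codes are listed). Passing `dropna=False` counts them as well. In the `df.head()` output those null-country rows carry ranks 1 to 5 for a single date, which suggests they are a global chart rather than bad data, but that interpretation is an assumption worth checking against the dataset description. The sketch uses a small made-up frame in place of the real data.

```python
import pandas as pd

# Made-up miniature of the chart data: None stands for a missing country code
sample = pd.DataFrame({
    "name": ["Seven", "greedy", "Water", "Water", "greedy"],
    "country": [None, None, "ZA", "ZA", "ES"],
})

# Default behaviour: missing values are silently left out of the counts
default_counts = sample["country"].value_counts()
# dropna=False adds an entry for the missing values
counts = sample["country"].value_counts(dropna=False)

print(default_counts)
print(counts)
```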

In [13]:
df['name'].value_counts()   # The most frequent titles already stand out: Paint The Town Red, greedy, Si No Estás,
                            # Strangers and Seven.

name
Paint The Town Red                         485
greedy                                     484
Si No Estás                                471
Strangers                                  439
Seven (feat. Latto) (Explicit Ver.)        356
                                          ... 
immer busy                                   1
Get Close                                    1
Januar til December                          1
Bite My Head Off (feat. Paul McCartney)      1
Deimantai                                    1
Name: count, Length: 2208, dtype: int64
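These counts are numbers of rows, i.e. chart entries summed over days and countries, not numbers of distinct countries. Counting by title alone can also merge different tracks that happen to share a name. The sketch below, built on invented rows (the IDs and "Artist A"/"Artist B" are hypothetical), contrasts counting by `name` with counting by `(name, artists)`; grouping by `spotify_id` would be another way to keep tracks apart.

```python
import pandas as pd

# Invented rows: two different tracks share the title "Strangers"
sample = pd.DataFrame({
    "spotify_id": ["id_a", "id_a", "id_b", "id_c"],
    "name": ["Strangers", "Strangers", "Strangers", "greedy"],
    "artists": ["Artist A", "Artist A", "Artist B", "Tate McRae"],
})

# Counting by title merges distinct tracks with the same name
by_name = sample["name"].value_counts()
# Counting by (name, artists) keeps them separate
by_track = sample.groupby(["name", "artists"]).size().sort_values(ascending=False)

print(by_name)
print(by_track)
```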

In [3]:
df1 = df[['name', 'artists', 'country']]  # Select the columns of interest into a new DataFrame

In [4]:
df1.head()
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40172 entries, 0 to 40171
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     40171 non-null  object
 1   artists  40171 non-null  object
 2   country  39620 non-null  object
dtypes: object(3)
memory usage: 941.7+ KB


In [28]:
df1.isnull()    

,name,artists,country
0,False,False,True
1,False,False,True
2,False,False,True
3,False,False,True
4,False,False,True
...,...,...,...
40167,False,False,False
40168,False,False,False
40169,False,False,False
40170,False,False,False


In [29]:
null_cols = df1.isnull().sum()
null_cols

name         1
artists      1
country    552
dtype: int64
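Before dropping anything, it can help to see which rows hold the nulls: the single missing `name` and the single missing `artists` may well be the same row, while the 552 missing `country` values form a much larger group. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Made-up sample: one row missing name/artists, two rows missing country
sample = pd.DataFrame({
    "name": ["Seven", None, "Water", "greedy"],
    "artists": ["Jung Kook, Latto", None, "Tyla", "Tate McRae"],
    "country": [None, "ZA", "ZA", None],
})

# Per-column null counts, largest first
null_counts = sample.isnull().sum().sort_values(ascending=False)
# Rows with at least one missing value in any column
rows_with_nulls = sample[sample.isnull().any(axis=1)]
# Rows missing the song metadata (name or artists)
missing_meta = sample[sample[["name", "artists"]].isnull().any(axis=1)]

print(null_counts)
print(rows_with_nulls)
print(missing_meta)
```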

In [5]:
df2 = df1.dropna().reset_index()   # Drop every row with a null in any column and reset the index (the old index is kept as an 'index' column)

In [6]:
df2.head()

,index,name,artists,country
0,50,iPlan,"Dlala Thukzin, Zaba, Sykes",ZA
1,51,Imnandi lento (feat. Tman Xpress),"Mellow & Sleazy, SjavasDaDeejay, TitoM, Tman X...",ZA
2,52,Water,Tyla,ZA
3,53,Isgubhu,"Sam Deep, Njelic, Aymos",ZA
4,54,Awukhuzeki,"DJ Stokie, Omit ST, Sobzeen, Zee_nhle",ZA
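The extra `index` column in `df2` comes from `reset_index()`, which moves the old index into a column by default; `reset_index(drop=True)` discards it instead. `dropna(subset=...)` restricts which columns are checked, which matters here because dropping every null-country row also removes what appear to be the global-chart rows. A sketch on invented data:

```python
import pandas as pd

# Invented sample with nulls in different columns
sample = pd.DataFrame({
    "name": ["Seven", None, "Water", "greedy"],
    "artists": ["Jung Kook, Latto", None, "Tyla", "Tate McRae"],
    "country": [None, "ZA", "ZA", "ES"],
})

# As in the notebook: the old index survives as an extra "index" column
kept_old_index = sample.dropna().reset_index()
# drop=True discards the old index instead of turning it into a column
clean = sample.dropna().reset_index(drop=True)
# subset= limits the check: only missing name/artists cause a row to be dropped
songs_only = sample.dropna(subset=["name", "artists"]).reset_index(drop=True)

print(kept_old_index)
print(clean)
print(songs_only)
```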


In [7]:
name_country = df.groupby("name")["country"].nunique().reset_index()   
# name_country holds, for each song name, the number of distinct countries it appears in (the count column keeps the name `country`), with the index reset

In [10]:
name_country = name_country.sort_values(by="country", ascending=False)
# Now sort by that country count in descending order

In [11]:
name_country.head() # This shows that 'Si No Estás' appears in the Top 50 of 43 countries

,name,country
1230,Paint The Town Red,45
1840,greedy,45
1453,Si No Estás,43
1522,Strangers,39
1436,Seven (feat. Latto) (Explicit Ver.),37
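The two cells above can be written as one chain. Renaming the result avoids a column called `country` that actually holds a count, and since `nunique()` ignores NaN by default, the null-country rows are not counted as a country. A sketch on a made-up frame:

```python
import pandas as pd

# Made-up rows: repeated entries per country, plus one null country
sample = pd.DataFrame({
    "name": ["greedy", "greedy", "greedy", "Water", "Water", "Si No Estás"],
    "country": ["ES", "ES", "MX", None, "ZA", "ES"],
})

# Distinct countries per song, sorted from most to fewest
countries_per_song = (
    sample.groupby("name")["country"]
    .nunique()                 # NaN is not counted as a country
    .rename("n_countries")     # clearer name than "country" for a count
    .sort_values(ascending=False)
    .reset_index()
)

print(countries_per_song)
```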


In [None]:
# Find out in which countries 'Si No Estás' is trending

In [12]:
df2.head()    # Back to df2: it is the clean DataFrame without nulls, and it shows each song's country

,index,name,artists,country
0,50,iPlan,"Dlala Thukzin, Zaba, Sykes",ZA
1,51,Imnandi lento (feat. Tman Xpress),"Mellow & Sleazy, SjavasDaDeejay, TitoM, Tman X...",ZA
2,52,Water,Tyla,ZA
3,53,Isgubhu,"Sam Deep, Njelic, Aymos",ZA
4,54,Awukhuzeki,"DJ Stokie, Omit ST, Sobzeen, Zee_nhle",ZA


In [14]:
cancion = 'Si No Estás'


In [17]:
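# Return the countries whose charts include the song stored in `cancion`.
# query() expects the whole condition as one string; `@cancion` refers to the Python variable.
# The first attempt, df.query('name' == 'Si No Estás'), compared the two strings in Python
# and passed the resulting bool to query(), which raised:
# ValueError: expr must be a string to be evaluated, <class 'bool'> given
df2.query('name == @cancion')['country'].unique()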
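The same lookup can be done with a boolean mask and `.loc`, which avoids the string-expression syntax of `query()` altogether. The sketch also counts chart entries per country for the song; reading those counts as "days on the chart" assumes each `snapshot_date` contributes at most one row per song and country. `df2_sample` is a hypothetical stand-in for `df2`.

```python
import pandas as pd

# Hypothetical stand-in for df2 (no nulls)
df2_sample = pd.DataFrame({
    "name": ["Si No Estás", "Si No Estás", "greedy", "Si No Estás"],
    "country": ["ES", "ES", "ES", "MX"],
    "snapshot_date": ["2023-10-27", "2023-10-28", "2023-10-28", "2023-10-28"],
})
cancion = "Si No Estás"

# Boolean mask: True where the title matches the song
mask = df2_sample["name"] == cancion
# Distinct countries where the song charts, sorted for readability
countries = sorted(df2_sample.loc[mask, "country"].unique())
# Chart entries per country for that song
entries_per_country = df2_sample.loc[mask, "country"].value_counts()

print(countries)
print(entries_per_country)
```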