## Dataset Description

This dataset presents a comprehensive compilation of the most streamed songs on Spotify in 2024. It provides extensive insights into each track's attributes, popularity, and presence on various music platforms, offering a valuable resource for music analysts, enthusiasts, and industry professionals. The dataset includes information such as track name, artist, release date, ISRC, streaming statistics, and presence on platforms like YouTube, TikTok, and more.

Columns:

* Track Name: Name of the song (this column is named `Track` in the loaded CSV).
* Album Name: Name of the album the song belongs to.
* Artist: Name of the artist(s) of the song.
* Release Date: Date when the song was released.
* ISRC: International Standard Recording Code for the song.
* All Time Rank: Ranking of the song based on its all-time popularity.
* Track Score: Score assigned to the track based on various factors.
* Spotify Streams: Total number of streams on Spotify.
* Spotify Playlist Count: Number of Spotify playlists the song is included in.
* Spotify Playlist Reach: Reach of the song across Spotify playlists.
* Spotify Popularity: Popularity score of the song on Spotify.
* YouTube Views: Total views of the song's official video on YouTube.
* YouTube Likes: Total likes on the song's official video on YouTube.
* TikTok Posts: Number of TikTok posts featuring the song.
* TikTok Likes: Total likes on TikTok posts featuring the song.
* TikTok Views: Total views on TikTok posts featuring the song.
* YouTube Playlist Reach: Reach of the song across YouTube playlists.
* Apple Music Playlist Count: Number of Apple Music playlists the song is included in.
* AirPlay Spins: Number of times the song has been played on radio stations.
* SiriusXM Spins: Number of times the song has been played on SiriusXM.
* Deezer Playlist Count: Number of Deezer playlists the song is included in.
* Deezer Playlist Reach: Reach of the song across Deezer playlists.
* Amazon Playlist Count: Number of Amazon Music playlists the song is included in.
* Pandora Streams: Total number of streams on Pandora.
* Pandora Track Stations: Number of Pandora stations featuring the song.
* Soundcloud Streams: Total number of streams on Soundcloud.
* Shazam Counts: Total number of times the song has been Shazamed.
* TIDAL Popularity: Popularity score of the song on TIDAL.
* Explicit Track: Indicates whether the song contains explicit content.

In [34]:
import pandas as pd

In [35]:
df = pd.read_csv("./datasets/spotify_songs_2024.csv", encoding="unicode_escape")
df.head()

Unnamed: 0,Track,Album Name,Artist,Release Date,ISRC,All Time Rank,Track Score,Spotify Streams,Spotify Playlist Count,Spotify Playlist Reach,...,SiriusXM Spins,Deezer Playlist Count,Deezer Playlist Reach,Amazon Playlist Count,Pandora Streams,Pandora Track Stations,Soundcloud Streams,Shazam Counts,TIDAL Popularity,Explicit Track
0,MILLION DOLLAR BABY,Million Dollar Baby - Single,Tommy Richman,4/26/2024,QM24S2402528,1,725.4,390470936,30716,196631588,...,684,62.0,17598718,114.0,18004655,22931,4818457.0,2669262,,0
1,Not Like Us,Not Like Us,Kendrick Lamar,5/4/2024,USUG12400910,2,545.9,323703884,28113,174597137,...,3,67.0,10422430,111.0,7780028,28444,6623075.0,1118279,,1
2,i like the way you kiss me,I like the way you kiss me,Artemas,3/19/2024,QZJ842400387,3,538.4,601309283,54331,211607669,...,536,136.0,36321847,172.0,5022621,5639,7208651.0,5285340,,0
3,Flowers,Flowers - Single,Miley Cyrus,1/12/2023,USSM12209777,4,444.9,2031280633,269802,136569078,...,2182,264.0,24684248,210.0,190260277,203384,,11822942,,0
4,Houdini,Houdini,Eminem,5/31/2024,USUG12403398,5,423.3,107034922,7223,151469874,...,1,82.0,17660624,105.0,4493884,7006,207179.0,457017,,1


In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4600 entries, 0 to 4599
Data columns (total 29 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Track                       4600 non-null   object 
 1   Album Name                  4600 non-null   object 
 2   Artist                      4595 non-null   object 
 3   Release Date                4600 non-null   object 
 4   ISRC                        4600 non-null   object 
 5   All Time Rank               4600 non-null   object 
 6   Track Score                 4600 non-null   float64
 7   Spotify Streams             4487 non-null   object 
 8   Spotify Playlist Count      4530 non-null   object 
 9   Spotify Playlist Reach      4528 non-null   object 
 10  Spotify Popularity          3796 non-null   float64
 11  YouTube Views               4292 non-null   object 
 12  YouTube Likes               4285 non-null   object 
 13  TikTok Posts                3427 

In [37]:
# Summary statistics for the numeric columns
df.describe()

Unnamed: 0,Track Score,Spotify Popularity,Apple Music Playlist Count,Deezer Playlist Count,Amazon Playlist Count,TIDAL Popularity,Explicit Track
count,4600.0,3796.0,4039.0,3679.0,3545.0,0.0,4600.0
mean,41.844043,63.501581,54.60312,32.310954,25.348942,,0.358913
std,38.543766,16.186438,71.61227,54.274538,25.989826,,0.479734
min,19.4,1.0,1.0,1.0,1.0,,0.0
25%,23.3,61.0,10.0,5.0,8.0,,0.0
50%,29.9,67.0,28.0,15.0,17.0,,0.0
75%,44.425,73.0,70.0,37.0,34.0,,1.0
max,725.4,96.0,859.0,632.0,210.0,,1.0


## Data cleaning

### Release Date

In [38]:
# Convert the column from object to datetime
df["Release Date"] = pd.to_datetime(df["Release Date"])
df["Release Date"] 

0      2024-04-26
1      2024-05-04
2      2024-03-19
3      2023-01-12
4      2024-05-31
          ...    
4595   2017-09-05
4596   2018-07-27
4597   2020-02-28
4598   2016-11-08
4599   2017-04-11
Name: Release Date, Length: 4600, dtype: datetime64[ns]

In [39]:
# Check that the parsed result is consistent with the raw value
df["Release Date"].loc[0].strftime('%a %d %b %Y')

'Fri 26 Apr 2024'
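
The conversion above lets pandas infer the date layout. The preview rows look like month/day/year (for example `4/26/2024`), so a stricter variant can pass an explicit `format`. This is only a sketch: it assumes every row follows that layout, and the sample values below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample in the same month/day/year layout as the preview rows.
raw_dates = pd.Series(["4/26/2024", "5/4/2024", "1/12/2023", "not a date"])

# An explicit format avoids day/month ambiguity; errors="coerce" turns
# values that do not match the format into NaT instead of raising.
parsed_dates = pd.to_datetime(raw_dates, format="%m/%d/%Y", errors="coerce")

# Count the rows that failed to parse so they can be inspected.
unparsed_count = int(parsed_dates.isna().sum())
```

A non-zero `unparsed_count` would indicate rows that need a closer look before relying on the datetime column.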

### All Time Rank

In [40]:
# Values containing commas were detected. Remove the commas and
# convert the column to 'int'.
df["All Time Rank"] = df["All Time Rank"].str.replace(",","")
df = df.astype({"All Time Rank":"int"})
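
The notebook does not show how the comma-formatted values were found. One possible way to find such columns is sketched below: it flags non-numeric columns whose non-null values consist only of digits and commas. The helper name `looks_like_grouped_integer` and the miniature frame are hypothetical.

```python
import pandas as pd

# Hypothetical miniature frame mimicking the raw CSV column types.
sample = pd.DataFrame(
    {
        "Track": ["Song A", "Song B", "Song C"],
        "All Time Rank": ["1", "2", "1,004"],
        "Spotify Streams": ["390,470,936", None, "601,309,283"],
    }
)

def looks_like_grouped_integer(column: pd.Series) -> bool:
    """True if every non-null value consists only of digits and commas."""
    values = column.dropna().astype(str)
    return len(values) > 0 and bool(values.str.fullmatch(r"\d[\d,]*").all())

# Columns already parsed as numbers are skipped; only text columns are tested.
candidates = [
    name
    for name in sample.columns
    if not pd.api.types.is_numeric_dtype(sample[name])
    and looks_like_grouped_integer(sample[name])
]
```

Here `candidates` lists the columns that are stored as text but look like integers with thousands separators.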

### Spotify data columns

These columns should be numeric but are detected as `object`: *Spotify Streams*, *Spotify Playlist Count*, and *Spotify Playlist Reach*. They are read as `object` because their values use commas as thousands separators, so pandas represents them as strings. The cleaning procedure removes the commas and converts each column to the appropriate integer type.

The data is converted to `Int64`, a nullable integer type that can hold missing values.

In [41]:
# Values containing commas were detected. Remove the commas and convert to `Int64`,
# a nullable integer type that can hold missing values.
df["Spotify Streams"] = df["Spotify Streams"].str.replace(",","")
df["Spotify Playlist Count"] = df["Spotify Playlist Count"].str.replace(",","")
df["Spotify Playlist Reach"] = df["Spotify Playlist Reach"].str.replace(",","")
df = df.astype(
    {
        "Spotify Streams":"Int64",
        "Spotify Playlist Count":"Int64",
        "Spotify Playlist Reach":"Int64"
    }
)
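
The same two steps (strip the commas, cast to `Int64`) repeat for every affected column, so they could be wrapped in a small helper. The function below is a hypothetical sketch, not part of the original notebook. Unlike the cell above, it uses `pd.to_numeric(..., errors="coerce")`, so unexpected text becomes `<NA>` instead of raising an error.

```python
import pandas as pd

def to_nullable_int(column: pd.Series) -> pd.Series:
    """Strip thousands separators and convert to the nullable Int64 dtype."""
    # regex=False treats "," as a literal character rather than a pattern.
    stripped = column.str.replace(",", "", regex=False)
    # errors="coerce" turns anything that is still not numeric into NaN.
    return pd.to_numeric(stripped, errors="coerce").astype("Int64")

# Hypothetical values in the same shape as the raw Spotify columns.
raw_streams = pd.Series(["390,470,936", None, "2,031,280,633"])
streams = to_nullable_int(raw_streams)
```

Applied to the real frame, this would look like `df["Spotify Streams"] = to_nullable_int(df["Spotify Streams"])`, assuming the column still holds the raw strings.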


### YouTube data columns

These columns should be numeric but are detected as `object`: *YouTube Views* and *YouTube Likes*. They are read as `object` because their values use commas as thousands separators, so pandas represents them as strings. The cleaning procedure removes the commas and converts each column to the appropriate integer type.

The data is converted to `Int64`, a nullable integer type that can hold missing values.

In [44]:
# Values containing commas were detected. Remove the commas and convert to `Int64`,
# a nullable integer type that can hold missing values.
df["YouTube Views"] = df["YouTube Views"].str.replace(",","")
df["YouTube Likes"] = df["YouTube Likes"].str.replace(",","")
df = df.astype(
    {
        "YouTube Views":"Int64",
        "YouTube Likes":"Int64",
    }
)
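
An alternative to cleaning after loading is to let the parser handle the separators through the `thousands` parameter of `pd.read_csv`. The sketch below uses a made-up two-row CSV held in memory. Whether this option works well with the real file and its `unicode_escape` encoding has not been checked here.

```python
import io
import pandas as pd

# Hypothetical two-row CSV; quoted fields keep the commas inside one value.
csv_text = (
    "Track,YouTube Views,YouTube Likes\n"
    'Song A,"84,274,754","1,713,126"\n'
    'Song B,,"3,486,739"\n'
)

# thousands="," makes the parser drop the separators while reading numbers.
sample = pd.read_csv(io.StringIO(csv_text), thousands=",")

# A column with a missing value is read as float64, so both columns are
# cast to the nullable Int64 dtype afterwards to keep whole numbers.
youtube_columns = ["YouTube Views", "YouTube Likes"]
sample = sample.astype({name: "Int64" for name in youtube_columns})
```

With this approach the explicit `str.replace` step is unnecessary, but the final cast to `Int64` is still needed for columns that contain missing values.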

### TikTok data columns

These columns should be numeric but are detected as `object`: *TikTok Posts*, *TikTok Likes*, and *TikTok Views*. They are read as `object` because their values use commas as thousands separators, so pandas represents them as strings. The cleaning procedure removes the commas and converts each column to the appropriate integer type.

The data is converted to `Int64`, a nullable integer type that can hold missing values.

In [46]:
# Values containing commas were detected. Remove the commas and convert to `Int64`,
# a nullable integer type that can hold missing values.
df["TikTok Posts"] = df["TikTok Posts"].str.replace(",","")
df["TikTok Likes"] = df["TikTok Likes"].str.replace(",","")
df["TikTok Views"] = df["TikTok Views"].str.replace(",","")
df = df.astype(
    {
        "TikTok Posts":"Int64",
        "TikTok Likes":"Int64",
        "TikTok Views":"Int64"
    }
)
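
After a conversion like this, it can be worth confirming that no real value was lost. One way, sketched here on made-up data, is to compare the non-null counts before and after cleaning. The frame and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical raw values shaped like the TikTok columns before cleaning.
raw = pd.DataFrame(
    {
        "TikTok Posts": ["5,767,700", None, "438,300"],
        "TikTok Likes": ["651,565,900", "35,223,547", None],
    }
)

cleaned = raw.copy()
for name in raw.columns:
    stripped = raw[name].str.replace(",", "", regex=False)
    cleaned[name] = pd.to_numeric(stripped, errors="coerce").astype("Int64")

# If coercion had turned a real value into <NA>, the non-null count of that
# column would drop; a difference of zero means nothing was lost.
lost_values = raw.notna().sum() - cleaned.notna().sum()
```

The same comparison can be read off the two `df.info()` outputs in this notebook, where the non-null counts of the converted columns should match before and after cleaning.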

In [47]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4600 entries, 0 to 4599
Data columns (total 29 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   Track                       4600 non-null   object        
 1   Album Name                  4600 non-null   object        
 2   Artist                      4595 non-null   object        
 3   Release Date                4600 non-null   datetime64[ns]
 4   ISRC                        4600 non-null   object        
 5   All Time Rank               4600 non-null   int32         
 6   Track Score                 4600 non-null   float64       
 7   Spotify Streams             4487 non-null   Int64         
 8   Spotify Playlist Count      4530 non-null   Int64         
 9   Spotify Playlist Reach      4528 non-null   Int64         
 10  Spotify Popularity          3796 non-null   float64       
 11  YouTube Views               4292 non-null   Int64       