In [1]:
import pandas as pd
import matplotlib.pyplot as plt

## 🎧 Exploring Spotify data
Here we explore data extracted from a Spotify user's account to get a sense of their profile and musical taste. We will work with two datasets:
1. **playlists_tracks**: lists all of the user's public playlists and the tracks they contain
2. **genres_artists**: lists the artists that appear in the playlists and their genres

In [28]:
genres_artists = pd.read_csv('datasets/genres_artists.csv')
playlists_tracks = pd.read_csv('datasets/playlists_tracks.csv')

In [29]:
genres_artists.head()

|  | Unnamed: 0 | artist_name | genres |
| --- | --- | --- | --- |
| 0 | 0 | Rage Against The Machine | rap metal, rap rock, alternative metal, nu metal |
| 1 | 1 | System Of A Down | nu metal, metal, alternative metal, rap metal,... |
| 2 | 2 | Metallica | metal, thrash metal, rock, heavy metal, hard rock |
| 3 | 3 | Dead Kennedys | punk, hardcore punk, skate punk, proto-punk |
| 4 | 4 | The Hives | garage rock |


In [30]:
playlists_tracks.head()

|  | Unnamed: 0 | playlist_id | playlist_name | track_id | track_name | artist_names | album_name | album_release_date | popularity | duration_ms |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 2VhIiQbHHMKRPEeCukaeZt | Climinha de presencial | 59WN2psjkt1tyaxjspN8fp | Killing In The Name | Rage Against The Machine | Rage Against The Machine - XX (20th Anniversar... | 1992 | 79 | 313573 |
| 1 | 1 | 2VhIiQbHHMKRPEeCukaeZt | Climinha de presencial | 2DlHlPMa4M17kufBvI2lEN | Chop Suey! | System Of A Down | Toxicity | 2001-09-04 | 85 | 210240 |
| 2 | 2 | 2VhIiQbHHMKRPEeCukaeZt | Climinha de presencial | 3DwQ7AH3xGD9h65ezslm6q | Enter Sandman - Remastered 2021 | Metallica | Metallica (Remastered Deluxe Box Set) | 2021-09-10 | 83 | 331560 |
| 3 | 3 | 2VhIiQbHHMKRPEeCukaeZt | Climinha de presencial | 1VNWaY3uNfoeWqb5U8x2QX | Lonely Day | System Of A Down | Hypnotize | 2005-11-22 | 82 | 167906 |
| 4 | 4 | 2VhIiQbHHMKRPEeCukaeZt | Climinha de presencial | 6capuHDLxgoUUjDD0UhAgN | Holiday in Cambodia | Dead Kennedys | Give Me Convenience or Give Me Death | 1987-06-15 | 53 | 224866 |


### Loading and Overview of the Data ###

In [31]:
genres_artists.shape

(934, 3)

In [32]:
playlists_tracks.shape

(8661, 10)

In [33]:
genres_artists.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 934 entries, 0 to 933
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Unnamed: 0   934 non-null    int64 
 1   artist_name  934 non-null    object
 2   genres       660 non-null    object
dtypes: int64(1), object(2)
memory usage: 22.0+ KB


In [34]:
playlists_tracks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8661 entries, 0 to 8660
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Unnamed: 0          8661 non-null   int64 
 1   playlist_id         8661 non-null   object
 2   playlist_name       8661 non-null   object
 3   track_id            8661 non-null   object
 4   track_name          8659 non-null   object
 5   artist_names        8659 non-null   object
 6   album_name          8659 non-null   object
 7   album_release_date  8661 non-null   object
 8   popularity          8661 non-null   int64 
 9   duration_ms         8661 non-null   int64 
dtypes: int64(3), object(7)
memory usage: 676.8+ KB


In [38]:
# Number of tracks per playlist
playlists_tracks['playlist_name'].value_counts()

playlist_name
Caminhando sem padrão nenhum                  311
Rockzim pra cantar junto                      300
cantando feat. rebolando                      274
tudo misturado curtindo uma chuva             261
Radio em '00                                  241
                                             ... 
indie mtv 6h em 2008                           18
Nunca canso de ouvir                           16
Instrumental                                   16
Músicas de ouvir 2x pra aproveitar direito     16
Euro trip #3                                    6
Name: count, Length: 89, dtype: int64

### Cleaning and transformations

Before analyzing the playlists and genres, let's prepare the data:
1. Extract only the **year** from the album release date
2. Remove tracks with a **missing name**
3. Reshape the genres table so that each **genre has its own row**

In [40]:
# 1. Extract the year (first step: parse the release date into a datetime)
playlists_tracks['album_released_date_ajusted'] = pd.to_datetime(
    playlists_tracks['album_release_date'], errors="coerce") 

In [43]:
playlists_tracks.columns

Index(['Unnamed: 0', 'playlist_id', 'playlist_name', 'track_id', 'track_name',
       'artist_names', 'album_name', 'album_release_date', 'popularity',
       'duration_ms', 'album_released_date_ajusted'],
      dtype='object')

In [44]:
playlists_tracks['album_released_year'] = playlists_tracks['album_released_date_ajusted'].dt.year 

In [46]:
playlists_tracks.columns

Index(['Unnamed: 0', 'playlist_id', 'playlist_name', 'track_id', 'track_name',
       'artist_names', 'album_name', 'album_release_date', 'popularity',
       'duration_ms', 'album_released_date_ajusted', 'album_released_year'],
      dtype='object')

In [47]:
playlists_tracks.head()

|  | Unnamed: 0 | playlist_id | playlist_name | track_id | track_name | artist_names | album_name | album_release_date | popularity | duration_ms | album_released_date_ajusted | album_released_year |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 2VhIiQbHHMKRPEeCukaeZt | Climinha de presencial | 59WN2psjkt1tyaxjspN8fp | Killing In The Name | Rage Against The Machine | Rage Against The Machine - XX (20th Anniversar... | 1992 | 79 | 313573 | 1992-01-01 | 1992.0 |
| 1 | 1 | 2VhIiQbHHMKRPEeCukaeZt | Climinha de presencial | 2DlHlPMa4M17kufBvI2lEN | Chop Suey! | System Of A Down | Toxicity | 2001-09-04 | 85 | 210240 | NaT | NaN |
| 2 | 2 | 2VhIiQbHHMKRPEeCukaeZt | Climinha de presencial | 3DwQ7AH3xGD9h65ezslm6q | Enter Sandman - Remastered 2021 | Metallica | Metallica (Remastered Deluxe Box Set) | 2021-09-10 | 83 | 331560 | NaT | NaN |
| 3 | 3 | 2VhIiQbHHMKRPEeCukaeZt | Climinha de presencial | 1VNWaY3uNfoeWqb5U8x2QX | Lonely Day | System Of A Down | Hypnotize | 2005-11-22 | 82 | 167906 | NaT | NaN |
| 4 | 4 | 2VhIiQbHHMKRPEeCukaeZt | Climinha de presencial | 6capuHDLxgoUUjDD0UhAgN | Holiday in Cambodia | Dead Kennedys | Give Me Convenience or Give Me Death | 1987-06-15 | 53 | 224866 | NaT | NaN |
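
In the preview above, only the year-only value `1992` was converted; the rows with full `YYYY-MM-DD` dates came out as `NaT`, so `album_released_year` is missing for them. A likely cause is that `pd.to_datetime` settles on a single format for the whole column and `errors="coerce"` silently turns every value that does not match into `NaT`. Below is a minimal sketch of one workaround, assuming every release date starts with a four-digit year as in the rows shown: take the first four characters instead of parsing the full date. The toy frame, its missing value, and the name `year_text` are illustrative only.

```python
import pandas as pd

# Toy frame mirroring the mixed-precision values seen in album_release_date
# (the None entry is hypothetical and only shows how gaps are handled).
sample = pd.DataFrame(
    {"album_release_date": ["1992", "2001-09-04", "2021-09-10", None]}
)

# The year is the first four characters, whatever the precision of the date.
year_text = sample["album_release_date"].str[:4]

# Non-numeric or missing values become <NA>; Int64 keeps the years as integers.
sample["album_released_year"] = (
    pd.to_numeric(year_text, errors="coerce").astype("Int64")
)

print(sample)
```

If the full date is needed rather than just the year, passing `format="mixed"` to `pd.to_datetime` is another option on pandas 2.0 or later.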


In [50]:
# 2. Drop rows with a missing track_name
playlists_tracks_clean = playlists_tracks.dropna(subset=['track_name'])
playlists_tracks_clean.shape

(8659, 12)

In [49]:
# 3. Explode the genres (one row per artist and genre)


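
One way to carry out step 3, sketched on a toy frame built from rows shown in `genres_artists.head()` plus one hypothetical artist without genres (in the real table 274 of the 934 rows have no `genres` value). It assumes genres are separated by commas, as in the preview; the names `genres_artists_sample` and `genres_exploded` are illustrative.

```python
import pandas as pd

# Rows copied from genres_artists.head(); the last artist is hypothetical and
# stands in for the rows whose genres value is missing.
genres_artists_sample = pd.DataFrame({
    "artist_name": ["Metallica", "Dead Kennedys", "The Hives", "Hypothetical Artist"],
    "genres": [
        "metal, thrash metal, rock, heavy metal, hard rock",
        "punk, hardcore punk, skate punk, proto-punk",
        "garage rock",
        None,
    ],
})

genres_exploded = (
    genres_artists_sample
    .dropna(subset=["genres"])                              # artists without genres are dropped
    .assign(genre=lambda df: df["genres"].str.split(","))   # string -> list of genres
    .explode("genre")                                       # one row per list element
    .assign(genre=lambda df: df["genre"].str.strip())       # remove the space after each comma
    .drop(columns="genres")
    .reset_index(drop=True)
)

print(genres_exploded.shape)
print(genres_exploded.head())
```

Whether artists without genres should be dropped or kept under a placeholder label depends on the question being asked; dropping them is only the simplest choice.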

### Visualization with Matplotlib ###

#### Which playlists have the most tracks? ####
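
A minimal sketch of one way to answer this with a horizontal bar chart. To stay self-contained it uses the five largest counts copied from the `value_counts()` output earlier in the notebook; the name `top_playlists` and the chart styling are illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Top five counts copied from the value_counts() output shown earlier.
# On the full data this would be:
#   playlists_tracks["playlist_name"].value_counts().head(5)
top_playlists = pd.Series(
    {
        "Caminhando sem padrão nenhum": 311,
        "Rockzim pra cantar junto": 300,
        "cantando feat. rebolando": 274,
        "tudo misturado curtindo uma chuva": 261,
        "Radio em '00": 241,
    },
    name="count",
)

fig, ax = plt.subplots(figsize=(8, 4))
# barh draws the first row at the bottom, so sort ascending to put the
# largest playlist at the top of the chart.
top_playlists.sort_values().plot.barh(ax=ax)
ax.set_xlabel("Number of tracks")
ax.set_ylabel("Playlist")
ax.set_title("Playlists with the most tracks")
fig.tight_layout()
```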

#### How does the number of tracks in the playlists vary by release year? ####
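
A possible approach is to count tracks per release year and draw the counts as a line. The sketch below uses made-up years so that it runs on its own; on the real data the input would be the `album_released_year` column, assuming it has been filled in for every track. The name `tracks_per_year` is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative years only, not taken from the dataset.
years = pd.Series(
    [1987, 1992, 2001, 2001, 2005, 2005, 2005, 2021],
    name="album_released_year",
)

# Count tracks per year, then order the result chronologically.
tracks_per_year = years.value_counts().sort_index()

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(tracks_per_year.index, tracks_per_year.values, marker="o")
ax.set_xlabel("Album release year")
ax.set_ylabel("Number of tracks")
ax.set_title("Tracks in the playlists by release year")
fig.tight_layout()
```

Years with no tracks are simply absent from `tracks_per_year`, so the line connects neighbouring years directly; reindexing over the full year range with a fill value of 0 would show those gaps explicitly.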

#### Which artists have the highest average popularity? ####
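
One way to approach this is to group by `artist_names` and average `popularity`. The sketch below uses the five rows shown in `playlists_tracks.head()`; the names `sample` and `mean_popularity` are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# The five rows shown in playlists_tracks.head(), reduced to two columns.
sample = pd.DataFrame({
    "artist_names": [
        "Rage Against The Machine",
        "System Of A Down",
        "Metallica",
        "System Of A Down",
        "Dead Kennedys",
    ],
    "popularity": [79, 85, 83, 82, 53],
})

# Average popularity per artist, highest first.
mean_popularity = (
    sample.groupby("artist_names")["popularity"]
    .mean()
    .sort_values(ascending=False)
)

fig, ax = plt.subplots(figsize=(8, 4))
mean_popularity.plot.bar(ax=ax)
ax.set_xlabel("Artist")
ax.set_ylabel("Mean popularity")
ax.set_title("Artists by average track popularity")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
```

On the full data, an artist with a single very popular track can top this ranking, so it may be worth keeping only artists with some minimum number of tracks before averaging.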

#### Which artists appear most often in the playlists? ####
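
A sketch of a frequency count on `artist_names`, again using the artists from the five rows shown in `playlists_tracks.head()`. It assumes each value names a single artist, as in those rows; if collaborations are stored as several names in one string, they would need to be split first. The name `artist_counts` is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Artists of the five tracks shown in playlists_tracks.head().
artist_names = pd.Series(
    [
        "Rage Against The Machine",
        "System Of A Down",
        "Metallica",
        "System Of A Down",
        "Dead Kennedys",
    ],
    name="artist_names",
)

# Number of playlist entries per artist, keeping the ten most frequent.
artist_counts = artist_names.value_counts().head(10)

fig, ax = plt.subplots(figsize=(8, 4))
# Sort ascending so the most frequent artist is drawn at the top.
artist_counts.sort_values().plot.barh(ax=ax)
ax.set_xlabel("Number of tracks")
ax.set_ylabel("Artist")
ax.set_title("Most frequent artists in the playlists")
fig.tight_layout()
```

Because the same track can sit in more than one playlist, this counts playlist entries rather than distinct songs; dropping duplicates on `track_id` beforehand would count distinct tracks instead.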

#### Which genres are most common among the playlists' artists? ####
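
A possible sketch: split each artist's `genres` string, give every genre its own row, and count. The toy data comes from `genres_artists.head()`; the System Of A Down entry is truncated in that preview, so only its visible genres are used here. It assumes `", "` is the separator, and the name `genre_counts` is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Genre strings from genres_artists.head(). The second entry is cut off in the
# preview, so it lists only the genres that are visible there.
genres = pd.Series(
    [
        "rap metal, rap rock, alternative metal, nu metal",
        "nu metal, metal, alternative metal, rap metal",
        "metal, thrash metal, rock, heavy metal, hard rock",
        "punk, hardcore punk, skate punk, proto-punk",
        "garage rock",
    ],
    name="genres",
)

# One row per (artist, genre) pair, then the number of artists per genre.
genre_counts = genres.dropna().str.split(", ").explode().value_counts()

fig, ax = plt.subplots(figsize=(8, 5))
# Keep the ten most common genres and draw the most common one at the top.
genre_counts.head(10).sort_values().plot.barh(ax=ax)
ax.set_xlabel("Number of artists")
ax.set_ylabel("Genre")
ax.set_title("Most common genres among playlist artists")
fig.tight_layout()
```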

#### What is the average track duration per playlist? ####
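
A sketch that averages `duration_ms` per playlist and converts the result to minutes. The first playlist uses the five durations shown in `playlists_tracks.head()`; the second playlist and its durations are hypothetical and exist only to give the chart two bars. The name `mean_minutes` is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

sample = pd.DataFrame({
    "playlist_name": ["Climinha de presencial"] * 5 + ["Hypothetical playlist"] * 2,
    # First five values are from playlists_tracks.head(); the last two are made up.
    "duration_ms": [313573, 210240, 331560, 167906, 224866, 180000, 240000],
})

# Mean duration per playlist, converted from milliseconds to minutes.
mean_minutes = (
    sample.groupby("playlist_name")["duration_ms"]
    .mean()
    .div(60_000)
    .sort_values()
)

fig, ax = plt.subplots(figsize=(8, 3))
mean_minutes.plot.barh(ax=ax)
ax.set_xlabel("Mean track duration (minutes)")
ax.set_ylabel("Playlist")
ax.set_title("Average track duration per playlist")
fig.tight_layout()
```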

#### Exercise: Is there a relationship between release year and average track duration? ####

#### Exercise: How diverse in genres is each playlist? ####
In other words, which playlists have the most distinct genres among the artists that appear in them?