# **Introdução ao Pandas**

Pandas é uma biblioteca de código aberto em Python que oferece estruturas de dados de alto desempenho e ferramentas de análise de dados. Ela é amplamente utilizada na ciência de dados e em análises estatísticas devido à sua eficiência, flexibilidade e facilidade de uso. Desenvolvida por Wes McKinney em 2008, a Pandas tem se tornado a principal escolha para manipulação e análise de dados em Python.

A importância da biblioteca Pandas para a ciência de dados é imensa, e isso se deve a várias razões:

+ **Estruturas de Dados Flexíveis:** A Pandas fornece duas estruturas de dados principais: Series e DataFrame. A Series é uma matriz unidimensional rotulada capaz de armazenar qualquer tipo de dado. O DataFrame é uma estrutura de dados tabular bidimensional semelhante a uma planilha do Excel, onde cada coluna pode ser de um tipo diferente. Essas estruturas de dados são altamente flexíveis e permitem manipular e analisar dados de maneira eficaz.

+ **Manipulação de Dados:** Pandas oferece uma ampla gama de funcionalidades para manipulação de dados, incluindo filtragem, seleção, ordenação, agrupamento e fusão de dados. Com métodos poderosos e intuitivos, os usuários podem manipular conjuntos de dados de maneira eficiente, realizando operações complexas em apenas algumas linhas de código.

+ **Limpeza de Dados:** Antes de qualquer análise de dados, é crucial limpar e preparar os dados. Pandas oferece ferramentas para lidar com dados ausentes, duplicados, inconsistentes e até mesmo outliers. Isso permite que os cientistas de dados preparem os dados para análise de forma rápida e eficiente.

+ **Integração com Outras Ferramentas:** Pandas é frequentemente utilizada em conjunto com outras bibliotecas populares de ciência de dados em Python, como NumPy, Matplotlib, Seaborn e Scikit-learn. Essa integração perfeita permite uma análise de dados completa, desde a limpeza e manipulação até a visualização e modelagem.

+ **Visualização de Dados:** Embora não seja uma biblioteca de visualização, Pandas se integra facilmente com outras bibliotecas de visualização de dados em Python, permitindo que os usuários criem gráficos e plots informativos para explorar e comunicar seus resultados.

+ **Eficiência Computacional:** Pandas é altamente otimizada para desempenho e escalabilidade. Ela é construída sobre a biblioteca NumPy, o que significa que muitas operações são executadas em código de baixo nível, proporcionando desempenho excepcional, mesmo para conjuntos de dados grandes.



**Importando a biblioteca**

In [1]:
import pandas as pd
pd.options.mode.copy_on_write = True

**Carregando os dados**

O método `pd.read_csv()` é uma função da biblioteca Pandas em Python que permite ler arquivos CSV (Comma-Separated Values) e convertê-los em um DataFrame do Pandas. Ele é usado para importar dados de arquivos CSV diretamente para o ambiente de trabalho do Python para análise de dados.

Quando você chama `pd.read_csv('nome_do_arquivo.csv')`, o Pandas lê o arquivo CSV especificado e cria um DataFrame, onde cada linha do arquivo CSV é uma linha no DataFrame e os valores separados por vírgula em cada linha são automaticamente divididos em colunas.

Este método oferece uma variedade de parâmetros opcionais para personalizar a leitura do arquivo CSV, permitindo que você especifique separadores de campos diferentes de vírgula, tratamento de valores ausentes, tipos de dados para colunas, entre outros.

Em resumo, o método `pd.read_csv()` é uma ferramenta essencial para importar dados de arquivos CSV para o Python e começar a trabalhar com eles usando Pandas.

In [2]:
df = pd.read_csv("./top10s.csv",
                 encoding="ISO-8859-1")

**Visualizando a estrutura do DataFrame**

O método `df.head()` é uma função da biblioteca Pandas em Python que é utilizada para visualizar as primeiras linhas de um DataFrame. Quando chamado em um DataFrame (`df`), ele retorna por padrão as cinco primeiras linhas do DataFrame, permitindo uma rápida visualização dos dados. Isso é útil para ter uma ideia inicial da estrutura e do conteúdo dos dados, especialmente em conjuntos de dados grandes. Além disso, é possível passar um argumento opcional para especificar o número de linhas a serem exibidas, como por exemplo `df.head(4)` para mostrar as dez primeiras linhas. Em suma, o método `df.head()` é uma ferramenta essencial para a exploração inicial de dados em Pandas.

In [3]:
df.head(4)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,4,83
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,23,82
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,71,200,10,14,80
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,4,79


**Visualizando as informações do DataFrame**

O método `df.info()` é uma função fundamental da biblioteca Pandas em Python, usada para fornecer um resumo conciso e informativo sobre um DataFrame. Quando chamado em um DataFrame, o método `df.info()` exibe informações essenciais, como o tipo de dados de cada coluna, a quantidade de valores não nulos em cada coluna e a quantidade total de memória utilizada pelo DataFrame. Essa função é útil para entender rapidamente a estrutura dos dados, identificar possíveis problemas de tipo de dados ou valores ausentes e otimizar o uso de memória. Em resumo, o método `df.info()` é uma ferramenta valiosa para a análise inicial de conjuntos de dados em Pandas.

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 603 entries, 0 to 602
Data columns (total 15 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  603 non-null    int64 
 1   title       603 non-null    object
 2   artist      603 non-null    object
 3   top genre   603 non-null    object
 4   year        603 non-null    int64 
 5   bpm         603 non-null    int64 
 6   nrgy        603 non-null    int64 
 7   dnce        603 non-null    int64 
 8   dB          603 non-null    int64 
 9   live        603 non-null    int64 
 10  val         603 non-null    int64 
 11  dur         603 non-null    int64 
 12  acous       603 non-null    int64 
 13  spch        603 non-null    int64 
 14  pop         603 non-null    int64 
dtypes: int64(12), object(3)
memory usage: 70.8+ KB


In [5]:
df2 = pd.read_csv("./SpotifyTopSongsByCountry - May 2020.csv",
                  encoding="ISO-8859-1")

In [6]:
df2.head()

Unnamed: 0,Country,Continent,Rank,Title,Artists,Album,Explicit,Duration
0,Global,Global,1,Rain On Me (with Ariana Grande),"Lady Gaga, Ariana Grande",Rain On Me (with Ariana Grande),0,3:02
1,Global,Global,2,Blinding Lights,The Weeknd,After Hours,0,3:20
2,Global,Global,3,ROCKSTAR (feat. Roddy Ricch),"DaBaby, Roddy Ricch",BLAME IT ON BABY,1,3:01
3,Global,Global,4,Roses - Imanbek Remix,"SAINt JHN, Imanbek",Roses (Imanbek Remix),1,2:56
4,Global,Global,5,Toosie Slide,Drake,Dark Lane Demo Tapes,1,4:07


In [7]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3150 entries, 0 to 3149
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Country    3150 non-null   object
 1   Continent  3150 non-null   object
 2   Rank       3150 non-null   int64 
 3   Title      3150 non-null   object
 4   Artists    3150 non-null   object
 5   Album      3150 non-null   object
 6   Explicit   3150 non-null   int64 
 7   Duration   3150 non-null   object
dtypes: int64(2), object(6)
memory usage: 197.0+ KB


O dtype mostra de forma mais resumida os tipos de dados de cada coluna

In [8]:
df2.dtypes

Country      object
Continent    object
Rank          int64
Title        object
Artists      object
Album        object
Explicit      int64
Duration     object
dtype: object

In [9]:
df2.select_dtypes(include="int64")

Unnamed: 0,Rank,Explicit
0,1,0
1,2,0
2,3,1
3,4,1
4,5,1
...,...,...
3145,46,0
3146,47,0
3147,48,0
3148,49,0


In [10]:
df2.select_dtypes(include="object")

Unnamed: 0,Country,Continent,Title,Artists,Album,Duration
0,Global,Global,Rain On Me (with Ariana Grande),"Lady Gaga, Ariana Grande",Rain On Me (with Ariana Grande),3:02
1,Global,Global,Blinding Lights,The Weeknd,After Hours,3:20
2,Global,Global,ROCKSTAR (feat. Roddy Ricch),"DaBaby, Roddy Ricch",BLAME IT ON BABY,3:01
3,Global,Global,Roses - Imanbek Remix,"SAINt JHN, Imanbek",Roses (Imanbek Remix),2:56
4,Global,Global,Toosie Slide,Drake,Dark Lane Demo Tapes,4:07
...,...,...,...,...,...,...
3145,Vietnam,Asia,ÄÃ Tá»ªNG LÃ,VÅ©.,ÄÃ Tá»ªNG LÃ,4:20
3146,Vietnam,Asia,MÆ°á»£n RÆ°á»£u Tá» TÃ¬nh,"BigDaddy, Emily",MÆ°á»£n RÆ°á»£u Tá» TÃ¬nh,3:18
3147,Vietnam,Asia,NgÃ y Táº­n Tháº¿,"TÃ³c TiÃªn, Da LAB, Touliver",NgÃ y Táº­n Tháº¿,3:52
3148,Vietnam,Asia,Äi Äu ÄÆ°a Äi,Bich Phuong,Äi Äu ÄÆ°a Äi,3:40


In [11]:
df2.describe()

Unnamed: 0,Rank,Explicit
count,3150.0,3150.0
mean,25.5,0.348889
std,14.433161,0.476694
min,1.0,0.0
25%,13.0,0.0
50%,25.5,0.0
75%,38.0,1.0
max,50.0,1.0


In [12]:
df.describe()

Unnamed: 0.1,Unnamed: 0,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
count,603.0,603.0,603.0,603.0,603.0,603.0,603.0,603.0,603.0,603.0,603.0,603.0
mean,302.0,2014.59204,118.545605,70.504146,64.379768,-5.578773,17.774461,52.225539,224.674959,14.3267,8.358209,66.52073
std,174.215384,2.607057,24.795358,16.310664,13.378718,2.79802,13.102543,22.51302,34.130059,20.766165,7.483162,14.517746
min,1.0,2010.0,0.0,0.0,0.0,-60.0,0.0,0.0,134.0,0.0,0.0,0.0
25%,151.5,2013.0,100.0,61.0,57.0,-6.0,9.0,35.0,202.0,2.0,4.0,60.0
50%,302.0,2015.0,120.0,74.0,66.0,-5.0,12.0,52.0,221.0,6.0,5.0,69.0
75%,452.5,2017.0,129.0,82.0,73.0,-4.0,24.0,69.0,239.5,17.0,9.0,76.0
max,603.0,2019.0,206.0,98.0,97.0,-2.0,74.0,98.0,424.0,99.0,48.0,99.0


In [13]:
df2.isna().sum()

Country      0
Continent    0
Rank         0
Title        0
Artists      0
Album        0
Explicit     0
Duration     0
dtype: int64

In [14]:
df["artist"].unique()

array(['Train', 'Eminem', 'Kesha', 'Lady Gaga', 'Bruno Mars',
       'Justin Bieber', 'Taio Cruz', 'OneRepublic', 'Alicia Keys',
       'Rihanna', 'Flo Rida', 'Mike Posner', 'Far East Movement', 'Usher',
       'Sean Kingston', 'The Black Eyed Peas', 'Adam Lambert', 'Maroon 5',
       'Neon Trees', 'Selena Gomez & The Scene', 'Enrique Iglesias',
       'Katy Perry', 'Britney Spears', '3OH!3', 'David Guetta',
       'Christina Aguilera', 'Florence + The Machine', 'Shakira',
       'Tinie Tempah', 'T.I.', 'Martin Solveig', 'Christina Perri',
       'Adele', 'Pitbull', 'Beyoncé', 'Hot Chelle Rae', 'Avril Lavigne',
       'Kanye West', 'LMFAO', 'Jessie J', 'Jennifer Lopez', 'Chris Brown',
       'Sleeping At Last', 'Nicki Minaj', 'P!nk', 'Coldplay',
       'One Direction', 'Taylor Swift', 'Carly Rae Jepsen',
       'Kelly Clarkson', 'Owl City', 'The Wanted', 'fun.',
       'Ellie Goulding', 'Gym Class Heroes', 'Avicii', 'The Script',
       'Miley Cyrus', 'Swedish House Mafia', 'Daft Punk'

In [15]:
df.size

9045

In [16]:
df.columns

Index(['Unnamed: 0', 'title', 'artist', 'top genre', 'year', 'bpm', 'nrgy',
       'dnce', 'dB', 'live', 'val', 'dur', 'acous', 'spch', 'pop'],
      dtype='object')

In [17]:
df.nsmallest(3, "nrgy")

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
442,443,Million Years Ago,Adele,british soul,2016,0,0,0,-60,0,0,227,0,0,0
431,432,Start,John Legend,neo mellow,2016,110,4,52,-15,9,26,310,99,4,47
255,256,Not About Angels,Birdy,neo mellow,2014,116,14,41,-10,9,23,190,97,4,56


In [18]:
df_ops = df.drop_duplicates(subset="title", keep="last")
display(df_ops)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,4,83
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,23,82
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,71,200,10,14,80
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,4,79
5,6,Baby,Justin Bieber,canadian pop,2010,65,86,73,-5,11,54,214,4,14,77
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
598,599,Find U Again (feat. Camila Cabello),Mark Ronson,dance pop,2019,104,66,61,-7,20,16,176,1,3,75
599,600,Cross Me (feat. Chance the Rapper & PnB Rock),Ed Sheeran,pop,2019,95,79,75,-6,7,61,206,21,12,75
600,601,"No Brainer (feat. Justin Bieber, Chance the Ra...",DJ Khaled,dance pop,2019,136,76,53,-5,9,65,260,7,34,70
601,602,Nothing Breaks Like a Heart (feat. Miley Cyrus),Mark Ronson,dance pop,2019,114,79,60,-6,42,24,217,1,7,69


In [19]:
df_ops.fillna(0)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,4,83
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,23,82
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,71,200,10,14,80
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,4,79
5,6,Baby,Justin Bieber,canadian pop,2010,65,86,73,-5,11,54,214,4,14,77
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
598,599,Find U Again (feat. Camila Cabello),Mark Ronson,dance pop,2019,104,66,61,-7,20,16,176,1,3,75
599,600,Cross Me (feat. Chance the Rapper & PnB Rock),Ed Sheeran,pop,2019,95,79,75,-6,7,61,206,21,12,75
600,601,"No Brainer (feat. Justin Bieber, Chance the Ra...",DJ Khaled,dance pop,2019,136,76,53,-5,9,65,260,7,34,70
601,602,Nothing Breaks Like a Heart (feat. Miley Cyrus),Mark Ronson,dance pop,2019,114,79,60,-6,42,24,217,1,7,69


In [20]:
df_ops.T

Unnamed: 0,0,1,2,3,5,6,7,8,9,10,...,593,594,595,596,597,598,599,600,601,602
Unnamed: 0,1,2,3,4,6,7,8,9,10,11,...,594,595,596,597,598,599,600,601,602,603
title,"Hey, Soul Sister",Love The Way You Lie,TiK ToK,Bad Romance,Baby,Dynamite,Secrets,Empire State of Mind (Part II) Broken Down,Only Girl (In The World),Club Can't Handle Me (feat. David Guetta),...,Call You Mine,No Guidance (feat. Drake),Antisocial (with Travis Scott),"Taki Taki (feat. Selena Gomez, Ozuna & Cardi B)",Con Calma - Remix,Find U Again (feat. Camila Cabello),Cross Me (feat. Chance the Rapper & PnB Rock),"No Brainer (feat. Justin Bieber, Chance the Ra...",Nothing Breaks Like a Heart (feat. Miley Cyrus),Kills You Slowly
artist,Train,Eminem,Kesha,Lady Gaga,Justin Bieber,Taio Cruz,OneRepublic,Alicia Keys,Rihanna,Flo Rida,...,The Chainsmokers,Chris Brown,Ed Sheeran,DJ Snake,Daddy Yankee,Mark Ronson,Ed Sheeran,DJ Khaled,Mark Ronson,The Chainsmokers
top genre,neo mellow,detroit hip hop,dance pop,dance pop,canadian pop,dance pop,dance pop,hip pop,barbadian pop,dance pop,...,electropop,dance pop,pop,electronic trap,latin,dance pop,pop,dance pop,dance pop,electropop
year,2010,2010,2010,2010,2010,2010,2010,2010,2010,2010,...,2019,2019,2019,2019,2019,2019,2019,2019,2019,2019
bpm,97,87,120,119,65,120,148,93,126,128,...,104,93,152,96,94,104,95,136,114,150
nrgy,89,93,84,92,86,78,76,37,72,87,...,70,45,82,80,87,66,79,76,79,44
dnce,67,75,76,70,73,75,52,48,79,62,...,59,70,72,84,74,61,75,53,60,70
dB,-4,-5,-3,-4,-5,-4,-6,-8,-4,-4,...,-6,-7,-5,-4,-3,-7,-6,-5,-6,-9
live,8,52,29,8,11,4,12,12,7,6,...,34,16,36,6,4,20,7,9,42,13


In [21]:
df_ops.sample(n=10)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
364,365,Love Yourself,Justin Bieber,canadian pop,2016,100,38,61,-10,28,52,234,84,44,83
41,42,Doesn't Mean Anything,Alicia Keys,hip pop,2010,104,41,71,-7,10,10,273,4,3,57
185,186,Made In The USA,Demi Lovato,dance pop,2013,87,86,58,-4,32,59,196,0,4,60
318,319,Easy Love,Sigala,dance pop,2015,124,94,68,-4,12,65,230,18,6,67
194,195,Take Back the Night,Justin Timberlake,dance pop,2013,107,66,59,-5,64,33,353,4,16,54
211,212,Stay With Me,Sam Smith,pop,2014,84,42,42,-6,11,18,173,59,4,85
56,57,Run the World (Girls),Beyoncé,dance pop,2011,127,90,73,-4,37,76,236,0,14,76
313,314,Lips Are Movin,Meghan Trainor,dance pop,2015,139,83,78,-5,11,95,183,5,5,68
126,127,Where Have You Been,Rihanna,barbadian pop,2012,128,85,72,-6,22,44,243,0,9,68
28,29,Teenage Dream,Katy Perry,dance pop,2010,120,80,72,-5,13,59,228,2,4,63


In [22]:
len(df_ops)

584

In [23]:
df.query("artist == 'Ariana Grande'")

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
171,172,The Way,Ariana Grande,dance pop,2013,82,88,65,-3,8,86,227,29,11,68
221,222,Problem,Ariana Grande,dance pop,2014,103,81,66,-5,16,63,194,2,15,75
287,288,Love Me Harder,Ariana Grande,dance pop,2015,99,71,47,-4,8,24,236,1,3,76
291,292,Break Free,Ariana Grande,dance pop,2015,130,70,69,-5,20,28,215,1,5,75
323,324,Focus,Ariana Grande,dance pop,2015,100,88,67,-6,44,79,211,27,24,66
368,369,Into You,Ariana Grande,dance pop,2016,108,73,62,-6,14,37,244,2,11,80
381,382,Dangerous Woman,Ariana Grande,dance pop,2016,134,60,66,-5,36,29,236,5,4,78
451,452,Side To Side,Ariana Grande,dance pop,2017,159,74,65,-6,24,61,226,5,23,80
513,514,no tears left to cry,Ariana Grande,dance pop,2018,122,71,70,-6,29,35,206,4,6,84


In [24]:
df.query("dur > 200")

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,4,83
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,23,82
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,4,79
4,5,Just the Way You Are,Bruno Mars,pop,2010,109,84,64,-5,9,43,221,2,4,78
5,6,Baby,Justin Bieber,canadian pop,2010,65,86,73,-5,11,54,214,4,14,77
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
596,597,"Taki Taki (feat. Selena Gomez, Ozuna & Cardi B)",DJ Snake,electronic trap,2019,96,80,84,-4,6,62,213,16,23,77
599,600,Cross Me (feat. Chance the Rapper & PnB Rock),Ed Sheeran,pop,2019,95,79,75,-6,7,61,206,21,12,75
600,601,"No Brainer (feat. Justin Bieber, Chance the Ra...",DJ Khaled,dance pop,2019,136,76,53,-5,9,65,260,7,34,70
601,602,Nothing Breaks Like a Heart (feat. Miley Cyrus),Mark Ronson,dance pop,2019,114,79,60,-6,42,24,217,1,7,69


In [25]:
df.query("dur < 200")

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
19,20,Your Love Is My Drug,Kesha,dance pop,2010,120,61,83,-4,9,76,187,1,10,69
31,32,My First Kiss - feat. Ke$ha,3OH!3,dance pop,2010,138,89,68,-4,36,83,192,1,8,62
32,33,Blah Blah Blah (feat. 3OH!3),Kesha,dance pop,2010,120,84,75,-3,42,52,172,8,12,62
35,36,Sexy Bitch (feat. Akon),David Guetta,dance pop,2010,130,63,81,-5,13,80,196,8,5,61
40,41,Something's Got A Hold On Me - Burlesque Origi...,Christina Aguilera,dance pop,2010,150,85,51,-4,12,72,185,47,27,58
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
588,589,Talk (feat. Disclosure),Khalid,alternative r&b,2019,136,40,90,-9,6,35,198,5,13,84
591,592,All Around The World (La La La),R3HAB,big room,2019,125,86,73,-5,11,52,148,48,3,82
595,596,Antisocial (with Travis Scott),Ed Sheeran,pop,2019,152,82,72,-5,36,91,162,13,5,78
597,598,Con Calma - Remix,Daddy Yankee,latin,2019,94,87,74,-3,4,61,181,17,5,76


In [26]:
df.loc[df["artist"]=="Ariana Grande"]

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
171,172,The Way,Ariana Grande,dance pop,2013,82,88,65,-3,8,86,227,29,11,68
221,222,Problem,Ariana Grande,dance pop,2014,103,81,66,-5,16,63,194,2,15,75
287,288,Love Me Harder,Ariana Grande,dance pop,2015,99,71,47,-4,8,24,236,1,3,76
291,292,Break Free,Ariana Grande,dance pop,2015,130,70,69,-5,20,28,215,1,5,75
323,324,Focus,Ariana Grande,dance pop,2015,100,88,67,-6,44,79,211,27,24,66
368,369,Into You,Ariana Grande,dance pop,2016,108,73,62,-6,14,37,244,2,11,80
381,382,Dangerous Woman,Ariana Grande,dance pop,2016,134,60,66,-5,36,29,236,5,4,78
451,452,Side To Side,Ariana Grande,dance pop,2017,159,74,65,-6,24,61,226,5,23,80
513,514,no tears left to cry,Ariana Grande,dance pop,2018,122,71,70,-6,29,35,206,4,6,84


In [27]:
df.isin(["Ariana Grande", "Katy Perry"])

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
598,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
599,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
600,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
601,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [28]:
artistas = ["Ariana Grande", "Katy Perry"]
df.isin(artistas)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
598,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
599,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
600,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
601,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [29]:
artistas = ["Ariana Grande", "Katy Perry"]
df["artist"].isin(artistas)

0      False
1      False
2      False
3      False
4      False
       ...  
598    False
599    False
600    False
601    False
602    False
Name: artist, Length: 603, dtype: bool

In [30]:
artistas = ["Ariana Grande", "Katy Perry"]
df[df["artist"].isin(artistas)]

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
28,29,Teenage Dream,Katy Perry,dance pop,2010,120,80,72,-5,13,59,228,2,4,63
29,30,California Gurls,Katy Perry,dance pop,2010,125,75,79,-4,18,40,235,0,5,62
77,78,E.T.,Katy Perry,dance pop,2011,152,87,62,-5,37,76,230,2,18,66
101,102,Last Friday Night (T.G.I.F.),Katy Perry,dance pop,2011,126,81,65,-4,67,72,231,0,4,27
102,103,Firework,Katy Perry,dance pop,2011,124,83,64,-5,11,65,228,14,5,25
124,125,Part Of Me,Katy Perry,dance pop,2012,130,92,68,-5,7,77,216,0,4,71
127,128,Wide Awake,Katy Perry,dance pop,2012,160,68,51,-5,39,57,221,7,4,68
128,129,The One That Got Away,Katy Perry,dance pop,2012,134,80,69,-4,16,88,227,0,4,67
144,145,Roar,Katy Perry,dance pop,2013,180,77,55,-5,35,46,224,0,4,78
171,172,The Way,Ariana Grande,dance pop,2013,82,88,65,-3,8,86,227,29,11,68


In [31]:
(df["artist"]=="Katy Perry").any()

True

In [32]:
(df["artist"]=="Queen").any()

False

In [33]:
df_ops.drop(columns=["spch"], inplace=True)

In [34]:
df_ops.head()

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,pop
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,83
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,82
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,71,200,10,80
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,79
5,6,Baby,Justin Bieber,canadian pop,2010,65,86,73,-5,11,54,214,4,77


In [35]:
df_ops["dur"].mean()

224.63698630136986

In [36]:
df_ops.dtypes

Unnamed: 0     int64
title         object
artist        object
top genre     object
year           int64
bpm            int64
nrgy           int64
dnce           int64
dB             int64
live           int64
val            int64
dur            int64
acous          int64
pop            int64
dtype: object

In [37]:
df_ops.select_dtypes(include="int64").mean()

Unnamed: 0     304.760274
year          2014.631849
bpm            118.592466
nrgy            70.501712
dnce            64.508562
dB              -5.571918
live            17.902397
val             52.277397
dur            224.636986
acous           14.152397
pop             66.529110
dtype: float64

In [38]:
df_ops.agg({"acous": "mean",
            "dur": "max"})

acous     14.152397
dur      424.000000
dtype: float64

In [39]:
df_ops["dur"].cumsum()

0         217
1         480
2         680
3         975
5        1189
        ...  
598    130292
599    130498
600    130758
601    130975
602    131188
Name: dur, Length: 584, dtype: int64

In [40]:
df_ops.sort_values(by="dur", ascending=False)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,pop
188,189,TKO,Justin Timberlake,dance pop,2013,138,68,61,-7,43,49,424,1,58
422,423,Wish That You Were Here - From Miss Peregrine...,Florence + The Machine,art pop,2016,94,57,37,-6,13,12,403,72,57
63,64,Monster,Kanye West,chicago rap,2011,125,69,63,-6,67,10,379,0,73
162,163,Lose Yourself to Dance,Daft Punk,electro,2013,100,66,83,-8,8,67,354,8,72
194,195,Take Back the Night,Justin Timberlake,dance pop,2013,107,66,59,-5,64,33,353,4,54
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
581,582,Good as Hell (feat. Ariana Grande) - Remix,Lizzo,escape room,2019,96,89,67,-3,74,48,159,30,90
492,493,Reality (feat. Janieck Devy) - Radio Edit,Lost Frequencies,belgian edm,2017,122,64,73,-7,8,53,158,2,59
174,175,I Love It (feat. Charli XCX),Icona Pop,candy pop,2013,126,91,71,-3,15,82,157,1,67
591,592,All Around The World (La La La),R3HAB,big room,2019,125,86,73,-5,11,52,148,48,82


In [41]:
    df_ops["year"].astype(str)

0      2010
1      2010
2      2010
3      2010
5      2010
       ... 
598    2019
599    2019
600    2019
601    2019
602    2019
Name: year, Length: 584, dtype: object

In [42]:
pd.get_dummies(df, columns=["artist"])

Unnamed: 0.1,Unnamed: 0,title,top genre,year,bpm,nrgy,dnce,dB,live,val,...,artist_Train,artist_Troye Sivan,artist_Usher,artist_Wiz Khalifa,artist_Years & Years,artist_ZAYN,artist_Zara Larsson,artist_Zedd,artist_fun.,artist_will.i.am
0,1,"Hey, Soul Sister",neo mellow,2010,97,89,67,-4,8,80,...,True,False,False,False,False,False,False,False,False,False
1,2,Love The Way You Lie,detroit hip hop,2010,87,93,75,-5,52,64,...,False,False,False,False,False,False,False,False,False,False
2,3,TiK ToK,dance pop,2010,120,84,76,-3,29,71,...,False,False,False,False,False,False,False,False,False,False
3,4,Bad Romance,dance pop,2010,119,92,70,-4,8,71,...,False,False,False,False,False,False,False,False,False,False
4,5,Just the Way You Are,pop,2010,109,84,64,-5,9,43,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
598,599,Find U Again (feat. Camila Cabello),dance pop,2019,104,66,61,-7,20,16,...,False,False,False,False,False,False,False,False,False,False
599,600,Cross Me (feat. Chance the Rapper & PnB Rock),pop,2019,95,79,75,-6,7,61,...,False,False,False,False,False,False,False,False,False,False
600,601,"No Brainer (feat. Justin Bieber, Chance the Ra...",dance pop,2019,136,76,53,-5,9,65,...,False,False,False,False,False,False,False,False,False,False
601,602,Nothing Breaks Like a Heart (feat. Miley Cyrus),dance pop,2019,114,79,60,-6,42,24,...,False,False,False,False,False,False,False,False,False,False


In [43]:
df_ops.assign(energy_dance = lambda df: df["dnce"]*df["nrgy"])

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,pop,energy_dance
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,83,5963
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,82,6975
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,71,200,10,80,6384
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,79,6440
5,6,Baby,Justin Bieber,canadian pop,2010,65,86,73,-5,11,54,214,4,77,6278
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
598,599,Find U Again (feat. Camila Cabello),Mark Ronson,dance pop,2019,104,66,61,-7,20,16,176,1,75,4026
599,600,Cross Me (feat. Chance the Rapper & PnB Rock),Ed Sheeran,pop,2019,95,79,75,-6,7,61,206,21,75,5925
600,601,"No Brainer (feat. Justin Bieber, Chance the Ra...",DJ Khaled,dance pop,2019,136,76,53,-5,9,65,260,7,70,4028
601,602,Nothing Breaks Like a Heart (feat. Miley Cyrus),Mark Ronson,dance pop,2019,114,79,60,-6,42,24,217,1,69,4740


In [44]:
df.groupby("artist")["title"].count()

artist
3OH!3                   1
5 Seconds of Summer     1
A Great Big World       1
Adam Lambert            2
Adele                  10
                       ..
ZAYN                    5
Zara Larsson            1
Zedd                    6
fun.                    2
will.i.am               1
Name: title, Length: 184, dtype: int64

In [45]:
df.groupby(["artist", "year"])["title"].count()

artist               year
3OH!3                2010    1
5 Seconds of Summer  2014    1
A Great Big World    2014    1
Adam Lambert         2010    2
Adele                2011    2
                            ..
Zedd                 2015    1
                     2016    1
                     2017    2
fun.                 2012    2
will.i.am            2013    1
Name: title, Length: 409, dtype: int64

In [46]:
df.head(4)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,4,83
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,23,82
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,71,200,10,14,80
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,4,79


In [47]:
df.groupby(by= "artist").agg({"year": "count",
                              "dur": "mean"})

Unnamed: 0_level_0,year,dur
artist,Unnamed: 1_level_1,Unnamed: 2_level_1
3OH!3,1,192.0
5 Seconds of Summer,1,202.0
A Great Big World,1,229.0
Adam Lambert,2,227.5
Adele,10,257.5
...,...,...
ZAYN,5,213.0
Zara Larsson,1,213.0
Zedd,6,228.0
fun.,2,264.0


In [48]:
df_grouped_year = df.groupby("year")

In [49]:
df_grouped_year.ngroups

10

In [50]:
df_grouped_year.size()

year
2010    51
2011    53
2012    35
2013    71
2014    58
2015    95
2016    80
2017    65
2018    64
2019    31
dtype: int64

In [51]:
df_grouped_year.groups

{2010: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50], 2011: [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103], 2012: [104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138], 2013: [139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209], 2014: [210, 211, 212, 213, 

In [52]:
df_grouped_year.get_group(2010).head(4)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,4,83
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,23,82
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,71,200,10,14,80
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,4,79


## Merging datasets

In [53]:
df.head(2)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,4,83
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,23,82


In [54]:
df2.head(2)

Unnamed: 0,Country,Continent,Rank,Title,Artists,Album,Explicit,Duration
0,Global,Global,1,Rain On Me (with Ariana Grande),"Lady Gaga, Ariana Grande",Rain On Me (with Ariana Grande),0,3:02
1,Global,Global,2,Blinding Lights,The Weeknd,After Hours,0,3:20


In [55]:
df.merge(df2, left_on="artist", right_on="Artists", how="left").fillna(0)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,...,spch,pop,Country,Continent,Rank,Title,Artists,Album,Explicit,Duration
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,...,4,83,0,0,0.0,0,0,0,0.0,0
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,...,23,82,0,0,0.0,0,0,0,0.0,0
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,...,14,80,0,0,0.0,0,0,0,0.0,0
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,...,4,79,0,0,0.0,0,0,0,0.0,0
4,5,Just the Way You Are,Bruno Mars,pop,2010,109,84,64,-5,9,...,4,78,0,0,0.0,0,0,0,0.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1858,599,Find U Again (feat. Camila Cabello),Mark Ronson,dance pop,2019,104,66,61,-7,20,...,3,75,0,0,0.0,0,0,0,0.0,0
1859,600,Cross Me (feat. Chance the Rapper & PnB Rock),Ed Sheeran,pop,2019,95,79,75,-6,7,...,12,75,0,0,0.0,0,0,0,0.0,0
1860,601,"No Brainer (feat. Justin Bieber, Chance the Ra...",DJ Khaled,dance pop,2019,136,76,53,-5,9,...,34,70,0,0,0.0,0,0,0,0.0,0
1861,602,Nothing Breaks Like a Heart (feat. Miley Cyrus),Mark Ronson,dance pop,2019,114,79,60,-6,42,...,7,69,0,0,0.0,0,0,0,0.0,0


In [56]:
df.merge(df2, left_on="artist", right_on="Artists", how="left", indicator=True).fillna(0)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,...,pop,Country,Continent,Rank,Title,Artists,Album,Explicit,Duration,_merge
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,...,83,0,0,0.0,0,0,0,0.0,0,left_only
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,...,82,0,0,0.0,0,0,0,0.0,0,left_only
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,...,80,0,0,0.0,0,0,0,0.0,0,left_only
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,...,79,0,0,0.0,0,0,0,0.0,0,left_only
4,5,Just the Way You Are,Bruno Mars,pop,2010,109,84,64,-5,9,...,78,0,0,0.0,0,0,0,0.0,0,left_only
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1858,599,Find U Again (feat. Camila Cabello),Mark Ronson,dance pop,2019,104,66,61,-7,20,...,75,0,0,0.0,0,0,0,0.0,0,left_only
1859,600,Cross Me (feat. Chance the Rapper & PnB Rock),Ed Sheeran,pop,2019,95,79,75,-6,7,...,75,0,0,0.0,0,0,0,0.0,0,left_only
1860,601,"No Brainer (feat. Justin Bieber, Chance the Ra...",DJ Khaled,dance pop,2019,136,76,53,-5,9,...,70,0,0,0.0,0,0,0,0.0,0,left_only
1861,602,Nothing Breaks Like a Heart (feat. Miley Cyrus),Mark Ronson,dance pop,2019,114,79,60,-6,42,...,69,0,0,0.0,0,0,0,0.0,0,left_only


In [57]:
df.join(df2)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,...,spch,pop,Country,Continent,Rank,Title,Artists,Album,Explicit,Duration
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,...,4,83,Global,Global,1,Rain On Me (with Ariana Grande),"Lady Gaga, Ariana Grande",Rain On Me (with Ariana Grande),0,3:02
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,...,23,82,Global,Global,2,Blinding Lights,The Weeknd,After Hours,0,3:20
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,...,14,80,Global,Global,3,ROCKSTAR (feat. Roddy Ricch),"DaBaby, Roddy Ricch",BLAME IT ON BABY,1,3:01
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,...,4,79,Global,Global,4,Roses - Imanbek Remix,"SAINt JHN, Imanbek",Roses (Imanbek Remix),1,2:56
4,5,Just the Way You Are,Bruno Mars,pop,2010,109,84,64,-5,9,...,4,78,Global,Global,5,Toosie Slide,Drake,Dark Lane Demo Tapes,1,4:07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
598,599,Find U Again (feat. Camila Cabello),Mark Ronson,dance pop,2019,104,66,61,-7,20,...,3,75,Costa Rica,North America,49,The Box,Roddy Ricch,Please Excuse Me For Being Antisocial,1,3:16
599,600,Cross Me (feat. Chance the Rapper & PnB Rock),Ed Sheeran,pop,2019,95,79,75,-6,7,...,12,75,Costa Rica,North America,50,CANCIÃN CON YANDEL,"Yandel, Bad Bunny",LAS QUE NO IBAN A SALIR,1,3:29
600,601,"No Brainer (feat. Justin Bieber, Chance the Ra...",DJ Khaled,dance pop,2019,136,76,53,-5,9,...,34,70,Czech Republic,Europe,1,Blinding Lights,The Weeknd,After Hours,0,3:20
601,602,Nothing Breaks Like a Heart (feat. Miley Cyrus),Mark Ronson,dance pop,2019,114,79,60,-6,42,...,7,69,Czech Republic,Europe,2,Roses - Imanbek Remix,"SAINt JHN, Imanbek",Roses (Imanbek Remix),1,2:56


In [58]:
df._append(df2).fillna(0)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,...,spch,pop,Country,Continent,Rank,Title,Artists,Album,Explicit,Duration
0,1.0,"Hey, Soul Sister",Train,neo mellow,2010.0,97.0,89.0,67.0,-4.0,8.0,...,4.0,83.0,0,0,0.0,0,0,0,0.0,0
1,2.0,Love The Way You Lie,Eminem,detroit hip hop,2010.0,87.0,93.0,75.0,-5.0,52.0,...,23.0,82.0,0,0,0.0,0,0,0,0.0,0
2,3.0,TiK ToK,Kesha,dance pop,2010.0,120.0,84.0,76.0,-3.0,29.0,...,14.0,80.0,0,0,0.0,0,0,0,0.0,0
3,4.0,Bad Romance,Lady Gaga,dance pop,2010.0,119.0,92.0,70.0,-4.0,8.0,...,4.0,79.0,0,0,0.0,0,0,0,0.0,0
4,5.0,Just the Way You Are,Bruno Mars,pop,2010.0,109.0,84.0,64.0,-5.0,9.0,...,4.0,78.0,0,0,0.0,0,0,0,0.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3145,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,Vietnam,Asia,46.0,ÄÃ Tá»ªNG LÃ,VÅ©.,ÄÃ Tá»ªNG LÃ,0.0,4:20
3146,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,Vietnam,Asia,47.0,MÆ°á»£n RÆ°á»£u Tá» TÃ¬nh,"BigDaddy, Emily",MÆ°á»£n RÆ°á»£u Tá» TÃ¬nh,0.0,3:18
3147,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,Vietnam,Asia,48.0,NgÃ y Táº­n Tháº¿,"TÃ³c TiÃªn, Da LAB, Touliver",NgÃ y Táº­n Tháº¿,0.0,3:52
3148,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,Vietnam,Asia,49.0,Äi Äu ÄÆ°a Äi,Bich Phuong,Äi Äu ÄÆ°a Äi,0.0,3:40


## Extras

In [59]:
df.select_dtypes(include="int64").corr()

Unnamed: 0.1,Unnamed: 0,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
Unnamed: 0,1.0,0.989031,-0.114849,-0.214586,0.07492,-0.133914,-0.129865,-0.129002,-0.210783,0.096518,0.013672,0.15042
year,0.989031,1.0,-0.104247,-0.225596,0.079269,-0.126471,-0.136331,-0.122025,-0.215344,0.101725,0.004778,0.241261
bpm,-0.114849,-0.104247,1.0,0.12617,-0.131301,0.18387,0.081579,0.016021,-0.029359,-0.113257,0.058999,0.018983
nrgy,-0.214586,-0.225596,0.12617,1.0,0.167209,0.537528,0.186738,0.409577,-0.14361,-0.562287,0.107313,-0.057645
dnce,0.07492,0.079269,-0.131301,0.167209,1.0,0.23317,-0.028801,0.501696,-0.176841,-0.240064,-0.028041,0.116054
dB,-0.133914,-0.126471,0.18387,0.537528,0.23317,1.0,0.081934,0.282922,-0.104723,-0.190401,-0.00111,0.156897
live,-0.129865,-0.136331,0.081579,0.186738,-0.028801,0.081934,1.0,0.020226,0.098339,-0.098167,0.144103,-0.075749
val,-0.129002,-0.122025,0.016021,0.409577,0.501696,0.282922,0.020226,1.0,-0.262256,-0.249038,0.122013,0.038953
dur,-0.210783,-0.215344,-0.029359,-0.14361,-0.176841,-0.104723,0.098339,-0.262256,1.0,0.091802,0.054564,-0.104363
acous,0.096518,0.101725,-0.113257,-0.562287,-0.240064,-0.190401,-0.098167,-0.249038,0.091802,1.0,0.002763,0.026704


In [60]:
(df.select_dtypes(include="int64")
 .corr()
 .style
 .background_gradient(axis=0))

Unnamed: 0.1,Unnamed: 0,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
Unnamed: 0,1.0,0.989031,-0.114849,-0.214586,0.07492,-0.133914,-0.129865,-0.129002,-0.210783,0.096518,0.013672,0.15042
year,0.989031,1.0,-0.104247,-0.225596,0.079269,-0.126471,-0.136331,-0.122025,-0.215344,0.101725,0.004778,0.241261
bpm,-0.114849,-0.104247,1.0,0.12617,-0.131301,0.18387,0.081579,0.016021,-0.029359,-0.113257,0.058999,0.018983
nrgy,-0.214586,-0.225596,0.12617,1.0,0.167209,0.537528,0.186738,0.409577,-0.14361,-0.562287,0.107313,-0.057645
dnce,0.07492,0.079269,-0.131301,0.167209,1.0,0.23317,-0.028801,0.501696,-0.176841,-0.240064,-0.028041,0.116054
dB,-0.133914,-0.126471,0.18387,0.537528,0.23317,1.0,0.081934,0.282922,-0.104723,-0.190401,-0.00111,0.156897
live,-0.129865,-0.136331,0.081579,0.186738,-0.028801,0.081934,1.0,0.020226,0.098339,-0.098167,0.144103,-0.075749
val,-0.129002,-0.122025,0.016021,0.409577,0.501696,0.282922,0.020226,1.0,-0.262256,-0.249038,0.122013,0.038953
dur,-0.210783,-0.215344,-0.029359,-0.14361,-0.176841,-0.104723,0.098339,-0.262256,1.0,0.091802,0.054564,-0.104363
acous,0.096518,0.101725,-0.113257,-0.562287,-0.240064,-0.190401,-0.098167,-0.249038,0.091802,1.0,0.002763,0.026704


In [61]:
df_grouped = df.groupby(by="artist")["title"].count()

In [62]:
df_grouped = df_grouped.to_frame()
df_grouped

Unnamed: 0_level_0,title
artist,Unnamed: 1_level_1
3OH!3,1
5 Seconds of Summer,1
A Great Big World,1
Adam Lambert,2
Adele,10
...,...
ZAYN,5
Zara Larsson,1
Zedd,6
fun.,2


In [63]:
df_grouped = df_grouped.reset_index()
df_grouped

Unnamed: 0,artist,title
0,3OH!3,1
1,5 Seconds of Summer,1
2,A Great Big World,1
3,Adam Lambert,2
4,Adele,10
...,...,...
179,ZAYN,5
180,Zara Larsson,1
181,Zedd,6
182,fun.,2


In [64]:
df_grouped2 = (df.
               groupby(by="artist")["title"]
               .count()
               .to_frame()
               .reset_index())
df_grouped2

Unnamed: 0,artist,title
0,3OH!3,1
1,5 Seconds of Summer,1
2,A Great Big World,1
3,Adam Lambert,2
4,Adele,10
...,...,...
179,ZAYN,5
180,Zara Larsson,1
181,Zedd,6
182,fun.,2


In [65]:
df[:10].style.background_gradient(axis=0)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,4,83
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,23,82
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,71,200,10,14,80
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,4,79
4,5,Just the Way You Are,Bruno Mars,pop,2010,109,84,64,-5,9,43,221,2,4,78
5,6,Baby,Justin Bieber,canadian pop,2010,65,86,73,-5,11,54,214,4,14,77
6,7,Dynamite,Taio Cruz,dance pop,2010,120,78,75,-4,4,82,203,0,9,77
7,8,Secrets,OneRepublic,dance pop,2010,148,76,52,-6,12,38,225,7,4,77
8,9,Empire State of Mind (Part II) Broken Down,Alicia Keys,hip pop,2010,93,37,48,-8,12,14,216,74,3,76
9,10,Only Girl (In The World),Rihanna,barbadian pop,2010,126,72,79,-4,7,61,235,13,4,73


In [66]:
df[:10].style.background_gradient(axis=0, cmap="YlOrBr")

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,4,83
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,23,82
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,71,200,10,14,80
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,4,79
4,5,Just the Way You Are,Bruno Mars,pop,2010,109,84,64,-5,9,43,221,2,4,78
5,6,Baby,Justin Bieber,canadian pop,2010,65,86,73,-5,11,54,214,4,14,77
6,7,Dynamite,Taio Cruz,dance pop,2010,120,78,75,-4,4,82,203,0,9,77
7,8,Secrets,OneRepublic,dance pop,2010,148,76,52,-6,12,38,225,7,4,77
8,9,Empire State of Mind (Part II) Broken Down,Alicia Keys,hip pop,2010,93,37,48,-8,12,14,216,74,3,76
9,10,Only Girl (In The World),Rihanna,barbadian pop,2010,126,72,79,-4,7,61,235,13,4,73


In [67]:
df[:10].style.text_gradient(low=0.75, high=1)

Unnamed: 0.1,Unnamed: 0,title,artist,top genre,year,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop
0,1,"Hey, Soul Sister",Train,neo mellow,2010,97,89,67,-4,8,80,217,19,4,83
1,2,Love The Way You Lie,Eminem,detroit hip hop,2010,87,93,75,-5,52,64,263,24,23,82
2,3,TiK ToK,Kesha,dance pop,2010,120,84,76,-3,29,71,200,10,14,80
3,4,Bad Romance,Lady Gaga,dance pop,2010,119,92,70,-4,8,71,295,0,4,79
4,5,Just the Way You Are,Bruno Mars,pop,2010,109,84,64,-5,9,43,221,2,4,78
5,6,Baby,Justin Bieber,canadian pop,2010,65,86,73,-5,11,54,214,4,14,77
6,7,Dynamite,Taio Cruz,dance pop,2010,120,78,75,-4,4,82,203,0,9,77
7,8,Secrets,OneRepublic,dance pop,2010,148,76,52,-6,12,38,225,7,4,77
8,9,Empire State of Mind (Part II) Broken Down,Alicia Keys,hip pop,2010,93,37,48,-8,12,14,216,74,3,76
9,10,Only Girl (In The World),Rihanna,barbadian pop,2010,126,72,79,-4,7,61,235,13,4,73
