# Detecting Anomalous Songs in the Top 200

This notebook implements anomaly *(outlier)* detection methods in order to find songs that reached the Top 200 even though their characteristics deviate from the usual pattern of the ranking.

A curious example: did Gangnam Style, when it was released and became popular, have the same characteristics as the most-played songs of that year?

## Brief Explanation of the Models Used

### ABOD

Angle-Based Outlier Detection, as the name suggests, uses the angles between triples of points in the feature space to detect outliers.

The algorithm relies on the following property of those angles: the more anomalous a point **P** is relative to the other points in the dataset (with **A** and **B** completing the triple so that an angle can be computed), the smaller the variation of the angle **APB** as **A** and **B** vary. Conversely, the more *clustered* the three points are, the more the angle varies with the choice of **A** and **B**.

The image below summarizes this idea.

The green point is an outlier and the red one is not. Note that, as different blue points are chosen, the resulting angle varies much more for the red point than for the green one.

![](https://miro.medium.com/max/884/1*nLL8AKMaY1xpNkno2CkMUw.png)

The algorithm itself essentially computes these angles $\theta$ for every triple of points (or only for the *k* nearest neighbors) using the following formula:

$\cos\theta = \dfrac{\vec{A}\cdot\vec{B}}{\lVert\vec{A}\rVert\,\lVert\vec{B}\rVert}$, where $\vec{A}=\overrightarrow{PA}$ and $\vec{B}=\overrightarrow{PB}$, $\vec{A}\cdot\vec{B}$ is the inner product of $\vec{A}$ and $\vec{B}$, and $\lVert\vec{A}\rVert\,\lVert\vec{B}\rVert$ is the product of their norms. ABOD replaces the denominator with the product of the squared norms, $\lVert\vec{A}\rVert^2\lVert\vec{B}\rVert^2$, so that angles formed with distant points weigh less.

From this expression, the **variance** of the resulting value is computed over all $A$ and $B$ for a fixed $P$. If the variance is large, $P$ is classified as an *inlier*; if it is small, as an *outlier*.
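
As a rough illustration of the variance criterion, the sketch below computes, for a fixed point P, the inner product of the vectors PA and PB divided by the product of their squared norms for every pair (A, B), and then takes the variance of those values. The function name `abof` and the toy points are assumptions made for this example; this is not how PyOD implements ABOD internally.

```python
import itertools
import statistics


def abof(p, others):
    """Variance of the distance-weighted cosine over all pairs (a, b) in `others`.

    `p` must not be contained in `others`, otherwise a norm would be zero.
    """
    values = []
    for a, b in itertools.combinations(others, 2):
        pa = [ai - pi for ai, pi in zip(a, p)]  # vector from P to A
        pb = [bi - pi for bi, pi in zip(b, p)]  # vector from P to B
        dot = sum(x * y for x, y in zip(pa, pb))
        norm_a_sq = sum(x * x for x in pa)
        norm_b_sq = sum(y * y for y in pb)
        values.append(dot / (norm_a_sq * norm_b_sq))
    return statistics.pvariance(values)


# Toy data (hypothetical): four corners of a square, its centre, and a far point.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
centre = (0.5, 0.5)
far_point = (10.0, 10.0)

score_inlier = abof(centre, corners)
score_outlier = abof(far_point, corners + [centre])
print(score_inlier, score_outlier)
```

The centre of the square sees the other points under very different angles, so its variance is large; the far point sees all of them in roughly the same direction, so its variance is close to zero.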


### Autoencoder

At its core, an *autoencoder* neural network learns to compress and then reconstruct its input data. A good autoencoder produces a reconstruction that is extremely similar to the original input, that is, the error between the reconstruction and the original data should be as small as possible.

The image below summarizes the intuition behind the algorithm.

![](https://www.deeplearningbook.com.br/wp-content/uploads/2019/11/mushroom_encoder.png)

The idea behind using these networks for anomaly detection is that, when an anomalous sample is presented, the reconstruction error will be larger than expected: the network learned to reconstruct data from a specific, common range of characteristics, and an anomalous sample, by definition, presents some irregularity.

The points with the largest reconstruction errors can therefore be considered possible *outliers*.
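
The scoring step can be sketched without training a network. In the example below a fixed linear projection onto a single direction stands in for the encoder and decoder (this is an assumption for illustration, not a trained autoencoder), and the squared reconstruction error is used as the anomaly score. All names and data points are hypothetical.

```python
import math

# Hypothetical one-dimensional "bottleneck": the unit direction (1, 1) / sqrt(2).
DIRECTION = (1 / math.sqrt(2), 1 / math.sqrt(2))


def encode(point):
    """Compress a 2-D point into a single number (its coordinate along DIRECTION)."""
    return point[0] * DIRECTION[0] + point[1] * DIRECTION[1]


def decode(code):
    """Rebuild a 2-D point from the compressed code."""
    return (code * DIRECTION[0], code * DIRECTION[1])


def reconstruction_error(point):
    """Squared distance between a point and its reconstruction."""
    rebuilt = decode(encode(point))
    return sum((x - r) ** 2 for x, r in zip(point, rebuilt))


typical = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.05)]  # close to the line y = x
odd = (1.0, 4.0)                                  # far from that line

errors = {p: reconstruction_error(p) for p in typical + [odd]}
most_anomalous = max(errors, key=errors.get)
print(most_anomalous, errors[most_anomalous])
```

Points that follow the pattern captured by the bottleneck are rebuilt almost exactly, while the odd point loses most of its information in the compression and gets the largest error.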


### LoOP

LoOP is a density-based algorithm that computes the density around each point and compares it with the density around its neighbors. Anomalous points are usually more spread out across the feature space and therefore have lower density in their neighborhood. If the density of a point is much lower than that of its neighbors (LOF >> 1), the point is far from high-density regions and is therefore a possible outlier.

LoOP (Local Outlier Probabilities) is a more robust variant of LOF (Local Outlier Factor) that returns the probability of a point being an outlier. The implementation sections of this notebook use PyOD's `LOF` model.

![](https://miro.medium.com/max/700/1*2e2o6U0zogXSzsmJWzdigg.png)
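
The density comparison can be sketched in a few lines. The version below is a simplification of LOF, assumed for illustration only: density is taken as the inverse of the mean distance to the k nearest neighbors, without the reachability distance used by the real algorithm. Function names and points are hypothetical.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding itself."""
    dists = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def density(points, i, k):
    """Inverse of the mean distance from points[i] to its k nearest neighbours."""
    neighbours = knn(points, i, k)
    mean_dist = sum(math.dist(points[i], points[j]) for j in neighbours) / k
    return 1.0 / mean_dist


def density_ratio(points, i, k=2):
    """Mean density of the neighbours divided by the point's own density."""
    neighbours = knn(points, i, k)
    neighbour_density = sum(density(points, j, k) for j in neighbours) / k
    return neighbour_density / density(points, i, k)


# Four points forming a tight square plus one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = [density_ratio(points, i) for i in range(len(points))]
print(scores)
```

The four clustered points get a ratio of 1 (their density matches that of their neighbors), while the isolated point gets a ratio well above 1.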

### Isolation Forest

The Isolation Forest model is, like a Random Forest, an ensemble of decision trees, but each tree is built from random splits. The algorithm exploits the fact that anomalous observations are rare and differ markedly from the rest, so that, as the trees split the data, these observations are usually isolated closer to the root. An anomaly score is then derived from the average path length needed to isolate the observation across all trees in the forest: the shorter the path (the closer to the root), the higher the outlier score.

![](https://miro.medium.com/max/700/1*xCZtQspQaHSmwniINWTeLQ.png)
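
The path-length idea can be sketched as follows. This is a simplified version written for illustration: it follows only the branch containing the target point, uses the whole dataset for every tree, and skips the subsampling and score normalization of the real algorithm. The function names, the toy data and the seed are assumptions.

```python
import random


def path_length(target, data, rng, depth=0, max_depth=20):
    """Number of random splits needed to isolate `target` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(target))          # pick a random feature
    low = min(row[feature] for row in data)
    high = max(row[feature] for row in data)
    if low == high:                               # cannot split on this feature
        return depth
    split = rng.uniform(low, high)                # pick a random split value
    # Keep only the side of the split that contains the target.
    if target[feature] < split:
        side = [row for row in data if row[feature] < split]
    else:
        side = [row for row in data if row[feature] >= split]
    return path_length(target, side, rng, depth + 1, max_depth)


def mean_path_length(target, data, n_trees=200, seed=0):
    """Average isolation depth of `target` over `n_trees` random trees."""
    rng = random.Random(seed)
    return sum(path_length(target, data, rng) for _ in range(n_trees)) / n_trees


# Five clustered points and one point far away from them.
data = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.3, 0.2), (0.25, 0.3), (5.0, 5.0)]
outlier_depth = mean_path_length(data[-1], data)
inlier_depth = mean_path_length(data[0], data)
print(outlier_depth, inlier_depth)
```

The far point is almost always separated by the very first split, so its mean depth is much smaller than that of a point inside the cluster.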

### Python module used
[Pyod - ABOD](https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.abod)


Boolean variable indicating whether the song dataset already exists. When it is `False`, the API is queried again to generate a new dataset of songs.

In [None]:
playlist_existente = True

time: 1.15 ms (started: 2022-06-21 16:24:33 +00:00)


In [1]:
!pip install ipython-autotime
%load_ext autotime
!pip install pyod
!pip install spotipy

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm  # tqdm shows a progress bar: https://github.com/tqdm/tqdm

import os
import glob

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

from pyod.models.abod import ABOD
from pyod.models.auto_encoder import AutoEncoder
from pyod.models.iforest import IForest
from pyod.models.lof import LOF


from google.colab import drive
drive.mount('/content/drive')


___
## Building the global dataset

In [None]:
# User: fijita8647@karavic.com
# Password: INF10322803
cid ="" 
secret = ""

client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

time: 8.53 ms (started: 2022-06-21 16:24:46 +00:00)


The path below points to the global Top 200 data.

In [None]:
path_global = "/content/drive/MyDrive/INF1032 - Spotify/Dados/Anna/csv/global"

time: 1.36 ms (started: 2022-06-21 16:24:46 +00:00)


The function below builds a dataframe with the URIs of all songs that entered a given Top 200.

In [None]:
def pega_uri_charts(path):

  # Collect every CSV file in the directory
  arquivos = glob.glob(os.path.join(path,"*.csv"))
  dados = pd.DataFrame()

  for file in arquivos:
      dados_temp = pd.read_csv(file)
      # The first data row holds the real column names
      dados_temp.columns = dados_temp.iloc[0]
      dados_temp = dados_temp.drop(axis=0,index=0)
      # Append to the final dataset
      dados = pd.concat([dados,dados_temp])

  # Only the track URL and the stream count are needed
  dados = dados[["URL","Streams"]]

  # Anomaly analysis does not need to know how many times a song appeared in the Top 200,
  # so duplicate rows are dropped (rows count as duplicates only when URL and Streams both match)
  dados = dados.drop_duplicates(keep="first")

  # Reset the index, since rows were removed
  dados = dados.reset_index(drop=True)

  return dados




time: 23 ms (started: 2022-06-21 16:24:46 +00:00)


The function below fetches the features of a track from its URI.

In [None]:
def get_features_from_tracks(track_id):
  # Fetches the audio features of a song plus other useful and/or interesting information

  # Audio features
  features = sp.audio_features(track_id)

  # Song information (one API call, reused below)
  track = sp.track(track_id)
  data_lancamento = track['album']['release_date']
  nome = track['name']
  popularidade_musica = track['popularity']

  # Artist information (one API call, reused below)
  nome_artista = track['artists'][0]['name']
  uri_artista = track['artists'][0]['uri']
  artista = sp.artist(uri_artista)
  popularidade_artista = artista['popularity']
  estilo = artista['genres']

  # Building the dataframe
  features = pd.DataFrame(features)

  features["nome"] = nome
  features["data_lancamento"] = data_lancamento
  features["data_lancamento"] = pd.to_datetime(features["data_lancamento"])
  features["mes_lancamento"] = features["data_lancamento"].dt.month
  features["dia_lancamento"] = features["data_lancamento"].dt.day
  features["dia_semana_lancamento"] = features["data_lancamento"].dt.day_name()

  features["artista"] = nome_artista
  features["popularidade_musica"] = popularidade_musica
  features["popularidade_artista"] = popularidade_artista
  # genres is always a list; the last entry is picked arbitrarily, and an empty list yields None
  features["estilo_musical"] = estilo[-1] if estilo else None

  return features

time: 26.8 ms (started: 2022-06-21 16:24:46 +00:00)


The function below fetches the audio features of each track and assembles them into a dataframe.

In [None]:
def gera_dataset(lista_uris_tracks):
  df_features_final = pd.DataFrame()
  for idx,track_uri in lista_uris_tracks.iterrows():

    # Fetch the features of the song
    df_features_parcial = get_features_from_tracks(track_uri.URL)

    df_features_parcial["Streams"] = track_uri.Streams
    # And append them to the final dataframe
    df_features_final = pd.concat([df_features_final,df_features_parcial],axis=0)

  # Drop the stray index column if it is present
  if "Unnamed: 0" in df_features_final.columns:
    df_features_final = df_features_final.drop(axis=1,columns=["Unnamed: 0"])

  return df_features_final

time: 7.14 ms (started: 2022-06-21 16:24:46 +00:00)


In [None]:
path_top_global = '/content/drive/My Drive/INF1032 - Spotify/Dados/Matheus/MusicasAnomalas/'

if playlist_existente:
  df_musicas_global = pd.read_csv(path_top_global+"top200_global_consolidado.csv")
else:
  lista_uris_global = pega_uri_charts(path_global)
  df_musicas_global = gera_dataset(lista_uris_tracks=lista_uris_global)
  df_musicas_global.to_csv(path_top_global+"top200_global_consolidado.csv")

# The consolidated files are used by later cells, so they are loaded on both branches
df_musicas_boas = pd.read_csv("/content/drive/My Drive/INF1032 - Spotify/Dados/Consolidados/musicas_boas.csv")
df_musicas_ruins = pd.read_csv("/content/drive/My Drive/INF1032 - Spotify/Dados/Consolidados/musicas_ruins.csv")

time: 176 ms (started: 2022-06-21 16:24:46 +00:00)


In [None]:
df_musicas_global.head(10)

Unnamed: 0.1,Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature,nome
0,0,0.52,0.731,6,-5.338,0,0.0557,0.342,0.00101,0.311,0.662,173.93,audio_features,4LRPiXqCikLlN15c3yImP7,spotify:track:4LRPiXqCikLlN15c3yImP7,https://api.spotify.com/v1/tracks/4LRPiXqCikLl...,https://api.spotify.com/v1/audio-analysis/4LRP...,167303,4,As It Was
1,0,0.905,0.563,8,-6.135,1,0.102,0.0254,1e-05,0.113,0.324,106.998,audio_features,1rDQ4oMwGJI7B4tovsBOxc,spotify:track:1rDQ4oMwGJI7B4tovsBOxc,https://api.spotify.com/v1/tracks/1rDQ4oMwGJI7...,https://api.spotify.com/v1/audio-analysis/1rDQ...,173948,4,First Class
2,0,0.761,0.525,11,-6.9,1,0.0944,0.44,7e-06,0.0921,0.531,80.87,audio_features,02MWAaffLxlfxAUY7c5dvx,spotify:track:02MWAaffLxlfxAUY7c5dvx,https://api.spotify.com/v1/tracks/02MWAaffLxlf...,https://api.spotify.com/v1/audio-analysis/02MW...,238805,4,Heat Waves
3,0,0.591,0.764,1,-5.484,1,0.0483,0.0383,0.0,0.103,0.478,169.928,audio_features,5PjdY0CKGZdEuoNab3yDmX,spotify:track:5PjdY0CKGZdEuoNab3yDmX,https://api.spotify.com/v1/tracks/5PjdY0CKGZdE...,https://api.spotify.com/v1/audio-analysis/5Pjd...,141806,4,STAY (with Justin Bieber)
4,0,0.756,0.697,8,-6.377,1,0.0401,0.182,0.0,0.333,0.956,94.996,audio_features,2DB4DdfCFMw1iaR6JaR03a,spotify:track:2DB4DdfCFMw1iaR6JaR03a,https://api.spotify.com/v1/tracks/2DB4DdfCFMw1...,https://api.spotify.com/v1/audio-analysis/2DB4...,206071,4,Bam Bam (feat. Ed Sheeran)
5,0,0.728,0.783,11,-4.424,0,0.266,0.237,0.0,0.434,0.555,77.011,audio_features,1HhNoOuqm1a5MXYEgAFl8o,spotify:track:1HhNoOuqm1a5MXYEgAFl8o,https://api.spotify.com/v1/tracks/1HhNoOuqm1a5...,https://api.spotify.com/v1/audio-analysis/1HhN...,173381,4,Enemy (with JID) - from the series Arcane Leag...
6,0,0.795,0.8,1,-6.32,1,0.0309,0.0354,7.3e-05,0.0915,0.934,116.032,audio_features,6JIC3hbC28JZKZ8AlAqX8h,spotify:track:6JIC3hbC28JZKZ8AlAqX8h,https://api.spotify.com/v1/tracks/6JIC3hbC28JZ...,https://api.spotify.com/v1/audio-analysis/6JIC...,202735,4,Cold Heart - PNAU Remix
7,0,0.741,0.691,10,-7.395,0,0.0672,0.0221,0.0,0.0476,0.892,150.087,audio_features,5Z9KJZvQzH6PFmb8SNkxuk,spotify:track:5Z9KJZvQzH6PFmb8SNkxuk,https://api.spotify.com/v1/tracks/5Z9KJZvQzH6P...,https://api.spotify.com/v1/audio-analysis/5Z9K...,212353,4,INDUSTRY BABY (feat. Jack Harlow)
8,0,0.812,0.736,4,-5.421,0,0.0833,0.152,0.00254,0.0914,0.396,91.993,audio_features,3FkeNbs9Zeiqkr3WkbOiGp,spotify:track:3FkeNbs9Zeiqkr3WkbOiGp,https://api.spotify.com/v1/tracks/3FkeNbs9Zeiq...,https://api.spotify.com/v1/audio-analysis/3Fke...,193806,4,Envolver
9,0,0.87,0.548,10,-5.253,0,0.077,0.0924,4.6e-05,0.0534,0.832,96.018,audio_features,1O2pcBJGej0pmH2Y9XZMs6,spotify:track:1O2pcBJGej0pmH2Y9XZMs6,https://api.spotify.com/v1/tracks/1O2pcBJGej0p...,https://api.spotify.com/v1/audio-analysis/1O2p...,153750,4,Una Noche en Medellín


time: 28.1 ms (started: 2022-06-21 16:24:46 +00:00)


___
### Acquiring the data from the Top Hits

These data are not well suited to anomaly analysis because they span a large time window.

In [None]:
df_musicas_boas.columns

Index(['Unnamed: 0', 'danceability', 'energy', 'key', 'loudness', 'mode',
       'speechiness', 'acousticness', 'instrumentalness', 'liveness',
       'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 'analysis_url',
       'duration_ms', 'time_signature', 'nome', 'data_lancamento',
       'Popularidade Musica', 'Artista', 'ano_lancamento', 'mes_lancamento',
       'dia_semana_lancamento', 'Popularidade Artista', 'Seguidores',
       'Estilos'],
      dtype='object')

time: 5.41 ms (started: 2022-06-21 16:24:46 +00:00)


In [None]:
features_de_interesse = ['danceability','energy', 'loudness', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence','duration_ms','key', 'mode', 'tempo']
features_de_interesse2 = ['danceability','energy', 'loudness', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence','duration_ms','key', 'mode', 'tempo','Popularidade Musica', 'ano_lancamento', 'mes_lancamento',
       'dia_semana_lancamento', 'Popularidade Artista', 'Seguidores']

df_input = df_musicas_global[features_de_interesse]
df_input_boas = df_musicas_boas[features_de_interesse2]
df_input_ruins = df_musicas_ruins[features_de_interesse2]

df_resultados = df_musicas_global.copy(deep=True)
df_resultados = df_resultados.drop(axis=1,columns=['type', 'id', 'uri', 'track_href', 'analysis_url','time_signature'])
df_resultados = df_resultados.reset_index(drop=True)
# df_input = df_input.reset_index(drop=True)

df_resultados_boas = df_musicas_boas.copy(deep=True)
df_resultados_boas = df_resultados_boas.dropna()
df_resultados_boas = df_resultados_boas.reset_index(drop=True)
df_resultados_boas = df_resultados_boas.drop(axis=1,columns=['type', 'id', 'uri', 'track_href', 'analysis_url','time_signature','Estilos','data_lancamento','Artista'])
# df_input_boas = df_input_boas.reset_index(drop=True)

df_resultados_ruins = df_musicas_ruins.copy(deep=True)
df_resultados_ruins = df_resultados_ruins.dropna()
df_resultados_ruins = df_resultados_ruins.reset_index(drop=True)
df_resultados_ruins = df_resultados_ruins.drop(axis=1,columns=['type', 'id', 'uri', 'track_href', 'analysis_url','time_signature','Estilos','data_lancamento','Artista'])
# df_input_ruins = df_input_ruins.reset_index(drop=True)

time: 52 ms (started: 2022-06-21 16:24:46 +00:00)


### Inspecting the value ranges of each feature

This step matters because some anomaly detection models need their input values in a particular format.

The autoencoder, for example, being a neural network, may need values normalized to [0,1] or [-1,1].

In [None]:
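for feature in features_de_interesse:
  # Select the column as a Series so that max()/min() return scalars,
  # and avoid shadowing the built-ins max and min
  valor_max = df_input[feature].max()
  valor_min = df_input[feature].min()
  print("Coluna %s:\tMax = %.2f\tMin = %.2f"%(feature,valor_max,valor_min))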

Coluna danceability:	Max = 0.98	Min = 0.15
Coluna energy:	Max = 0.99	Min = 0.03
Coluna loudness:	Max = 1.51	Min = -34.48
Coluna speechiness:	Max = 0.97	Min = 0.02
Coluna acousticness:	Max = 0.99	Min = 0.00
Coluna instrumentalness:	Max = 0.95	Min = 0.00
Coluna liveness:	Max = 0.98	Min = 0.02
Coluna valence:	Max = 0.98	Min = 0.03
Coluna duration_ms:	Max = 690732.00	Min = 30133.00
Coluna key:	Max = 11.00	Min = 0.00
Coluna mode:	Max = 1.00	Min = 0.00
Coluna tempo:	Max = 212.12	Min = 46.72
time: 53.8 ms (started: 2022-06-21 16:24:46 +00:00)


In [None]:
df_input_boas = df_input_boas.dropna()
df_input_ruins = df_input_ruins.dropna()
df_input.isna().sum()

danceability        0
energy              0
loudness            0
speechiness         0
acousticness        0
instrumentalness    0
liveness            0
valence             0
duration_ms         0
key                 0
mode                0
tempo               0
dtype: int64

time: 20.8 ms (started: 2022-06-21 16:24:46 +00:00)


Most columns lie in [0,1]. The exceptions are *loudness*, *duration_ms*, *key* and *tempo*, which may need to be rescaled depending on the method used.
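
If a method does require inputs in [0,1], one common option is min-max scaling. The sketch below shows the arithmetic on three *loudness* values taken from the outputs above (the column minimum, the first row, and the column maximum). The helper name `min_max_scale` is an assumption, and the notebook itself does not apply this step.

```python
def min_max_scale(values):
    """Linearly rescale a list of numbers to the interval [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


loudness = [-34.48, -5.338, 1.51]
scaled = min_max_scale(loudness)
print(scaled)
```

With pandas the same operation can be written column-wise as `(df - df.min()) / (df.max() - df.min())`.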

### Implementing the anomaly detection methods

#### Helper functions and global variables

In [None]:
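def gera_prob_clssificacao(df,metodo,lista_proba):
  # lista_proba[0] holds one [P(normal), P(outlier)] pair per row
  # (predict_proba is called with return_confidence=True, hence the [0])
  prob_normal = []
  prob_outlier = []
  for prob in lista_proba[0]:
      prob_normal.append(prob[0])
      prob_outlier.append(prob[1])

  # For each row, keep the probability of the class the model assigned to it
  prob_classif = []
  for i in range(len(df[metodo])):
      if df[metodo][i]==1:
          prob_classif.append(prob_outlier[i])
      else:
          prob_classif.append(prob_normal[i])

  return prob_classif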

time: 9.21 ms (started: 2022-06-21 16:24:46 +00:00)


In [None]:
# Shared by more than one model
contaminacao = 0.05
num_vizinhos = 50

#Autoencoders
num_neuronios = df_input.shape[1]
neuronios = [num_neuronios,num_neuronios//2,num_neuronios//4,num_neuronios//4,num_neuronios//2,num_neuronios]

num_neuronios_boas = df_input_boas.shape[1]
neuronios_boas = [num_neuronios_boas,num_neuronios_boas//2,num_neuronios_boas//4,num_neuronios_boas//4,num_neuronios_boas//2,num_neuronios_boas]

num_neuronios_ruins = df_input_ruins.shape[1]
neuronios_ruins = [num_neuronios_ruins,num_neuronios_ruins//2,num_neuronios_ruins//4,num_neuronios_ruins//4,num_neuronios_ruins//2,num_neuronios_ruins]

time: 7.71 ms (started: 2022-06-21 16:24:46 +00:00)


### ABOD

In [None]:
modelo_abod = ABOD(contamination=contaminacao,method='fast',n_neighbors=num_vizinhos).fit(df_input)
resultado_abod = modelo_abod.predict(df_input, return_confidence=True)
probs_abod = modelo_abod.predict_proba(df_input,method="unify", return_confidence=True)

time: 6min 16s (started: 2022-06-21 16:24:46 +00:00)


In [None]:
modelo_abod_boas = ABOD(contamination=contaminacao,method='fast',n_neighbors=num_vizinhos).fit(df_input_boas)
resultado_abod_boas = modelo_abod_boas.predict(df_input_boas, return_confidence=True)
probs_abod_boas = modelo_abod_boas.predict_proba(df_input_boas,method="unify", return_confidence=True)

time: 5min 59s (started: 2022-06-21 16:31:02 +00:00)


In [None]:
modelo_abod_ruins = ABOD(contamination=contaminacao,method='fast',n_neighbors=num_vizinhos).fit(df_input_ruins)
resultado_abod_ruins = modelo_abod_ruins.predict(df_input_ruins, return_confidence=True)
probs_abod_ruins = modelo_abod_ruins.predict_proba(df_input_ruins,method="unify", return_confidence=True)

time: 4min 9s (started: 2022-06-21 16:37:02 +00:00)


In [None]:
df_resultados["abod"] = resultado_abod[0]
df_resultados["prob abod"] = gera_prob_clssificacao(df_resultados,"abod",probs_abod)

df_resultados_boas["abod"] = resultado_abod_boas[0]
df_resultados_boas["prob abod"] = gera_prob_clssificacao(df_resultados_boas,"abod",probs_abod_boas)

df_resultados_ruins["abod"] = resultado_abod_ruins[0]
df_resultados_ruins["prob abod"] = gera_prob_clssificacao(df_resultados_ruins,"abod",probs_abod_ruins)

mask1 = df_resultados["abod"]==1  
mask2 = df_resultados["prob abod"]>=0.5
mask = mask1 & mask2
df_resultados[mask]

Unnamed: 0.1,Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,nome,abod,prob abod


time: 118 ms (started: 2022-06-21 16:41:12 +00:00)


### Autoencoder

In [None]:
modelo_autoencoder = AutoEncoder(hidden_neurons=neuronios,contamination=contaminacao,verbose=0,random_state=0).fit(df_input)
resultado_autoencoder = modelo_autoencoder.predict(df_input, return_confidence=True)
probs_autoencoder = modelo_autoencoder.predict_proba(df_input,method="unify", return_confidence=True)

time: 35 s (started: 2022-06-21 16:41:12 +00:00)


In [None]:
modelo_autoencoder_boas = AutoEncoder(hidden_neurons=neuronios_boas,contamination=contaminacao,verbose=0,random_state=0).fit(df_input_boas)
resultado_autoencoder_boas = modelo_autoencoder_boas.predict(df_input_boas, return_confidence=True)
probs_autoencoder_boas = modelo_autoencoder_boas.predict_proba(df_input_boas,method="unify", return_confidence=True)

time: 45.2 s (started: 2022-06-21 16:41:47 +00:00)


In [None]:
modelo_autoencoder_ruins = AutoEncoder(hidden_neurons=neuronios_ruins,contamination=contaminacao,verbose=0,random_state=0).fit(df_input_ruins)
resultado_autoencoder_ruins = modelo_autoencoder_ruins.predict(df_input_ruins, return_confidence=True)
probs_autoencoder_ruins = modelo_autoencoder_ruins.predict_proba(df_input_ruins,method="unify", return_confidence=True)

time: 31.6 s (started: 2022-06-21 16:42:32 +00:00)


In [None]:
df_resultados["autoencoder"] = resultado_autoencoder[0]
df_resultados["prob autoencoder"] = gera_prob_clssificacao(df_resultados,"autoencoder",probs_autoencoder)

df_resultados_boas["autoencoder"] = resultado_autoencoder_boas[0]
df_resultados_boas["prob autoencoder"] = gera_prob_clssificacao(df_resultados_boas,"autoencoder",probs_autoencoder_boas)

df_resultados_ruins["autoencoder"] = resultado_autoencoder_ruins[0]
df_resultados_ruins["prob autoencoder"] = gera_prob_clssificacao(df_resultados_ruins,"autoencoder",probs_autoencoder_ruins)

mask1 = df_resultados["autoencoder"]==1  
mask2 = df_resultados["prob autoencoder"]>=0.9
mask = mask1 & mask2
df_resultados[mask]

Unnamed: 0.1,Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,nome,abod,prob abod,autoencoder,prob autoencoder
51,0,0.332,0.225,0,-8.697,1,0.0348,0.76700,0.003490,0.1280,0.2970,81.055,298899,Happier Than Ever,0,0.988887,1,0.920264
71,0,0.684,0.449,3,-9.738,1,0.6110,0.86900,0.000000,0.0881,0.3410,66.165,157373,Flowers,0,0.988887,1,0.968091
81,0,0.426,0.225,8,-12.422,1,0.0291,0.82300,0.367000,0.1100,0.0643,104.506,232204,Something In The Way - Remastered,0,0.988887,1,0.999293
111,0,0.672,0.745,5,-5.269,0,0.3000,0.43800,0.000009,0.0699,0.7390,173.974,413111,"Cayó La Noche (feat. Cruz Cafuné, Abhir Hathi,...",1,0.011113,1,0.920524
121,0,0.746,0.251,11,-16.169,0,0.2590,0.78200,0.002030,0.1060,0.1800,139.999,120027,Revenge,0,0.988887,1,0.983315
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5107,0,0.586,0.740,1,-2.997,0,0.4040,0.55500,0.000000,0.1100,0.6970,71.378,519289,"Residente: Bzrp Music Sessions, Vol. 49",1,0.011113,1,0.999815
5112,0,0.394,0.327,4,-14.291,1,0.1140,0.84900,0.000000,0.1250,0.4110,93.358,150320,"Elliot's Song - From ""Euphoria"" An HBO Origina...",0,0.988887,1,0.912817
5118,0,0.544,0.369,2,-9.514,1,0.0380,0.96900,0.279000,0.6390,0.1020,87.010,208212,The Night We Met,0,0.988887,1,0.997099
5143,0,0.412,0.881,2,-3.502,1,0.0870,0.00039,0.000058,0.9230,0.3590,165.012,159096,emo girl (feat. WILLOW),0,0.988887,1,0.993019


time: 147 ms (started: 2022-06-21 16:43:04 +00:00)


### LOF

In [None]:
modelo_lof = LOF(contamination=contaminacao,n_neighbors=num_vizinhos,algorithm='auto').fit(df_input)
resultado_lof = modelo_lof.predict(df_input, return_confidence=True)
probs_lof = modelo_lof.predict_proba(df_input,method="unify", return_confidence=True)

time: 1.24 s (started: 2022-06-21 16:43:04 +00:00)


In [None]:
modelo_lof_boas = LOF(contamination=contaminacao,n_neighbors=num_vizinhos,algorithm='auto').fit(df_input_boas)
resultado_lof_boas = modelo_lof_boas.predict(df_input_boas, return_confidence=True)
probs_lof_boas = modelo_lof_boas.predict_proba(df_input_boas,method="unify", return_confidence=True)

time: 3.85 s (started: 2022-06-21 16:43:05 +00:00)


In [None]:
modelo_lof_ruins = LOF(contamination=contaminacao,n_neighbors=num_vizinhos,algorithm='auto').fit(df_input_ruins)
resultado_lof_ruins = modelo_lof_ruins.predict(df_input_ruins, return_confidence=True)
probs_lof_ruins = modelo_lof_ruins.predict_proba(df_input_ruins,method="unify", return_confidence=True)

time: 2.55 s (started: 2022-06-21 16:43:09 +00:00)


In [None]:
df_resultados["lof"] = resultado_lof[0]
df_resultados["prob lof"] = gera_prob_clssificacao(df_resultados,"lof",probs_lof)

df_resultados_boas["lof"] = resultado_lof_boas[0]
df_resultados_boas["prob lof"] = gera_prob_clssificacao(df_resultados_boas,"lof",probs_lof_boas)

df_resultados_ruins["lof"] = resultado_lof_ruins[0]
df_resultados_ruins["prob lof"] = gera_prob_clssificacao(df_resultados_ruins,"lof",probs_lof_ruins)
mask1 = df_resultados["lof"]==1  
mask2 = df_resultados["prob lof"]>=0.9
mask = mask1 & mask2
df_resultados[mask]

Unnamed: 0.1,Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,nome,abod,prob abod,autoencoder,prob autoencoder,lof,prob lof
111,0,0.672,0.745,5,-5.269,0,0.3000,0.438000,0.000009,0.0699,0.739,173.974,413111,"Cayó La Noche (feat. Cruz Cafuné, Abhir Hathi,...",1,0.011113,1,0.920524,1,0.999999
369,0,0.418,0.106,8,-22.507,0,0.0448,0.994000,0.029200,0.1790,0.800,46.718,85267,Carol of the Bells,1,0.011113,1,0.999999,1,0.992669
444,0,0.526,0.328,1,-9.864,1,0.0461,0.694000,0.000000,0.1120,0.110,116.068,526387,Give Me Love,1,0.011113,1,0.999868,1,1.000000
641,0,0.588,0.479,1,-7.039,1,0.2810,0.604000,0.000007,0.5270,0.434,150.414,460573,FEAR.,1,0.011113,1,0.994711,1,1.000000
677,0,0.837,0.840,7,-4.293,1,0.2390,0.062100,0.000000,0.3290,0.691,135.009,400013,Take It Back,1,0.011113,0,0.158448,1,0.999634
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4938,0,0.269,0.591,10,-5.790,0,0.0416,0.453000,0.000695,0.0702,0.228,183.793,403044,Love Is A Game,1,0.011113,1,0.980567,1,0.999875
5050,0,0.574,0.664,11,-6.068,1,0.0409,0.044800,0.055300,0.1460,0.226,144.654,91870,Formula,1,0.011113,0,0.776860,1,0.946684
5072,0,0.408,0.875,8,-1.536,1,0.0546,0.000017,0.000101,0.2640,0.646,144.975,90266,The Rumbling (TV Size),1,0.011113,0,0.345176,1,0.960781
5093,0,0.780,0.768,6,-4.325,0,0.2380,0.037100,0.000002,0.5180,0.507,80.063,404107,Stan,1,0.011113,1,0.937374,1,0.999922


time: 138 ms (started: 2022-06-21 16:43:12 +00:00)


### IF

In [None]:
modelo_if = IForest(contamination=contaminacao,random_state=0,).fit(df_input)
resultado_if = modelo_if.predict(df_input, return_confidence=True)
probs_if = modelo_if.predict_proba(df_input,method="unify", return_confidence=True)

  f"X has feature names, but {self.__class__.__name__} was fitted without"
  f"X has feature names, but {self.__class__.__name__} was fitted without"
  f"X has feature names, but {self.__class__.__name__} was fitted without"
  f"X has feature names, but {self.__class__.__name__} was fitted without"


time: 2.48 s (started: 2022-06-21 16:43:12 +00:00)


In [None]:
modelo_if_boas = IForest(contamination=contaminacao,random_state=0,).fit(df_input_boas)
resultado_if_boas = modelo_if_boas.predict(df_input_boas, return_confidence=True)
probs_if_boas = modelo_if_boas.predict_proba(df_input_boas,method="unify", return_confidence=True)

  f"X has feature names, but {self.__class__.__name__} was fitted without"
  f"X has feature names, but {self.__class__.__name__} was fitted without"
  f"X has feature names, but {self.__class__.__name__} was fitted without"
  f"X has feature names, but {self.__class__.__name__} was fitted without"


time: 2.58 s (started: 2022-06-21 16:43:14 +00:00)


In [None]:
modelo_if_ruins = IForest(contamination=contaminacao,random_state=0,).fit(df_input_ruins)
resultado_if_ruins = modelo_if_ruins.predict(df_input_ruins, return_confidence=True)
probs_if_ruins = modelo_if_ruins.predict_proba(df_input_ruins,method="unify", return_confidence=True)

  f"X has feature names, but {self.__class__.__name__} was fitted without"
  f"X has feature names, but {self.__class__.__name__} was fitted without"
  f"X has feature names, but {self.__class__.__name__} was fitted without"
  f"X has feature names, but {self.__class__.__name__} was fitted without"


time: 2.2 s (started: 2022-06-21 16:43:17 +00:00)


In [None]:
df_resultados["if"] = resultado_if[0]
df_resultados["prob if"] = gera_prob_clssificacao(df_resultados,"if",probs_if)

df_resultados_boas["if"] = resultado_if_boas[0]
df_resultados_boas["prob if"] = gera_prob_clssificacao(df_resultados_boas,"if",probs_if_boas)

df_resultados_ruins["if"] = resultado_if_ruins[0]
df_resultados_ruins["prob if"] = gera_prob_clssificacao(df_resultados_ruins,"if",probs_if_ruins)

mask1 = df_resultados["if"]==1  
mask2 = df_resultados["prob if"]>=0.9
mask = mask1 & mask2
df_resultados[mask]

Unnamed: 0.1,Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,...,duration_ms,nome,abod,prob abod,autoencoder,prob autoencoder,lof,prob lof,if,prob if
51,0,0.332,0.225,0,-8.697,1,0.0348,0.76700,0.003490,0.1280,...,298899,Happier Than Ever,0,0.988887,1,0.920264,0,1.000000,1,0.964013
71,0,0.684,0.449,3,-9.738,1,0.6110,0.86900,0.000000,0.0881,...,157373,Flowers,0,0.988887,1,0.968091,0,1.000000,1,0.963452
81,0,0.426,0.225,8,-12.422,1,0.0291,0.82300,0.367000,0.1100,...,232204,Something In The Way - Remastered,0,0.988887,1,0.999293,0,1.000000,1,0.998709
121,0,0.746,0.251,11,-16.169,0,0.2590,0.78200,0.002030,0.1060,...,120027,Revenge,0,0.988887,1,0.983315,0,1.000000,1,0.998768
124,0,0.414,0.404,0,-9.928,0,0.0499,0.27100,0.000000,0.3000,...,354320,Bohemian Rhapsody - Remastered 2011,1,0.011113,0,0.107907,0,0.933518,1,0.985115
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5103,0,0.456,0.840,8,-3.637,0,0.2830,0.51900,0.000000,0.9600,...,191524,Vai Lá Em Casa Hoje,0,0.988887,1,0.995262,0,1.000000,1,0.992147
5107,0,0.586,0.740,1,-2.997,0,0.4040,0.55500,0.000000,0.1100,...,519289,"Residente: Bzrp Music Sessions, Vol. 49",1,0.011113,1,0.999815,1,1.000000,1,0.983324
5118,0,0.544,0.369,2,-9.514,1,0.0380,0.96900,0.279000,0.6390,...,208212,The Night We Met,0,0.988887,1,0.997099,0,1.000000,1,0.999472
5143,0,0.412,0.881,2,-3.502,1,0.0870,0.00039,0.000058,0.9230,...,159096,emo girl (feat. WILLOW),0,0.988887,1,0.993019,0,1.000000,1,0.987666


time: 146 ms (started: 2022-06-21 16:43:19 +00:00)


### Combining the results

In [None]:
# Models whose 0/1 votes and probabilities are combined
modelos = ["abod", "autoencoder", "if", "lof"]

for df in (df_resultados, df_resultados_boas, df_resultados_ruins):
    # Number of models that flagged the track as anomalous
    df["total votos"] = sum(df[m] for m in modelos)
    # Sum of vote * probability over the models, divided by the number of models
    df["probabilidade"] = sum(df[m] * df[f"prob {m}"] for m in modelos) / len(modelos)

# Exporting results
df_resultados.to_csv(path_top_global+"resultado_anomalias.csv")
df_resultados_boas.to_csv("/content/drive/My Drive/INF1032 - Spotify/Dados/Consolidados/resultado_anomalias_musicas_boas.csv")
df_resultados_ruins.to_csv("/content/drive/My Drive/INF1032 - Spotify/Dados/Consolidados/resultado_anomalias_musicas_ruins.csv")

time: 904 ms (started: 2022-06-21 16:43:19 +00:00)
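
As a quick sanity check of the aggregation above, the sketch below recomputes `total votos` and `probabilidade` by hand for a single track. The vote and probability values are copied from row 369 of the results table displayed later in this notebook; the helper name `combinar_votos` is hypothetical and is not part of the original code.

```python
# Hypothetical helper (not in the original notebook): combines the four
# models' votes for one track the same way the cell above does.
MODELOS = ["abod", "autoencoder", "if", "lof"]


def combinar_votos(linha):
    """Return (total votes, sum of vote * probability divided by the number of models)."""
    total_votos = sum(linha[m] for m in MODELOS)
    probabilidade = sum(linha[m] * linha[f"prob {m}"] for m in MODELOS) / len(MODELOS)
    return total_votos, probabilidade


# Values copied from row 369 of the results table.
linha_369 = {
    "abod": 1, "prob abod": 0.011113,
    "autoencoder": 1, "prob autoencoder": 0.999999,
    "lof": 1, "prob lof": 0.992669,
    "if": 1, "prob if": 0.999982,
}

total_votos, probabilidade = combinar_votos(linha_369)
print(total_votos, probabilidade)
```

The result is 4 votes and a combined value of roughly 0.7509, which agrees with the table to within rounding. Because the sum is always divided by 4, a track flagged by a single model can never score above 0.25, whatever that model's probability is.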


# Analysis of the Results

In [2]:
path_top_global = '/content/drive/My Drive/INF1032 - Spotify/Dados/Matheus/MusicasAnomalas/'
df_resultados = pd.read_csv(path_top_global+"resultado_anomalias.csv",index_col="Unnamed: 0")
df_resultados

Unnamed: 0,Unnamed: 0.1,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,...,abod,prob abod,autoencoder,prob autoencoder,lof,prob lof,if,prob if,total votos,probabilidade
0,0,0.520,0.731,6,-5.338,0,0.0557,0.34200,0.001010,0.3110,...,0,0.988887,0,1.000000,0,0.948568,0,0.777578,0,0.000000
1,0,0.905,0.563,8,-6.135,1,0.1020,0.02540,0.000010,0.1130,...,0,0.988887,0,1.000000,0,1.000000,0,1.000000,0,0.000000
2,0,0.761,0.525,11,-6.900,1,0.0944,0.44000,0.000007,0.0921,...,0,0.988887,0,1.000000,0,1.000000,0,1.000000,0,0.000000
3,0,0.591,0.764,1,-5.484,1,0.0483,0.03830,0.000000,0.1030,...,0,0.988887,0,1.000000,0,1.000000,0,1.000000,0,0.000000
4,0,0.756,0.697,8,-6.377,1,0.0401,0.18200,0.000000,0.3330,...,0,0.988887,0,1.000000,0,0.989254,0,1.000000,0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5151,0,0.714,0.442,6,-5.909,1,0.0605,0.74200,0.000000,0.1140,...,0,0.988887,0,1.000000,0,1.000000,0,1.000000,0,0.000000
5152,0,0.601,0.713,4,-3.758,0,0.0449,0.02820,0.000000,0.1580,...,0,0.988887,0,1.000000,0,1.000000,0,1.000000,0,0.000000
5153,0,0.578,0.431,2,-7.034,1,0.0269,0.46900,0.000000,0.1370,...,0,0.988887,0,1.000000,0,1.000000,0,1.000000,0,0.000000
5154,0,0.340,0.684,11,-6.306,1,0.0440,0.00979,0.000000,0.3480,...,0,0.988887,0,0.751209,0,1.000000,0,0.848838,0,0.000000


time: 1.76 s (started: 2022-06-26 23:49:12 +00:00)


### Majority Vote

In [None]:
# Keep only the tracks flagged as anomalous by all four models
mask = df_resultados["total votos"] == 4

df_resultados[mask]

Unnamed: 0,Unnamed: 0.1,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,...,abod,prob abod,autoencoder,prob autoencoder,lof,prob lof,if,prob if,total votos,probabilidade
369,0,0.418,0.106,8,-22.507,0,0.0448,0.994,0.0292,0.179,...,1,0.011113,1,0.999999,1,0.992669,1,0.999982,4,0.750941
391,0,0.722,0.331,8,-7.789,1,0.0726,0.337,0.282,0.146,...,1,0.011113,1,0.924641,1,0.384735,1,0.973091,4,0.573395
641,0,0.588,0.479,1,-7.039,1,0.281,0.604,7e-06,0.527,...,1,0.011113,1,0.994711,1,1.0,1,0.962016,4,0.74196
901,0,0.461,0.0279,0,-21.992,1,0.0412,0.973,0.00226,0.169,...,1,0.011113,1,0.999995,1,0.988345,1,0.999829,4,0.74982
911,0,0.57,0.285,9,-14.125,0,0.0381,0.737,0.0133,0.108,...,1,0.011113,1,0.958714,1,0.818768,1,0.989038,4,0.694408
918,0,0.597,0.113,0,-34.475,1,0.954,0.461,0.000393,0.148,...,1,0.011113,1,1.0,1,0.999998,1,0.99999,4,0.752775
1020,0,0.773,0.859,11,-4.913,1,0.0747,0.0855,0.00018,0.914,...,1,0.011113,1,0.99727,1,0.137808,1,0.995927,4,0.535529
1121,0,0.546,0.89,7,-3.38,1,0.513,0.0781,0.0,0.146,...,1,0.011113,1,0.960848,1,0.624932,1,0.950536,4,0.636857
1351,0,0.61,0.258,0,-12.758,1,0.0331,0.883,0.0145,0.103,...,1,0.011113,1,0.931155,1,0.200477,1,0.997094,4,0.53496
1355,0,0.356,0.143,9,-15.148,1,0.0388,0.976,1.9e-05,0.111,...,1,0.011113,1,0.99736,1,0.821064,1,0.996911,4,0.706612


time: 155 ms (started: 2022-06-22 15:43:23 +00:00)
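
The filter above keeps only tracks flagged by all four models. As a minimal sketch of how the vote threshold changes the selection, the example below applies both that unanimous rule and a looser rule (at least 3 of 4 models) to a small made-up DataFrame. The data and the name `df_exemplo` are illustrative assumptions, not results from this notebook.

```python
import pandas as pd

# Toy data (made up for illustration): 0/1 votes from the four models.
df_exemplo = pd.DataFrame({
    "nome": ["track A", "track B", "track C", "track D"],
    "abod": [1, 1, 0, 0],
    "autoencoder": [1, 1, 1, 0],
    "if": [1, 1, 0, 0],
    "lof": [1, 0, 1, 0],
})
df_exemplo["total votos"] = df_exemplo[["abod", "autoencoder", "if", "lof"]].sum(axis=1)

# Unanimous rule, as in the cell above: all four models must agree.
unanimes = df_exemplo[df_exemplo["total votos"] == 4]

# Looser rule: a strict majority (at least 3 of the 4 models).
maioria = df_exemplo[df_exemplo["total votos"] >= 3]

print(unanimes["nome"].tolist())  # ['track A']
print(maioria["nome"].tolist())   # ['track A', 'track B']
```

Requiring all four votes is the stricter choice: it trades recall for fewer false alarms, which may be preferable when the flagged list is going to be inspected by hand.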


### Probability of being an anomaly

In [None]:
prob = 0.7
# Unanimous vote AND combined probability at or above the threshold
mask1 = df_resultados["total votos"] == 4
mask2 = df_resultados["probabilidade"] >= prob
mask = (mask1 & mask2)
print("Number of anomalous tracks (%.2f%% prob) = %d" % (100*prob, mask.sum()))
df_resultados[mask]

Number of anomalous tracks (70.00% prob) = 25


Unnamed: 0,Unnamed: 0.1,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,...,abod,prob abod,autoencoder,prob autoencoder,lof,prob lof,if,prob if,total votos,probabilidade
369,0,0.418,0.106,8,-22.507,0,0.0448,0.994,0.0292,0.179,...,1,0.011113,1,0.999999,1,0.992669,1,0.999982,4,0.750941
641,0,0.588,0.479,1,-7.039,1,0.281,0.604,7e-06,0.527,...,1,0.011113,1,0.994711,1,1.0,1,0.962016,4,0.74196
901,0,0.461,0.0279,0,-21.992,1,0.0412,0.973,0.00226,0.169,...,1,0.011113,1,0.999995,1,0.988345,1,0.999829,4,0.74982
918,0,0.597,0.113,0,-34.475,1,0.954,0.461,0.000393,0.148,...,1,0.011113,1,1.0,1,0.999998,1,0.99999,4,0.752775
1355,0,0.356,0.143,9,-15.148,1,0.0388,0.976,1.9e-05,0.111,...,1,0.011113,1,0.99736,1,0.821064,1,0.996911,4,0.706612
1359,0,0.746,0.338,11,-13.472,0,0.109,0.674,0.000375,0.529,...,1,0.011113,1,0.978369,1,0.999999,1,0.99638,4,0.746465
1826,0,0.336,0.231,1,-6.217,1,0.0497,0.942,0.0,0.188,...,1,0.011113,1,0.984697,1,1.0,1,0.985577,4,0.745347
1872,0,0.636,0.335,11,-13.327,1,0.966,0.993,0.0,0.342,...,1,0.011113,1,1.0,1,1.0,1,0.99992,4,0.752758
1875,0,0.707,0.314,6,-10.115,0,0.747,0.977,0.0,0.109,...,1,0.011113,1,0.999645,1,0.999999,1,0.994542,4,0.751325
2191,0,0.593,0.0668,11,-15.268,0,0.0722,0.949,1.3e-05,0.194,...,1,0.011113,1,0.997746,1,0.997016,1,0.998611,4,0.751121


time: 47.3 ms (started: 2022-06-22 15:43:33 +00:00)
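
To get a feel for how sensitive the selection is to the 0.7 cutoff, one could count the surviving tracks for a few thresholds. The sketch below does this on made-up data; `df_exemplo` and the helper `contar_anomalias` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Toy data (made up): vote totals and combined probabilities.
df_exemplo = pd.DataFrame({
    "total votos": [4, 4, 4, 3, 4],
    "probabilidade": [0.75, 0.57, 0.70, 0.72, 0.69],
})


def contar_anomalias(df, prob, votos=4):
    """Count rows with exactly `votos` votes and probability >= `prob`."""
    mask = (df["total votos"] == votos) & (df["probabilidade"] >= prob)
    return int(mask.sum())


# Same filter as the cell above, evaluated at three different cutoffs.
contagens = {prob: contar_anomalias(df_exemplo, prob) for prob in (0.5, 0.6, 0.7)}
print(contagens)
```

In the rows displayed in this notebook, `prob abod` is 0.011113 for every flagged track, so the combined value tops out near 0.75. A cutoff of 0.7 therefore sits close to the upper end of the range actually observed.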


In [None]:
df_resultados[mask].nome

369                                    Carol of the Bells
641                                                 FEAR.
901                               Dead Inside (Interlude)
918                                       The Explanation
1355                               before I close my eyes
1359                            love yourself (interlude)
1826                           raindrops (an angel cried)
1872                                          Paul - Skit
1875                                 Em Calls Paul - Skit
2191                               difference (interlude)
2901                                         Venice Bitch
3044                                        Jesus Is Lord
3198                                             JACKBOYS
3266                                   Alfred - Interlude
3658                                        Chromatica II
3660                                         Chromatica I
3775                                      Anxiety - Intro
3779          

time: 10.7 ms (started: 2022-06-22 15:43:33 +00:00)


In [None]:
df_resultados[mask][['danceability', 'energy', 'key', 'loudness', 'mode',
       'speechiness', 'acousticness', 'instrumentalness', 'liveness',
       'valence', 'tempo', 'duration_ms', 'nome']]

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,nome
369,0.418,0.106,8,-22.507,0,0.0448,0.994,0.0292,0.179,0.8,46.718,85267,Carol of the Bells
641,0.588,0.479,1,-7.039,1,0.281,0.604,7e-06,0.527,0.434,150.414,460573,FEAR.
901,0.461,0.0279,0,-21.992,1,0.0412,0.973,0.00226,0.169,0.354,88.388,86827,Dead Inside (Interlude)
918,0.597,0.113,0,-34.475,1,0.954,0.461,0.000393,0.148,0.313,81.311,50748,The Explanation
1355,0.356,0.143,9,-15.148,1,0.0388,0.976,1.9e-05,0.111,0.334,75.522,99658,before I close my eyes
1359,0.746,0.338,11,-13.472,0,0.109,0.674,0.000375,0.529,0.469,120.053,48423,love yourself (interlude)
1826,0.336,0.231,1,-6.217,1,0.0497,0.942,0.0,0.188,0.429,168.685,37640,raindrops (an angel cried)
1872,0.636,0.335,11,-13.327,1,0.966,0.993,0.0,0.342,0.561,161.68,35240,Paul - Skit
1875,0.707,0.314,6,-10.115,0,0.747,0.977,0.0,0.109,0.602,104.014,49024,Em Calls Paul - Skit
2191,0.593,0.0668,11,-15.268,0,0.0722,0.949,1.3e-05,0.194,0.19,102.456,76974,difference (interlude)


time: 50.7 ms (started: 2022-06-22 15:49:17 +00:00)


In [None]:
df_resultados[['danceability', 'energy', 'key', 'loudness', 'mode',
       'speechiness', 'acousticness', 'instrumentalness', 'liveness',
       'valence', 'tempo', 'duration_ms', 'nome']].describe()

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms
count,5156.0,5156.0,5156.0,5156.0,5156.0,5156.0,5156.0,5156.0,5156.0,5156.0,5156.0,5156.0
mean,0.684269,0.635878,5.227308,-6.309993,0.576804,0.127005,0.229244,0.011145,0.178445,0.490334,122.135238,203397.779868
std,0.141697,0.163691,3.648195,2.489318,0.494114,0.116225,0.241108,0.070749,0.137566,0.224414,29.936594,47962.752733
min,0.15,0.0279,0.0,-34.475,0.0,0.0232,1.7e-05,0.0,0.0197,0.032,46.718,30133.0
25%,0.598,0.535,1.0,-7.4585,0.0,0.045,0.041575,0.0,0.0966,0.317,97.61075,174144.25
50%,0.701,0.65,5.0,-5.954,1.0,0.0768,0.139,0.0,0.124,0.4875,120.019,199849.0
75%,0.786,0.757,8.0,-4.6715,1.0,0.173,0.336,2.7e-05,0.215,0.663,142.948,226166.75
max,0.985,0.989,11.0,1.509,1.0,0.966,0.994,0.953,0.977,0.982,212.117,690732.0


time: 175 ms (started: 2022-06-22 15:50:25 +00:00)
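
One way to read the two tables above together is to express a flagged track's features as z-scores against the whole dataset. The sketch below does this for "Carol of the Bells" (row 369), using the mean and std reported by `describe()`. The use of z-scores and the choice of these three features are illustrative assumptions, not a step from the original analysis.

```python
# (mean, std) pairs copied from the describe() output above.
stats = {
    "energy": (0.635878, 0.163691),
    "loudness": (-6.309993, 2.489318),
    "acousticness": (0.229244, 0.241108),
}

# Feature values for "Carol of the Bells" (row 369), copied from the table above.
carol = {"energy": 0.106, "loudness": -22.507, "acousticness": 0.994}

# z-score: how many standard deviations the track sits from the dataset mean.
z_scores = {
    feature: (carol[feature] - mean) / std
    for feature, (mean, std) in stats.items()
}

for feature, z in z_scores.items():
    print(f"{feature}: {z:+.2f}")
```

All three features land more than three standard deviations from the dataset mean, with loudness the furthest at about 6.5 standard deviations below it. This is consistent with all four detectors flagging the track.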
