Multi-label classification in Python with the scikit-multilearn library

We will try two adapted algorithms: the multi-label nearest-neighbour classifier (MLkNN) and the Hierarchical ARAM neural network (MLARAM).
We will also look at the three problem transformation approaches: Binary Relevance, Classifier Chain and Label Powerset.

In [1]:
#pip install scikit-multilearn
# Adapted algorithms: MLkNN and the Hierarchical ARAM neural network (MLARAM)
from skmultilearn.adapt import MLkNN, MLARAM
from skmultilearn.problem_transform import BinaryRelevance, ClassifierChain, LabelPowerset  # problem transformation
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import hamming_loss
import pandas as pd

In [2]:
musica = pd.read_csv('Musica.csv', sep=",")
musica.shape

(592, 77)

In [3]:
musica.head()

Unnamed: 0,amazed-suprised,happy-pleased,relaxing-clam,quiet-still,sad-lonely,angry-aggresive,Mean_Acc1298_Mean_Mem40_Centroid,Mean_Acc1298_Mean_Mem40_Rolloff,Mean_Acc1298_Mean_Mem40_Flux,Mean_Acc1298_Mean_Mem40_MFCC_0,...,Std_Acc1298_Std_Mem40_MFCC_10,Std_Acc1298_Std_Mem40_MFCC_11,Std_Acc1298_Std_Mem40_MFCC_12,BH_LowPeakAmp,BH_LowPeakBPM,BH_HighPeakAmp,BH_HighPeakBPM,BHSUM1,BHSUM2,BHSUM3
0,0,1,1,0,0,0,0.132498,0.077848,0.229227,0.602629,...,0.197026,0.196244,0.164323,0.030017,0.253968,0.008473,0.240602,0.136735,0.058442,0.107594
1,1,0,0,0,0,1,0.384281,0.355249,0.16719,0.853089,...,0.093526,0.085649,0.025101,0.182955,0.285714,0.156764,0.270677,0.191377,0.153728,0.197951
2,0,1,0,0,0,1,0.541782,0.356491,0.152246,0.791142,...,0.198082,0.108067,0.140574,0.099303,0.142857,0.0,0.593985,0.105114,0.025555,0.122965
3,0,0,1,0,0,0,0.174288,0.243935,0.254326,0.438987,...,0.235793,0.220195,0.235834,0.024988,0.222222,0.117169,0.210526,0.057288,0.134575,0.091509
4,0,0,0,1,0,0,0.347436,0.155448,0.100047,0.126026,...,0.715683,0.573359,0.412368,0.016398,0.761905,0.081703,0.721805,0.108737,0.172882,0.189934


Separating the labels from the features:

In [4]:
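# Label matrix: the first six columns, one 0/1 column per emotion.
classe = musica.iloc[:,0:6].values
# Feature matrix. Note that this slice starts at position 7, so the column at
# position 6 is left out; the stop value 78 is past the last of the 77 columns,
# so the slice simply runs to the end.
previsores = musica.iloc[:,7:78].values
classe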

array([[0, 1, 1, 0, 0, 0],
       [1, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 1],
       ...,
       [0, 0, 1, 1, 1, 0],
       [0, 1, 1, 0, 0, 0],
       [0, 1, 0, 0, 0, 0]], dtype=int64)

Train/test split:

In [5]:
X_treinamento, X_teste, y_treinamento, y_teste = train_test_split(previsores,classe,test_size = 0.3,
                                                                  random_state = 0)

First we use the adapted nearest-neighbour algorithm (MLkNN):

In [6]:
vmp = MLkNN(k=3) 
vmp.fit(X_treinamento, y_treinamento) 



MLkNN(k=3)

In [7]:
# Predict on the test data:
previsto = vmp.predict(X_teste)
# Hamming loss to evaluate performance
print(hamming_loss(y_teste, previsto))

0.23782771535580524


The Hamming loss of the adapted kNN algorithm was about 0.238, meaning roughly 23.8% of the individual label assignments in the test set were wrong.
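
To make the metric concrete, here is a minimal pure-Python sketch of how a Hamming loss can be computed by hand. The helper `hamming_loss_manual` is a hypothetical name introduced for illustration; the true rows copy the first three rows of `classe` shown earlier, while the predictions are invented toy values, not model output.

```python
def hamming_loss_manual(y_true, y_pred):
    """Fraction of individual label slots where prediction and truth differ."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total

# Three samples with six labels each.
y_true = [[0, 1, 1, 0, 0, 0],
          [1, 0, 0, 0, 0, 1],
          [0, 1, 0, 0, 0, 1]]
# Invented predictions (assumption, for illustration only).
y_pred = [[0, 1, 0, 0, 0, 0],   # one slot wrong
          [1, 0, 0, 0, 0, 1],   # all slots correct
          [0, 0, 0, 1, 0, 1]]   # two slots wrong

toy_loss = hamming_loss_manual(y_true, y_pred)
print(toy_loss)  # 3 wrong slots out of 18
```

Because every label slot counts separately, a sample with one wrong label out of six contributes far less than it would under an exact-match accuracy.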

Second adapted classifier: the Hierarchical ARAM neural network (MLARAM):

In [8]:
ann = MLARAM()
ann.fit(X_treinamento, y_treinamento) 

MLARAM(neurons=[<skmultilearn.adapt.mlaram.Neuron object at 0x0000011D9E826E48>,
                <skmultilearn.adapt.mlaram.Neuron object at 0x0000011D9E826F88>,
                <skmultilearn.adapt.mlaram.Neuron object at 0x0000011D9E80D048>,
                <skmultilearn.adapt.mlaram.Neuron object at 0x0000011D9E826FC8>,
                <skmultilearn.adapt.mlaram.Neuron object at 0x0000011D9E80D108>,
                <skmultilearn.adapt.mlaram.Ne...
                <skmultilearn.adapt.mlaram.Neuron object at 0x0000011D9E80D808>,
                <skmultilearn.adapt.mlaram.Neuron object at 0x0000011D9E80D848>,
                <skmultilearn.adapt.mlaram.Neuron object at 0x0000011D9E80D7C8>,
                <skmultilearn.adapt.mlaram.Neuron object at 0x0000011D9E80D8C8>,
                <skmultilearn.adapt.mlaram.Neuron object at 0x0000011D9E80D948>,
                <skmultilearn.adapt.mlaram.Neuron object at 0x0000011D9E80D908>, ...])

In [9]:
# Predict on the test data
previsto = ann.predict(X_teste)
# Hamming loss to evaluate performance
print(hamming_loss(y_teste, previsto))

0.24812734082397003


The Hamming loss of the Hierarchical ARAM algorithm was about 0.248, meaning roughly 24.8% of the individual label assignments were wrong.


Problem transformation:

In [10]:
# Binary Relevance
# A problem transformation method needs a base classifier;
# here we use a support vector classifier (SVC).
binary = BinaryRelevance(classifier = SVC())
binary.fit(X_treinamento, y_treinamento)
previsao = binary.predict(X_teste)
print(hamming_loss(y_teste, previsao))

0.199438202247191


With Binary Relevance the Hamming loss was lower, about 0.199.
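
The idea behind Binary Relevance can be sketched in a few lines: train one independent binary classifier per label column, then stack the per-label predictions. This is a conceptual sketch, not scikit-multilearn's implementation; `NearestNeighbour`, `binary_relevance_fit` and `binary_relevance_predict` are hypothetical helpers, with a tiny 1-nearest-neighbour classifier standing in for SVC and invented toy data.

```python
class NearestNeighbour:
    """Tiny 1-NN binary classifier, a stand-in for SVC (hypothetical helper)."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for x in X:
            dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
            out.append(self.y[dists.index(min(dists))])
        return out


def binary_relevance_fit(X, Y, make_classifier):
    """Fit one independent binary classifier per label column."""
    models = []
    for j in range(len(Y[0])):
        column = [row[j] for row in Y]  # 0/1 target for label j only
        models.append(make_classifier().fit(X, column))
    return models


def binary_relevance_predict(models, X):
    """Predict each label separately and reassemble the label rows."""
    per_label = [model.predict(X) for model in models]
    return [list(row) for row in zip(*per_label)]


# Invented toy data: two features, two labels.
X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y_train = [[0, 0], [0, 1], [1, 0], [1, 1]]

models = binary_relevance_fit(X_train, Y_train, NearestNeighbour)
br_pred = binary_relevance_predict(models, [[0.9, 0.1], [0.1, 0.8]])
print(br_pred)
```

Since each label gets its own model, this approach ignores any correlation between labels, which is the main design difference from the chain and powerset approaches.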

In [11]:
# Classifier Chain
chain = ClassifierChain(classifier = SVC())
chain.fit(X_treinamento, y_treinamento)
previsoes = chain.predict(X_teste)
print(hamming_loss(y_teste,previsoes))

0.2340823970037453
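
A classifier chain also trains one binary classifier per label, but each classifier additionally sees the labels that come earlier in the chain. The sketch below shows that general idea under stated assumptions: the helpers `NearestNeighbour`, `chain_fit` and `chain_predict` are hypothetical, a tiny 1-nearest-neighbour classifier stands in for SVC, the data is invented, and the code does not claim to mirror the library's internals.

```python
class NearestNeighbour:
    """Tiny 1-NN binary classifier, a stand-in for SVC (hypothetical helper)."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for x in X:
            dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
            out.append(self.y[dists.index(min(dists))])
        return out


def chain_fit(X, Y, make_classifier):
    """Classifier j is trained on the features plus the true labels 0..j-1."""
    models = []
    for j in range(len(Y[0])):
        X_aug = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
        models.append(make_classifier().fit(X_aug, [y[j] for y in Y]))
    return models


def chain_predict(models, X):
    """At prediction time, earlier predictions are appended as extra features."""
    preds = [[] for _ in X]
    for model in models:
        X_aug = [list(x) + p for x, p in zip(X, preds)]
        for p, value in zip(preds, model.predict(X_aug)):
            p.append(value)
    return preds


# Invented toy data: one feature, two labels where label 1 is the opposite of label 0.
X_train = [[0.0], [0.2], [0.8], [1.0]]
Y_train = [[0, 1], [0, 1], [1, 0], [1, 0]]

chain_models = chain_fit(X_train, Y_train, NearestNeighbour)
chain_pred = chain_predict(chain_models, [[0.1], [0.9]])
print(chain_pred)
```

The chain lets later classifiers exploit label dependencies, at the cost of propagating any mistake made early in the chain.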


In [12]:
# Label Powerset
label = LabelPowerset(classifier = SVC())
label.fit(X_treinamento, y_treinamento)
previsoes = label.predict(X_teste)
print(hamming_loss(y_teste,previsoes))

0.2209737827715356
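
Label Powerset turns the multi-label problem into a single multi-class problem by treating every distinct label combination as one class. The encoding step can be sketched as follows; `powerset_encode` and `powerset_decode` are hypothetical helper names, and the rows are toy values in the same six-label format as `classe`.

```python
def powerset_encode(Y):
    """Map each distinct label combination to one integer class id."""
    combo_to_class = {}
    classes = []
    for row in Y:
        key = tuple(row)
        if key not in combo_to_class:
            combo_to_class[key] = len(combo_to_class)
        classes.append(combo_to_class[key])
    return classes, combo_to_class


def powerset_decode(classes, combo_to_class):
    """Turn predicted class ids back into label rows."""
    class_to_combo = {c: list(k) for k, c in combo_to_class.items()}
    return [class_to_combo[c] for c in classes]


# Toy label rows (the first and last rows share the same combination).
Y = [[0, 1, 1, 0, 0, 0],
     [1, 0, 0, 0, 0, 1],
     [0, 1, 0, 0, 0, 1],
     [0, 1, 1, 0, 0, 0]]

classes, mapping = powerset_encode(Y)
print(classes)                            # one class id per row
print(powerset_decode(classes, mapping))  # round-trips back to Y
```

With six labels there can be up to 2**6 = 64 combinations, so some classes may have very few training examples; that is the usual trade-off of this approach.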


Problem transformation with Binary Relevance gave the best result on this data: a Hamming loss of about 0.199, against 0.221 to 0.248 for the other four methods.
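
As a quick check of that comparison, the five scores can be lined up and sorted. The values are copied from the outputs printed in this notebook; the dictionary keys are just descriptive labels chosen here.

```python
# Hamming loss values copied from the outputs printed in this notebook.
results = {
    "MLkNN (k=3)": 0.23782771535580524,
    "MLARAM": 0.24812734082397003,
    "BinaryRelevance + SVC": 0.199438202247191,
    "ClassifierChain + SVC": 0.2340823970037453,
    "LabelPowerset + SVC": 0.2209737827715356,
}

# Lower Hamming loss is better, so sort in ascending order.
ranking = sorted(results.items(), key=lambda item: item[1])
for name, loss in ranking:
    print(f"{name:<24} {loss:.4f}")

best_name = ranking[0][0]
```

These numbers come from a single 70/30 split with `random_state = 0`, so the ordering may change with a different split.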