# Machine Learning Project: Music Classifier

Marina Rocha Guimarães

Ygor Kupas

This notebook has two main goals:

* In scenario I we use a single SVM and compare the results obtained with different feature extractions (LPC, MFCC and Mel Spectrogram) and their combinations (MFCC + LPC and Mel Spectrogram + LPC). This is a simplification based on the study in [1];

* In scenario II we use 3 SVMs, in order to compare the results from scenario I with the result obtained by first splitting the 4 musical genres into 2 distinct groups (as done in [2]).

[1] [Mutiara, A. B.; Refianti, R.; Mukarromah, N. R. A. Musical Genre Classification Using Support Vector Machines and Audio Features. Faculty of Computer Science and Information Technology, Gunadarma University. September 2016](https://pdfs.semanticscholar.org/92b4/66160755cd4c540cad6ab744019ee006e4ec.pdf)

[2] [XU, Changseng et al. Musical Genre Classification Using Support Vector Machines. Laboratories for Information Technology, 2003](https://www.researchgate.net/publication/4015150_Musical_genre_classification_using_support_vector_machines)


Table of Contents<br/><br/>1 - [Importing Libraries](#1---Importing-Libraries)<br/>2 - [Suppressing notebook warnings](#2---Suppressing-notebook-warnings)<br/>3 - [Loading the Dataset](#3---Loading-the-Dataset)<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.1 - [Testing the audio files](#3.1---Testing-the-audio-files)<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.2 - [Splitting Training and Test Sets](#3.2---Splitting-Training-and-Test-Sets)<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.2.1 - [Checking the training and test distribution across musical genres](#3.2.1---Checking-the-training-and-test-distribution-across-musical-genres)<br/>4 - [Feature Extraction](#4---Feature-Extraction)<br/>5 - [Scenario I](#5---Scenario-I)<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5.1 - [Finding the best parameters for each feature extraction](#5.1---Finding-the-best-parameters-for-each-feature-extraction)<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5.1.1 - [MFCC](#5.1.1---MFCC)<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5.1.2 - [Mel Spectrogram](#5.1.2---Mel-Spectrogram)
<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5.1.3 - [LPC](#5.1.3---LPC)
<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5.1.4 - [MFCC + LPC](#5.1.4---MFCC-+-LPC)
<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5.1.5 - [Mel Spectrogram + LPC](#5.1.5---Mel-Spectrogram-+-LPC)<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5.2 - [Analyzing the best result for each feature extraction](#5.2---Analyzing-the-best-result-for-each-feature-extraction)<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5.3 - [Best Result](#5.3---Best-Result)<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5.3.1 - [Prediction attempt with the best model](#5.3.1---Prediction-attempt-with-the-best-model)<br/>6 - [Scenario II](#6---Scenario-II)<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.1 - [Analyzing each feature extraction](#6.1---Analyzing-each-feature-extraction)<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.2 - [Best Result](#6.2---Best-Result)<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.2.1 - [Prediction attempt with the best model](#6.2.1---Prediction-attempt-with-the-best-model)<br/>7 - [Comparing the best results of each scenario](#7---Comparing-the-best-results-of-each-scenario)

# 1 - Importing Libraries

In [None]:
# Libs matplotlib, numpy and pandas
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Lib librosa
import librosa
import librosa.display

# Lib OS
from os import listdir
from os.path import isfile, join, exists

# Lib ipywidgets
from ipywidgets import interact, IntSlider, interact_manual
import ipywidgets as widgets

# Lib IPython
from IPython import display as ipd

# Lib glob
import glob

# Lib sklearn
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix, plot_confusion_matrix

# Lib keras
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.utils import to_categorical
from keras import optimizers

# Lib collections
from collections import defaultdict

# 2 - Suppressing notebook warnings

In [None]:
# Warnings off
import warnings
warnings.filterwarnings('ignore')

# 3 - Loading the Dataset

Download the full [dataset](https://www.kaggle.com/carlthome/gtzan-genre-collection) (genres/) and place it in the same folder as the project.

In [None]:
def import_signal(path):
    
    # Default sample rate, returned even when no file is loaded
    sr = 22050
    
    # Test if path exists
    if not exists(path):
        print("Path does not exist, download from https://www.kaggle.com/carlthome/gtzan-genre-collection")
        return [], sr
    
    # Load files with glob
    files = glob.glob(path + '*.au')
    audios = []
    
    # Load audio files with sample rate 22050 Hz
    for file in files: 
        s, sr = librosa.load(file, sr=sr)
        audios.append(s)
        
    # Return audio vector and sample rate
    return audios, sr

We will use the following labels for each genre:

    * Pop       =>  y=0 and y2=0
    * Classical =>  y=1 and y2=0
    * Jazz      =>  y=2 and y2=1
    * Rock      =>  y=3 and y2=1
    
Here **y** is the label for the **first scenario**, which has a single SVM with 4 possible outputs. The **second scenario** uses both **y2** (in the first SVM, which separates pop and classical from jazz and rock) and **y** (in the two SVMs that then tell apart the genres within each group).
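
As a quick illustration of the two label sets, the sketch below maps a genre name to its `(y, y2)` pair. The helper name `labels_for` and the constant names are hypothetical; the dictionaries simply mirror the label list above.

```python
# Hypothetical helper: mirrors the label list above.
GENRE_TO_Y = {'pop': 0, 'classical': 1, 'jazz': 2, 'rock': 3}
GENRE_TO_Y2 = {'pop': 0, 'classical': 0, 'jazz': 1, 'rock': 1}


def labels_for(genre):
    """Return (y, y2): the 4-class label and the 2-group label."""
    return GENRE_TO_Y[genre], GENRE_TO_Y2[genre]


for genre in GENRE_TO_Y:
    print(genre, labels_for(genre))
```

With this numbering the group label is just the integer division of the genre label by 2, which is why pop/classical share `y2=0` and jazz/rock share `y2=1`.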

In [None]:
# Get audios dataframe
def get_audios_df():
    
    cont = 0
    audios_dict = defaultdict(list)
    
    # Loop in music genres
    for musical_genre in ['pop', 'classical', 'jazz', 'rock']:

        # Importing signals
        audios, sr =  import_signal(f'genres/{musical_genre}/')
        
        # Keep the raw signals per genre (used later by the audio widgets)
        audios_dict[musical_genre].append(audios)
        
        # Joining all genres in one dataframe
        audios_df_aux = pd.DataFrame([[x] for x in audios], columns=['x'])
        genre_dict_y = {'pop':0, 'classical':1, 'jazz':2, 'rock':3}
        genre_dict_y2 = {'pop':0, 'classical':0, 'jazz':1, 'rock':1}
        audios_df_aux['y'] = genre_dict_y[f'{musical_genre}']
        audios_df_aux['y2'] = genre_dict_y2[f'{musical_genre}']
        
        if cont == 0:
            audios_df = audios_df_aux
        else:
            audios_df = pd.concat([audios_df, audios_df_aux])
        cont+=1
        
    return audios_df, audios_dict, sr

In [None]:
# Call "get_audios_df()" function
audios_df, audios_dict, sr = get_audios_df()

# Shuffle all music genres
audios_df = audios_df.sample(frac=1).reset_index(drop=True) # shuffle

# Printing rows number
print(f'Rows: {len(audios_df)}')
audios_df.head(2)

## 3.1 - Testing the audio files

In [None]:
# Declaring global variables to interact
global chosed_musical_genre
chosed_musical_genre = 'pop'
global chosed_file
chosed_file = 0

In [None]:
# Dropdown from widgets
w_dropdown = widgets.Dropdown(options=['pop', 'classical', 'jazz', 'rock'],
                              value='pop', description='Genre:', disabled=False)

output_dropdown = widgets.Output()

# Changing "chosed_musical_genre"
def on_change_dropdown(change):
    if change['type'] == 'change' and change['name'] == 'value':
        global chosed_musical_genre
        chosed_musical_genre = change['new']
        ipd.display(ipd.Javascript('IPython.notebook.execute_cells([16])'))

w_dropdown.observe(on_change_dropdown)
display(w_dropdown, output_dropdown)

In [None]:
# IntSlider from widgets
w_int_slider = widgets.IntSlider(value=0, min=0, max=99, step=1,
                                 description='File:', disabled=False,
                                 continuous_update=False, orientation='horizontal',
                                 readout=True, readout_format='d')

output_int_slider = widgets.Output()

# Changing "chosed_file"
def on_change_int_slider(change):
    with output_int_slider:
        global chosed_file
        chosed_file = change['new']
        ipd.display(ipd.Javascript('IPython.notebook.execute_cells([16])'))

w_int_slider.observe(on_change_int_slider, 'value')
display(w_int_slider, output_int_slider)

In [None]:
# Show audio
ipd.Audio(audios_dict[chosed_musical_genre][0][chosed_file], rate=sr)

## 3.2 - Splitting Training and Test Sets

In [None]:
# Using numpy random to separate training and test data
# Could use sklearn.model_selection.train_test_split

# 80% for training and 20% for tests (approximately)
msk = np.random.rand(len(audios_df)) < 0.8
# .copy() so the feature columns added later do not write into a view of audios_df
train = audios_df[msk].copy()
test = audios_df[~msk].copy()

# Showing train and test's rows
print(f'Rows train:{len(train)}')
print(f'Rows test:{len(test)}')
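
The random mask above gives roughly 80/20 overall, but the per-genre proportions can drift from one run to the next. One alternative is a stratified split, which `sklearn.model_selection.train_test_split` supports through its `stratify` argument. The pure-Python sketch below only illustrates the idea; `stratified_split` and its parameters are hypothetical names, not part of this notebook.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.8, seed=0):
    """Return (train_idx, test_idx), keeping train_frac of each class in train."""
    rng = random.Random(seed)

    # Group sample indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_frac)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 100 items for each of 4 classes
labels = [genre for genre in range(4) for _ in range(100)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Fixing the seed also makes the split reproducible, which helps when comparing scores between runs.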

### 3.2.1 - Checking the training and test distribution across musical genres

In [None]:
# Printing a histogram for test and train distribution

bins_y = np.sort(train['y'].unique())
bins_y2 = np.sort(train['y2'].unique())

fig, axs = plt.subplots(2, 2, figsize=(15,10))

axs[0, 0].bar(bins_y, np.histogram(train['y'], bins=len(bins_y))[0], 
              align='center', color='#ff66b3', ec='#ffffff')
axs[0, 0].set_xticks(bins_y)
axs[0, 0].set_title('Training (y)', fontsize=16)

axs[0, 1].bar(bins_y2, np.histogram(train['y2'], bins=len(bins_y2))[0], 
              align='center', color='#ff66b3', ec='#ffffff')
axs[0, 1].set_xticks(bins_y2)
axs[0, 1].set_title('Training (y2)', fontsize=16)

axs[1, 0].bar(bins_y, np.histogram(test['y'], bins=len(bins_y))[0], 
              align='center', color='#ff66b3', ec='#ffffff')
axs[1, 0].set_xticks(bins_y)
axs[1, 0].set_title('Test (y)', fontsize=16)

axs[1, 1].bar(bins_y2, np.histogram(test['y2'], bins=len(bins_y2))[0], 
              align='center', color='#ff66b3', ec='#ffffff')
axs[1, 1].set_xticks(bins_y2)
axs[1, 1].set_title('Test (y2)', fontsize=16)

plt.show()

# 4 - Feature Extraction

In [None]:
def mfcc_feature_extract(x_column, sr):
    mfcc_list = []
    for x in x_column:
        
        # Compute the MFCCs (log-Mel spectrum followed by a discrete cosine transform)
        # .flatten() turns the 2D array into a single vector
        mfcc_flatten = np.array(librosa.feature.mfcc(y=x, sr=sr)).flatten()
        
        # Zero-pad because each x yields a different number 
        # of MFCC values (max is 26280)
        zeros = np.zeros((26280 - len(mfcc_flatten))) 
                       
        # Append in MFCC list
        mfcc_list.append(np.concatenate((mfcc_flatten, zeros), axis=0))
        
    return mfcc_list

def mel_spect_feature_extract(x_column, sr):
    mel_spect_list = []
    for x in x_column:
        
        # Compute the Mel-scaled spectrogram
        # .flatten() turns the 2D array into a single vector
        mel_spect_flatten = np.array(librosa.feature.melspectrogram(y=x, sr=sr)).flatten()
        
        # Zero-pad because each x yields a different number 
        # of Mel spectrogram values (max is 168192)
        zeros = np.zeros((168192 - len(mel_spect_flatten))) 
                              
        # Append in Mel_Spec list
        mel_spect_list.append(np.concatenate((mel_spect_flatten, zeros), axis=0))
        
    return mel_spect_list

def lpc_feature_extract(x_column, order=6):
    lpc_list = []
    for x in x_column:
        # Linear predictive coding (LPC) coefficients
        lpc = librosa.lpc(x, order=order)
        lpc_list.append(lpc)
    return lpc_list
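
The MFCC and Mel extractors above flatten each feature matrix and pad it with zeros up to a hard-coded length, so that every track yields a vector of the same size, as the SVM requires. A minimal sketch of that padding step as a standalone helper follows. The name `pad_to_length` is hypothetical, and plain lists are used instead of NumPy arrays to keep the example self-contained; it also raises an explicit error when a vector is longer than the target.

```python
def pad_to_length(values, target_len):
    """Right-pad a flat sequence with zeros up to target_len."""
    values = list(values)
    if len(values) > target_len:
        raise ValueError(
            f'feature vector has {len(values)} values, '
            f'more than target_len={target_len}'
        )
    return values + [0.0] * (target_len - len(values))


padded = pad_to_length([1.5, 2.5, 3.5], 5)
print(padded)
```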

In [None]:
# Get all feature extractions
def feature_extract(df, sr):
    
    # MFCC
    df['mfcc'] = mfcc_feature_extract(df['x'], sr)
    
    # LPC
    df['lpc'] = lpc_feature_extract(df['x'], order=6)
    
    # Mel Spectrogram
    df['mel_spect'] = mel_spect_feature_extract(df['x'], sr)
    
    # MFCC + LPC
    df['mfcc_lpc'] = (
        df['mfcc'].apply(lambda x: x.tolist()) + 
        df['lpc'].apply(lambda x: x.tolist())
    )
    
    # Mel Spectrogram + LPC
    df['mel_spect_lpc'] = (
        df['mel_spect'].apply(lambda x: x.tolist()) + 
        df['lpc'].apply(lambda x: x.tolist())
    )
    
    return df 

In [None]:
# Train extraction
train = feature_extract(train, sr)
train.head(2)

In [None]:
# Test extraction
test = feature_extract(test, sr)
test.head(2)

# 5 - Scenario I

* Using a single SVM to recognize the 4 musical genres

In [None]:
def get_first_scenario_score_and_model(train, test, feature_ex='lpc', feature_ex2='mfcc_lpc', 
                                       two_features_ex=False, return_model=False, degree=3, 
                                       gamma=0.7, C=1.0):
    
    # If more than one feature (e.g.: LPC + MFCC)
    if two_features_ex:
        model = svm.SVC(kernel='poly', degree=degree, C=C)
        model.fit(train[f'{feature_ex2}'].to_list(), train['y'].to_list())

        train_score = (
            model.score(train[f'{feature_ex2}'].to_list(), train['y'].to_list())
        )

        test_score = (
            model.score(test[f'{feature_ex2}'].to_list(), test['y'].to_list())
        )
        
        if return_model: return model, train_score, test_score
        
        return train_score, test_score
    
    # With the LPC feature, use the RBF kernel
    if feature_ex=='lpc':
        model = svm.SVC(kernel='rbf', gamma=gamma, C=C)
    # Else, use Poly Kernel
    else:
        model = svm.SVC(kernel='poly', degree=degree, C=C)

    model.fit(train[f'{feature_ex}'].to_list(), train['y'].to_list())

    train_score = (
        model.score(train[f'{feature_ex}'].to_list(), train['y'].to_list())
    )

    test_score = (
        model.score(test[f'{feature_ex}'].to_list(), test['y'].to_list())
    )
    
    if return_model: return model, train_score, test_score
        
    return train_score, test_score
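
For reference, the two kernels selected above can be written out directly. scikit-learn documents the polynomial kernel as `(gamma * <x, x'> + coef0) ** degree` and the RBF kernel as `exp(-gamma * ||x - x'||^2)`. The sketch below evaluates both on small vectors. The function names are illustrative, and `gamma=1.0` for the polynomial kernel is an arbitrary choice for this example (the notebook leaves that value at the library default).

```python
import math


def poly_kernel(a, b, degree=3, gamma=1.0, coef0=0.0):
    """Polynomial kernel: (gamma * <a, b> + coef0) ** degree."""
    dot = sum(x * y for x, y in zip(a, b))
    return (gamma * dot + coef0) ** degree


def rbf_kernel(a, b, gamma=0.7):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


a, b = [1.0, 2.0], [3.0, 4.0]
print(poly_kernel(a, b, degree=2))  # (1*3 + 2*4) ** 2
print(rbf_kernel(a, a))             # identical vectors give the maximum value
print(rbf_kernel(a, b))             # decays with distance
```

A larger `gamma` makes the RBF kernel decay faster with distance, which is the quantity swept in the LPC experiment, while `degree` controls how strongly the polynomial kernel bends the decision boundary.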

## 5.1 - Finding the best parameters for each feature extraction

### 5.1.1 - MFCC

In [None]:
train_List = []
test_List = []

for degree in np.arange(1,7,1):
    print(f'Results with degree={degree}')
    
    train_score, test_score = get_first_scenario_score_and_model(train, test, 
                                                                 feature_ex='mfcc', 
                                                                 degree=degree)
    
    train_List.append(train_score)
    test_List.append(test_score)
    
    print('Train accuracy: {:.1%}'.format(train_score))
    print('Test accuracy: {:.1%}'.format(test_score))
    print('\n')

plt.title("Score Plot")
plt.ylabel("Score")
plt.xlabel("Degree")
plt.plot(np.arange(1,7,1), train_List, color='#00cc66', linestyle='-', label="train")
plt.plot(np.arange(1,7,1), test_List, color='#00ace6', linestyle='--', label="test")
plt.legend();
plt.show()

Best result: degree=2
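
The best value is read off the printed scores and the plot. A small helper can make that choice explicit by picking the parameter with the highest test score. The name `best_param` is hypothetical, and the scores below are made-up placeholders, not results from this notebook.

```python
def best_param(params, test_scores):
    """Return the (param, score) pair with the highest test score.

    Ties are resolved in favour of the earliest parameter.
    """
    best_index = max(range(len(params)), key=lambda i: test_scores[i])
    return params[best_index], test_scores[best_index]


# Placeholder values for illustration only (not results from this notebook)
degrees = [1, 2, 3, 4, 5, 6]
scores = [0.60, 0.75, 0.70, 0.75, 0.55, 0.50]
print(best_param(degrees, scores))
```

The same helper could be applied to the `test_List` values collected in each loop of this section.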

### 5.1.2 - Mel Spectrogram

In [None]:
train_List = []
test_List = []

for degree in np.arange(1,7,1):
    print(f'Results with degree={degree}')
    
    train_score, test_score = get_first_scenario_score_and_model(train, test, 
                                                                 feature_ex='mel_spect', 
                                                                 degree=degree)
    
    train_List.append(train_score)
    test_List.append(test_score)
    
    print('Train accuracy: {:.1%}'.format(train_score))
    print('Test accuracy: {:.1%}'.format(test_score))
    print('\n')
    
plt.title("Score Plot")
plt.ylabel("Score")
plt.xlabel("Degree")
plt.plot(np.arange(1,7,1), train_List, color='#00cc66', linestyle='-', label="train")
plt.plot(np.arange(1,7,1), test_List, color='#00ace6', linestyle='--', label="test")
plt.legend();
plt.show()

Best result: degree=1

### 5.1.3 - LPC

In [None]:
train_List = []
test_List = []

for gamma in np.arange(0.5,1.7,0.1):
    print(f'Results with gamma={gamma}')
    
    train_score, test_score = get_first_scenario_score_and_model(train, test, 
                                                                 feature_ex='lpc', 
                                                                 gamma=gamma)
    
    train_List.append(train_score)
    test_List.append(test_score)
    
    print('Train accuracy: {:.1%}'.format(train_score))
    print('Test accuracy: {:.1%}'.format(test_score))
    print('\n')
    
plt.title("Score Plot")
plt.ylabel("Score")
plt.xlabel("Gamma")
# Use the same gamma grid as the loop above so x and y have matching lengths
plt.plot(np.arange(0.5,1.7,0.1), train_List, color='#00cc66', linestyle='-', label="train")
plt.plot(np.arange(0.5,1.7,0.1), test_List, color='#00ace6', linestyle='--', label="test")
plt.legend();
plt.show()

Best result: gamma=1.5

### 5.1.4 - MFCC + LPC

In [None]:
train_List = []
test_List = []

for degree in np.arange(1,7,1):
    print(f'Results with degree={degree}')
    
    train_score, test_score = get_first_scenario_score_and_model(train, test, feature_ex2='mfcc_lpc', 
                                                                 two_features_ex=True, degree=degree)
    
    train_List.append(train_score)
    test_List.append(test_score)
    
    print('Train accuracy: {:.1%}'.format(train_score))
    print('Test accuracy: {:.1%}'.format(test_score))
    print('\n')
    
plt.title("Score Plot")
plt.ylabel("Score")
plt.xlabel("Degree")
plt.plot(np.arange(1,7,1), train_List, color='#00cc66', linestyle='-', label="train")
plt.plot(np.arange(1,7,1), test_List, color='#00ace6', linestyle='--', label="test")
plt.legend();
plt.show()

Best result: degree=2

### 5.1.5 - Mel Spectrogram + LPC

In [None]:
train_List = []
test_List = []

for degree in np.arange(1,7,1):
    print(f'Results with degree={degree}')
    
    train_score, test_score = get_first_scenario_score_and_model(train, test, feature_ex2='mel_spect_lpc', 
                                                                 two_features_ex=True, degree=degree)
    
    train_List.append(train_score)
    test_List.append(test_score)
    
    print('Train accuracy: {:.1%}'.format(train_score))
    print('Test accuracy: {:.1%}'.format(test_score))
    print('\n')
    
plt.title("Score Plot")
plt.ylabel("Score")
plt.xlabel("Degree")
plt.plot(np.arange(1,7,1), train_List, color='#00cc66', linestyle='-', label="train")
plt.plot(np.arange(1,7,1), test_List, color='#00ace6', linestyle='--', label="test")
plt.legend();
plt.show()

Best result: degree=1

## 5.2 - Analyzing the best result for each feature extraction

In [None]:
options = [('Confusion Matrix Using MFCC (first scenario)', 'mfcc', None, False, 2, 0),
           ('Confusion Matrix Using Mel Spectrogram (first scenario)', 'mel_spect', None, False, 1, 0),
           ('Confusion Matrix Using LPC (first scenario)', 'lpc', None, False, 0, 1.5),
           ('Confusion Matrix Using MFCC + LPC (first scenario)', None, 'mfcc_lpc', True, 2, 0),
           ('Confusion Matrix Using Mel Spectrogram + LPC (first scenario)', None, 'mel_spect_lpc', True, 1, 0)]

for title, feature_ex, feature_ex2, two_features, degree, gamma in options:     
    
    model, train_score, test_score = (
        get_first_scenario_score_and_model(train, test, feature_ex=feature_ex, feature_ex2=feature_ex2, 
                                           two_features_ex=two_features, return_model=True, 
                                           degree=degree, gamma=gamma)
    )
       
    feature = feature_ex or feature_ex2
    
    plt.rcParams.update({'font.size': 12})
    disp = plot_confusion_matrix(model, test[f'{feature}'].to_list(), 
                                 test['y'].to_list(), labels=[0,1,2,3],
                                 display_labels=['pop', 'classical', 'jazz', 'rock'],
                                 cmap=plt.cm.BuPu)
    disp.ax_.set_title(title, fontsize=16)
    plt.figure(figsize=(20,20))
    plt.show()

    print('\033[1m' + 'Train accuracy:' + '\033[0;0m' + ' {:.1%}'.format(train_score))
    print('\033[1m' + 'Test accuracy: '+ '\033[0;0m' + ' {:.1%}'.format(test_score))
    print('\n')
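
The confusion matrices plotted above follow the scikit-learn convention: rows are true labels and columns are predicted labels. A minimal pure-Python sketch of how such a matrix is counted follows; the function name and the toy labels are illustrative and are not results from this notebook.

```python
def confusion_counts(y_true, y_pred, n_classes=4):
    """Count samples per (true label, predicted label) pair."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


# Toy labels: 0=pop, 1=classical, 2=jazz, 3=rock
y_true = [0, 0, 1, 2, 3, 3]
y_pred = [0, 1, 1, 2, 3, 2]

matrix = confusion_counts(y_true, y_pred)
for row in matrix:
    print(row)

# Accuracy is the share of samples on the main diagonal
accuracy = sum(matrix[i][i] for i in range(4)) / len(y_true)
print(f'{accuracy:.1%}')
```

Off-diagonal cells show which genres get mistaken for which, information the single accuracy number hides.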

## 5.3 - Best Result

Judging mainly by accuracy on the test set, two models tie for best:

* SVM with kernel=poly and MFCC feature extraction
* SVM with kernel=poly and MFCC+LPC feature extraction

The two results above are identical, so we pick the first one to check the final result.

In [None]:
model, train_score, test_score = (
    get_first_scenario_score_and_model(train, test, feature_ex='mfcc', 
                                       return_model=True, degree=2)
)

plt.rcParams.update({'font.size': 12})
disp = plot_confusion_matrix(model, test['mfcc'].to_list(), 
                             test['y'].to_list(), labels=[0,1,2,3],
                             display_labels=['pop', 'classical', 'jazz', 'rock'],
                             cmap=plt.cm.BuPu)
disp.ax_.set_title('Confusion Matrix Using MFCC (first scenario)', fontsize=16)
plt.figure(figsize=(20,20))
plt.show()

print('\n')
print('\033[1m' + 'Train accuracy:' + '\033[0;0m' + ' {:.1%}'.format(train_score))
print('\033[1m' + 'Test accuracy: '+ '\033[0;0m' + ' {:.1%}'.format(test_score))
print('\n')

### 5.3.1 - Prediction attempt with the best model

In [None]:
global chosed_musical_genre_scenario1
chosed_musical_genre_scenario1 = 'pop'

global chosed_file_scenario1
chosed_file_scenario1 = 0

In [None]:
w_dropdown = widgets.Dropdown(options=['pop', 'classical', 'jazz', 'rock'],
                              value='pop', description='Genre:', disabled=False)

output_dropdown = widgets.Output()

def on_change_dropdown(change):
    if change['type'] == 'change' and change['name'] == 'value':
        global chosed_musical_genre_scenario1
        chosed_musical_genre_scenario1 = change['new']
        
w_dropdown.observe(on_change_dropdown)
display(w_dropdown, output_dropdown)

In [None]:
w_int_slider = widgets.IntSlider(value=0, min=0, max=99, step=1,
                                 description='File:', disabled=False,
                                 continuous_update=False, orientation='horizontal',
                                 readout=True, readout_format='d')

output_int_slider = widgets.Output()

def on_change_int_slider(change):
    with output_int_slider:
        global chosed_file_scenario1
        chosed_file_scenario1 = change['new']
        ipd.display(ipd.Javascript('IPython.notebook.execute_cells([53])'))
        ipd.display(ipd.Javascript('IPython.notebook.execute_cells([54])'))
w_int_slider.observe(on_change_int_slider, 'value')
display(w_int_slider, output_int_slider)

In [None]:
ipd.Audio(audios_dict[chosed_musical_genre_scenario1][0][chosed_file_scenario1], rate=sr)

In [None]:
audio_test = audios_dict[chosed_musical_genre_scenario1][0][chosed_file_scenario1]
audio_test_mfcc = mfcc_feature_extract([audio_test], sr)[0]

genre_dict = {0: 'pop', 1: 'classical',
              2: 'jazz', 3: 'rock'}

predicted_musical_genre = genre_dict[model.predict([audio_test_mfcc])[0]]

print('\nThe selected track belongs to the musical genre ' + '\033[1m' + f'{predicted_musical_genre}' + '\033[0;0m')

# 6 - Scenario II

* Using 3 SVMs:
    * SVM1 - separates the pop and classical genres from the jazz and rock genres
    * SVM2 - separates pop from classical
    * SVM3 - separates jazz from rock
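
The routing between the three classifiers can be sketched as follows. The stand-in classifiers are plain functions with hypothetical names and made-up thresholds, used only to show the control flow; in this notebook they are `svm.SVC` models.

```python
def hierarchical_predict(sample, group_clf, pop_classical_clf, jazz_rock_clf):
    """Route a sample through SVM1, then through SVM2 or SVM3."""
    group = group_clf(sample)  # 0 -> {pop, classical}, 1 -> {jazz, rock}
    if group == 0:
        return pop_classical_clf(sample)  # 0 (pop) or 1 (classical)
    return jazz_rock_clf(sample)          # 2 (jazz) or 3 (rock)


# Toy stand-ins: each "sample" is a single number
def toy_group_clf(x):
    return 0 if x < 10 else 1


def toy_pop_classical_clf(x):
    return 0 if x < 5 else 1


def toy_jazz_rock_clf(x):
    return 2 if x < 15 else 3


predicted = [
    hierarchical_predict(x, toy_group_clf, toy_pop_classical_clf, toy_jazz_rock_clf)
    for x in (2, 7, 12, 20)
]
print(predicted)
```

One consequence of this design is that a mistake by the first classifier cannot be recovered: a track routed to the wrong group can only receive one of that group's two labels.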

In [None]:
def training_a_model(x, y, feature_ex, gamma, degree, C):
    if feature_ex=='lpc':
        model = svm.SVC(kernel='rbf', gamma=gamma, C=C)
    else:
        model = svm.SVC(kernel='poly', degree=degree, C=C)
        
    model.fit(x, y)
    return model

In [None]:
def get_second_scenario_score_and_models(train, test, feature_ex_svm1='lpc', feature_ex_svm23='mfcc',
                                        degree=2, gamma=1.5, C=1.0, return_models=False):
    
    # SVM1
    model1 = training_a_model(train[f'{feature_ex_svm1}'].to_list(), 
                              train['y2'].to_list(),
                              feature_ex_svm1, gamma, degree, C)
    train['y2_predicted'] = model1.predict(train[f'{feature_ex_svm1}'].to_list())
    
    #SVM2
    train_0 = train[train['y2_predicted']==0]
    train_0 = train_0[train_0['y'].isin([0,1])]
    
    model2 = training_a_model(train_0[f'{feature_ex_svm23}'].to_list(), 
                              train_0['y'].to_list(),
                              feature_ex_svm23, gamma, degree, C)
    train_0['y_predicted'] = model2.predict(train_0[f'{feature_ex_svm23}'].to_list())
    test_0 = test[test['y2']==0]
    
    #SVM3
    train_1 = train[train['y2_predicted']==1]
    train_1 = train_1[train_1['y'].isin([2,3])]
    
    model3 = training_a_model(train_1[f'{feature_ex_svm23}'].to_list(), 
                              train_1['y'].to_list(),
                              feature_ex_svm23, gamma, degree, C)
    train_1['y_predicted'] = model3.predict(train_1[f'{feature_ex_svm23}'].to_list())
    test_1 = test[test['y2']==1]
    
    # Scores are the unweighted mean of the SVM2 and SVM3 accuracies.
    # Note: test_0 and test_1 are selected with the true y2 labels, so
    # these scores do not include SVM1 grouping errors.
    train_score = (
        model2.score(train_0[f'{feature_ex_svm23}'].to_list(), train_0['y'].to_list()) * 0.5 +
        model3.score(train_1[f'{feature_ex_svm23}'].to_list(), train_1['y'].to_list()) * 0.5
    )
    
    test_score = (
        model2.score(test_0[f'{feature_ex_svm23}'].to_list(), test_0['y'].to_list()) * 0.5 +
        model3.score(test_1[f'{feature_ex_svm23}'].to_list(), test_1['y'].to_list()) * 0.5
    )
    
    return model1, model2, model3, train_score, test_score
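
The function above scores the cascade as the mean of the SVM2 and SVM3 accuracies, each measured on the tracks of its own group. An alternative is to score the final 4-class predictions of the whole cascade, which also counts the tracks that SVM1 sends to the wrong group. The sketch below computes both measures on made-up labels; the function names and values are illustrative, not results from this notebook.

```python
def branch_mean_accuracy(y_true, branch_pred):
    """Mean accuracy of the two second-stage classifiers,
    each evaluated on the samples of its own true group."""
    per_group = []
    for group_labels in ((0, 1), (2, 3)):
        idx = [i for i, y in enumerate(y_true) if y in group_labels]
        hits = sum(1 for i in idx if branch_pred[i] == y_true[i])
        per_group.append(hits / len(idx))
    return sum(per_group) / 2


def end_to_end_accuracy(y_true, group_pred, branch_pred):
    """A sample is correct only if the first stage picks the right group
    and the second stage then picks the right genre."""
    hits = 0
    for y, group, genre in zip(y_true, group_pred, branch_pred):
        true_group = 0 if y in (0, 1) else 1
        if group == true_group and genre == y:
            hits += 1
    return hits / len(y_true)


y_true = [0, 0, 1, 1, 2, 2, 3, 3]
group_pred = [0, 0, 0, 1, 1, 1, 1, 0]   # two routing mistakes
branch_pred = [0, 1, 1, 1, 2, 2, 3, 3]  # one second-stage mistake

print(branch_mean_accuracy(y_true, branch_pred))
print(end_to_end_accuracy(y_true, group_pred, branch_pred))
```

On these toy labels the end-to-end figure is lower than the branch mean, because the routing mistakes only show up in the former. For a like-for-like comparison with a single 4-class model, the end-to-end figure is the closer analogue.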

In [None]:
# Create own predict function
def second_scenario_predict_an_audio(feature1, feature2, model1, model2, model3):    
    # SVM1 predicts the group (0: pop/classical, 1: jazz/rock) for every sample
    y2_predicted = model1.predict(feature1)
    
    y_predicted_list = []
    for predicted, feature in zip(y2_predicted, feature2):
        
        # Route the sample to SVM2 (group 0) or SVM3 (group 1)
        if predicted == 0:
            y_predicted = model2.predict([feature])
        elif predicted == 1:
            y_predicted = model3.predict([feature])
            
        y_predicted_list.append(y_predicted)
    
    return y_predicted_list

## 6.1 - Analyzing each feature extraction

Here we analyze the results of scenario II while varying the feature extraction used in SVM1 and in SVM2 and SVM3. Note that SVM2 and SVM3 always share the same feature extraction.

For simplicity, we only analyze the LPC and MFCC extractions.

In [None]:
# Choose best parameters for each feature extraction (found in the first scenario)
options = [('Confusion Matrix Using LPC & LPC (second scenario)', 'lpc', 'lpc', 0, 1.5),
           ('Confusion Matrix Using MFCC & MFCC (second scenario)', 'mfcc', 'mfcc', 2, 0),
           ('Confusion Matrix Using LPC & MFCC (second scenario)', 'lpc', 'mfcc', 2, 1.5),
           ('Confusion Matrix Using MFCC & LPC (second scenario)', 'mfcc', 'lpc', 2, 1.5),]

for title, feature_ex_svm1, feature_ex_svm23, degree, gamma in options:     
    
    model1, model2, model3, train_score, test_score = (
        get_second_scenario_score_and_models(train, test, feature_ex_svm1=feature_ex_svm1,
                                             feature_ex_svm23=feature_ex_svm23, degree=degree,
                                             gamma=gamma)
    )
    
    predictions = (
        second_scenario_predict_an_audio(test[f'{feature_ex_svm1}'].to_list(), 
                                         test[f'{feature_ex_svm23}'].to_list(), 
                                         model1, model2, model3)
    )
    predictions = [x[0] for x in predictions]
    
    plt.rcParams.update({'font.size': 12})
    
    disp = ConfusionMatrixDisplay(confusion_matrix=confusion_matrix(test['y'].to_list(), predictions),
                                  display_labels=['pop', 'classical', 'jazz', 'rock'])
    disp = disp.plot(include_values=True,
                     cmap=plt.cm.BuPu)
    disp.ax_.set_title(title, fontsize=16)
    
    plt.figure(figsize=(20,20))
    plt.show()

    print('\033[1m' + 'Train accuracy:' + '\033[0;0m' + ' {:.1%}'.format(train_score))
    print('\033[1m' + 'Test accuracy: '+ '\033[0;0m' + ' {:.1%}'.format(test_score))
    print('\n')

## 6.2 - Best Result

Judging mainly by accuracy on the test set, the best model was the one that used kernel=poly and MFCC feature extraction in all 3 SVMs.

In [None]:
model1, model2, model3, train_score, test_score = (
    get_second_scenario_score_and_models(train, test, feature_ex_svm1='mfcc',
                                         feature_ex_svm23='mfcc', degree=2)
)

predictions = (
    second_scenario_predict_an_audio(test['mfcc'].to_list(), 
                                     test['mfcc'].to_list(), 
                                     model1, model2, model3)
)
predictions = [x[0] for x in predictions]

plt.rcParams.update({'font.size': 12})

disp = ConfusionMatrixDisplay(confusion_matrix=confusion_matrix(test['y'].to_list(), predictions),
                              display_labels=['pop', 'classical', 'jazz', 'rock'])
disp = disp.plot(include_values=True,
                 cmap=plt.cm.BuPu)
disp.ax_.set_title('Confusion Matrix Using MFCC & MFCC (second scenario)', fontsize=16)

plt.figure(figsize=(20,20))
plt.show()

print('\033[1m' + 'Train accuracy:' + '\033[0;0m' + ' {:.1%}'.format(train_score))
print('\033[1m' + 'Test accuracy: '+ '\033[0;0m' + ' {:.1%}'.format(test_score))
print('\n')

### 6.2.1 - Prediction attempt with the best model

In [None]:
global chosed_musical_genre_scenario2
chosed_musical_genre_scenario2 = 'pop'

global chosed_file_scenario2
chosed_file_scenario2 = 0

In [None]:
w_dropdown = widgets.Dropdown(options=['pop', 'classical', 'jazz', 'rock'],
                              value='pop', description='Genre:', disabled=False)

output_dropdown = widgets.Output()

def on_change_dropdown(change):
    if change['type'] == 'change' and change['name'] == 'value':
        global chosed_musical_genre_scenario2
        chosed_musical_genre_scenario2 = change['new']

w_dropdown.observe(on_change_dropdown)
display(w_dropdown, output_dropdown)

In [None]:
w_int_slider = widgets.IntSlider(value=0, min=0, max=99, step=1,
                                 description='File:', disabled=False,
                                 continuous_update=False, orientation='horizontal',
                                 readout=True, readout_format='d')

output_int_slider = widgets.Output()

def on_change_int_slider(change):
    with output_int_slider:
        global chosed_file_scenario2
        chosed_file_scenario2 = change['new']
        ipd.display(ipd.Javascript('IPython.notebook.execute_cells([69])'))
        ipd.display(ipd.Javascript('IPython.notebook.execute_cells([70])'))

w_int_slider.observe(on_change_int_slider, 'value')
display(w_int_slider, output_int_slider)

In [None]:
ipd.Audio(audios_dict[chosed_musical_genre_scenario2][0][chosed_file_scenario2], rate=sr)

In [None]:
audio_test = audios_dict[chosed_musical_genre_scenario2][0][chosed_file_scenario2]
audio_test_mfcc = mfcc_feature_extract([audio_test], sr)[0]

genre_dict = {0: 'pop', 1: 'classical',
              2: 'jazz', 3: 'rock'}

predicted_musical_genre = (
    genre_dict[
        second_scenario_predict_an_audio([audio_test_mfcc], [audio_test_mfcc], model1, model2, model3)[0][0]
    ]
)

print('\nThe selected track belongs to the musical genre ' + '\033[1m' + f'{predicted_musical_genre}' + '\033[0;0m')

# 7 - Comparing the best results of each scenario

In [None]:
# Scenario I
model, train_score1, test_score1 = (
    get_first_scenario_score_and_model(train, test, feature_ex='mfcc', 
                                       return_model=True, degree=2)
)

plt.rcParams.update({'font.size': 12})
disp = plot_confusion_matrix(model, test['mfcc'].to_list(), 
                             test['y'].to_list(), labels=[0,1,2,3],
                             display_labels=['pop', 'classical', 'jazz', 'rock'],
                             cmap=plt.cm.BuPu)
disp.ax_.set_title('Confusion Matrix Using MFCC (first scenario)', fontsize=16)
plt.figure(figsize=(20,20))
plt.show()

print('\n')
print('\033[1m' + 'Train accuracy:' + '\033[0;0m' + ' {:.1%}'.format(train_score1))
print('\033[1m' + 'Test accuracy: '+ '\033[0;0m' + ' {:.1%}'.format(test_score1))
print('\n')

# Scenario II
model1, model2, model3, train_score2, test_score2 = (
    get_second_scenario_score_and_models(train, test, feature_ex_svm1='mfcc',
                                         feature_ex_svm23='mfcc', degree=2)
)

predictions = (
    second_scenario_predict_an_audio(test['mfcc'].to_list(), 
                                     test['mfcc'].to_list(), 
                                     model1, model2, model3)
)
predictions = [x[0] for x in predictions]

plt.rcParams.update({'font.size': 12})

disp = ConfusionMatrixDisplay(confusion_matrix=confusion_matrix(test['y'].to_list(), predictions),
                              display_labels=['pop', 'classical', 'jazz', 'rock'])
disp = disp.plot(include_values=True,
                 cmap=plt.cm.BuPu)
disp.ax_.set_title('Confusion Matrix Using MFCC & MFCC (second scenario)', fontsize=16)

plt.figure(figsize=(20,20))
plt.show()

print('\033[1m' + 'Train accuracy:' + '\033[0;0m' + ' {:.1%}'.format(train_score2))
print('\033[1m' + 'Test accuracy: '+ '\033[0;0m' + ' {:.1%}'.format(test_score2))
print('\n')

With the initial split of the 4 musical genres into 2 groups (scenario II), the model's accuracy improved. This shows that, with the same tools, we can improve a model by knowing more about the dataset. In this case, some knowledge of music theory suggests that it would be easier to separate, for example, pop from classical than jazz from classical.