# Detection of TOXicity in comments in Spanish (DETOXIS 2021)

## SESSION 2.4: Combining classifiers

### By Álvaro Mazcuñán and Miquel Marín

**IMPORTANT**: This final lab session introduced the BETO model together with classifier-combination techniques. Although it does not appear in this submission, the BERT variant for tweets in Spanish (BETO) was already implemented in the previous submission (Session 3: BERT-based classifier and evaluation).

The rest of this notebook shows the steps taken to build the combination of models.

#### Libraries

The same libraries as in the previous part are imported, plus the additional ones needed here, such as `StackingClassifier`, to evaluate the combination of classifiers.

In [1]:
import pandas as pd
import re
import string
import numpy as np

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

from sklearn.ensemble import StackingClassifier

from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score

import warnings

In [2]:
warnings.filterwarnings("ignore")

In [3]:
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


True

#### Loading the DETOXIS data

The DETOXIS data used throughout all the submissions is loaded next.

In [4]:
df = pd.read_csv("DATASET_DETOXIS.csv")
df

| | topic | thread_id | comment_id | reply_to | comment_level | comment | argumentation | constructiveness | positive_stance | negative_stance | target_person | target_group | stereotype | sarcasm | mockery | insult | improper_language | aggressiveness | intolerance | toxicity | toxicity_level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CR | 0_000 | 0_002 | 0_002 | 1 | Pensó: Zumo para restar. | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | CR | 0_001 | 0_003 | 0_003 | 1 | Como les gusta el afeitado en seco a esta gente. | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 2 | CR | 0_002 | 0_004 | 0_004 | 1 | asi me gusta, que se maten entre ellos y en al... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 2 |
| 3 | CR | 0_003 | 0_005 | 0_005 | 1 | Loss mas valientes, los que mejor cortan nuest... | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 4 | CR | 0_004 | 0_006 | 0_006 | 1 | Costumbres... | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3458 | MI | 20_134 | 20_164 | 20_164 | 1 | Ya decía yo que veía menos moros | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| 3459 | MI | 20_006 | 20_165 | 20_008 | 2 | +1. Como lo sabes... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3460 | MI | 20_135 | 20_166 | 20_166 | 1 | Seguirán cobrando paguitas en Marruecos,expoli... | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 3461 | MI | 20_136 | 20_167 | 20_167 | 1 | pobres, se arriesgan en pateras porque huyen d... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |


In [5]:
sample_data = df[["comment", "toxicity","toxicity_level"]]
sample_data

| | comment | toxicity | toxicity_level |
|---|---|---|---|
| 0 | Pensó: Zumo para restar. | 1 | 1 |
| 1 | Como les gusta el afeitado en seco a esta gente. | 1 | 1 |
| 2 | asi me gusta, que se maten entre ellos y en al... | 1 | 2 |
| 3 | Loss mas valientes, los que mejor cortan nuest... | 1 | 1 |
| 4 | Costumbres... | 1 | 1 |
| ... | ... | ... | ... |
| 3458 | Ya decía yo que veía menos moros | 1 | 1 |
| 3459 | +1. Como lo sabes... | 0 | 0 |
| 3460 | Seguirán cobrando paguitas en Marruecos,expoli... | 1 | 1 |
| 3461 | pobres, se arriesgan en pateras porque huyen d... | 1 | 1 |


#### Reading and preprocessing the tweets

Each tweet in the dataset is read and preprocessed with the cleaning function used throughout all the submissions.

In [6]:
SPANISH_STOPWORDS = set(stopwords.words('spanish'))  # Built once instead of once per token

def tweet_preprocessing_not_tokenized(tweet):
    tweet = tweet.lower()  # Start by lowercasing the whole message
    tweet = re.sub(r"http\S+|www\S+|https\S+", "", tweet, flags=re.MULTILINE)  # Remove URLs
    tweet = re.sub(r"\@\w+|\#", "", tweet)  # Remove @mentions and the # sign
    # Remove emojis and emoticons (the uppercase alternatives cannot match after lower())
    tweet = re.sub(r"[\U00010000-\U0010ffff]|:\)|:\(|XD|xD|;\)|:,\(|:D|D:", "", tweet)
    tweet = tweet.translate(str.maketrans('', '', string.punctuation))  # Remove punctuation
    tokenized_tweets = word_tokenize(tweet)
    filtered_tweets = [word for word in tokenized_tweets if word not in SPANISH_STOPWORDS]  # Drop stopwords

    stemming = PorterStemmer()  # PorterStemmer (an English stemmer) reduces each word to its stem
    stemmed_tweets = [stemming.stem(word) for word in filtered_tweets]
    lemmatization = WordNetLemmatizer()  # WordNetLemmatizer (English WordNet) maps each word to its lemma
    lemma_tweets = [lemmatization.lemmatize(word, pos='a') for word in stemmed_tweets]
    return " ".join(lemma_tweets)  # Returned as one string, NOT tokenized

preprocessing = tweet_preprocessing_not_tokenized

In [7]:
sample_data = sample_data.copy()  # Work on a copy so the assignment does not target a slice of df
sample_data['comment'] = sample_data['comment'].apply(preprocessing)
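
The regex and punctuation stages of the cleaning function can be traced on a single made-up comment. The sketch below reproduces only those stages with the standard library, so tokenization, stopword removal, and stemming are left out; the helper name `clean_text` and the sample string are illustrative assumptions, not part of the notebook.

```python
import re
import string

def clean_text(text: str) -> str:
    """Regex and punctuation stages only (hypothetical helper for illustration)."""
    text = text.lower()
    text = re.sub(r"http\S+|www\S+|https\S+", "", text)  # drop URLs
    text = re.sub(r"\@\w+|\#", "", text)                 # drop @mentions and the '#' sign
    text = text.translate(str.maketrans("", "", string.punctuation))  # drop ASCII punctuation
    return " ".join(text.split())                        # collapse leftover whitespace

sample = "@usuario Mira esto: https://example.com #Noticias ¡Increíble!"
print(clean_text(sample))  # mira esto noticias ¡increíble
```

Note that `string.punctuation` covers ASCII characters only, so the Spanish opening marks `¡` and `¿` survive this step and stay attached to the following word.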

#### `Toxicity` variable

#### Splitting the corpus into training and test sets

In [22]:
train_X, test_X, train_Y, test_Y = train_test_split(sample_data['comment'], sample_data['toxicity'], test_size=0.3)
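
The call above passes no `random_state`, so every run draws a different split and the scores reported below will not reproduce exactly. A minimal sketch of a seeded, stratified split on toy data follows; the seed value `42` and the 70/30 label mix are assumptions chosen for illustration.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy stand-in for the corpus: 70 non-toxic (0) and 30 toxic (1) comments.
texts = [f"comment {i}" for i in range(100)]
labels = [0] * 70 + [1] * 30

# random_state fixes the shuffle; stratify keeps the class ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels
)
print(Counter(y_tr), Counter(y_te))
```

With `stratify`, both parts keep the 70/30 class ratio of the full set, which matters more for the imbalanced `toxicity_level` target than for the binary one.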

### Feature extraction

#### Term Frequency - Inverse Document Frequency (TF-IDF)

In [23]:
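tfidf_vect = TfidfVectorizer()
# Note: fitting on the full corpus lets test-set vocabulary and IDF statistics reach training;
# fit on train_X alone for a strict hold-out evaluation.
tfidf_vect.fit(sample_data['comment'])
train_X_Tfidf = tfidf_vect.transform(train_X)
test_X_Tfidf = tfidf_vect.transform(test_X)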
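
The cell above learns the vocabulary from `sample_data['comment']`, which includes the test comments. A stricter variant learns it from the training part only, and the sketch below shows what then happens to words that appear only at test time. The three short documents are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["el gato come", "el perro ladra"]
test_docs = ["el gato vuela"]  # 'vuela' never appears in the training documents

vect = TfidfVectorizer()
train_mat = vect.fit_transform(train_docs)  # vocabulary and IDF learned from training only
test_mat = vect.transform(test_docs)        # words outside the vocabulary are dropped

print(sorted(vect.vocabulary_))
print(test_mat.shape, test_mat.nnz)
```

The test matrix keeps the training vocabulary's width, and the unseen word contributes nothing, which is the behaviour a deployed model would also face.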

#### Support Vector Machine (SVM)

In [24]:
svm_clf = SVC(C=1.0, kernel='linear', degree=3, gamma='auto')  # degree and gamma have no effect with kernel='linear'
svm_clf.fit(train_X_Tfidf,train_Y)

y_train_pred = svm_clf.predict(train_X_Tfidf)
y_test_pred = svm_clf.predict(test_X_Tfidf)

# Training set performance
svm_train_accuracy = accuracy_score(train_Y, y_train_pred) # Calculate Accuracy
svm_train_f1 = f1_score(train_Y, y_train_pred, average='weighted') # Calculate F1-score

# Test set performance
svm_test_accuracy = accuracy_score(test_Y, y_test_pred) # Calculate Accuracy
svm_test_f1 = f1_score(test_Y, y_test_pred, average='weighted') # Calculate F1-score

print('Model performance for Training set')
print('- Accuracy: %s' % svm_train_accuracy)
print('- F1 score: %s' % svm_train_f1)
print('----------------------------------')
print('Model performance for Test set')
print('- Accuracy: %s' % svm_test_accuracy)
print('- F1 score: %s' % svm_test_f1)

Model performance for Training set
- Accuracy: 0.9451320132013201
- F1 score: 0.9439714298372067
----------------------------------
Model performance for Test set
- Accuracy: 0.7670837343599615
- F1 score: 0.7381633922707466
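
Every F1 value in this notebook uses `average='weighted'`: the per-class F1 scores are averaged with weights proportional to each class's support in the true labels. The sketch below computes that by hand, next to the unweighted (macro) average for contrast; the ten labels are made up and the function names are illustrative.

```python
from collections import Counter

def f1_per_class(y_true, y_pred, label):
    """F1 for one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    n = len(y_true)
    return sum(f1_per_class(y_true, y_pred, c) * k / n for c, k in support.items())

def macro_f1(y_true, y_pred):
    """Per-class F1 averaged with equal weight per class."""
    classes = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, c) for c in classes) / len(classes)

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
print(round(weighted_f1(y_true, y_pred), 4), round(macro_f1(y_true, y_pred), 4))
```

Because the majority class dominates the weighted average, a model can score well on it while still doing poorly on rare classes; the macro average makes that visible.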


#### Decision Tree

In [25]:
tree_clf = DecisionTreeClassifier(max_depth=4)
tree_clf.fit(train_X_Tfidf,train_Y)

y_train_pred = tree_clf.predict(train_X_Tfidf)
y_test_pred = tree_clf.predict(test_X_Tfidf)

# Training set performance
dt_train_accuracy = accuracy_score(train_Y, y_train_pred) # Calculate Accuracy
dt_train_f1 = f1_score(train_Y, y_train_pred, average='weighted') # Calculate F1-score

# Test set performance
dt_test_accuracy = accuracy_score(test_Y, y_test_pred) # Calculate Accuracy
dt_test_f1 = f1_score(test_Y, y_test_pred, average='weighted') # Calculate F1-score

print('Model performance for Training set')
print('- Accuracy: %s' % dt_train_accuracy)
print('- F1 score: %s' % dt_train_f1)
print('----------------------------------')
print('Model performance for Test set')
print('- Accuracy: %s' % dt_test_accuracy)
print('- F1 score: %s' % dt_test_f1)

Model performance for Training set
- Accuracy: 0.6984323432343235
- F1 score: 0.6089207425922718
----------------------------------
Model performance for Test set
- Accuracy: 0.7276227141482194
- F1 score: 0.6509128557478479


#### Random Forest

In [26]:
rf = RandomForestClassifier(n_estimators=10)
rf.fit(train_X_Tfidf, train_Y)

y_train_pred = rf.predict(train_X_Tfidf)
y_test_pred = rf.predict(test_X_Tfidf)

# Training set performance
rf_train_accuracy = accuracy_score(train_Y, y_train_pred) # Calculate Accuracy
rf_train_f1 = f1_score(train_Y, y_train_pred, average='weighted') # Calculate F1-score

# Test set performance
rf_test_accuracy = accuracy_score(test_Y, y_test_pred) # Calculate Accuracy
rf_test_f1 = f1_score(test_Y, y_test_pred, average='weighted') # Calculate F1-score

print('Model performance for Training set')
print('- Accuracy: %s' % rf_train_accuracy)
print('- F1 score: %s' % rf_train_f1)
print('----------------------------------')
print('Model performance for Test set')
print('- Accuracy: %s' % rf_test_accuracy)
print('- F1 score: %s' % rf_test_f1)

Model performance for Training set
- Accuracy: 0.969059405940594
- F1 score: 0.9686918838880257
----------------------------------
Model performance for Test set
- Accuracy: 0.7401347449470644
- F1 score: 0.7041506624562469


#### Multilayer Perceptron

In [27]:
mlp = MLPClassifier(alpha=1, max_iter=1000)
mlp.fit(train_X_Tfidf, train_Y)

y_train_pred = mlp.predict(train_X_Tfidf)
y_test_pred = mlp.predict(test_X_Tfidf)

# Training set performance
mlp_train_accuracy = accuracy_score(train_Y, y_train_pred) # Calculate Accuracy
mlp_train_f1 = f1_score(train_Y, y_train_pred, average='weighted') # Calculate F1-score

# Test set performance
mlp_test_accuracy = accuracy_score(test_Y, y_test_pred) # Calculate Accuracy
mlp_test_f1 = f1_score(test_Y, y_test_pred, average='weighted') # Calculate F1-score

print('Model performance for Training set')
print('- Accuracy: %s' % mlp_train_accuracy)
print('- F1 score: %s' % mlp_train_f1)
print('----------------------------------')
print('Model performance for Test set')
print('- Accuracy: %s' % mlp_test_accuracy)
print('- F1 score: %s' % mlp_test_f1)

Model performance for Training set
- Accuracy: 0.7978547854785478
- F1 score: 0.7698127655902844
----------------------------------
Model performance for Test set
- Accuracy: 0.737247353224254
- F1 score: 0.6744482783075784


#### Stacking model

This lab session introduced several techniques for combining models in order to improve on the results obtained so far.

Of the options proposed there, we chose stacked generalization (stacking). The choice is based on several articles and papers reporting that stacking is widely used to deal with class imbalance, which is the situation in the second task: classifying tweets by toxicity level (the `toxicity_level` variable).

The stack combines the following base models:

- SVM
- DecisionTree
- RandomForest
- Multilayer Perceptron

Logistic Regression is used as the final model.
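
In stacked generalization the final model is trained on out-of-fold predictions of the base models rather than on the original features. The sketch below spells that out by hand on synthetic data, assuming two base models and 5-fold cross-validation; the dataset from `make_classification` and all variable names are illustrative, and this is a simplified picture of the idea rather than scikit-learn's exact implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

base_models = [
    DecisionTreeClassifier(max_depth=4, random_state=0),
    SVC(kernel="linear"),
]

# Level 0: out-of-fold outputs, so no base model scores rows it was trained on.
meta_columns = []
for model in base_models:
    method = "predict_proba" if hasattr(model, "predict_proba") else "decision_function"
    out = cross_val_predict(model, X, y, cv=5, method=method)
    meta_columns.append(out[:, 1] if out.ndim == 2 else out)
meta_features = np.column_stack(meta_columns)

# Level 1: the final estimator learns how to weigh the base models' outputs.
final_model = LogisticRegression().fit(meta_features, y)
print(meta_features.shape)
```

Each base model contributes one column here because the task is binary; with more classes, each model would contribute one column per class.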

In [29]:
estimator_list = [
    ('svm_clf',svm_clf),
    ('tree_clf',tree_clf),
    ('rf',rf),
    ('mlp',mlp) ]


stack_model = StackingClassifier(
    estimators=estimator_list, final_estimator=LogisticRegression()
)


stack_model.fit(train_X_Tfidf, train_Y)


y_train_pred = stack_model.predict(train_X_Tfidf)
y_test_pred = stack_model.predict(test_X_Tfidf)

# Training set model performance
stack_model_train_accuracy = accuracy_score(train_Y, y_train_pred) # Calculate Accuracy
stack_model_train_f1 = f1_score(train_Y, y_train_pred, average='weighted') # Calculate F1-score

# Test set model performance
stack_model_test_accuracy = accuracy_score(test_Y, y_test_pred) # Calculate Accuracy
stack_model_test_f1 = f1_score(test_Y, y_test_pred, average='weighted') # Calculate F1-score

print('Model performance for Training set')
print('- Accuracy: %s' % stack_model_train_accuracy)
print('- F1 score: %s' % stack_model_train_f1)
print('----------------------------------')
print('Model performance for Test set')
print('- Accuracy: %s' % stack_model_test_accuracy)
print('- F1 score: %s' % stack_model_test_f1)

Model performance for Training set
- Accuracy: 0.9834983498349835
- F1 score: 0.9834281900575487
----------------------------------
Model performance for Test set
- Accuracy: 0.7718960538979788
- F1 score: 0.756802362938718
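
One detail worth knowing about the cell above: with the default `cv` setting, `StackingClassifier.fit` clones each base estimator and refits the clones, so the models fitted in the earlier cells are not reused. The sketch below checks this on synthetic data with a single, deliberately unfitted tree; the data and names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0)  # deliberately left unfitted
stack = StackingClassifier(
    estimators=[("tree", tree)],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X, y)

fitted_tree = stack.named_estimators_["tree"]
print(fitted_tree is tree)     # the stack holds its own fitted clone
print(hasattr(tree, "tree_"))  # the object passed in was never fitted
```

A practical consequence is that fitting the stack repeats the training cost of every base model several times: once per cross-validation fold plus a final fit on all the training data.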


In [30]:
acc_train_list = {
'svm_rbf': svm_train_accuracy,  # key says "rbf", but svm_clf was trained with kernel='linear'
'tree_clf': dt_train_accuracy,
'rf': rf_train_accuracy,
'mlp': mlp_train_accuracy,
'stack_model': stack_model_train_accuracy}


f1_train_list = {
'svm_rbf': svm_train_f1,
'tree_clf': dt_train_f1,
'rf': rf_train_f1,
'mlp': mlp_train_f1,
'stack_model': stack_model_train_f1}

In [31]:
acc_test_list = {
'svm_rbf': svm_test_accuracy,
'tree_clf': dt_test_accuracy,
'rf': rf_test_accuracy,
'mlp': mlp_test_accuracy,
'stack_model': stack_model_test_accuracy}


f1_test_list = {
'svm_rbf': svm_test_f1,
'tree_clf': dt_test_f1,
'rf': rf_test_f1,
'mlp': mlp_test_f1,
'stack_model': stack_model_test_f1}

#### Train and test results for the `Toxicity` variable

##### Train

In [32]:
acc_df = pd.DataFrame.from_dict(acc_train_list, orient='index', columns=['Accuracy'])
f1_df = pd.DataFrame.from_dict(f1_train_list, orient='index', columns=['F1'])
df_train = pd.concat([acc_df, f1_df], axis=1)  # named df_train so the DETOXIS DataFrame `df` is not overwritten
df_train

| | Accuracy | F1 |
|---|---|---|
| svm_rbf | 0.945132 | 0.943971 |
| tree_clf | 0.698432 | 0.608921 |
| rf | 0.969059 | 0.968692 |
| mlp | 0.797855 | 0.769813 |
| stack_model | 0.983498 | 0.983428 |


##### Test

In [33]:
acc_df_test = pd.DataFrame.from_dict(acc_test_list, orient='index', columns=['Accuracy'])
f1_df_test = pd.DataFrame.from_dict(f1_test_list, orient='index', columns=['F1'])
df_test = pd.concat([acc_df_test, f1_df_test], axis=1)
df_test

| | Accuracy | F1 |
|---|---|---|
| svm_rbf | 0.767084 | 0.738163 |
| tree_clf | 0.727623 | 0.650913 |
| rf | 0.740135 | 0.704151 |
| mlp | 0.737247 | 0.674448 |
| stack_model | 0.771896 | 0.756802 |
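
Comparing the train and test tables gives a rough picture of overfitting. The sketch below copies the accuracy values reported above and prints the train-minus-test gap per model, largest first; it adds no new measurements, and the dictionary name `reported` is illustrative.

```python
# Accuracy values copied from the Train and Test tables above: (train, test).
reported = {
    "svm_rbf": (0.945132, 0.767084),
    "tree_clf": (0.698432, 0.727623),
    "rf": (0.969059, 0.740135),
    "mlp": (0.797855, 0.737247),
    "stack_model": (0.983498, 0.771896),
}

gaps = {name: round(train - test, 6) for name, (train, test) in reported.items()}
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {gap:+.6f}")
```

The stacked model reaches the best test accuracy but also shows one of the widest gaps, so part of its training score appears to come from memorising the training set. Since the split was unseeded and evaluated once, these gaps should be read as indicative only.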


#### `Toxicity_level` variable

#### Splitting the corpus into training and test sets

In [8]:
train_X_, test_X_, train_Y_, test_Y_ = train_test_split(sample_data['comment'], sample_data['toxicity_level'], test_size=0.3)
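
Since the text identifies class imbalance as the main difficulty of this task, one complementary option (not used in this notebook) is to weight classes inversely to their frequency. The sketch below applies the formula scikit-learn documents for `class_weight="balanced"`, `n_samples / (n_classes * count)`, to hypothetical label counts; the real distribution of `toxicity_level` is not shown in this notebook and would need to be checked first.

```python
from collections import Counter

# Hypothetical label counts for a four-level target; NOT the real DETOXIS distribution.
labels = [0] * 60 + [1] * 25 + [2] * 10 + [3] * 5

counts = Counter(labels)
n_samples, n_classes = len(labels), len(counts)

# Rare classes receive proportionally larger weights.
weights = {c: n_samples / (n_classes * k) for c, k in sorted(counts.items())}
print(weights)
```

`SVC`, `DecisionTreeClassifier`, `RandomForestClassifier`, and `LogisticRegression` all accept a `class_weight` argument, whereas `MLPClassifier` does not, so this option would apply to only part of the stack.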

### Feature extraction

#### Term Frequency - Inverse Document Frequency (TF-IDF)

In [12]:
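tfidf_vect_levels = TfidfVectorizer()
# Note: as in the `Toxicity` section, the vectorizer is fitted on the full corpus, test comments included.
tfidf_vect_levels.fit(sample_data['comment'])
train_X_Tfidf_ = tfidf_vect_levels.transform(train_X_)
test_X_Tfidf_ = tfidf_vect_levels.transform(test_X_)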

#### Support Vector Machine (SVM)

In [13]:
svm_clf_levels = SVC(C=1.0, kernel='linear', degree=3, gamma='auto')  # degree and gamma have no effect with kernel='linear'
svm_clf_levels.fit(train_X_Tfidf_,train_Y_)

y_train_pred_levels = svm_clf_levels.predict(train_X_Tfidf_)
y_test_pred_levels = svm_clf_levels.predict(test_X_Tfidf_)

# Training set performance
svm_train_accuracy_levels = accuracy_score(train_Y_, y_train_pred_levels) # Calculate Accuracy
svm_train_f1_levels = f1_score(train_Y_, y_train_pred_levels, average='weighted') # Calculate F1-score

# Test set performance
svm_test_accuracy_levels = accuracy_score(test_Y_, y_test_pred_levels) # Calculate Accuracy
svm_test_f1_levels = f1_score(test_Y_, y_test_pred_levels, average='weighted') # Calculate F1-score

print('Model performance for Training set')
print('- Accuracy: %s' % svm_train_accuracy_levels)
print('- F1 score: %s' % svm_train_f1_levels)
print('----------------------------------')
print('Model performance for Test set')
print('- Accuracy: %s' % svm_test_accuracy_levels)
print('- F1 score: %s' % svm_test_f1_levels)

Model performance for Training set
- Accuracy: 0.8679867986798679
- F1 score: 0.8543835055270441
----------------------------------
Model performance for Test set
- Accuracy: 0.6785370548604427
- F1 score: 0.5867964652445087


#### Decision Tree

In [14]:
tree_clf_levels = DecisionTreeClassifier(max_depth=4)
tree_clf_levels.fit(train_X_Tfidf_,train_Y_)

y_train_pred_levels = tree_clf_levels.predict(train_X_Tfidf_)
y_test_pred_levels = tree_clf_levels.predict(test_X_Tfidf_)

# Training set performance
dt_train_accuracy_levels = accuracy_score(train_Y_, y_train_pred_levels) # Calculate Accuracy
dt_train_f1_levels = f1_score(train_Y_, y_train_pred_levels, average='weighted') # Calculate F1-score

# Test set performance
dt_test_accuracy_levels = accuracy_score(test_Y_, y_test_pred_levels) # Calculate Accuracy
dt_test_f1_levels = f1_score(test_Y_, y_test_pred_levels, average='weighted') # Calculate F1-score

print('Model performance for Training set')
print('- Accuracy: %s' % dt_train_accuracy_levels)
print('- F1 score: %s' % dt_train_f1_levels)
print('----------------------------------')
print('Model performance for Test set')
print('- Accuracy: %s' % dt_test_accuracy_levels)
print('- F1 score: %s' % dt_test_f1_levels)

Model performance for Training set
- Accuracy: 0.7004950495049505
- F1 score: 0.6082576383482872
----------------------------------
Model performance for Test set
- Accuracy: 0.6631376323387873
- F1 score: 0.5560463832751367


#### Random Forest

In [15]:
rf_levels = RandomForestClassifier(n_estimators=10)
rf_levels.fit(train_X_Tfidf_, train_Y_)

y_train_pred_levels = rf_levels.predict(train_X_Tfidf_)
y_test_pred_levels = rf_levels.predict(test_X_Tfidf_)

# Training set performance
rf_train_accuracy_levels = accuracy_score(train_Y_, y_train_pred_levels) # Calculate Accuracy
rf_train_f1_levels = f1_score(train_Y_, y_train_pred_levels, average='weighted') # Calculate F1-score

# Test set performance
rf_test_accuracy_levels = accuracy_score(test_Y_, y_test_pred_levels) # Calculate Accuracy
rf_test_f1_levels = f1_score(test_Y_, y_test_pred_levels, average='weighted') # Calculate F1-score

print('Model performance for Training set')
print('- Accuracy: %s' % rf_train_accuracy_levels)
print('- F1 score: %s' % rf_train_f1_levels)
print('----------------------------------')
print('Model performance for Test set')
print('- Accuracy: %s' % rf_test_accuracy_levels)
print('- F1 score: %s' % rf_test_f1_levels)

Model performance for Training set
- Accuracy: 0.9653465346534653
- F1 score: 0.9648049717650516
----------------------------------
Model performance for Test set
- Accuracy: 0.6785370548604427
- F1 score: 0.5996636125418464


#### Multilayer Perceptron

In [16]:
mlp_levels = MLPClassifier(alpha=1, max_iter=1000)
mlp_levels.fit(train_X_Tfidf_, train_Y_)

y_train_pred_levels = mlp_levels.predict(train_X_Tfidf_)
y_test_pred_levels = mlp_levels.predict(test_X_Tfidf_)

# Training set performance
mlp_train_accuracy_levels = accuracy_score(train_Y_, y_train_pred_levels) # Calculate Accuracy
mlp_train_f1_levels = f1_score(train_Y_, y_train_pred_levels, average='weighted') # Calculate F1-score

# Test set performance
mlp_test_accuracy_levels = accuracy_score(test_Y_, y_test_pred_levels) # Calculate Accuracy
mlp_test_f1_levels = f1_score(test_Y_, y_test_pred_levels, average='weighted') # Calculate F1-score

print('Model performance for Training set')
print('- Accuracy: %s' % mlp_train_accuracy_levels)
print('- F1 score: %s' % mlp_train_f1_levels)
print('----------------------------------')
print('Model performance for Test set')
print('- Accuracy: %s' % mlp_test_accuracy_levels)
print('- F1 score: %s' % mlp_test_f1_levels)

Model performance for Training set
- Accuracy: 0.7887788778877888
- F1 score: 0.7352735959523691
----------------------------------
Model performance for Test set
- Accuracy: 0.6727622714148219
- F1 score: 0.5747157919858865


#### Stacking model

In [17]:
estimator_list_levels = [
    ('svm_clf_levels',svm_clf_levels),
    ('tree_clf_levels',tree_clf_levels),
    ('rf_levels',rf_levels),
    ('mlp_levels',mlp_levels) ]


stack_model_levels = StackingClassifier(
    estimators=estimator_list_levels, final_estimator=LogisticRegression()
)


stack_model_levels.fit(train_X_Tfidf_, train_Y_)


y_train_pred_levels = stack_model_levels.predict(train_X_Tfidf_)
y_test_pred_levels = stack_model_levels.predict(test_X_Tfidf_)

# Training set model performance
stack_model_train_accuracy_levels = accuracy_score(train_Y_, y_train_pred_levels) # Calculate Accuracy
stack_model_train_f1_levels = f1_score(train_Y_, y_train_pred_levels, average='weighted') # Calculate F1-score

# Test set model performance
stack_model_test_accuracy_levels = accuracy_score(test_Y_, y_test_pred_levels) # Calculate Accuracy
stack_model_test_f1_levels = f1_score(test_Y_, y_test_pred_levels, average='weighted') # Calculate F1-score

print('Model performance for Training set')
print('- Accuracy: %s' % stack_model_train_accuracy_levels)
print('- F1 score: %s' % stack_model_train_f1_levels)
print('----------------------------------')
print('Model performance for Test set')
print('- Accuracy: %s' % stack_model_test_accuracy_levels)
print('- F1 score: %s' % stack_model_test_f1_levels)

Model performance for Training set
- Accuracy: 0.9121287128712872
- F1 score: 0.902785330185105
----------------------------------
Model performance for Test set
- Accuracy: 0.6948989412897016
- F1 score: 0.6247262037201117


In [18]:
acc_train_list_levels = {
'svm_rbf': svm_train_accuracy_levels,  # key says "rbf", but svm_clf_levels was trained with kernel='linear'
'tree_clf': dt_train_accuracy_levels,
'rf': rf_train_accuracy_levels,
'mlp': mlp_train_accuracy_levels,
'stack_model': stack_model_train_accuracy_levels}


f1_train_list_levels = {
'svm_rbf': svm_train_f1_levels,
'tree_clf': dt_train_f1_levels,
'rf': rf_train_f1_levels,
'mlp': mlp_train_f1_levels,
'stack_model': stack_model_train_f1_levels}

In [19]:
acc_test_list_levels = {
'svm_rbf': svm_test_accuracy_levels,
'tree_clf': dt_test_accuracy_levels,
'rf': rf_test_accuracy_levels,
'mlp': mlp_test_accuracy_levels,
'stack_model': stack_model_test_accuracy_levels}


f1_test_list_levels = {
'svm_rbf': svm_test_f1_levels,
'tree_clf': dt_test_f1_levels,
'rf': rf_test_f1_levels,
'mlp': mlp_test_f1_levels,
'stack_model': stack_model_test_f1_levels}

#### Train and test results for the `Toxicity_level` variable

##### Train

In [20]:
acc_df_levels = pd.DataFrame.from_dict(acc_train_list_levels, orient='index', columns=['Accuracy'])
f1_df_levels = pd.DataFrame.from_dict(f1_train_list_levels, orient='index', columns=['F1'])
df_levels = pd.concat([acc_df_levels, f1_df_levels], axis=1)
df_levels

| | Accuracy | F1 |
|---|---|---|
| svm_rbf | 0.867987 | 0.854384 |
| tree_clf | 0.700495 | 0.608258 |
| rf | 0.965347 | 0.964805 |
| mlp | 0.788779 | 0.735274 |
| stack_model | 0.912129 | 0.902785 |


##### Test

In [21]:
acc_df_test_levels = pd.DataFrame.from_dict(acc_test_list_levels, orient='index', columns=['Accuracy'])
f1_df_test_levels = pd.DataFrame.from_dict(f1_test_list_levels, orient='index', columns=['F1'])
df_test_levels = pd.concat([acc_df_test_levels, f1_df_test_levels], axis=1)
df_test_levels

| | Accuracy | F1 |
|---|---|---|
| svm_rbf | 0.678537 | 0.586796 |
| tree_clf | 0.663138 | 0.556046 |
| rf | 0.678537 | 0.599664 |
| mlp | 0.672762 | 0.574716 |
| stack_model | 0.694899 | 0.624726 |
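
To summarise both tasks, the sketch below takes the test F1 values from the two Test tables in this notebook and reports, per task, the best single model and how much the stacked model improves on it. Only reported numbers are used; the dictionary layout and names are illustrative.

```python
# Test-set weighted F1 copied from the two Test tables in this notebook.
test_f1 = {
    "toxicity": {
        "svm_rbf": 0.738163, "tree_clf": 0.650913, "rf": 0.704151,
        "mlp": 0.674448, "stack_model": 0.756802,
    },
    "toxicity_level": {
        "svm_rbf": 0.586796, "tree_clf": 0.556046, "rf": 0.599664,
        "mlp": 0.574716, "stack_model": 0.624726,
    },
}

summary = {}
for task, scores in test_f1.items():
    base = {k: v for k, v in scores.items() if k != "stack_model"}
    best_name = max(base, key=base.get)  # strongest single model for this task
    summary[task] = (best_name, round(scores["stack_model"] - base[best_name], 6))
    print(task, summary[task])
```

In both tasks the stack improves on the best base model by roughly two to three F1 points. Because each task was evaluated on a single unseeded split, a difference of this size may fall within run-to-run variation.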
