## **Evaluation Metrics (Performance)**

**Loading the Dataset**

In [18]:
```python
import warnings
warnings.filterwarnings("ignore")  # silence library warnings in the notebook output
import pandas as pd

df = pd.read_csv("other_viruses_covid.csv")
```

In [20]:
```python
df.head(10)
```

Unnamed: 0,nameseq,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,label
0,NC_045512.2,0.199619,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,1
1,MT483553.1,0.199619,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,1
2,MT483554.1,0.199619,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,1
3,MT483555.1,0.199619,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,1
4,MT483556.1,0.199618,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,1
5,MT483557.1,0.199619,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,1
6,MT483563.1,0.199624,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,1
7,MT483702.1,0.199622,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,1
8,MT477835.1,0.199619,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,1
9,MT477836.1,0.199619,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,1


In [21]:
```python
# Features: every column except the sequence ID (first) and the label (last)
X = df[df.columns[1:-1]]
y = df['label']

X
```

Unnamed: 0,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12
0,0.199619,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2
1,0.199619,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2
2,0.199619,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2
3,0.199619,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2
4,0.199618,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2
...,...,...,...,...,...,...,...,...,...,...,...,...
24810,0.199478,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2
24811,0.199494,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2
24812,0.199478,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2
24813,0.199471,0.199999,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2


In [22]:
```python
from collections import Counter

# Number of samples in each class
print(f"Class distribution: {Counter(y)}")
```

Class distribution: Counter({0: 22442, 1: 2373})
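The counts above show a strong class imbalance. The following is a quick sanity check that uses only the printed counts. A trivial classifier that always predicts the majority class would already reach about 90% accuracy. This is why balanced metrics such as balanced accuracy, MCC and G-mean are reported later alongside plain accuracy.

```python
# Class counts copied from the Counter output above
counts = {0: 22442, 1: 2373}
total = sum(counts.values())  # 24815 samples

# Share of each class in the dataset
proportions = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the majority class
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(proportions)
print(f"Majority-class baseline accuracy: {baseline_accuracy:.4f}")  # 0.9044
```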


**Creating the Model**

In [23]:
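```python
from sklearn.model_selection import train_test_split
from sklearn import tree

# Hold out 30% of the data for testing; stratify=y keeps the class ratio
# (roughly 90% / 10%) the same in the training and test splits
train, test, train_labels, test_labels = train_test_split(X,
                                                          y,
                                                          test_size=0.3,
                                                          random_state=12,
                                                          stratify=y)

# No random_state is set on the tree, so results may vary slightly between runs
model = tree.DecisionTreeClassifier()
model.fit(train, train_labels)
preds = model.predict(test)                   # hard class labels (0/1)
pred_prob = model.predict_proba(test)[:, 1]   # predicted probability of class 1
```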

## **Evaluation**

In [24]:
```python
# Confusion matrix with row/column totals (margins=True)
pd.crosstab(test_labels, preds, rownames=["Actual"], colnames=["Predicted"], margins=True)
```

| Actual \ Predicted | 0 | 1 | All |
|---|---|---|---|
| 0 | 6684 | 49 | 6733 |
| 1 | 34 | 678 | 712 |
| All | 6718 | 727 | 7445 |
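Each cell of the table above is one of the four possible outcomes of a binary classifier. The helper `count_outcomes` below is hypothetical. It is a minimal pure-Python sketch of how those counts are derived from the label lists, assuming class 1 is the positive class. For a binary problem, scikit-learn's `confusion_matrix(y_true, y_pred).ravel()` returns the same four numbers in the order `tn, fp, fn, tp`. Read from the table: TN = 6684, FP = 49, FN = 34, TP = 678.

```python
def count_outcomes(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for a binary problem (hypothetical helper)."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # positive correctly detected
            else:
                fn += 1  # positive missed
        else:
            if predicted == positive:
                fp += 1  # negative wrongly flagged as positive
            else:
                tn += 1  # negative correctly rejected
    return tn, fp, fn, tp


# Toy example with 6 samples
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
print(count_outcomes(y_true, y_pred))  # (2, 1, 1, 2)
```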


In [25]:
```python
from sklearn.metrics import classification_report

print(classification_report(test_labels, preds))
```

```text
              precision    recall  f1-score   support

           0       0.99      0.99      0.99      6733
           1       0.93      0.95      0.94       712

    accuracy                           0.99      7445
   macro avg       0.96      0.97      0.97      7445
weighted avg       0.99      0.99      0.99      7445
```



**Accuracy classification score.**

In [26]:
```python
from sklearn.metrics import accuracy_score

accuracy_score(test_labels, preds)
```

0.9888515782404298
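Accuracy is the fraction of correct predictions, (TP + TN) / N. As a check, the value above can be reproduced by hand from the confusion-matrix counts (TN = 6684, FP = 49, FN = 34, TP = 678):

```python
tn, fp, fn, tp = 6684, 49, 34, 678  # counts from the confusion matrix above

accuracy = (tp + tn) / (tn + fp + fn + tp)  # 7362 / 7445
print(accuracy)
```

About 90% of the test samples are class 0 (6733 of 7445). On this dataset, accuracy alone therefore mostly reflects performance on the majority class.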

**Compute the balanced accuracy**

In [27]:
```python
from sklearn.metrics import balanced_accuracy_score

balanced_accuracy_score(test_labels, preds)
```

0.9724848015059151
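For binary classification, balanced accuracy is the average of the recall obtained on each class (sensitivity and specificity). The sketch below reproduces the value above from the confusion-matrix counts:

```python
tn, fp, fn, tp = 6684, 49, 34, 678  # counts from the confusion matrix above

recall_pos = tp / (tp + fn)   # sensitivity: 678 / 712
recall_neg = tn / (tn + fp)   # specificity: 6684 / 6733

# Balanced accuracy = mean of the per-class recalls
balanced_accuracy = (recall_pos + recall_neg) / 2
print(balanced_accuracy)
```

Because each class contributes equally, the minority class (712 samples) weighs as much as the majority class (6733 samples).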

**Compute the F1 score, also known as balanced F-score or F-measure**

The F1 score is the harmonic mean of precision and recall. It reaches its best value at 1 and its worst at 0, and precision and recall contribute equally to it. The formula for the F1 score is:



![F1 score formula](https://www.gstatic.com/education/formulas2/355397047/en/f1_score.svg)

In [28]:
```python
from sklearn.metrics import f1_score

f1_score(test_labels, preds, pos_label=1)
```

0.9423210562890897
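The same score can be reproduced from the confusion-matrix counts. In the minimal sketch below, the harmonic mean of precision and recall is computed directly. It is then compared with the equivalent closed form 2·TP / (2·TP + FP + FN):

```python
tn, fp, fn, tp = 6684, 49, 34, 678  # counts from the confusion matrix above

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Harmonic mean of precision and recall
f1_harmonic = 2 * precision * recall / (precision + recall)

# Equivalent closed form using counts only (TN does not appear)
f1_counts = 2 * tp / (2 * tp + fp + fn)   # 1356 / 1439

print(f1_harmonic, f1_counts)
```

TN does not enter the formula, so the F1 score for class 1 ignores how well the majority class is classified.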

**Compute the precision**

Precision is the ratio tp / (tp + fp), where tp is the number of true positives and fp the number of false positives. Intuitively, precision is the ability of the classifier not to label a negative sample as positive.

The best value is 1 and the worst value is 0.

In [29]:
```python
from sklearn.metrics import precision_score

precision_score(test_labels, preds, pos_label=1)
```

0.9325997248968363
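Plugging the confusion-matrix counts into tp / (tp + fp) reproduces the value above:

```python
tn, fp, fn, tp = 6684, 49, 34, 678  # counts from the confusion matrix above

# Of all samples predicted as class 1, the fraction that really are class 1
precision = tp / (tp + fp)   # 678 / 727
print(precision)
```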

**Compute the recall**

Recall is the ratio tp / (tp + fn), where tp is the number of true positives and fn the number of false negatives. Intuitively, recall is the ability of the classifier to find all the positive samples.

The best value is 1 and the worst value is 0.

In [30]:
```python
from sklearn.metrics import recall_score

recall_score(test_labels, preds, pos_label=1)
```

0.952247191011236
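Plugging the confusion-matrix counts into tp / (tp + fn) reproduces the value above:

```python
tn, fp, fn, tp = 6684, 49, 34, 678  # counts from the confusion matrix above

# Of all actual class-1 samples, the fraction the model found
recall = tp / (tp + fn)   # 678 / 712
print(recall)
```

The model makes more false positives (49) than false negatives (34). That is why recall is slightly higher than precision here.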

**Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.**

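Note: this implementation can be used with binary, multiclass, and multilabel classification, but some restrictions apply (see the parameters in the `roc_auc_score` documentation). ROC AUC is computed from continuous scores rather than hard labels. This is why `pred_prob` (the predicted probability of class 1) is passed here instead of `preds`.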

In [31]:
```python
from sklearn.metrics import roc_auc_score

roc_auc_score(test_labels, pred_prob)
```

0.9802068088252227
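ROC AUC can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting as half. The function `pairwise_auc` below is a hypothetical, quadratic-time sketch of that pairwise definition on a toy example. It is meant for illustration, not as a replacement for `roc_auc_score`.

```python
def pairwise_auc(y_true, scores):
    """ROC AUC via pairwise comparison (illustrative sketch, O(n_pos * n_neg))."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(y_true, scores))  # 0.75
```

Three of the four (positive, negative) pairs are ranked correctly, giving 0.75.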

**Cohen's kappa: a statistic that measures inter-annotator agreement.**

This function computes Cohen's kappa [1], a score that expresses the level of agreement between two annotators on a classification problem. Here, the two "annotators" are the true labels and the model's predictions.

In [32]:
```python
from sklearn.metrics import cohen_kappa_score

cohen_kappa_score(test_labels, preds)
```

0.9361512535457606
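Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch computing it from the confusion-matrix counts, where p_e comes from the row (actual) and column (predicted) totals:

```python
tn, fp, fn, tp = 6684, 49, 34, 678  # counts from the confusion matrix above
n = tn + fp + fn + tp

# Observed agreement: fraction of samples where prediction == truth
p_observed = (tp + tn) / n

# Agreement expected by chance, from the actual and predicted class totals
actual_neg, actual_pos = tn + fp, fn + tp   # 6733, 712
pred_neg, pred_pos = tn + fn, fp + tp       # 6718, 727
p_expected = (actual_neg * pred_neg + actual_pos * pred_pos) / n**2

kappa = (p_observed - p_expected) / (1 - p_expected)
print(kappa)
```

Because about 90% of the samples are class 0, chance agreement is high (about 0.825). This pulls kappa below the raw accuracy.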

**The MCC (Matthews correlation coefficient) is essentially a correlation coefficient between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 an average random prediction, and -1 an inverse prediction.**

In [None]:
```python
from sklearn.metrics import matthews_corrcoef

matthews_corrcoef(test_labels, preds)
```

0.936213582592172
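For the binary case, the MCC can be written directly in terms of the four confusion-matrix counts. The sketch below reproduces the value above:

```python
from math import sqrt

tn, fp, fn, tp = 6684, 49, 34, 678  # counts from the confusion matrix above

# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
numerator = tp * tn - fp * fn
denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = numerator / denominator
print(mcc)
```

Unlike F1, all four counts appear in the formula. The score therefore reflects errors on both classes.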

**Compute the geometric mean.**

The geometric mean (G-mean) is the root of the product of the per-class sensitivities (recalls); for a binary problem it is the square root of sensitivity × specificity. This measure tries to maximize the accuracy on each class while keeping these accuracies balanced.

In [33]:
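```python
# imblearn is provided by the imbalanced-learn package (pip install imbalanced-learn)
from imblearn.metrics import geometric_mean_score

geometric_mean_score(test_labels, preds)
```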

0.972274204266196
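For this binary problem, the G-mean is the square root of the product of the two per-class recalls. A minimal sketch reproducing the value above from the confusion-matrix counts:

```python
from math import sqrt

tn, fp, fn, tp = 6684, 49, 34, 678  # counts from the confusion matrix above

sensitivity = tp / (tp + fn)   # recall of class 1
specificity = tn / (tn + fp)   # recall of class 0

g_mean = sqrt(sensitivity * specificity)
print(g_mean)
```

A geometric mean is pulled down strongly if either recall is low. For example, a model that never predicted class 1 would have sensitivity 0 and therefore a G-mean of 0, even with about 90% accuracy.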