This notebook is based on the following sources:

[ScikitLearn Classification Metrics](https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics)

[Explanation](https://medium.com/greyatom/performance-metrics-for-classification-problems-in-machine-learning-part-i-b085d432082b)

[Dataset](https://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival)

**Video link:** https://www.youtube.com/watch?v=MViaDNkSP88&feature=youtu.be

Attribute Information:
   1. Age of patient at time of operation (numerical)
   2. Patient's year of operation (year - 1900, numerical)
   3. Number of positive axillary nodes detected (numerical)
   4. Survival status (class attribute)   
     1 = the patient survived 5 years or longer     
     2 = the patient died within 5 years

In [13]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics as mt
import numpy as np

In [3]:
colunas = ["idade",
           "ano_operacao",
           "nos_positivos",
           "y"]

In [4]:
data = pd.read_csv('haberman.data', names=colunas)

In [5]:
data.head()

Unnamed: 0,idade,ano_operacao,nos_positivos,y
0,30,64,1,1
1,30,62,3,1
2,30,65,0,1
3,31,59,2,1
4,31,65,4,1


In [6]:
y = data['y']
X = data.drop('y', axis=1).values

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

In [8]:
gnb = GaussianNB()

In [9]:
gnb.fit(X_train, y_train)

GaussianNB(priors=None, var_smoothing=1e-09)

In [10]:
y_pred = gnb.predict(X_test)

In [11]:
mt.confusion_matrix(y_test, y_pred)

array([[40,  6],
       [15,  1]])
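
The matrix above can be read cell by cell. A minimal sketch, assuming scikit-learn's convention that rows are true labels and columns are predicted labels (both ordered 1, 2), and treating class 1 (survived) as the positive class. The names `tp`, `fn`, `fp` and `tn` are illustrative, and the counts are copied from the output above:

```python
# Confusion matrix copied from the output above.
# Rows = true class (1, 2); columns = predicted class (1, 2).
cm = [[40, 6],
      [15, 1]]

# With class 1 (survived) as the positive class:
tp, fn = cm[0]   # true 1 predicted as 1 / true 1 predicted as 2
fp, tn = cm[1]   # true 2 predicted as 1 / true 2 predicted as 2

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")

# Row sums give the true class counts, column sums the predicted counts.
true_counts = [sum(row) for row in cm]
pred_counts = [sum(col) for col in zip(*cm)]
print(true_counts, pred_counts)
```

The row sums (46, 16) and column sums (55, 7) are the same counts that `y_test.value_counts()` and `np.unique(y_pred, return_counts=True)` report in the following cells.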

In [12]:
y_test.value_counts()

1    46
2    16
Name: y, dtype: int64

In [15]:
np.unique(y_pred, return_counts=True)

(array([1, 2]), array([55,  7]))

In [25]:
# recall by hand
r = 40/46
print(r)

0.8695652173913043


In [21]:
mt.recall_score(y_test, y_pred)

0.8695652173913043

In [24]:
# precision by hand
p = 40/55
print(p)

0.7272727272727273


In [20]:
mt.precision_score(y_test, y_pred)

0.7272727272727273
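
Both scores above take class 1 (survived) as the positive class, since `pos_label` defaults to 1 in scikit-learn's binary metrics. As a hedged sketch, the same hand calculation can be repeated for class 2 (died within 5 years) using the counts from the confusion matrix; the variable names are illustrative:

```python
# Counts copied from the confusion matrix [[40, 6], [15, 1]].
cm = [[40, 6],
      [15, 1]]

# Treat class 2 (second row / second column) as the positive class.
tp_2 = cm[1][1]                     # true 2 predicted as 2
actual_2 = sum(cm[1])               # all true class-2 samples (row sum)
predicted_2 = cm[0][1] + cm[1][1]   # all samples predicted as 2 (column sum)

recall_2 = tp_2 / actual_2
precision_2 = tp_2 / predicted_2

print(recall_2)
print(precision_2)
```

Passing `pos_label=2` to `mt.recall_score` and `mt.precision_score` should give the same numbers. They are far lower than the class-1 scores, because the model finds only 1 of the 16 patients in class 2.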

In [22]:
# accuracy by hand
41/62

0.6612903225806451

In [23]:
mt.accuracy_score(y_test, y_pred)

0.6612903225806451
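
The 41 in the numerator is the sum of the confusion matrix diagonal (40 + 1), that is, the samples predicted correctly for either class, and 62 is the size of the test set. A small sketch of that calculation, with the matrix copied from the earlier output and illustrative variable names:

```python
# Confusion matrix copied from the earlier output.
cm = [[40, 6],
      [15, 1]]

# Correct predictions sit on the diagonal; the total is the sum of all cells.
correct = sum(cm[i][i] for i in range(len(cm)))
total = sum(sum(row) for row in cm)

accuracy = correct / total
print(correct, total, accuracy)
```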

In [26]:
# F-score by hand
2*p*r/(r+p)

0.792079207920792

In [27]:
mt.f1_score(y_test, y_pred)

0.792079207920792

In [29]:
mt.fbeta_score(y_test, y_pred, beta=1)

0.792079207920792

In [30]:
mt.fbeta_score(y_test, y_pred, beta=2)

0.8368200836820084

In [31]:
mt.fbeta_score(y_test, y_pred, beta=0.5)

0.7518796992481205
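
The three `fbeta_score` results can also be checked by hand. A minimal sketch, using the standard definition F_beta = (1 + beta^2) * p * r / (beta^2 * p + r) with the precision and recall computed earlier; `fbeta_by_hand` is an illustrative helper, not part of scikit-learn:

```python
def fbeta_by_hand(precision, recall, beta):
    """F-beta from precision and recall; beta > 1 weights recall more heavily."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall for class 1, as computed by hand earlier.
p = 40 / 55
r = 40 / 46

for beta in (1, 2, 0.5):
    print(beta, fbeta_by_hand(p, r, beta))
```

Because recall (about 0.87) is higher than precision (about 0.73) here, the recall-weighted F2 comes out above F1 and the precision-weighted F0.5 comes out below it.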