#**Notebook 09**
- **Professor:** Iális Cavalcante
- **Teaching assistant:** Iago Magalhães
- **Course:** Data Science
- **Program:** Computer Engineering
- **Description:**
This notebook covers the Gaussian Naive Bayes algorithm, applied to a credit dataset (`credit_data.csv`).


##Library imports

In [1]:
!pip -q install plotly
!pip -q install yellowbrick

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt

from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from yellowbrick.classifier import ConfusionMatrix
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

##Reading the data

In [15]:
base_credit = pd.read_csv('credit_data.csv')

In [None]:
base_credit

##Data analysis

In [None]:
base_credit.head(10)

In [None]:
base_credit.tail(8)

In [None]:
base_credit.describe()

In [None]:
base_credit[base_credit['income'] >= 69995.685578]

In [None]:
base_credit[base_credit['loan'] <= 1.377630]

In [None]:
np.unique(base_credit['default'], return_counts=True)

In [None]:
sns.countplot(x = base_credit['default']);

In [None]:
plt.hist(x = base_credit['age']);

In [None]:
plt.hist(x = base_credit['income']);

In [None]:
plt.hist(x = base_credit['loan']);

In [None]:
grafico = px.scatter_matrix(base_credit, dimensions=['age', 'income', 'loan'], color = 'default')
grafico.show()

###Handling inconsistent values

In [None]:
base_credit.loc[base_credit['age'] < 0]

In [None]:
base_credit[base_credit['age'] < 0]

In [None]:
# Drop the entire column (for every record in the dataset)
base_credit2 = base_credit.drop('age', axis = 1)
base_credit2

In [None]:
base_credit.index

In [None]:
base_credit[base_credit['age'] < 0].index

In [None]:
# Drop only the records with inconsistent values
base_credit3 = base_credit.drop(base_credit[base_credit['age'] < 0].index)
base_credit3

In [None]:
base_credit3.loc[base_credit3['age'] < 0]

In [None]:
base_credit.mean()

In [None]:
base_credit['age'].mean()

In [None]:
base_credit['age'][base_credit['age'] > 0].mean()

In [38]:
# Replace negative ages with 40.92 (the value used here for the mean of the positive ages)
base_credit.loc[base_credit['age'] < 0, 'age'] = 40.92

In [None]:
base_credit.loc[base_credit['age'] < 0]

In [None]:
base_credit.head(27)

###Handling missing values

In [None]:
base_credit.isnull()

In [None]:
base_credit.isnull().sum()

In [None]:
base_credit.loc[pd.isnull(base_credit['age'])]

In [44]:
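# Assign the result back: calling fillna(..., inplace=True) on a selected column is deprecated in recent pandas
base_credit['age'] = base_credit['age'].fillna(base_credit['age'].mean())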

In [None]:
base_credit.loc[pd.isnull(base_credit['age'])]

In [None]:
base_credit.loc[(base_credit['clientid'] == 29) | (base_credit['clientid'] == 31) | (base_credit['clientid'] == 32)]

In [None]:
base_credit.loc[base_credit['clientid'].isin([29, 31, 32])]

###Splitting predictors and class

In [None]:
type(base_credit)

In [49]:
X_credit = base_credit.iloc[:, 1:4].values

In [None]:
X_credit

In [None]:
type(X_credit)

In [52]:
y_credit = base_credit.iloc[:, 4].values

In [None]:
y_credit

In [None]:
type(y_credit)

###Feature scaling

In [None]:
X_credit

In [None]:
X_credit[:,0].min(), X_credit[:,1].min(), X_credit[:,2].min()

In [None]:
X_credit[:,0].max(), X_credit[:,1].max(), X_credit[:,2].max()

In [58]:
scaler_credit = StandardScaler()
X_credit = scaler_credit.fit_transform(X_credit)
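
As a sanity check on what `StandardScaler` does, the sketch below standardizes a small toy array and compares the result with the same computation done by hand (subtract the column mean, divide by the column standard deviation). The array `X_toy` and its values are hypothetical and are not taken from `credit_data.csv`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy values for three numeric features (not from credit_data.csv)
X_toy = np.array([[66000.0, 59.0, 8100.0],
                  [34000.0, 48.0, 6500.0],
                  [57000.0, 63.0, 8000.0],
                  [42000.0, 45.0, 6100.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_toy)

# Same result by hand: subtract the column mean, divide by the population std (ddof=0)
X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(X_scaled, X_manual))
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```

After scaling, every column has mean 0 and standard deviation 1, so features with large raw ranges no longer dominate features with small ones.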

In [None]:
X_credit[:,0].min(), X_credit[:,1].min(), X_credit[:,2].min()

In [None]:
X_credit[:,0].max(), X_credit[:,1].max(), X_credit[:,2].max()

In [None]:
X_credit

##Splitting the data into training and test sets

In [62]:
X_credit_treinamento, X_credit_teste, y_credit_treinamento, y_credit_teste = train_test_split(X_credit, y_credit, test_size = 0.25, random_state = 0)

In [None]:
X_credit_treinamento.shape

In [None]:
y_credit_treinamento.shape

In [None]:
X_credit_teste.shape, y_credit_teste.shape

##Gaussian Naive Bayes algorithm
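
Before calling the library, it may help to see the idea behind Gaussian Naive Bayes. For each class, estimate a prior and a per-feature mean and variance, then score a sample with the log prior plus the sum of per-feature Gaussian log densities. The sketch below is a minimal NumPy-only illustration on synthetic data. The helper `gaussian_nb_log_joint` and the data are hypothetical, and this is not scikit-learn's implementation (`GaussianNB` also adds a small `var_smoothing` term to the variances).

```python
import numpy as np

# Synthetic two-class data (illustrative only, not the credit dataset)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

def gaussian_nb_log_joint(X_train, y_train, X_new):
    """Return the classes and log P(c) + sum_j log N(x_j | mu_cj, var_cj) per class."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        prior = Xc.shape[0] / X_train.shape[0]
        mu = Xc.mean(axis=0)
        var = Xc.var(axis=0)  # maximum-likelihood (population) variance
        # Features are treated as independent given the class, so log densities add up
        log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (X_new - mu) ** 2 / var, axis=1)
        scores.append(np.log(prior) + log_lik)
    return classes, np.column_stack(scores)

classes, log_joint = gaussian_nb_log_joint(X, y, X)

# Normalize with log-sum-exp to obtain posterior probabilities per class
log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
posterior = np.exp(log_joint - log_norm)

manual_pred = classes[np.argmax(log_joint, axis=1)]
accuracy = np.mean(manual_pred == y)
print(accuracy)
```

The predicted class is the one with the highest score. The normalization step is only needed when probabilities, rather than labels, are wanted.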

In [None]:
naive_credit_data = GaussianNB()
naive_credit_data.fit(X_credit_treinamento, y_credit_treinamento)

In [67]:
previsoes = naive_credit_data.predict(X_credit_teste)

In [None]:
previsoes

In [None]:
y_credit_teste

In [None]:
accuracy_score(y_credit_teste, previsoes)

In [None]:
confusion_matrix(y_credit_teste, previsoes)

In [None]:
cm = ConfusionMatrix(naive_credit_data)
cm.fit(X_credit_treinamento, y_credit_treinamento)
cm.score(X_credit_teste, y_credit_teste)

In [None]:
print(classification_report(y_credit_teste, previsoes))
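
The accuracy, confusion matrix, and classification report above are all derived from the same four counts. As a small illustration with hypothetical labels (not the notebook's actual predictions), the sketch below recomputes accuracy, precision, and recall for class 1 directly from the confusion matrix and checks them against scikit-learn's metric functions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical labels and predictions (illustrative only)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # fraction of all samples classified correctly
precision = tp / (tp + fp)        # of the samples predicted as 1, how many are truly 1
recall = tp / (tp + fn)           # of the samples that are truly 1, how many were found

print(cm)
print(accuracy, precision, recall)
print(np.isclose(accuracy, accuracy_score(y_true, y_pred)),
      np.isclose(precision, precision_score(y_true, y_pred)),
      np.isclose(recall, recall_score(y_true, y_pred)))
```

When the classes are imbalanced, accuracy alone can look good even if the minority class is poorly detected, which is why the per-class precision and recall in the report are worth reading.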

##Homework
- Use the same dataset with other Naive Bayes model variants.
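
As a possible starting point, the sketch below compares three scikit-learn Naive Bayes variants on synthetic data from `make_classification` rather than on the credit dataset. The preprocessing choices are assumptions made for illustration: `BernoulliNB` binarizes features at 0.0 by default, so the data is standardized first, and `MultinomialNB` expects non-negative features, so the data is rescaled to [0, 1].

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in for a dataset with three numeric predictors and a binary class
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    'GaussianNB': make_pipeline(StandardScaler(), GaussianNB()),
    # BernoulliNB thresholds each feature at 0.0 by default, so center the data first
    'BernoulliNB': make_pipeline(StandardScaler(), BernoulliNB()),
    # MultinomialNB requires non-negative inputs; clip keeps test values inside [0, 1]
    'MultinomialNB': make_pipeline(MinMaxScaler(clip=True), MultinomialNB()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(name, scores[name])
```

Each pipeline fits its scaler on the training split only. To try this on the credit data, replace `X` and `y` with the unscaled predictors and the class column.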

##References
- [O algoritmo Naive Bayes — descrição e implementação em Python](https://joaoclaudionc.medium.com/o-algoritmo-naive-bayes-descri%C3%A7%C3%A3o-e-implementa%C3%A7%C3%A3o-em-python-35757ade6b36)
- [Naive Bayes](https://scikit-learn.org/stable/modules/naive_bayes.html#multinomial-naive-bayes)
- [Credit Risk Dataset](https://www.kaggle.com/datasets/laotse/credit-risk-dataset)
- [Machine Learning e Data Science com Python de A a Z](https://www.udemy.com/share/101sO83@3JeaCsoVXbtLR3c19vqxGXpQtlRYAXiHwCeouw_gbHJJjG_yQKxj_n81udLVMCgf/)