# Lesson 7 - Machine Learning Training - FIAP + Alura

## Artificial Intelligence Tools & Examples - Machine Learning - Classification via Naïve Bayes - Heart Disease Prediction

Ref. https://www.kaggle.com/datasets/redwankarimsony/heart-disease-data

Prof. Dr. Ahirton Lopes (https://github.com/ahirtonlopes)

### Libraries used

1. NumPy (numerical Python)
2. pandas
3. scikit-learn


In [37]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler

### Contents:

1. Separating features and labels
2. Standardizing the data
3. Splitting the dataset into a training set and a test set
4. Building a Naive Bayes model with default hyperparameters

### Reading the data

In [38]:
# 1. Loading the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"
column_names = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", "thalach", "exang",
    "oldpeak", "slope", "ca", "thal", "target"
]

data = pd.read_csv(url, names=column_names, na_values="?", header=None)

In [39]:
data.sample(5)

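|     | age  | sex | cp  | trestbps | chol  | fbs | restecg | thalach | exang | oldpeak | slope | ca  | thal | target |
|-----|------|-----|-----|----------|-------|-----|---------|---------|-------|---------|-------|-----|------|--------|
| 247 | 47.0 | 1.0 | 4.0 | 110.0    | 275.0 | 0.0 | 2.0     | 118.0   | 1.0   | 1.0     | 2.0   | 1.0 | 3.0  | 1      |
| 299 | 68.0 | 1.0 | 4.0 | 144.0    | 193.0 | 1.0 | 0.0     | 141.0   | 0.0   | 3.4     | 2.0   | 2.0 | 7.0  | 2      |
| 31  | 60.0 | 1.0 | 4.0 | 117.0    | 230.0 | 1.0 | 0.0     | 160.0   | 1.0   | 1.4     | 1.0   | 2.0 | 7.0  | 2      |
| 204 | 43.0 | 1.0 | 4.0 | 110.0    | 211.0 | 0.0 | 0.0     | 161.0   | 0.0   | 0.0     | 1.0   | 0.0 | 7.0  | 0      |
| 189 | 69.0 | 1.0 | 3.0 | 140.0    | 254.0 | 0.0 | 2.0     | 146.0   | 0.0   | 2.0     | 2.0   | 3.0 | 7.0  | 2      |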


In [40]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       303 non-null    float64
 1   sex       303 non-null    float64
 2   cp        303 non-null    float64
 3   trestbps  303 non-null    float64
 4   chol      303 non-null    float64
 5   fbs       303 non-null    float64
 6   restecg   303 non-null    float64
 7   thalach   303 non-null    float64
 8   exang     303 non-null    float64
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    float64
 11  ca        299 non-null    float64
 12  thal      301 non-null    float64
 13  target    303 non-null    int64  
dtypes: float64(13), int64(1)
memory usage: 33.3 KB


# 2. Preprocessing

In [41]:
# Removing rows with missing values

data.dropna(inplace=True)
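
The `info()` output above reports 299 and 301 non-null values for `ca` and `thal`, which is where the `?` placeholders in the raw file end up as `NaN`. The sketch below illustrates that behaviour on three made-up rows (not taken from the dataset) read from an in-memory string; the names `raw`, `cols`, `sample` and `missing` are illustrative only.

```python
import io
import pandas as pd

# Made-up rows in the same 14-column layout; "?" marks a missing value.
raw = io.StringIO(
    "61.0,1.0,2.0,130.0,240.0,0.0,0.0,150.0,0.0,1.0,2.0,0.0,3.0,0\n"
    "52.0,1.0,3.0,138.0,223.0,0.0,0.0,169.0,0.0,0.0,1.0,?,3.0,0\n"
    "53.0,0.0,3.0,128.0,216.0,0.0,2.0,115.0,0.0,0.0,1.0,0.0,?,1\n"
)
cols = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", "thalach",
        "exang", "oldpeak", "slope", "ca", "thal", "target"]

# na_values="?" tells pandas to parse the placeholder as NaN.
sample = pd.read_csv(raw, names=cols, na_values="?", header=None)

# Count missing values per column, keeping only columns that have any.
missing = sample.isna().sum()
missing = missing[missing > 0]
print(missing)

# dropna() removes every row that contains at least one NaN.
cleaned = sample.dropna()
print(sample.shape, "->", cleaned.shape)
```

Checking `isna().sum()` before calling `dropna` makes it explicit how many rows are about to be discarded and from which columns.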

In [42]:
# Separating the features (X) from the target (y)
X = data.drop("target", axis=1)
y = data["target"]

In [43]:
# Converting the target to binary (1 for disease, 0 for absence)
y = y.apply(lambda x: 1 if x > 0 else 0)
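
As a side note, the same binarization can be written as a vectorized comparison, which avoids calling a Python function once per row. This is a small sketch on made-up target values; `y_raw`, `y_apply` and `y_vector` are illustrative names.

```python
import pandas as pd

# Made-up target values; anything above 0 counts as disease present.
y_raw = pd.Series([0, 2, 1, 0, 4, 3])

y_apply = y_raw.apply(lambda x: 1 if x > 0 else 0)  # approach used in the cell above
y_vector = (y_raw > 0).astype(int)                   # vectorized equivalent

print(y_vector.tolist())  # [0, 1, 1, 0, 1, 1]
```

Both versions give the same labels; the vectorized form is mostly a readability and speed preference.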

In [44]:
# Standardizing the features (zero mean, unit variance per column)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
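
To make the scaling step concrete, the sketch below checks on a few made-up values that `StandardScaler` is equivalent to subtracting each column's mean and dividing by its population standard deviation. The array `X_demo` is invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values for two features (for example, age and cholesterol).
X_demo = np.array([[47.0, 275.0],
                   [68.0, 193.0],
                   [60.0, 230.0],
                   [43.0, 211.0]])

scaled = StandardScaler().fit_transform(X_demo)

# Same computation by hand: z = (x - column mean) / column std (ddof=0).
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(scaled, manual))        # True
print(scaled.mean(axis=0).round(6))       # approximately 0 for each column
print(scaled.std(axis=0).round(6))        # approximately 1 for each column
```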

# 3. Train/test split

In [45]:
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
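
One variation worth knowing: in the cells above the scaler is fitted on the full dataset before the split, so test-set statistics influence the scaling. A common alternative is to split first and let a pipeline fit the scaler on the training portion only. The sketch below shows that pattern on synthetic data from `make_classification`, used here as a stand-in assumption rather than the real heart-disease table; `stratify` is an optional extra that keeps class proportions similar in both splits.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 297 rows and 13 features, not the actual dataset.
X_demo, y_demo = make_classification(n_samples=297, n_features=13, random_state=42)

# Split first, so the test rows stay unseen during fitting.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The pipeline fits the scaler on X_tr only, then reuses it on X_te.
pipe = make_pipeline(StandardScaler(), GaussianNB())
pipe.fit(X_tr, y_tr)

print(X_tr.shape, X_te.shape)
print(f"Test accuracy: {pipe.score(X_te, y_te):.2f}")
```

For Gaussian Naive Bayes specifically, per-feature scaling has little effect on the predictions, so this is mainly a habit that pays off with scale-sensitive models.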

# 4. Training the Naive Bayes model

In [46]:
model = GaussianNB()
model.fit(X_train, y_train)
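
For intuition about what `fit` learns, the sketch below reproduces a `GaussianNB` prediction by hand on a tiny made-up dataset: it combines the class prior with a per-feature Gaussian likelihood built from the fitted means and variances. All data and variable names are illustrative; the `var_` attribute is called `sigma_` in older scikit-learn releases, which the code accounts for.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Made-up two-feature dataset with two well-separated classes.
X_demo = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.1],
                   [5.0, 8.0], [6.0, 9.0], [5.5, 8.4]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

nb = GaussianNB().fit(X_demo, y_demo)

# Per-class, per-feature variances (attribute name depends on the version).
var = nb.var_ if hasattr(nb, "var_") else nb.sigma_

x_new = np.array([1.1, 2.0])

# Log-likelihood of x_new under each class, assuming independent Gaussians.
log_like = -0.5 * np.sum(
    np.log(2 * np.pi * var) + (x_new - nb.theta_) ** 2 / var, axis=1
)

# Add the log prior, then normalize to get posterior probabilities.
log_post = np.log(nb.class_prior_) + log_like
posterior = np.exp(log_post - log_post.max())
posterior /= posterior.sum()

print(np.allclose(posterior, nb.predict_proba([x_new])[0]))  # True
```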

# 5. Making predictions

In [47]:
y_pred = model.predict(X_test)

# 6. Model evaluation

In [48]:
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

Accuracy: 0.92

Classification Report:
               precision    recall  f1-score   support

           0       0.90      0.97      0.93        36
           1       0.95      0.83      0.89        24

    accuracy                           0.92        60
   macro avg       0.92      0.90      0.91        60
weighted avg       0.92      0.92      0.92        60


Confusion Matrix:
 [[35  1]
 [ 4 20]]
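
The figures in the classification report can be recovered directly from the confusion matrix, which helps when reading the two side by side. The sketch below redoes the arithmetic for class 1 (disease present); in scikit-learn's layout, rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix reported above: rows = true class, columns = predicted class.
cm = np.array([[35, 1],
               [4, 20]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()          # 55 / 60
precision_1 = tp / (tp + fp)             # 20 / 21
recall_1 = tp / (tp + fn)                # 20 / 24
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy={accuracy:.2f} precision={precision_1:.2f} "
      f"recall={recall_1:.2f} f1={f1_1:.2f}")
# accuracy=0.92 precision=0.95 recall=0.83 f1=0.89
```

The 4 false negatives (patients with disease predicted as healthy) are what pull recall for class 1 down to 0.83, which is usually the more costly kind of error in a screening setting.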
