## KNN Model

This notebook builds a k-nearest neighbors (KNN) classifier from the data in the 'data' folder.

---

### Importing libraries

In [1]:
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import sklearn.metrics as metrics

### Importing the dataset

In [2]:
# Load the dataset
df = pd.read_csv("./data/heart_attack_prediction_dataset.csv", sep=',')

# Split the "systolic/diastolic" string into two columns
df['systolic'] = df['Blood Pressure'].str.split('/').str[0]
df['diastolic'] = df['Blood Pressure'].str.split('/').str[1]

# Drop identifiers and the features not used by the model
df = df.drop(columns=['Patient ID', 'Sex', 'Heart Rate',
                      'Previous Heart Problems', 'Medication Use',
                      'Sedentary Hours Per Day', 'Blood Pressure',
                      'Country', 'Continent', 'Hemisphere',
                      'Income', 'BMI', 'Triglycerides'])

# Convert the binary flag columns to booleans and the pressure readings to integers
Columns = ['Family History', 'Smoking', 'Obesity', 'Alcohol Consumption', 'Heart Attack Risk']
df[Columns] = df[Columns].astype(bool)
df['systolic'] = pd.to_numeric(df['systolic']).astype('int64')
df['diastolic'] = pd.to_numeric(df['diastolic']).astype('int64')
df

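| | Age | Cholesterol | Diabetes | Family History | Smoking | Obesity | Alcohol Consumption | Exercise Hours Per Week | Diet | Stress Level | Physical Activity Days Per Week | Sleep Hours Per Day | Heart Attack Risk | systolic | diastolic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 67 | 208 | 0 | False | True | False | False | 4.168189 | 1 | 9 | 0 | 6 | False | 158 | 88 |
| 1 | 21 | 389 | 1 | True | True | True | True | 1.813242 | 0 | 1 | 1 | 7 | False | 165 | 93 |
| 2 | 21 | 324 | 1 | False | False | False | False | 2.078353 | 2 | 9 | 4 | 4 | False | 174 | 99 |
| 3 | 84 | 383 | 1 | True | True | False | True | 9.828130 | 1 | 9 | 3 | 4 | False | 163 | 100 |
| 4 | 66 | 318 | 1 | True | True | True | False | 5.804299 | 0 | 6 | 1 | 5 | False | 91 | 88 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8758 | 60 | 121 | 1 | True | True | False | True | 7.917342 | 2 | 8 | 7 | 7 | False | 94 | 76 |
| 8759 | 28 | 120 | 1 | False | False | True | False | 16.558426 | 2 | 8 | 4 | 9 | False | 157 | 102 |
| 8760 | 47 | 250 | 0 | True | True | True | True | 3.148438 | 1 | 5 | 4 | 4 | True | 161 | 75 |
| 8761 | 36 | 178 | 1 | False | True | False | False | 3.789950 | 0 | 5 | 2 | 8 | False | 119 | 67 |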
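The blood-pressure parsing in the cell above splits the same string twice. A minimal alternative sketch, assuming every entry has the form `systolic/diastolic`, splits once with `expand=True`. The `toy` frame is a hypothetical stand-in that reuses the first three readings from the table above.

```python
import pandas as pd

# Hypothetical toy frame; the three readings match rows 0-2 of the table above.
toy = pd.DataFrame({"Blood Pressure": ["158/88", "165/93", "174/99"]})

# expand=True returns one column per piece, so each string is split only once.
bp = toy["Blood Pressure"].str.split("/", expand=True).astype("int64")
toy["systolic"] = bp[0]
toy["diastolic"] = bp[1]

print(toy.dtypes)
```

If some rows were malformed, the integer cast would raise, which is arguably preferable to silently carrying bad values into the model.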


### Preparing the data for the model

In [3]:
x = df.drop(columns=['Heart Attack Risk'])
y = df['Heart Attack Risk']

x.head()
y.head()  # first five labels of the 8763-row target

0    False
1    False
2    False
3    False
4    False
Name: Heart Attack Risk, dtype: bool

### Splitting the data into training and test sets

In [4]:
X_train, X_test, y_train, y_test = train_test_split(x, y)
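The call above uses the defaults, so the split changes on every run and the class balance in each part is left to chance. A hedged sketch of a reproducible, stratified variant follows. The synthetic `x_demo` and `y_demo`, the roughly one-third positive rate, and the seed `42` are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the real dataset).
rng = np.random.default_rng(0)
x_demo = pd.DataFrame({
    "Age": rng.integers(18, 90, size=200),
    "Cholesterol": rng.integers(120, 400, size=200),
})
y_demo = pd.Series(rng.random(200) < 0.36, name="Heart Attack Risk")

# random_state fixes the shuffle; stratify keeps the class ratio similar
# in the training and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=42
)

print(f"positive rate: train={y_tr.mean():.3f}, test={y_te.mean():.3f}")
```

With a fixed seed, metrics reported in later cells stay comparable across reruns of the notebook.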

### Creating and training the model

In [8]:
knn = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)
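KNN with a Euclidean metric is sensitive to feature scale: a column with a wide range (cholesterol, for instance) can dominate the distance over a narrow one such as a boolean flag. One common remedy, sketched here on synthetic data rather than on the notebook's dataset, is to standardize features inside a pipeline. The names `X_demo` and `scaled_knn` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption); one feature is inflated to mimic mixed scales.
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)
X_demo[:, 0] *= 1000

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# The scaler is fitted on the training part only, then reused at predict time.
scaled_knn = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=3, metric="euclidean"),
)
scaled_knn.fit(X_tr, y_tr)

score = scaled_knn.score(X_te, y_te)
print(f"Accuracy with scaling: {score:.3f}")
```

Keeping the scaler in the pipeline avoids leaking test-set statistics into training. Whether scaling helps on the heart-attack data would need to be checked empirically.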

In [9]:
accur = str(round(metrics.accuracy_score(y_test, predictions) * 100, 2))+"%"
print(f"Accuracy: {accur}")
precisao = str(round(metrics.precision_score(y_test, predictions) * 100, 2))+"%"
print(f'Precision: {precisao}')
revocacao = str(round(metrics.recall_score(y_test, predictions) * 100, 2))+"%"
print(f'Recall: {revocacao}')
f1 = str(round(metrics.f1_score(y_test, predictions) * 100, 2))+"%"
print(f'F1-Score: {f1}')


Accuracy: 55.41%
Precision: 35.12%
Recall: 28.3%
F1-Score: 31.34%
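Accuracy is hard to judge in isolation when one class is more frequent than the other. A quick reference point is a classifier that always predicts the majority class. The sketch below uses scikit-learn's `DummyClassifier` on an illustrative 63/37 class balance, which is an assumption for the example and not a figure taken from the dataset.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Illustrative labels (an assumption): 63 negatives and 37 positives.
y_true = np.array([False] * 63 + [True] * 37)
X_placeholder = np.zeros((100, 1))  # features are ignored by this baseline

# Always predict the most frequent class seen during fit.
baseline = DummyClassifier(strategy="most_frequent").fit(X_placeholder, y_true)
pred = baseline.predict(X_placeholder)

baseline_acc = accuracy_score(y_true, pred)
baseline_rec = recall_score(y_true, pred)
print(f"Baseline accuracy: {baseline_acc:.2%}, recall: {baseline_rec:.2%}")
```

A model whose accuracy falls below such a baseline is not yet extracting useful signal, even if its recall is above zero.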


In [33]:
y_pred = knn.predict(X_test)
print('\n Confusion matrix\n', pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'], margins = True, margins_name="All"))


 Confusion matrix
 Predicted  False  True   All
Actual                       
False       1003   376  1379
True         569   243   812
All         1572   619  2191
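The four inner cells of the matrix are enough to recompute the standard metrics by hand. The sketch below plugs in the counts shown above; the variable names are chosen for the example.

```python
# Counts read from the confusion matrix above (rows: actual, columns: predicted).
tn, fp = 1003, 376   # actual False
fn, tp = 569, 243    # actual True

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

If these values differ from the figures printed by the metrics cell, a likely cause is that the cells were executed against different random splits, since `train_test_split` is called without `random_state`.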
