
## **Heart failure prediction**


*   Dataset link: [Heart Failure Prediction](https://www.kaggle.com/andrewmvd/heart-failure-clinical-data).
*   The dataset describes patient characteristics and whether each patient died during the follow-up period.


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## **Imports**

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import (
    accuracy_score, 
    precision_score, 
    recall_score, 
    f1_score
)

## **Reading the data**

In [3]:
dados = pd.read_csv('/content/drive/My Drive/Colab Notebooks/datasets/heart_failure.csv')
dados.head()

Unnamed: 0,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
0,75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4,1
1,55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
2,65.0,0,146,0,20,0,162000.0,1.3,129,1,1,7,1
3,50.0,1,111,0,20,0,210000.0,1.9,137,1,0,7,1
4,65.0,1,160,1,20,0,327000.0,2.7,116,0,0,8,1


## **Cleaning and organizing the data**

In [4]:
# drop rows with missing values (NaN); note that dropna() does not detect placeholders such as '?'
dados = dados.dropna()
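
Before dropping rows, it can help to count what is actually missing. The sketch below uses a small hypothetical frame (`toy`, with made-up values) in place of the real dataset, and assumes `'?'` is the only placeholder that needs converting to NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "age": [75.0, 55.0, np.nan, 50.0],
    "serum_sodium": ["130", "?", "129", "137"],
})

# Treat the '?' placeholder as missing, then count missing values per column.
toy = toy.replace("?", np.nan)
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Drop incomplete rows and report how many were removed.
clean = toy.dropna()
rows_dropped = len(toy) - len(clean)
print(f"rows dropped: {rows_dropped}")
```

If `missing_per_column` is zero everywhere, `dropna()` leaves the frame unchanged, which is consistent with the 299 non-null entries reported by `dados.info()` further down.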

In [5]:
# drop the 'time' column: follow-up time is only known once the outcome in 'DEATH_EVENT' has been observed
dados = dados.drop(columns=['time'])
dados.head()

Unnamed: 0,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,DEATH_EVENT
0,75.0,0,582,0,20,1,265000.0,1.9,130,1,0,1
1,55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,1
2,65.0,0,146,0,20,0,162000.0,1.3,129,1,1,1
3,50.0,1,111,0,20,0,210000.0,1.9,137,1,0,1
4,65.0,1,160,1,20,0,327000.0,2.7,116,0,0,1


In [6]:
# rescale every column to the [0, 1] range using min-max normalization
dados = (dados - dados.min())/(dados.max()-dados.min())
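
The rescaling above applies x' = (x - min) / (max - min) to each column. As a minimal sketch on hypothetical values, the manual formula can be checked against scikit-learn's `MinMaxScaler`. The pipeline at the end is an assumption, not part of the original notebook: it shows one way to fit the scaler inside each cross-validation fold rather than on the full dataset.

```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical values, chosen only to illustrate the formula.
toy = pd.DataFrame({
    "age": [40.0, 60.0, 95.0],
    "ejection_fraction": [20.0, 38.0, 80.0],
})

# Manual min-max scaling, column by column (same expression as in the notebook).
manual = (toy - toy.min()) / (toy.max() - toy.min())

# The same transformation with scikit-learn.
scaled = MinMaxScaler().fit_transform(toy)
print(manual)
print(np.allclose(manual.to_numpy(), scaled))

# Optional: chain scaler and classifier so that, during cross-validation,
# the min and max are computed from the training folds only.
pipeline = make_pipeline(MinMaxScaler(), MLPClassifier())
```

If the pipeline is passed to a search object, the hyperparameter names need the step prefix that `make_pipeline` generates, for example `mlpclassifier__hidden_layer_sizes`.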

In [7]:
dados.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 299 entries, 0 to 298
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   age                       299 non-null    float64
 1   anaemia                   299 non-null    float64
 2   creatinine_phosphokinase  299 non-null    float64
 3   diabetes                  299 non-null    float64
 4   ejection_fraction         299 non-null    float64
 5   high_blood_pressure       299 non-null    float64
 6   platelets                 299 non-null    float64
 7   serum_creatinine          299 non-null    float64
 8   serum_sodium              299 non-null    float64
 9   sex                       299 non-null    float64
 10  smoking                   299 non-null    float64
 11  DEATH_EVENT               299 non-null    float64
dtypes: float64(12)
memory usage: 30.4 KB


## **Organizing the data for modeling**

In [8]:
x = dados.iloc[:,:-1]
x.head()

Unnamed: 0,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking
0,0.636364,0.0,0.071319,0.0,0.090909,1.0,0.290823,0.157303,0.485714,1.0,0.0
1,0.272727,0.0,1.0,0.0,0.363636,0.0,0.288833,0.067416,0.657143,1.0,0.0
2,0.454545,0.0,0.015693,0.0,0.090909,0.0,0.16596,0.089888,0.457143,1.0,1.0
3,0.181818,1.0,0.011227,0.0,0.090909,0.0,0.224148,0.157303,0.685714,1.0,0.0
4,0.454545,1.0,0.017479,1.0,0.090909,0.0,0.365984,0.247191,0.085714,0.0,0.0


In [9]:
y = dados.DEATH_EVENT
y.head()

0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
Name: DEATH_EVENT, dtype: float64

## **Hyperparameter optimization**

In [10]:
param_grid = [{
    'hidden_layer_sizes': [(10,), (50,), (100,), (10, 50), (50, 100), (50, 100, 150)],  # (10) is just the int 10; a one-element tuple needs a trailing comma
    'activation': ['identity', 'logistic', 'tanh', 'relu'],
    'solver': ['lbfgs', 'sgd', 'adam'],
    'max_iter': [500, 1000, 2000]
}]

The search method used here is **Randomized Search**, which samples hyperparameter combinations at random instead of evaluating all of them. With the default `n_iter=10`, only 10 of the 216 combinations in the grid above are tried, so the configuration that would have performed best may never be evaluated. In exchange, the search runs much faster than an exhaustive grid search.
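
To make the trade-off concrete, the sketch below counts the combinations in the grid and draws a random sample of the same size that `RandomizedSearchCV` uses by default. The `random_state=0` seed is an assumption added only to make the sample reproducible.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_grid = {
    'hidden_layer_sizes': [(10,), (50,), (100,), (10, 50), (50, 100), (50, 100, 150)],
    'activation': ['identity', 'logistic', 'tanh', 'relu'],
    'solver': ['lbfgs', 'sgd', 'adam'],
    'max_iter': [500, 1000, 2000],
}

# Total number of combinations an exhaustive grid search would evaluate.
n_total = len(ParameterGrid(param_grid))

# Random sample of 10 combinations (the RandomizedSearchCV default for n_iter).
sampled = list(ParameterSampler(param_grid, n_iter=10, random_state=0))

print(f"grid size: {n_total}, sampled: {len(sampled)}")
print(f"fraction of the grid evaluated: {len(sampled) / n_total:.1%}")
for params in sampled[:3]:
    print(params)
```

Raising `n_iter` covers more of the grid at a proportional cost in fitting time.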

In [11]:
mlp = RandomizedSearchCV(MLPClassifier(), param_grid, cv=5, scoring='accuracy')

In [12]:
mlp.fit(x, y)



RandomizedSearchCV(cv=5, error_score=nan,
                   estimator=MLPClassifier(activation='relu', alpha=0.0001,
                                           batch_size='auto', beta_1=0.9,
                                           beta_2=0.999, early_stopping=False,
                                           epsilon=1e-08,
                                           hidden_layer_sizes=(100,),
                                           learning_rate='constant',
                                           learning_rate_init=0.001,
                                           max_fun=15000, max_iter=200,
                                           momentum=0.9, n_iter_no_change=10,
                                           nesterovs_momentum=True, power_t=0.5,
                                           random...
                                           verbose=False, warm_start=False),
                   iid='deprecated', n_iter=10, n_jobs=None,
                   param_distributions=[{

In [13]:
print(mlp.best_params_)

{'solver': 'lbfgs', 'max_iter': 2000, 'hidden_layer_sizes': (50, 100), 'activation': 'identity'}


In [14]:

print(round(mlp.best_score_,3))

0.739


### **Remarks and learning results**


*   The best cross-validated accuracy was 0.739 (about 74%), which may be satisfactory, although accuracy alone is hard to judge without a majority-class baseline for comparison.
*   This accuracy is close to that of the decision tree and random forest models built in the previous assignment.
*   Again, because **Randomized Search** samples configurations at random, some hyperparameter combinations that might have performed better were never evaluated.
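
One way to put an accuracy figure in context is to compare it with a classifier that always predicts the most frequent class. The sketch below runs on synthetic data from `make_classification`, not on the heart failure dataset. The 68/32 class split, `X_demo`, and `y_demo` are arbitrary assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_validate

# Synthetic, imbalanced stand-in data (NOT the real dataset).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=11, weights=[0.68, 0.32], random_state=0
)

# Baseline that always predicts the majority class.
baseline = DummyClassifier(strategy="most_frequent")

# Evaluate with the same 5-fold scheme, reporting accuracy and recall.
scores = cross_validate(
    baseline, X_demo, y_demo, cv=5, scoring=["accuracy", "recall"]
)
baseline_accuracy = scores["test_accuracy"].mean()
baseline_recall = scores["test_recall"].mean()

print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"baseline recall (positive class): {baseline_recall:.3f}")
```

A model is only informative to the extent that it beats this baseline. Reporting recall alongside accuracy shows whether the positive class is being detected at all.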

