<a href="https://colab.research.google.com/github/LuixCabral/Stroke-Prediction/blob/main/Stroke_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

A stroke (cerebrovascular accident, CVA) occurs when blood flow to the brain is interrupted or reduced, damaging brain cells. It is a medical emergency that requires immediate treatment to minimize damage and complications. Depending on the area of the brain affected, a stroke can impair motor function, speech, vision, and cognition.

Types of stroke:

* Ischemic:
Occurs when a blood vessel in the brain is blocked, cutting off the flow of blood and oxygen.

* Hemorrhagic:
Occurs when a blood vessel in the brain ruptures, causing bleeding in the brain.

In [1]:
# imbalanced-learn provides the `imblearn` package used later for SMOTE.
!pip install pandas matplotlib seaborn scikit-learn imbalanced-learn



In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

In [3]:
data = pd.read_csv('/content/healthcare-dataset-stroke-data (1).csv')

data.head(10)

Unnamed: 0,id,gender,age,hypertension,heart_disease,ever_married,work_type,Residence_type,avg_glucose_level,bmi,smoking_status,stroke
0,9046,Male,67.0,0,1,Yes,Private,Urban,228.69,36.6,formerly smoked,1
1,51676,Female,61.0,0,0,Yes,Self-employed,Rural,202.21,,never smoked,1
2,31112,Male,80.0,0,1,Yes,Private,Rural,105.92,32.5,never smoked,1
3,60182,Female,49.0,0,0,Yes,Private,Urban,171.23,34.4,smokes,1
4,1665,Female,79.0,1,0,Yes,Self-employed,Rural,174.12,24.0,never smoked,1
5,56669,Male,81.0,0,0,Yes,Private,Urban,186.21,29.0,formerly smoked,1
6,53882,Male,74.0,1,1,Yes,Private,Rural,70.09,27.4,never smoked,1
7,10434,Female,69.0,0,0,No,Private,Urban,94.39,22.8,never smoked,1
8,27419,Female,59.0,0,0,Yes,Private,Rural,76.15,,Unknown,1
9,60491,Female,78.0,0,0,Yes,Private,Urban,58.57,24.2,Unknown,1


In [4]:
data.isnull().sum()

id,0
gender,0
age,0
hypertension,0
heart_disease,0
ever_married,0
work_type,0
Residence_type,0
avg_glucose_level,0
bmi,201


-- **DATA** MANIPULATION

In [5]:
datacat = data.dtypes[data.dtypes == 'object'].index
datacat

Index(['gender', 'ever_married', 'work_type', 'Residence_type',
       'smoking_status'],
      dtype='object')

In [6]:
datanum = data.dtypes[data.dtypes != 'object'].index
datanum

Index(['id', 'age', 'hypertension', 'heart_disease', 'avg_glucose_level',
       'bmi', 'stroke'],
      dtype='object')

In [7]:
data['bmi'] = data['bmi'].fillna(data['bmi'].mean())
data.isnull().sum()

id,0
gender,0
age,0
hypertension,0
heart_disease,0
ever_married,0
work_type,0
Residence_type,0
avg_glucose_level,0
bmi,0
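
The imputation step above replaces missing `bmi` values with the column mean. A minimal sketch of the same idea on a toy frame follows. The values are invented for illustration and are not taken from the stroke dataset. The median variant is shown only as a possible alternative that is less sensitive to outliers; the notebook itself uses the mean.

```python
import pandas as pd

# Toy data, invented for illustration (not from the stroke dataset).
toy = pd.DataFrame({"bmi": [20.0, 22.0, None, 60.0, None]})

# Same approach as the notebook: replace NaN with the column mean.
mean_filled = toy["bmi"].fillna(toy["bmi"].mean())

# Alternative (an assumption, not used in the notebook): the column median.
median_filled = toy["bmi"].fillna(toy["bmi"].median())

print(mean_filled.tolist())    # missing entries become 34.0
print(median_filled.tolist())  # missing entries become 22.0
```

With one large value in the column, the mean (34.0) is pulled upward while the median (22.0) stays close to the typical observation.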


In [8]:
data[datacat] = data[datacat].astype('category')
data.dtypes


id,int64
gender,category
age,float64
hypertension,int64
heart_disease,int64
ever_married,category
work_type,category
Residence_type,category
avg_glucose_level,float64
bmi,float64


In [9]:
data['gender'] = data['gender'].cat.codes
data['ever_married'] = data['ever_married'].cat.codes
data['Residence_type'] = data['Residence_type'].cat.codes


In [10]:
data = pd.get_dummies(data, columns=['work_type', 'smoking_status'])
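
The two cells above encode categorical columns in different ways: `.cat.codes` maps each category to an integer, while `pd.get_dummies` creates one indicator column per category. The sketch below shows both on a small invented frame; the column names mirror the dataset, but the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the dataset.
toy = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "work_type": ["Private", "Govt_job", "Private"],
})

# Integer codes: categories are sorted, so Female -> 0 and Male -> 1.
toy["gender"] = toy["gender"].astype("category").cat.codes

# One-hot encoding: one indicator column per distinct work_type value.
encoded = pd.get_dummies(toy, columns=["work_type"])

print(encoded["gender"].tolist())
print(encoded.columns.tolist())
```

Integer codes suit two-valued columns. For columns with several unordered values, one-hot encoding avoids implying an order between categories.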


In [11]:
data

Unnamed: 0,id,gender,age,hypertension,heart_disease,ever_married,Residence_type,avg_glucose_level,bmi,stroke,work_type_Govt_job,work_type_Never_worked,work_type_Private,work_type_Self-employed,work_type_children,smoking_status_Unknown,smoking_status_formerly smoked,smoking_status_never smoked,smoking_status_smokes
0,9046,1,67.0,0,1,1,1,228.69,36.600000,1,False,False,True,False,False,False,True,False,False
1,51676,0,61.0,0,0,1,0,202.21,28.893237,1,False,False,False,True,False,False,False,True,False
2,31112,1,80.0,0,1,1,0,105.92,32.500000,1,False,False,True,False,False,False,False,True,False
3,60182,0,49.0,0,0,1,1,171.23,34.400000,1,False,False,True,False,False,False,False,False,True
4,1665,0,79.0,1,0,1,0,174.12,24.000000,1,False,False,False,True,False,False,False,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5105,18234,0,80.0,1,0,1,1,83.75,28.893237,0,False,False,True,False,False,False,False,True,False
5106,44873,0,81.0,0,0,1,1,125.20,40.000000,0,False,False,False,True,False,False,False,True,False
5107,19723,0,35.0,0,0,1,0,82.99,30.600000,0,False,False,False,True,False,False,False,True,False
5108,37544,1,51.0,0,0,1,0,166.29,25.600000,0,False,False,True,False,False,False,True,False,False


In [None]:
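# Caution: drop() returns a new DataFrame and the result is not assigned here,
# so `data` (and therefore `x`) still contains the 'stroke' column.
data.drop('stroke', axis=1)
# Caution: x and y are sampled with different random_state values, so they hold
# different rows and the labels in y do not correspond to the features in x.
x = data.sample(frac=0.20, random_state=12)
y = data['stroke'].sample(frac=0.20, random_state=10)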

xTrain , xTest, yTrain, yTest = train_test_split(x, y, test_size=0.2, random_state=42)
model = RandomForestClassifier()
model.fit(xTrain, yTrain)
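
In the cell above, the result of `data.drop('stroke', axis=1)` is not assigned, and `x` and `y` are sampled with different `random_state` values, so features and labels come from different rows. One way to keep them aligned is to derive both from the same frame and let `train_test_split` stratify on the label. The sketch below does this on a small synthetic frame with invented values. It is a suggested alternative, not the code that produced the outputs recorded in this notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's `data` frame (values are invented).
toy = pd.DataFrame({
    "age": list(range(30, 50)),
    "hypertension": [0, 1] * 10,
    "stroke": [0] * 15 + [1] * 5,
})

# Features and label come from the same rows; the result of drop() is assigned.
x = toy.drop(columns="stroke")
y = toy["stroke"]

# stratify=y keeps the class ratio similar in the train and test splits.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

# Matching indices mean every feature row is paired with its own label.
print(x_train.index.equals(y_train.index))
print("stroke" in x_train.columns)
```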

In [None]:
from sklearn.metrics import accuracy_score
yPred = model.predict(xTest)

accuracy = accuracy_score(yTest, yPred)

print(f'Accuracy: {accuracy:.4f}')

Accuracy: 0.9561


In [None]:
from sklearn.metrics import f1_score

f1 = f1_score(yTest, yPred)

print(f'F1 Score: {f1:.4f}')

F1 Score: 0.0000
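
A high accuracy together with an F1 score of zero is the pattern produced by a classifier that never predicts the positive class on an imbalanced test set: with no true positives, precision and recall are both zero. The plain-Python sketch below reproduces that pattern with hypothetical labels (95 negatives, 5 positives); the counts are invented and are not the notebook's test set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # convention: F1 is 0 when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a model that always predicts "no stroke"

print(f"Accuracy: {accuracy(y_true, y_pred):.4f}")
print(f"F1 Score: {f1(y_true, y_pred):.4f}")
```

This is why accuracy alone is a weak metric for rare outcomes such as stroke, and why the notebook moves on to class balancing.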


## Apply a balancing technique

### Subtask:
Use a technique such as SMOTE (oversampling the minority class) or NearMiss (undersampling the majority class).


**Reasoning**:
Import necessary libraries and apply SMOTE to balance the dataset.



In [None]:
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NearMiss

x = data.drop('stroke', axis=1)
y = data['stroke']

smote = SMOTE(random_state=42)
x_resampled, y_resampled = smote.fit_resample(x, y)

print("Class distribution after SMOTE:")
print(y_resampled.value_counts())

Class distribution after SMOTE:
stroke
1    4861
0    4861
Name: count, dtype: int64
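
SMOTE balances the classes by creating synthetic minority samples: it picks a minority point, chooses one of its nearest minority neighbours, and places a new point somewhere on the segment between them. The sketch below illustrates that interpolation step in plain Python. The function name `smote_like`, its parameters, and the sample points are hypothetical, and this is a simplified illustration rather than the implementation in `imblearn`.

```python
import random

def smote_like(minority, n_new, k=2, seed=42):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical 2-D minority points (invented values).
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
new_points = smote_like(minority, n_new=4)
print(new_points)
```

Because every synthetic point lies between two real minority points, the new samples stay inside the region already occupied by the minority class.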


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

xTrain_resampled, xTest_resampled, yTrain_resampled, yTest_resampled = train_test_split(x_resampled, y_resampled, test_size=0.2, random_state=42)

model_resampled = RandomForestClassifier()
model_resampled.fit(xTrain_resampled, yTrain_resampled)

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

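# xTest and yTest come from the first train/test split above; xTest still
# contains the 'stroke' column, so it is dropped before predicting.
xTest_processed = xTest.drop('stroke', axis=1)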
yPred_resampled = model_resampled.predict(xTest_processed)

accuracy_resampled = accuracy_score(yTest, yPred_resampled)
precision_resampled = precision_score(yTest, yPred_resampled)
recall_resampled = recall_score(yTest, yPred_resampled)
f1_resampled = f1_score(yTest, yPred_resampled)

print(f'Accuracy (Resampled Model on Original Test Set): {accuracy_resampled:.4f}')
print(f'Precision (Resampled Model on Original Test Set): {precision_resampled:.4f}')
print(f'Recall (Resampled Model on Original Test Set): {recall_resampled:.4f}')
print(f'F1 Score (Resampled Model on Original Test Set): {f1_resampled:.4f}')

Accuracy (Resampled Model on Original Test Set): 0.9317
Precision (Resampled Model on Original Test Set): 0.1429
Recall (Resampled Model on Original Test Set): 0.1111
F1 Score (Resampled Model on Original Test Set): 0.1250
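
The evaluation above reuses `xTest` and `yTest` from the first split, while SMOTE was fitted on the full dataset, so rows from the test set were likely also available during training. A common way to avoid this overlap is to split first and resample only the training portion. The sketch below shows that order of operations under stated assumptions: the data is synthetic (generated with `make_classification`), and random duplication via `sklearn.utils.resample` stands in for SMOTE. It is not the notebook's pipeline and does not reproduce its results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic imbalanced data (roughly 5% positives); not the stroke dataset.
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.95], random_state=42)

# 1) Split first, so the test set contains only untouched rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 2) Oversample the minority class in the training split only.
#    Random duplication is a simple stand-in for SMOTE here.
minority = X_train[y_train == 1]
majority = X_train[y_train == 0]
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=42)
X_bal = np.vstack([majority, minority_up])
y_bal = np.array([0] * len(majority) + [1] * len(minority_up))

# 3) Fit on the balanced training data, evaluate on the original test split.
model = RandomForestClassifier(random_state=42)
model.fit(X_bal, y_bal)
print(f"F1 on untouched test set: {f1_score(y_test, model.predict(X_test)):.4f}")
```

Keeping the test split out of the resampling step means the reported metrics reflect the class balance the model would face on new data.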
