This repository contains a Stroke Prediction project implemented in Python using machine learning techniques. The goal of this project is to predict the likelihood of a person having a stroke based on various demographic, lifestyle, and medical factors.
Stroke is a medical condition that occurs when the blood supply to the brain is interrupted or reduced, resulting in damage to brain cells. Early detection and prediction of stroke risk can help healthcare professionals take preventive measures and provide appropriate treatment.
This project aims to build a machine learning model that can predict the probability of stroke based on various attributes such as age, gender, hypertension, heart disease, smoking status, etc. The model is trained using a dataset consisting of these attributes and corresponding stroke labels.
The dataset used for this project contains the following features:
- id: The unique identifier for each individual.
- gender: The gender of the individual (Male, Female, or Other).
- age: The age of the individual in years.
- hypertension: Indicates whether the individual has hypertension (0 for No, 1 for Yes).
- heart_disease: Indicates whether the individual has heart disease (0 for No, 1 for Yes).
- ever_married: Indicates whether the individual has ever been married (No or Yes).
- work_type: The type of work the individual is engaged in (children, government, private, self-employed, or never worked).
- Residence_type: The type of residence of the individual (Rural or Urban).
- avg_glucose_level: The average glucose level in the individual's blood.
- bmi: The body mass index (BMI) of the individual.
- smoking_status: The smoking status of the individual (formerly smoked, never smoked, or smokes).
- stroke: Indicates whether the individual had a stroke (0 for No, 1 for Yes).
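The feature descriptions above can be turned into a small sanity check for a single record. The following is only a minimal sketch, not code from this repository: `validate_record` is a hypothetical helper, the category spellings are copied from the descriptions and may not match the raw dataset exactly, and the example record uses made-up values.

```python
# Allowed values per field, copied from the feature descriptions.
# ASSUMPTION: the spellings in the real dataset may differ from these.
BINARY_FIELDS = ("hypertension", "heart_disease", "stroke")
CATEGORICAL_FIELDS = {
    "gender": {"Male", "Female", "Other"},
    "ever_married": {"No", "Yes"},
    "work_type": {"children", "government", "private", "self-employed", "never worked"},
    "Residence_type": {"Rural", "Urban"},
    "smoking_status": {"formerly smoked", "never smoked", "smokes"},
}
NUMERIC_FIELDS = ("age", "avg_glucose_level", "bmi")


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field in BINARY_FIELDS:
        if record.get(field) not in (0, 1):
            problems.append(f"{field} must be 0 or 1")
    for field, allowed in CATEGORICAL_FIELDS.items():
        if record.get(field) not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    for field in NUMERIC_FIELDS:
        value = record.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value < 0:
            problems.append(f"{field} must be a non-negative number")
    return problems


# Made-up example record, for illustration only.
example = {
    "id": 1, "gender": "Female", "age": 54.0, "hypertension": 0,
    "heart_disease": 0, "ever_married": "Yes", "work_type": "private",
    "Residence_type": "Urban", "avg_glucose_level": 98.5, "bmi": 27.3,
    "smoking_status": "never smoked", "stroke": 0,
}
print(validate_record(example))  # prints []
```

A check like this is most useful before training, where a misspelled category or a negative value would otherwise pass silently into the model.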
The dataset used in this project is included in this repository.
- Clone this repository to your local machine using the following command:
  git clone https://github.com/Mo-Shakib/Stroke-Prediction.git
- Change to the project directory:
  cd Stroke-Prediction
- Install the required dependencies. It is recommended to use a virtual environment:
  pip install -r requirements.txt
The main script for running the stroke prediction model is predict_stroke.py. You can use this script to predict stroke probabilities for new data points. Here's an example of how to use it:
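import pickle
import numpy as np

# Make predictions for new data points
age = float(input("Enter age: "))
hypertension = int(input("Enter hypertension [0 for 'NO', 1 for 'YES']: "))
heart_disease = int(input("Enter heart disease [0 for 'NO', 1 for 'YES']: "))
ever_married = int(input("Enter marital status [0 for 'NO', 1 for 'YES']: "))
avg_glucose_level = float(input("Enter average glucose level: "))
bmi = float(input("Enter BMI: "))
gender = input("Enter gender (female, male, or other): ")
work_type = input("Enter work type (government, private, self-employed, children, or never worked): ")
residence_type = input("Enter residence type (rural or urban): ")
smoking_status = input("Enter smoking status (formerly smoked, never smoked, or smokes): ")

# Collect the inputs in one dictionary. ASSUMPTION: the key order and the encoding
# of the text answers must match what the model was trained on.
user_data = {
    "age": age, "hypertension": hypertension, "heart_disease": heart_disease,
    "ever_married": ever_married, "avg_glucose_level": avg_glucose_level, "bmi": bmi,
    "gender": gender, "work_type": work_type,
    "residence_type": residence_type, "smoking_status": smoking_status,
}

# Load a trained model (replace model_name with the name of a saved model file)
with open('Models/model_name.pkl', 'rb') as file:
    predictor = pickle.load(file)
X_test = np.array([list(user_data.values())])
predictions = predictor.predict(X_test)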
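Note that the text answers (gender, work type, residence type, smoking status) have to become numbers before they reach the model: a NumPy array built from a mix of numbers and strings ends up as an array of strings. How this project encodes them is not shown here, so the following is only a sketch of one common option, one-hot encoding, in plain Python. The names `CATEGORIES`, `NUMERIC_KEYS`, `one_hot`, and `encode_user_data` are hypothetical, and the category order is an assumption that would need to match the training script.

```python
# ASSUMPTION: hypothetical category orderings taken from the input prompts.
# The real training script may encode these fields differently
# (for example with label encoding instead of one-hot encoding).
CATEGORIES = {
    "gender": ["female", "male", "other"],
    "work_type": ["government", "private", "self-employed", "children", "never worked"],
    "residence_type": ["rural", "urban"],
    "smoking_status": ["formerly smoked", "never smoked", "smokes"],
}
NUMERIC_KEYS = ["age", "hypertension", "heart_disease", "ever_married",
                "avg_glucose_level", "bmi"]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    cleaned = value.strip().lower()
    if cleaned not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if cleaned == category else 0.0 for category in categories]


def encode_user_data(user_data):
    """Flatten one dictionary of answers into a purely numeric feature list."""
    features = [float(user_data[key]) for key in NUMERIC_KEYS]
    for key, categories in CATEGORIES.items():
        features.extend(one_hot(user_data[key], categories))
    return features


# Made-up answers, for illustration only.
user_data = {
    "age": 67.0, "hypertension": 1, "heart_disease": 0, "ever_married": 1,
    "avg_glucose_level": 228.69, "bmi": 36.6, "gender": "male",
    "work_type": "private", "residence_type": "urban",
    "smoking_status": "formerly smoked",
}
row = encode_user_data(user_data)
print(len(row))  # prints 19 (6 numeric values + 3 + 5 + 2 + 3 one-hot slots)
```

The resulting list can be wrapped as `np.array([row])` to form a single-row numeric input, provided the model was trained on features laid out the same way.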
To train the stroke prediction model using the provided dataset, you can run the train_model.py script. This script loads the dataset, preprocesses the data, trains the model, and saves the trained model to a file.
python train_model.py
The trained models will be saved as RandomForest.pkl, LinearSVC.pkl, NeuralNetwork.pkl, LogisticRegression.pkl, and KNN.pkl in the models directory.
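Saving one pickle file per model name can be sketched as below. This is an illustration under assumptions, not the code in train_model.py: `model_path`, `save_model`, and `load_model` are hypothetical helpers, the directory is passed in explicitly because its exact spelling is not confirmed here, and a plain dictionary stands in for a trained model.

```python
import pickle
import tempfile
from pathlib import Path

# Model names taken from the file names listed above.
MODEL_NAMES = ("RandomForest", "LinearSVC", "NeuralNetwork", "LogisticRegression", "KNN")


def model_path(name, models_dir):
    """Build the .pkl path for a known model name."""
    if name not in MODEL_NAMES:
        raise ValueError(f"unknown model name: {name!r}")
    return Path(models_dir) / f"{name}.pkl"


def save_model(model, name, models_dir):
    """Pickle `model` to <models_dir>/<name>.pkl and return the path."""
    path = model_path(name, models_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as file:
        pickle.dump(model, file)
    return path


def load_model(name, models_dir):
    """Load a previously saved model by name."""
    with open(model_path(name, models_dir), "rb") as file:
        return pickle.load(file)


# Demonstration with a stand-in object instead of a real trained model.
with tempfile.TemporaryDirectory() as tmp:
    stand_in = {"kind": "KNN", "note": "placeholder, not a trained model"}
    saved_to = save_model(stand_in, "KNN", tmp)
    restored = load_model("KNN", tmp)
    print(saved_to.name, restored == stand_in)  # prints: KNN.pkl True
```

Because unpickling can execute arbitrary code, .pkl files should only be loaded when they come from a trusted source.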
The trained model can be tested with the predict_stroke.py script, which prompts for user input and predicts the outcome for the values entered.
python predict_stroke.py
Contributions to this project are welcome. If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.
This project is licensed under the Apache License 2.0. See the LICENSE file for more details.