🫀 Heart Stroke Prediction

Problem Statement and Solution Proposed

This project addresses a need of healthcare clinics: predicting whether a patient is likely to have a heart stroke based on their diagnostic report. The model is built with scikit-learn's supervised machine learning techniques. This is a classification problem, and training is carried out on a dataset of previous patients whose diagnostic reports include age, gender, and history of other diseases. Several classification techniques were studied, and the final model uses Random Forest and K-Nearest Neighbors in a pipeline.

For detailed EDA and feature engineering, see the notebooks directory. The dataset is stored locally inside the notebooks directory.

The performance of the candidate models was compared to determine which works best with this dataset. The selected models are then used to predict, from user input submitted through the web application, whether a patient is likely to have a heart stroke.

👨‍💻 Tech Stack Used

Python
FastAPI
Machine learning algorithms
Docker
MongoDB

🌐 Infrastructure Required

AWS S3
AWS EC2
AWS ECR
GitHub Actions
Terraform

💾 Features in the dataset

- id : A unique identifier for each record [int]
- Gender: Patient's gender ('Male', 'Female', or 'Other') [str]
- age : Age of the patient [int]
- Hypertension: Whether the patient has hypertension (high blood pressure), a condition that increases the risk of stroke. 0 if the patient does not have hypertension, 1 if the patient has hypertension. [int]
- heart_disease: Whether the patient has heart disease, a condition that increases the risk of stroke. 0 if the patient does not have heart disease, 1 if the patient has heart disease. [int]
- ever_married : Whether the patient has ever been married ('Yes' or 'No') [str]
- work_type : Type of employment or status ('children' for children, 'Govt_job' for civil servants, 'Never_worked' for those who have never worked, 'Private' for private-sector employees, 'Self-employed' for entrepreneurs or freelancers) [str]
- Residence_type : Type of residence ('Rural' for rural areas, 'Urban' for urban areas) [str]
- avg_glucose_level : Average level of glucose (sugar) in the blood [float]
- bmi : Body mass index, a measure of body weight relative to height [float]
- smoking_status : Smoking status ('formerly smoked' for those who used to smoke, 'never smoked' for those who have never smoked, 'smokes' for those who currently smoke, and 'unknown' for those whose smoking status is unknown) [str]
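The categorical values and 0/1 flags listed above lend themselves to a simple input check before a record reaches the model. The following is a minimal sketch, not the project's own validation code: the class name `PatientRecord`, the method `validation_errors`, and the lowercase attribute names are hypothetical, and the sample values in the usage note are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

# Allowed values copied from the feature list above.
GENDERS = {"Male", "Female", "Other"}
WORK_TYPES = {"children", "Govt_job", "Never_worked", "Private", "Self-employed"}
RESIDENCE_TYPES = {"Rural", "Urban"}
SMOKING_STATUSES = {"formerly smoked", "never smoked", "smokes", "unknown"}


@dataclass
class PatientRecord:  # hypothetical name, not taken from the project
    # Attribute names are lowercased for Python style; the dataset
    # spells some columns differently (e.g. Residence_type).
    gender: str
    age: int
    hypertension: int
    heart_disease: int
    ever_married: str
    work_type: str
    residence_type: str
    avg_glucose_level: float
    bmi: float
    smoking_status: str

    def validation_errors(self) -> List[str]:
        """Return human-readable problems; an empty list means the record is valid."""
        errors = []
        # Each entry: (field name, actual value, set of allowed values).
        checks = [
            ("gender", self.gender, GENDERS),
            ("hypertension", self.hypertension, {0, 1}),
            ("heart_disease", self.heart_disease, {0, 1}),
            ("ever_married", self.ever_married, {"Yes", "No"}),
            ("work_type", self.work_type, WORK_TYPES),
            ("residence_type", self.residence_type, RESIDENCE_TYPES),
            ("smoking_status", self.smoking_status, SMOKING_STATUSES),
        ]
        for name, value, allowed in checks:
            if value not in allowed:
                errors.append(f"{name}: unexpected value {value!r}")
        # Numeric fields only get basic sanity checks here.
        if self.age < 0:
            errors.append("age: must be non-negative")
        if self.avg_glucose_level <= 0:
            errors.append("avg_glucose_level: must be positive")
        if self.bmi <= 0:
            errors.append("bmi: must be positive")
        return errors
```

For example, `PatientRecord("Female", 54, 0, 0, "Yes", "Private", "Urban", 105.2, 27.4, "never smoked").validation_errors()` returns an empty list, while a record with `work_type="Pilot"` would report that field as unexpected.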

Project Folder Structure

root/
└── heart_stroke/
    ├── cloud_storage/
    │   ├── __init__.py
    │   └── aws_storage.py
    ├── components/
    │   ├── __init__.py
    │   ├── data_ingestion.py
    │   ├── data_transformation.py
    │   ├── data_validation.py
    │   ├── model_evaluation.py
    │   ├── model_pusher.py
    │   └── model_trainer.py
    ├── configuration/
    │   ├── __init__.py
    │   ├── aws_connection.py
    │   └── mongo_db_connection.py
    ├── constant/
    │   ├── __init__.py
    │   ├── training_pipeline/
    │   │   └── __init__.py
    │   ├── application.py
    │   ├── database.py
    │   ├── env_variables.py
    │   └── s3_bucket.py
    ├── data_access/
    │   ├── __init__.py
    │   └── heart_stroke_data.py
    ├── entity/
    │   ├── __init__.py
    │   ├── artifact_entity.py
    │   ├── config_entity.py
    │   ├── estimator.py
    │   └── s3_estimator.py
    ├── exception/
    │   └── __init__.py
    ├── logger/
    │   └── __init__.py
    ├── pipeline/
    │   ├── __init__.py
    │   ├── train_pipeline.py
    │   └── prediction_pipline.py
    └── utils/
        ├── __init__.py
        └── main_utils.py

How to run?

Before running the project, make sure MongoDB is installed on your local system, along with Compass, since MongoDB is used for data storage. You also need an AWS account to access services such as S3, ECR, and EC2.

Data Collections

Project Architecture

Deployment Architecture

Step 1: Clone the repository

git clone <repository-url>

Step 2: Create a conda environment after opening the repository

conda create -p stroke python=3.9 -y

conda activate stroke/

Step 3: Install the requirements

pip install -r requirements.txt

Step 4: Export the environment variables

export AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>

export AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>

export AWS_DEFAULT_REGION=<AWS_DEFAULT_REGION>

export MONGODB_URL="mongodb+srv://<username>:<password>@ineuron-ai-projects.7eh1w4s.mongodb.net/?retryWrites=true&w=majority"
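A missing variable tends to surface only later as a connection error, so it can help to check all four up front. The following is a small sketch using only the standard library. The helper name `missing_env_vars` is hypothetical and not part of the project; the variable names are the ones exported above.

```python
import os
from typing import List, Mapping

# The four variables exported in Step 4.
REQUIRED_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "MONGODB_URL",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling `missing_env_vars()` before starting the server returns an empty list when everything is set; otherwise it lists the names that still need to be exported.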

Step 5: Run the application server

python app.py

Step 6: Train the model

http://localhost:8080/train

Step 7: Run predictions

http://localhost:8080/predict

Run locally

Check that the Dockerfile is present in the project directory
Build the Docker image

docker build --build-arg AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID> --build-arg AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY> --build-arg AWS_DEFAULT_REGION=<AWS_DEFAULT_REGION> --build-arg MONGODB_URL=<MONGODB_URL> . -t <tag>

Run the Docker image

docker run -d -p 8080:8080 <IMAGEID>

Models Used

Logistic Regression
KNeighbors Classifier
XGB Classifier
CatBoost Classifier
SVC
AdaBoost Classifier
RandomForest Classifier

From the models above, after hyperparameter optimization, the top two, KNeighbors Classifier and Random Forest Classifier, were selected and used in the pipeline.

GridSearchCV is used for Hyperparameter Optimization in the pipeline.
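To illustrate the idea, here is a minimal sketch of tuning the two selected model types with GridSearchCV and comparing them by accuracy. It makes several assumptions: the data is synthetic (generated with `make_classification`) rather than the stroke dataset, the parameter grids are small hypothetical examples, and the two models are tuned independently, since how the project combines them in its pipeline is not described here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; the real project trains on the stroke dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Hypothetical, deliberately small grids; the project's actual grids may differ.
candidates = {
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "random_forest": (
        RandomForestClassifier(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 3-fold cross-validated search over the grid, scored by accuracy.
    search = GridSearchCV(estimator, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        # score() evaluates the refitted best estimator on held-out data.
        "test_accuracy": search.score(X_test, y_test),
    }

for name, summary in results.items():
    print(name, summary)
```

On a real stroke dataset the classes are likely to be imbalanced, in which case a metric other than accuracy may be worth passing to `scoring`; that choice is left open here.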

heart_stroke is the main package folder, which contains all the code.

Conclusion

This project can be used in real life by health clinics to predict whether a patient is at risk of heart stroke.
It can be integrated into a hospital website to predict the chance of heart stroke for patients.
As heart disease and strokes are increasing rapidly across the world and causing deaths, it is necessary to develop an efficient system that predicts heart stroke beforehand so that immediate medical attention can be given. In the proposed system, the most effective algorithm for stroke prediction was selected after a comparative analysis of the accuracy scores of various models.

License

aravind-selvam/ml-pipeline-using-stroke-data
