STROKE-PREDICTION-PROJECT

DATA SCIENCE PROJECT: PERFORMING EDA, HYPOTHESIS TESTING, AND VISUALIZATION, THEN CONSTRUCTING A MODEL TO PREDICT STROKE OCCURRENCE FROM DATA.

This project was assigned by a professor in one of our university courses, Introduction to Data Science. We are sophomores majoring in AI Engineering. The code contains EDA, extensive visualization, and an SVM model that predicts stroke occurrence. The data come from two Kaggle sources; links to both are listed at the end.

To view the interactive plots, open the notebook in nbviewer: https://nbviewer.org/github/PURPLEWATER00/STROKE-PREDICTION-PROJECT/blob/main/final%20final.ipynb

The model is deployed on Streamlit: https://purplewater00-stroke-prediction-project-main-vbxln1.streamlit.app/

EDA
SVM model
Stroke prediction
Synthetically generated data
Synthetically generated data and real life data

PROJECT STRUCTURE

The following map shows the flow of the project:

PROJECT USAGE:

Before running the code, specify the train and test sets as follows:

- train set: built from two sources, one synthetically generated and one real-world data set, both from Kaggle (links are given at the end). The train1_df and train2_df variables are where the two file names go; either file can be assigned to either variable.
- test set: the test set from the synthetically generated data competition.

The rest of the code should run successfully once the train and test sets are specified as described above.
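As a rough illustration of the setup described above, the sketch below loads two training CSV files with pandas and stacks them into a single train set. It is not the notebook's exact code: the helper name build_train_set and the file names in the usage comment are placeholders, and keeping only the columns shared by both files is an assumption about how the two sources are aligned.

```python
import pandas as pd


def build_train_set(train1_path, train2_path):
    """Load both training CSV files and stack them into one DataFrame.

    Hypothetical helper: the name and the column-matching step are
    assumptions, not taken from the project notebook.
    """
    train1_df = pd.read_csv(train1_path)  # one of the two Kaggle train files
    train2_df = pd.read_csv(train2_path)  # the other Kaggle train file

    # Keep only the columns present in both files so the rows line up.
    shared_columns = [c for c in train1_df.columns if c in train2_df.columns]

    # Stack the rows and renumber the index from 0.
    return pd.concat(
        [train1_df[shared_columns], train2_df[shared_columns]],
        ignore_index=True,
    )


# Example call (placeholder file names, replace with your local paths):
# train_df = build_train_set("synthetic_train.csv", "real_world_train.csv")
```

Because the rows are simply concatenated, the order of the two files affects only the row order, not the contents of the combined train set.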

Found a bug?

We would love some feedback in the comments. Please be as ruthless as possible; we want to learn from anyone willing to point out any issue. (Professor, if you are reading this, please give us feedback :) )

links:

Kaggle competition (1st train set + test set): https://www.kaggle.com/competitions/playground-series-s3e2/data
Kaggle real world data set (2nd train set): https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset

