Our Web API looks something like this
- Dataset Imported from kaggle
- The Dataset contains 14 attributes
- Age : Age of the patient
- Sex : Sex of the patient
- cp : Chest pain type
- Value 1: typical angina
- Value 2: atypical angina
- Value 3: non-anginal pain
- Value 4: asymptomatic
- trestbps : resting blood pressure (in mm Hg)
- chol : cholesterol in mg/dl fetched via BMI sensor
- fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
- rest_ecg : resting electrocardiographic results
- Value 0: normal
- Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
- Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
- thalach : maximum heart rate achieved
- exang: exercise induced angina (1 = yes; 0 = no)
- oldpeak: ST depression induced by exercise relative to rest OR Patient's old peak history recorded
- slope: the slope of the peak exercise ST segment
- Value 1: upsloping
- Value 2: flat
- Value 3: downsloping
- ca: the number of major blood vessels (0-4) colored by fluoroscopy
- thal: Thalassemia
- target : 0 = less chance of heart attack; 1 = more chance of heart attack
- Dataset had 14 columns (13 features plus the target) and 303 samples
- Dataset had 5 numerical and 8 categorical features
- Slightly imbalanced dataset with 54.5% diseased patients
- No null value in dataset
- Many outliers present in our dataset
- 1 duplicate row was present which was removed
- After removing outliers, the dataset contained 283 samples
- Applied dummy encoding on dataset
- Applied StandardScaler for scaling (standardization)
- Also tried MinMaxScaler, but did not use it in the application
- Test size was 20% of dataset
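
The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the data is a synthetic stand-in for the Kaggle file, only a subset of columns is used, and the IQR rule for outliers, `drop_first=True`, and the stratified split are assumptions (the list above does not say which choices were made).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Kaggle data (hypothetical values, subset of columns).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(29, 78, n).astype(float),
    "trestbps": rng.normal(130, 17, n).round(),
    "chol": rng.normal(246, 50, n).round(),
    "thalach": rng.normal(150, 23, n).round(),
    "oldpeak": rng.exponential(1.0, n).round(1),
    "cp": rng.integers(1, 5, n),
    "slope": rng.integers(1, 4, n),
    "target": rng.integers(0, 2, n),
})
df = pd.concat([df, df.iloc[[0]]], ignore_index=True)  # inject one duplicate row

numeric_cols = ["age", "trestbps", "chol", "thalach", "oldpeak"]
categorical_cols = ["cp", "slope"]

# 1. Drop exact duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

# 2. Remove outliers with the 1.5 * IQR rule (assumed method).
q1 = df[numeric_cols].quantile(0.25)
q3 = df[numeric_cols].quantile(0.75)
iqr = q3 - q1
within = ((df[numeric_cols] >= q1 - 1.5 * iqr) &
          (df[numeric_cols] <= q3 + 1.5 * iqr)).all(axis=1)
df = df[within].reset_index(drop=True)

# 3. Dummy-encode the categorical columns.
df = pd.get_dummies(df, columns=categorical_cols, drop_first=True)

# 4. Hold out 20% of the rows for testing.
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 5. Standardize numeric columns; fit on the training split only.
scaler = StandardScaler()
X_train = X_train.copy()
X_test = X_test.copy()
X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])

print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training split only keeps test-set statistics out of the model, which is why the split comes before the scaling step in this sketch.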
Applied various machine learning algorithms:
- Logistic Regression
- Decision Tree Classifier
- Random Forest Classifier
- Naive Bayes Classifier
- Support Vector Machine
- K-Nearest Neighbours
- Gradient Boosting
- Extreme Gradient Boosting (XGBoost)
- Created Confusion matrix for all classifiers
- Plotted ROC curves
- Generated classification reports
- Applied RandomizedSearchCV and RepeatedStratifiedKFold for hyperparameter tuning
Logistic Regression worked best for our dataset, with an accuracy of 89.5%
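
As an illustration of the tuning and evaluation steps, the sketch below runs `RandomizedSearchCV` with `RepeatedStratifiedKFold` on a Logistic Regression pipeline and then prints a confusion matrix, classification report, and ROC AUC. The data comes from `make_classification` as a stand-in, and the search space, fold counts, and number of iterations are assumptions rather than the settings used in the project, so the scores will not match the 89.5% reported here.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import (
    RandomizedSearchCV,
    RepeatedStratifiedKFold,
    train_test_split,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the cleaned dataset (283 x 13).
X, y = make_classification(
    n_samples=283, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline so each CV fold is scaled independently.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Assumed settings: 5 folds repeated 3 times, 20 sampled values of C.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)
search = RandomizedSearchCV(
    pipeline,
    param_distributions={"logisticregression__C": loguniform(1e-3, 1e2)},
    n_iter=20,
    scoring="accuracy",
    cv=cv,
    random_state=1,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out 20%.
y_pred = search.predict(X_test)
y_prob = search.predict_proba(X_test)[:, 1]
cm = confusion_matrix(y_test, y_pred)

print("Best parameters:", search.best_params_)
print(cm)
print(classification_report(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, y_prob))
```

The same pattern applies to the other classifiers by swapping the final pipeline step and its parameter distributions.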
- App was created using Flask, a micro web framework in Python
- App was deployed using Heroku, a platform as a service (PaaS) that enables developers to build, run, and operate applications entirely in the cloud
- App link - https://heart--disease--predictions.herokuapp.com/
- https://www.geeksforgeeks.org/deploy-machine-learning-model-using-flask/ (Full Template)
- https://www.digitalocean.com/community/tutorials/how-to-use-and-validate-web-forms-with-flask-wtf
- https://www.digitalocean.com/community/tutorials/how-to-serve-flask-applications-with-gunicorn-and-nginx-on-ubuntu-20-04
- https://www.digitalocean.com/community/tutorials/how-to-create-your-first-web-application-using-flask-and-python-3
- https://www.digitalocean.com/community/tutorials/how-to-use-templates-in-a-flask-application
- https://www.digitalocean.com/community/tutorial_series/how-to-build-a-website-with-html
- https://www.digitalocean.com/community/tutorials/how-to-use-web-forms-in-a-flask-application
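
For readers who want a feel for how a Flask app like the one described above can serve predictions, here is a minimal sketch. It is not the deployed application: the `/predict` route, the JSON payload, the lowercase field names, and the placeholder model trained on random numbers are all assumptions made for illustration (the real app may use HTML forms and a saved model instead).

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Field names follow the attribute list in this README (lowercased by assumption).
FEATURES = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "rest_ecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]

# Placeholder model fitted on random numbers; a real app would load
# the trained Logistic Regression pipeline instead.
rng = np.random.default_rng(0)
X_fake = rng.normal(size=(200, len(FEATURES)))
y_fake = (X_fake[:, 0] + X_fake[:, 3] > 0).astype(int)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_fake, y_fake)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Return the predicted class and probability for one patient record."""
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify(error=f"missing fields: {missing}"), 400
    try:
        row = np.array([[float(payload[name]) for name in FEATURES]])
    except (TypeError, ValueError):
        return jsonify(error="all fields must be numeric"), 400
    probability = float(model.predict_proba(row)[0, 1])
    return jsonify(target=int(probability >= 0.5), probability=probability)
```

The route can be exercised without starting a server by using Flask's built-in test client, for example `app.test_client().post("/predict", json={name: 1.0 for name in FEATURES})`.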