This repository provides a detailed walkthrough of the 9 essential steps of an end-to-end machine learning project using the Framingham Heart Study dataset. The data comes from an ongoing cardiovascular study of individuals from Framingham, Massachusetts, in which participants are monitored for the risk of Coronary Heart Disease (CHD) based on 15 different variables. With this dataset we'll determine the variables most relevant to the outcome and predict the overall risk of being diagnosed with CHD.
Step 1: Defining the problem Understanding supervised vs. unsupervised machine learning problems and classification vs. regression. In this project the label is provided for us and we're looking to classify individuals into one of two categories, so this is a supervised binary classification problem
Step 2: Data Loading A brief explanation of common data analysis and visualization libraries (NumPy, Pandas, Matplotlib, Seaborn), loading data from a CSV file, and describing the variables/features
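A minimal sketch of the loading step, assuming the dataset is saved locally as `framingham.csv` (the filename is an assumption):

```python
import pandas as pd

# Load the Framingham dataset from a local CSV (filename is an assumption)
df = pd.read_csv("framingham.csv")

# Inspect dimensions, column types, and summary statistics
print(df.shape)
df.info()
print(df.describe())
```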
Step 3: Data Cleaning Identifying and handling null values with specific methodologies/techniques
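A hedged sketch of one common cleaning approach, continuing from the `df` loaded above (`glucose` is used as an illustrative column name):

```python
# Count null values per column to see where cleaning is needed
print(df.isnull().sum())

# Impute a numeric column with its median ('glucose' is an example column)
df["glucose"] = df["glucose"].fillna(df["glucose"].median())

# Drop any remaining rows that still contain nulls
df = df.dropna()
```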
Step 4: Exploratory Data Analysis (a short plotting sketch follows this list)
- Boxplots: Recognizing and removing outliers (extreme data points)
- Heatmaps: Showcasing the pairwise correlations between all variables, including the target variable, with tips on reading heatmaps
- Distplots: Showing the frequency distribution and skew of each variable
- Barplots: Plotting the relationship between a categorical variable and a numerical variable (showing the mean by default)
- Countplots: Displaying the count of observations in each categorical 'bin' with bars
- Regplots: Plotting a fitted regression trendline to show the relationship between a numerical feature and the (binary) target variable
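A short plotting sketch covering a few of these plot types, assuming the cleaned `df` from earlier and a binary target column named `TenYearCHD` (both assumptions; `totChol` is an illustrative column):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Boxplot to spot outliers in a numeric column ('totChol' is illustrative)
sns.boxplot(x=df["totChol"])
plt.show()

# Heatmap of pairwise correlations, annotated for easier reading
sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.show()

# Countplot of observations per category of the target
sns.countplot(x="TenYearCHD", data=df)
plt.show()
```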
Step 5: Feature Selection An overview of choosing specific features for the ML model based on the EDA, techniques for doing so, and a detailed explanation of the SelectKBest and mutual information classification methods.
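A minimal sketch of SelectKBest scored with mutual information, assuming `TenYearCHD` is the label column and `k=10` is an illustrative choice:

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Separate features from the label ('TenYearCHD' assumed to be the target)
X = df.drop(columns=["TenYearCHD"])
y = df["TenYearCHD"]

# Score every feature with mutual information and keep the 10 best
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_selected = selector.fit_transform(X, y)
print(X.columns[selector.get_support()])
```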
Step 6: Data pre-processing A summary of the pre-processing steps, including the train-test split, feature scaling, and class balancing via SMOTE
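A sketch of these three steps in order, continuing from the selected features above (the split ratio and random seed are assumptions):

```python
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split first so the test set stays untouched by scaling and resampling
X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Oversample the minority (CHD-positive) class in the training set only
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)
```

Scaling and resampling are fit on the training split only so that no test-set information leaks into the model.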
Step 7: Predictive Modeling A walkthrough of 4 common classification algorithms and the reasoning behind each choice (see the baseline-fitting sketch after this list)
- Logistic Regression: Understanding the influence of one or more independent variables on a single outcome variable
- Random Forest: Combines multiple decision trees into a single ensemble, aggregating their predictions into one model stronger than any individual tree
- K-Nearest Neighbors: Calculating the 'closeness' (distance) between data points and assigning each point the majority label of its nearest neighbors
- Support Vector Machine: Finding the separating hyperplane that divides the data points into classes with the largest margin
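A baseline-fitting sketch for the four models, using the pre-processed splits from Step 6 with mostly default hyperparameters (an assumption; the project's actual settings may differ):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Fit each classifier and report its test accuracy as a baseline
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Support Vector Machine": SVC(kernel="linear"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```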
Step 8: Hyperparameter Tuning A discussion of the importance of hyperparameter tuning, an overview of RandomizedSearchCV/GridSearchCV, and an explanation of how hyperparameters were adjusted for each model and how tuning affected overall model performance.
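An illustrative GridSearchCV run on the random forest (the parameter grid below is an assumption, not the project's exact grid):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Search a small, illustrative grid with 5-fold cross-validation
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```

RandomizedSearchCV works similarly but samples a fixed number of parameter combinations (`n_iter`) instead of trying them all, which is faster for large grids.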
Step 9: Model Evaluation An exploration of the different ways of evaluating a model and why accuracy/confusion matrices were chosen in this instance, with reasoning as to why the hyperparameter-tuned Random Forest was the model that represented the dataset and outcome best.
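A minimal evaluation sketch for the tuned random forest, assuming the `search` object from Step 8:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Evaluate the best estimator found by the grid search on the held-out test set
best_model = search.best_estimator_
y_pred = best_model.predict(X_test)

print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```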