# **Heart Disease prediction using Machine Learning**

We're going to use various Python-based ML & Data Science libraries to build a machine learning model capable of predicting whether a patient has heart disease, based on the patient's medical data.

We'll use **supervised machine learning**, because the data we have is labelled.

We'll use the following steps to define our workflow:

1. **Problem definition :** What the problem is, what type of ML problem it is, and what output we need.
2. **Data :** Look into the data we have: its source, properties, etc.
3. **Target (Success) :** Define what success means for us, i.e. what level of results we want from our model.
4. **Features :** Look into the features **(predicting variables)** and decide which ones are necessary.
5. **Modelling :** This step includes a number of things:
    * Get the data ready for the model: explore and preprocess it.
    * Find the best base model for our data & problem; this is generally an iterative process.
    * Evaluate the model on various metrics & with different techniques.
    * Quantify and report the model's performance.
6. **Experimentation :** The overall process is highly experimental; we have to try different things & determine what works best for us.

In the end, **ML is all about experimentation.**
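A minimal, library-free sketch of one experimentation round, assuming the data has already been split into training rows and held-out rows: every candidate model is fitted, scored by accuracy on the held-out rows, and the best one is kept. The two candidates (`fit_majority`, a majority-class baseline, and `fit_nearest_neighbour`, a 1-nearest-neighbour classifier) and all function names are hypothetical stand-ins for real library models, and the toy rows are made up, not taken from the heart-disease data.

```python
from collections import Counter
from math import dist


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_majority(X_train, y_train):
    """Baseline: always predict the most common class seen in training."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda x: majority


def fit_nearest_neighbour(X_train, y_train):
    """1-nearest-neighbour: copy the label of the closest training row (Euclidean distance)."""
    def predict(x):
        i = min(range(len(X_train)), key=lambda j: dist(x, X_train[j]))
        return y_train[i]
    return predict


def run_experiments(candidates, X_train, y_train, X_test, y_test):
    """Fit each candidate, score it on held-out rows, return (best name, all scores)."""
    scores = {}
    for name, fit in candidates.items():
        predict = fit(X_train, y_train)
        preds = [predict(x) for x in X_test]
        scores[name] = accuracy(y_test, preds)
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy, made-up rows with two numeric features each; NOT the heart-disease data.
    X_train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    y_train = [0, 0, 1, 1, 1]
    X_test = [(0.9, 1.1), (5.1, 5.0)]
    y_test = [0, 1]
    best, scores = run_experiments(
        {"majority": fit_majority, "1-NN": fit_nearest_neighbour},
        X_train, y_train, X_test, y_test,
    )
    print(best, scores)  # 1-NN {'majority': 0.5, '1-NN': 1.0}
```

Keeping a trivial baseline in the candidate set is a cheap sanity check: any model worth keeping should clearly beat it.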

<img src="./Images/Steps.png" width=1000>

## **Let's tackle the steps:**


### **Problem :**

In a statement,
> Given clinical attributes of a patient, can we predict whether the patient has any heart disease or not?

This is an **ML binary classification** problem, i.e. we have to determine whether a patient belongs to the diseased group or not.
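Many classifiers output a score that can be read as the probability of the positive class (here, heart disease); turning it into the yes/no answer this problem asks for takes a threshold. A small sketch, where `to_labels` is a hypothetical helper and the 0.5 default is an assumption, not a value chosen in this project:

```python
def to_labels(probabilities, threshold=0.5):
    """Turn P(heart disease) scores into labels: 1 = disease, 0 = no disease."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [1 if p >= threshold else 0 for p in probabilities]


scores = [0.10, 0.50, 0.92]  # illustrative scores, not real model output
print(to_labels(scores))       # [0, 1, 1]
print(to_labels(scores, 0.6))  # [0, 0, 1]
```

Lowering the threshold flags more patients as diseased, trading fewer missed cases (false negatives) for more false alarms (false positives).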

### **Data :**

> The original data comes from the Cleveland database in the UCI Machine Learning [Repository](https://archive.ics.uci.edu/dataset/45/heart+disease).

A version of the dataset is also available on Kaggle, with more details [here](https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset/data).

### **Target | Success**

We're building a model to predict whether a patient has heart disease or not.

**Medical** predictions are **crucial**, so we will target a prediction accuracy of **95% or higher** before our model is used in production.
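Accuracy is the fraction of patients whose predicted label matches the true label. A sketch of checking a model against the 95% target, assuming hard 0/1 predictions on held-out data; `accuracy`, `meets_target` and `majority_baseline` are hypothetical helpers and the labels are made up:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of patients whose predicted label matches the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def meets_target(y_true, y_pred, target=0.95):
    """True if accuracy on held-out data reaches the success target (95% by default)."""
    return accuracy(y_true, y_pred) >= target


def majority_baseline(y_true):
    """Accuracy of always predicting the most common class: the floor a real model must beat."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


y_true = [1, 1, 1, 0]  # made-up labels, not real patients
y_pred = [1, 1, 0, 0]
print(accuracy(y_true, y_pred))      # 0.75
print(meets_target(y_true, y_pred))  # False
print(majority_baseline(y_true))     # 0.75
```

Comparing against the majority-class baseline matters: if one class dominates the data, a model that always predicts that class can reach a high accuracy without learning anything, so the 95% target is only meaningful relative to that floor.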

### **Features Understanding**

Here, the dataset is **already curated** and has all **14 important attributes** present: 13 predicting features plus the target.

**The Data Dictionary**: what the columns mean.


A data dictionary describes the data you're dealing with. Not all datasets come with them so this is where you may have to do your research or ask a subject matter expert (someone who knows about the data) for more.

The following are the features we'll use to predict our target variable (heart disease or no heart disease).


1. age - age in years 
2. sex - (1 = male; 0 = female) 
3. cp - chest pain type 
    * 0: Typical angina: chest pain related to decreased blood supply to the heart
    * 1: Atypical angina: chest pain not related to heart
    * 2: Non-anginal pain: typically esophageal spasms (not heart related)
    * 3: Asymptomatic: chest pain not showing signs of disease
4. trestbps - resting blood pressure (in mm Hg on admission to the hospital)
    * anything above 130-140 is typically cause for concern
5. chol - serum cholesterol in mg/dl
    * serum = LDL + HDL + .2 * triglycerides
    * above 200 is cause for concern
6. fbs - (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false) 
    * a value above 126 mg/dL signals diabetes
7. restecg - resting electrocardiographic results
    * 0: Nothing to note
    * 1: ST-T Wave abnormality
        - can range from mild symptoms to severe problems
        - signals non-normal heart beat
    * 2: Possible or definite left ventricular hypertrophy
        - Enlarged heart's main pumping chamber
8. thalach - maximum heart rate achieved 
9. exang - exercise induced angina (1 = yes; 0 = no) 
10. oldpeak - ST depression induced by exercise relative to rest 
    * looks at stress of heart during exercise
    * unhealthy heart will stress more
11. slope - the slope of the peak exercise ST segment
    * 0: Upsloping: better heart rate with exercise (uncommon)
    * 1: Flat sloping: minimal change (typical healthy heart)
    * 2: Downsloping: signs of unhealthy heart
12. ca - number of major vessels (0-3) colored by fluoroscopy
    * colored vessel means the doctor can see the blood passing through
    * the more blood movement the better (no clots)
13. thal - thallium stress result
    * 1,3: normal
    * 6: fixed defect: used to be a defect but OK now
    * 7: reversible defect: no proper blood movement when exercising
14. target - has disease or not (1 = yes; 0 = no) (the predicted attribute)
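
The data dictionary can also be written down in code, so the loaded columns can be checked against it and the coded categorical columns encoded. In this sketch, `FEATURES`, `CATEGORY_LABELS`, `check_columns` and `one_hot` are hypothetical names; it assumes the CSV header uses exactly the column names listed above and that codes have already been parsed as integers. One-hot encoding is one common choice (an assumption here, not necessarily what this project does) for columns such as `cp`, whose codes label categories rather than measure a quantity.

```python
# Column names as listed in the data dictionary (assumed to match the CSV header exactly).
FEATURES = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]
TARGET = "target"

# Code -> meaning for some coded categorical columns, taken from the list above.
CATEGORY_LABELS = {
    "cp": {0: "typical angina", 1: "atypical angina", 2: "non-anginal pain", 3: "asymptomatic"},
    "restecg": {0: "nothing to note", 1: "ST-T wave abnormality", 2: "left ventricular hypertrophy"},
    "slope": {0: "upsloping", 1: "flat sloping", 2: "downsloping"},
}


def check_columns(header):
    """Return (missing, unexpected) column names compared with the data dictionary."""
    expected = set(FEATURES) | {TARGET}
    present = set(header)
    return sorted(expected - present), sorted(present - expected)


def one_hot(row, column):
    """Replace one coded column with 0/1 indicator columns, e.g. cp -> cp_0 ... cp_3."""
    codes = CATEGORY_LABELS[column]
    value = row[column]
    if value not in codes:  # expects int codes; raw CSV strings like "2" must be converted first
        raise ValueError(f"unexpected {column} code: {value!r}")
    encoded = {k: v for k, v in row.items() if k != column}
    for code in codes:
        encoded[f"{column}_{code}"] = int(value == code)
    return encoded


row = {"age": 63, "cp": 2}  # partial, made-up row
print(one_hot(row, "cp"))   # {'age': 63, 'cp_0': 0, 'cp_1': 0, 'cp_2': 1, 'cp_3': 0}
missing, unexpected = check_columns(["age", "sex", "target", "id"])
print(unexpected)           # ['id']
```

`thal` is left out of `CATEGORY_LABELS` here; its codes (listed above as 1, 3, 6, 7) should be checked against the values actually present in the loaded data before it is added.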
