# Machine Learning Model - Heart Disease Prediction 

This notebook uses Python machine learning and data science libraries to build a model that predicts whether a patient has heart disease from their medical data.

Outline of Project:
1. Problem Definition
2. Data
3. Evaluation
4. Data Features
5. Modeling
6. Experimentation


## 1. Problem Definition

The objective of this project is to build a binary classification model that predicts, from the patient medical data provided, whether a patient has heart disease.


## 2. Data

The original source of the data used to produce this model can be found at the UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/Heart%20Disease


## 3. Evaluation

Evaluation metric: accuracy, with a target of 95%. Can the model be built and tuned to correctly predict whether a patient has heart disease 95% of the time?
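
As a minimal sketch of how this target could be checked, the snippet below computes accuracy with scikit-learn's `accuracy_score` on made-up labels. The labels, the `TARGET_ACCURACY` constant, and the `meets_target` helper are illustrative assumptions, not results from this project.

```python
from sklearn.metrics import accuracy_score

TARGET_ACCURACY = 0.95  # goal stated in this section


def meets_target(y_true, y_pred, target=TARGET_ACCURACY):
    """Return (accuracy, whether the accuracy reaches the target)."""
    accuracy = accuracy_score(y_true, y_pred)
    return accuracy, bool(accuracy >= target)


# Made-up labels for illustration only (1 = heart disease, 0 = no heart disease)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

accuracy, reached = meets_target(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, reaches target: {reached}")
```

Accuracy here is simply the fraction of predictions that match the true labels, so one wrong prediction out of ten already falls short of the 95% goal.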



## 4. Data Features


### Data Dictionary: 14 attributes total 


| # | Attribute                 | Field Name | Type    | Notes                                            |
|:--| :--                       | :--------- | :-----: | :------------                                    |
|01.| Age                       | age        | int64   | age in years                                     |
|02.| Sex                       | sex        | int64   | gender (1=male; 0=female)                        |
|03.| Chest Pain Type           | cp         | int64   | chest pain type (see below)                      |
|04.| Resting Blood Pressure    | trestbps   | int64   | resting blood pressure mm Hg                     |
|05.| Serum Cholesterol         | chol       | int64   | LDL + HDL + .2 * triglycerides (mg/dl)           |
|06.| Fasting Blood Sugar       | fbs        | int64   | > 120 mg/dl (1=True; 0=False)                    |
|07.| Resting ECG Results       | restecg    | int64   | resting electrocardiographic results (see below) |
|08.| Max Heart Rate            | thalach    | int64   | maximum heart rate                               |
|09.| Exercise-Induced Angina   | exang      | int64   | exercise induced angina (1=yes; 0=no)            |
|10.| ST Depression             | oldpeak    | float64 | ST depression by exercise relative to rest       |
|11.| ST Peak Slope             | slope      | int64   | slope of peak exercise ST segment (see below)    |
|12.| Fluoroscopy-Colored Vessels| ca        | int64   | number of major vessels 0-3                      |
|13.| Thallium Results          | thal       | int64   | Result of thallium stress test                   |
|14.| Heart Disease Diagnosis   | target     | int64   | Heart Disease (1=True; 0=False)                  |
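
The data dictionary above can double as a schema check once the data is loaded. The following is a rough sketch, assuming the data sits in a pandas DataFrame with the field names listed above; `EXPECTED_DTYPES` and `find_schema_problems` are hypothetical names, and the sample rows are made up rather than taken from the dataset.

```python
import pandas as pd

# Expected dtypes, copied from the data dictionary above
EXPECTED_DTYPES = {
    "age": "int64", "sex": "int64", "cp": "int64", "trestbps": "int64",
    "chol": "int64", "fbs": "int64", "restecg": "int64", "thalach": "int64",
    "exang": "int64", "oldpeak": "float64", "slope": "int64", "ca": "int64",
    "thal": "int64", "target": "int64",
}


def find_schema_problems(df, expected=EXPECTED_DTYPES):
    """Return a list of mismatches between df and the data dictionary."""
    problems = []
    for column, dtype in expected.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif str(df[column].dtype) != dtype:
            problems.append(f"{column}: expected {dtype}, found {df[column].dtype}")
    return problems


# Two made-up rows for illustration only; these are not real patient records
sample = pd.DataFrame({
    "age": [50, 60], "sex": [1, 0], "cp": [0, 2], "trestbps": [120, 140],
    "chol": [200, 250], "fbs": [0, 1], "restecg": [0, 1], "thalach": [150, 130],
    "exang": [0, 1], "oldpeak": [1.0, 2.5], "slope": [1, 2], "ca": [0, 1],
    "thal": [3, 7], "target": [0, 1],
}).astype(EXPECTED_DTYPES)  # cast explicitly so platform defaults do not matter

print(find_schema_problems(sample))  # no problems expected for the clean sample

# A deliberately broken copy: one column dropped, one stored with the wrong dtype
broken = sample.drop(columns=["thal"]).astype({"age": "float64"})
print(find_schema_problems(broken))
```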

#### Legend

##### 03. Chest Pain Type
| Value | Description      | 
| :---: | :--              |
| 0     | asymptomatic     | 
| 1     | typical angina   | 
| 2     | atypical angina  | 
| 3     | non-anginal pain | 


##### 07. Resting ECG Results
| Value | Description              |
| :--:  | :--                      |  
| 0     | normal                   | 
| 1     | ST-T wave abnormality    | 
| 2     | ventricular hypertrophy  | 

##### 11. ST Peak Slope 
| Value | Description                        |
| :--:  | :--                                |  
| 0     | upward (uncommon)                  | 
| 1     | flat (typical of healthy heart)    | 
| 2     | downward (sign of unhealthy heart) |     

##### 13. Thallium Results
| Value | Description                        |
| :--:  | :--                                |  
| 1,3   | normal                             | 
| 6     | fixed defect                       | 
| 7     | reversible defect                  | 
    
**Note**: The dataset contains no personally identifiable information (PII).
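
The legend tables above can be applied to the coded columns to make exploratory output easier to read. This is a minimal sketch, assuming the data is in a pandas DataFrame; the `LEGEND` dictionary and `add_labels` helper are hypothetical names, the mappings are copied from the tables above, and the sample rows are made up.

```python
import pandas as pd

# Code-to-label mappings copied from the legend tables above
LEGEND = {
    "cp": {0: "asymptomatic", 1: "typical angina",
           2: "atypical angina", 3: "non-anginal pain"},
    "restecg": {0: "normal", 1: "ST-T wave abnormality",
                2: "ventricular hypertrophy"},
    "slope": {0: "upward", 1: "flat", 2: "downward"},
    "thal": {1: "normal", 3: "normal", 6: "fixed defect",
             7: "reversible defect"},
}


def add_labels(df, legend=LEGEND):
    """Return a copy of df with a '<column>_label' column per coded field."""
    out = df.copy()
    for column, labels in legend.items():
        if column in out.columns:
            # Codes missing from the legend become NaN instead of raising
            out[f"{column}_label"] = out[column].map(labels)
    return out


# Made-up rows for illustration only
sample = pd.DataFrame({"cp": [0, 3], "restecg": [1, 0],
                       "slope": [2, 1], "thal": [3, 7]})
decoded = add_labels(sample)
print(decoded)
```

Keeping the original integer columns and adding separate label columns leaves the numeric features untouched for modeling.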

In [11]:
%%html
<!-- Formatting tables above -->
<style>
    table {
        display: inline-block
    }
</style>

## Tools used in Model:

Data analysis and modeling use Python with pandas, NumPy, scikit-learn, Matplotlib, and seaborn.

In [12]:
# Data 
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

# sklearn classification models
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# sklearn model selection and evaluation metrics
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import precision_score, recall_score, f1_score
# plot_roc_curve was removed in scikit-learn 1.2; RocCurveDisplay replaces it
from sklearn.metrics import RocCurveDisplay
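
As a rough sketch of how the imports above might fit together, the snippet below compares the three classifiers with 5-fold cross-validated accuracy. It is not this project's actual workflow: the data comes from scikit-learn's `make_classification` as a synthetic stand-in with 13 features, and the `models` dictionary, `max_iter=1000`, and `random_state=42` are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (NOT the heart disease dataset):
# 13 features and a binary target, mirroring the shape of the data dictionary
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

# The three classifiers imported above, with illustrative settings
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Mean accuracy over 5 cross-validation folds for each model
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in mean_accuracy.items():
    print(f"{name}: {score:.3f}")
```

Scores on synthetic data say nothing about performance on the heart disease data; the point is only the shape of the comparison.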