# Machine Learning Model - Heart Disease Prediction 

This notebook uses Python machine learning and data science libraries to build a model that predicts whether a patient has heart disease, based on the patient's medical data.

Outline of Project:
1. Problem Definition
2. Data
3. Evaluation
4. Features
5. Modeling
6. Experimentation


## 1. Problem Definition

The objective of this project is to build a binary classification machine learning model that predicts, from the patient medical data provided, whether a patient has heart disease.


## 2. Data

The original source of the data used to produce this model can be found at the UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/Heart%20Disease


## 3. Evaluation

Evaluation metric: accuracy, with a target of 95%. Can the model be implemented and improved until it correctly predicts whether a patient has heart disease 95% of the time?
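
As a minimal sketch of how this target could be checked, the snippet below computes accuracy as the fraction of correct predictions and compares it against the 95% goal. The label lists and the names `accuracy` and `TARGET_ACCURACY` are hypothetical and used for illustration only; they are not results from this notebook. scikit-learn's `accuracy_score` computes the same quantity.

```python
TARGET_ACCURACY = 0.95  # assumed threshold, taken from the 95% goal above


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty inputs")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only (1 = heart disease, 0 = no heart disease)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

score = accuracy(y_true, y_pred)
meets_target = score >= TARGET_ACCURACY
print(f"accuracy = {score:.2f}, meets 95% target: {meets_target}")
```

In this toy example nine of the ten predictions are correct, so the score falls short of the target.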



In [9]:
%%html
<style>
    table {
        display: inline-block
    }
</style>

## 4. Features


### Data Dictionary: 14 attributes total 


| # | Attribute                 | Field Name | Type    | Notes                                            |
|:--| :--                       | :--------- | :-----: | :------------                                    |
|01.| Age                       | age        | int64   | age in years                                     |
|02.| Sex                       | sex        | int64   | gender (1=male; 0=female)                        |
|03.| Chest Pain Type           | cp         | int64   | chest pain type (see below)                      |
|04.| Resting Blood Pressure    | trestbps   | int64   | resting blood pressure mm Hg                     |
|05.| Serum Cholesterol         | chol       | int64   | cholesterol in mg/dl                             |
|06.| Fasting Blood Sugar       | fbs        | int64   | > 120 mg/dl (1=True; 0=False)                    |
|07.| Resting ECG Results       | restecg    | int64   | resting electrocardiographic results (see below) |
|08.| Max Heart Rate            | thalach    | int64   | maximum heart rate                               |
|09.| Exercise-Induced Angina   | exang      | int64   | exercise induced angina (1=yes; 0=no)            |
|10.| ST Depression             | oldpeak    | float64 | ST depression by exercise relative to rest       |
|11.| ST Peak Slope             | slope      | int64   | slope of peak exercise ST segment (see below)    |
|12.|Fluoroscopy-Colored Vessels| ca         | int64   | number of major vessels 0-3                      |
|13.| Thallium Results          | thal       | int64   | result of thallium stress test                   |
|14.| Heart Disease Diagnosis   | target     | int64   | heart disease (1=True; 0=False)                  |

#### Legend

##### 03. Chest Pain Type
| Value | Description      | 
| :---: | :--              |
| 0     | asymptomatic     | 
| 1     | typical angina   | 
| 2     | atypical angina  | 
| 3     | non-anginal pain | 


##### 07. Resting ECG Results
| Value | Description              |
| :--:  | :--                      |  
| 0     | normal                   | 
| 1     | ST-T wave abnormality    | 
| 2     | ventricular hypertrophy  | 

##### 11. ST Peak Slope 
| Value | Description                        |
| :--:  | :--                                |  
| 0     | upward (uncommon)                  | 
| 1     | flat (typical of healthy heart)    | 
| 2     | downward (sign of unhealthy heart) |     

##### 13. Thallium Results
| Value | Description                        |
| :--:  | :--                                |  
| 1,3   | normal                             | 
| 6     | fixed defect                       | 
| 7     | reversible defect                  | 
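
The coded values in the legend can be turned into readable labels with plain dictionaries. The following is a small sketch, assuming pandas is available. The dictionary names (`CP_LABELS`, `RESTECG_LABELS`, `SLOPE_LABELS`) and the sample rows are illustrative, and the mappings simply restate the tables above.

```python
import pandas as pd

# Code-to-label mappings copied from the legend tables above
CP_LABELS = {0: "asymptomatic", 1: "typical angina", 2: "atypical angina", 3: "non-anginal pain"}
RESTECG_LABELS = {0: "normal", 1: "ST-T wave abnormality", 2: "ventricular hypertrophy"}
SLOPE_LABELS = {0: "upward", 1: "flat", 2: "downward"}

# Tiny illustrative sample (not the full dataset)
sample = pd.DataFrame({"cp": [3, 2, 1, 0], "restecg": [0, 1, 0, 1], "slope": [0, 0, 2, 2]})

# Series.map looks each code up in the dictionary; codes missing from it become NaN
decoded = sample.assign(
    cp_label=sample["cp"].map(CP_LABELS),
    restecg_label=sample["restecg"].map(RESTECG_LABELS),
    slope_label=sample["slope"].map(SLOPE_LABELS),
)
print(decoded)
```

Keeping the original integer columns alongside the label columns leaves the numeric codes available for modeling while making exploratory output easier to read.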
    
  

In [5]:
import pandas as pd

# Load the heart disease dataset from the local data directory
data1 = pd.read_csv('./data/heart-disease.csv')

In [6]:
data1

,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
298,57,0,0,140,241,0,1,123,1,0.2,1,0,3,0
299,45,1,3,110,264,0,1,132,0,1.2,1,0,3,0
300,68,1,0,144,193,1,1,141,0,3.4,1,2,3,0
301,57,1,0,130,131,0,1,115,1,1.2,1,1,3,0


In [7]:
data1.dtypes

age           int64
sex           int64
cp            int64
trestbps      int64
chol          int64
fbs           int64
restecg       int64
thalach       int64
exang         int64
oldpeak     float64
slope         int64
ca            int64
thal          int64
target        int64
dtype: object
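
Since every column is stored as a number, it can help to confirm that the coded columns contain only the values listed in the data dictionary. Below is a rough sketch of such a check, assuming pandas is available. The helper name `unexpected_values` and the small inline frame are hypothetical stand-ins for `data1`, and the allowed codes are taken from the tables in section 4.

```python
import pandas as pd

# Allowed codes per column, taken from the data dictionary in section 4
ALLOWED = {
    "sex": {0, 1},
    "cp": {0, 1, 2, 3},
    "fbs": {0, 1},
    "restecg": {0, 1, 2},
    "exang": {0, 1},
    "slope": {0, 1, 2},
    "target": {0, 1},
}


def unexpected_values(df, allowed=ALLOWED):
    """Map each checked column to the set of values not listed in `allowed`."""
    problems = {}
    for column, valid in allowed.items():
        extra = set(df[column].unique().tolist()) - valid
        if extra:
            problems[column] = extra
    return problems


# Illustrative rows only, with one deliberately invalid `cp` value (5)
frame = pd.DataFrame({
    "sex": [1, 1, 0], "cp": [3, 2, 5], "fbs": [1, 0, 0], "restecg": [0, 1, 0],
    "exang": [0, 0, 0], "slope": [0, 0, 2], "target": [1, 1, 1],
})
print(unexpected_values(frame))
```

Calling `unexpected_values(data1)` on the loaded dataset would apply the same check to the real data; an empty dictionary means every checked column matches the documented codes.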