## Predicting Heart Disease Using Machine Learning

This notebook uses various Python-based machine learning and data science libraries to try to build a model that predicts whether or not someone has heart disease based on their medical attributes.

We're going to take the following approach:
1. Problem definition
2. Data
3. Evaluation
4. Features
5. Modeling
6. Experimentation
___

### 1. Problem Definition
In a statement,
> Given clinical parameters about a patient, can we predict whether or not they have heart disease?

### 2. Data
The original data came from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Heart+Disease).

There is also a version of it available on [Kaggle](https://www.kaggle.com/ronitf/heart-disease-uci).

### 3. Evaluation
> If we can reach 95% accuracy at predicting whether or not a patient has heart disease during the proof of concept, we'll pursue the project.
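
Accuracy here means the fraction of patients whose label the model predicts correctly. The go/no-go check can be sketched in plain Python as below. The function names `accuracy` and `meets_goal` and the example labels are hypothetical, made up for illustration. In practice a library metric such as scikit-learn's `accuracy_score` computes the same fraction.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one prediction")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Go/no-go threshold from the evaluation statement (95% accuracy).
TARGET_ACCURACY = 0.95


def meets_goal(y_true, y_pred, threshold=TARGET_ACCURACY):
    """True if the proof-of-concept accuracy reaches the threshold."""
    return accuracy(y_true, y_pred) >= threshold


# Made-up labels for illustration (1 = heart disease, 0 = no heart disease).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))    # 0.9
print(meets_goal(y_true, y_pred))  # False
```

One wrong prediction out of ten gives 90% accuracy, which falls short of the 95% bar.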

### 4. Features
This is where you'll find information about each of the features in the data. You can gather it by doing your own research or by talking to a subject matter expert.

**Create a data dictionary**
1. age - age in years
2. sex - sex (1 = male; 0 = female)
3. cp - chest pain type
    * 0: Typical angina - chest pain related to decreased blood supply to the heart
    * 1: Atypical angina - chest pain not related to the heart
    * 2: Non-anginal pain - typically esophageal spasms (not heart related)
    * 3: Asymptomatic - chest pain not showing signs of disease
4. trestbps - resting blood pressure (in mm Hg on admission to the hospital)
5. chol - serum cholesterol in mg/dL
    * serum = LDL + HDL + VLDL
    * above 200 mg/dL is cause for concern
6. fbs - fasting blood sugar > 120 mg/dL (1 = yes; 0 = no)
    * > 126 mg/dL signals diabetes
7. restecg - resting electrocardiographic results
    * 0: normal
    * 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    * 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
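8. thalach - maximum heart rate achieved
9. exang - exercise induced angina (1 = yes; 0 = no)
10. oldpeak - ST depression induced by exercise relative to rest
11. slope - the slope of the peak exercise ST segment
12. ca - number of major vessels (0-3) colored by fluoroscopy
13. thal - thallium stress test result
14. target - whether or not the patient has heart disease (the predicted attribute; called `num` in the original UCI data)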
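
The categorical codes are easier to work with once the dictionary is written down as lookups. Below is a minimal sketch, assuming each record is a plain dict keyed by the column names in `heart-disease.csv`. The helper `describe_patient` and the constant names are hypothetical. The thresholds are the rule-of-thumb values quoted in the dictionary and are not a diagnostic rule.

```python
# Code-to-label lookups transcribed from the data dictionary.
CP_LABELS = {
    0: "Typical angina",
    1: "Atypical angina",
    2: "Non-anginal pain",
    3: "Asymptomatic",
}

RESTECG_LABELS = {
    0: "Normal",
    1: "ST-T wave abnormality",
    2: "Probable or definite left ventricular hypertrophy",
}

SEX_LABELS = {1: "male", 0: "female"}

# Rule-of-thumb thresholds from the dictionary (mg/dL).
CHOL_CONCERN = 200
FBS_CUTOFF = 120


def describe_patient(row):
    """Turn one record (a dict of raw column values) into readable notes."""
    notes = [
        f"{row['age']}-year-old {SEX_LABELS[row['sex']]}",
        f"chest pain: {CP_LABELS[row['cp']]}",
        f"resting ECG: {RESTECG_LABELS[row['restecg']]}",
    ]
    if row["chol"] > CHOL_CONCERN:
        notes.append(f"cholesterol {row['chol']} mg/dL is above {CHOL_CONCERN}")
    if row["fbs"] == 1:  # fbs is already a 0/1 flag for > 120 mg/dL
        notes.append(f"fasting blood sugar > {FBS_CUTOFF} mg/dL")
    return notes


# First row of the df.head() output shown below.
first = {"age": 63, "sex": 1, "cp": 3, "chol": 233, "fbs": 1, "restecg": 0}
for note in describe_patient(first):
    print(note)
```

With pandas, the same lookups can decode a whole column at once, e.g. `df["cp"].map(CP_LABELS)`.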

```python
import pandas as pd
```

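```python
# Load the dataset (assumes heart-disease.csv is in the working directory)
df = pd.read_csv('heart-disease.csv')
df.head()  # show the first five rows
```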

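|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |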
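
Before modeling, it's worth confirming that the loaded frame has the columns the data dictionary describes and separating the label from the features. Below is a minimal sketch, assuming the label column is `target` as in the output above. The five displayed rows (without the index) are inlined with `io.StringIO` so the example runs without the CSV file. `split_features_target` and `EXPECTED_COLUMNS` are hypothetical names.

```python
import io

import pandas as pd

# The five rows displayed by df.head(), inlined so the sketch is self-contained.
HEAD_CSV = """age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
57,0,0,120,354,0,1,163,1,0.6,2,0,2,1
"""

# Column names from the data dictionary; `target` plays the role of UCI's `num`.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]


def split_features_target(df, target="target"):
    """Check that every expected column is present, then split into X and y."""
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    X = df.drop(columns=target)  # the 13 input features
    y = df[target]               # the label to predict
    return X, y


df = pd.read_csv(io.StringIO(HEAD_CSV))
X, y = split_features_target(df)
print(X.shape)     # (5, 13)
print(y.tolist())  # [1, 1, 1, 1, 1]
```

With the real data, pass the `df` read from `heart-disease.csv` instead of the inlined sample.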
