# Predicting heart disease using machine learning


<ol>
    <li>Problem definition</li>
    <li>Data</li>
    <li>Preparing tools</li>
    <li>Evaluation</li>
    <li>Features</li>
    <li>Modeling</li>
</ol>


# Data

**Data dictionary**

1. **age** in years
2. **sex:**
    * 1 = male
    * 0 = female
3. **cp** chest pain type:
    * 1: typical angina
    * 2: atypical angina
    * 3: non-anginal pain
    * 4: asymptomatic
4. **trestbps** resting blood pressure (in mm Hg on admission to the hospital)
5. **chol** serum cholesterol in mg/dl
6. **fbs** (fasting blood sugar > 120 mg/dl):
    * 1 = true
    * 0 = false
7. **restecg** resting electrocardiographic results:
    * 0: normal
    * 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    * 2: showing probable or definite left ventricular hypertrophy by Estes' criteria 
8. **thalach** maximum heart rate achieved
9. **exang** exercise induced angina:
    * 1 = yes
    * 0 = no
10. **oldpeak** ST depression induced by exercise relative to rest
11. **slope** the slope of the peak exercise ST segment:
    * 1: upsloping
    * 2: flat
    * 3: downsloping 
12. **ca** number of major vessels (0-3) colored by fluoroscopy
13. **thal** thallium stress test result:
    * 3 = normal 
    * 6 = fixed defect
    * 7 = reversible defect
14. **target** presence of heart disease (the value to predict):
    * 1 = true
    * 0 = false
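
The coded columns in the dictionary above can be turned into a lookup table so that raw records are easier to read and to sanity-check. The following is a minimal sketch, assuming the data file uses exactly the codes listed here (some distributions of this dataset may re-index them, so check the file first). The names `CATEGORY_LABELS` and `decode_record`, and the example record, are hypothetical and only for illustration.

```python
# Labels for the coded categorical columns, copied from the data dictionary.
# Assumption: the data file uses exactly these codes.
CATEGORY_LABELS = {
    "sex": {1: "male", 0: "female"},
    "cp": {
        1: "typical angina",
        2: "atypical angina",
        3: "non-anginal pain",
        4: "asymptomatic",
    },
    "fbs": {1: "true", 0: "false"},
    "restecg": {
        0: "normal",
        1: "ST-T wave abnormality",
        2: "left ventricular hypertrophy",
    },
    "exang": {1: "yes", 0: "no"},
    "slope": {1: "upsloping", 2: "flat", 3: "downsloping"},
    "thal": {3: "normal", 6: "fixed defect", 7: "reversible defect"},
    "target": {1: "true", 0: "false"},
}


def decode_record(record):
    """Return a copy of `record` with coded values replaced by their labels.

    Raises ValueError when a coded column holds a value that the data
    dictionary does not list, which helps catch a mismatched file early.
    """
    decoded = dict(record)
    for column, labels in CATEGORY_LABELS.items():
        if column not in decoded:
            continue
        value = decoded[column]
        if value not in labels:
            raise ValueError(f"unexpected code {value!r} for column {column!r}")
        decoded[column] = labels[value]
    return decoded


# Hypothetical patient record, made up for illustration only.
example = {
    "age": 54, "sex": 1, "cp": 3, "trestbps": 130, "chol": 240,
    "fbs": 0, "restecg": 0, "thalach": 150, "exang": 0,
    "oldpeak": 1.2, "slope": 2, "ca": 0, "thal": 3, "target": 0,
}
decoded = decode_record(example)
print(decoded["cp"], "|", decoded["slope"], "|", decoded["thal"])
```

Numeric columns such as `age` or `chol` pass through unchanged. With pandas, the same nested mapping can be passed to `DataFrame.replace` to decode whole columns at once.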

# Preparing tools

In [None]:
from __future__ import print_function, division
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# we want our plots to appear inside the notebook
%matplotlib inline

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# model evaluation
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import accuracy_score, recall_score, f1_score, r2_score
from sklearn.metrics import roc_auc_score, roc_curve
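# plot_roc_curve and plot_confusion_matrix were removed in scikit-learn 1.2;
# RocCurveDisplay.from_estimator and ConfusionMatrixDisplay.from_estimator replace them
from sklearn.metrics import RocCurveDisplay, ConfusionMatrixDisplay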
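
To show how several of these imports fit together, here is a minimal sketch of a split, fit, and evaluate loop. It runs on synthetic data from `make_classification`, used here as a stand-in with 13 feature columns and a binary target. It is not the heart disease data, and the model settings are illustrative assumptions rather than tuned choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in: 13 feature columns and a binary target (assumption,
# chosen only to mirror the shape of the data dictionary).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)

# Hold out 20% of the rows for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit one of the imported classifiers on the training split.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out rows.
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1
test_accuracy = accuracy_score(y_test, y_pred)
test_auc = roc_auc_score(y_test, y_prob)
cm = confusion_matrix(y_test, y_pred)

# 5-fold cross-validated accuracy over the full synthetic set.
cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print("test accuracy:", round(test_accuracy, 3))
print("test ROC AUC:", round(test_auc, 3))
print("confusion matrix:")
print(cm)
print("mean CV accuracy:", round(cv_scores.mean(), 3))
```

A single train/test split gives one noisy estimate; the cross-validated score averages over five splits and is usually the steadier number to compare models on.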