# Predicting heart disease using machine learning

This notebook looks into using various Python-based machine learning and data science libraries in an attempt to build a machine learning model capable of predicting whether or not someone has heart disease based on their medical attributes.

We're going to take the following approach:
1. Problem definition
2. Data
3. Evaluation
4. Features
5. Modelling
6. Experimentation

## 1. Problem Definition

In a statement:
> Given clinical parameters about a patient, can we predict whether or not they have heart disease?

## 2. Data

The original data comes from the Cleveland database in the UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/heart+disease

A version is also available on Kaggle: https://www.kaggle.com/c/heart-disease-uci/data

## 3. Evaluation

> If we reach 95% accuracy at predicting whether or not a patient has heart disease, the project is viable.
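
Accuracy here is the fraction of patients whose predicted label matches the true label. The sketch below checks a set of predictions against the 95% threshold with Scikit-Learn's `accuracy_score`. The helper `meets_target` and the example labels are hypothetical, not results from this project.

```python
from sklearn.metrics import accuracy_score

TARGET_ACCURACY = 0.95  # viability threshold from the evaluation criterion


def meets_target(y_true, y_pred, threshold=TARGET_ACCURACY):
    """Return the accuracy of y_pred against y_true and whether it reaches threshold."""
    acc = float(accuracy_score(y_true, y_pred))
    return acc, acc >= threshold


# Hypothetical labels: 3 of 4 predictions are correct
acc, viable = meets_target([0, 1, 1, 0], [0, 1, 0, 0])
print(acc, viable)  # 0.75 False
```

Accuracy alone can look good on imbalanced data, which is why precision, recall and F1 are also worth reporting alongside it.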

## 4. Features

Features of the data are presented below:

**Create data dictionary**

1. age. The age of the patient.
2. sex. The sex of the patient (1 = male, 0 = female).
3. cp. Type of chest pain. 
    * 1 = typical angina
    * 2 = atypical angina
    * 3 = non-anginal pain
    * 4 = asymptomatic
4. trestbps. Resting blood pressure in mmHg.
5. chol. Serum cholesterol in mg/dl.
6. fbs. Fasting blood sugar.
    * 1 = fasting blood sugar is greater than 120 mg/dl
    * 0 = otherwise
7. restecg. Resting electrocardiographic results.
    * 0 = normal
    * 1 = ST-T wave abnormality
    * 2 = left ventricular hypertrophy
8. thalach. Max heart rate achieved.
9. exang. Exercise-induced angina.
    * 1 = yes 
    * 0 = no
10. oldpeak. ST depression induced by exercise relative to rest.
11. slope. Slope of the peak exercise ST segment.
    * 1 = upsloping 
    * 2 = flat
    * 3 = downsloping
12. ca. Number of major vessels (0–3) colored by fluoroscopy.
13. thal. Thalassemia.
    * 3 = normal
    * 6 = fixed defect
    * 7 = reversible defect
14. num. Diagnosis of heart disease (the prediction target).
    * 0 = absence
    * 1, 2, 3, 4 = presence
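
Because `num` takes values 0 to 4 while the problem statement only asks whether heart disease is present, one common simplification (assumed here, not taken from this notebook) is to collapse values 1 to 4 into a single positive class. The function `add_binary_target` and the output column name `target` are hypothetical.

```python
import pandas as pd


def add_binary_target(df: pd.DataFrame, source: str = "num", dest: str = "target") -> pd.DataFrame:
    """Return a copy of df with a 0/1 label: 0 = absence (num == 0), 1 = presence (num 1-4)."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out[dest] = (out[source] > 0).astype(int)
    return out
```

Working on a copy keeps the original multi-level diagnosis available if a finer-grained model is tried later.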

## Preparing the tools

We're going to use pandas, Matplotlib, NumPy and Seaborn for data analysis and visualization, and Scikit-Learn for modelling and evaluation.

```python
# Import exploratory data analysis and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Display plots inline in the notebook
%matplotlib inline

# Models from Scikit-Learn
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Model evaluation
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import precision_score, recall_score, f1_score
# plot_roc_curve was removed in scikit-learn 1.2; use RocCurveDisplay.from_estimator instead
from sklearn.metrics import RocCurveDisplay
```
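
With these tools in place, a first modelling pass might fit each of the three classifiers and compare their test-set accuracy. No dataset is loaded at this point, so the sketch below uses Scikit-Learn's `make_classification` to generate a synthetic stand-in with 13 features (the number of predictors in the data dictionary); its scores say nothing about heart disease. The helper `fit_and_score`, the sample size and the random seeds are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 300 rows x 13 features with binary labels (NOT the real data)
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
}


def fit_and_score(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and return its test-set accuracy."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # For classifiers, .score returns mean accuracy on the given data
        scores[name] = model.score(X_test, y_test)
    return scores


# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model_scores = fit_and_score(models, X_train, X_test, y_train, y_test)
print(model_scores)
```

A single split can be noisy on a dataset this small; `cross_val_score(model, X, y, cv=5)` gives an estimate that depends less on one particular train/test partition.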