# Predicting heart disease using machine learning
This notebook looks into using various Python-based machine learning and data science libraries in an attempt to build a machine learning model capable of predicting whether or not someone has heart disease based on their medical attributes.
    
We are going to take the following approach:

1. Problem definition
2. Data
3. Evaluation
4. Features
5. Modelling
6. Experimentation
    

## 1. Problem Definition 
> Given clinical parameters about a patient, can we predict whether or not they have heart disease?

## 2. Data
> Our data comes from the UCI Machine Learning Repository (Heart Disease Data Set): https://archive.ics.uci.edu/dataset/45/heart+disease
>
> The CSV version we use came from Kaggle: https://www.kaggle.com/datasets/cherngs/heart-disease-cleveland-uci
    
## 3. Evaluation
If we can reach 95% accuracy at predicting whether or not a patient has heart disease during the proof of concept, we'll pursue the project.
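Accuracy is the fraction of patients whose label is predicted correctly: accuracy = correct predictions / total predictions. A minimal sketch of checking a model against the 95% target, using made-up placeholder labels rather than real predictions; `meets_target` is a hypothetical helper name:

```python
from sklearn.metrics import accuracy_score

TARGET_ACCURACY = 0.95  # proof-of-concept threshold from the Evaluation section

def meets_target(y_true, y_pred, threshold=TARGET_ACCURACY):
    """Return (accuracy, passed), where accuracy = correct predictions / total predictions."""
    acc = accuracy_score(y_true, y_pred)
    return acc, acc >= threshold

# Placeholder labels (1 = heart disease, 0 = no heart disease); illustrative only
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]  # one mistake out of ten

acc, passed = meets_target(y_true, y_pred)
print(f"accuracy = {acc:.2f}, meets 95% target: {passed}")
```

With 9 of 10 correct, this placeholder "model" reaches 90% and would fall short of the target.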

## 4. Features
This is where we'll get information about each of the features in our data.

**Create a Data Dictionary**

- age: age in years
- sex: sex (1 = male; 0 = female)
- cp: chest pain type
  - Value 0: typical angina
  - Value 1: atypical angina
  - Value 2: non-anginal pain
  - Value 3: asymptomatic
- trestbps: resting blood pressure (in mm Hg on admission to the hospital)
- chol: serum cholesterol in mg/dl
  - serum cholesterol = LDL + HDL + 0.2 × triglycerides
  - above 200 is cause for concern
- fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
- restecg: resting electrocardiographic results
  - Value 0: normal
  - Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
  - Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
- thalach: maximum heart rate achieved
- exang: exercise-induced angina (1 = yes; 0 = no)
- oldpeak: ST depression induced by exercise relative to rest
- slope: the slope of the peak exercise ST segment
  - Value 0: upsloping
  - Value 1: flat
  - Value 2: downsloping
- ca: number of major vessels (0-3) colored by fluoroscopy
- thal: 0 = normal; 1 = fixed defect; 2 = reversible defect

And the label:

- condition: 0 = no disease, 1 = disease (in the CSV loaded below, the label column is named `target`)
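The data dictionary can also be kept in code, which makes the coded columns readable during EDA and turns the cholesterol relationship into a quick calculation. A minimal sketch, assuming the value codes listed above; the helper names `decode` and `total_cholesterol` and the toy rows are hypothetical, not real patients. Codes missing from a mapping become `NaN` after `.map()`, which also flags unexpected values.

```python
import pandas as pd

# Code-to-label lookups taken from the data dictionary above
SEX_LABELS = {1: "male", 0: "female"}
CP_LABELS = {0: "typical angina", 1: "atypical angina", 2: "non-anginal pain", 3: "asymptomatic"}
RESTECG_LABELS = {0: "normal", 1: "ST-T wave abnormality", 2: "left ventricular hypertrophy"}
SLOPE_LABELS = {0: "upsloping", 1: "flat", 2: "downsloping"}

def decode(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with coded categorical columns replaced by readable labels."""
    out = df.copy()
    out["sex"] = out["sex"].map(SEX_LABELS)
    out["cp"] = out["cp"].map(CP_LABELS)
    out["restecg"] = out["restecg"].map(RESTECG_LABELS)
    out["slope"] = out["slope"].map(SLOPE_LABELS)
    return out

def total_cholesterol(ldl: float, hdl: float, triglycerides: float) -> float:
    """Serum cholesterol (mg/dl) = LDL + HDL + 0.2 * triglycerides, as in the dictionary."""
    return ldl + hdl + 0.2 * triglycerides

# Toy rows (hypothetical values, not real patients)
toy = pd.DataFrame({"sex": [1, 0], "cp": [0, 3], "restecg": [1, 0], "slope": [2, 1]})
print(decode(toy))

chol = total_cholesterol(ldl=130, hdl=50, triglycerides=150)  # 130 + 50 + 30
print(chol, "above 200:", chol > 200)
```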






## Preparing the tools

We're going to use pandas, Matplotlib, and NumPy for data analysis and manipulation, seaborn for plotting, and Scikit-Learn for modelling and evaluation.

```python
# Import all the tools we need

# Regular Exploratory Data Analysis (EDA) and plotting libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# We want our plots to appear inside the Jupyter notebook
%matplotlib inline

# Models from Scikit-Learn. These were chosen using the Scikit-Learn estimator map;
# we're working on classification (heart disease: yes or no)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Model evaluation
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.metrics import RocCurveDisplay
```
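The three imported classifiers are commonly compared by fitting each on a training split and scoring it on a held-out split. A minimal sketch of that loop, assuming a synthetic dataset from `make_classification` in place of the real heart disease data (so the scores say nothing about this project); the helper name `fit_and_score` is hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the heart disease data: 303 rows, 13 features, binary label
X, y = make_classification(n_samples=303, n_features=13, random_state=42)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
}

def fit_and_score(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and return its accuracy on the test split."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # For classifiers, .score() returns mean accuracy on the given data
        scores[name] = model.score(X_test, y_test)
    return scores

model_scores = fit_and_score(models, X_train, X_test, y_train, y_test)
print(model_scores)
```

Keeping the models in a dictionary means adding another classifier to the comparison is a one-line change.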





## Load the data 

```python
df = pd.read_csv("heart-disease.csv")
df.shape  # Number of rows and columns in our DataFrame
```

(303, 14)
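Right after loading, it can help to confirm the columns match the data dictionary and to count missing values. A minimal sketch, assuming the 14 column names shown in the `describe()` output (with `target` as the label); `check_frame` and the two-row `toy` frame are hypothetical, and in the notebook you would call `check_frame(df)` instead:

```python
import pandas as pd

# The 14 columns in the loaded CSV (13 features plus the "target" label)
EXPECTED_COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]

def check_frame(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> dict:
    """Report missing and unexpected columns plus the number of missing values per column."""
    return {
        "missing_columns": [c for c in expected if c not in df.columns],
        "unexpected_columns": [c for c in df.columns if c not in expected],
        "null_counts": df.isna().sum().to_dict(),
    }

# Toy two-row frame (not real data) with one missing age value
toy = pd.DataFrame({"age": [63, None], "sex": [1, 0]})
report = check_frame(toy)
print(report["missing_columns"])
print(report["null_counts"])
```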

## Data Exploration (Exploratory Data Analysis, EDA)

There's no set way to do this; the goal is simply to become more and more familiar with our data.
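Two common first questions are how many examples of each class there are and how the label breaks down across another feature. A minimal sketch using `value_counts()` and `pd.crosstab()`, assuming a small hypothetical `toy` frame in place of `df`. Roughly balanced classes generally make plain accuracy a more meaningful metric.

```python
import pandas as pd

# Toy stand-in for df with only the label and one feature (not real data)
toy = pd.DataFrame({
    "sex": [1, 1, 0, 1, 0, 1, 0, 1],
    "target": [1, 0, 1, 1, 1, 0, 0, 1],
})

# How many rows fall into each class?
class_counts = toy["target"].value_counts()
print(class_counts)

# Rows are target values, columns are sex values, cells are row counts
by_sex = pd.crosstab(toy["target"], toy["sex"])
print(by_sex)
```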

```python
df.describe()
```


| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 |
| mean | 54.366337 | 0.683168 | 0.966997 | 131.623762 | 246.264026 | 0.148515 | 0.528053 | 149.646865 | 0.326733 | 1.039604 | 1.39934 | 0.729373 | 2.313531 | 0.544554 |
| std | 9.082101 | 0.466011 | 1.032052 | 17.538143 | 51.830751 | 0.356198 | 0.52586 | 22.905161 | 0.469794 | 1.161075 | 0.616226 | 1.022606 | 0.612277 | 0.498835 |
| min | 29.0 | 0.0 | 0.0 | 94.0 | 126.0 | 0.0 | 0.0 | 71.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 47.5 | 0.0 | 0.0 | 120.0 | 211.0 | 0.0 | 0.0 | 133.5 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 |
| 50% | 55.0 | 1.0 | 1.0 | 130.0 | 240.0 | 0.0 | 1.0 | 153.0 | 0.0 | 0.8 | 1.0 | 0.0 | 2.0 | 1.0 |
| 75% | 61.0 | 1.0 | 2.0 | 140.0 | 274.5 | 0.0 | 1.0 | 166.0 | 1.0 | 1.6 | 2.0 | 1.0 | 3.0 | 1.0 |
| max | 77.0 | 1.0 | 3.0 | 200.0 | 564.0 | 1.0 | 2.0 | 202.0 | 1.0 | 6.2 | 2.0 | 4.0 | 3.0 | 1.0 |
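For 0/1 columns such as `sex`, `fbs`, `exang`, and `target`, the `mean` row of `describe()` is the fraction of rows equal to 1, so multiplying by the 303 rows recovers the counts. A small sketch of that arithmetic, using only the means from the table above plus a toy `pd.Series` to confirm the mean-equals-proportion identity:

```python
import pandas as pd

N_ROWS = 303  # from df.shape

# Means of binary (0/1) columns, copied from the describe() output above
binary_means = {"sex": 0.683168, "fbs": 0.148515, "exang": 0.326733, "target": 0.544554}

# mean of a 0/1 column = (number of 1s) / N, so number of 1s = mean * N
counts_of_ones = {col: round(mean * N_ROWS) for col, mean in binary_means.items()}
print(counts_of_ones)  # {'sex': 207, 'fbs': 45, 'exang': 99, 'target': 165}

# Toy check that describe()'s mean equals the proportion of 1s
s = pd.Series([1, 0, 1, 1])
print(s.describe()["mean"], (s == 1).mean())  # 0.75 0.75
```

So 165 of the 303 rows have `target` = 1 and 138 have `target` = 0, which suggests the classes are reasonably balanced.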
