# Heart Disease Prediction using Logistic Regression


## Problem statement
Heart disease is often fatal, and detecting it early can save lives. The problem is to predict whether a patient has heart disease from general data points such as age, sex, chest pain type, resting blood pressure, and similar measurements.

## Possible solution
One possible solution is a free screening test that governments or hospitals offer on their websites. Such a test could greatly benefit disease discovery at an early stage, and it could use a machine learning model to predict the result.

In [45]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import recall_score, precision_score
from sklearn.linear_model import LogisticRegression
import joblib

In [46]:
dataset = pd.read_csv("heart.csv")

In [47]:
dataset.describe()

| | Age | RestingBP | Cholesterol | FastingBS | MaxHR | Oldpeak | HeartDisease |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 918.0 | 918.0 | 918.0 | 918.0 | 918.0 | 918.0 | 918.0 |
| mean | 53.510893 | 132.396514 | 198.799564 | 0.233115 | 136.809368 | 0.887364 | 0.553377 |
| std | 9.432617 | 18.514154 | 109.384145 | 0.423046 | 25.460334 | 1.06657 | 0.497414 |
| min | 28.0 | 0.0 | 0.0 | 0.0 | 60.0 | -2.6 | 0.0 |
| 25% | 47.0 | 120.0 | 173.25 | 0.0 | 120.0 | 0.0 | 0.0 |
| 50% | 54.0 | 130.0 | 223.0 | 0.0 | 138.0 | 0.6 | 1.0 |
| 75% | 60.0 | 140.0 | 267.0 | 0.0 | 156.0 | 1.5 | 1.0 |
| max | 77.0 | 200.0 | 603.0 | 1.0 | 202.0 | 6.2 | 1.0 |


In [48]:
dataset.dtypes

Age                 int64
Sex                object
ChestPainType      object
RestingBP           int64
Cholesterol         int64
FastingBS           int64
RestingECG         object
MaxHR               int64
ExerciseAngina     object
Oldpeak           float64
ST_Slope           object
HeartDisease        int64
dtype: object

In [49]:
dataset.isnull().values.any()

False
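`isnull()` reports no missing values, but the `describe()` output above shows a minimum of 0 for both `RestingBP` and `Cholesterol`, which is not a plausible measurement and may be a placeholder for a missing reading. The sketch below shows one way to count such zeros. The `sample` frame and the helper name `count_zero_placeholders` are illustrative assumptions, not part of the notebook; with the real data you would pass `dataset` instead.

```python
import pandas as pd

# Toy rows with made-up values standing in for heart.csv (assumption for illustration).
sample = pd.DataFrame({
    "RestingBP": [140, 0, 130, 120],
    "Cholesterol": [289, 0, 0, 214],
})

def count_zero_placeholders(df, columns):
    """Return, per column, how many entries are exactly zero."""
    return (df[columns] == 0).sum()

zero_counts = count_zero_placeholders(sample, ["RestingBP", "Cholesterol"])
print(zero_counts)
```

If the counts are large on the real dataset, it may be worth deciding explicitly how to treat those rows before scaling, since `MinMaxScaler` will map the zeros to the bottom of the range.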

In [50]:
dataset.head()

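| | Age | Sex | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | HeartDisease |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 40 | M | ATA | 140 | 289 | 0 | Normal | 172 | N | 0.0 | Up | 0 |
| 1 | 49 | F | NAP | 160 | 180 | 0 | Normal | 156 | N | 1.0 | Flat | 1 |
| 2 | 37 | M | ATA | 130 | 283 | 0 | ST | 98 | N | 0.0 | Up | 0 |
| 3 | 48 | F | ASY | 138 | 214 | 0 | Normal | 108 | Y | 1.5 | Flat | 1 |
| 4 | 54 | M | NAP | 150 | 195 | 0 | Normal | 122 | N | 0.0 | Up | 0 |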


In [51]:
y = dataset.HeartDisease
X = dataset.drop(columns=["HeartDisease"])

In [52]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [53]:
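# One-hot encode the categorical columns on each split.
X_train = pd.get_dummies(X_train,columns=["ST_Slope","ExerciseAngina","RestingECG","ChestPainType","Sex"])
X_test = pd.get_dummies(X_test,columns=["ST_Slope","ExerciseAngina","RestingECG","ChestPainType","Sex"])
# Align the test columns with the training columns in case a category is absent from the test split.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)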
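Encoding the two splits separately only yields matching columns when every category appears in both splits. The sketch below illustrates the failure mode and one way to guard against it with `DataFrame.reindex`. The two tiny frames are made-up examples, not rows from `heart.csv`.

```python
import pandas as pd

# Hypothetical splits: the test split happens to contain no "NAP" or "TA" rows.
train = pd.DataFrame({"ChestPainType": ["ATA", "NAP", "ASY", "TA"]})
test = pd.DataFrame({"ChestPainType": ["ATA", "ASY"]})

train_enc = pd.get_dummies(train, columns=["ChestPainType"])
test_enc = pd.get_dummies(test, columns=["ChestPainType"])

# test_enc has only two dummy columns, so a model fitted on train_enc would reject it.
print(list(test_enc.columns))

# Reindex against the training columns; categories missing from the test split become 0.
aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)
print(list(aligned.columns))
```

The same alignment step applies to any single patient record submitted later, which will contain only one category per column.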

In [54]:
X_test.head()

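| | Age | RestingBP | Cholesterol | FastingBS | MaxHR | Oldpeak | ST_Slope_Down | ST_Slope_Flat | ST_Slope_Up | ExerciseAngina_N | ExerciseAngina_Y | RestingECG_LVH | RestingECG_Normal | RestingECG_ST | ChestPainType_ASY | ChestPainType_ATA | ChestPainType_NAP | ChestPainType_TA | Sex_F | Sex_M |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 668 | 63 | 140 | 195 | 0 | 179 | 0.0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 30 | 53 | 145 | 518 | 0 | 130 | 0.0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 377 | 65 | 160 | 0 | 1 | 122 | 1.2 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
| 535 | 56 | 130 | 0 | 0 | 122 | 1.0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 807 | 54 | 108 | 309 | 0 | 156 | 0.0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |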


In [55]:
scaler = MinMaxScaler().fit(X_train[["Age","RestingBP","Cholesterol","MaxHR","Oldpeak"]].values)
joblib.dump(scaler,"scaler.gz")

['scaler.gz']
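Saving the fitted scaler lets a later process, such as the screening website described earlier, apply exactly the same scaling to a new patient's values. The sketch below round-trips a scaler through `joblib` and transforms one new row. The two-column training array, the temporary path, and the `new_patient` values are assumptions chosen for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training values for two features (Age, RestingBP), for illustration only.
train_values = np.array([[28.0, 100.0], [77.0, 200.0], [54.0, 130.0]])
scaler = MinMaxScaler().fit(train_values)

# Persist the fitted scaler to a temporary location, then load it back.
path = os.path.join(tempfile.mkdtemp(), "scaler.gz")
joblib.dump(scaler, path)
restored = joblib.load(path)

# Scale a new record with the minimum and maximum learned from the training values.
new_patient = np.array([[52.5, 150.0]])
scaled = restored.transform(new_patient)
print(scaled)
```

Note that values outside the training range will scale to below 0 or above 1, because the scaler reuses the training minimum and maximum rather than refitting.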

In [56]:
# Scale the numeric training columns in place with the fitted scaler.
X_train[["Age","RestingBP","Cholesterol","MaxHR","Oldpeak"]] = scaler.transform(
    X_train[["Age","RestingBP","Cholesterol","MaxHR","Oldpeak"]].values)

In [57]:
# Apply the same training-fitted scaler to the numeric test columns.
X_test[["Age","RestingBP","Cholesterol","MaxHR","Oldpeak"]] = scaler.transform(
    X_test[["Age","RestingBP","Cholesterol","MaxHR","Oldpeak"]].values)

In [58]:
X_train.head()

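| | Age | RestingBP | Cholesterol | FastingBS | MaxHR | Oldpeak | ST_Slope_Down | ST_Slope_Flat | ST_Slope_Up | ExerciseAngina_N | ExerciseAngina_Y | RestingECG_LVH | RestingECG_Normal | RestingECG_ST | ChestPainType_ASY | ChestPainType_ATA | ChestPainType_NAP | ChestPainType_TA | Sex_F | Sex_M |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 712 | 0.604167 | 0.5 | 0.38806 | 0 | 0.712 | 0.306818 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 477 | 0.666667 | 0.55 | 0.0 | 1 | 0.328 | 0.522727 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 409 | 0.5 | 0.8 | 0.0 | 1 | 0.44 | 0.295455 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 448 | 0.708333 | 0.8 | 0.381426 | 1 | 0.304 | 0.409091 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 838 | 0.708333 | 0.65 | 0.547264 | 1 | 0.52 | 0.5 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |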


In [59]:
model = LogisticRegression()

In [60]:
model.fit(X_train,y_train)

LogisticRegression()

In [61]:
y_pred = model.predict(X_test)

In [63]:
print("accuracy: " + str(model.score(X_test,y_test)*100.) + "%")
print("precision: " + str(precision_score(y_test,y_pred)))
print("recall: " + str(recall_score(y_test,y_pred)))

accuracy: 87.68115942028986%
precision: 0.9166666666666666
recall: 0.8719512195121951
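To make the three scores easier to interpret, the sketch below recomputes them from confusion-matrix counts. The notebook does not print these counts. They are inferred from the reported metrics under the assumption that the 30% split of 918 rows yields 276 test rows, so treat them as a consistency check rather than notebook output.

```python
# Counts inferred from the reported scores, assuming 276 test rows (not printed by the notebook).
tp = 143  # patients with heart disease predicted positive
fp = 13   # patients without heart disease predicted positive
fn = 21   # patients with heart disease predicted negative
tn = 99   # patients without heart disease predicted negative

total = tp + fp + fn + tn

precision = tp / (tp + fp)     # share of positive predictions that are correct
recall = tp / (tp + fn)        # share of actual positives that are found
accuracy = (tp + tn) / total   # share of all predictions that are correct

print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
```

Under these inferred counts, the 21 false negatives are the cases a screening test would miss, which is why recall deserves at least as much attention as accuracy in this setting.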
