# Heart Disease Prediction

References:

    [1] https://www.kaggle.com/ronitf/heart-disease-uci
    [2] "Statistics and Machine Learning in Python"; Edouard Duchesnay, Tommy Löfstedt, Feki Younes; Oct 2020.   

In [1]:
import pandas as pd
from pandas import read_csv
import numpy as np
import sklearn.linear_model as lm
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report

In [2]:
filename = 'heart.csv'
df = read_csv(filename)

In [3]:
df

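|     | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|-----|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0   | 63  | 1   | 3  | 145      | 233  | 1   | 0       | 150     | 0     | 2.3     | 0     | 0  | 1    | 1      |
| 1   | 37  | 1   | 2  | 130      | 250  | 0   | 1       | 187     | 0     | 3.5     | 0     | 0  | 2    | 1      |
| 2   | 41  | 0   | 1  | 130      | 204  | 0   | 0       | 172     | 0     | 1.4     | 2     | 0  | 2    | 1      |
| 3   | 56  | 1   | 1  | 120      | 236  | 0   | 1       | 178     | 0     | 0.8     | 2     | 0  | 2    | 1      |
| 4   | 57  | 0   | 0  | 120      | 354  | 0   | 1       | 163     | 1     | 0.6     | 2     | 0  | 2    | 1      |
| ... | ... | ... | ... | ...     | ...  | ... | ...     | ...     | ...   | ...     | ...   | ... | ...  | ...    |
| 298 | 57  | 0   | 0  | 140      | 241  | 0   | 1       | 123     | 1     | 0.2     | 1     | 0  | 3    | 0      |
| 299 | 45  | 1   | 3  | 110      | 264  | 0   | 1       | 132     | 0     | 1.2     | 1     | 0  | 3    | 0      |
| 300 | 68  | 1   | 0  | 144      | 193  | 1   | 1       | 141     | 0     | 3.4     | 1     | 2  | 3    | 0      |
| 301 | 57  | 1   | 0  | 130      | 131  | 0   | 1       | 115     | 1     | 1.2     | 1     | 1  | 3    | 0      |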


In [4]:
heart_data = df.values

In [5]:
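# Columns 0-12 hold the 13 features; column 13 is the binary target.
X = heart_data[:, 0:13]
Y = heart_data[:, 13]
# Rescale every feature to [0, 1]. Note that the scaler is fitted on all rows,
# including those later held out for testing.
scaler = MinMaxScaler(feature_range=(0, 1))
rescaled_X = scaler.fit_transform(X)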

In [6]:
my_test_size = 0.2  # hold out 20% of the rows for testing
seed = 2  # fixed seed so the split is reproducible
X_train, X_test, Y_train, Y_test = train_test_split(
    rescaled_X, Y, test_size=my_test_size, random_state=seed
)
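
The cells above fit `MinMaxScaler` on every row before splitting, so the held-out rows influence the scaling range. A common alternative is to split first and let a `Pipeline` fit the scaler on the training rows only. The sketch below illustrates that pattern on synthetic data. The array `X_demo`, the labels `y_demo`, and the `max_iter=1000` setting are made up for illustration and are not taken from the heart dataset, so the sketch does not reproduce the numbers reported in this notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (hypothetical, not heart.csv): 200 rows, 13 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 13))
noise = rng.normal(size=200)
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] + noise > 0).astype(int)

# Split first, so the test rows stay unseen during preprocessing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2
)

# The pipeline fits the scaler on X_tr only, then reuses it to transform X_te.
model = make_pipeline(
    MinMaxScaler(feature_range=(0, 1)),
    LogisticRegression(C=1e8, solver="lbfgs", max_iter=1000),
)
model.fit(X_tr, y_tr)

# make_pipeline names each step after its lowercased class name.
fitted_scaler = model.named_steps["minmaxscaler"]
test_accuracy = model.score(X_te, y_te)
print("test accuracy: %.2f" % test_accuracy)
```

With this layout, `model.predict(X_te)` applies the training-set scaling automatically, so no separate `rescaled_X` array is needed.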

In [7]:
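# C is the inverse regularization strength, so C=1e8 makes the default
# L2 penalty negligible (effectively unregularized logistic regression).
logreg = lm.LogisticRegression(C=1e8, solver='lbfgs')
lda = LDA()
qda = QDA()
# AdaBoost with logistic regression as the base estimator, 7 boosting rounds.
abc = AdaBoostClassifier(logreg, n_estimators=7, random_state=0)

list_of_classifiers = [logreg, lda, qda, abc]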

In [8]:
for clf in list_of_classifiers:
    print('____________________________________________________________________________')
    print(type(clf).__name__)
    clf.fit(X_train, Y_train)
    Y_pred = clf.predict(X_test)
    errors = Y_pred != Y_test
    print("Nb errors=%i, error rate=%.2f" % (errors.sum(), errors.sum() / len(Y_pred)))
    # Only the linear models (logistic regression, LDA) expose coef_.
    if hasattr(clf, 'coef_'):
        print('\n \n Coefficients:\n')
        print(clf.coef_)
    else:
        print('')
    print('\n \n Classification Report: \n')
    print(classification_report(Y_test, Y_pred))


    

____________________________________________________________________________
LogisticRegression
Nb errors=8, error rate=0.13

 
 Coefficients:

[[-0.01849841 -1.98057067  2.6957475  -1.89677374 -2.2621078  -0.11645788
   1.01481764  2.61340662 -0.92646819 -2.56775292  1.48503083 -2.82889993
  -2.34471   ]]
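
To read the coefficient vector above, it helps to pair each value with its column name. The sketch below copies the 13 feature names from the DataFrame header and the printed logistic regression coefficients, then ranks them by absolute value. Because the features were min-max scaled to [0, 1], the magnitudes are roughly comparable across columns, but this ranking is only an informal reading and not a significance test.

```python
# Feature names in column order, copied from the DataFrame header.
feature_names = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                 "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

# Logistic regression coefficients, copied from the printed output above.
logreg_coef = [-0.01849841, -1.98057067, 2.6957475, -1.89677374, -2.2621078,
               -0.11645788, 1.01481764, 2.61340662, -0.92646819, -2.56775292,
               1.48503083, -2.82889993, -2.34471]

# Rank features by coefficient magnitude, largest first.
ranked = sorted(zip(feature_names, logreg_coef),
                key=lambda pair: abs(pair[1]), reverse=True)

for name, value in ranked:
    # A positive coefficient raises the log-odds of target = 1.
    direction = "raises" if value > 0 else "lowers"
    print(f"{name:>8s} {value:+.3f}  ({direction} the log-odds of target = 1)")
```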

 
 Classification Report: 

              precision    recall  f1-score   support

         0.0       0.96      0.78      0.86        32
         1.0       0.80      0.97      0.88        29

    accuracy                           0.87        61
   macro avg       0.88      0.87      0.87        61
weighted avg       0.88      0.87      0.87        61
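
The figures in this report can be checked by hand. The confusion counts below are inferred from the printed supports and recalls for logistic regression (25 of 32 class-0 rows and 28 of 29 class-1 rows classified correctly), which agrees with the 8 errors reported above. They are a reconstruction, not output copied from the notebook.

```python
# Confusion counts inferred from the report (an assumption, see the text above).
tn, fp = 25, 7   # true class 0: predicted 0, predicted 1
fn, tp = 1, 28   # true class 1: predicted 0, predicted 1

total = tn + fp + fn + tp
error_rate = (fp + fn) / total
accuracy = (tn + tp) / total

# Per-class precision and recall, treating each class in turn as "positive".
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)

# F1 is the harmonic mean of precision and recall.
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print("errors=%i, error rate=%.2f, accuracy=%.2f" % (fp + fn, error_rate, accuracy))
print("class 0: precision=%.2f recall=%.2f f1=%.2f" % (precision_0, recall_0, f1_0))
print("class 1: precision=%.2f recall=%.2f f1=%.2f" % (precision_1, recall_1, f1_1))
```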

____________________________________________________________________________
LinearDiscriminantAnalysis
Nb errors=8, error rate=0.13

 
 Coefficients:

[[-0.02739152 -1.77374361  2.84080828 -1.33790884 -1.8385485  -0.01777089
   0.90881403  2.85673321 -1.07158584 -2.3403564   1.66017512 -3.15938157
  -2.53736927]]

 
 Classification R