## Heart Attack Analysis

The "Heart Disease Database" was compiled in July 1988 from data collected at the following four locations:

 1. Cleveland Clinic Foundation (cleveland.data)
 2. Hungarian Institute of Cardiology, Budapest (hungarian.data)
 3. V.A. Medical Center, Long Beach, CA (long-beach-va.data)
 4. University Hospital, Zurich, Switzerland (switzerland.data)
 
While the databases have 76 raw attributes, only the 14 listed below are actually used.

Attribute information (only 14 attributes are used; `#n` is the attribute's position among the 76 raw attributes):

 1. #3 (age)
 2. #4 (sex)
 3. #9 (cp): chest pain type
    - Value 1: typical angina
    - Value 2: atypical angina
    - Value 3: non-anginal pain
    - Value 4: asymptomatic
 4. #10 (trestbps): resting blood pressure (in mm Hg on admission to the hospital)
 5. #12 (chol): serum cholesterol in mg/dl
 6. #16 (fbs): fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
 7. #19 (restecg): resting electrocardiographic results
    - Value 0: normal
    - Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    - Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
 8. #32 (thalach): maximum heart rate achieved
 9. #38 (exang): exercise induced angina
 10. #40 (oldpeak): ST depression induced by exercise relative to rest
 11. #41 (slope): the slope of the peak exercise ST segment
    - Value 1: upsloping
    - Value 2: flat
    - Value 3: downsloping
 12. #44 (ca): number of major vessels (0-3) colored by fluoroscopy
 13. #51 (thal): 3 = normal; 6 = fixed defect; 7 = reversible defect
 14. #58 (num), the predicted attribute: diagnosis of heart disease (angiographic disease status)
    - Value 0: < 50% diameter narrowing
    - Value 1: > 50% diameter narrowing (in any major vessel)
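
As a small illustration of the coding scheme above, the sketch below attaches the 14 attribute names to a frame and decodes the `cp` codes into their labels. The column order is assumed to follow the list above and may not match every CSV export of the data; `COLUMNS`, `CP_LABELS`, and the two rows are hypothetical and exist only for this demonstration.

```python
import pandas as pd

# Names of the 14 used attributes, in the order listed above.
# Assumption: a given CSV export stores them in this same order.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

# Chest pain labels copied from the attribute description.
CP_LABELS = {
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
    4: "asymptomatic",
}

# Two invented rows (not taken from the real data set).
sample = pd.DataFrame(
    [[50, 0, 2, 120, 200, 0, 0, 160, 0, 0.0, 1, 0, 3, 0],
     [60, 1, 4, 140, 250, 1, 2, 120, 1, 2.0, 2, 2, 7, 1]],
    columns=COLUMNS,
)

# Map the integer codes to readable labels in a new column.
sample["cp_label"] = sample["cp"].map(CP_LABELS)
print(sample[["age", "cp", "cp_label", "num"]])
```

Decoding into a separate column keeps the numeric `cp` codes available for modelling.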

In [None]:
# import necessary libraries

import numpy as np
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

dataset = pd.read_csv("E:/abhi/Desktop/Intel-HPC/ml_algos/Databases/heart-attack-prediction.csv")


In [None]:
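# Data preprocessing: coerce non-numeric values to NaN, impute NaN with column means, and standardize

# convert_objects was removed from pandas; pd.to_numeric with errors="coerce" turns unparseable strings into NaN
dataset = dataset.apply(pd.to_numeric, errors="coerce")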

x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 13].values


from sklearn.impute import SimpleImputer
# Imputer was removed from sklearn.preprocessing; SimpleImputer replaces NaN with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputer = imputer.fit(x[:, 0:13])
x[:, 0:13] = imputer.transform(x[:, 0:13])
   

from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
x = sc_X.fit_transform(x)
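
To make the three preprocessing steps concrete (coerce to numeric, impute the mean, standardize), here is a minimal sketch on a tiny invented frame. It assumes missing entries are stored as the string `"?"`, which is an assumption about the CSV rather than something this notebook states; the frame `raw` and its values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Invented values; "?" stands in for a missing entry (assumption).
raw = pd.DataFrame({
    "chol": ["200", "?", "260"],
    "thalach": ["150", "170", "?"],
})

# Step 1: strings that cannot be parsed as numbers become NaN.
numeric = raw.apply(pd.to_numeric, errors="coerce")

# Step 2: each NaN is replaced by the mean of its column.
imputed = SimpleImputer(strategy="mean").fit_transform(numeric)

# Step 3: each column is rescaled to zero mean and unit variance.
scaled = StandardScaler().fit_transform(imputed)

print(numeric)
print(imputed)
print(scaled.round(3))
```

In this example the missing `chol` value becomes 230 (the mean of 200 and 260) and the missing `thalach` value becomes 160.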

In [None]:
# Train/Test Split
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x , y , test_size=0.2)
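
Because the imputer and scaler above are fitted on all rows before the split, statistics from the test rows influence the training features. One possible alternative, sketched here on synthetic data rather than the heart-disease CSV, is to split first and fit the preprocessing on the training part only. The names `X_demo`, `y_demo`, and `preprocess` are hypothetical, and `stratify` and `random_state` are optional choices, not settings used in this notebook.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 13 feature columns, with ~10% missing values.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 13))
X_demo[rng.random(X_demo.shape) < 0.1] = np.nan
y_demo = rng.integers(0, 2, size=100)

# Split first; stratify keeps the class ratio similar in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

# Imputation and scaling are learned from the training rows only...
preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
X_tr_p = preprocess.fit_transform(X_tr)

# ...and then applied unchanged to the held-out rows.
X_te_p = preprocess.transform(X_te)

print(X_tr_p.shape, X_te_p.shape)
print(np.isnan(X_tr_p).any(), np.isnan(X_te_p).any())
```

With this ordering the test set plays no part in computing the column means or standard deviations.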


## Modelling

In [None]:
# Logistic Regression Modelling and calculate score

from sklearn.linear_model import LogisticRegression
lm = LogisticRegression(solver="liblinear",random_state = 0)
lm.fit(x_train,y_train)
lm.score(x_test,y_test)

In [None]:
# Calculate accuracy score, confusion matrix, precision, recall & F1 score

predictions = lm.predict(x_test)
print(accuracy_score(y_test, predictions))
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
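
The three reports printed above are related: for a binary target, accuracy, precision, and recall can all be read off the confusion matrix. The sketch below shows this on ten invented labels (`y_true` and `y_pred` are hypothetical and not produced by any model in this notebook).

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Invented labels for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# For binary labels, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of correct predictions
precision = tp / (tp + fp)                   # predicted positives that are correct
recall = tp / (tp + fn)                      # actual positives that were found

print(tn, fp, fn, tp)
print(accuracy, accuracy_score(y_true, y_pred))
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
```

The hand-computed values agree with `accuracy_score`, `precision_score`, and `recall_score`, which is a quick way to check which row and column of the matrix is which.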

In [None]:
# Decision Tree Modelling and calculate score

from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf.fit(x_train, y_train)
clf.score(x_test, y_test)

In [None]:
# Calculate accuracy score, confusion matrix, precision, recall & F1 score

predictions = clf.predict(x_test)
print(accuracy_score(y_test, predictions))
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

In [None]:
# KNN Modelling and calculate score

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(x_train, y_train)
knn.score(x_test, y_test)

In [None]:
# Calculate accuracy score, confusion matrix, precision, recall & F1 score

predictions = knn.predict(x_test)
print(accuracy_score(y_test, predictions))
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

In [None]:
# SVM Modelling and calculate score

from sklearn.svm import SVC
svm = SVC()
svm.fit(x_train, y_train)
svm.score(x_test, y_test)

In [None]:
# Calculate accuracy score, confusion matrix, precision, recall & F1 score

predictions = svm.predict(x_test)
print(accuracy_score(y_test, predictions))
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

In [None]:
# Gaussian Naive-Bayes Modelling and calculate score

from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(x_train, y_train)
gnb.score(x_test, y_test)

In [None]:
# Calculate accuracy score, confusion matrix, precision, recall & F1 score

predictions = gnb.predict(x_test)
print(accuracy_score(y_test, predictions))
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
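
The five models above are each scored on a single random 20% split, so their accuracies can shift from run to run. A possible way to compare them more steadily is k-fold cross-validation, sketched here on synthetic data from `make_classification` instead of the heart-disease CSV. The `models` dictionary, the data, and the choice of 5 folds are assumptions made for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem with 13 features, standing in for the real data.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)

# The same five classifier families used in this notebook.
models = {
    "logistic regression": LogisticRegression(solver="liblinear", random_state=0),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbours": KNeighborsClassifier(),
    "SVM": SVC(),
    "Gaussian naive Bayes": GaussianNB(),
}

# Mean accuracy over 5 folds for each model.
scores = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X_demo, y_demo, cv=5)
    scores[name] = fold_scores.mean()
    print(f"{name:>22s}: {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")
```

The numbers printed here describe the synthetic data only and say nothing about how the models rank on the heart-disease data.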