# Heart Attack Prediction

**Goal:**
Build a heart attack prediction model 

A confusion matrix tutorial

In [4]:
#Importing libraries
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report


In [6]:
#Loading data
df = pd.read_csv(r'C:\Users\Sandiswe Buthelezi\Desktop\Heart_Attack_Demo\heart.csv')
df.head()

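| | age | sex | cp | trtbps | chol | fbs | restecg | thalachh | exng | oldpeak | slp | caa | thall | output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |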


In [9]:
Y = df['output']
X = df.drop('output', axis = 1)
print(X.head())

   age  sex  cp  trtbps  chol  fbs  restecg  thalachh  exng  oldpeak  slp  \
0   63    1   3     145   233    1        0       150     0      2.3    0   
1   37    1   2     130   250    0        1       187     0      3.5    0   
2   41    0   1     130   204    0        0       172     0      1.4    2   
3   56    1   1     120   236    0        1       178     0      0.8    2   
4   57    0   0     120   354    0        1       163     1      0.6    2   

   caa  thall  
0    0      1  
1    0      2  
2    0      2  
3    0      2  
4    0      2  


In [13]:
#split the data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

#Use standard scaler to scale the features for preprocessing
scaler = StandardScaler()
scale = scaler.fit(X_train)
X_train = scale.transform(X_train)
X_test = scale.transform(X_test)
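
To make the scaling step concrete, here is a minimal sketch on a small made-up array (not the heart data). It shows that `StandardScaler` learns each column's mean and standard deviation from the training rows only and then reuses those values on the test rows. The names `train_demo`, `test_demo`, and `demo_scaler` are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training and test rows with two features on very different scales
train_demo = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test_demo = np.array([[2.0, 400.0]])

# Fit on the training rows only, so no test information leaks into the scaler
demo_scaler = StandardScaler().fit(train_demo)
train_scaled = demo_scaler.transform(train_demo)
test_scaled = demo_scaler.transform(test_demo)

print(demo_scaler.mean_)           # per-column means learned from train_demo
print(train_scaled.mean(axis=0))   # close to 0 for every column
print(train_scaled.std(axis=0))    # close to 1 for every column
print(test_scaled)                 # test rows scaled with the training statistics
```

Because the test rows are scaled with the training statistics, their scaled values need not have mean 0 or standard deviation 1.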

In [14]:
#Creating Logistic Regression Model
model = LogisticRegression()
model.fit(X_train, Y_train)
pred = model.predict(X_test)

In [16]:
#Calculating accuracy score
score = accuracy_score(Y_test, pred)
score

0.8524590163934426

The model achieved an accuracy of approximately 85% on the test data. 

This means the model predicted the correct class label for about 85% of the test samples (52 of the 61). In other words, it is about 85% accurate at predicting whether a person has a high risk of having a heart attack.
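
As a minimal sketch of what the accuracy score measures, the example below computes it by hand on made-up labels (not the heart data) and compares the result with `accuracy_score`. The arrays `y_true_demo` and `y_pred_demo` are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up true and predicted labels for five samples
y_true_demo = np.array([1, 0, 1, 1, 0])
y_pred_demo = np.array([1, 0, 0, 1, 0])

# Accuracy is the fraction of samples whose predicted label matches the true label
manual_accuracy = np.mean(y_true_demo == y_pred_demo)
sklearn_accuracy = accuracy_score(y_true_demo, y_pred_demo)

print(manual_accuracy, sklearn_accuracy)  # 0.8 0.8
```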

In addition, we'll create a confusion matrix, which gives a more detailed picture of the model's performance than the accuracy score alone.

In [17]:
#Create a confusion matrix
confusion_matrix(Y_test, pred)

array([[25,  4],
       [ 5, 27]], dtype=int64)

This is a table that summarizes the performance of a classification model by showing the counts of true positive, true negative, false positive, and false negative predictions.

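In scikit-learn, rows are the true classes and columns are the predicted classes. The first row shows that of the 29 patients in the negative class (0), 25 were correctly predicted as negative and 4 were misclassified as positive. The second row shows that of the 32 patients in the positive class (1, high chance of having a heart attack), 27 were correctly predicted as positive and 5 were misclassified as negative.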

Full Interpretation:

- **True Positive (TP):** The model correctly predicted 27 cases of heart attacks (positive samples) as positive. These are patients who had a heart attack, and the model correctly identified them as such.

- **True Negative (TN):** The model correctly predicted 25 cases of no heart attacks (negative samples) as negative. These are patients who did not have a heart attack, and the model correctly identified them as healthy individuals.

- **False Positive (FP):** The model incorrectly predicted 4 cases of no heart attacks as positive. These are patients who did not have a heart attack, but the model incorrectly classified them as having a heart attack (false alarms).

- **False Negative (FN):** The model incorrectly predicted 5 cases of heart attacks as negative. These are patients who actually had a heart attack, but the model incorrectly classified them as not having a heart attack (missed diagnoses).
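
To check how scikit-learn lays out the matrix, here is a small sketch on made-up labels (not the heart data). With labels 0 and 1, the rows are the true classes and the columns are the predicted classes, so the flattened order is TN, FP, FN, TP. The `_demo` names are illustrative assumptions.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: two true negatives-class samples and three positive-class samples
y_true_demo = [0, 0, 1, 1, 1]
y_pred_demo = [0, 1, 1, 1, 0]

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
print(cm_demo)
# [[1 1]
#  [1 2]]

# Row 0 = true class 0 -> [TN, FP]; row 1 = true class 1 -> [FN, TP]
tn_demo, fp_demo, fn_demo, tp_demo = cm_demo.ravel()
print(tn_demo, fp_demo, fn_demo, tp_demo)  # 1 1 1 2
```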

In [18]:
#Extracting TN, FP, FN, TP (ravel() flattens the matrix row by row)
tn, fp, fn, tp = confusion_matrix(Y_test, pred).ravel()
(tn, fp, fn, tp)

(25, 4, 5, 27)
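
The four counts are enough to derive the usual metrics for the positive class. The sketch below hard-codes the counts from the output above, read in scikit-learn's flattened order (TN, FP, FN, TP), and applies the standard formulas. The variable names are illustrative assumptions.

```python
# Counts copied from the confusion matrix output, in the order TN, FP, FN, TP
tn_count, fp_count, fn_count, tp_count = 25, 4, 5, 27
total = tn_count + fp_count + fn_count + tp_count

# Accuracy: share of all predictions that are correct
accuracy = (tp_count + tn_count) / total
# Precision: share of predicted positives that are truly positive
precision_pos = tp_count / (tp_count + fp_count)
# Recall: share of true positives that the model found
recall_pos = tp_count / (tp_count + fn_count)
# F1: harmonic mean of precision and recall
f1_pos = 2 * precision_pos * recall_pos / (precision_pos + recall_pos)

print(round(accuracy, 2), round(precision_pos, 2),
      round(recall_pos, 2), round(f1_pos, 2))
# 0.85 0.87 0.84 0.86
```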

In [19]:
#Confusion matrix metrics
matrix = classification_report(Y_test, pred)
print('Classification Reports: \n', matrix)

Classification Reports: 
               precision    recall  f1-score   support

           0       0.83      0.86      0.85        29
           1       0.87      0.84      0.86        32

    accuracy                           0.85        61
   macro avg       0.85      0.85      0.85        61
weighted avg       0.85      0.85      0.85        61
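
If the individual numbers are needed in code rather than as printed text, `classification_report` also accepts `output_dict=True`. The sketch below uses made-up labels (not the heart data), and the `_demo` names are illustrative assumptions.

```python
from sklearn.metrics import classification_report

# Made-up true and predicted labels
y_true_demo = [0, 0, 1, 1, 1]
y_pred_demo = [0, 1, 1, 1, 0]

# output_dict=True returns nested dictionaries keyed by class label (as strings)
report_demo = classification_report(y_true_demo, y_pred_demo, output_dict=True)

recall_class_1 = report_demo['1']['recall']        # 2 of 3 positives found
precision_class_0 = report_demo['0']['precision']  # 1 of 2 predicted negatives correct
print(recall_class_1, precision_class_0, report_demo['accuracy'])
```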



**The Logistic Regression model performs well!**

The model correctly identifies most patients in both classes: recall is 0.84 for patients who had a heart attack and 0.86 for healthy patients without heart attacks, with an overall accuracy of 0.85 on the 61 test samples.