<h1 style="text-align:center;"><b>HEART ATTACK PREDICTION</b></h1>

<p style="text-align:center"><img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRv3YD8qExb5ZmNZ7BFiCRazlXYZ3i9l5GDmA&usqp=CAU" alt="Heart-Attack" width="400" height="400" class="center"></p>

<h3>About the dataset:</h3>

<ul>
  <li>Age : Age of the patient</li>
  <li>Sex : Sex of the patient</li>
  <li>exang: exercise induced angina (1 = yes; 0 = no)</li>
  <li>ca: number of major vessels (0-3)</li>
  <li>cp : chest pain type</li>
    <ul style="list-style-type:circle;">
      <li>Value 1: typical angina</li>
      <li>Value 2: atypical angina</li>
      <li>Value 3: non-anginal pain</li>
      <li>Value 4: asymptomatic</li>
    </ul>
  <li>trtbps : resting blood pressure (in mm Hg)</li>
  <li>chol : cholesterol in mg/dl fetched via BMI sensor</li>
  <li>fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)</li>
  <li>rest_ecg : resting electrocardiographic results</li>
    <ul style="list-style-type:circle;">
      <li>Value 0: normal</li>
      <li>Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)</li>
      <li>Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria</li>
    </ul>
  <li>thalach : maximum heart rate achieved</li>
  <li>target : 0 = less chance of heart attack; 1 = more chance of heart attack</li>
</ul>

## Importing Libraries

In [2]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

## Importing Dataset

In [3]:
heart_ds = pd.read_csv('/kaggle/input/heart-attack-analysis-prediction-dataset/heart.csv')

## Understanding and Analysing Dataset

In [4]:
heart_ds.head()  # first 5 rows of dataframe 

In [6]:
print("Shape of the dataset",heart_ds.shape)

##### There are 303 rows and 14 columns

#### Checking for null values

In [7]:
heart_ds.isnull().sum()

##### There are no null values

#### Checking for duplicate rows

In [8]:
heart_ds[heart_ds.duplicated()]

##### There is one duplicate row

#### Dropping duplicate rows

In [9]:
heart_ds.drop_duplicates(keep="first",inplace=True)  # keeping the first occurrence and dropping the rest

In [10]:
heart_ds.shape  #new shape
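
To make the duplicate handling above easier to follow, here is a minimal sketch on a made-up frame. The `toy` DataFrame and its values are hypothetical; only the column names are borrowed from the dataset.

```python
import pandas as pd

# Hypothetical toy frame: rows 1 and 2 are identical on purpose.
toy = pd.DataFrame({
    "age": [63, 37, 37, 41],
    "chol": [233, 250, 250, 204],
    "target": [1, 1, 1, 0],
})

dup_mask = toy.duplicated()        # True for each repeat of an earlier row
n_duplicates = int(dup_mask.sum())

# Keep the first occurrence of each row and renumber the index.
deduped = toy.drop_duplicates(keep="first").reset_index(drop=True)

print("Duplicate rows:", n_duplicates)
print("Shape after dropping:", deduped.shape)
```

Returning a new frame (instead of `inplace=True`) keeps the original available for comparison; either style removes the same rows.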

In [11]:
heart_ds.describe()   # summary statistics

## Data Preprocessing

In [15]:
x = heart_ds.iloc[:,:-1].values 
y = heart_ds.iloc[:,-1].values

In [16]:
print(x)

In [17]:
print(y)

## Categorical Encoding

##### No categorical encoding is needed: the categorical columns (such as sex and cp) are already stored as numeric codes
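
Although every column is stored as a number, some of them (for example cp) hold category codes rather than quantities. If you wanted to treat such a column as nominal, one-hot encoding is one option. A minimal sketch with `pd.get_dummies`, using a hypothetical `toy` frame rather than the real data:

```python
import pandas as pd

# Hypothetical values; "cp" holds category codes, not a measurement.
toy = pd.DataFrame({"age": [63, 37, 41, 56], "cp": [3, 2, 1, 0]})

# One indicator column per distinct code; "age" is left untouched.
encoded = pd.get_dummies(toy, columns=["cp"], prefix="cp", dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

This notebook keeps the numeric codes as they are, which is a common simplification; whether one-hot encoding helps depends on the model.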

## Splitting into Training and Testing dataset

In [18]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=0)
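
With a small dataset it can also help to keep the class ratio the same in the training and testing parts. A minimal sketch using the `stratify` argument of `train_test_split`; the `_demo` arrays are made up for illustration and are not the heart data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 70 of class 0 and 30 of class 1.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 70 + [1] * 30)

X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# The share of class 1 should be close to 0.30 in both parts.
print("train share of class 1:", y_tr_demo.mean())
print("test share of class 1:", y_te_demo.mean())
```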

## Feature Scaling

Feature scaling should be done only after splitting the dataset into training and testing sets.
Reason: the test data stands in for new data never seen by the model. To avoid leaking information from it, fit the scaler on the training set only and apply the same transformation to the test set.

In [19]:
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
x_train = ss.fit_transform(x_train)
x_test = ss.transform(x_test)

In [20]:
print(x_train)   # after feature scaling
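
The fit-on-train, transform-on-test pattern can be checked on synthetic numbers. In this sketch the `train` and `test` arrays are randomly generated stand-ins, not the heart data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(80, 2))  # hypothetical training rows
test = rng.normal(loc=50.0, scale=10.0, size=(20, 2))   # hypothetical test rows

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # mean and std learned from training rows only
test_scaled = scaler.transform(test)        # reuses the training mean and std

# The stored statistics come from the training set alone.
print(np.allclose(scaler.mean_, train.mean(axis=0)))
print("train column means after scaling:", train_scaled.mean(axis=0))
print("test column means after scaling:", test_scaled.mean(axis=0))
```

The scaled training columns have a mean of approximately zero, while the scaled test columns generally do not; that is expected, because the test rows never influenced the statistics.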

## 1) Logistic Regression

In [23]:
from sklearn.linear_model import LogisticRegression
lr_model = LogisticRegression()
lr_model.fit(x_train,y_train)        # training logistic regression model

In [30]:
y_pred1 = lr_model.predict(x_test)   # predicting test data

In [33]:
from sklearn.metrics import confusion_matrix,accuracy_score
lr_cm = confusion_matrix(y_test,y_pred1)
lr_accuracy = accuracy_score(y_test,y_pred1)
print("Confusion Matrix : ")
print()
print(lr_cm)
print()
print()
print("Accuracy Score : ",lr_accuracy*100)
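
Accuracy is only one number that can be read off a confusion matrix. As a sketch, the matrix below is hypothetical (it is not this model's output) and uses scikit-learn's layout, where rows are true classes and columns are predicted classes:

```python
import numpy as np

# Hypothetical 2x2 confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[24, 3],
               [4, 30]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)        # of the predicted positives, how many are real
recall = tp / (tp + fn)           # of the real positives, how many were found

print("Accuracy :", accuracy)
print("Precision:", precision)
print("Recall   :", recall)
```

For a medical screening task, recall on the positive class is often of particular interest, since a false negative means a missed at-risk patient.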

## 2) K-Nearest Neighbors

In [29]:
from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(x_train,y_train)

In [32]:
y_pred2 = knn_model.predict(x_test)

In [34]:
knn_cm = confusion_matrix(y_test,y_pred2)
knn_accuracy = accuracy_score(y_test,y_pred2)
print("Confusion Matrix : ")
print()
print(knn_cm)
print()
print()
print("Accuracy Score : ",knn_accuracy*100)

## 3) Support Vector Machine

In [35]:
from sklearn.svm import SVC
svc_model = SVC()
svc_model.fit(x_train,y_train)

In [36]:
y_pred3 = svc_model.predict(x_test)

In [37]:
svc_cm = confusion_matrix(y_test,y_pred3)
svc_accuracy = accuracy_score(y_test,y_pred3)
print("Confusion Matrix : ")
print()
print(svc_cm)
print()
print()
print("Accuracy Score : ",svc_accuracy*100)

## 4) Decision Tree

In [45]:
from sklearn.tree import DecisionTreeClassifier
dtc_model = DecisionTreeClassifier(criterion = 'entropy',random_state=0)
dtc_model.fit(x_train,y_train)

In [46]:
y_pred4 = dtc_model.predict(x_test)

In [47]:
dtc_cm = confusion_matrix(y_test,y_pred4)
dtc_accuracy = accuracy_score(y_test,y_pred4)
print("Confusion Matrix : ")
print()
print(dtc_cm)
print()
print()
print("Accuracy Score : ",dtc_accuracy*100)

## 5) Random Forest

In [48]:
from sklearn.ensemble import RandomForestClassifier
rfc_model = RandomForestClassifier(n_estimators=100,random_state=0)
rfc_model.fit(x_train,y_train)

In [49]:
y_pred5 = rfc_model.predict(x_test)

In [50]:
rfc_cm = confusion_matrix(y_test,y_pred5)
rfc_accuracy = accuracy_score(y_test,y_pred5)
print("Confusion Matrix : ")
print()
print(rfc_cm)
print()
print()
print("Accuracy Score : ",rfc_accuracy*100)

## 6) Naive Bayes

In [51]:
from sklearn.naive_bayes import GaussianNB
nb_model = GaussianNB()
nb_model.fit(x_train,y_train)

In [52]:
y_pred6 = nb_model.predict(x_test)

In [53]:
nb_cm = confusion_matrix(y_test,y_pred6)
nb_accuracy = accuracy_score(y_test,y_pred6)
print("Confusion Matrix : ")
print()
print(nb_cm)
print()
print()
print("Accuracy Score : ",nb_accuracy*100)

## Model Comparison

In [56]:
models = pd.DataFrame({
    'Model' : ['Logistic Regression','K-Nearest Neighbors', 'Support Vector Machine', 'Decision Tree','Random Forest','Naive Bayes'],
    'Score' : [lr_accuracy*100,knn_accuracy*100,svc_accuracy*100,dtc_accuracy*100,rfc_accuracy*100,nb_accuracy*100]
})


models.sort_values(by = 'Score', ascending = False)
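
The ranking above rests on a single test split of roughly 60 rows, so one or two predictions can change the order. A possible cross-check is k-fold cross-validation with the scaler placed inside a pipeline so that it is refit on each training fold. The sketch below runs on synthetic data from `make_classification` (an assumption made so the example is self-contained), and the names `candidates` and `cv_results` are illustrative:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the heart data: 300 rows, 13 features.
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

rows = []
for name, estimator in candidates.items():
    # Scaling lives inside the pipeline, so each fold fits its own scaler.
    pipe = make_pipeline(StandardScaler(), estimator)
    scores = cross_val_score(pipe, X_demo, y_demo, cv=5)
    rows.append({
        "Model": name,
        "Mean accuracy": scores.mean() * 100,
        "Std": scores.std() * 100,
    })

cv_results = pd.DataFrame(rows).sort_values(by="Mean accuracy", ascending=False)
print(cv_results)
```

To apply this to the notebook's data, `X_demo` and `y_demo` would be replaced by the unscaled `x` and `y` arrays; the standard deviation column gives a rough sense of how much the scores vary between folds.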

## Support Vector Machine gives the highest accuracy of 93.44% on this test split