# **DATASET INFORMATION**

* age: age in years.
* sex: sex (1=male; 0=female).
* cp: chest pain type (0=typical angina; 1=atypical angina; 2=non-anginal pain; 3=asymptomatic).
* trtbps: resting blood pressure in mm Hg on admission to the hospital.
* chol: serum cholesterol in mg/dl.
* fbs: fasting blood sugar > 120 mg/dl (1=true; 0=false).
* restecg: resting electrocardiographic results (0=normal; 1=having ST-T wave abnormality; 2=probable or definite left ventricular hypertrophy).
* thalachh: maximum heart rate achieved.
* exng: exercise-induced angina (1=yes; 0=no).
* oldpeak: ST depression induced by exercise relative to rest.
* slp: the slope of the peak exercise ST segment (0=upsloping; 1=flat; 2=downsloping).
* caa: number of major vessels (0–3) colored by fluoroscopy.
* thall: thalassemia (3=normal; 6=fixed defect; 7=reversible defect).
* output: heart disease (0=no; 1=yes).

In [None]:
!pip install lazypredict

In [None]:
pip install --upgrade pandas

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
import xgboost as xgb
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from lazypredict.Supervised import LazyClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
%matplotlib inline

In [None]:
data = pd.read_csv('../input/heart-attack-analysis-prediction-dataset/heart.csv')

In [None]:
data.head()

In [None]:
data.describe().T

In [None]:
data.info()

In [None]:
print('The number of rows is {} and the number of columns is {}'.format(data.shape[0], data.shape[1]))

In [None]:
#finding the columns that contain at least one NaN value
null_features = [feature for feature in data.columns if data[feature].isnull().sum() > 0]
print('There are {} null features: {}'.format(len(null_features), null_features))

**As you can see, no feature in this dataset contains NaN values.**
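A per-column count can make the same check easier to read. The sketch below runs on a small made-up frame (`toy` is a hypothetical stand-in for `data`, with one value deliberately removed), so the numbers are illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`, with one missing value in 'chol'
toy = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "chol": [233.0, np.nan, 204.0, 236.0],
    "output": [1, 1, 1, 0],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Names of the columns that have at least one missing value
columns_with_nan = missing_per_column[missing_per_column > 0].index.tolist()
print(columns_with_nan)
```

In the notebook itself, `data.isnull().sum()` would give the equivalent per-column view.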

In [None]:
#checking for duplicate rows
data.duplicated().sum()

In [None]:
data.drop_duplicates(inplace=True)

In [None]:
s_0 = data.sex.value_counts()[0]  #sex 0 count
s_1 = data.sex.value_counts()[1]  #sex 1 count   
print('Sex 0 : ',s_0, 'Sex 1 : ',s_1)

In [None]:
#plotting Sex 0 and Sex 1
sns.countplot(data=data, x="sex", palette="flare")
plt.title("Sex")
plt.show()

In [None]:
#plotting chest pain type
sns.countplot(data=data, x='cp', palette='flare')
plt.title("CP")
plt.show()

In [None]:
sns.countplot(data=data, x='fbs', palette='flare')
plt.title("Fasting Blood Sugar")
plt.show()

In [None]:
sns.countplot(data=data, x='restecg', palette='flare')
plt.title("ECG results")
plt.show()

In [None]:
sns.countplot(data=data, x='exng', palette='flare')
plt.title("Exercise Induced Angina")
plt.show()

In [None]:
sns.countplot(data=data, x='thall', palette='flare')
plt.title("Thall")
plt.show()

In [None]:
plt.figure(figsize = (12,12))
sns.swarmplot(x=data['caa'],y=data['age'],hue=data['output'], palette='flare')
plt.show()

Category 0 - most prone to heart attack
<br>
Categories 1, 2 and 3 - similar level of risk
<br>
Category 4 - very few people, but the majority of them are prone to heart attack

**OUTLIERS**

In [None]:
sns.boxplot(data=data, x='trtbps', palette='flare')
plt.title("Resting Blood Pressure")
plt.show()

In [None]:
sns.boxplot(data=data, x='chol', palette='magma')
plt.title("Cholesterol")
plt.show()

In [None]:
sns.boxplot(data=data, x='thalachh', palette='magma')
plt.title("Max Heart Rate")
plt.show()
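The points drawn beyond the whiskers in these boxplots follow the usual 1.5 × IQR rule (seaborn's default whisker length). As a rough sketch, the same rule can be applied by hand to list the flagged values. `iqr_outliers` and `toy_chol` are hypothetical names introduced for illustration, and the values are made up:

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]

# Hypothetical cholesterol-like values, one of them clearly extreme
toy_chol = pd.Series([200, 210, 220, 230, 240, 250, 564])
print(iqr_outliers(toy_chol))
```

On the real data, a call such as `iqr_outliers(data['chol'])` should list the rows that the boxplot shows as individual points.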

**DENSITY DISTRIBUTION**

In [None]:
sns.displot(data=data, x='trtbps',kde=True,color='blue')
plt.title("Resting Blood Pressure")
plt.show()

As the graph shows, most people have a normal resting blood pressure, while a few have abnormal values (too low or too high).

In [None]:
sns.displot(data=data, x='chol',kde=True,color='green')
plt.title("Cholesterol")
plt.show()

In [None]:
sns.displot(data=data, x='thalachh',kde=True,color='red')
plt.title("Max Heart Rate")
plt.show()

In [None]:
plt.figure(figsize=(12,10))
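# histplot replaces the deprecated distplot; stat='density' keeps the two groups comparable
sns.histplot(data[data['output'] == 0]['age'], color='green', kde=True, stat='density')
sns.histplot(data[data['output'] == 1]['age'], color='red', kde=True, stat='density')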
plt.title('Attack vs Age')
plt.show()

In [None]:
plt.figure(figsize=(12,10))
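# histplot replaces the deprecated distplot; stat='density' keeps the two groups comparable
sns.histplot(data[data['output'] == 0]['chol'], color='green', kde=True, stat='density')
sns.histplot(data[data['output'] == 1]['chol'], color='red', kde=True, stat='density')
plt.title('Attack vs Cholesterol')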
plt.show()

In [None]:
plt.figure(figsize=(12,10))
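# histplot replaces the deprecated distplot; stat='density' keeps the two groups comparable
sns.histplot(data[data['output'] == 0]['trtbps'], color='green', kde=True, stat='density')
sns.histplot(data[data['output'] == 1]['trtbps'], color='red', kde=True, stat='density')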
plt.title('Attack vs Resting Blood Pressure')
plt.show()

In [None]:
plt.figure(figsize=(10,10))
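# histplot replaces the deprecated distplot; stat='density' keeps the two groups comparable
sns.histplot(data[data['output'] == 0]['thalachh'], color='green', kde=True, stat='density')
sns.histplot(data[data['output'] == 1]['thalachh'], color='red', kde=True, stat='density')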
plt.title('Attack vs Max heart Rate')
plt.show()

In [None]:
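sns.set(style = "darkgrid")
# pairplot creates its own figure, so a separate plt.figure() call only adds an empty one;
# hue gives the palette a variable to color by
sns.pairplot(data, hue="output", palette="icefire", diag_kind="kde")
plt.show()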

In [None]:
fig=plt.figure(figsize=(12,10))
ax = fig.add_subplot(111, projection = '3d')
x = data['trtbps']
y = data['chol']
z = data['thalachh']

ax.set_xlabel("Resting Blood Pressure")
ax.set_ylabel("Cholesterol")
ax.set_zlabel("Max Heart Rate")

ax.scatter(x,y,z)
plt.show()

**DATA PREPROCESSING**

In [None]:
X = data.iloc[:,:-1].values
y = data.iloc[:,-1].values

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.2, random_state= 0)

In [None]:
print('Shape:',X_train.shape)
print('Shape:',X_test.shape)
print('Shape:',y_train.shape)
print('Shape:',y_test.shape)
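The split above is a plain random split. As an optional variation (not what this notebook does), `train_test_split` accepts a `stratify` argument that keeps the class proportions equal in both parts, which can matter on a small dataset. A minimal sketch on made-up labels (`toy_X` and `toy_y` are hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 70 zeros and 30 ones
toy_X = np.arange(100).reshape(-1, 1)
toy_y = np.array([0] * 70 + [1] * 30)

# stratify=toy_y keeps the 70/30 class ratio in both the train and test parts
X_a, X_b, y_a, y_b = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=0, stratify=toy_y
)

print("share of class 1 in train:", y_a.mean())
print("share of class 1 in test: ", y_b.mean())
```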

**Feature Scaling**

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
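The scaler is fitted on the training rows only and then reused on the test rows, so no information from the test set leaks into the scaling. The sketch below checks on a tiny made-up array that `transform` is the same as applying z = (x - mean) / std with the training statistics. All names and values here are hypothetical:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy split: two features, three training rows, one test row
toy_train = np.array([[120.0, 200.0], [130.0, 240.0], [140.0, 280.0]])
toy_test = np.array([[150.0, 240.0]])

toy_scaler = StandardScaler()
train_scaled = toy_scaler.fit_transform(toy_train)  # learns mean and std from the training rows
test_scaled = toy_scaler.transform(toy_test)        # reuses the training statistics

# Same result by hand, using the training mean and std
mean = toy_train.mean(axis=0)
std = toy_train.std(axis=0)  # population std (ddof=0), which StandardScaler uses
manual_test_scaled = (toy_test - mean) / std

print(test_scaled)
print(manual_test_scaled)
```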

**Training using LazyClassifier**

In [None]:
clf = LazyClassifier(predictions=True)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

In [None]:
models

**FROM THIS WE CAN SEE THAT SVC HAS THE HIGHEST ACCURACY**

In [None]:
model = SVC()
model.fit(X_train, y_train)
  
predicted_svm = model.predict(X_test)
print("The accuracy of SVM is : ", accuracy_score(y_test, predicted_svm)*100, "%")

In [None]:
print(classification_report(y_test, predicted_svm))
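The classification report summarizes precision and recall; a confusion matrix shows the raw counts behind them. `confusion_matrix` is already imported above. The sketch below uses made-up labels (`toy_true` and `toy_pred` are hypothetical stand-ins for `y_test` and `predicted_svm`):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels and predictions, standing in for y_test and predicted_svm
toy_true = [0, 0, 0, 1, 1, 1, 1, 1]
toy_pred = [0, 0, 1, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(toy_true, toy_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

In the notebook, `confusion_matrix(y_test, predicted_svm)` would give the equivalent table for the SVM.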

In [None]:
param_grid = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
grid = GridSearchCV(SVC(), param_grid, refit = True, verbose = 3)
grid.fit(X_train, y_train)

predicted = grid.predict(X_test)
print("\nThe accuracy of GridSearch is : ", accuracy_score(y_test, predicted)*100, "%")

In [None]:
print(grid.best_params_)

In [None]:
print(grid.best_estimator_)

In [None]:
grid_predictions = grid.predict(X_test)
  
# print classification report
print(classification_report(y_test, grid_predictions))


**FROM THIS WE CAN SEE THAT EVEN AFTER TUNING HYPERPARAMETERS USING GRIDSEARCHCV, THE RESULTS DID NOT IMPROVE BUT ONLY GOT WORSE.**
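One point to keep in mind when reading this comparison: `GridSearchCV` chooses parameters by their cross-validated score on the training data, not by test accuracy, and with a test set of only a few dozen rows a single prediction shifts accuracy by more than a percentage point. The sketch below shows both numbers side by side on synthetic data. The data from `make_classification` is an assumption standing in for the heart dataset, so the scores say nothing about the real results:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the heart data (assumption: about 300 rows, 13 features)
X_toy, y_toy = make_classification(n_samples=300, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)

toy_scaler = StandardScaler()
X_tr = toy_scaler.fit_transform(X_tr)
X_te = toy_scaler.transform(X_te)

toy_grid = GridSearchCV(
    SVC(),
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10, 100]},
    cv=5,
)
toy_grid.fit(X_tr, y_tr)

cv_score = toy_grid.best_score_  # mean cross-validated accuracy of the best parameters
test_score = accuracy_score(y_te, toy_grid.predict(X_te))

print("best params:", toy_grid.best_params_)
print("CV accuracy:", cv_score, "test accuracy:", test_score)
print("one test row is worth", 100 / len(y_te), "percentage points")
```

Comparing `grid.best_score_` with the cross-validated score of the default `SVC()` would be a steadier basis for choosing between the two models than a single small test set.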

In [None]:
print("The accuracy of final model is : ", accuracy_score(y_test, predicted_svm)*100, "%")

**CONCLUSION**

The best model is the default SVM, with 93.44% accuracy on the test set.