# **Identifying the Most Suitable Model for Heart Attack Prediction**

Early detection of heart disease can be life-saving. This project develops predictive models that identify possible cases of cardiovascular disease using patient data and machine learning. To compare accuracy and reliability, we explored three machine learning models:
* multinomial logistic regression
* support vector machines
* neural network (Keras)

The main aim of the project was to determine which of these three models is most appropriate for predicting heart disease. Each model was trained on a training split and then evaluated on a held-out test split using accuracy, precision, recall, and F1-score.
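
As a reminder of what the four metrics measure, here is a minimal sketch that computes them for the positive class from confusion-matrix counts. The function name `binary_metrics` and the counts are hypothetical and chosen only to illustrate the arithmetic; they are not results from this notebook.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives, 8 true negatives
acc, prec, rec, f1 = binary_metrics(tp=8, fp=2, fn=2, tn=8)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

In a screening setting, recall is often the metric to watch, since a false negative means a missed case.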

**About the dataset:**

age: Age of the patient

sex: Sex of the patient

cp: Chest pain type, 0 = Typical Angina, 1 = Atypical Angina, 2 = Non-anginal Pain, 3 = Asymptomatic

trtbps: Resting blood pressure (in mm Hg)

chol: Cholesterol in mg/dl fetched via BMI sensor

fbs: (fasting blood sugar > 120 mg/dl), 1 = True, 0 = False

restecg: Resting electrocardiographic results, 0 = Normal, 1 = ST-T wave abnormality, 2 = Left ventricular hypertrophy

thalachh: Maximum heart rate achieved

oldpeak: Previous peak (ST depression induced by exercise relative to rest)

slp: Slope of the peak exercise ST segment

caa: Number of major vessels

thall: Thallium stress test result ~ (0,3)

exng: Exercise induced angina ~ 1 = Yes, 0 = No

output: Target variable

# **Libraries**

In [None]:
import seaborn as sns
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler 
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import accuracy_score
from keras.utils import to_categorical  # keras.utils.np_utils is not available in recent Keras releases

In [None]:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.layers import Dropout
from keras import regularizers
from sklearn.metrics import classification_report

# Read and Analyse Data

In [None]:
heart=pd.read_csv('/kaggle/input/heart-attack-analysis-prediction-dataset/heart.csv')
heart.head()

In [None]:
print('Rows', heart.shape[0], 'Columns', heart.shape[1])

# Check the data type and null value

In [None]:
heart.info()

# Check for repeated data (removal of duplicates)

In [None]:
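# Show the duplicated rows (print is needed because this is not the last expression in the cell)
print(heart[heart.duplicated()])
# Keep the first occurrence of each duplicated row and drop the rest
heart.drop_duplicates(keep='first', inplace=True)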
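
To make the behaviour of `duplicated()` and `drop_duplicates(keep='first')` concrete, here is a small sketch on a toy frame. The values are hypothetical and are not taken from `heart.csv`; the non-in-place form with `reset_index` is just one possible variant.

```python
import pandas as pd

# Toy frame with hypothetical values; rows 0 and 2 are identical
toy = pd.DataFrame({
    "age": [63, 37, 63],
    "chol": [233, 250, 233],
    "output": [1, 1, 1],
})

# True for every repeat after the first occurrence of a row
dup_mask = toy.duplicated()

# Non-in-place variant: returns a new frame and renumbers the index
deduped = toy.drop_duplicates(keep="first").reset_index(drop=True)

print(dup_mask.tolist())
print(len(toy), "->", len(deduped))
```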

In [None]:
heart.describe()

In [None]:
heart.corr()
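
The full correlation matrix can be hard to scan. One possible way to read it is to rank the features by their absolute correlation with the target column. The sketch below uses synthetic data (the column names `informative` and `noise` are made up) so that it runs without the CSV file; with the real data, `toy` would be replaced by `heart`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 2, size=n)

# Synthetic stand-in for the heart data frame
toy = pd.DataFrame({
    "informative": target + rng.normal(0, 0.5, size=n),  # built to track the target
    "noise": rng.normal(0, 1, size=n),                   # unrelated to the target
    "output": target,
})

# Absolute correlation of each feature with the target, strongest first
corr_with_target = (
    toy.corr()["output"].drop("output").abs().sort_values(ascending=False)
)
print(corr_with_target)
```

Correlation only captures linear relationships, so a low value does not prove that a feature is useless to a non-linear model.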

In [None]:
# pairplot creates its own figure grid, so a separate plt.figure() call would only add an empty figure
sns.pairplot(heart)
plt.show()

# Data preparation

In [None]:
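# Features: the slice 1:-1 starts at column 1, so column 0 (age) is not included in x
x = heart.iloc[:, 1:-1].values
# Target: the last column (output)
y = heart.iloc[:, -1].values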

# Split train and test data

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
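
The split above is not stratified, whereas the neural network section later passes `stratify=y`. As a minimal sketch of what stratification does, the toy example below (hypothetical labels with a 70/30 class ratio) shows that the class proportions are preserved in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 30% positive class (hypothetical, not the heart data)
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([0] * 70 + [1] * 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy
)

print(len(X_tr), len(X_te))
# With stratify, the share of positives is the same in train and test
print(y_tr.mean(), y_te.mean())
```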

# Scaling

In [None]:
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
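
The scaler is fitted on the training data only and then reused on the test data, so no information from the test set leaks into the preprocessing. A small sketch with toy numbers (not from the dataset) shows that `transform` applies the training mean and standard deviation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature data (hypothetical values)
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[4.0]])

toy_scaler = StandardScaler()
train_scaled = toy_scaler.fit_transform(train)  # learns mean and std from train only
test_scaled = toy_scaler.transform(test)        # reuses the training statistics

# The same computation by hand, using the training mean and (population) std
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print(train_scaled.ravel())
print(test_scaled.ravel(), manual.ravel())
```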

# 1. Multinomial logistic regression

In [None]:
model = LogisticRegression()
model.fit(x_train, y_train)
predicted=model.predict(x_test)

In [None]:
precision, recall, fscore, support = precision_recall_fscore_support(y_test, predicted, average="weighted")
accuracy = accuracy_score(y_test, predicted)
print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F-score: {fscore}")
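
With `average="weighted"`, each per-class score is weighted by the number of true samples of that class (its support). The sketch below, on hypothetical labels, compares the weighted precision returned by scikit-learn with the same average computed by hand.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical labels: 4 samples of class 0 and 6 of class 1
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

# average=None returns one value per class, plus the support of each class
p_cls, r_cls, f_cls, support_cls = precision_recall_fscore_support(
    y_true, y_pred, average=None
)
# average="weighted" returns a single value per metric (support is None here)
p_w, r_w, f_w, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)

# Support-weighted mean of the per-class precision values
manual_p_w = np.average(p_cls, weights=support_cls)
print(p_cls, support_cls)
print(p_w, manual_p_w)
```

Reporting the per-class values alongside the weighted ones can be useful here, because recall on the positive class is what matters most for detection.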

# 2. Support vector machines (SVM)

In [None]:
model = SVC()
model.fit(x_train, y_train)

predicted = model.predict(x_test)

In [None]:
precision, recall, fscore, support = precision_recall_fscore_support(y_test, predicted, average="weighted")
accuracy = accuracy_score(y_test, predicted)
print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F-score: {fscore}")

# 3. Neural network (Keras)

In [None]:
X = np.array(heart.drop(['output'],axis=1))
y = np.array(heart['output'])

In [None]:
mean = X.mean(axis=0)
X -= mean
std = X.std(axis=0)
X /= std

In [None]:
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, stratify=y, random_state=42, test_size = 0.2)


In [None]:
Y_train = to_categorical(y_train, num_classes=None)
Y_test = to_categorical(y_test, num_classes=None)
print (Y_train.shape)
print (Y_train[:10])
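
For integer class labels, one-hot encoding turns each label into a row with a single 1 in the column of its class. A minimal NumPy sketch of the same idea, using hypothetical labels, is:

```python
import numpy as np

# Hypothetical binary labels
labels = np.array([0, 1, 1, 0])

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(2, dtype="float32")[labels]
print(one_hot)

# argmax along the class axis recovers the original integer labels
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```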

In [None]:
# neural network (Keras)
def create_model():
    # create the model
    model = Sequential()
    model.add(Dense(16, input_dim=13, kernel_initializer='normal', kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(8, kernel_initializer='normal', kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(2, activation='softmax'))
    
    # compile the model; the string 'rmsprop' selects RMSprop with its default settings,
    # so no separate Adam optimizer object is created here
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    return model

model = create_model()

model.summary()  # summary() prints the table itself and returns None

# Fitting the model to the training data

In [None]:
history=model.fit(X_train, Y_train, validation_data=(X_test, Y_test),epochs=250, batch_size=10)

In [None]:
categorical_pred = np.argmax(model.predict(X_test), axis=1)

print('Results for Categorical Model')
print(accuracy_score(y_test, categorical_pred))
print(classification_report(y_test, categorical_pred))
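
The softmax output layer produces one probability per class, and `np.argmax(..., axis=1)` picks the most probable class for each sample. The sketch below reproduces that step with NumPy on hypothetical raw scores; the `softmax` helper is written out here only for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical raw scores for three samples and two classes
logits = np.array([[2.0, 0.5], [0.1, 1.2], [0.0, 0.0]])
probs = softmax(logits)

# Index of the largest probability per row; ties resolve to the first index
pred = np.argmax(probs, axis=1)
print(probs)
print(pred)
```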

# Evaluation

The models were evaluated using accuracy, precision, recall, and F1-score. The results are shown in the table below:

| Model                           | Accuracy | Precision | Recall | F1-score |
|---------------------------------|----------|-----------|--------|----------|
| Multinomial logistic regression | 0.88     | 0.88      | 0.88   | 0.88     |
| Support vector machines         | 0.92     | 0.91      | 0.92   | 0.91     |
| Neural network (Keras)          | 0.73     | 0.73      | 0.68   | 0.70     |


# Conclusions
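The best results in terms of F1-score were obtained using a support vector machine. The neural network model performed worse than the logistic regression model. Because the target is binary (the network ends in a two-unit softmax layer), this gap is unlikely to come from task difficulty alone. The network was also trained and evaluated under different conditions from the other two models: it uses a different, stratified train/test split, a different set of input columns, and standardization computed on the full dataset before splitting. The three scores are therefore not directly comparable.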
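
One way to compare models under identical conditions is to score them on the same cross-validation folds, with scaling fitted inside each fold. The sketch below illustrates this for the two scikit-learn models only. It uses synthetic data from `make_classification` so that it runs without the CSV file, and the fold count, `max_iter` value, and other settings are assumptions rather than the configuration used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the heart data: 13 features, binary target
X_syn, y_syn = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)

# The same stratified folds are reused for every model
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Putting the scaler in a pipeline refits it on the training part of each fold
candidates = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

scores = {
    name: cross_val_score(pipe, X_syn, y_syn, cv=cv, scoring="f1_weighted").mean()
    for name, pipe in candidates.items()
}
for name, score in scores.items():
    print(f"{name}: mean weighted F1 = {score:.3f}")
```

Averaging over several folds also reduces the dependence on a single random split, which matters for a dataset of only a few hundred rows.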