# Predicting Heart Disease by using Neural Networks.

**About Dataset**

The dataset contains the following features:

- age(in years)
- sex: (1 = male; 0 = female)
- cp: chest pain type
- trestbps: resting blood pressure (in mm Hg on admission to the hospital)
- chol: serum cholestoral in mg/dl
- fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
- restecg: resting electrocardiographic results
- thalach: maximum heart rate achieved
- exang: exercise induced angina (1 = yes; 0 = no)
- oldpeak: ST depression induced by exercise relative to rest
- slope: the slope of the peak exercise ST segment
- ca: number of major vessels (0-3) colored by flourosopy
- thal: 3 = normal; 6 = fixed defect; 7 = reversable defect
- target: 1 or 0

#### To download the dataset<a href="https://drive.google.com/file/d/1R5SjStkUsgTgyoAjC_14v13siYh8AAF3/view?usp=sharing" title="Google Drive"> Click here </a>

## Step 1: Import necessary libraries and load the dataset

In [131]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [133]:
# Load the dataset
df = pd.read_csv('heart.csv')

## Step 2: Perform statistical analysis of the data

In [134]:
# Print the first few rows of the dataframe
print(df.head())

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   63    1   3       145   233    1        0      150      0      2.3      0   
1   37    1   2       130   250    0        1      187      0      3.5      0   
2   41    0   1       130   204    0        0      172      0      1.4      2   
3   56    1   1       120   236    0        1      178      0      0.8      2   
4   57    0   0       120   354    0        1      163      1      0.6      2   

   ca  thal  target  
0   0     1       1  
1   0     2       1  
2   0     2       1  
3   0     2       1  
4   0     2       1  


In [139]:
# Get the summary statistics of the dataframe
print(df.describe())


              age         sex          cp    trestbps        chol         fbs  \
count  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000   
mean    54.366337    0.683168    0.966997  131.623762  246.264026    0.148515   
std      9.082101    0.466011    1.032052   17.538143   51.830751    0.356198   
min     29.000000    0.000000    0.000000   94.000000  126.000000    0.000000   
25%     47.500000    0.000000    0.000000  120.000000  211.000000    0.000000   
50%     55.000000    1.000000    1.000000  130.000000  240.000000    0.000000   
75%     61.000000    1.000000    2.000000  140.000000  274.500000    0.000000   
max     77.000000    1.000000    3.000000  200.000000  564.000000    1.000000   

          restecg     thalach       exang     oldpeak       slope          ca  \
count  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000   
mean     0.528053  149.646865    0.326733    1.039604    1.399340    0.729373   
std      0.525860   22.9051

In [141]:
# Check for missing values
print(df.isnull().sum())


age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64


## Step 3: Create training and testing datasets

In [150]:
# Split the dataframe into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

In [152]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [154]:
# Scale the data using StandardScaler

# Create StandardScaler object
scaler = StandardScaler()

In [156]:
# Fit and transform the training data
X_train = scaler.fit_transform(X_train)

In [158]:
# Fit testing data
X_test = scaler.fit_transform(X_test)

## Step 4. Build and train the neural network


In [160]:
# Create a Sequential neural network model
model = Sequential()

In [162]:
# Add input layer with 13 neurons
model.add(Dense(13, activation = 'relu'))

In [164]:
# Add hidden layer with 64 neurons

model.add(Dense(64, activation = 'relu'))

In [166]:
# Add output layer with one neuron

model.add(Dense(1, activation='sigmoid'))

In [168]:
# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics= ['accuracy'])

In [169]:
# Train the model
model.fit(X_train, y_train, epochs= 10, batch_size=32, validation_data=(X_test,y_test))

Epoch 1/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 70ms/step - accuracy: 0.6666 - loss: 0.6030 - val_accuracy: 0.7049 - val_loss: 0.5806
Epoch 2/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 23ms/step - accuracy: 0.7021 - loss: 0.5720 - val_accuracy: 0.7705 - val_loss: 0.5476
Epoch 3/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 23ms/step - accuracy: 0.7848 - loss: 0.5214 - val_accuracy: 0.7869 - val_loss: 0.5176
Epoch 4/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 14ms/step - accuracy: 0.8047 - loss: 0.4947 - val_accuracy: 0.8033 - val_loss: 0.4904
Epoch 5/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step - accuracy: 0.7977 - loss: 0.4785 - val_accuracy: 0.8197 - val_loss: 0.4664
Epoch 6/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 12ms/step - accuracy: 0.7964 - loss: 0.4387 - val_accuracy: 0.8361 - val_loss: 0.4447
Epoch 7/10
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x25848465f10>

## Step 5: Evaluate the model


In [170]:

# Evaluate the model on testing data
loss,accuracy=model.evaluate(X_test, y_test)
print(f'Test accuracy: {accuracy: 2f}')

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 42ms/step - accuracy: 0.8178 - loss: 0.4192
Test accuracy:  0.836066


In [172]:
# Make predictions on the testing data
y_pred = model.predict(X_test)

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 60ms/step


In [176]:
# Convert predictions to binary labels
y_pred_binary = (y_pred > 0.5).astype('int32')


In [180]:
# Print the classification report and confusion matrix
from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(y_test, y_pred_binary))
print(confusion_matrix(y_test, y_pred_binary))

              precision    recall  f1-score   support

           0       0.88      0.76      0.81        29
           1       0.81      0.91      0.85        32

    accuracy                           0.84        61
   macro avg       0.84      0.83      0.83        61
weighted avg       0.84      0.84      0.83        61

[[22  7]
 [ 3 29]]


## Conclusion
In this project, we developed a neural network model to predict heart disease using a dataset of clinical features. Our model achieved a test accuracy of 84%, indicating that it is able to correctly predict the presence or absence of heart disease in approximately 84% of the test cases.

The results of this project demonstrate the potential of machine learning algorithms to support diagnosis and treatment decisions in clinical settings. However, further research is needed to improve the accuracy and robustness of the model, as well as to explore its applications in real-world clinical practice

## IMPROVING RESULTS

In [184]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV

In [186]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [188]:
# Scale the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


In [190]:
# Define the models
models = [
    ('logistic_regression', LogisticRegression()),
    ('decision_tree', DecisionTreeClassifier()),
    ('random_forest', RandomForestClassifier()),
    ('svm', SVC())]

In [192]:
# Define the hyperparameter grid
param_grid = {
    'logistic_regression': {'C': [0.1, 1, 10]},
    'decision_tree': {'max_depth': [3, 5, 10]},
    'random_forest': {'n_estimators': [10, 50, 100]},
    'svm': {'C': [0.1, 1, 10]}}

In [194]:
# Perform grid search
for name, model in models:
    grid_search = GridSearchCV(model, param_grid[name], cv=5, scoring='accuracy')
    grid_search.fit(X_train, y_train)
    print(f'Best parameters for {name}: {grid_search.best_params_}')
    print(f'Best accuracy for {name}: {grid_search.best_score_}')


Best parameters for logistic_regression: {'C': 1}
Best accuracy for logistic_regression: 0.8180272108843537
Best parameters for decision_tree: {'max_depth': 3}
Best accuracy for decision_tree: 0.7685374149659864
Best parameters for random_forest: {'n_estimators': 50}
Best accuracy for random_forest: 0.7975340136054422
Best parameters for svm: {'C': 1}
Best accuracy for svm: 0.8224489795918368


In [196]:
# Evaluate the models
for name, model in models:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f'Accuracy for {name}: {accuracy_score(y_test, y_pred)}')
    print(f'Classification report for {name}:')
    print(classification_report(y_test, y_pred))
    print(f'Confusion matrix for {name}:')
    print(confusion_matrix(y_test, y_pred))

Accuracy for logistic_regression: 0.8524590163934426
Classification report for logistic_regression:
              precision    recall  f1-score   support

           0       0.83      0.86      0.85        29
           1       0.87      0.84      0.86        32

    accuracy                           0.85        61
   macro avg       0.85      0.85      0.85        61
weighted avg       0.85      0.85      0.85        61

Confusion matrix for logistic_regression:
[[25  4]
 [ 5 27]]
Accuracy for decision_tree: 0.819672131147541
Classification report for decision_tree:
              precision    recall  f1-score   support

           0       0.78      0.86      0.82        29
           1       0.86      0.78      0.82        32

    accuracy                           0.82        61
   macro avg       0.82      0.82      0.82        61
weighted avg       0.82      0.82      0.82        61

Confusion matrix for decision_tree:
[[25  4]
 [ 7 25]]
Accuracy for random_forest: 0.8688524590163

## Improving the Results

By using Logistic Regression, Decision Tree classifier and Random forest classifier, we got the accuracy to be 85.2%, 81.9% and 86.8% respectively.
While neural networks are powerful models that can learn complex patterns in data, they may not always perform better than logistic regression and decision tree classifier and random forest classifier. This depends aon the quality of the dataset and the methods which are followed.