### **Code**

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv1D, MaxPooling1D, Flatten, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

# Load datasets
ptbdb_normal = pd.read_csv('ptbdb_normal.csv', header=None)
ptbdb_abnormal = pd.read_csv('ptbdb_abnormal.csv', header=None)
mitbih_train = pd.read_csv('mitbih_train.csv', header=None)
mitbih_test = pd.read_csv('mitbih_test.csv', header=None)

# Concatenate the files that belong to each source
ptbdb = pd.concat([ptbdb_normal, ptbdb_abnormal])
mitbih = pd.concat([mitbih_train, mitbih_test])

# Combine datasets
X = pd.concat([ptbdb.iloc[:, :-1], mitbih.iloc[:, :-1]]).values
y = pd.concat([ptbdb.iloc[:, -1], mitbih.iloc[:, -1]]).values

# Check for NaN values and replace them with 0
if np.isnan(X).any():
    X = np.nan_to_num(X)

if np.isnan(y).any():
    y = np.nan_to_num(y)

# Collapse to binary labels: 0 stays normal, every nonzero class becomes 1 (abnormal)
y[y != 0] = 1

# Standardize each column to zero mean and unit variance
# (the scaler is fit on all rows, before the train-test split)
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Reshape to (samples, time steps, 1), the input layout Conv1D expects
X = X.reshape(X.shape[0], X.shape[1], 1)

# Train-test split with shuffling
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=True)

# Ensure labels are integers
y_train = y_train.astype(int)
y_test = y_test.astype(int)

# Convert labels to categorical
num_classes = 2
y_train = to_categorical(y_train, num_classes=num_classes)
y_test = to_categorical(y_test, num_classes=num_classes)

# CNN Model
model = Sequential()
model.add(Conv1D(filters=32, kernel_size=5, activation='relu', input_shape=(X_train.shape[1], 1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.2))
model.add(Conv1D(filters=64, kernel_size=5, activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

# Compile model with a lower learning rate
model.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model (the test split doubles as validation data)
history = model.fit(X_train, y_train, epochs=30, batch_size=64, validation_data=(X_test, y_test), verbose=1)

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy*100:.2f}%")


Epoch 1/30
...
Epoch 30/30
Test Accuracy: 98.24%


### Code Explanation

The code above trains a binary classifier that distinguishes healthy (normal) from anomalous (abnormal) heartbeats in ECG signal data. This task matters in medical diagnostics, particularly for detecting and monitoring heart conditions.

The process begins with loading four CSV files, each read without a header row. Two of them, 'ptbdb_normal' and 'ptbdb_abnormal', contain ECG signals for normal and abnormal heartbeats, respectively. The other two, 'mitbih_train' and 'mitbih_test', likely hold a mix of normal and abnormal heartbeats.

After loading, the code concatenates the files into a single dataset. It then takes every column except the last as the ECG signal (the features, 'X') and the last column as the label (normal or abnormal, 'y').
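The feature and label split can be illustrated on a toy example. The sketch below uses two small made-up frames in place of the real CSV files (the values and the names 'frame_a', 'frame_b', 'X_demo', and 'y_demo' are hypothetical), assuming, as the code does, that each row is one heartbeat and the last column is its label.

```python
import pandas as pd

# Two tiny synthetic frames standing in for the real CSV files (hypothetical data).
# Each row is one heartbeat; the last column is its label.
frame_a = pd.DataFrame([[0.1, 0.2, 0.3, 0.0],
                        [0.4, 0.5, 0.6, 1.0]])
frame_b = pd.DataFrame([[0.7, 0.8, 0.9, 2.0]])

# Stack the rows of both frames into one dataset.
combined = pd.concat([frame_a, frame_b], ignore_index=True)

X_demo = combined.iloc[:, :-1].values  # every column except the last: signal samples
y_demo = combined.iloc[:, -1].values   # last column: class label

print(X_demo.shape)  # (3, 3)
print(y_demo)        # [0. 1. 2.]
```

Stacking rows this way only lines up cleanly when the frames have the same number of columns; if they differ, pandas fills the missing cells with NaN.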

The code then replaces any missing values, known as NaNs (Not a Number), with 0 and maps every nonzero label to 1, so that 0 means normal and 1 means abnormal. The label step does more than fix formatting: if a file carries several nonzero classes, they are all merged into the single abnormal class. Both steps shape what the model learns, so they affect how the reported accuracy should be read.
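A small sketch of both cleaning steps, on made-up arrays (the values and the names 'X_demo', 'y_demo', 'X_clean', and 'y_binary' are hypothetical), may make the effect concrete.

```python
import numpy as np

# Hypothetical signal matrix with one missing value, and labels with a nonzero class 3.
X_demo = np.array([[0.2, np.nan, 0.5],
                   [0.1, 0.3, 0.4]])
y_demo = np.array([0.0, 3.0])

# np.nan_to_num replaces NaN with 0.0 (and infinities with large finite numbers).
X_clean = np.nan_to_num(X_demo)

# Same idiom as the listing: every label that is not 0 becomes 1.
y_binary = y_demo.copy()
y_binary[y_binary != 0] = 1

print(np.isnan(X_clean).any())  # False
print(y_binary)                 # [0. 1.]
```

Note that applying 'np.nan_to_num' to the labels, as the listing does, would turn a missing label into 0, that is, into the normal class.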

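The code then standardizes the ECG signals with 'StandardScaler', which rescales each column (one sample position within the heartbeat) to zero mean and unit variance. Putting the inputs on a comparable scale generally helps gradient-based training converge. Note that the scaler is fit on the full dataset before the train-test split, so statistics from the test rows influence the preprocessing.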
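The listing fits the scaler on all rows before splitting. A common alternative, which is not what the code above does, is to split first and fit the scaler on the training rows only. The sketch below shows that variant on synthetic data (the array sizes and the '_demo', '_tr', and '_te' names are assumptions for illustration).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the ECG matrix: 100 rows, 8 columns (hypothetical sizes).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 8))
y_demo = rng.integers(0, 2, size=100)

# Split first, so the test rows stay unseen during preprocessing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, shuffle=True
)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # mean and variance come from training rows only
X_te_scaled = scaler.transform(X_te)      # the same statistics are reused for the test rows

print(np.allclose(X_tr_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(X_tr_scaled.std(axis=0), 1.0))   # True
```

With this ordering the test columns will not have exactly zero mean or unit variance, which is expected: they are scaled with the training statistics.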

The next phase reshapes the standardized data to (samples, time steps, 1), the three-dimensional input that a one-dimensional Convolutional Neural Network (CNN) expects. CNNs are effective at recognizing local patterns, which makes them suitable for analyzing ECG signals. The data is then shuffled and split 80/20 into training and testing sets, and the labels are one-hot encoded to match the two-unit softmax output.
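The shapes involved can be checked on a dummy array. The sketch below assumes 187 samples per heartbeat, which the text does not state, and uses 'np.eye' as a NumPy stand-in for 'to_categorical' so that it runs without TensorFlow.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data: 10 heartbeats of 187 samples each (187 is an assumed length).
X_demo = np.zeros((10, 187))
y_demo = np.array([0, 1] * 5)

# Add a trailing channel axis: (samples, time steps) -> (samples, time steps, 1).
X_demo = X_demo.reshape(X_demo.shape[0], X_demo.shape[1], 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, shuffle=True
)

# One-hot encode integer labels; row i of the identity matrix encodes class i.
y_tr_onehot = np.eye(2)[y_tr]

print(X_tr.shape, X_te.shape)  # (8, 187, 1) (2, 187, 1)
print(y_tr_onehot.shape)       # (8, 2)
```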

The model stacks two blocks of Conv1D, BatchNormalization, and MaxPooling1D layers (32 and then 64 filters, kernel size 5), with Dropout (0.2) after the first block. These are followed by Flatten, a 128-unit Dense layer, another Dropout (0.2), and a two-unit softmax output. The model is compiled with the Adam optimizer at a learning rate of 0.0001, ten times lower than the Keras default of 0.001, to make training more stable.
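To see how the layers shrink the signal, the output lengths can be worked out by hand. The sketch below assumes an input of 187 samples per heartbeat (not stated in the text) and the Keras defaults the listing relies on: 'valid' padding, stride 1 for Conv1D, and a pooling stride equal to the pool size. The helper names 'conv1d_out' and 'pool1d_out' are hypothetical.

```python
def conv1d_out(length, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def pool1d_out(length, pool_size):
    """Output length of 1D max pooling with 'valid' padding and stride == pool_size."""
    return (length - pool_size) // pool_size + 1


length = 187                    # assumed number of samples per heartbeat
length = conv1d_out(length, 5)  # first Conv1D (32 filters)  -> 183
length = pool1d_out(length, 2)  # first MaxPooling1D         -> 91
length = conv1d_out(length, 5)  # second Conv1D (64 filters) -> 87
length = pool1d_out(length, 2)  # second MaxPooling1D        -> 43

# BatchNormalization and Dropout leave the shape unchanged.
flat_features = length * 64     # Flatten: time steps x filters of the last Conv1D

print(length, flat_features)    # 43 2752
```

Under these assumptions the 128-unit Dense layer receives 2752 inputs per heartbeat; calling 'model.summary()' on the real model reports the actual shapes.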

The model is trained for 30 epochs with a batch size of 64 and then evaluated on the testing set, where it reaches 98.24% accuracy. The same testing set is also passed as validation data during training, so the per-epoch validation metrics and the final test accuracy describe the same samples. Accuracy measures the share of heartbeats assigned to the correct class.
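Accuracy is a single number, and a confusion matrix shows which kind of error sits behind it. The sketch below is not part of the original code; it uses made-up softmax outputs ('probs') and one-hot labels ('y_true_onehot') in place of 'model.predict(X_test)' and 'y_test'.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up softmax outputs for five heartbeats (columns: normal, abnormal).
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.3, 0.7],
                  [0.1, 0.9]])
# Made-up one-hot ground truth for the same five heartbeats.
y_true_onehot = np.array([[1, 0],
                          [0, 1],
                          [0, 1],
                          [0, 1],
                          [1, 0]])

# Convert probabilities and one-hot labels back to class indices.
y_pred = probs.argmax(axis=1)
y_true = y_true_onehot.argmax(axis=1)

accuracy = float((y_pred == y_true).mean())
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class

print(accuracy)  # 0.6
print(cm)
# [[1 1]
#  [1 2]]
```

In this toy case, the bottom-left entry counts abnormal heartbeats predicted as normal, which is often the more costly error in a diagnostic setting.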

In conclusion, this code shows an end-to-end workflow, from data loading through preprocessing to training and evaluation, for applying a CNN to the task of identifying heart conditions through ECG signal analysis.