## 🧪 Data Preparation and Model Setup

In this section, we import the necessary libraries for:
- Data manipulation
- Preprocessing and dimensionality reduction
- Handling imbalanced datasets
- Building and training a PyTorch model


In [None]:
# 📦 Importing libraries
import pandas as pd
import numpy as np

# ⚙️ Scikit-learn utilities
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.decomposition import PCA
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import accuracy_score, classification_report

# ⚖️ Handling class imbalance
from imblearn.over_sampling import SMOTE

# 🔥 PyTorch modules
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader


## 🔧 Load and Preprocess Data

In this step, we:
- Load the dataset from a CSV file
- Separate the features `X` and target variable `y`
- Encode the target labels if they are categorical


In [None]:
# 🔧 Load and preprocess data
df = pd.read_csv("Sleep Train 5000.csv")

# 📊 Features and target
X = df.drop(columns=[df.columns[0]])  # Drop first column (assumed to be the target)
y = df[df.columns[0]]                 # Target variable

# 🔠 Encode labels if they are categorical
if y.dtype == 'object':
    y = LabelEncoder().fit_transform(y)


## ⚖️ Handle Class Imbalance with SMOTE

To address potential class imbalance in the dataset, we use **SMOTE (Synthetic Minority Over-sampling Technique)**.  
This generates synthetic samples for the minority class to balance the dataset.


In [None]:
# Apply SMOTE for balancing
X_res, y_res = SMOTE().fit_resample(X, y)


## 📏 Feature Scaling

We scale the features using **StandardScaler** to ensure all features contribute equally to the model.  
This transforms the data to have zero mean and unit variance.


In [29]:
# Scale and reduce features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_res)


## 📉 Dimensionality Reduction with PCA

To reduce computational complexity and remove noise, we apply **Principal Component Analysis (PCA)**.  
We keep enough components to preserve **95% of the variance** in the data.


In [30]:
# Reduce dimensionality
pca = PCA(n_components=0.95)  # preserve 95% variance
X_pca = pca.fit_transform(X_scaled)


## 🚦 Split Dataset into Training and Test Sets

We split the dataset into:
- **Training set** (80%)
- **Test set** (20%)

to evaluate model performance on unseen data.


In [31]:
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X_pca, y_res, test_size=0.2, random_state=42)


## 🔥 Convert Data to PyTorch Tensors and Prepare DataLoader

- Convert NumPy arrays to PyTorch tensors for model training.
- Create a `TensorDataset` and a `DataLoader` for efficient batching and shuffling during training.


In [32]:
# Torch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.to_numpy(), dtype=torch.long)
y_test_tensor = torch.tensor(y_test.to_numpy(), dtype=torch.long)

train_ds = TensorDataset(X_train_tensor, y_train_tensor)
train_dl = DataLoader(train_ds, batch_size=64, shuffle=True)


## ✅ Improved MLP Model with GELU Activation and Xavier Weight Initialization

- A multi-layer perceptron with three hidden layers.
- Uses **Batch Normalization** and **Dropout** to improve training stability and reduce overfitting.
- Applies **GELU activation**, which often outperforms ReLU.
- Weights are initialized using **Xavier uniform initialization**.


In [33]:
# ✅ Improved MLP with GELU and weight init
class SuperMLP(nn.Module):
    def __init__(self, input_dim, num_classes):
        super(SuperMLP, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.BatchNorm1d(256),
            nn.GELU(),
            nn.Dropout(0.4),
            nn.Linear(256, 128),
            nn.BatchNorm1d(128),
            nn.GELU(),
            nn.Dropout(0.3),
            nn.Linear(128, 64),
            nn.BatchNorm1d(64),
            nn.GELU(),
            nn.Linear(64, num_classes)
        )
        self.init_weights()

    def init_weights(self):
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.zeros_(m.bias)

    def forward(self, x):
        return self.net(x)


## ✅ Setup Device and Initialize Model

- Automatically use **GPU** if available, otherwise default to **CPU**.
- Instantiate the `SuperMLP` model with input dimension and number of output classes.
- Move the model to the selected device.


In [34]:
# ✅ Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SuperMLP(X_train.shape[1], len(np.unique(y))).to(device)


## ⚖️ Handle Class Imbalance with Class Weights and Setup Training Components

- Compute **class weights** to address imbalanced classes during loss calculation.
- Use **CrossEntropyLoss** with class weights.
- Set up **AdamW optimizer** for training.
- Use **Cosine Annealing LR scheduler** for learning rate adjustment.


In [35]:
# Class weights to handle imbalance
class_weights = compute_class_weight(class_weight='balanced', classes=np.unique(y_train), y=y_train)
class_weights_tensor = torch.tensor(class_weights, dtype=torch.float32).to(device)

criterion = nn.CrossEntropyLoss(weight=class_weights_tensor)
optimizer = optim.AdamW(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)


## 🔄 Training Loop with Early Stopping

- Train the model for up to 130 epochs.
- Track training loss and validation accuracy.
- Use early stopping to stop training if validation accuracy doesn’t improve for 10 consecutive epochs.
- Save the best model state during training.


In [36]:
# ✅ Training loop with early stopping
best_acc = 0
epochs_no_improve = 0
for epoch in range(130):
    model.train()
    running_loss = 0
    for xb, yb in train_dl:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        preds = model(xb)
        loss = criterion(preds, yb)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    scheduler.step()

    model.eval()
    with torch.no_grad():
        val_preds = model(X_test_tensor.to(device))
        val_pred_labels = torch.argmax(val_preds, dim=1).cpu().numpy()
        val_acc = accuracy_score(y_test, val_pred_labels)
    
    print(f"Epoch {epoch+1}, Loss: {running_loss/len(train_dl):.4f}, Val Acc: {val_acc:.4f}")
    
    # Early stopping
    if val_acc > best_acc:
        best_acc = val_acc
        epochs_no_improve = 0
        best_model = model.state_dict()
    else:
        epochs_no_improve += 1
        if epochs_no_improve == 10:
            print(f"⏹️ Early stopping at epoch {epoch+1}")
            break


Epoch 1, Loss: 1.3719, Val Acc: 0.4675
Epoch 2, Loss: 1.2209, Val Acc: 0.5215
Epoch 3, Loss: 1.1516, Val Acc: 0.5530
Epoch 4, Loss: 1.1033, Val Acc: 0.5750
Epoch 5, Loss: 1.0504, Val Acc: 0.6005
Epoch 6, Loss: 1.0083, Val Acc: 0.6185
Epoch 7, Loss: 0.9671, Val Acc: 0.6265
Epoch 8, Loss: 0.9382, Val Acc: 0.6340
Epoch 9, Loss: 0.9082, Val Acc: 0.6570
Epoch 10, Loss: 0.8888, Val Acc: 0.6625
Epoch 11, Loss: 0.8695, Val Acc: 0.6615
Epoch 12, Loss: 0.8475, Val Acc: 0.6710
Epoch 13, Loss: 0.8364, Val Acc: 0.6810
Epoch 14, Loss: 0.8166, Val Acc: 0.6835
Epoch 15, Loss: 0.8024, Val Acc: 0.6945
Epoch 16, Loss: 0.8049, Val Acc: 0.6950
Epoch 17, Loss: 0.7908, Val Acc: 0.6865
Epoch 18, Loss: 0.7978, Val Acc: 0.6850
Epoch 19, Loss: 0.7877, Val Acc: 0.6925
Epoch 20, Loss: 0.7839, Val Acc: 0.6925
Epoch 21, Loss: 0.7952, Val Acc: 0.6920
Epoch 22, Loss: 0.7910, Val Acc: 0.6965
Epoch 23, Loss: 0.7928, Val Acc: 0.6815
Epoch 24, Loss: 0.7921, Val Acc: 0.6915
Epoch 25, Loss: 0.7894, Val Acc: 0.6990
Epoch 26,

## 📊 Final Evaluation

- Load the best model saved during training.
- Evaluate on the test set.
- Calculate and print the **accuracy** and detailed **classification report**.


In [37]:
# ✅ Evaluation
model.load_state_dict(best_model)
model.eval()
with torch.no_grad():
    preds = model(X_test_tensor.to(device))
    pred_labels = torch.argmax(preds, dim=1).cpu().numpy()

acc = accuracy_score(y_test, pred_labels)
print("\n📊 Final MLP Accuracy:", round(acc, 4))
print(classification_report(y_test, pred_labels))



📊 Final MLP Accuracy: 0.824
              precision    recall  f1-score   support

           0       0.93      0.98      0.95       402
           1       0.83      0.80      0.82       406
           2       0.72      0.54      0.62       408
           3       0.83      0.91      0.87       401
           4       0.78      0.91      0.84       383

    accuracy                           0.82      2000
   macro avg       0.82      0.83      0.82      2000
weighted avg       0.82      0.82      0.82      2000

