# Multi-Layer Perceptron (MLP)
This notebook builds an MLP classifier following the architecture described in [Cepeda Humerez et al. (2019)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007290).

Hyperparameters to use:

````python
input_size = 200  # Adjust based on dataset
hidden_size = [300, 200] # Two hidden layers: 300 and 200 LTUs (linear threshold units)
output_size = 2  # Number of classes
dropout_rate = 0.5 # Not specified in the paper; any reasonable value works
learning_rate = 0.001 # Not specified in the paper
epochs = 100 # Not specified in the paper
batch_size = 16 # Not specified in the paper
````

The model architecture is defined in ``MLP.py``.
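
The contents of ``MLP.py`` are not reproduced in this notebook. As a rough illustration only, a network with the hyperparameters listed above could look like the following PyTorch sketch. The class name `SketchMLP`, the ReLU activations, and the placement of dropout after each hidden layer are assumptions, not the repository's implementation.

````python
import torch
import torch.nn as nn


class SketchMLP(nn.Module):
    """Hypothetical stand-in for the MLP in MLP.py; the real class may differ."""

    def __init__(self, input_size, hidden_size, output_size, dropout_rate=0.5):
        super().__init__()
        layers = []
        prev = input_size
        # One Linear -> ReLU -> Dropout group per hidden layer (assumed layout)
        for width in hidden_size:
            layers += [nn.Linear(prev, width), nn.ReLU(), nn.Dropout(dropout_rate)]
            prev = width
        # Final layer returns raw logits, one per class
        layers.append(nn.Linear(prev, output_size))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


sketch = SketchMLP(input_size=200, hidden_size=[300, 200], output_size=2)
logits = sketch(torch.randn(4, 200))  # batch of 4 random inputs
n_params = sum(p.numel() for p in sketch.parameters())
print(logits.shape, n_params)
````

With these layer sizes the sketch has 120,902 trainable parameters (weights plus biases of the three linear layers).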

Load the MLP model code from ``src``:

In [23]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os
import tqdm
from sympy import sqrt
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
# Import everything from the modules in the 'src' directory so the functions can be called directly
from ssa_simulation import *
from ssa_analysis import *
from ssa_classification import *
from models.MLP import MLP 
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


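Example usage: train an MLP on synthetic data. The labels are random, so validation accuracy stays near chance (about 0.5); the cell only demonstrates the API.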

In [25]:
# Example usage
input_size = 200  # Adjust based on dataset
hidden_size = [300, 200]
output_size = 2  # Number of classes
dropout_rate = 0.5
learning_rate = 0.001
epochs = 100
batch_size = 16

# Generate synthetic data
X_train = torch.randn(1000, input_size)
y_train = torch.randint(0, output_size, (1000,))
X_val = torch.randn(200, input_size)
y_val = torch.randint(0, output_size, (200,))

# Convert to DataLoader
train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

# Initialize and train model
file_path = "mlp_model.pth"
model = MLP(input_size, hidden_size, output_size, dropout_rate, learning_rate)
model.train_model(train_loader, val_loader, epochs, save_path=file_path)

# Load the best checkpoint and evaluate it on the validation set
model.load_model(file_path)
val_acc = model.evaluate(val_loader)
print(f"Final Validation Accuracy: {val_acc:.4f}")

# Make predictions
X_test = torch.randn(5, input_size)
predictions = model.predict(X_test)
print("Predicted classes:", predictions)


🔄 Using device: cuda (1 GPUs available)
Epoch [1/100], Loss: 120.1081, Train Acc: 0.4880
Validation Acc: 0.4750
✅ Model saved at mlp_model.pth (Best Validation Acc: 0.4750)
Epoch [2/100], Loss: 91.0734, Train Acc: 0.5300
Validation Acc: 0.4900
✅ Model saved at mlp_model.pth (Best Validation Acc: 0.4900)
Epoch [3/100], Loss: 70.9949, Train Acc: 0.5820
Validation Acc: 0.4800
Epoch [4/100], Loss: 61.3360, Train Acc: 0.6030
Validation Acc: 0.4900
Epoch [5/100], Loss: 55.5279, Train Acc: 0.6010
Validation Acc: 0.4700
Epoch [6/100], Loss: 50.5672, Train Acc: 0.6280
Validation Acc: 0.4700
Epoch [7/100], Loss: 43.7857, Train Acc: 0.6680
Validation Acc: 0.4350
Epoch [8/100], Loss: 42.5112, Train Acc: 0.6630
Validation Acc: 0.4550
Epoch [9/100], Loss: 41.4593, Train Acc: 0.6610
Validation Acc: 0.4400
Epoch [10/100], Loss: 39.5752, Train Acc: 0.6740
Validation Acc: 0.4400
Epoch [11/100], Loss: 39.6374, Train Acc: 0.6700
Validation Acc: 0.4650
Epoch [12/100], Loss: 33.6410, Train Acc: 0.7180
Valid
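
The log above prints a save message only when validation accuracy strictly improves: epoch 4 ties the best value of 0.4900 and is not saved. A minimal sketch of that checkpointing rule follows, assuming the real `train_model` behaves this way. The variable names are illustrative, and the actual save call is only indicated in a comment.

````python
# Validation accuracies copied from the first five epochs of the log above
val_accs = [0.4750, 0.4900, 0.4800, 0.4900, 0.4700]

best_acc = float("-inf")
saved_epochs = []

for epoch, acc in enumerate(val_accs, start=1):
    # Save only on a strict improvement, so ties keep the earlier checkpoint
    if acc > best_acc:
        best_acc = acc
        saved_epochs.append(epoch)
        # A real training loop would write the weights here, for example
        # torch.save(model.state_dict(), save_path)

print(saved_epochs, best_acc)
````

Under this rule the checkpoint on disk after training corresponds to the earliest epoch that reached the best validation accuracy.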

Train the MLP on SSA (stochastic simulation algorithm) data

In [26]:
# Train MLP model using SSA data
output_file = 'data/mRNA_trajectories_example.csv'
X_train, X_test, y_train, y_test = load_and_split_data(output_file)

# Define model parameters
input_size = X_train.shape[1]
output_size = len(set(y_train))  # Number of classes
hidden_size = [300, 200]
dropout_rate = 0.5
learning_rate = 0.001
epochs = 100
batch_size = 32

train_loader = DataLoader(TensorDataset(
    torch.tensor(X_train, dtype=torch.float32),
    torch.tensor(y_train, dtype=torch.long)),
    batch_size=batch_size, shuffle=True
)

test_loader = DataLoader(TensorDataset(
    torch.tensor(X_test, dtype=torch.float32),
    torch.tensor(y_test, dtype=torch.long)),
    batch_size=batch_size, shuffle=False
)

model = MLP(input_size, hidden_size, output_size, dropout_rate, learning_rate)
model.train_model(train_loader, epochs=epochs)

# Evaluate MLP model
mlp_accuracy = model.evaluate(test_loader)
print(f"MLP Test Accuracy: {mlp_accuracy:.4f}")

🔄 Using device: cuda (1 GPUs available)
Epoch [1/100], Loss: 6.7796, Train Acc: 0.6719
Epoch [2/100], Loss: 5.7806, Train Acc: 0.7156
Epoch [3/100], Loss: 5.6211, Train Acc: 0.7312
Epoch [4/100], Loss: 5.0766, Train Acc: 0.7594
Epoch [5/100], Loss: 4.8404, Train Acc: 0.7656
Epoch [6/100], Loss: 4.8021, Train Acc: 0.7625
Epoch [7/100], Loss: 4.6222, Train Acc: 0.7875
Epoch [8/100], Loss: 4.7001, Train Acc: 0.7812
Epoch [9/100], Loss: 4.7044, Train Acc: 0.7750
Epoch [10/100], Loss: 4.4635, Train Acc: 0.8000
Epoch [11/100], Loss: 4.4858, Train Acc: 0.7812
Epoch [12/100], Loss: 4.2082, Train Acc: 0.8063
Epoch [13/100], Loss: 4.3141, Train Acc: 0.8031
Epoch [14/100], Loss: 4.2332, Train Acc: 0.7906
Epoch [15/100], Loss: 4.3645, Train Acc: 0.7906
Epoch [16/100], Loss: 4.1452, Train Acc: 0.8000
Epoch [17/100], Loss: 4.2009, Train Acc: 0.8000
Epoch [18/100], Loss: 4.2486, Train Acc: 0.7969
Epoch [19/100], Loss: 4.1704, Train Acc: 0.7969
Epoch [20/100], Loss: 4.3815, Train Acc: 0.7969
Epoch [21

The same training and evaluation, wrapped in a single call to ``mlp_classifier``:

In [30]:
# Train MLP model using SSA data
output_file = 'data/mRNA_trajectories_example.csv'
X_train, X_test, y_train, y_test = load_and_split_data(output_file)
mlp_accuracy = mlp_classifier(X_train, X_test, y_train, y_test, epochs=100)

🔄 Using device: cuda (1 GPUs available)
Epoch [1/100], Loss: 6.8887, Train Acc: 0.6188
Epoch [2/100], Loss: 5.5465, Train Acc: 0.7406
Epoch [3/100], Loss: 5.0381, Train Acc: 0.7656
Epoch [4/100], Loss: 4.8078, Train Acc: 0.7812
Epoch [5/100], Loss: 4.6351, Train Acc: 0.7719
Epoch [6/100], Loss: 4.5014, Train Acc: 0.8031
Epoch [7/100], Loss: 4.3527, Train Acc: 0.8031
Epoch [8/100], Loss: 4.2208, Train Acc: 0.8031
Epoch [9/100], Loss: 4.1826, Train Acc: 0.8094
Epoch [10/100], Loss: 4.2329, Train Acc: 0.8031
Epoch [11/100], Loss: 4.1091, Train Acc: 0.8031
Epoch [12/100], Loss: 4.1589, Train Acc: 0.7906
Epoch [13/100], Loss: 3.9525, Train Acc: 0.8094
Epoch [14/100], Loss: 4.0407, Train Acc: 0.8031
Epoch [15/100], Loss: 4.0987, Train Acc: 0.8063
Epoch [16/100], Loss: 3.9645, Train Acc: 0.8031
Epoch [17/100], Loss: 3.9095, Train Acc: 0.8094
Epoch [18/100], Loss: 3.9992, Train Acc: 0.7969
Epoch [19/100], Loss: 3.8833, Train Acc: 0.8281
Epoch [20/100], Loss: 4.0402, Train Acc: 0.7969
Epoch [21