<!-- Notebook title -->
# Neural Networks

# 1. Notebook Description

### 1.1 Task Description
<!-- 
- A brief description of the problem you're solving with machine learning.
- Define the objective (e.g., classification, regression, clustering, etc.).
-->

#### Introduction

Implement 2 types of mulilayer Perceptrons:
1. Using only Python. 
2. Using a high level library

* Download the Ecoli dataset: https://archive.ics.uci.edu/ml/datasets/Ecoli

----


#### Data Preparation
* Filter the dataset to include only the classes cp and im, and remove
the rest of the data.
* Make necessary adjustments to the data to prepare it for the Multi-layer Perceptron models.

#### Implement the MLP from Scratch
* Implement a Multilayer Perceptron from scratch using only Python
and standard libraries.
* Ensure that your implementation includes the following:
    * Forward propagation to compute the output.
    * Backpropagation to calculate the gradients.
    * Weight update mechanism based on the gradients.
Note: You do not need to train this model, just ensure the implementation is functional.

#### Implement using a High-Level Library
* Implement a Multilayer Perceptron using a high-level library such as PyTorch.
* Train the model on the prepared dataset.
* Evaluate the model’s performance on a test set using appropriate metrics.

#### Evaluate the MLP Implementations
* Compare the performance of the MLP implemented from scratch with the one implemented using the high-level library.
* Discuss the differences in ease of implementation, training time, and performance.

#### Additional Instructions
* Choose the network architecture with care.
* Train and validate all algorithms.
* Make the necessary assumptions.

### 1.2 Useful Resources
<!--
- Links to relevant papers, articles, or documentation.
- Description of the datasets (if external).
-->

### 1.2.1 Data

#### 1.2.1.1 Common

#### 1.2.1.2 Project

https://archive.ics.uci.edu/dataset/39/ecoli 

### 1.2.2 Learning

### 1.2.3 Documentation

---

# 2. Setup

## 2.1 Imports
<!--
- Import necessary libraries (e.g., `numpy`, `pandas`, `matplotlib`, `scikit-learn`, etc.).
-->

In [1]:

from ikt450.src.common_imports import *
from ikt450.src.config import get_paths
from ikt450.src.common_func import load_dataset, save_dataframe, ensure_dir_exists
import pandas as pd
import numpy as np
import random
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

import time

In [2]:
#device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Force CPU for time comparisons
device = torch.device("cpu")

## 2.2 Global Variables
<!--
- Define global constants, paths, and configuration settings used throughout the notebook.
-->

### 2.2.1 Paths

In [3]:
paths = get_paths()

### 2.2.2 Seed

In [4]:
RANDOM_SEED = 7

### 2.2.3 Split ratio

In [5]:
SPLITRATIO = 0.8

### 2.2.4 Learning rate

In [6]:
LR = 0.001

### 2.2.5 Batch size

In [7]:
BATCH_SIZE = 16

### 2.2.6 Epochs

In [8]:
EPOCHS = 1000

## 2.3 Function Definitions
<!--
- Define helper functions that will be used multiple times in the notebook.
- Consider organizing these into separate sections (e.g., data processing functions, model evaluation functions).
-->

In [9]:
def mse (y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

## 2.4 Classes

In [10]:
class Perceptron():
    def __init__(self, n_inputs):
        self.w = np.random.rand(n_inputs)*0.001
        self.b = np.random.rand(1)

    def sigmoid(self, x):
        x = np.clip(x, -500, 500)  # prevent overflow
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_derivative(self, x):
        return x * (1 - x)
    
    def forward(self, x):
        self.latest_output = self.sigmoid(np.dot(x, self.w) + self.b)
        self.latest_input = x
        return self.latest_output
    
    def backward(self,error,lr):
        gradient = error * self.sigmoid_derivative(self.latest_output)
        self.w += lr * gradient * self.latest_input
        self.b += lr * gradient
        return gradient * self.w

In [11]:
class MLP_scratch():
    def __init__(self, layer1_size, layer2_size, layer3_size, lr=0.1):
        self.lr = lr
        self.layer1_size = layer1_size
        self.layer2_size = layer2_size
        self.layer3_size = layer3_size
        self.layer1 = [Perceptron(6) for _ in range(layer1_size)]
        self.layer2 = [Perceptron(layer1_size) for _ in range(layer2_size)]
        self.layer3 = [Perceptron(layer2_size) for _ in range(layer3_size)]
        
    def forward(self, inputs):
        inputs = np.array(inputs)
        
        x = []
        for l in self.layer1:
            x.append(l.forward(inputs))
        inputs = np.array(x)
        inputs = np.reshape(inputs, (len(inputs)))
        
        x = []
        for l in self.layer2:
            x.append(l.forward(inputs))
        inputs = np.array(x)
        inputs = np.reshape(inputs, (len(inputs)))
        x = []
        for l in self.layer3:
            x.append(l.forward(inputs))
        return np.array(x)
    
    def backward(self,error):
        initial_error = error
        new_error = []
        for l, err in zip(self.layer3,initial_error):
            new_error.append(l.backward(err,self.lr))
        initial_error = np.array(new_error)
        initial_error = np.sum(initial_error, axis=0)
        new_error = []
        for l, err in zip(self.layer2,initial_error):
            new_error.append(l.backward(err,self.lr))
        initial_error = np.array(new_error)
  
        initial_error = np.sum(initial_error, axis=0)
    
        new_error = []
        for l, err in zip(self.layer1,initial_error):
            new_error.append(l.backward(err,self.lr))

In [12]:
class MLP_Torch(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MLP_Torch, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.fc1(x))
        x = self.sigmoid(self.fc2(x))
        return x

---

# 3. System Setup 
<!-- (Optional but recommended) -->

## 3.1 Styling
<!--
- Set up any visual styles (e.g., for plots).
- Configure notebook display settings (e.g., `matplotlib` defaults, pandas display options).
-->

## 3.2 Environment Configuration
<!--
- Check system dependencies, versions, and ensure reproducibility (e.g., set random seeds).
-->

### 3.2.1 Seed

In [13]:
np.random.seed(RANDOM_SEED)

---

# 4. Data Processing

## 4.1 Data loading
<!--
- Load datasets from files or other sources.
-->

ref. [Introduction](#introduction)

In [14]:
%ls {paths['PATH_COMMON_DATASETS']}

ecoli.data  ecoli.names  pima-indians-diabetes.data.csv


In [15]:
df = pd.read_csv(f"{paths['PATH_COMMON_DATASETS']}/ecoli.data",  sep="\s+", header=None)

## 4.2 Data inspection
<!--
- Preview the data (e.g., `head`, `describe`).
-->

### 4.2.1 Info

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 336 entries, 0 to 335
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       336 non-null    object 
 1   1       336 non-null    float64
 2   2       336 non-null    float64
 3   3       336 non-null    float64
 4   4       336 non-null    float64
 5   5       336 non-null    float64
 6   6       336 non-null    float64
 7   7       336 non-null    float64
 8   8       336 non-null    object 
dtypes: float64(7), object(2)
memory usage: 23.8+ KB


### 4.2.2 Describe

In [17]:
df.describe()

Unnamed: 0,1,2,3,4,5,6,7
count,336.0,336.0,336.0,336.0,336.0,336.0,336.0
mean,0.50006,0.5,0.495476,0.501488,0.50003,0.500179,0.499732
std,0.194634,0.148157,0.088495,0.027277,0.122376,0.215751,0.209411
min,0.0,0.16,0.48,0.5,0.0,0.03,0.0
25%,0.34,0.4,0.48,0.5,0.42,0.33,0.35
50%,0.5,0.47,0.48,0.5,0.495,0.455,0.43
75%,0.6625,0.57,0.48,0.5,0.57,0.71,0.71
max,0.89,1.0,1.0,1.0,0.88,1.0,0.99


### 4.2.3 Head

In [18]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,AAT_ECOLI,0.49,0.29,0.48,0.5,0.56,0.24,0.35,cp
1,ACEA_ECOLI,0.07,0.4,0.48,0.5,0.54,0.35,0.44,cp
2,ACEK_ECOLI,0.56,0.4,0.48,0.5,0.49,0.37,0.46,cp
3,ACKA_ECOLI,0.59,0.49,0.48,0.5,0.52,0.45,0.36,cp
4,ADI_ECOLI,0.23,0.32,0.48,0.5,0.55,0.25,0.35,cp


## 4.3 Data Visualization

In [19]:
# TODO Add code for visualization

## 4.4 Data Cleaning
<!--
- Handle missing values, outliers, and inconsistencies.
- Remove or impute missing data.
-->

### 4.4.1 NULL, NaN, Missing values

In [20]:
df.isnull().sum()

0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
dtype: int64

In [21]:
df.isna().sum()

0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
dtype: int64

In [22]:
df.duplicated().sum()

0

In [23]:
#df.corr()

## 4.5 Feature Engineering
<!--
- Create new features from existing data.
- Normalize or standardize features.
- Encode categorical variables.
-->

### 4.5.1 Normalize

#### 4.5.1.1 Feature Selection / Data Separation

ref. [Data Preparation](#data-preparation)

In [24]:
df.columns

Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')

In [25]:
df.rename(columns={0: 'id', 1: 'mcg', 2: 'gvh', 3: 'lip', 4: 'chg', 5: 'aac', 6: 'alm1', 7: 'alm2', 8: 'class'}, inplace=True)

# remove id column
df.drop('id', axis=1, inplace=True)

# keep only the rows with class 'cp' or 'im'
df = df[df['class'].isin(['cp', 'im'])]

# remove column chg
df.drop('chg', axis=1, inplace=True)

In [26]:
# encode class
df['class'] = df['class'].map({'cp': 0, 'im': 1})

# split data into X and y
X_data = df.drop('class', axis=1)
Y_data = df['class']

# standardize the data
X_data = (X_data - X_data.mean()) / X_data.std()


In [27]:
# split the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X_data, Y_data, test_size=1-SPLITRATIO, random_state=RANDOM_SEED)

In [28]:
# Prepare the data for PyTorch
X_train_tensor = torch.tensor(X_train.values, dtype=torch.float32).to(device)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).unsqueeze(1).to(device)
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32).to(device)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32).unsqueeze(1).to(device)

#### 4.5.1.2 Target Variable Extraction

---

# 5. Model Development

## 5.1 Model Selection
<!--
- Choose the model(s) to be trained (e.g., linear regression, decision trees, neural networks).
-->

ref. [Implement the MLP from Scratch](#implement-the-mlp-from-scratch)

In [29]:
m_scratch = MLP_scratch(5,5,1, LR)

In [30]:
m_torch = MLP_Torch(input_size=X_train.shape[1], hidden_size=5, output_size=1).to(device)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
#criterion = nn.BCELoss()  # Binary Cross Entropy
criterion = nn.MSELoss()  # Mean Squared Error 
optimizer = optim.SGD(m_torch.parameters(), lr=LR)

## 5.2 Model Training
<!--
- Train the selected model(s) using the training data.
-->

### From Scratch

In [31]:
# train the model
t_train_scratch = time.time()
for i in range(EPOCHS):
    epoch_loss = 0

    # shuffle the training data
    X_train, y_train = shuffle(X_train, y_train)
    for x,y in zip(X_train.values, y_train.values):
       
        y_pred = m_scratch.forward(x)
        
        error = y - y_pred
        epoch_loss += mse(y, y_pred)
       
        m_scratch.backward(error)
        
    # print the loss every 100 epochs 313 
    if i % 100 == 0:
        print(f"Epoch {i} loss: {epoch_loss}")

t_train_end_scratch = time.time()

Epoch 0 loss: 58.95117774421368
Epoch 100 loss: 41.533571016515396
Epoch 200 loss: 41.501187306250664
Epoch 300 loss: 41.50071146123878
Epoch 400 loss: 41.50030163615343
Epoch 500 loss: 41.4998460024477
Epoch 600 loss: 41.49934981858935
Epoch 700 loss: 41.498747013108186
Epoch 800 loss: 41.498070900762336
Epoch 900 loss: 41.49726345030262


### With PyTorch

In [32]:
# Train the model
t_train_torch = time.time()
for epoch in range(EPOCHS):
    for inputs, targets in train_loader:
        # Forward pass
        outputs = m_torch(inputs)
        loss = criterion(outputs, targets)
        
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item()}")

t_train_end_torch = time.time()

Epoch 0, Loss: 0.252521812915802
Epoch 100, Loss: 0.26075366139411926
Epoch 200, Loss: 0.22248348593711853
Epoch 300, Loss: 0.2005651891231537
Epoch 400, Loss: 0.22413106262683868
Epoch 500, Loss: 0.19130033254623413
Epoch 600, Loss: 0.2210959494113922
Epoch 700, Loss: 0.22816258668899536
Epoch 800, Loss: 0.18558578193187714
Epoch 900, Loss: 0.19498419761657715


## 5.3 Model Evaluation
<!--
- Evaluate model performance on validation data.
- Use appropriate metrics (e.g., accuracy, precision, recall, RMSE).
-->

In [33]:
# Test scratch model
t_test_scratch = time.time()
y_pred_scratch = []
for x in X_test.values:
    y_pred_scratch.append(m_scratch.forward(x))
y_pred_scratch = np.array(y_pred_scratch)
y_pred_scratch = np.reshape(y_pred_scratch, (len(y_pred_scratch)))
print(mse(y_test.values, y_pred_scratch))
acc_scratch = np.mean(y_test.values == np.round(y_pred_scratch))
t_test_end_scratch = time.time()

0.19910255311048905


In [34]:
# Test pytorch model
t_test_torch = time.time()
with torch.no_grad():
    predictions = m_torch(X_test_tensor)
    predictions = (predictions > 0.5).float()
    accuracy = (predictions == y_test_tensor).float().mean()
t_test_enc_end_torch = time.time()

In [35]:
print("\t | Scratch \t| Torch")
print(f"Train \t | {t_train_end_scratch - t_train_scratch:.2f} s\t| {t_train_end_torch - t_train_torch:.2f} s")
print(f"Test  \t | {t_test_end_scratch - t_test_scratch:.2f} s \t| {t_test_enc_end_torch - t_test_torch:.2f} s")
print(f"ACC \t | {acc_scratch*100:.2f} % \t| {accuracy.item()*100:.2f} %")

	 | Scratch 	| Torch
Train 	 | 45.09 s	| 6.98 s
Test  	 | 0.01 s 	| 0.00 s
ACC 	 | 77.27 % 	| 79.55 %


## 5.4 Hyperparameter Tuning
<!--
- Fine-tune the model using techniques like Grid Search or Random Search.
- Evaluate the impact of different hyperparameters.
-->

## 5.5 Model Testing
<!--
- Evaluate the final model on the test dataset.
- Ensure that the model generalizes well to unseen data.
-->

## 5.6 Model Interpretation (Optional)
<!--
- Interpret the model results (e.g., feature importance, SHAP values).
- Discuss the strengths and limitations of the model.
-->

---

# 6. Predictions


## 6.1 Make Predictions
<!--
- Use the trained model to make predictions on new/unseen data.
-->

## 6.2 Save Model and Results
<!--
- Save the trained model to disk for future use.
- Export prediction results for further analysis.
-->

---

# 7. Documentation and Reporting

## 7.1 Summary of Findings
<!--
- Summarize the results and findings of the analysis.
-->

## 7.2 Next Steps
<!--
- Suggest further improvements, alternative models, or future work.
-->

## 7.3 References
<!--
- Cite any resources, papers, or documentation used.
-->