## Basic Feedforward Neural Network From Scratch

---

### Project Overview

This report builds a fully custom feedforward neural network framework using Python and NumPy by:

1. Implementing core components such as layers, activation functions, loss functions, and an optimizer
2. Demonstrating training on a sample dataset, visualizing performance, and comparing results against a baseline
3. Exploring advanced features like different optimizers, regularization, and potential improvements

---
## Table of Contents
---
##### 1. Imports and Setup
##### 2. Core Implementation
##### $~~~~~$ 2.1 Architecture and Data Structures
##### $~~~~~$ 2.2 Building the Model
##### 3. Data Preparation
##### 4. Training and Evaluation
##### 5. Hyperparameter Tuning and Results
##### $~~~~~$ 5.1 Grid Search / Random Search (Conceptual)
##### 6. Advanced Features
##### $~~~~~$ 6.1 Alternative Optimizers (Adam, RMSProp, etc.)
##### $~~~~~$ 6.2 Batch Normalization and Dropout
##### 7. Conclusions and Future Work
---

## 1. Imports and Setup

Below, we load the Python libraries needed for matrix operations, plotting, and performance tracking.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# for reproducibility
np.random.seed(42)

## 2. Core Implementation

### 2.1 Architecture and Data Structures

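We use a modular, object-oriented design in which each component (layer, activation, loss, and optimizer) has a clear responsibility, which makes the framework easy to extend and maintain. In this version the optimizer step is plain gradient descent applied inside `DenseLayer.backward`.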

In [None]:
class DenseLayer:
    def __init__(self, input_dim, output_dim):
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.weights = np.random.randn(input_dim, output_dim) * 0.01
        self.biases = np.zeros((1, output_dim))
        
    def forward(self, X):
        self.X = X  # store for backward pass
        return np.dot(X, self.weights) + self.biases
    
    def backward(self, dZ, learning_rate=0.01, reg_lambda=0.0):
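        # dZ already carries the 1/m factor from the loss gradient
        # (see LossFunction.backward), so it is not divided by m again here.
        dW = np.dot(self.X.T, dZ) + reg_lambda * self.weights
        db = np.sum(dZ, axis=0, keepdims=True)
        
        # gradient w.r.t. this layer's input, passed back to the previous layer
        dX = np.dot(dZ, self.weights.T)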
        
        # update weights
        self.weights -= learning_rate * dW
        self.biases -= learning_rate * db
        
        return dX
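
The matrix expressions in `DenseLayer.backward` can be sanity-checked numerically. The sketch below is a standalone illustration, not part of the framework: the helper names `dense_forward`, `dense_backward`, and `numerical_grad` are hypothetical, and it checks only the unregularized core of the formulas against central finite differences.

```python
import numpy as np

def dense_forward(X, W, b):
    """Affine map Z = XW + b, with the same shapes DenseLayer uses."""
    return X @ W + b

def dense_backward(X, W, dZ):
    """Gradients of a scalar loss w.r.t. W, b, and X, given dZ = dL/dZ."""
    dW = X.T @ dZ
    db = dZ.sum(axis=0, keepdims=True)
    dX = dZ @ W.T
    return dW, db, dX

def numerical_grad(f, param, eps=1e-6):
    """Central-difference estimate of df/dparam, one entry at a time."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        f_plus = f()
        param[idx] = old - eps
        f_minus = f()
        param[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W = rng.normal(size=(3, 2))
b = rng.normal(size=(1, 2))
G = rng.normal(size=(5, 2))  # fixed upstream gradient dL/dZ

# Scalar test loss L = sum(Z * G), chosen so that dL/dZ equals G exactly.
loss = lambda: float(np.sum(dense_forward(X, W, b) * G))

dW, db, dX = dense_backward(X, W, G)
max_err_W = np.abs(dW - numerical_grad(loss, W)).max()
max_err_b = np.abs(db - numerical_grad(loss, b)).max()
max_err_X = np.abs(dX - numerical_grad(loss, X)).max()
print(max_err_W, max_err_b, max_err_X)
```

All three errors should be close to floating-point noise; a large value would point to a transposed or mis-shaped product.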

In [None]:
class Activation:
    def forward(self, Z):
        raise NotImplementedError
    
    def backward(self, dA, Z=None):
        raise NotImplementedError

class ReLU(Activation):
    def forward(self, Z):
        self.Z = Z
        return np.maximum(0, Z)
    
    def backward(self, dA, Z=None):
        # If we haven't stored Z, use self.Z
        Z = self.Z if Z is None else Z
        return dA * (Z > 0)

class Sigmoid(Activation):
    def forward(self, Z):
        self.A = 1 / (1 + np.exp(-Z))
        return self.A
    
    def backward(self, dA, Z=None):
        # Use stored output A = sigmoid(Z)
        A = self.A
        return dA * A * (1 - A)
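
One possible refinement: `1 / (1 + np.exp(-Z))` can trigger overflow warnings when `Z` contains large negative values. The sketch below shows a commonly used piecewise formulation that only ever exponentiates non-positive numbers. The function name `stable_sigmoid` is an assumption for illustration and is not used elsewhere in this notebook.

```python
import numpy as np

def stable_sigmoid(Z):
    """Sigmoid evaluated without exponentiating a positive number."""
    Z = np.asarray(Z, dtype=float)
    out = np.empty_like(Z)
    pos = Z >= 0
    # For Z >= 0, exp(-Z) is at most 1, so this form cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-Z[pos]))
    # For Z < 0, use the algebraically equal form exp(Z) / (1 + exp(Z)).
    exp_z = np.exp(Z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

Z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
A = stable_sigmoid(Z)
print(A)
```

The derivative used in `Sigmoid.backward`, `A * (1 - A)`, is unchanged by this substitution because only the way `A` is computed differs.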


In [None]:
class LossFunction:
    def forward(self, y_pred, y_true):
        raise NotImplementedError
    
    def backward(self, y_pred, y_true):
        raise NotImplementedError

class MeanSquaredError(LossFunction):
    def forward(self, y_pred, y_true):
        return np.mean(0.5 * (y_true - y_pred)**2)
    
    def backward(self, y_pred, y_true):
        return (y_pred - y_true) / y_true.shape[0]

class BinaryCrossEntropy(LossFunction):
    def forward(self, y_pred, y_true):
        eps = 1e-8
        return -np.mean(y_true * np.log(y_pred + eps) + (1 - y_true) * np.log(1 - y_pred + eps))
    
    def backward(self, y_pred, y_true):
        eps = 1e-8
        return -(y_true / (y_pred + eps) - (1 - y_true) / (1 - y_pred + eps)) / y_true.shape[0]
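
When `BinaryCrossEntropy` is paired with a `Sigmoid` output, the two backward steps multiply out to the simple expression `(A - Y) / m`. The short check below illustrates this with random data; it omits the `eps` term for clarity and is a standalone sketch rather than part of the classes above.

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(6, 1))                          # pre-activations
Y = rng.integers(0, 2, size=(6, 1)).astype(float)    # binary labels
m = Y.shape[0]

A = 1.0 / (1.0 + np.exp(-Z))                         # sigmoid output

# Two-step chain rule: loss gradient, then sigmoid derivative.
dA = -(Y / A - (1 - Y) / (1 - A)) / m
dZ_chain = dA * A * (1 - A)

# Closed form obtained by cancelling A * (1 - A) algebraically.
dZ_closed = (A - Y) / m

max_diff = np.abs(dZ_chain - dZ_closed).max()
print(max_diff)
```

The closed form avoids dividing by values near zero, which is one reason many frameworks fuse the sigmoid and the cross-entropy into a single operation.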

### 2.2 Building the Model

We create a `NeuralNetwork` class to combine layers and activations into a cohesive model. This class will handle:
- Forward propagation through all layers
- Calculating the loss
- Backward propagation to update weights

In [None]:
class NeuralNetwork:
    def __init__(self, layers, activations, loss_func, learning_rate=0.01, reg_lambda=0.0):
        self.layers = layers
        self.activations = activations
        self.loss_func = loss_func
        self.learning_rate = learning_rate
        self.reg_lambda = reg_lambda
    
    def forward(self, X):
        out = X
        for i, layer in enumerate(self.layers):
            Z = layer.forward(out)
            if i < len(self.activations):
                out = self.activations[i].forward(Z)
            else:
                out = Z
        return out
    
    def compute_loss(self, y_pred, y_true):
        return self.loss_func.forward(y_pred, y_true)
    
    def backward(self, y_pred, y_true):
        dA = self.loss_func.backward(y_pred, y_true)
        
        for i in reversed(range(len(self.layers))):
            if i < len(self.activations):
                dZ = self.activations[i].backward(dA)
            else:
                dZ = dA
            dA = self.layers[i].backward(dZ, self.learning_rate, self.reg_lambda)
    
    def fit(self, X, y, epochs=100, verbose=True):
        history = []
        for epoch in range(epochs):
            y_pred = self.forward(X)
            loss = self.compute_loss(y_pred, y)
            history.append(loss)
            
            # Backprop
            self.backward(y_pred, y)
            
            if verbose and (epoch+1) % 10 == 0:
                print(f"Epoch {epoch+1}/{epochs}, Loss: {loss:.4f}")
        return history
    
    def predict(self, X, threshold=0.5):
        prob = self.forward(X)
        return (prob >= threshold).astype(int)
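
Note that `fit` performs one full-batch update per epoch. If mini-batch training is wanted, one way to iterate over the data is sketched below. The generator name `iterate_minibatches` is hypothetical, and wiring it into `fit` is left open.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data once."""
    order = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10).reshape(-1, 1)

batches = list(iterate_minibatches(X_demo, y_demo, 4, rng))
batch_shapes = [xb.shape for xb, _ in batches]
print(batch_shapes)  # [(4, 2), (4, 2), (2, 2)]
```

Inside a training loop, each yielded pair would go through one forward and one backward pass, so an epoch would consist of several smaller updates instead of a single large one.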

## 3. Data Preparation

For demonstration, let's create a synthetic binary classification dataset. In a real project, you might load data from an external source, then clean, normalize, and possibly augment it.

In [None]:
from sklearn.model_selection import train_test_split

N = 1000
X_data = np.random.randn(N, 2)
y_data = (X_data[:, 0] + X_data[:, 1] > 0).astype(int).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.2, random_state=42)

print("Train set size:", X_train.shape, y_train.shape)
print("Test set size:", X_test.shape, y_test.shape)
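
The synthetic features here are already roughly standard normal, but real data usually is not. As a sketch of the normalization step mentioned above, the helpers below (hypothetical names `fit_standardizer` and `standardize`) compute statistics on the training split only and reuse them for the test split, which avoids leaking test information.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return per-feature mean and standard deviation of the training data."""
    mean = X_train.mean(axis=0, keepdims=True)
    std = X_train.std(axis=0, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return mean, std

def standardize(X, mean, std):
    """Apply previously fitted statistics to any split."""
    return (X - mean) / std

rng = np.random.default_rng(0)
X_tr = rng.normal(loc=5.0, scale=3.0, size=(200, 2))  # stand-in training data
X_te = rng.normal(loc=5.0, scale=3.0, size=(50, 2))   # stand-in test data

mean, std = fit_standardizer(X_tr)
X_tr_s = standardize(X_tr, mean, std)
X_te_s = standardize(X_te, mean, std)
print(X_tr_s.mean(axis=0), X_tr_s.std(axis=0))
```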

## 4. Training and Evaluation

We construct a small feedforward network with one hidden layer and a sigmoid output for binary classification. We'll use the Binary Cross-Entropy loss function.

In [None]:
layer1 = DenseLayer(input_dim=2, output_dim=4)
act1 = ReLU()

layer2 = DenseLayer(input_dim=4, output_dim=1)
act2 = Sigmoid()

loss_fn = BinaryCrossEntropy()

network = NeuralNetwork(
    layers=[layer1, layer2],
    activations=[act1, act2],
    loss_func=loss_fn,
    learning_rate=0.05,
    reg_lambda=0.01
)

history = network.fit(X_train, y_train, epochs=200, verbose=True)

plt.plot(history)
plt.title("Training Loss Over Epochs")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()

# Evaluate on the test set
y_pred_test = network.predict(X_test)
accuracy = (y_pred_test == y_test).mean()
print(f"Test Accuracy: {accuracy * 100:.2f}%")
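
To put the test accuracy in context, it helps to compare it with a trivial baseline. The sketch below computes the accuracy of always predicting the most frequent training label. The function name `majority_baseline_accuracy` and the toy labels are illustrative assumptions; in this notebook it would be called with `y_train` and `y_test`.

```python
import numpy as np

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most frequent training label."""
    labels, counts = np.unique(y_train, return_counts=True)
    majority = labels[np.argmax(counts)]
    return float(np.mean(y_test == majority))

# Toy labels: class 1 is the majority in the training split.
y_tr = np.array([0, 1, 1, 1, 0, 1]).reshape(-1, 1)
y_te = np.array([1, 0, 1, 1]).reshape(-1, 1)

baseline_acc = majority_baseline_accuracy(y_tr, y_te)
print(baseline_acc)  # 0.75
```

Because the synthetic classes are roughly balanced, this baseline should sit near 50% here, so a trained network is expected to clear it by a wide margin.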

## 5. Hyperparameter Tuning and Results

### 5.1 Grid Search / Random Search (Conceptual)

While we won’t execute an extensive hyperparameter search here, your project could include a systematic procedure to explore:
- Learning rates (e.g., 0.01, 0.05, 0.1)
- Regularization parameters (0, 0.001, 0.01, 0.1)
- Different numbers of hidden units
- Batch sizes

You could automate experiments and store results for analysis:

In [None]:
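# Grid search over learning rate and L2 strength.
# A validation split is held out of the training data so that the test set
# is not used for model selection.
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

learning_rates = [0.01, 0.05, 0.1]
reg_lambdas = [0.0, 0.01, 0.1]
best_acc = 0
best_params = None

for lr in learning_rates:
    for rl in reg_lambdas:
        # Build fresh layers for every configuration so no weights carry over.
        # This reuses the Section 4 architecture; other layer sizes can be swapped in.
        model = NeuralNetwork(
            layers=[DenseLayer(2, 4), DenseLayer(4, 1)],
            activations=[ReLU(), Sigmoid()],
            loss_func=BinaryCrossEntropy(),
            learning_rate=lr,
            reg_lambda=rl,
        )
        history = model.fit(X_tr, y_tr, epochs=100, verbose=False)
        acc = (model.predict(X_val) == y_val).mean()
        
        if acc > best_acc:
            best_acc = acc
            best_params = (lr, rl)

print("Best Validation Accuracy:", best_acc)
print("Best Params (LR, REG):", best_params)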
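
For the random-search variant named in the section title, a common approach is to sample scale-like hyperparameters log-uniformly instead of from a fixed grid. The sketch below only generates candidate configurations. The helper name `sample_log_uniform` and the ranges are illustrative assumptions, not tuned values.

```python
import numpy as np

def sample_log_uniform(low, high, size, rng):
    """Draw values whose base-10 logarithm is uniform on [log10(low), log10(high))."""
    return 10 ** rng.uniform(np.log10(low), np.log10(high), size=size)

rng = np.random.default_rng(0)
n_trials = 5

candidate_lrs = sample_log_uniform(1e-3, 1e-1, n_trials, rng)
candidate_lambdas = sample_log_uniform(1e-4, 1e-1, n_trials, rng)

# Each (learning_rate, reg_lambda) pair would be trained and scored
# in the same way as one iteration of a grid-search loop.
configs = list(zip(candidate_lrs, candidate_lambdas))
for lr, rl in configs:
    print(f"lr={lr:.4g}, reg_lambda={rl:.4g}")
```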


## 6. Advanced Features

### 6.1 Alternative Optimizers (Adam, RMSProp, etc.)
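
As a starting point for this section, here is a minimal sketch of the Adam update rule for a single parameter array. The class name `AdamSketch` is hypothetical and the class is not wired into `DenseLayer`; doing so would require moving the weight update out of `DenseLayer.backward`. The default hyperparameters are the commonly quoted ones, assumed here rather than tuned.

```python
import numpy as np

class AdamSketch:
    """Hypothetical standalone Adam optimizer for one parameter array."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # running mean of gradients
        self.v = None  # running mean of squared gradients
        self.t = 0     # step counter for bias correction

    def step(self, param, grad):
        """Return the updated parameter array after one Adam step."""
        if self.m is None:
            self.m = np.zeros_like(param)
            self.v = np.zeros_like(param)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        # Bias correction compensates for the zero initialization of m and v.
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy problem: minimize f(w) = ||w - target||^2, whose gradient is 2 * (w - target).
target = np.array([3.0, -2.0])
w = np.zeros(2)
opt = AdamSketch(lr=0.1)
for _ in range(500):
    grad = 2 * (w - target)
    w = opt.step(w, grad)
print(w)
```

RMSProp can be viewed as the same update without the first-moment average and without bias correction, so a sketch of it would follow the same structure.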

### 6.2 Batch Normalization and Dropout
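
Of the two techniques, dropout is the simpler to sketch. The functions below illustrate "inverted" dropout, where surviving activations are rescaled during training so that nothing needs to change at inference time. The names `dropout_forward` and `dropout_backward` are assumptions for illustration and are not integrated with the `NeuralNetwork` class; batch normalization is not covered by this sketch.

```python
import numpy as np

def dropout_forward(A, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob and rescale the rest."""
    if not training or drop_prob == 0.0:
        return A, None  # inference: identity, no mask needed
    keep_prob = 1.0 - drop_prob
    # Mask entries are 0 (dropped) or 1/keep_prob (kept and rescaled).
    mask = (rng.random(A.shape) < keep_prob) / keep_prob
    return A * mask, mask

def dropout_backward(dA, mask):
    """Route gradients only through the units that were kept."""
    return dA if mask is None else dA * mask

rng = np.random.default_rng(0)
A = np.ones((1000, 50))

A_drop, mask = dropout_forward(A, drop_prob=0.2, rng=rng)
A_eval, _ = dropout_forward(A, drop_prob=0.2, rng=rng, training=False)

print(A_drop.mean())           # close to 1.0 because of the rescaling
print(np.array_equal(A_eval, A))
```

In a network like the one above, such a step would typically sit after a hidden activation, with the `training` flag switched off inside `predict`.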