# My Neural Network

I built a Neural Network from Scratch, using Python and a whole lot of calculus

## Neural Network Overview 

**Neural Network Architecture (2-4-1)**
- **Input**: 2 features
- **Hidden Layer**: 4 neurons with sigmoid activation
- **Output Layer**: 1 neuron with sigmoid activation
- **Total Parameters**: (2×4 + 4) + (4×1 + 1) = 17 parameters

**Trained**
- XOR gate truth table (4 samples)
- Non-linearly separable binary classification problem

**Performance Results**
- **Final Loss**: 0.0032 (Sum of Squared Errors)
- **Accuracy**: 100% on all training samples
- **Training**: 20,000 epochs with learning rate α = 0.05
- **Convergence**: Successfully learned the XOR function


## Lessons Learned

I am extremely **grateful** python libraries such as TensorFlow and PyTorch

In retrospect, I would **utilize more Linear Algebra**. I misunderstood how neurons worked in neural networks, and made a separate `Neuron` class for them, which my `Layer` class instantiated. Rather, I should have just built out 2D matrixes in the `Layer` class, and just indexed into them and used np vector functions such as `np.sum()` or `np.dot()`. 

In [216]:
from typing import Callable, List
import numpy as np
from numpy.typing import NDArray

## Common Methods

**Sigmoid**
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

In [217]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

**Derivative of Sigmoid**
$$\sigma'(z) = \sigma(z)(1 - \sigma(z))$$

In [218]:
def deriv_sigmoid(z):
    sig = sigmoid(z);
    return sig * (1 - sig)

**Loss Function (Sum of Squared Errors)**
$$L(a, y) = \sum_{i=1}^{n} (a_i - y_i)^2$$

In [219]:
def loss(a: NDArray[np.float64], y: NDArray[np.float64]):
    if(len(a) != len(y)):
        raise RuntimeError(f"Length of calculated ({len(a)}) != Length of y {len(y)}")
    
    return np.sum((a - y) ** 2)

## Classes

**Neuron Activation Function**
$$z = \mathbf{w} \cdot \mathbf{x} + b$$
$$a = \sigma(z)$$

In [None]:
class Neuron():
    activation_func: Callable[[float], float];
    w: NDArray[np.float64];
    b: float
    a: float
    z_stored: float
    
    prev_layer_activations: NDArray[np.float64]

    def __init__(self, activation_func: Callable[[float], float], input_size: int):
        self.activation_func = activation_func
        self.w = np.random.randn(input_size) * 0.1
        self.b = 0.0

    def z(self, a_input_vec: NDArray[np.float64]) -> float:
        self.z_stored = np.dot(self.w, a_input_vec) + self.b
        return self.z_stored
        
    def get_activation(self, a_input_vec: NDArray[np.float64]) -> float:
        self.prev_layer_activations = a_input_vec.copy();
        self.a = self.activation_func(self.z(a_input_vec)) 
        return self.a
    

In [None]:
class Layer():
    neurons: List[Neuron] = []
    input_size: int;
    activations: List[float] = []
    alpha = 0.1 # learning rate
    
    def __init__(self, width: int, activation_func: Callable[[float], float], input_size: int):
        self.neurons = []
        self.input_size = input_size

        for i in range(width):
            self.neurons.append(Neuron(activation_func, input_size))
    
    def get_activations(self, a: NDArray[np.float64]) -> NDArray[np.float64]:
        self.activations : List[float] = []
        for n in self.neurons:
            self.activations.append(n.get_activation(a))
    
        return np.array(self.activations)
    
    # Wrong approach to update weight + biases
    # Need to find delta to see how much error each neuron is responsible for
    def _update_weights_and_bias_by_y(self, y: NDArray[np.float64]):

        for j, n in enumerate(self.neurons):
            z = n.z_stored
            a_j = n.a # The activation value of the current neuron
            
            dL_aj = 2 * (a_j - y[j]) 
            daj_dzj = deriv_sigmoid(z) 

            for k in range(len(n.prev_layer_activations)):
                dzj_dwjk = n.prev_layer_activations[k]
                n.w[k] -= self.alpha * dL_aj * daj_dzj * dzj_dwjk


            n.b -= self.alpha * dL_aj * daj_dzj 

**Backpropagation Algorithm**

**Step 1: Output Layer Delta**
$$\delta_i^{(L)} = \frac{\partial L}{\partial a_i^{(L)}} \cdot \sigma'(z_i^{(L)})$$

**Step 2: Hidden Layer Delta (Backward Propagation)**
$$\delta_i^{(l)} = \sigma'(z_i^{(l)}) \cdot \sum_{j} w_{ji}^{(l+1)} \delta_j^{(l+1)}$$

**Step 3: Weight Updates**
$$w_{ij}^{(l)} = w_{ij}^{(l)} - \alpha \cdot \delta_j^{(l)} \cdot a_i^{(l-1)}$$
**Step 4: Bias Updates**
$$b_j^{(l)} = b_j^{(l)} - \alpha \cdot \delta_j^{(l)}$$


Where:
$$ \frac{\partial L}{\partial a_i^{(L)}} = 2(a_i^{(L)} - y_i)$$

$$\sigma'(z_i^{(l)}) = a_i^{(l)} \cdot (1 - a_i^{(l)})$$

**Notation:**
- $L$ = output layer
- $l$ = current layer index
- $\delta_i^{(l)}$ = error signal for neuron $i$ in layer $l$
- $w_{ij}^{(l)}$ = weight from neuron $i$ in layer $(l-1)$ to neuron $j$ in layer $l$
- $a_i^{(l)}$ = activation of neuron $i$ in layer $l$
- $\alpha$ = learning rate
- $y_i$ = target output

In [None]:
class NeuralNetwork():
    layers: List[Layer]
    alpha = 0.05
    
    def __init__(self, layers: List[Layer]):
        if len(layers) == 0:
            raise RuntimeError("Layers should not be empty")

        self.layers = layers;

    def evaluate(self, x_input: NDArray[np.float64]) -> NDArray[np.float64]:
        if(len(x_input) != self.layers[0].input_size):
            raise RuntimeError("Input size layer mismatch")
            
        # forward propogation
        a = x_input
        for layer in self.layers:
            a = layer.get_activations(a)
        
        return a
    
    def train(self, x_input: NDArray[np.float64], y: NDArray[np.float64]):
        # forward propagation
        self.evaluate(x_input)

        # backpropogation
        all_deltas = [[] for _ in range(len(self.layers))]


        output_deltas = []
        for L, neuron in enumerate(self.layers[-1].neurons):
            delta : float = 2 * (neuron.a - y[L]) * neuron.a * (1 - neuron.a)
            output_deltas.append(delta)

        all_deltas[-1] = output_deltas
         
        for L in reversed(range(len(self.layers) - 1)):
            for i, neuron in enumerate(self.layers[L].neurons):

                deriv_activation = neuron.a * (1 - neuron.a)

                summation : float = 0.0
                for j, next_neuron in enumerate(self.layers[L + 1].neurons):
                    summation += next_neuron.w[i] * all_deltas[L + 1][j]

                delta : float = deriv_activation * summation
                all_deltas[L].append(delta)

        # update weights
        for L in reversed(range(len(self.layers))):
            for i, neuron in enumerate(self.layers[L].neurons):
                for j in range(len(neuron.w)):
                    if L == 0:
                        prev_activation = x_input[j]
                    else:
                        prev_activation = self.layers[L - 1].neurons[j].a
                    neuron.w[j] -= self.alpha * all_deltas[L][i] * prev_activation
                    neuron.b -= self.alpha * all_deltas[L][i]
        
        
        

## Implementation

### Training Data

XOR Gate



| $\mathbf{x_1}$ | $\mathbf{x_2}$ | $\mathbf{y}$ |
|----------------|----------------|--------------|
|       0        |       0        |      0       |
|       0        |       1        |      1       |
|       1        |       0        |      1       |
|       1        |       1        |      0       |


In [224]:
# Inputs
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])

# Expected outputs (labels)
y = np.array([
    [0],
    [1],
    [1],
    [0]
])

**Neural Network Architecture (2-4-1)**
- **Input**: 2 features
- **Hidden Layer**: 4 neurons with sigmoid activation
- **Output Layer**: 1 neuron with sigmoid activation
- **Total Parameters**: (2×4 + 4) + (4×1 + 1) = 17 parameters

In [228]:
hidden_layer = Layer(width=4, activation_func=sigmoid, input_size=2)
output_layer = Layer(width=1, activation_func=sigmoid, input_size=4)

network = NeuralNetwork(layers=[hidden_layer, output_layer])

### Train

In [230]:
for epoch in range(20000):  # or 10_000
    for x, y_i in zip(X, y):
        network.train(np.array(x), np.array(y_i))
    
    if epoch % 1000 == 0:
        predictions = np.array([network.evaluate(x)[0] for x in X])
        total_loss = loss(predictions, y.flatten())
        print(f"Epoch {epoch} - Loss: {total_loss}")

Epoch 0 - Loss: 0.9998972827455818
Epoch 1000 - Loss: 0.9996693898807136
Epoch 2000 - Loss: 0.9990363424160935
Epoch 3000 - Loss: 0.9966779637195304
Epoch 4000 - Loss: 0.9804012119739234
Epoch 5000 - Loss: 0.7163377562734814
Epoch 6000 - Loss: 0.12903391010319082
Epoch 7000 - Loss: 0.04688575182005066
Epoch 8000 - Loss: 0.026603891722236865
Epoch 9000 - Loss: 0.018123502970270215
Epoch 10000 - Loss: 0.013589689371613808
Epoch 11000 - Loss: 0.010803540554730193
Epoch 12000 - Loss: 0.008931484330769464
Epoch 13000 - Loss: 0.007593201939469047
Epoch 14000 - Loss: 0.006591992890099658
Epoch 15000 - Loss: 0.005816467837993307
Epoch 16000 - Loss: 0.005199061927128873
Epoch 17000 - Loss: 0.004696523485616833
Epoch 18000 - Loss: 0.004279936365207648
Epoch 19000 - Loss: 0.0039292652661560455


A *beautiful* **Loss of 0.0039** 🤩🤩🤩🤩🤩

### Evaluate

In [231]:
for i, x_input in enumerate(X):
    output = network.evaluate(x_input)
    prediction = 1 if output[0] > 0.5 else 0
    print(f"[{i}] Input: {x_input} -> Output: {output[0]:.4f} | Prediction: {prediction} | Actual: {y[i][0]}")


[0] Input: [0 0] -> Output: 0.0254 | Prediction: 0 | Actual: 0
[1] Input: [0 1] -> Output: 0.9704 | Prediction: 1 | Actual: 1
[2] Input: [1 0] -> Output: 0.9701 | Prediction: 1 | Actual: 1
[3] Input: [1 1] -> Output: 0.0349 | Prediction: 0 | Actual: 0


And as you can see, my neural network learned. Im so proud 🥲