### 1. What Is a Neural Network?

A **neural network** is a computational model inspired by the structure of the brain. It is designed to approximate complex functions that map input data to output labels or values.

At its core, it applies a sequence of matrix operations and non-linear activation functions that transform inputs through layers of neurons, using trainable parameters called **weights** and **biases**.

The **goal of training** a neural network is to find weights and biases that minimize the **loss function**, a quantity that measures the difference between predicted and actual outputs.

A basic feedforward neural network can be represented as:

$$
\hat{y} = f(X) = A_L = \phi_L(A_{L-1} W_L + b_L), \quad A_0 = X
$$

Where:
- $X$: input data
- $W_i$, $b_i$: weights and biases of layer $i$
- $\phi_i$: activation function at layer $i$
- $A_L$: final output of the network
- $\hat{y}$: predicted output
- $y$: true label
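
The layer formula can be read as a loop that feeds each layer's output into the next. Here is a minimal NumPy sketch, assuming one sample per row (so each layer computes $A_{l-1} W_l + b_l$). The `feedforward` helper, the layer sizes, and the choice of ReLU and sigmoid activations are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(X, params, activations):
    """Apply A_l = phi_l(A_{l-1} W_l + b_l) layer by layer, starting from A_0 = X."""
    A = X
    for (W, b), phi in zip(params, activations):
        A = phi(A @ W + b)
    return A

rng = np.random.default_rng(0)
# Hypothetical sizes: 2 inputs -> 4 hidden units -> 1 output
params = [
    (0.01 * rng.standard_normal((2, 4)), np.zeros((1, 4))),
    (0.01 * rng.standard_normal((4, 1)), np.zeros((1, 1))),
]
X_demo = rng.standard_normal((5, 2))
y_hat = feedforward(X_demo, params, [relu, sigmoid])
print(y_hat.shape)  # (5, 1)
```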

The training involves two key phases:
1. **Forward pass**: compute output predictions $\hat{y}$.
2. **Backward pass**: compute gradients of the loss $\mathcal{L}$ and update weights to reduce this loss.



### 2. Dataset — Spiral Classification

We use a synthetic dataset called the **spiral dataset**, where each class lies along a spiral-shaped trajectory in 2D space.

The two classes are **not linearly separable**, so the dataset is a good test of whether a neural network can learn a non-linear decision boundary.

Each data point:
- $X_i \in \mathbb{R}^2$: a 2D feature vector
- $y_i \in \{0, 1\}$: its binary class label

We generate 100 samples per class.
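
To make the shape of the data concrete, here is a sketch of one way a spiral dataset of this kind can be generated with plain NumPy. The `make_spiral` function, its `noise` and `seed` parameters, and the specific angle constants are assumptions for illustration. This is not presented as the exact implementation behind `nnfs.datasets.spiral_data`.

```python
import numpy as np

def make_spiral(samples, classes, noise=0.2, seed=0):
    """Return X of shape (samples * classes, 2) and integer labels y."""
    rng = np.random.default_rng(seed)
    X = np.zeros((samples * classes, 2))
    y = np.zeros(samples * classes, dtype=np.uint8)
    for c in range(classes):
        idx = slice(samples * c, samples * (c + 1))
        r = np.linspace(0.0, 1.0, samples)            # radius grows from the origin outward
        t = np.linspace(c * 4, (c + 1) * 4, samples)  # angle range, offset per class
        t = t + noise * rng.standard_normal(samples)  # jitter so the arms are not perfect curves
        X[idx] = np.column_stack((r * np.sin(2.5 * t), r * np.cos(2.5 * t)))
        y[idx] = c
    return X, y

X_s, y_s = make_spiral(samples=100, classes=2)
print(X_s.shape, y_s.shape)  # (200, 2) (200,)
```

With 100 samples per class and 2 classes, the feature matrix has 200 rows, one per point.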




In [None]:
import numpy as np
import nnfs
from nnfs.datasets import spiral_data
nnfs.init()

# Create dataset
X, y = spiral_data(samples=100, classes=2)
y = y.reshape(-1, 1)  # column vector of shape (200, 1): 100 samples x 2 classes


### 3. What Is a Dense Layer? (Weights and Biases)

A **layer** is a block of a neural network that transforms input data using learnable parameters. The most common is the **dense layer** (fully connected layer).

Each neuron in a layer performs:

$$
z_i = \sum_{j=1}^n x_j w_{ji} + b_i = \vec{x} \cdot \vec{w}_i + b_i
$$

Where:
- $x_j$: input features
- $w_{ji}$: weight for feature $j$ for neuron $i$
- $b_i$: bias for neuron $i$
- $z_i$: output (logit) for neuron $i$

For a batch of inputs $X \in \mathbb{R}^{m \times n}$:
$$
Z = XW + b
$$

Where:
- $W \in \mathbb{R}^{n \times h}$: weights matrix
- $b \in \mathbb{R}^{1 \times h}$: biases vector
- $Z \in \mathbb{R}^{m \times h}$: output of layer
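
A small shape check may help connect the per-neuron sum to the batch form. The sizes and variable names below are arbitrary and chosen only for illustration. The sketch shows that entry `Z[k, i]` equals the dot product of sample `k` with the weight column of neuron `i`, plus that neuron's bias.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, h = 4, 2, 3                      # batch size, input features, neurons
X_b = rng.standard_normal((m, n))
W = rng.standard_normal((n, h))
b = np.zeros((1, h))

Z = X_b @ W + b                        # b is broadcast across the m rows

# Entry Z[k, i] is neuron i applied to sample k: x_k . w_i + b_i
k, i = 1, 2
z_single = np.dot(X_b[k], W[:, i]) + b[0, i]
print(Z.shape)                         # (4, 3)
print(np.isclose(Z[k, i], z_single))   # True
```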

**Why weights and biases?**
- **Weights ($W$)** scale input features; they define what the neuron is sensitive to.
- **Biases ($b$)** allow the activation to shift left/right, enabling better fitting of data.

These parameters are **learned** during training to reduce the prediction error (loss).





In [None]:
class Layer_Dense:
    def __init__(self, n_inputs, n_neurons,
                 weight_regularizer_l1=0, weight_regularizer_l2=0,
                 bias_regularizer_l1=0, bias_regularizer_l2=0):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
        self.weight_regularizer_l1 = weight_regularizer_l1
        self.weight_regularizer_l2 = weight_regularizer_l2
        self.bias_regularizer_l1 = bias_regularizer_l1
        self.bias_regularizer_l2 = bias_regularizer_l2

    def forward(self, inputs):
        self.inputs = inputs
        self.output = np.dot(inputs, self.weights) + self.biases


### 4. Forward Pass — Making Predictions

The **forward pass** refers to computing the output of the neural network from input to output layer.

Each layer computes:

$$
Z = XW + b, \quad A = \phi(Z)
$$

Where $\phi$ is the activation function (e.g. ReLU, Sigmoid, etc.).

The result of the forward pass is a prediction $\hat{y}$. We compare this prediction with the ground truth $y$ using a **loss function**.
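
This section does not name a specific loss, so the following is only a sketch of one common choice for binary labels: a sigmoid output followed by binary cross-entropy. The function names, the `eps` clipping constant, and the example values are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_pred, y_true, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(per_sample))

y_true = np.array([[0.0], [1.0], [1.0]])
logits = np.array([[-2.0], [3.0], [0.0]])   # hypothetical network outputs before the sigmoid
y_hat = sigmoid(logits)
loss = binary_cross_entropy(y_hat, y_true)
print(loss)
```

The loss is small when confident predictions agree with the labels and grows as they disagree.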


In [None]:
dense1 = Layer_Dense(2, 64, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4)
dense1.forward(X)


### 5. What Is a Gradient? (Backpropagation)

A **gradient** is the vector of partial derivatives of a function with respect to its inputs. Each component tells us how sensitive the output is to a small change in that input.

During training, we want to minimize the **loss function** $\mathcal{L}(\hat{y}, y)$.

To do this, we compute:

$$
\frac{\partial \mathcal{L}}{\partial W}, \quad
\frac{\partial \mathcal{L}}{\partial b}
$$

These gradients tell us how to **tweak weights and biases** to reduce the error.

Computing these gradients layer by layer with the chain rule, from the output back toward the input, is called **backpropagation**.
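
Once a gradient is available, the simplest way to use it is a gradient-descent step: move the parameter a small distance against its gradient. This sketch uses a single scalar parameter and a toy loss $(w - 3)^2$. The `learning_rate` value and the loss itself are illustrative assumptions, not the notebook's optimizer.

```python
# Toy problem: minimize L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3)
w = 0.0
learning_rate = 0.1   # hypothetical step size

for step in range(50):
    grad = 2.0 * (w - 3.0)        # gradient of the loss at the current w
    w -= learning_rate * grad     # step against the gradient

print(w)  # close to 3.0, the minimizer of the toy loss
```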

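For a dense layer $Z = XW + b$, let $\delta = \partial \mathcal{L} / \partial Z$ be the gradient arriving from the next layer. The gradients are:

$$
\frac{\partial \mathcal{L}}{\partial W} = X^\top \delta, \quad
\frac{\partial \mathcal{L}}{\partial b} = \sum_{k=1}^{m} \delta_k, \quad
\frac{\partial \mathcal{L}}{\partial X} = \delta W^\top
$$

Here $\delta_k$ is the $k$-th row of $\delta$, so the bias gradient sums over the batch. The last term is the gradient passed back to the previous layer.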
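
The dense-layer gradient formulas can be sanity-checked with finite differences. The sketch below takes $\delta$ to be the gradient of the loss with respect to $Z$ and uses a toy loss $\tfrac{1}{2}\sum Z^2$, for which $\delta = Z$. The toy loss, the array sizes, and the helper `loss_fn` are assumptions chosen only to make the check self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
X_c = rng.standard_normal((5, 2))
W = rng.standard_normal((2, 3))
b = rng.standard_normal((1, 3))

def loss_fn(W, b):
    Z = X_c @ W + b
    return 0.5 * np.sum(Z ** 2)              # toy loss, so dL/dZ = Z

delta = X_c @ W + b                           # upstream gradient dL/dZ
dW = X_c.T @ delta                            # analytic dL/dW
db = np.sum(delta, axis=0, keepdims=True)     # analytic dL/db

# Central finite differences on one weight and one bias
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 1] += eps
W_minus[0, 1] -= eps
dW_num = (loss_fn(W_plus, b) - loss_fn(W_minus, b)) / (2 * eps)

b_plus, b_minus = b.copy(), b.copy()
b_plus[0, 2] += eps
b_minus[0, 2] -= eps
db_num = (loss_fn(W, b_plus) - loss_fn(W, b_minus)) / (2 * eps)

print(np.isclose(dW[0, 1], dW_num, atol=1e-4))
print(np.isclose(db[0, 2], db_num, atol=1e-4))
```

If the analytic and numerical values disagree, the usual culprits are a missing transpose or summing over the wrong axis.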


In [None]:
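# Backward pass for Layer_Dense, defined here and attached to the class below
def backward(self, dvalues):
    # Gradients of the loss with respect to the parameters
    self.dweights = np.dot(self.inputs.T, dvalues)
    self.dbiases = np.sum(dvalues, axis=0, keepdims=True)

    # Regularization terms
    # L1 penalty lambda * sum(|w|) has gradient lambda * sign(w); zeros are treated as +1 here
    if self.weight_regularizer_l1 > 0:
        dL1 = np.ones_like(self.weights)
        dL1[self.weights < 0] = -1
        self.dweights += self.weight_regularizer_l1 * dL1
    # L2 penalty lambda * sum(w^2) has gradient 2 * lambda * w
    if self.weight_regularizer_l2 > 0:
        self.dweights += 2 * self.weight_regularizer_l2 * self.weights

    # Same penalties applied to the biases
    if self.bias_regularizer_l1 > 0:
        dL1 = np.ones_like(self.biases)
        dL1[self.biases < 0] = -1
        self.dbiases += self.bias_regularizer_l1 * dL1
    if self.bias_regularizer_l2 > 0:
        self.dbiases += 2 * self.bias_regularizer_l2 * self.biases

    # Gradient with respect to the layer inputs, passed to the previous layer
    self.dinputs = np.dot(dvalues, self.weights.T)


Layer_Dense.backward = backward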


### 6. ReLU Activation Function

ReLU (Rectified Linear Unit) introduces **non-linearity** into the network:

$$
f(x) = \max(0, x)
$$

Why ReLU?
- Keeps positive values
- Sets negatives to zero
- Fast and efficient
- Mitigates vanishing gradients, since its derivative is exactly 1 for positive inputs

Its derivative is simple. ReLU is not differentiable at $x = 0$, so by convention the derivative there is taken to be 0:

$$
f'(x) =
\begin{cases}
1 & x > 0 \\
0 & x \le 0
\end{cases}
$$

This is used in the backward pass to propagate gradients only through active neurons.
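
A tiny numeric example shows the masking effect. The input values and the all-ones upstream gradient are made up for illustration.

```python
import numpy as np

z = np.array([[-1.5, 0.0, 2.0],
              [ 3.0, -0.5, 1.0]])
upstream = np.ones_like(z)     # pretend the gradient arriving from the next layer is 1 everywhere

a = np.maximum(0, z)           # forward: negatives become 0
dz = upstream * (z > 0)        # backward: gradient passes only where z > 0

print(a)
print(dz)
```

Entries where `z` is negative or zero receive no gradient, so only the active neurons contribute to the update.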


In [None]:
class Activation_ReLU:
    def forward(self, inputs):
        self.inputs = inputs
        self.output = np.maximum(0, inputs)

    def backward(self, dvalues):
        self.dinputs = dvalues.copy()
        self.dinputs[self.inputs <= 0] = 0
