**What is a Dense Layer?**  
A dense (or fully connected) layer connects every input node to every output node.  

- **Input** \( x \): A batch of data of shape \( (N, D_\text{in}) \).  
- **Weights** \( W \): Learnable parameters of shape \( (D_\text{in}, D_\text{out}) \).  
- **Biases** \( b \): Learnable parameters of shape \( (1, D_\text{out}) \).  
- **Output** \( z \): \( z = x W + b \)
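
As a quick shape check on the definitions above, here is a minimal NumPy sketch. The sizes and values are arbitrary illustrative choices, not taken from the notebook.

```python
import numpy as np

# Illustrative sizes (assumed): N = 2 samples, D_in = 3, D_out = 2
x = np.array([[1.0, 2.0, 3.0],
              [0.0, -1.0, 1.0]])   # shape (N, D_in) = (2, 3)
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])         # shape (D_in, D_out) = (3, 2)
b = np.array([[0.01, 0.02]])       # shape (1, D_out), broadcast over the batch

z = x @ W + b                      # shape (N, D_out) = (2, 2)
print(z.shape)
print(z)
```

The bias row is added to every sample through broadcasting, which is why it is stored with shape \( (1, D_\text{out}) \) rather than \( (N, D_\text{out}) \).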

**Why?**  
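The linear transformation \( z = xW + b \) lets each output unit learn its own weighted combination of all inputs. On its own it can only represent linear relationships; stacking dense layers with non-linear activations between them is what allows the network to learn complex patterns.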


In [None]:
# Dense Layer
import numpy as np


class Layer_Dense:
    def __init__(self, n_inputs, n_neurons,
                 weight_regularizer_l1=0, weight_regularizer_l2=0,
                 bias_regularizer_l1=0, bias_regularizer_l2=0):

        # Small random weights of shape (n_inputs, n_neurons); biases start at zero
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
        # Regularization strengths; a value of 0 disables that penalty
        self.weight_regularizer_l1 = weight_regularizer_l1
        self.weight_regularizer_l2 = weight_regularizer_l2
        self.bias_regularizer_l1 = bias_regularizer_l1
        self.bias_regularizer_l2 = bias_regularizer_l2

    def forward(self, inputs):
        # Cache the inputs for the backward pass, then compute z = xW + b
        self.inputs = inputs
        self.output = np.dot(inputs, self.weights) + self.biases

    def backward(self, dvalues):
        # Gradients of the loss with respect to the parameters
        self.dweights = np.dot(self.inputs.T, dvalues)
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)

        # L1 penalty gradient: lambda * sign(w), using +1 where w == 0
        if self.weight_regularizer_l1 > 0:
            dL1 = np.ones_like(self.weights)
            dL1[self.weights < 0] = -1
            self.dweights += self.weight_regularizer_l1 * dL1
        # L2 penalty gradient: 2 * lambda * w
        if self.weight_regularizer_l2 > 0:
            self.dweights += 2 * self.weight_regularizer_l2 * self.weights
        if self.bias_regularizer_l1 > 0:
            dL1 = np.ones_like(self.biases)
            dL1[self.biases < 0] = -1
            self.dbiases += self.bias_regularizer_l1 * dL1
        if self.bias_regularizer_l2 > 0:
            self.dbiases += 2 * self.bias_regularizer_l2 * self.biases

        # Gradient with respect to the inputs, passed back to the previous layer
        self.dinputs = np.dot(dvalues, self.weights.T)
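
One way to sanity-check the backward-pass formulas above is to compare them against finite differences. The sketch below is standalone and only mirrors the formulas. The scalar loss, the helper name `loss_fn`, the L2 strength `lam`, and all sizes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))        # batch of 4 samples, 3 features
W = rng.standard_normal((3, 2))
b = np.zeros((1, 2))
dvalues = rng.standard_normal((4, 2))  # stand-in for the upstream gradient dL/dz
lam = 0.1                              # hypothetical L2 strength

def loss_fn(W_):
    # Scalar loss whose gradient w.r.t. z equals `dvalues`, plus an L2 penalty on W
    z = x @ W_ + b
    return np.sum(z * dvalues) + lam * np.sum(W_ ** 2)

# Analytic gradients, using the same formulas as Layer_Dense.backward
dweights = x.T @ dvalues + 2 * lam * W
dbiases = np.sum(dvalues, axis=0, keepdims=True)
dinputs = dvalues @ W.T

# Central finite differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        numeric[i, j] = (loss_fn(W_plus) - loss_fn(W_minus)) / (2 * eps)

# Largest disagreement between numeric and analytic gradients (should be tiny)
print(np.max(np.abs(numeric - dweights)))
```

Note that `dweights` has the shape of `W`, `dbiases` the shape of `b`, and `dinputs` the shape of `x`, so each gradient can be applied directly to the array it belongs to.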


**What is Dropout?**  
Dropout randomly "switches off" a fraction \( p \) of units on each training step.

- **Why?** To prevent overfitting and encourage the network to learn redundant, robust patterns.
- The kept units are scaled by \( \frac{1}{1-p} \) so that the expected value of each output stays the same as it would be without dropout.
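
The effect of the \( \frac{1}{1-p} \) scaling can be seen with a small experiment. This is a minimal sketch; the drop fraction, array size, and random seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.2                               # fraction of units to drop (assumed)
keep = 1 - p
activations = np.ones((1000, 100))    # constant input makes the mean easy to read

# Bernoulli mask: 1 with probability `keep`, then rescaled by 1 / keep
mask = rng.binomial(1, keep, size=activations.shape) / keep
dropped = activations * mask

print(activations.mean())             # mean before dropout
print(dropped.mean())                 # close to the mean before dropout
print((dropped == 0).mean())          # close to p
```

Individual outputs are either zero or scaled up, but averaged over many units the mean stays close to the original, so later layers see activations on the same scale during training as they would without dropout.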


In [None]:
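# Dropout Layer
class Layer_Dropout:
    def __init__(self, rate):
        # `rate` is the fraction of units to drop; store the keep probability 1 - rate
        self.rate = 1 - rate

    def forward(self, inputs):
        self.inputs = inputs
        # Bernoulli mask (1 = keep), pre-divided by the keep probability
        # so the expected output matches the input
        self.binary_mask = np.random.binomial(1, self.rate, size=inputs.shape) / self.rate
        self.output = inputs * self.binary_mask

    def backward(self, dvalues):
        # Gradient flows only through the kept units, with the same scaling
        self.dinputs = dvalues * self.binary_mask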


**ReLU (Rectified Linear Unit):**  
\( f(x) = \max(0, x) \)  
Adds non-linearity, allowing the network to model complex relationships.

**Sigmoid:**  
\( \sigma(x) = \frac{1}{1 + e^{-x}} \)  
Squashes each value into the range \( (0, 1) \), which suits the output of a binary classifier.

**Softmax:**  
\( f_i(x) = \frac{e^{x_i}}{\sum_j e^{x_j}} \)  
Converts a vector of scores into probabilities that sum to 1, for multi-class classification.
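
To make the three formulas concrete, the sketch below evaluates each of them on one small vector. The input values are an arbitrary illustrative choice.

```python
import numpy as np

x = np.array([-2.0, 0.0, 3.0])

# ReLU: negative values become 0, positive values pass through
relu = np.maximum(0, x)

# Sigmoid: each value is mapped independently into (0, 1)
sigmoid = 1 / (1 + np.exp(-x))

# Softmax: subtracting the max leaves the result unchanged
# but avoids overflow in exp for large inputs
exp_shifted = np.exp(x - np.max(x))
softmax = exp_shifted / np.sum(exp_shifted)

print(relu)
print(sigmoid)
print(softmax, softmax.sum())
```

Sigmoid treats every element on its own, while softmax normalizes across the whole vector, so its outputs always sum to 1 and the largest input receives the largest probability.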


In [None]:
# ReLU
class Activation_ReLU:
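    def forward(self, inputs):
        # Cache the inputs; the backward pass needs to know which were positive
        self.inputs = inputs
        self.output = np.maximum(0, inputs)

    def backward(self, dvalues):
        # Derivative is 1 where the input was positive and 0 elsewhere
        self.dinputs = dvalues.copy()
        self.dinputs[self.inputs <= 0] = 0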

# Sigmoid
class Activation_Sigmoid:
    def forward(self, inputs):
        self.inputs = inputs
        self.output = 1 / (1 + np.exp(-inputs))

    def backward(self, dvalues):
        # Sigmoid derivative expressed through its output: sigma * (1 - sigma)
        self.dinputs = dvalues * (1 - self.output) * self.output

# Softmax
class Activation_Softmax:
    def forward(self, inputs):
        self.inputs = inputs
        # Subtract the row-wise max before exponentiating for numerical stability
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize each row so it sums to 1
        self.output = exp_values / np.sum(exp_values, axis=1, keepdims=True)

    def backward(self, dvalues):
        self.dinputs = np.empty_like(dvalues)
        # Each sample has its own Jacobian: diag(s) - s s^T
        for index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):
            single_output = single_output.reshape(-1, 1)
            jacobian_matrix = np.diagflat(single_output) - np.dot(single_output, single_output.T)
            # Chain rule: Jacobian times the upstream gradient for this sample
            self.dinputs[index] = np.dot(jacobian_matrix, single_dvalues)
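
The per-sample loop above builds a full Jacobian for every row. Because that Jacobian is \( \operatorname{diag}(s) - s s^\top \), its product with a gradient vector \( g \) simplifies to \( s \odot (g - s \cdot g) \), which can be computed for the whole batch at once. The sketch below compares the two on random data. The function names are hypothetical, and the vectorized form is offered as an equivalent alternative, not as the notebook's implementation.

```python
import numpy as np

def softmax(inputs):
    exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
    return exp_values / np.sum(exp_values, axis=1, keepdims=True)

def backward_loop(output, dvalues):
    # Same computation as Activation_Softmax.backward: one Jacobian per sample
    dinputs = np.empty_like(dvalues)
    for index, (s, g) in enumerate(zip(output, dvalues)):
        s = s.reshape(-1, 1)
        jacobian = np.diagflat(s) - np.dot(s, s.T)
        dinputs[index] = np.dot(jacobian, g)
    return dinputs

def backward_vectorized(output, dvalues):
    # (diag(s) - s s^T) g = s * (g - sum(s * g)), applied to every row via broadcasting
    dot = np.sum(output * dvalues, axis=1, keepdims=True)
    return output * (dvalues - dot)

rng = np.random.default_rng(0)
out = softmax(rng.standard_normal((5, 4)))
grad = rng.standard_normal((5, 4))

# True if both formulations agree up to floating-point tolerance
print(np.allclose(backward_loop(out, grad), backward_vectorized(out, grad)))
```

The vectorized version avoids allocating a \( D \times D \) matrix per sample, which matters mostly when the number of classes or the batch size is large.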
