### Importing Required Libraries

We start by importing the necessary Python libraries for numerical computation, data visualization, and data handling:

- **NumPy** – for numerical operations  
- **Matplotlib** – for creating visualizations  
- **Pandas** – for data manipulation and analysis  

We also define the mathematical constant **Euler's Number (E)** for later use.


In [None]:
import numpy as np
# import nnfs
# from nnfs.datasets import spiral_data
import matplotlib.pyplot as plt
import pandas as pd
# nnfs.init()
E = np.e  # Euler's number, about 2.71828



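- Creates a `Layer_Dense` class, a fully connected layer for a neural network.
- Initializes the weights to small random values (scaled by 0.01) and the biases to zero.
- The forward pass computes `inputs · weights + biases`; the backward pass computes the gradients with respect to the weights, biases, and inputs.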

In [None]:
class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.inputs = inputs
        self.output = np.dot(inputs, self.weights) + self.biases

    def backward(self, dvalues):
        self.dweights = np.dot(self.inputs.T, dvalues)
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
        self.dinputs = np.dot(dvalues, self.weights.T)
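
As a rough sanity check of the backward formulas above, the sketch below repeats the same three gradient expressions on plain NumPy arrays and compares one weight gradient against a finite-difference estimate. The array sizes, the seed, and the "sum of all outputs" loss are assumptions chosen only for illustration; they are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 samples, 3 input features, 2 neurons.
inputs = rng.standard_normal((4, 3))
weights = 0.01 * rng.standard_normal((3, 2))
biases = np.zeros((1, 2))

# Forward pass: one row of outputs per sample.
output = np.dot(inputs, weights) + biases

# Assume the loss is the sum of all outputs, so dL/d(output) is all ones.
dvalues = np.ones_like(output)

# Analytic gradients, using the same formulas as Layer_Dense.backward.
dweights = np.dot(inputs.T, dvalues)
dbiases = np.sum(dvalues, axis=0, keepdims=True)
dinputs = np.dot(dvalues, weights.T)

# Finite-difference estimate of dL/dW[0, 0] for comparison.
eps = 1e-6
bumped = weights.copy()
bumped[0, 0] += eps
numeric = (np.sum(np.dot(inputs, bumped) + biases) - np.sum(output)) / eps

# Each gradient has the same shape as the quantity it belongs to.
print(output.shape, dweights.shape, dbiases.shape, dinputs.shape)
print(np.isclose(dweights[0, 0], numeric))
```

Each gradient has the same shape as the array it refers to, which is a quick way to catch a misplaced transpose.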


Defines two activation functions for neural networks:

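* ReLU: passes positive values through unchanged and sets negative values to zero; the backward pass zeroes the gradient wherever the input was less than or equal to zero.
* Softmax: converts each row of outputs into a probability distribution; the backward pass multiplies each sample's Jacobian matrix by its upstream gradient.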
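
To make the two behaviours above concrete, here is a minimal sketch on a small hand-written array. The input values and the all-ones upstream gradient are assumptions picked for illustration.

```python
import numpy as np

# Hypothetical pre-activation values: 2 samples, 3 neurons.
z = np.array([[1.0, -2.0, 0.0],
              [3.0, 0.5, -1.0]])

# ReLU forward: negatives become 0, positives pass through.
relu_out = np.maximum(0, z)

# ReLU backward: the gradient passes only where the input was > 0.
upstream = np.ones_like(z)
relu_grad = upstream.copy()
relu_grad[z <= 0] = 0

# Softmax forward, subtracting each row's max for numerical stability.
exp_values = np.exp(z - np.max(z, axis=1, keepdims=True))
softmax_out = exp_values / np.sum(exp_values, axis=1, keepdims=True)

print(relu_out)
print(relu_grad)
print(softmax_out.sum(axis=1))  # each row sums to 1, up to rounding
```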


In [None]:
class Activation_ReLU:
    def forward(self, inputs):
        self.inputs = inputs
        self.output = np.maximum(0, inputs)

    def backward(self, dvalues):
        # Copy first so the upstream gradient array is not modified in place
        self.dinputs = dvalues.copy()
        # No gradient flows where the input was zero or negative
        self.dinputs[self.inputs <= 0] = 0


class Activation_Softmax:
    def forward(self, inputs):
        # Subtract each row's max before exponentiating to avoid overflow
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)

        self.output = probabilities

    def backward(self, dvalues):
        self.dinputs = np.empty_like(dvalues)

        for index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):
            single_output = single_output.reshape(-1, 1)
            # Jacobian of softmax for one sample: diag(s) - s s^T
            jacobian_matrix = np.diagflat(single_output) - np.dot(single_output, single_output.T)
            self.dinputs[index] = np.dot(jacobian_matrix, single_dvalues)
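
The per-sample Jacobian loop in the softmax backward pass can also be written without building any matrix, since `(diag(s) - s sᵀ) · dv` equals `s * dv - s * (s · dv)`. The sketch below checks that the two forms agree on random data; the shapes and seed are assumptions, and the vectorized form is an alternative shown for comparison, not what the notebook uses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical batch: 5 samples, 3 classes.
logits = rng.standard_normal((5, 3))
dvalues = rng.standard_normal((5, 3))

# Softmax outputs, one probability row per sample.
exp_values = np.exp(logits - np.max(logits, axis=1, keepdims=True))
s = exp_values / np.sum(exp_values, axis=1, keepdims=True)

# Loop form: build each sample's Jacobian and multiply by its gradient.
dinputs_loop = np.empty_like(dvalues)
for i, (s_i, dv_i) in enumerate(zip(s, dvalues)):
    col = s_i.reshape(-1, 1)
    jacobian = np.diagflat(col) - np.dot(col, col.T)
    dinputs_loop[i] = np.dot(jacobian, dv_i)

# Vectorized form: s * dv - s * (s . dv), computed row by row via broadcasting.
dinputs_vec = s * (dvalues - np.sum(s * dvalues, axis=1, keepdims=True))

print(np.allclose(dinputs_loop, dinputs_vec))
```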



The base `Loss` class provides a `Calculate` method.
It calls the subclass's `forward` to compute per-sample losses, then returns their mean as the final loss value.


In [None]:
class Loss:
    def Calculate(self, output, y):
        sample_losses = self.forward(output, y)
        data_loss = np.mean(sample_losses)

        return data_loss


Categorical Cross-Entropy loss implementation.
In `forward`, it clips predictions to avoid log(0), selects the correct class probabilities, and returns the negative log-likelihoods.
In `backward`, it converts class indices to one-hot if needed and computes the gradient with respect to predictions.


In [None]:
class Loss_CategoricalCrossentropy(Loss):
    def forward(self, y_pred, y_true):
        samples = len(y_pred)
        # Clip so log() never sees exactly 0 (and values never reach exactly 1)
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # Sparse labels: one class index per sample
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]

        # One-hot labels: mask out everything but the true class
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(
                y_pred_clipped * y_true,
                axis=1
            )
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods

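    def backward(self, dvalues, y_true):
        samples = len(dvalues)
        labels = len(dvalues[0])

        # Convert sparse class indices to one-hot vectors
        if len(y_true.shape) == 1:
            y_true = np.eye(labels)[y_true]
        # Gradient of -log(p) with respect to the predictions
        self.dinputs = -y_true / dvalues
        # Normalize by the number of samples, matching the mean in Loss.Calculate
        self.dinputs = self.dinputs / samples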
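
The forward pass accepts either sparse class indices or one-hot labels, and clipping keeps the loss finite even when the predicted probability of the true class is exactly zero. The following sketch illustrates both points with plain NumPy; the prediction and label values are made up for the example.

```python
import numpy as np

# Hypothetical predictions for 3 samples over 3 classes.
# The last row puts probability 0 on its true class (class 0).
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.0, 0.0, 1.0]])
y_sparse = np.array([0, 1, 0])
y_onehot = np.eye(3)[y_sparse]

clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

# Sparse labels: pick the true-class probability by index.
loss_sparse = -np.log(clipped[range(len(clipped)), y_sparse])

# One-hot labels: multiply and sum to select the same probability.
loss_onehot = -np.log(np.sum(clipped * y_onehot, axis=1))

# Without clipping, the third sample's loss would be infinite.
with np.errstate(divide="ignore"):
    unclipped_loss = -np.log(y_pred[2, 0])

print(np.allclose(loss_sparse, loss_onehot))
print(loss_sparse[2])    # -log(1e-7), about 16.12
print(unclipped_loss)    # inf
```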


Combined softmax activation and categorical cross-entropy loss for efficiency.
`forward` runs softmax, stores the probabilities, and returns the mean loss.
`backward` uses the simplified gradient of softmax followed by CCE, which is (predicted probabilities - one-hot targets) / number of samples, so the full Jacobian is never built.


In [None]:
class Activation_Softmax_Loss_CategoricalCrossentropy:
    def __init__(self):
        self.activation = Activation_Softmax()
        self.loss = Loss_CategoricalCrossentropy()

    def forward(self, inputs, y_true):
        self.activation.forward(inputs)
        self.output = self.activation.output
        return self.loss.Calculate(self.output, y_true)

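    def backward(self, dvalues, y_true):
        # dvalues is expected to hold the softmax outputs (predicted probabilities)
        samples = len(dvalues)

        # Convert one-hot labels to class indices
        if len(y_true.shape) == 2:
            y_true = np.argmax(y_true, axis=1)

        # Gradient is (predictions - one-hot targets) / samples
        self.dinputs = dvalues.copy()
        self.dinputs[range(samples), y_true] -= 1
        self.dinputs = self.dinputs / samples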
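
One way to convince yourself that the shortcut is equivalent to the two separate backward passes is to compute the gradient both ways and compare. The sketch below does this on random logits with plain NumPy; the shapes, seed, and labels are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical batch: 4 samples, 3 classes, sparse labels.
logits = rng.standard_normal((4, 3))
y_true = np.array([0, 2, 1, 1])
samples = len(logits)

# Softmax outputs.
exp_values = np.exp(logits - np.max(logits, axis=1, keepdims=True))
s = exp_values / np.sum(exp_values, axis=1, keepdims=True)

# Route 1: cross-entropy gradient, then the per-sample softmax Jacobian.
y_onehot = np.eye(3)[y_true]
dloss = -y_onehot / s / samples
chained = np.empty_like(dloss)
for i, (s_i, d_i) in enumerate(zip(s, dloss)):
    col = s_i.reshape(-1, 1)
    jacobian = np.diagflat(col) - np.dot(col, col.T)
    chained[i] = np.dot(jacobian, d_i)

# Route 2: the combined shortcut, (s - one_hot) / samples.
shortcut = s.copy()
shortcut[range(samples), y_true] -= 1
shortcut = shortcut / samples

print(np.allclose(chained, shortcut))
```

Both routes give the same gradient, but the shortcut needs only a copy, an indexed subtraction, and a division.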