# Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) derive their name from the convolution operation that occurs within them. This type of neural network builds upon the Multi-Layer Perceptron (MLP) model. 

## Why CNNs?
A key limitation of MLPs is that they operate on flat, one-dimensional input vectors. An image, for example, must be flattened into a one-dimensional array before an MLP can process it, and this flattening discards the spatial relationships between neighbouring pixels. Most real-world data, such as images and audio, is naturally multi-dimensional.

### Example:
- Images are represented as matrices (e.g., a 1020x720 image has 1020 rows and 720 columns of pixels).
- Each pixel has an RGB value, adding a third dimension to the data.

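Flattening such data also makes fully connected layers expensive: every input value needs its own weight to every neuron in the first hidden layer, so the parameter count grows quickly with image size. CNNs address these challenges by preserving the spatial structure of the data and reusing a small set of kernel weights across the whole input.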
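
To make the cost concrete, the sketch below counts the weights that one fully connected layer would need on the flattened image from the example, and compares that with a small convolutional layer. The hidden width of 128 and the sixteen 3x3 filters are illustrative assumptions, not values taken from this notebook.

```python
import numpy as np

# Image from the example above: 1020 rows x 720 columns x 3 colour channels.
image = np.zeros((1020, 720, 3), dtype=np.float32)

# An MLP needs the image flattened into a single 1D vector.
flat = image.reshape(-1)

# Assumed layer sizes, chosen only for illustration.
hidden_units = 128
kernel_size, num_filters = 3, 16

# Dense layer: one weight per (input value, hidden unit) pair, plus one bias per unit.
dense_params = flat.size * hidden_units + hidden_units

# Convolutional layer: each filter spans kernel_size x kernel_size x channels, plus one bias.
conv_params = num_filters * (kernel_size * kernel_size * image.shape[2] + 1)

print(f"Flattened length:       {flat.size}")
print(f"Dense layer parameters: {dense_params}")
print(f"Conv layer parameters:  {conv_params}")
```

Under these assumptions the dense layer needs hundreds of millions of weights, while the convolutional layer needs a few hundred because its kernels are shared across every image position.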

---

## CNN Architecture

CNNs are specialized neural networks designed for processing data with a grid-like topology, such as images. They consist of the following layers:

### 1. **Convolutional Layers**
- These layers apply convolutional filters (kernels) to the input data to extract local features such as edges, textures, and patterns.
- The kernels contain weights that are learned through backpropagation, similar to the MLP model.
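
As a rough illustration of what a single convolutional filter computes, the sketch below slides a hand-written vertical-edge kernel over a toy image using `scipy.signal.correlate2d`. The image and kernel values are made up for the example; in a trained CNN the kernel weights would be learned rather than written by hand.

```python
import numpy as np
from scipy.signal import correlate2d

# Toy 5x5 grayscale image: dark on the left (0), bright on the right (1).
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# Hand-written kernel that responds to dark-to-bright vertical edges.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

# The "convolution" in deep learning is usually cross-correlation (no kernel flip).
# mode="valid" keeps only positions where the kernel fits entirely inside the image.
feature_map = correlate2d(image, kernel, mode="valid")
print(feature_map)
```

The resulting 3x3 feature map is large where the window straddles the edge and zero where the window covers a uniform region.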

### 2. **Pooling Layers**
- Pooling layers perform sub-sampling (down-sampling), reducing the spatial dimensions of the feature maps. This lowers the computational cost of later layers and makes the network more tolerant of small shifts and distortions in the input.
- **Max Pooling** is a common pooling technique. It extracts the maximum value within a selected region of the feature map.

#### Example of Max Pooling:
**Input Feature Map**:
\[
\begin{bmatrix}
1 & 3 & 2 & 4 \\
5 & 6 & 7 & 8 \\
9 & 2 & 4 & 3 \\
6 & 7 & 8 & 9
\end{bmatrix}
\]

If we perform max pooling with a 2x2 window and a stride of 2, the output feature map will be:

**Output Feature Map**:
\[
\begin{bmatrix}
6 & 8 \\
9 & 9
\end{bmatrix}
\]
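
The example can be reproduced with a few lines of NumPy. This is a minimal sketch that assumes a 2x2 pooling window together with the stride of 2; `max_pool_2d` is a hypothetical helper written for this illustration, not part of any library.

```python
import numpy as np

def max_pool_2d(feature_map: np.ndarray, size: int = 2, stride: int = 2) -> np.ndarray:
    """Return the maximum of each size x size window, moving `stride` cells at a time."""
    rows, cols = feature_map.shape
    out_rows = (rows - size) // stride + 1
    out_cols = (cols - size) // stride + 1
    pooled = np.empty((out_rows, out_cols), dtype=feature_map.dtype)
    for r in range(out_rows):
        for c in range(out_cols):
            window = feature_map[r * stride:r * stride + size,
                                 c * stride:c * stride + size]
            pooled[r, c] = window.max()
    return pooled

feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [9, 2, 4, 3],
    [6, 7, 8, 9],
])

pooled = max_pool_2d(feature_map)
print(pooled)
# [[6 8]
#  [9 9]]
```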

### 3. **Fully Connected Layers**
- These layers are similar to those in MLPs and are typically used for the final output.
- They are used for tasks such as:
    - **Classification**: Predicting categories.
    - **Regression**: Predicting continuous values.
    - **Probability Estimation**: Outputting probabilities for different classes.
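
A minimal sketch of the final stage, assuming four 2x2 feature maps arrive from the last pooling layer and there are three classes (both numbers are made up for illustration). The feature maps are flattened, passed through one dense layer with random weights, and converted to class probabilities with softmax.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for the output of the last pooling layer: 4 feature maps of size 2x2.
feature_maps = rng.normal(size=(4, 2, 2))

# The fully connected layer expects a flat vector.
features = feature_maps.reshape(-1)  # shape (16,)

# Randomly initialised dense layer mapping 16 features to 3 hypothetical classes.
num_classes = 3
weights = rng.normal(scale=0.1, size=(features.size, num_classes))
biases = np.zeros(num_classes)

# Raw class scores (logits).
logits = features @ weights + biases

# Softmax turns the scores into probabilities that sum to 1.
exp_logits = np.exp(logits - logits.max())
probabilities = exp_logits / exp_logits.sum()

predicted_class = int(np.argmax(probabilities))
print(probabilities, predicted_class)
```

For a regression task the softmax step would be dropped and the dense layer's output used directly.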

### 4. **Activation Layers**
- Activation layers, such as ReLU (Rectified Linear Unit), introduce non-linearity into the model.
- Some activations squash the output of the previous layer into a fixed range (e.g., sigmoid maps values to between 0 and 1), which is useful when the output is interpreted as a probability or thresholded into a binary decision.
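
A short sketch of two common activations applied to the same pre-activation values. The input values are arbitrary, and the standalone `relu` and `sigmoid` functions are defined here only for illustration.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """Zero out negative values and pass positive values through unchanged."""
    return np.maximum(0, x)

def sigmoid(x: np.ndarray) -> np.ndarray:
    """Squash any real value into the open interval (0, 1)."""
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

relu_out = relu(z)
sigmoid_out = sigmoid(z)

print("ReLU:   ", relu_out)
print("Sigmoid:", np.round(sigmoid_out, 3))
```

ReLU leaves positive values unbounded, whereas sigmoid always stays strictly between 0 and 1.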

---

CNNs are powerful tools for processing multi-dimensional data, especially images, and have become a cornerstone of modern deep learning applications.

In [None]:
import numpy as np
import pandas as pd
import scipy
import scipy.signal
import pickle
import os
from typing import Union


class NeuralNetwork():
    def __init__(self,
                 input_size: int,
                 hidden_nodes: Union[int, list],
                 output_size: int,
                 learning_rate: Union[int, float] = 0.001,  # An int is accepted but converted to float below
                 activation:str = 'relu',
                 output_activation:str ='sigmoid'):
        """
        Initializes the Neural Network with error checking for parameters.

        :param input_size: Number of input features (must be positive integer)
        :param hidden_nodes: Integer or List specifying number of neurons in each hidden layer (each must be positive integer)
        :param output_size: Number of output neurons (must be positive integer)
        :param learning_rate: Learning rate for optimizer (must be positive float)
        :param activation: Activation function for hidden layers ('sigmoid', 'relu', 'leaky_relu', 'linear')
        :param output_activation: Activation function for the output layer ('sigmoid', 'softmax', 'linear')

        Raises:
            TypeError: If input types are incorrect.
            ValueError: If input values are invalid (e.g., non-positive sizes, invalid activation names).
        """
        # Input Validation
        if not isinstance(input_size, int) or input_size <= 0:
            raise ValueError(f"input_size must be a positive integer, got {input_size}")
        if not isinstance(output_size, int) or output_size <= 0:
            raise ValueError(f"output_size must be a positive integer, got {output_size}")
        if not isinstance(learning_rate, (float, int)) or learning_rate <= 0:
            raise ValueError(f"learning_rate must be a positive number, got {learning_rate}")
        if not isinstance(activation, str):
             raise TypeError(f"activation must be a string, got {type(activation)}")
        if not isinstance(output_activation, str):
             raise TypeError(f"output_activation must be a string, got {type(output_activation)}")

        # Validate hidden_nodes content
        if isinstance(hidden_nodes, int):
            if hidden_nodes <= 0:
                 raise ValueError(f"If hidden_nodes is an integer, it must be positive, got {hidden_nodes}")
            processed_hidden_nodes = [hidden_nodes] # convert single int to list
        elif isinstance(hidden_nodes, list):
            if not all(isinstance(n, int) and n > 0 for n in hidden_nodes):
                 raise ValueError(f"If hidden_nodes is a list, all elements must be positive integers, got {hidden_nodes}")
            processed_hidden_nodes = hidden_nodes  # Already a list
        else:
            raise TypeError(f"hidden_nodes must be a positive integer or a list of positive integers, got {type(hidden_nodes)}")

        self.input_size = input_size
        self.hidden_nodes = processed_hidden_nodes
        self.output_size = output_size
        self.learning_rate = float(learning_rate)
        self.activation_type = activation
        self.output_activation_type = output_activation

        # All sizes were validated above, so this concatenation is safe
        layer_sizes = [self.input_size] + self.hidden_nodes + [self.output_size]
        self.num_layers = len(layer_sizes)

        # Initialize weights and biases
        self.weights = []
        self.biases = []
        for i in range(self.num_layers - 1):
            # Layer sizes are guaranteed to be positive ints here,
            # so fan_in + fan_out is never zero.
            fan_in = layer_sizes[i]
            fan_out = layer_sizes[i+1]
            limit = np.sqrt(6 / (fan_in + fan_out))  # Xavier (Glorot) uniform initialization

            self.weights.append(np.random.uniform(-limit, limit, (fan_in, fan_out)))
            self.biases.append(np.zeros(fan_out))

        try:
            self.activation_func = self._get_activation(self.activation_type)
            self.activation_derivative = self._get_activation_derivative(self.activation_type)
            self.output_activation_func = self._get_activation(self.output_activation_type)
            self.output_activation_derivative = self._get_activation_derivative(self.output_activation_type)
        except ValueError as e:
             raise ValueError(f"Initialization failed: {e}") from e

        # Adam optimizer state, shaped like the weights and biases:
        #   m - first moment estimate,
        #   v - second moment estimate,
        #   t - time step.
        # Backpropagation updates both weights and biases,
        # so each gets its own pair of moments.
        self.m_weights = [np.zeros_like(w) for w in self.weights]
        self.v_weights = [np.zeros_like(w) for w in self.weights]
        self.m_biases = [np.zeros_like(b) for b in self.biases]
        self.v_biases = [np.zeros_like(b) for b in self.biases]
        self.t = 0 # Time step

    # Activation lookup helpers
    def _get_activation(self, name):

        if not isinstance(name, str):
             raise TypeError(f"Activation name must be a string, got {type(name)}")

        if name == 'sigmoid':
            return self.sigmoid
        elif name == 'relu':
            return self.relu
        elif name == 'leaky_relu':
            return self.leaky_relu
        elif name == 'softmax':
            return self.softmax
        elif name == 'linear':
            return lambda x: x # No activation func applied,
        else:

            raise ValueError(f"Unknown activation function: '{name}'. Valid options are 'sigmoid', 'relu', 'leaky_relu', 'softmax', 'linear'.")

    def _get_activation_derivative(self, name):
        # Validate the name type
        if not isinstance(name, str):
             raise TypeError(f"Activation name must be a string, got {type(name)}")

        if name == 'sigmoid':
            return self.sigmoid_derivative
        elif name == 'relu':
            return self.relu_derivative
        elif name == 'leaky_relu':
            return self.leaky_relu_derivative
        elif name == 'linear':
             return lambda x: np.ones_like(x)  # The derivative of f(x) = x is 1
        elif name == 'softmax':
             # Element-wise diagonal of the softmax Jacobian only; the cross terms are ignored.
             # For a softmax output layer, backpropagation() does not call this:
             # it uses y_pred - y_true directly.
             return lambda activated_output: activated_output * (1 - activated_output)
        else:
            raise ValueError(f"Unknown activation function derivative for: '{name}'. Valid options are 'sigmoid', 'relu', 'leaky_relu', 'linear', 'softmax'.")

    # Definitions of the activation functions
    def sigmoid(self, x):
        x_clipped = np.clip(x, -500, 500)
        return 1 / (1 + np.exp(-x_clipped))

    def sigmoid_derivative(self, activated_output):
        return activated_output * (1 - activated_output)

    def relu(self, x):
        return np.maximum(0, x)

    def relu_derivative(self, activated_output):
        return np.where(activated_output > 0, 1, 0)

    def leaky_relu(self, x, alpha=0.01):
        return np.where(x > 0, x, x * alpha)

    def leaky_relu_derivative(self, activated_output, alpha=0.01):
        dx = np.ones_like(activated_output)
        dx[activated_output < 0] = alpha
        return dx

    def softmax(self, x):
        exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
        # After subtracting the row maximum, the sum is at least 1 (exp(0) = 1),
        # so the small epsilon is only a defensive guard against division by zero.
        return exp_x / (np.sum(exp_x, axis=-1, keepdims=True) + 1e-9)


    def feedForward(self, inputs):
        """ Performs a forward pass, storing layer outputs and pre-activations (z values). """

        if not isinstance(inputs, np.ndarray):
            raise TypeError(f"Input to feedForward must be a numpy array, got {type(inputs)}")
        if inputs.ndim == 1:
            if inputs.shape[0] != self.input_size:
                 raise ValueError(f"Input sample has shape {inputs.shape} ({inputs.shape[0]} features), but network expects {self.input_size} features.")
            # Promote the single sample to shape (1, features); 2D is the internal standard
            current_activation = inputs.reshape(1, -1)

        elif inputs.ndim == 2:
            # Batch input, check feature dimension..
            if inputs.shape[1] != self.input_size:
                raise ValueError(f"Input batch has shape {inputs.shape} ({inputs.shape[1]} features/sample), but network expects {self.input_size} features.")
            current_activation = inputs
        else:
             raise ValueError(f"Input array must be 1D (single sample) or 2D (batch), but got ndim={inputs.ndim}")

        self.layer_inputs = [current_activation] # Store inputs (batch_size, features)
        self.z_values = [] # Store pre-activation values (weighted sum + bias)

        # Hidden layers, with dimension checks before each matrix multiplication
        for i in range(self.num_layers - 2):
            # Check dimensions before dot product
            if current_activation.shape[1] != self.weights[i].shape[0]:
                raise RuntimeError(f"Dimension mismatch before layer {i}: Activation shape {current_activation.shape} incompatible with weight shape {self.weights[i].shape}")

            z = np.dot(current_activation, self.weights[i]) + self.biases[i]
            self.z_values.append(z)
            current_activation = self.activation_func(z)
            self.layer_inputs.append(current_activation)

        if current_activation.shape[1] != self.weights[-1].shape[0]:
             raise RuntimeError(f"Dimension mismatch before output layer: Activation shape {current_activation.shape} incompatible with weight shape {self.weights[-1].shape}")

        z_out = np.dot(current_activation, self.weights[-1]) + self.biases[-1]
        self.z_values.append(z_out)
        output = self.output_activation_func(z_out)
        self.layer_inputs.append(output)  # Store the final output activation

        # Final output shape check
        if output.shape[1] != self.output_size:
            raise RuntimeError(f"Internal Error: Final output shape {output.shape} does not match network output_size {self.output_size}")
            # Reaching this error indicates a bug in the forward-pass math above.

        return output

    def mean_squared_error(self, y_true, y_pred):
        return np.mean((y_true - y_pred) ** 2)

    def mean_squared_error_derivative(self, y_true, y_pred):
        # Gradient of the MSE with respect to y_pred, with the constant factor 2/N omitted
        return y_pred - y_true

    # Back prop
    def backpropagation(self, y_true, y_pred):
        """ Performs backpropagation and calculates gradients for weights and biases. """
        # Check for errors in the input values passed
        if not isinstance(y_true, np.ndarray) or not isinstance(y_pred, np.ndarray):
            raise TypeError(f"y_true and y_pred must be numpy arrays, got {type(y_true)}, {type(y_pred)}")
        if y_true.shape != y_pred.shape:
            raise ValueError(f"Shape mismatch between y_true {y_true.shape} and y_pred {y_pred.shape}")
        if y_pred.ndim != 2: # Should be (batch_size, output_size) coming from feedForward
             raise ValueError(f"y_pred should be a 2D array (batch_size, output_size), got shape {y_pred.shape}")
        if y_pred.shape[1] != self.output_size:
             raise ValueError(f"y_pred second dimension ({y_pred.shape[1]}) does not match network output_size ({self.output_size})")
        if self.layer_inputs[-1].shape != y_pred.shape:
             raise RuntimeError(f"Internal state mismatch: Last layer input shape {self.layer_inputs[-1].shape} differs from y_pred shape {y_pred.shape}")


        # Initialize gradient containers (consumed later by the Adam update)
        grad_weights = [np.zeros_like(w) for w in self.weights]
        grad_biases = [np.zeros_like(b) for b in self.biases]

        try:
            if self.output_activation_type in ['sigmoid', 'linear']:
                error_derivative = self.mean_squared_error_derivative(y_true, y_pred)

                deriv_output = self.output_activation_derivative(y_pred)
                if deriv_output.shape != y_pred.shape:
                     raise RuntimeError(f"Derivative of output activation {self.output_activation_type} produced unexpected shape {deriv_output.shape}, expected {y_pred.shape}")
                output_delta = error_derivative * deriv_output
            elif self.output_activation_type == 'softmax':
                 # Assumes categorical cross-entropy loss, as computed in train()
                 output_delta = y_pred - y_true  # dE/dz for softmax + cross-entropy, after applying the chain rule
            else:
                 raise ValueError(f"Unsupported output activation '{self.output_activation_type}' encountered during backpropagation.")
        except Exception as e:
             print(f"Error during output delta calculation: {e}")
             print(f"y_true shape: {y_true.shape}, y_pred shape: {y_pred.shape}, Output activation: {self.output_activation_type}")
             raise e # Re-raise after printing info

        # Shape check for deltas
        if output_delta.shape != y_pred.shape:
             raise RuntimeError(f"Internal Error: output_delta shape {output_delta.shape} does not match y_pred shape {y_pred.shape}")

        # --- Calculate gradients for the output layer
        last_hidden_activation = self.layer_inputs[-2] # Input that produced y_pred
        if last_hidden_activation.shape[0] != output_delta.shape[0]: # Batch size check
            raise RuntimeError(f"Batch size mismatch: last hidden activation {last_hidden_activation.shape[0]} vs output delta {output_delta.shape[0]}")
        if last_hidden_activation.shape[1] != grad_weights[-1].shape[0] or output_delta.shape[1] != grad_weights[-1].shape[1]:
            raise RuntimeError(f"Dimension mismatch for output weights gradient: Activ {last_hidden_activation.shape}, Delta {output_delta.shape}, Expected Weight Grad {grad_weights[-1].shape}")

        grad_weights[-1] = np.dot(last_hidden_activation.T, output_delta)
        grad_biases[-1] = np.sum(output_delta, axis=0)

        # Check that gradient shapes match parameter shapes
        if grad_weights[-1].shape != self.weights[-1].shape:
             raise RuntimeError(f"Output weight gradient shape {grad_weights[-1].shape} mismatch with weight shape {self.weights[-1].shape}")
        if grad_biases[-1].shape != self.biases[-1].shape:
             raise RuntimeError(f"Output bias gradient shape {grad_biases[-1].shape} mismatch with bias shape {self.biases[-1].shape}")

        # Propagate Error Backwards Through Hidden Layers
        delta = output_delta
        for i in range(self.num_layers - 2, 0, -1): # Iterate backwards from last hidden layer index (num_layers-2) down to 1
            # Dimension checks before dot product
            if delta.shape[1] != self.weights[i].shape[1]:
                 raise RuntimeError(f"Dimension mismatch backpropagating error at layer {i}: delta shape {delta.shape} vs weight shape {self.weights[i].shape}")

            error_hidden = np.dot(delta, self.weights[i].T)

            activation_h = self.layer_inputs[i]
            # Shape check: error_hidden should match activation_h shape
            if error_hidden.shape != activation_h.shape:
                 raise RuntimeError(f"Shape mismatch for hidden error: Error shape {error_hidden.shape} vs Activation shape {activation_h.shape} at layer {i}")

            # Calculate delta for this hidden layer: dE/dz_h = dE/da_h * da_h/dz_h -> Chain Rule...
            deriv_activation_h = self.activation_derivative(activation_h)
            if deriv_activation_h.shape != activation_h.shape:
                 raise RuntimeError(f"Derivative of hidden activation {self.activation_type} produced unexpected shape {deriv_activation_h.shape}, expected {activation_h.shape} at layer {i}")

            delta = error_hidden * deriv_activation_h
            prev_layer_activation = self.layer_inputs[i-1]

            # Dimension checks before dot product
            if prev_layer_activation.shape[0] != delta.shape[0]:
                raise RuntimeError(f"Batch size mismatch computing hidden grad at layer {i-1}: Activ {prev_layer_activation.shape[0]} vs Delta {delta.shape[0]}")
            if prev_layer_activation.shape[1] != grad_weights[i-1].shape[0] or delta.shape[1] != grad_weights[i-1].shape[1]:
                raise RuntimeError(f"Dimension mismatch for hidden weights gradient layer {i-1}: Activ {prev_layer_activation.shape}, Delta {delta.shape}, Expected Grad {grad_weights[i-1].shape}")

            grad_weights[i-1] = np.dot(prev_layer_activation.T, delta)
            grad_biases[i-1] = np.sum(delta, axis=0)
            if grad_weights[i-1].shape != self.weights[i-1].shape:
                 raise RuntimeError(f"Hidden weight gradient shape {grad_weights[i-1].shape} mismatch with weight shape {self.weights[i-1].shape} at layer {i-1}")
            if grad_biases[i-1].shape != self.biases[i-1].shape:
                 raise RuntimeError(f"Hidden bias gradient shape {grad_biases[i-1].shape} mismatch with bias shape {self.biases[i-1].shape} at layer {i-1}")


        return grad_weights, grad_biases


    # --- Adam optimizer
    def apply_adam_optimizer(self, grad_weights, grad_biases, beta1=0.9, beta2=0.999, epsilon=1e-8):
        """ Updates weights and biases using Adam optimizer. """
        if not isinstance(grad_weights, list) or not all(isinstance(gw, np.ndarray) for gw in grad_weights):
             raise TypeError("grad_weights must be a list of numpy arrays.")
        if not isinstance(grad_biases, list) or not all(isinstance(gb, np.ndarray) for gb in grad_biases):
             raise TypeError("grad_biases must be a list of numpy arrays.")
        if len(grad_weights) != len(self.weights) or len(grad_biases) != len(self.biases):
             raise ValueError("Number of gradient arrays does not match number of parameter arrays.")
        for i in range(len(self.weights)):
             if grad_weights[i].shape != self.weights[i].shape:
                  raise ValueError(f"Shape mismatch for weight gradient at index {i}: got {grad_weights[i].shape}, expected {self.weights[i].shape}")
             if grad_biases[i].shape != self.biases[i].shape:
                  raise ValueError(f"Shape mismatch for bias gradient at index {i}: got {grad_biases[i].shape}, expected {self.biases[i].shape}")

        self.t += 1

        for i in range(len(self.weights)):

            self.m_weights[i] = beta1 * self.m_weights[i] + (1 - beta1) * grad_weights[i]
            self.m_biases[i] = beta1 * self.m_biases[i] + (1 - beta1) * grad_biases[i]

            self.v_weights[i] = beta2 * self.v_weights[i] + (1 - beta2) * (grad_weights[i] ** 2)
            self.v_biases[i] = beta2 * self.v_biases[i] + (1 - beta2) * (grad_biases[i] ** 2)

            m_hat_weights = self.m_weights[i] / (1 - beta1 ** self.t)
            m_hat_biases = self.m_biases[i] / (1 - beta1 ** self.t)
            v_hat_weights = self.v_weights[i] / (1 - beta2 ** self.t)
            v_hat_biases = self.v_biases[i] / (1 - beta2 ** self.t)

            self.weights[i] -= self.learning_rate * m_hat_weights / (np.sqrt(v_hat_weights) + epsilon)
            self.biases[i] -= self.learning_rate * m_hat_biases / (np.sqrt(v_hat_biases) + epsilon)


    # Training loop
    def train(self, X, y, epochs=1000, batch_size=32):
        """ Trains the network using mini-batch gradient descent and Adam optimizer. """
        if not isinstance(X, np.ndarray) or not isinstance(y, np.ndarray):
             raise TypeError(f"X and y must be numpy arrays, got {type(X)}, {type(y)}")
        if X.ndim != 2:
             raise ValueError(f"Input data X must be a 2D array (samples, features), got ndim={X.ndim}")
        if X.shape[0] != y.shape[0]:
             raise ValueError(f"Number of samples mismatch between X ({X.shape[0]}) and y ({y.shape[0]})")
        if X.shape[1] != self.input_size:
             raise ValueError(f"Input data X features ({X.shape[1]}) does not match network input_size ({self.input_size})")

        num_samples = X.shape[0]

        if not isinstance(epochs, int) or epochs <= 0:
             raise ValueError(f"epochs must be a positive integer, got {epochs}")
        if not isinstance(batch_size, int) or batch_size <= 0:
             raise ValueError(f"batch_size must be a positive integer, got {batch_size}")
        if batch_size > num_samples:
             print(f"Warning: batch_size ({batch_size}) is larger than number of samples ({num_samples}). Setting batch_size to {num_samples}.")
             batch_size = num_samples

        if y.ndim == 1:
             if self.output_size != 1:
                 raise ValueError(f"Target data y is 1D, but network output_size is {self.output_size}. Reshape y or adjust network.")
             y = y.reshape(-1, 1)  # Promote 1D targets to a column vector
        elif y.ndim == 2:
             if y.shape[1] != self.output_size:
                 raise ValueError(f"Target data y has {y.shape[1]} features, but network output_size is {self.output_size}.")
        else:
             raise ValueError(f"Target data y must be 1D or 2D array, got ndim={y.ndim}")


        for epoch in range(epochs):
            permutation = np.random.permutation(num_samples)
            X_shuffled = X[permutation]
            y_shuffled = y[permutation] # y is now guaranteed 2D

            total_loss = 0
            num_batches = 0 # Count actual batches processed

            for i in range(0, num_samples, batch_size):
                end_idx = min(i + batch_size, num_samples)
                if i == end_idx: continue

                X_batch = X_shuffled[i:end_idx]
                y_batch = y_shuffled[i:end_idx]

                if y_batch.shape[1] != self.output_size:
                    raise RuntimeError(f"Internal Error: y_batch shape {y_batch.shape} inconsistent with output_size {self.output_size}")
                y_pred = self.feedForward(X_batch)


                try:
                    if self.output_activation_type == 'softmax':
                         # Check if y_batch looks like one-hot encoding for softmax/CCE
                         if not np.all((y_batch == 0) | (y_batch == 1)) or not np.all(np.sum(y_batch, axis=1) == 1):
                              print(f"Warning: Using Softmax/CrossEntropy loss, but y_batch doesn't appear to be one-hot encoded at epoch {epoch}, batch {i}.")
                         # epsilon for log stability
                         loss = -np.mean(np.sum(y_batch * np.log(np.clip(y_pred, 1e-9, 1.0)), axis=1))
                    elif self.output_activation_type in ['sigmoid', 'linear']:
                        loss = self.mean_squared_error(y_batch, y_pred)
                    else:
                         # Should be caught earlier, but kept for safety
                         raise RuntimeError(f"Unsupported output activation '{self.output_activation_type}' during loss calculation.")

                    if np.isnan(loss) or np.isinf(loss):
                         raise ValueError(f"Loss became NaN or Inf at epoch {epoch}, batch start {i}. Check learning rate, data scaling, or model stability.")
                    total_loss += loss
                    num_batches += 1

                except Exception as e:
                    print(f"Error during loss calculation: {e}")
                    print(f"y_batch shape: {y_batch.shape}, y_pred shape: {y_pred.shape}, Loss type based on: {self.output_activation_type}")
                    raise e
                grad_weights, grad_biases = self.backpropagation(y_batch, y_pred)
                self.apply_adam_optimizer(grad_weights, grad_biases)
            avg_loss = total_loss / num_batches if num_batches > 0 else total_loss  # Guard against division by zero when no batch was processed

            if epoch % max(1, epochs // 10) == 0 or epoch == epochs - 1: # Avoid modulo zero
                print(f"Epoch {epoch}, Loss: {avg_loss:.4f}")


    def predict(self, X):
        """ Predicts output for new input data X. """
        if not isinstance(X, np.ndarray):
            raise TypeError(f"Input X must be a numpy array, got {type(X)}")

        original_ndim = X.ndim
        if original_ndim == 1:
            # Check shape for single sample
            if X.shape[0] != self.input_size:
                raise ValueError(f"Input sample has shape {X.shape} ({X.shape[0]} features), but network expects {self.input_size} features.")
            X_proc = X.reshape(1, -1)  # Reshape to 2D for feedForward
        elif original_ndim == 2:
             # Check feature dimension for batch
             if X.shape[1] != self.input_size:
                 raise ValueError(f"Input batch has shape {X.shape} ({X.shape[1]} features/sample), but network expects {self.input_size} features.")
             X_proc = X
        else:
             raise ValueError(f"Input array X must be 1D (single sample) or 2D (batch), but got ndim={X.ndim}")
        output = self.feedForward(X_proc)

        if original_ndim == 1:
            return output.flatten()
        else:
            return output


    # Save the model to a file so it can be loaded later
    def save_model(self, filename):
        """ Saves the model's architecture and parameters using pickle. """
        if not isinstance(filename, str) or not filename:
            raise ValueError("Filename must be a non-empty string.")

        model_data = {
            'input_size': self.input_size, 'hidden_nodes': self.hidden_nodes, 'output_size': self.output_size,
            'learning_rate': self.learning_rate, 'activation': self.activation_type, 'output_activation': self.output_activation_type,
            'weights': self.weights, 'biases': self.biases,
            'adam_state': {'m_weights': self.m_weights, 'v_weights': self.v_weights, 'm_biases': self.m_biases, 'v_biases': self.v_biases, 't': self.t}
        }
        try:
            with open(filename, 'wb') as file:  # Open the file for binary writing
                pickle.dump(model_data, file)
            print(f"Model saved to {filename}")
        except IOError as e:
             raise IOError(f"Could not write model to file '{filename}': {e}") from e
        except pickle.PicklingError as e:
             raise pickle.PicklingError(f"Could not serialize model data for saving: {e}") from e


    @classmethod
    def load_model(cls, filename):
        """ Loads a model from a file saved by save_model. """
        if not isinstance(filename, str) or not filename:
            raise ValueError("Filename must be a non-empty string.")
        if not os.path.exists(filename):
             raise FileNotFoundError(f"Model file not found at '{filename}'")

        try:
            with open(filename, 'rb') as file:
                # pickle can execute arbitrary code while loading: only open files from trusted sources
                model_data = pickle.load(file)
        except FileNotFoundError:
            raise FileNotFoundError(f"Model file not found at '{filename}'")
        except pickle.UnpicklingError as e:
            raise pickle.UnpicklingError(f"Error unpickling model file '{filename}'. File might be corrupted or incompatible: {e}") from e
        except IOError as e:
            raise IOError(f"Could not read model file '{filename}': {e}") from e

        # Validate that the unpickled data has the structure we expect
        required_keys = ['input_size', 'hidden_nodes', 'output_size', 'weights', 'biases']
        optional_keys_with_defaults = {
            'learning_rate': 0.001, 'activation': 'relu', 'output_activation': 'sigmoid',
            'adam_state': None
        }
        loaded_keys = model_data.keys()

        for key in required_keys:
            if key not in loaded_keys:
                raise ValueError(f"Loaded model data from '{filename}' is missing required key: '{key}'")

        # Use get with defaults for optional keys
        init_args = {key: model_data[key] for key in required_keys}
        for key, default in optional_keys_with_defaults.items():
            init_args[key] = model_data.get(key, default)

        try:
             nn = cls(input_size=init_args['input_size'],
                      hidden_nodes=init_args['hidden_nodes'],
                      output_size=init_args['output_size'],
                      learning_rate=init_args['learning_rate'],
                      activation=init_args['activation'],
                      output_activation=init_args['output_activation'])
        except (TypeError, ValueError) as e:
             raise ValueError(f"Loaded parameters from '{filename}' are invalid for network initialization: {e}") from e


        expected_num_param_layers = len(nn.weights) # Based on loaded sizes
        if len(model_data['weights']) != expected_num_param_layers or len(model_data['biases']) != expected_num_param_layers:
             raise ValueError(f"Architecture mismatch in '{filename}': Expected {expected_num_param_layers} weight/bias layers based on loaded sizes, but file contains {len(model_data['weights'])}/{len(model_data['biases'])}.")

        # Check shapes within each layer match
        for i in range(expected_num_param_layers):
            if not isinstance(model_data['weights'][i], np.ndarray) or model_data['weights'][i].shape != nn.weights[i].shape:
                 raise ValueError(f"Weight shape mismatch in layer {i} of '{filename}': Expected {nn.weights[i].shape}, file has {model_data['weights'][i].shape if isinstance(model_data['weights'][i], np.ndarray) else type(model_data['weights'][i])}")
            if not isinstance(model_data['biases'][i], np.ndarray) or model_data['biases'][i].shape != nn.biases[i].shape:
                 raise ValueError(f"Bias shape mismatch in layer {i} of '{filename}': Expected {nn.biases[i].shape}, file has {model_data['biases'][i].shape if isinstance(model_data['biases'][i], np.ndarray) else type(model_data['biases'][i])}")

        nn.weights = model_data['weights']
        nn.biases = model_data['biases']

        if init_args['adam_state'] is not None:
             try:
                adam_state = init_args['adam_state']
                if isinstance(adam_state, dict) and all(k in adam_state for k in ['m_weights', 'v_weights', 'm_biases', 'v_biases', 't']):
                     if len(adam_state['m_weights']) == expected_num_param_layers and \
                        len(adam_state['v_weights']) == expected_num_param_layers and \
                        len(adam_state['m_biases']) == expected_num_param_layers and \
                        len(adam_state['v_biases']) == expected_num_param_layers and \
                        isinstance(adam_state['t'], int):
                          nn.m_weights = adam_state['m_weights']
                          nn.v_weights = adam_state['v_weights']
                          nn.m_biases = adam_state['m_biases']
                          nn.v_biases = adam_state['v_biases']
                          nn.t = adam_state['t']
                          print("Adam optimizer state loaded.")
                     else:
                          print("Warning: Adam state found in file but structure/size mismatch. Optimizer state not loaded.")
                else:
                     print("Warning: Adam state found in file but format is invalid. Optimizer state not loaded.")
             except Exception as e:
                  print(f"Warning: Error loading Adam state ({e}). Optimizer state not loaded.")


        print(f"Model loaded successfully from {filename}")
        return nn
# OR gate training
X_or = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])
# Expected Output
y_or = np.array([
    [0],
    [1],
    [1],
    [1]
])


nn_or = NeuralNetwork(input_size=2, hidden_nodes=[8], output_size=1,
                      learning_rate=0.01, activation='relu', output_activation='sigmoid')
nn_or.train(X_or, y_or, epochs=5000, batch_size=4)

predictions = nn_or.predict(X_or)
for i in range(len(X_or)):
    print(f"Input: {X_or[i]}, Target: {y_or[i]}, Prediction: {predictions[i][0]:.4f} -> {int(predictions[i][0] > 0.5)}")


del nn_or  # Remove the network instance once it is no longer needed
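
The softmax branch of `_get_activation_derivative` is marked in the code as needing more research. The standalone sketch below is one way to check the maths, and is not part of the `NeuralNetwork` class. It builds the full softmax Jacobian for a single sample, confirms that its diagonal equals the `p * (1 - p)` expression used in that branch, and verifies numerically that combining softmax with cross-entropy loss gives the `y_pred - y_true` shortcut used in `backpropagation`. The input values and the helper names `softmax` and `cross_entropy` are chosen for illustration.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax for a single 1D sample."""
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

def cross_entropy(z: np.ndarray, y_true: np.ndarray) -> float:
    """Cross-entropy loss of softmax(z) against a one-hot target."""
    return float(-np.sum(y_true * np.log(softmax(z))))

z = np.array([1.0, 2.0, 0.5])        # arbitrary pre-activations
y_true = np.array([0.0, 1.0, 0.0])   # one-hot target
p = softmax(z)

# Full Jacobian: dp_i/dz_j = p_i * (delta_ij - p_j).
jacobian = np.diag(p) - np.outer(p, p)

# Chain rule: dL/dz = J^T @ dL/dp, where dL/dp = -y_true / p.
grad_chain = jacobian.T @ (-y_true / p)

# Shortcut used for the softmax output layer.
grad_shortcut = p - y_true

# Independent check with central finite differences.
eps = 1e-6
identity = np.eye(z.size)
grad_numeric = np.array([
    (cross_entropy(z + eps * identity[i], y_true)
     - cross_entropy(z - eps * identity[i], y_true)) / (2 * eps)
    for i in range(z.size)
])

print("Chain rule:", grad_chain)
print("Shortcut:  ", grad_shortcut)
print("Numerical: ", grad_numeric)
```

All three gradients agree, which suggests that the element-wise `p * (1 - p)` form is only the diagonal of the Jacobian and would not be enough on its own if the chain rule were applied explicitly.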