
# APPROACH

## 1. Neural Networks:
A neural network is a parameterized function built from layers of simple units. It learns the underlying relationships in a set of data by adjusting its weights, and its design is loosely inspired by how neurons in the brain are connected.

## 2. Architecture:
In this problem, we work with a fully connected feedforward neural network with:
- **Input layer**: Receives the initial data for the neural network.
- **Hidden layers**: Two intermediate layers between the input and the output.
- **Output layer**: Produces the network's final output.

## 3. Weight Initialization (`init_weights`):
In a neural network, weights are the adjustable parameters that the network learns. How these weights are initialized can greatly affect training performance.
- **Random Initialization**: We initialized weights randomly from a normal distribution.

## 4. Activation Functions:
Activation functions introduce non-linearity to the model. Without them, a stack of layers would collapse into a single linear map, so non-linearity is essential for learning complex patterns.
- **Sigmoid Function**: It squashes any real input into the range (0, 1). It is commonly used for binary classification.

## 5. Feedforward (`feedforward`):
This is the process neural networks use to turn the input into an output.
- We took the input data and passed it through the network, layer by layer, until the output layer was reached.
- At each layer, the activations are multiplied by the weight matrix (the bias enters through an appended column of ones) and then passed through an activation function.

## 6. Loss Function (`loss`):
This measures how far off our predictions are from the actual values.
- **Mean Squared Error (MSE)**: It measures the average squared differences between predicted and actual values. It's widely used for regression problems but can also be used in classification.

## 7. Backpropagation (`backprop`):
Backpropagation computes the gradients that are used to update the neural network's weights.
- We computed the gradient of the loss function with respect to each weight by the chain rule.
- We then updated the weights in the direction that reduces the loss.

## 8. Training (`train`):
Training a neural network means adjusting its weights based on the data and the defined loss function.
- We repeatedly applied the feedforward and backpropagation operations over a defined number of iterations (epochs).
- After each epoch, we updated the weights to minimize the error.

## 9. Predictions (`predict`):
Once the model has been trained, it can make predictions. We fed a new input into the trained network and obtained the output.



# LET US FIRST INITIALISE THE WEIGHTS

## LET US START WITH THE FIRST FUNCTION

## `init_weights` Function Explanation

This function initializes the weights for a fully connected neural network.

### Parameters:
- **`n_inputs`**: This indicates the number of input nodes.
- **`n_hidden`**: This represents the number of nodes in each of the two hidden layers.
- **`n_output`**: This is the number of output nodes.

### Inside the function:

1. **Weight Matrix `W0`**:
    - Connects the input layer to the first hidden layer.
    - Its dimensions are `(n_inputs + 1, n_hidden)`. We add 1 to account for the bias node.
    - Initialized randomly using a normal distribution.

2. **Weight Matrix `W1`**:
    - Connects the first hidden layer to the second hidden layer.
    - Dimensions are `(n_hidden + 1, n_hidden)`, with the additional 1 for the bias node.
    - Initialized randomly using a normal distribution.

3. **Weight Matrix `W2`**:
    - Connects the second hidden layer to the output layer.
    - Dimensions are `(n_hidden + 1, n_output)`.
    - Initialized randomly using a normal distribution.

### Returns:
The function returns the three weight matrices: `W0`, `W1`, and `W2`.
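
As a quick sanity check on the shapes described above, here is a minimal sketch. It assumes a seeded `np.random.default_rng` generator so that runs are reproducible; the name `init_weights_seeded` and the `seed` parameter are illustrative and not part of the original function.

```python
import numpy as np

def init_weights_seeded(n_inputs, n_hidden, n_output, seed=0):
    # Hypothetical variant of init_weights: same shapes, but drawn from a
    # seeded generator so the result is reproducible.
    rng = np.random.default_rng(seed)
    W0 = rng.standard_normal((n_inputs + 1, n_hidden))  # input (+bias) -> hidden 1
    W1 = rng.standard_normal((n_hidden + 1, n_hidden))  # hidden 1 (+bias) -> hidden 2
    W2 = rng.standard_normal((n_hidden + 1, n_output))  # hidden 2 (+bias) -> output
    return W0, W1, W2

shapes = [W.shape for W in init_weights_seeded(10, 5, 3)]
print(shapes)  # [(11, 5), (6, 5), (6, 3)]
```

The extra row in each matrix holds the weights that multiply the bias column of ones.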

---

### Test:

After defining our function, we then test it. We initialize a neural network with:
- 10 nodes in the input layer.
- 5 nodes in each of the two hidden layers.
- 3 nodes in the output layer.

We then print out the initialized weights for `W0`, `W1`, and `W2`.


In [1]:
import numpy as np

def init_weights(n_inputs, n_hidden, n_output):
    # Initialize weights from input to hidden layer
    # We add +1 for the bias unit
    W0 = np.random.randn(n_inputs + 1, n_hidden)
    
    # Initialize weights from hidden to hidden layer
    W1 = np.random.randn(n_hidden + 1, n_hidden)
    
    # Initialize weights from hidden to output layer
    W2 = np.random.randn(n_hidden + 1, n_output)
    
    return W0, W1, W2

# Test the function
W0, W1, W2 = init_weights(10, 5, 3)
W0, W1, W2


(array([[ 1.29993947,  0.13788758, -1.07631303, -0.68016606, -1.15312932],
        [ 0.09221565,  0.61257967, -0.97300133,  1.12316341,  0.53495125],
        [ 0.22216226,  0.0224775 ,  1.66905806, -0.64047523, -1.04593303],
        [-0.29761435, -0.45045373,  0.04203176, -0.50638727, -1.25595819],
        [ 0.42881808, -0.37531254,  1.1830806 , -1.19682885, -0.95179684],
        [ 1.87365521,  1.15504088,  0.45363739,  1.06597301,  0.15452554],
        [-0.55625774, -0.98304081, -0.64691355, -0.52090097,  0.68190889],
        [-1.68681552, -0.04825258,  1.27320166, -0.65022106, -1.24150116],
        [ 1.17079087,  0.02805608,  2.26899707, -0.63604714,  0.23275924],
        [ 0.18311584,  0.1127135 , -0.67584401,  0.02156453,  0.51861442],
        [ 0.6116853 , -1.26759622,  0.66129634, -1.50376682,  0.39514773]]),
 array([[-1.20099435,  2.14924263,  1.13635234,  1.38092987,  0.9582902 ],
        [ 0.3914314 , -0.14233191, -0.74220784,  3.43380691, -0.03390262],
        [-0.96602902,  

# LET US TRY TO CREATE A SIGMOID FUNCTION AND FORWARD PROPAGATION

## Explanation

### `sigmoid` Function:

This function calculates the sigmoid activation, which is a common activation function in neural networks. The sigmoid function maps any input into a value between 0 and 1, making it useful for binary classification problems.

#### Formula:
\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]
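
The sigmoid has a convenient derivative, \( \sigma'(x) = \sigma(x)\,(1 - \sigma(x)) \), which is the `a * (1 - a)` factor that appears in backpropagation. The sketch below is a small check of that identity against a finite-difference approximation; the sample points and the step size `eps` are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-4.0, 4.0, 9)
s = sigmoid(x)

# Analytic derivative expressed through the output: s * (1 - s)
analytic = s * (1 - s)

# Central finite-difference approximation of the derivative
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

print(sigmoid(0.0))                    # 0.5
print(np.allclose(analytic, numeric))  # True
```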

---

### `feedforward` Function:

This function represents the feedforward process of a neural network. During feedforward, the data moves from the input layer, through the hidden layers, and finally to the output layer.

#### Parameters:
- **`x`**: Input data.
- **`W0`, `W1`, `W2`**: Weight matrices for the layers.

#### Process:

1. **Input Layer**:
    - The activations `a0` are set to the input `x`, which must already include the bias column (`n_inputs + 1` columns in total).

2. **Input to First Hidden Layer**:
    - Pre-activations `z1` are calculated by multiplying the activations `a0` with the weight matrix `W0`.
    - Activations `a1` are computed by applying the sigmoid function to `z1`.
    - A bias unit (a column of ones) is added to the activations `a1`.

3. **First Hidden Layer to Second Hidden Layer**:
    - Pre-activations `z2` are calculated by multiplying `a1` with the weight matrix `W1`.
    - Activations `a2` are computed by applying the sigmoid function to `z2`.
    - A bias unit (a column of ones) is added to the activations `a2`.

4. **Second Hidden Layer to Output Layer**:
    - Pre-activations `z3` are calculated by multiplying `a2` with the weight matrix `W2`.
    - Activations `a3` are computed by applying the sigmoid function to `z3`.

#### Returns:
The function returns the pre-activations `z1`, `z2`, `z3` and activations `a0`, `a1`, `a2`, `a3` for each layer.
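
To make the shapes in the steps above concrete, here is a compact sketch of the same forward pass for a batch of 4 examples. The helper `add_bias`, the batch size, and the seeded generator are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def add_bias(a):
    # Hypothetical helper: append a column of ones to the activations
    return np.concatenate((a, np.ones((a.shape[0], 1))), axis=1)

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_output, batch = 10, 5, 3, 4

W0 = rng.standard_normal((n_inputs + 1, n_hidden))
W1 = rng.standard_normal((n_hidden + 1, n_hidden))
W2 = rng.standard_normal((n_hidden + 1, n_output))

a0 = add_bias(rng.standard_normal((batch, n_inputs)))  # (4, 11)
a1 = add_bias(sigmoid(a0 @ W0))                        # (4, 6)
a2 = add_bias(sigmoid(a1 @ W1))                        # (4, 6)
a3 = sigmoid(a2 @ W2)                                  # (4, 3)

for name, a in [("a0", a0), ("a1", a1), ("a2", a2), ("a3", a3)]:
    print(name, a.shape)
```

Each hidden activation gains one column from the bias unit, which is why `W1` and `W2` have `n_hidden + 1` rows.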

---

### Test:

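The function is then tested using a sample input `x_sample` of shape `(1, 11)`: 10 features plus one column in the bias position. For this shape check, the bias entry is drawn at random rather than set to 1. The activations of the output layer `a3` are printed.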


In [2]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def feedforward(x, W0, W1, W2):
    # Input layer activations
    a0 = x
    
    # Input to Hidden layer
    z1 = np.dot(a0, W0)  # pre-activation: a0 @ W0
    a1 = sigmoid(z1)     # activation: sigmoid(z1)
    
    # Add bias to the activations of the hidden layer
    a1 = np.concatenate((a1, np.ones((a1.shape[0], 1))), axis=1)
    
    # Hidden to Hidden layer
    z2 = np.dot(a1, W1)
    a2 = sigmoid(z2)
    
    # Add bias to the activations of the hidden layer
    a2 = np.concatenate((a2, np.ones((a2.shape[0], 1))), axis=1)
    
    # Hidden to Output layer
    z3 = np.dot(a2, W2)
    a3 = sigmoid(z3)
    
    return z1, z2, z3, a0, a1, a2, a3

# Test the function with a sample input
x_sample = np.random.randn(1, 11)
z1, z2, z3, a0, a1, a2, a3 = feedforward(x_sample, W0, W1, W2)
a3


array([[0.1530338 , 0.7409219 , 0.89245982]])

# LET US TRY TO IMPLEMENT THE PREDICT FUNCTION AND TEST IT

## `predict` Function Explanation:

This function is designed to predict the output of the neural network given a set of input data.

### Parameters:
- **`x`**: The input data for which we want to make a prediction.
- **`W0`, `W1`, `W2`**: The weight matrices of the neural network.

### Inside the function:

1. **Feedforward Process**:
    - The function utilizes the previously defined `feedforward` function to get the activations of the output layer, `a3`.
    - While the `feedforward` function returns the activations and pre-activations for all layers, the `predict` function only requires the final output, so we use `_` to discard the other returned values.

2. **Return**: 
    - The function then returns the activations of the output layer, `a3`, as the prediction.
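
Since `predict` returns raw output activations, a class label still has to be derived from them. One common choice, sketched here as an assumption and not as part of the original notebook, is to take the index of the largest activation in each row. The helper name `predict_class` and the example values are illustrative.

```python
import numpy as np

def predict_class(a3):
    # Hypothetical helper: index of the largest output activation per row
    return np.argmax(a3, axis=1)

# Example output activations for a single input with three output nodes
a3 = np.array([[0.26239176, 0.33761267, 0.33455973]])
print(predict_class(a3))  # [1]
```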

---

### Test:

After defining the `predict` function, we test it using a sample input, `x_test`, with a shape of `(1, 11)`. The predicted output, `y_pred`, is then printed.


In [11]:
def predict(x, W0, W1, W2):
    _, _, _, _, _, _, a3 = feedforward(x, W0, W1, W2)
    return a3

# Test the function with a sample input
x_test = np.random.randn(1, 11)
y_pred = predict(x_test, W0, W1, W2)
y_pred


array([[0.26239176, 0.33761267, 0.33455973]])

## `loss` Function Explanation:

The `loss` function computes the Mean Squared Error (MSE) loss, which measures the average squared differences between the predicted and actual values.

### Formula for Mean Squared Error:
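\[ \text{MSE} = \frac{1}{2m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left( Y_{\text{pred},ik} - Y_{\text{actual},ik} \right)^2 \]
Where:
- \( m \) is the number of examples.
- \( K \) is the number of output nodes.
- \( Y_{\text{pred}} \) is the predicted output.
- \( Y_{\text{actual}} \) is the actual output.

The factor of \( \tfrac{1}{2} \) is a common convention: it cancels the 2 that appears when the square is differentiated.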

### Inside the function:

1. **Number of Examples**:
    - The variable `m` represents the number of examples in our dataset, which is derived from the shape of the actual values, `Y`.

2. **Loss Computation**:
    - The difference between the predicted values (`Y_pred`) and the actual values (`Y`) is squared.
    - The squared differences are summed up.
    - The sum is then divided by `2m` to get the average loss.

3. **Return**: 
    - The function returns the computed MSE loss.
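
The three steps above can be traced by hand on a single example. The sketch below uses one row of example predictions and a one-hot target; the variable names `squared` and `loss_value` are only for illustration.

```python
import numpy as np

Y_pred = np.array([[0.26239176, 0.33761267, 0.33455973]])
Y = np.array([[0, 1, 0]])

m = Y.shape[0]                  # 1 example
squared = (Y_pred - Y) ** 2     # element-wise squared differences
loss_value = squared.sum() / (2 * m)

print(squared.round(4))
print(round(loss_value, 4))     # 0.3098
```

Most of the loss comes from the second output, which should be 1 but is predicted as roughly 0.34.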

---

### Test:

After defining the `loss` function, we test it using:
- A sample predicted output, `y_pred`.
- A sample actual output, `Y_actual` with the shape `(1, 3)`.

The computed loss value, `loss_value`, is then printed.


In [13]:
def loss(Y_pred, Y):
    """Mean Squared Error Loss"""
    m = Y.shape[0]  # number of examples
    loss = (1 / (2 * m)) * np.sum((Y_pred - Y) ** 2)
    return loss

# Test the function with a sample prediction and actual value
Y_actual = np.array([[0, 1, 0]])
loss_value = loss(y_pred, Y_actual)
loss_value

0.3097683100963687

## `backprop` Function Explanation:

The `backprop` function performs the backpropagation algorithm, which is an essential component in training a neural network. Backpropagation adjusts the weights of the network to reduce the prediction error.

### Parameters:

- **`X_train`**: The input training data.
- **`Y_train`**: The actual output for the training data.
- **`W0`, `W1`, `W2`**: The weight matrices of the neural network.
- **`learning_rate`**: The rate at which the weights are updated.

### Inside the function:

1. **Forward Pass**:
    - The feedforward process is executed to obtain the pre-activations (`z1`, `z2`, `z3`) and activations (`a0`, `a1`, `a2`, `a3`) for each layer.

2. **Loss Derivative with Respect to Output**:
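    - The error term at the output, `dZ3`, is computed as the difference between the predicted output (`a3`) and the actual values (`Y_train`). Strictly, `a3 - Y_train` is the gradient with respect to `z3` for a sigmoid output trained with binary cross-entropy. For the MSE loss defined earlier, the exact gradient carries an extra factor `a3 * (1 - a3)`, which this implementation omits.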
    - The gradient of the weights connecting the second hidden layer to the output layer, `dW2`, is computed using `dZ3` and the activations `a2`.

3. **Backpropagate Through Second Hidden Layer**:
    - The error is backpropagated to the second hidden layer using the transpose of the weight matrix `W2` and the derivative of the sigmoid activation function.
    - The bias term is removed from the error term `dZ2`.
    - The gradient of the weights connecting the first and second hidden layers, `dW1`, is computed using `dZ2` and the activations `a1`.

4. **Backpropagate Through First Hidden Layer**:
    - The error is further backpropagated to the first hidden layer using the transpose of the weight matrix `W1` and the derivative of the sigmoid activation function.
    - The bias term is removed from the error term `dZ1`.
    - The gradient of the weights connecting the input layer to the first hidden layer, `dW0`, is computed using `dZ1` and the input data `a0`.

5. **Update Weights**:
    - The weights `W0`, `W1`, and `W2` are updated by subtracting the product of the learning rate and their respective gradients.

### Returns:
The function returns the updated weights: `W0`, `W1`, and `W2`.
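
A finite-difference check is a useful way to see which loss a given error term is the gradient of. The sketch below works at the output pre-activations only and compares two candidates: `(a - y)`, the form used for `dZ3`, and `(a - y) * a * (1 - a)`. The function names `mse`, `bce`, and `numeric_grad`, and the random test data, are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def mse(z, y):
    # Same convention as the `loss` function: sum of squares over 2m
    return np.sum((sigmoid(z) - y) ** 2) / (2 * y.shape[0])

def bce(z, y):
    # Binary cross-entropy, summed over outputs and averaged over examples
    a = sigmoid(z)
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a)) / y.shape[0]

def numeric_grad(f, z, y, eps=1e-6):
    # Central finite differences, one entry of z at a time
    g = np.zeros_like(z)
    for idx in np.ndindex(*z.shape):
        zp, zm = z.copy(), z.copy()
        zp[idx] += eps
        zm[idx] -= eps
        g[idx] = (f(zp, y) - f(zm, y)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 3))
y = np.eye(3)[rng.integers(3, size=4)]  # one-hot targets
a, m = sigmoid(z), z.shape[0]

grad_bce = (a - y) / m                  # the dZ3 form, scaled by 1/m
grad_mse = (a - y) * a * (1 - a) / m    # extra sigmoid-derivative factor

print(np.allclose(numeric_grad(bce, z, y), grad_bce, atol=1e-6))  # True
print(np.allclose(numeric_grad(mse, z, y), grad_mse, atol=1e-6))  # True
```

Both forms point in a loss-reducing direction for their respective losses, so training still makes progress with either. The difference matters mainly when the reported loss is expected to match the quantity being minimized.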

---

### Test:

After defining the `backprop` function, we test it using:
- A sample input, `x_test`.
- A sample actual output, `Y_actual`.
- The weight matrices `W0`, `W1`, and `W2`.
- A learning rate of 0.1.

The updated weights `W0`, `W1`, and `W2` are then printed.


In [5]:
def backprop(X_train, Y_train, W0, W1, W2, learning_rate):
    m = X_train.shape[0]
    
    # Forward pass
    z1, z2, z3, a0, a1, a2, a3 = feedforward(X_train, W0, W1, W2)
    
    # Output error term. Note: a3 - Y_train is the exact gradient w.r.t. z3 for
    # sigmoid + cross-entropy; for MSE it omits the a3 * (1 - a3) factor.
    dZ3 = a3 - Y_train
    dW2 = (1 / m) * np.dot(a2.T, dZ3)
    
    # Backpropagate through the second hidden layer
    dZ2 = np.dot(dZ3, W2.T) * (a2 * (1 - a2))
    dZ2 = dZ2[:, :-1]  # Remove the bias term
    dW1 = (1 / m) * np.dot(a1.T, dZ2)
    
    # Backpropagate through the first hidden layer
    dZ1 = np.dot(dZ2, W1.T) * (a1 * (1 - a1))
    dZ1 = dZ1[:, :-1]  # Remove the bias term
    dW0 = (1 / m) * np.dot(a0.T, dZ1)
    
    # Update weights in place (the arrays passed in by the caller are modified)
    W0 -= learning_rate * dW0
    W1 -= learning_rate * dW1
    W2 -= learning_rate * dW2
    
    return W0, W1, W2

# Test the function with sample data
W0, W1, W2 = backprop(x_test, Y_actual, W0, W1, W2, learning_rate=0.1)
W0, W1, W2


(array([[ 1.29993992,  0.13852882, -1.07631499, -0.68175352, -1.151879  ],
        [ 0.09221523,  0.61198514, -0.97299952,  1.12463525,  0.533792  ],
        [ 0.22216159,  0.02154447,  1.6690609 , -0.6381654 , -1.0477523 ],
        [-0.29761462, -0.45083646,  0.04203293, -0.50543978, -1.25670446],
        [ 0.42881818, -0.37518099,  1.1830802 , -1.19715452, -0.95154033],
        [ 1.87365463,  1.15422887,  0.45363987,  1.06798323,  0.15294225],
        [-0.55625747, -0.98265547, -0.64691473, -0.52185492,  0.68266024],
        [-1.68681482, -0.04727326,  1.27319868, -0.6526455 , -1.23959163],
        [ 1.17078997,  0.02678696,  2.26900094, -0.63290529,  0.23028465],
        [ 0.18311615,  0.11315367, -0.67584535,  0.02047485,  0.51947268],
        [ 0.611685  , -1.26801703,  0.66129763, -1.50272506,  0.39432722]]),
 array([[-1.20445494,  2.14900468,  1.14122057,  1.38083286,  0.94568505],
        [ 0.38835061, -0.14254374, -0.7378739 ,  3.43372055, -0.04512435],
        [-0.96948964,  

## `train` Function Explanation:

The `train` function trains the neural network using the provided training data and labels.

### Parameters:

- **`X_train`**: The input training data.
- **`Y_train`**: The actual output for the training data.
- **`n_inputs`**: Number of input nodes.
- **`n_hidden`**: Number of nodes in each of the two hidden layers.
- **`n_output`**: Number of output nodes.
- **`n_epochs`**: Number of training iterations. Each epoch performs one update using the full training set.
- **`learning_rate`**: The rate at which the weights are updated.

### Inside the function:

1. **Weight Initialization**:
    - The weights for the network are initialized using the `init_weights` function.

2. **Training Loop**:
    - For each epoch, the following steps are performed:
        - The `backprop` function is called to adjust the weights using the backpropagation algorithm.
        - Every 10 epochs, the current loss is computed using the `predict` and `loss` functions and is then printed to provide insight into the training process.

3. **Return**: 
    - The function returns the trained weights: `W0`, `W1`, and `W2`.

---

### Test:

The `train` function is tested using the following steps:

1. **Sample Data Generation**:
    - A training dataset `X_train` of shape `(1000, 10)` is randomly generated.
    - A bias column `b` of ones is appended to `X_train`, giving it shape `(1000, 11)`.

2. **Class Labels Generation**:
    - Random class labels are generated for `Y_train`. Each row corresponds to a one-hot encoded class label, meaning only one element in each row is set to 1, while the others are set to 0.

3. **Training the Network**:
    - The neural network is trained using the `train` function with the generated sample data, with 100 epochs and a learning rate of 0.1.
    - The trained weights `W0`, `W1`, and `W2` are returned.
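
The one-hot labels described in step 2 can also be generated without a Python loop. This is a minimal sketch of one vectorized alternative, assuming a seeded `np.random.default_rng` generator; it is not the approach used in the notebook's own test cell.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_output = 1000, 3

# Draw one class index per sample, then pick the matching row of the identity
labels = rng.integers(n_output, size=n_samples)
Y_train = np.eye(n_output, dtype=int)[labels]

print(Y_train.shape)  # (1000, 3)
print(Y_train.sum(axis=1).min(), Y_train.sum(axis=1).max())  # 1 1
```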

---




In [6]:
def train(X_train, Y_train, n_inputs, n_hidden, n_output, n_epochs, learning_rate):
    # Initialize weights
    W0, W1, W2 = init_weights(n_inputs, n_hidden, n_output)
    
    # Training loop
    for epoch in range(n_epochs):
        # Feedforward and Backpropagation
        W0, W1, W2 = backprop(X_train, Y_train, W0, W1, W2, learning_rate)
        
        # Compute loss for logging
        if epoch % 10 == 0:
            y_pred = predict(X_train, W0, W1, W2)
            current_loss = loss(y_pred, Y_train)
            print(f"Epoch {epoch}, Loss: {current_loss:.4f}")
            
    return W0, W1, W2

# Test the training function using the sample data provided in the problem
n_samples = 1000
n_inputs = 10
n_hidden = 5
n_output = 3
X_train = np.random.randn(n_samples, n_inputs)
b = np.ones((X_train.shape[0], 1))
X_train = np.concatenate((X_train, b), axis=1)

# Generate random one-hot class labels for Y_train
Y_train = np.zeros((n_samples, n_output), dtype=int)
for i in range(n_samples):
    random_idx = np.random.randint(n_output)
    Y_train[i, random_idx] = 1

# Train the network
W0, W1, W2 = train(X_train, Y_train, n_inputs, n_hidden, n_output, n_epochs=100, learning_rate=0.1)


Epoch 0, Loss: 0.4708
Epoch 10, Loss: 0.3979
Epoch 20, Loss: 0.3596
Epoch 30, Loss: 0.3447
Epoch 40, Loss: 0.3392
Epoch 50, Loss: 0.3370
Epoch 60, Loss: 0.3361
Epoch 70, Loss: 0.3356
Epoch 80, Loss: 0.3353
Epoch 90, Loss: 0.3352
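
The loss levels off near 0.335. One plausible reading is that, because the labels are random and unrelated to the inputs, the best the network can do is predict roughly 1/3 for every class. The sketch below computes the loss of that constant guess under the same `1 / (2m)` convention; treating the network's output as exactly 1/3 is an assumption made for illustration.

```python
import numpy as np

n_output = 3
Y = np.eye(n_output)                    # one example of each class
Y_pred = np.full_like(Y, 1 / n_output)  # constant guess of 1/3 per class

m = Y.shape[0]
baseline = np.sum((Y_pred - Y) ** 2) / (2 * m)
print(round(baseline, 4))               # 0.3333
```

A plateau just above this value is what one would expect when there is no signal to learn.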


# LET US TEST THE TRAINED NETWORK AND PREDICT

In [7]:
# Test the network with a sample input
x_test = np.random.randn(1, 11)
y_pred = predict(x_test, W0, W1, W2)
y_pred


array([[0.30547235, 0.36754361, 0.26902723]])