## Introduction

In this notebook, we'll explore the concept of forward propagation in neural networks. Forward propagation is the process by which input data is passed forward through the network, layer by layer, to produce an output. We'll examine the implementation of forward propagation in the provided neural network code, which is encapsulated within a Python class.

Readers should have a basic understanding of object-oriented programming (OOP) concepts to comprehend the structure of the network class.

Throughout this notebook, we'll look at how forward propagation works inside this network class and how it computes the network's output for a given input, with explanations and code snippets along the way.

Before we proceed, ensure that you have the NumPy library installed. If not, you can install it using the following command:


In [1]:
pip install numpy

Note: you may need to restart the kernel to use updated packages.





First, here's the full code for the network class. You can run the cell below in one go, and I'll explain it in more detail in the following sections.

In [8]:
import random
import numpy as np

class Network(object):

    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):

        training_data = list(training_data)
        n = len(training_data)

        if test_data:
            test_data = list(test_data)
            n_test = len(test_data)

        for j in range(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in range(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print("Epoch {} : {} / {}".format(j,self.evaluate(test_data),n_test))
            
            else:
                print("Epoch {} complete".format(j))

    def update_mini_batch(self, mini_batch, eta):
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        return (output_activations-y)

#### Miscellaneous functions
def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z)*(1-sigmoid(z))

## 1. Initialization
Let's start by initializing the neural network. We create an instance of the `Network` class with a specified architecture (the number of layers and the number of neurons in each layer), and the constructor fills in random weights and biases.

```python
import numpy as np
import random

class Network(object):

    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]
```

The first two lines of `__init__` simply store the number of layers and the layer sizes. The `sizes` parameter takes a list containing the number of neurons in each layer. For example:

```python
net = Network([784, 50, 10])  # 784 neurons in the first layer, 50 in the second layer, and 10 in the third layer
```

We initialize the weights and biases with random values drawn from a Gaussian (normal) distribution with mean 0 and standard deviation 1, which is what `np.random.randn` generates.
Also, note that the weights and biases are each stored as a Python list of NumPy arrays, one array per layer.

`self.biases` holds one array of biases for every layer except the first, because the first layer has no biases (we will call it the input layer from now on, and the last layer the output layer). Each array has shape `(y, 1)`, where `y` is the number of neurons in that layer. For example, with three neurons in the second layer and two in the third:
```python
biases = [
    # biases[0]: biases for the second layer, shape (3, 1)
    [[0.1],   # Bias for neuron 1 in layer 2
     [0.2],   # Bias for neuron 2 in layer 2
     [0.3]],  # Bias for neuron 3 in layer 2

    # biases[1]: biases for the third layer, shape (2, 1)
    [[0.4],   # Bias for neuron 1 in layer 3
     [0.5]]   # Bias for neuron 2 in layer 3
]
```

`self.weights` is similar: it holds one weight matrix for each pair of adjacent layers. Each matrix has shape `(y, x)`, where `x` is the number of neurons in the sending layer and `y` is the number of neurons in the receiving layer, so each row holds the weights feeding into one neuron of the receiving layer. For example, for a network with three input neurons, two hidden neurons, and one output neuron:
```python
weights = [
    # Weights for connections from the input layer to the hidden layer, shape (2, 3)
    [[0.1, 0.3, 0.5],    # Weights into neuron 1 in the hidden layer from input neurons 1, 2, 3
     [0.2, 0.4, 0.6]],   # Weights into neuron 2 in the hidden layer from input neurons 1, 2, 3

    # Weights for connections from the hidden layer to the output layer, shape (1, 2)
    [[0.5, 0.6]]         # Weights into the output neuron from hidden neurons 1, 2
]
```
Here's the visual representation of the example network architecture:

![Neural Network Architecture Example](./images/1.1.jpg)
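To make the layout of the two lists concrete, here is a minimal sketch that builds them the same way `__init__` does and prints the shape of each array. The `[3, 2, 1]` architecture and the variable names `bias_shapes` and `weight_shapes` are assumptions chosen for illustration.

```python
import numpy as np

# Assumed example architecture: 3 input neurons, 2 hidden neurons, 1 output neuron
sizes = [3, 2, 1]

# Same list comprehensions as in Network.__init__
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

# One bias column vector per non-input layer
bias_shapes = [b.shape for b in biases]
# One (receiving, sending) weight matrix per pair of adjacent layers
weight_shapes = [w.shape for w in weights]

print(bias_shapes)    # [(2, 1), (1, 1)]
print(weight_shapes)  # [(2, 3), (1, 2)]
```

Each weight matrix has as many rows as the layer it feeds into and as many columns as the layer it reads from, which is what lets `np.dot(w, a)` work on a column vector of activations.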






## 2. Feedforward Function
Next, we'll dive into the `feedforward` function, which performs forward propagation through the network. The method iterates over the layers of the network, applying each layer's weights and biases to its input and passing the result through the activation function. The output of each layer becomes the input to the next layer, and this continues until the final layer produces the output of the network.

This method (a function that belongs to a class) takes an input vector and returns the output of the network. Let's add the `feedforward` method to our network class.

```python
def feedforward(self, a):
    """Return the output of the network if ``a`` is input."""
    for b, w in zip(self.biases, self.weights):
        a = sigmoid(np.dot(w, a) + b)
    return a
```

This one is pretty straightforward. The `zip` function pairs each layer's bias vector `b` with its weight matrix `w`. For each pair, the method computes the dot product of the weights and the current activations `a`, adds the biases, and passes the result through a custom function called `sigmoid`, which is explained in the next section. The result becomes the new `a`, the input to the next layer. After the last layer, the method returns `a`, which is the output of the network.
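The loop can be traced by hand on a tiny network. The sketch below is not part of the `Network` class: it uses hypothetical, hand-picked weights and zero biases for a `[2, 2, 1]` network so that every intermediate value can be checked, and it prints the shape of the weighted input `z` and the activation `a` at each layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fixed parameters for a [2, 2, 1] network (illustration only)
weights = [np.array([[1.0, -1.0],
                     [0.5,  0.5]]),   # hidden layer <- input layer, shape (2, 2)
           np.array([[1.0, 1.0]])]    # output layer <- hidden layer, shape (1, 2)
biases = [np.zeros((2, 1)), np.zeros((1, 1))]

a = np.array([[1.0], [1.0]])  # input as a column vector, shape (2, 1)
for b, w in zip(biases, weights):
    z = np.dot(w, a) + b      # weighted input of this layer
    a = sigmoid(z)            # activation of this layer, fed to the next one
    print(z.shape, a.shape)   # (2, 1) (2, 1), then (1, 1) (1, 1)

output = a
```

In the first layer the weighted inputs are 0 and 1, so the hidden activations are `sigmoid(0)` and `sigmoid(1)`. The output neuron then receives their sum, which is exactly the value the loop returns.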

## 3. Sigmoid Activation Function

Before we proceed, let's briefly explain the sigmoid activation function used in the feedforward process. The sigmoid function maps any real-valued number to the range (0, 1), making it suitable for producing probabilities or binary outputs in classification tasks.

```python
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))
```

The sigmoid graph looks like this:

![Graph of sigmoid function](./images/1.2.jpg)
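A few sample values show the same behaviour numerically. This is a small sketch using the `sigmoid` definition above; the input values are arbitrary and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary sample points, symmetric around zero
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
s = sigmoid(z)

print(s)     # values rise from near 0 to near 1
print(s[2])  # 0.5, since sigmoid(0) = 1 / (1 + 1)
```

Because `np.exp` works element-wise, the same function applies to a single number, a vector, or a whole matrix of weighted inputs, which is why `feedforward` can call it on an entire layer at once.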

## 4. Example
Let's demonstrate the forward propagation process with an example. We'll provide an input vector to the network and observe how it produces the corresponding output using the feedforward method.

In [11]:
net = Network([3,2,1])

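# Pass the input as a column vector of shape (3, 1), one row per input neuron
print(net.feedforward(np.array([[0.4], [1.2], [1.9]])))


With the input given as a column vector, forward propagation returns an array of shape (1, 1) holding a single value between 0 and 1, which is the activation of the one neuron in the output layer. The exact number changes from run to run because the weights and biases are initialized randomly. If you pass a flat list such as `[0.4, 1.2, 1.9]` instead, NumPy broadcasting silently widens the result to one column per hidden neuron, so the output no longer matches the size of the output layer. You can play around with the last value in the `sizes` list when initializing the network and see how the output shape changes.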
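The shape of the input matters more than it might seem. The following sketch re-creates the `feedforward` loop as a standalone function (an assumption made so the block runs on its own) and compares a flat input of shape `(3,)` with a column vector of shape `(3, 1)` for a `[3, 2, 1]` network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

sizes = [3, 2, 1]
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

def feedforward(a):
    # Standalone copy of the loop in Network.feedforward
    for b, w in zip(biases, weights):
        a = sigmoid(np.dot(w, a) + b)
    return a

flat = feedforward(np.array([0.4, 1.2, 1.9]))          # input shape (3,)
column = feedforward(np.array([[0.4], [1.2], [1.9]]))  # input shape (3, 1)

print(flat.shape)    # (1, 2): broadcasting added one column per hidden neuron
print(column.shape)  # (1, 1): one value for the single output neuron
```

With the flat input, `np.dot(w, a)` yields a 1-D array of length 2, and adding the `(2, 1)` bias array broadcasts it to a `(2, 2)` matrix, which then propagates to the output. If your data arrives as a flat array, `np.reshape(x, (-1, 1))` turns it into the column vector the network expects.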

## Conclusion
Forward propagation is a fundamental process in neural networks where input data is passed forward through the network to produce an output. In this notebook, we explored how forward propagation is implemented in the provided neural network code, focusing on the feedforward method and the sigmoid activation function. Understanding forward propagation is essential for grasping the functioning of neural networks and their ability to make predictions based on input data.
