## Forward and Backward Propagation: Assignment Questions

## 1. Explain the concept of forward propagation in a neural network.

Forward propagation is the process of passing input data through a neural network to produce an output. It is a key step in training and using neural networks for tasks like classification, regression, or any predictive modeling. The concept involves a series of computations through the network's layers, where each layer transforms its input into an output using a set of weights, biases, and an activation function.

Here's a detailed breakdown of forward propagation:

#### 1. Input Layer
The input layer receives the raw data (e.g., images, text, or numerical values).
Each input feature is assigned to a neuron in this layer.

#### 2. Hidden Layers
Each neuron in a hidden layer computes a weighted sum of its inputs. In matrix form, for layer $l$:

$$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$$

where:

$z^{(l)}$: Vector of weighted sums (pre-activations) for the $l$-th layer.

$W^{(l)}$: Weight matrix connecting layer $l-1$ to layer $l$.

$a^{(l-1)}$: Activations (outputs) of the previous layer; $a^{(0)}$ is the input.

$b^{(l)}$: Bias vector for the $l$-th layer.

The result $z^{(l)}$ is then passed element-wise through an activation function $f$ to introduce non-linearity:

$$a^{(l)} = f(z^{(l)})$$

Common activation functions include ReLU, sigmoid, and tanh.
This computation is repeated for every hidden layer in turn.
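
The hidden-layer computation can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper name `dense_forward` and all numeric values are assumptions made for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def dense_forward(a_prev, W, b, f):
    """Compute z = W a_prev + b, then a = f(z), for one layer."""
    z = W @ a_prev + b
    return z, f(z)

# Hypothetical values: 3 inputs feeding a layer of 2 neurons.
a_prev = np.array([[1.0], [2.0], [-1.0]])   # shape (3, 1)
W = np.array([[0.5, -1.0, 0.0],
              [1.0,  0.5, 2.0]])            # shape (2, 3)
b = np.array([[0.5], [1.0]])                # shape (2, 1)

z, a = dense_forward(a_prev, W, b, relu)
print(z.ravel())  # [-1.  1.]
print(a.ravel())  # [0. 1.]  (ReLU zeroes the negative pre-activation)
```

Stacking layers amounts to feeding the returned `a` back in as the next layer's `a_prev`.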

#### 3. Output Layer
The final layer of the network aggregates the results from the last hidden layer and produces the network's output. For example:
Regression tasks: The output is a single number (e.g., using no activation or linear activation).
Classification tasks: The output is a probability distribution (e.g., using a softmax activation).
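
As a rough sketch of the classification case, softmax turns the output layer's raw scores into a probability distribution. The scores in `z_out` are made-up values for illustration; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(z):
    """Column-wise softmax; subtracting the max avoids overflow in exp."""
    shifted = z - np.max(z, axis=0, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=0, keepdims=True)

# Hypothetical raw scores for 3 classes, 1 sample.
z_out = np.array([[2.0], [1.0], [0.1]])

probs = softmax(z_out)
print(probs.ravel())         # approximately [0.659 0.242 0.099]
print(probs.sum())           # sums to 1 (up to rounding)
print(int(probs.argmax()))   # 0, the index of the predicted class
```

For a regression output, the last layer would simply return `z_out` unchanged (a linear activation).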

#### 4. Example of Forward Propagation
Consider a simple network with:

1 input layer with two features ($x_1$, $x_2$),

1 hidden layer with two neurons, and

1 output neuron.

Step-by-Step:
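1. Compute the weighted sum and activation for the first hidden layer:

$$z_1 = w_{11} x_1 + w_{12} x_2 + b_1, \quad a_1 = f(z_1)$$

$$z_2 = w_{21} x_1 + w_{22} x_2 + b_2, \quad a_2 = f(z_2)$$

2. Compute the output:

$$z_{\text{out}} = w_{o1} a_1 + w_{o2} a_2 + b_{\text{out}}, \quad y = f(z_{\text{out}})$$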
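
To make the two steps concrete, the sketch below evaluates them with numbers. Every weight, bias, and input value is hypothetical, and sigmoid is assumed for $f$; the example only shows that the neuron-by-neuron formulas and the matrix form give the same result.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs and parameters, chosen only for illustration.
x1, x2 = 0.5, -1.0
w11, w12, b1 = 0.1, 0.2, 0.3
w21, w22, b2 = -0.4, 0.5, 0.6
wo1, wo2, b_out = 0.7, -0.8, 0.9

# Step 1: hidden layer, one neuron at a time.
a1 = sigmoid(w11 * x1 + w12 * x2 + b1)
a2 = sigmoid(w21 * x1 + w22 * x2 + b2)

# Step 2: output neuron.
y = sigmoid(wo1 * a1 + wo2 * a2 + b_out)

# The same computation in matrix form.
x = np.array([x1, x2])
W_hidden = np.array([[w11, w12], [w21, w22]])
b_hidden = np.array([b1, b2])
w_out = np.array([wo1, wo2])
y_matrix = sigmoid(w_out @ sigmoid(W_hidden @ x + b_hidden) + b_out)

print(y, y_matrix)  # the two values agree
```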


#### 5. Purpose of Forward Propagation
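To generate predictions: At inference time, the forward pass alone produces the network's output.
To calculate the loss: During training, the output of the forward pass is compared with the target to compute the loss. Backpropagation then computes the gradients of that loss, and an optimizer uses them to update the weights.
For the fully connected layers described here, forward propagation is computationally cheap: each layer costs one matrix multiplication and one element-wise activation.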

## 2. What is the purpose of the activation function in forward propagation?

The purpose of the activation function in forward propagation is to enhance the functionality and expressiveness of a neural network by introducing non-linearity and controlling the output of neurons. Here are the key purposes in detail:

#### 1. Introducing Non-Linearity
Real-world data often involves complex, non-linear relationships. Activation functions allow the network to learn and model such patterns.
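Without non-linearity, a stack of layers collapses into a single linear transformation, so the network could represent only linear functions of its input (and only linear decision boundaries), regardless of its depth.
Activation functions transform the linear combinations of inputs and weights into non-linear outputs. With enough hidden units, this lets the network approximate any continuous function on a compact (closed and bounded) input domain to arbitrary accuracy, a result known as the universal approximation theorem.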
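
The collapse of purely linear layers can be checked numerically. The sketch below uses randomly drawn parameters (an assumption for the example) and shows that two layers without an activation equal one layer with merged weights.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))
X = rng.normal(size=(2, 5))  # 2 features, 5 samples

# Two layers with no activation function in between...
two_layer = W2 @ (W1 @ X + b1) + b2

# ...are equivalent to a single linear layer with merged parameters.
W_merged = W2 @ W1
b_merged = W2 @ b1 + b2
one_layer = W_merged @ X + b_merged

print(np.allclose(two_layer, one_layer))  # True
```

Inserting a non-linearity such as ReLU between the two layers breaks this algebraic identity, which is what allows depth to add expressive power.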

#### 2. Allowing Hierarchical Feature Learning
In multi-layer neural networks, activation functions enable each layer to learn more abstract and meaningful features from the previous layer's output.
Example: In an image classifier, early layers might learn edges, while deeper layers learn complex shapes or objects.
This progressive abstraction is crucial for tasks like image recognition, language processing, and other complex problems.

#### 3. Controlling the Range of Outputs
Some activation functions restrict the output to a specific range: sigmoid maps to (0, 1) and tanh to (−1, 1). ReLU, by contrast, is bounded only from below, at 0.
This helps:
Prevent large, unbounded values from destabilizing the network.
Provide interpretable outputs, such as probabilities in classification tasks (e.g., sigmoid or softmax).
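
A quick numerical check of these ranges, sketched over an arbitrary grid of inputs from -10 to 10 (the grid is an assumption for the example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10.0, 10.0, 201)

sig = sigmoid(z)
tanh = np.tanh(z)
relu = np.maximum(0.0, z)

print(sig.min() > 0.0, sig.max() < 1.0)     # True True: stays inside (0, 1)
print(tanh.min() > -1.0, tanh.max() < 1.0)  # True True: stays inside (-1, 1)
print(relu.min(), relu.max())               # 0.0 10.0: unbounded above
```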

#### 4. Enabling Backpropagation
Common activation functions are differentiable everywhere (sigmoid, tanh) or almost everywhere (ReLU, which has a kink at 0), which is essential for backpropagation during training.
Backpropagation relies on the derivative of the activation function to compute gradients for adjusting weights and biases.
Choosing an activation function with an appropriate gradient helps ensure effective learning.
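
The derivatives that backpropagation needs have simple closed forms. The sketch below writes them out and compares the sigmoid derivative with a finite-difference estimate; the function names are hypothetical, and using 0 as the ReLU derivative at exactly $z = 0$ is a common convention rather than a mathematical necessity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2

def relu_grad(z):
    # Convention: derivative taken as 0 at the kink z = 0.
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.5, 2.0])

# Central finite difference as an independent check of sigmoid_grad.
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(np.allclose(sigmoid_grad(z), numeric))  # True
print(relu_grad(z))                           # [0. 0. 1. 1.]
```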

#### 5. Improving Model Performance
Different activation functions are suited to different tasks, and choosing the right one can significantly affect the network's performance:
Avoiding vanishing gradients: ReLU (Rectified Linear Unit) and its variants mitigate the vanishing gradient problem that sigmoid and tanh cause in deep networks. ReLU's derivative is exactly 1 for positive inputs, whereas the derivatives of sigmoid and tanh shrink toward 0 as the unit saturates.
Sparsity: ReLU introduces sparsity by outputting zero for negative inputs, which can improve computational efficiency and may act as a mild regularizer.
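
Both effects can be illustrated with a back-of-the-envelope calculation. The sketch below is a simplification: it looks only at the activation-derivative factors along one path and ignores the weight matrices, and the depth of 10 and the standard-normal inputs are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The sigmoid derivative s * (1 - s) peaks at z = 0 with value 0.25.
max_sigmoid_grad = sigmoid(0.0) * (1.0 - sigmoid(0.0))

depth = 10
# Best case for sigmoid: every layer contributes its maximum derivative.
print(max_sigmoid_grad ** depth)  # about 9.5e-07
# ReLU: active units contribute a factor of exactly 1.
print(1.0 ** depth)               # 1.0

# Sparsity: for zero-mean inputs, about half the ReLU outputs are zero.
rng = np.random.default_rng(0)
z = rng.normal(size=10_000)
sparsity = np.mean(np.maximum(0.0, z) == 0.0)
print(sparsity)                   # roughly 0.5
```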

In summary, the activation function transforms the raw outputs of neurons in a way that allows the neural network to learn non-linear patterns, represent hierarchical features, stabilize computations, and support the training process via backpropagation. It is a critical component that makes deep learning practical and effective for complex problems.





## 3. Describe the steps involved in the backward propagation (backpropagation) algorithm.

Here is a concise summary of the steps involved in the backpropagation algorithm:

#### 1. Forward Pass
Pass input data through the network to compute the predicted output.

Calculate the loss (error) using a loss function.

#### 2. Compute Gradients at the Output Layer
Calculate the gradient of the loss with respect to the output layer’s pre-activation values ($z^{(L)}$) using the chain rule.

Compute gradients of the weights and biases in the output layer.

#### 3. Backward Pass Through Hidden Layers

For each hidden layer:
Calculate the error term ($\delta^{(l)}$) using the weights and errors from the next layer.

Compute gradients of the weights and biases in the current layer.


#### 4. Update Weights and Biases
    
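Adjust the weights and biases using an optimization algorithm such as gradient descent, where $\eta$ is the learning rate:

$$W^{(l)} \leftarrow W^{(l)} - \eta \, \frac{\partial L}{\partial W^{(l)}}$$

$$b^{(l)} \leftarrow b^{(l)} - \eta \, \frac{\partial L}{\partial b^{(l)}}$$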
 
#### 5. Repeat
Iterate steps 1–4 for multiple epochs or until the loss converges.

In essence, backpropagation computes gradients using the chain rule, propagates the error backward through the network, and updates parameters to minimize the loss.
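
The five steps can be sketched end to end for a network with one hidden layer. This is one possible arrangement, not the only one: the ReLU hidden layer, sigmoid output, binary cross-entropy loss, toy OR data, learning rate, and the helper name `train_step` are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(X, Y, p, lr):
    """Run steps 1-4 once; returns the loss measured before the update."""
    m = X.shape[1]  # number of samples (one per column)

    # Step 1: forward pass and binary cross-entropy loss.
    Z1 = p["W1"] @ X + p["b1"]
    A1 = np.maximum(0.0, Z1)            # ReLU
    Z2 = p["W2"] @ A1 + p["b2"]
    A2 = sigmoid(Z2)
    tiny = 1e-12                        # guards against log(0)
    loss = -np.mean(Y * np.log(A2 + tiny) + (1 - Y) * np.log(1 - A2 + tiny))

    # Step 2: output layer. For sigmoid + cross-entropy, dL/dZ2 = (A2 - Y) / m.
    dZ2 = (A2 - Y) / m
    grads = {"W2": dZ2 @ A1.T, "b2": dZ2.sum(axis=1, keepdims=True)}

    # Step 3: hidden layer. delta = (W2^T delta_next) * ReLU'(Z1).
    dZ1 = (p["W2"].T @ dZ2) * (Z1 > 0)
    grads["W1"] = dZ1 @ X.T
    grads["b1"] = dZ1.sum(axis=1, keepdims=True)

    # Step 4: gradient descent update (in place).
    for name in p:
        p[name] -= lr * grads[name]
    return loss

# Toy data (logical OR), an assumption made for this sketch.
X = np.array([[0.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0]])
Y = np.array([[0.0, 1.0, 1.0, 1.0]])

rng = np.random.default_rng(0)
params = {"W1": 0.5 * rng.normal(size=(4, 2)), "b1": np.zeros((4, 1)),
          "W2": 0.5 * rng.normal(size=(1, 4)), "b2": np.zeros((1, 1))}

# Step 5: repeat for a fixed number of iterations.
losses = [train_step(X, Y, params, lr=0.1) for _ in range(1000)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The loss printed at the end should be lower than the one at the start; the exact values depend on the random initialization.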




    

## 4. What is the purpose of the chain rule in backpropagation?

The chain rule is fundamental to the backpropagation algorithm as it enables the calculation of gradients for deep neural networks. Specifically, it allows the error (loss) to be propagated backward from the output layer to the earlier layers, ensuring that each layer's weights and biases are updated correctly. Here's the purpose of the chain rule in backpropagation:

#### 1. Efficient Gradient Computation
The chain rule provides a systematic way to compute the gradient of the loss function with respect to each weight and bias in the network, even when the network has many layers.
It breaks the computation into manageable steps by considering the relationships between successive layers.

#### 2. Linking Layers in the Network
In a neural network, the output of one layer is the input to the next. The chain rule helps in computing how the change in a weight or bias in one layer affects the loss, accounting for all intermediate transformations.

Mathematically:

$$\frac{\partial L}{\partial W^{(l)}} = \frac{\partial L}{\partial z^{(l)}} \cdot \frac{\partial z^{(l)}}{\partial W^{(l)}}$$

#### 3. Handling Non-Linear Activation Functions
Neural networks use non-linear activation functions, making direct gradient computation challenging. The chain rule enables differentiation through these non-linearities by multiplying the derivative of each activation function, $f'(z^{(l)})$, into the gradient arriving from the layers that follow.
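
For a single sigmoid neuron the chain rule can be written out factor by factor and checked against a finite-difference estimate. The squared-error loss and all numeric values below are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron."""
    return (sigmoid(w * x + b) - y) ** 2

# Hypothetical values chosen for illustration.
w, b, x, y = 0.8, -0.3, 1.5, 1.0

# Chain rule: dL/dw = dL/da * da/dz * dz/dw
z = w * x + b
a = sigmoid(z)
dL_da = 2.0 * (a - y)
da_dz = a * (1.0 - a)   # derivative of the sigmoid
dz_dw = x
grad_analytic = dL_da * da_dz * dz_dw

# Central finite difference as an independent check.
eps = 1e-6
grad_numeric = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)

print(grad_analytic, grad_numeric)  # the two estimates agree closely
```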

#### 4. Backward Propagation of Error
The chain rule allows errors to flow backward through the network:

The gradient of the loss at the output layer is computed first.

This gradient is then propagated backward to compute gradients for all preceding layers by chaining the partial derivatives layer by layer. 
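
For a fully connected layer with $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$ and $a^{(l)} = f(z^{(l)})$, this layer-by-layer chaining takes the following standard form for a single training example. Here $\delta^{(l)} = \partial L / \partial z^{(l)}$ is the error term of layer $l$, and $\odot$ (a symbol introduced for this sketch) denotes element-wise multiplication:

$$\delta^{(l)} = \left( (W^{(l+1)})^{\top} \delta^{(l+1)} \right) \odot f'(z^{(l)})$$

$$\frac{\partial L}{\partial W^{(l)}} = \delta^{(l)} \, (a^{(l-1)})^{\top}, \qquad \frac{\partial L}{\partial b^{(l)}} = \delta^{(l)}$$

The first equation carries the error one layer back; the second converts it into the gradients used to update that layer's parameters.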

#### 5. Parameter Optimization
The gradients computed using the chain rule are used in optimization algorithms (e.g., gradient descent) to update weights and biases, minimizing the loss function.

The purpose of the chain rule in backpropagation is to compute the gradient of the loss with respect to each weight and bias in a multi-layer neural network. It achieves this by breaking the gradient computation into smaller steps, linking the layers, and propagating the error backward from the output to the input. This allows efficient and accurate updates of the model parameters during training.



## 5. Implement the forward propagation process for a simple neural network with one hidden layer using NumPy.

```python
import numpy as np

# Define the activation functions (ReLU for hidden layer, sigmoid for output layer)
def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Define the forward propagation function
def forward_propagation(X, weights, biases):
    """
    Perform forward propagation through a neural network with one hidden layer.

    Parameters:
    - X: Input data, shape (n_features, n_samples)
    - weights: A dictionary with weights for the hidden and output layers
    - biases: A dictionary with biases for the hidden and output layers

    Returns:
    - A dictionary containing intermediate and final outputs (activations)
    """
    # Compute the hidden layer
    Z1 = np.dot(weights['W1'], X) + biases['b1']  # Weighted sum for hidden layer
    A1 = relu(Z1)                                 # Activation for hidden layer

    # Compute the output layer
    Z2 = np.dot(weights['W2'], A1) + biases['b2']  # Weighted sum for output layer
    A2 = sigmoid(Z2)                               # Activation for output layer

    # Store intermediate results for potential backpropagation
    activations = {
        'Z1': Z1, 'A1': A1,
        'Z2': Z2, 'A2': A2
    }
    return activations

# Example setup
np.random.seed(42)  # For reproducibility

# Input data (2 features, 3 samples)
X = np.array([[0.5, 1.5, -1.0],
              [1.0, -0.5,  2.0]])

# Neural network parameters
weights = {
    'W1': np.random.randn(4, 2),  # 4 neurons in hidden layer, 2 input features
    'W2': np.random.randn(1, 4)   # 1 output neuron, 4 hidden neurons
}
biases = {
    'b1': np.random.randn(4, 1),  # Bias for 4 hidden neurons
    'b2': np.random.randn(1, 1)   # Bias for 1 output neuron
}

# Perform forward propagation
activations = forward_propagation(X, weights, biases)

# Output results
print("Hidden layer activations (A1):")
print(activations['A1'])
print("\nOutput layer activations (A2):")
print(activations['A2'])
```


Output:

```text
Hidden layer activations (A1):
[[0.35205505 1.05616565 0.        ]
 [0.         0.         0.48509093]
 [0.         0.         0.        ]
 [0.99475361 1.42281433 0.        ]]

Output layer activations (A2):
[[0.16227489 0.10235562 0.32089971]]
```
