**Forward and Backward Propagation**

# Question 1: Explain the concept of forward propagation in a neural network.
"""
Answer:
Forward propagation is the process by which an input is passed through a neural network to generate an output. In a typical feedforward neural network, forward propagation involves moving the input through multiple layers, where each layer performs some computation and passes its result to the next layer.

In the case of a neural network with one or more hidden layers:
1. The input is multiplied by the weights (which are typically initialized randomly).
2. A bias term is added to this weighted sum.
3. An activation function is applied to the result to introduce non-linearity.
4. This process is repeated layer by layer, until the final output layer is reached, producing the network’s prediction.

Mathematically:
- For each layer \( i \), the output is computed as:
  \[
  \mathbf{a}_i = \text{activation}(\mathbf{W}_i \mathbf{a}_{i-1} + \mathbf{b}_i)
  \]
  where \( \mathbf{a}_{i-1} \) is the output from the previous layer (with \( \mathbf{a}_0 \) being the network input), \( \mathbf{W}_i \) is the weight matrix of layer \( i \), and \( \mathbf{b}_i \) is its bias vector.
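
The layer-by-layer formula above can be sketched as a short loop. This is a minimal illustration, not a reference implementation: the helper name `forward`, the layer sizes, the seed, and the choice of tanh as the activation are all assumptions made for the example. It uses column vectors so that it matches the \( \mathbf{W}_i \mathbf{a}_{i-1} + \mathbf{b}_i \) form directly.

```python
import numpy as np

def forward(x, weights, biases, activation=np.tanh):
    # a_0 is the input; each layer applies a_i = activation(W_i a_{i-1} + b_i).
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b      # weighted sum plus bias
        a = activation(z)  # non-linearity
    return a

rng = np.random.default_rng(0)
# Assumed sizes for illustration: 3 inputs -> 4 hidden units -> 2 outputs.
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [rng.standard_normal((4, 1)), rng.standard_normal((2, 1))]
x = np.array([[0.5], [0.6], [0.1]])  # one input as a column vector

output = forward(x, weights, biases)
print(output.shape)  # (2, 1)
```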
"""


# Question 2: What is the purpose of the activation function in forward propagation?
"""
Answer:
The activation function introduces non-linearity into the network. Without activation functions, the network would collapse to a single linear (affine) transformation of its input, much like a linear regression model, regardless of the number of layers. This is because a composition of linear functions is itself a linear function.
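
A quick numerical check of this claim, as a sketch with arbitrarily chosen shapes and random values: two stacked layers without an activation give exactly the same result as one layer with weights \( W_2 W_1 \) and bias \( W_2 b_1 + b_2 \).

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 1))
x = rng.standard_normal((3, 1))

# Two layers with no activation function in between.
two_layer = W2 @ (W1 @ x + b1) + b2

# The same map written as a single layer: W = W2 W1, b = W2 b1 + b2.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))  # True
```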

Activation functions allow the network to model complex relationships between input and output, making it capable of solving more complex tasks, such as image classification, object detection, etc.

Common activation functions include:
1. **Sigmoid**: Outputs values between 0 and 1; commonly used in the output layer for binary classification.
2. **ReLU (Rectified Linear Unit)**: Outputs max(0, z), so values are greater than or equal to 0; it produces sparse activations and helps mitigate vanishing gradients.
3. **Tanh (Hyperbolic Tangent)**: Outputs values between -1 and 1, often used in hidden layers.

The choice of activation function affects the learning and performance of the model.
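
The three functions listed above can be written in a few lines of NumPy. This is a minimal sketch; the helper names `sigmoid` and `relu` and the sample inputs are chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Keeps positive values and replaces negative values with 0.
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # values in (0, 1); sigmoid(0) = 0.5
print(relu(z))     # [0. 0. 2.]
print(np.tanh(z))  # values in (-1, 1); tanh(0) = 0
```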
"""


# Question 3: Describe the steps involved in the backward propagation (backpropagation) algorithm.
"""
Answer:
Backpropagation is the process used to train neural networks by adjusting the weights based on the error (or loss) of the output. The steps involved are:

1. **Calculate the Loss**: After a forward pass has produced a prediction, calculate the error or loss using a loss function (e.g., mean squared error or cross-entropy). The loss measures the discrepancy between the predicted output and the actual target.
   
2. **Backward Pass (Error Propagation)**:
   - Starting from the output layer, propagate the error backward through the network.
   - For each layer, compute the gradient of the loss with respect to the weights and biases using the chain rule.

3. **Update Weights and Biases**: Once the gradients are computed, update the weights and biases using an optimization algorithm like gradient descent.

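Backpropagation itself computes the gradients; the optimizer uses them to reduce the loss. Repeating these steps over many iterations minimizes the loss function, ultimately allowing the neural network to make better predictions.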
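
The three steps can be sketched for a network with one hidden layer as follows. Everything specific here is an assumption made for illustration: the toy random data, the tanh hidden layer with a linear output, the mean squared error loss, the learning rate, and the number of iterations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))  # toy data: 8 samples, 3 features
y = rng.standard_normal((8, 1))  # toy regression targets

W1, b1 = rng.standard_normal((3, 4)) * 0.5, np.zeros((1, 4))
W2, b2 = rng.standard_normal((4, 1)) * 0.5, np.zeros((1, 1))
lr = 0.05  # learning rate (assumed value)

losses = []
for step in range(200):
    # Forward pass
    Z1 = X @ W1 + b1
    A1 = np.tanh(Z1)
    y_hat = A1 @ W2 + b2

    # 1. Calculate the loss (mean squared error)
    loss = np.mean((y_hat - y) ** 2)
    losses.append(loss)

    # 2. Backward pass: chain rule, from the output layer toward the input
    d_yhat = 2 * (y_hat - y) / len(X)  # dL/dy_hat
    dW2 = A1.T @ d_yhat
    db2 = d_yhat.sum(axis=0, keepdims=True)
    dA1 = d_yhat @ W2.T
    dZ1 = dA1 * (1 - A1 ** 2)          # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0, keepdims=True)

    # 3. Update weights and biases (plain gradient descent)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print("first loss:", losses[0], "last loss:", losses[-1])
```

With a sufficiently small learning rate, the last recorded loss should be lower than the first.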
"""


# Question 4: What is the purpose of the chain rule in backpropagation?
"""
Answer:
The chain rule is a fundamental concept in calculus used to compute derivatives of composite functions. In the context of backpropagation, the chain rule allows us to compute the gradient of the loss function with respect to each parameter (weight and bias) in the neural network.

The backpropagation algorithm involves computing gradients layer by layer. The chain rule is used to propagate the error backward through the network, starting from the output layer and going toward the input layer.

Mathematically, if the loss function is \( L \), the weight updates for a layer \( k \) depend on the gradient of \( L \) with respect to the weight at layer \( k \). Using the chain rule, we can express the gradient as:
\[
\frac{\partial L}{\partial W_k} = \frac{\partial L}{\partial a_k} \cdot \frac{\partial a_k}{\partial z_k} \cdot \frac{\partial z_k}{\partial W_k}
\]
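where \( a_k \) is the activation of layer \( k \), and \( z_k = W_k a_{k-1} + b_k \) is the input to the activation function in layer \( k \). The first factor, \( \frac{\partial L}{\partial a_k} \), is itself obtained from layer \( k+1 \) by the same rule, which is how the error travels backward. The chain rule ensures that the error is appropriately propagated and used to update the parameters in the network.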
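
The three-factor product can be checked numerically on the smallest possible case. This is a sketch under assumed values: a single sigmoid neuron with scalar weight `w`, a squared-error loss \( L = \tfrac{1}{2}(a - y)^2 \), and arbitrary numbers for `x`, `y`, `w`, and `b`. The chain-rule gradient is compared against a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed one-neuron layer: z = w * x + b, a = sigmoid(z), L = 0.5 * (a - y)^2
x, y = 1.5, 1.0
w, b = 0.8, -0.2

z = w * x + b
a = sigmoid(z)

dL_da = a - y          # dL/da
da_dz = a * (1.0 - a)  # da/dz for the sigmoid
dz_dw = x              # dz/dw
grad_chain = dL_da * da_dz * dz_dw

def loss(w_value):
    # Loss as a function of the weight only, with x, y, and b held fixed.
    return 0.5 * (sigmoid(w_value * x + b) - y) ** 2

# Central finite difference as an independent estimate of dL/dw.
eps = 1e-6
grad_numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(np.isclose(grad_chain, grad_numeric))  # True
```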
"""


# Question 5: Implement the forward propagation process for a simple neural network with one hidden layer using NumPy

# Answer: In this example, we implement forward propagation for a simple neural network with one hidden layer using NumPy.

# Let's assume:
# - The input layer has 3 features (3 input nodes).
# - The hidden layer has 4 neurons.
# - The output layer has 2 neurons (one per class in a 2-class classification problem).

# We will use ReLU activation for the hidden layer and softmax activation for the output layer.


import numpy as np

# Define the input features (3 features)
X = np.array([[0.5, 0.6, 0.1]])

# Define the weights and biases (randomly initialized, so the printed output varies between runs)
# Input to hidden layer weights (3 input nodes, 4 hidden neurons)
W1 = np.random.randn(3, 4)
b1 = np.random.randn(1, 4)

# Hidden to output layer weights (4 hidden neurons, 2 output neurons)
W2 = np.random.randn(4, 2)
b2 = np.random.randn(1, 2)

# Forward Propagation

# 1. Compute hidden layer activations (using ReLU)
Z1 = np.dot(X, W1) + b1  # Linear transformation
A1 = np.maximum(0, Z1)   # ReLU activation

# 2. Compute output layer activations (using softmax)
Z2 = np.dot(A1, W2) + b2  # Linear transformation
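# Subtracting the row-wise max avoids overflow in exp and leaves the softmax result unchanged
exp_Z2 = np.exp(Z2 - np.max(Z2, axis=1, keepdims=True))
A2 = exp_Z2 / np.sum(exp_Z2, axis=1, keepdims=True)  # Softmax activation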

print("Output of the network: ", A2)


Output of the network:  [[0.02914407 0.97085593]]
