In [1]:
# 1. Explain the concept of forward propagation in a neural network.
# Ans: Forward propagation is the process of passing input data through a neural network layer by layer to produce an output (prediction). During training it is the first phase of each iteration and is followed by backpropagation, in which the network learns by updating its weights. During inference (evaluation or prediction), forward propagation is the only phase that runs.

# Key Steps in Forward Propagation:
# Input Layer:

# The network receives input data, which is represented as a vector of features.
# Example: If the input is an image, each pixel value would serve as an input feature.
# Linear Transformation (Weights and Bias):

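# Each input feature is multiplied by its corresponding weight, and the products are summed.
# A bias term is added to this weighted sum so the model can shift the activation function as needed: z = w1*x1 + w2*x2 + ... + wn*xn + b (for a batch in matrix form, Z = XW + b).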

# Non-Linear Activation:
# The output of the linear transformation (z) is passed through an activation function g, giving a = g(z), to introduce non-linearity into the model. Common choices are ReLU, sigmoid, and tanh.

# Propagation Through Hidden Layers:

# The activation output from one layer becomes the input for the next layer.
# This process continues until the final layer is reached.

# Output Layer:
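# The final layer produces the network’s output. Its activation depends on the task:
# Regression: Outputs a continuous value (usually with a linear, i.e. identity, activation).
# Classification: Outputs probabilities (softmax for multi-class problems, sigmoid for binary or multi-label problems).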
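The steps above can be sketched as a loop over layers. In each layer, the linear transformation is applied, then the activation, and the result becomes the next layer's input. This is a minimal illustration, not a trained model. It assumes a hypothetical 2-3-1 network with hand-picked weights stored as a list of `(W, b, activation)` tuples. The names `forward`, `layers`, and `identity` are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def identity(z):
    # Linear output activation, as typically used for regression
    return z

def forward(x, layers):
    """Propagate a batch x through a list of (W, b, activation) layers."""
    a = x
    for W, b, activation in layers:
        z = a @ W + b        # linear transformation (weights and bias)
        a = activation(z)    # non-linear activation; output feeds the next layer
    return a

# Hypothetical 2-3-1 network with hand-picked weights (illustration only)
layers = [
    (np.array([[1.0, -1.0, 0.5],
               [0.5,  1.0, -1.0]]), np.array([0.0, 0.0, 0.0]), relu),
    (np.array([[1.0], [2.0], [-1.0]]), np.array([0.5]), identity),
]
x = np.array([[1.0, 2.0]])
print(forward(x, layers))  # prints [[4.5]]
```

Here the hidden pre-activations are z = [2, 1, -1.5]. ReLU turns them into [2, 1, 0], and the output layer computes 2*1 + 1*2 + 0*(-1) + 0.5 = 4.5.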

In [None]:
# 2. What is the purpose of the activation function in forward propagation?
# Ans: The activation function in forward propagation introduces non-linearity into a neural network, allowing it to learn and model complex relationships between the input features and the target output. Without an activation function, the neural network would behave like a linear model, regardless of the number of layers, limiting its capability to solve non-linear problems.

# Key Roles of the Activation Function:
# Introduces Non-Linearity:

# Real-world data and problems are often non-linear. Activation functions enable the network to approximate complex mappings from inputs to outputs by introducing non-linear transformations.
# Example: Recognizing patterns like edges in images or detecting sentiment in text.
# Enables Hierarchical Feature Learning:

# Activation functions allow deeper layers in the network to learn progressively more abstract and complex features (e.g., from edges to shapes to objects in image classification tasks).
# Controls the Flow of Information:

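# They determine how strongly each neuron responds to its input. For example, ReLU outputs zero for negative inputs, so the neuron is effectively "off". This behavior is loosely inspired by the firing of biological neurons.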
# Adds Flexibility to the Model:

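# Different activation functions suit different tasks and output ranges. Sigmoid maps values to (0, 1), tanh maps them to (-1, 1), and ReLU leaves positive values unbounded.
# Helps in Capturing Non-Linear Patterns:

# Without activation functions, stacking multiple layers would still produce a single linear (strictly, affine) transformation, because a composition of linear functions is itself linear. The extra layers would add no expressive power.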
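The collapse of stacked linear layers can be checked numerically. This sketch uses arbitrary shapes and an arbitrary random seed. It shows that two layers with no activation in between compute exactly the same mapping as one linear layer with W = W1 W2 and b = b1 W2 + b2. Adding a ReLU between the layers breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked linear layers with NO activation function in between
W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 3)), rng.normal(size=3)

X = rng.normal(size=(10, 4))
two_layers = (X @ W1 + b1) @ W2 + b2

# The same mapping as a single linear layer:
# (X W1 + b1) W2 + b2 = X (W1 W2) + (b1 W2 + b2)
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = X @ W + b
print(np.allclose(two_layers, one_layer))  # True

# With a ReLU between the layers, the network is no longer a single linear map
with_relu = np.maximum(0.0, X @ W1 + b1) @ W2 + b2
print(np.allclose(with_relu, one_layer))
```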

In [None]:
# 3. Describe the steps involved in the backward propagation (backpropagation) algorithm.
# Ans: Backpropagation is the algorithm used to train neural networks by computing the gradient of the loss function with respect to every weight and bias. It propagates the error backward through the network using the chain rule. An optimizer such as gradient descent then uses these gradients to update the parameters and reduce the error (loss).

# Steps in Backpropagation:
# 1. Forward Propagation:
# Input data is passed through the network, and outputs are computed at each layer.
# The final output is compared to the target to compute the loss using a loss function, e.g., Mean Squared Error (MSE) or Cross-Entropy.

# 2. Compute Output Error (Loss Gradient w.r.t. Output):
# The derivative of the loss function with respect to the network’s output (ŷ) is computed.

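# 3. Backward Propagation of Errors (Output Layer):
# The error is propagated backward through the network, starting at the output layer.
# Using the chain rule, calculate the derivative of the loss function with respect to the weights and biases of the output layer, e.g., dL/dW = dL/dŷ * dŷ/dz * dz/dW.

# 4. Propagate Gradients to the Hidden Layers:
# The chain-rule computation is repeated layer by layer, moving from the output layer back toward the input layer.
# Each hidden layer's error is obtained from the error of the layer after it. That error is multiplied by the connecting weights and by the derivative of the hidden layer's activation function.

# 5. Update Weights and Biases:
# Once the gradients for all layers have been computed, the weights and biases are updated to reduce the loss. The update uses gradient descent (or a variant such as SGD, Adam, or RMSprop), e.g., W = W - learning_rate * dL/dW.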

# 6. Iterate Over Multiple Epochs:
# The entire forward and backward pass is repeated for multiple epochs (iterations over the training dataset) until the loss converges to a satisfactory level or stops decreasing significantly.
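Below is a minimal NumPy sketch of these steps for a one-hidden-layer network like the one in question 5 (ReLU hidden layer, softmax output). It assumes a cross-entropy loss with one-hot targets. The loss choice, the random data, and names such as `loss_and_grads` are illustrative assumptions. A central finite-difference check confirms the analytic gradient for W1. The last lines perform one gradient-descent update (step 5); step 6 would wrap everything in a loop over epochs.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(X, Y, W1, b1, W2, b2):
    """Mean cross-entropy loss and its gradients for a ReLU -> softmax network."""
    m = X.shape[0]
    # 1. Forward propagation, caching intermediate values
    Z1 = X @ W1 + b1
    A1 = relu(Z1)
    Z2 = A1 @ W2 + b2
    A2 = softmax(Z2)
    loss = -np.sum(Y * np.log(A2)) / m

    # 2. Output error: for softmax + cross-entropy, dL/dZ2 = (A2 - Y) / m
    dZ2 = (A2 - Y) / m
    # 3. Output-layer gradients
    dW2 = A1.T @ dZ2
    db2 = dZ2.sum(axis=0, keepdims=True)

    # 4. Propagate the error to the hidden layer (chain rule through ReLU)
    dA1 = dZ2 @ W2.T
    dZ1 = dA1 * (Z1 > 0)
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0, keepdims=True)
    return loss, (dW1, db1, dW2, db2)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))
Y = np.eye(2)[[0, 1, 1, 0]]          # one-hot targets for 4 samples
W1, b1 = rng.normal(size=(2, 3)), np.zeros((1, 3))
W2, b2 = rng.normal(size=(3, 2)), np.zeros((1, 2))

loss, (dW1, db1, dW2, db2) = loss_and_grads(X, Y, W1, b1, W2, b2)

# Gradient check: compare analytic dW1 with a central finite difference
eps = 1e-6
num_dW1 = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num_dW1[i, j] = (loss_and_grads(X, Y, Wp, b1, W2, b2)[0]
                         - loss_and_grads(X, Y, Wm, b1, W2, b2)[0]) / (2 * eps)
print(np.allclose(dW1, num_dW1, atol=1e-6))

# 5. One gradient-descent update, applied after all gradients are computed
lr = 0.1
W1 -= lr * dW1
b1 -= lr * db1
W2 -= lr * dW2
b2 -= lr * db2
```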

In [None]:
# 4. What is the purpose of the chain rule in backpropagation?
# Ans: The chain rule is a fundamental mathematical principle used in backpropagation to compute the gradient of a loss function with respect to the parameters (weights and biases) of a neural network. It allows the network to efficiently calculate how changes in parameters affect the final output and loss by breaking down the gradient computation into manageable steps.

# Key Roles of the Chain Rule in Backpropagation:
# Propagating Error Gradients Through Layers:

# Neural networks consist of multiple layers, and the chain rule enables the computation of the loss gradient layer by layer, starting from the output and moving backward (hence "backpropagation").
# Each layer's output depends on its weights, biases, and the previous layer's output. The chain rule links these dependencies.
# Handling Composite Functions:

# The output of each layer is a composite function of the inputs, weights, biases, and activation functions.
# The chain rule allows gradients to be computed for such composite functions by decomposing the derivatives into the product of simpler derivatives.
# Efficient Gradient Computation:

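# The chain rule breaks the gradient computation into smaller parts. Backpropagation applies it from the output backward and reuses each intermediate gradient (such as the gradient of the loss with respect to a layer's output) for every parameter that feeds into it. As a result, the gradients of all parameters cost only a small constant multiple of one forward pass, rather than one pass per parameter.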
# Guiding Parameter Updates:

# By determining how much each weight and bias contributes to the error, the chain rule provides the information needed to adjust these parameters during the optimization step (e.g., gradient descent).

# Benefits of Using the Chain Rule:
# Simplifies Complex Networks:

# Neural networks with many layers involve intricate dependencies between inputs, weights, and outputs. The chain rule systematically handles these dependencies.
# Enables Deep Learning:

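# Without the chain rule applied systematically, gradients would have to be derived by hand for every architecture or estimated numerically. Finite differences, for example, need extra forward passes for every parameter, which is impractical for networks with millions of parameters.
# Foundation of Automatic Differentiation: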

# The chain rule is central to the automatic differentiation techniques used in modern deep learning frameworks (e.g., TensorFlow, PyTorch).
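To make the chain rule concrete, here is a sketch for a single sigmoid neuron with a squared-error loss. It is a hypothetical toy model with arbitrary values, not taken from the text above. The code multiplies the local derivatives dL/da, da/dz, and dz/dw. It then reuses the shared factor dL/da * da/dz for the bias gradient, which is the same kind of reuse that makes backpropagation efficient. Finite-difference estimates check both results.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Squared error of one sigmoid neuron: L = (sigmoid(w*x + b) - y)**2."""
    return (sigmoid(w * x + b) - y) ** 2

w, b, x, y = 0.5, -0.2, 1.5, 1.0   # arbitrary example values

# Forward pass, keeping the intermediate values
z = w * x + b
a = sigmoid(z)

# Chain rule: dL/dw = dL/da * da/dz * dz/dw
dL_da = 2.0 * (a - y)
da_dz = a * (1.0 - a)          # derivative of the sigmoid
dL_dz = dL_da * da_dz          # shared factor, computed once
dL_dw = dL_dz * x              # dz/dw = x
dL_db = dL_dz * 1.0            # dz/db = 1

# Central finite-difference estimates for comparison
eps = 1e-6
numeric_dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
numeric_db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
print(dL_dw, numeric_dw)
print(dL_db, numeric_db)
```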


In [2]:
# 5. Implement the forward propagation process for a simple neural network with one hidden layer using NumPy.

import numpy as np

# Define activation functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

# Forward propagation
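def forward_propagation(X, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer network (ReLU hidden layer, softmax output).

    Shapes: X (m, n_input), W1 (n_input, n_hidden), b1 (1, n_hidden),
    W2 (n_hidden, n_output), b2 (1, n_output).
    Returns the output probabilities A2 of shape (m, n_output) and a cache
    (Z1, A1, Z2, A2) of intermediate values for use in backpropagation.
    """
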
    # Hidden layer computation
    Z1 = np.dot(X, W1) + b1
    A1 = relu(Z1)

    # Output layer computation
    Z2 = np.dot(A1, W2) + b2
    A2 = softmax(Z2)

    return A2, (Z1, A1, Z2, A2)

# Usage
np.random.seed(42)

# Sample input data (m samples, n_input features)
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Network parameters
n_input = 2
n_hidden = 3
n_output = 2

# Initialize weights and biases
W1 = np.random.randn(n_input, n_hidden)
b1 = np.random.randn(1, n_hidden)
W2 = np.random.randn(n_hidden, n_output)
b2 = np.random.randn(1, n_output)

# Perform forward propagation
output, cache = forward_propagation(X, W1, b1, W2, b2)

print("Output of the network:\n", output)


Output of the network:
 [[9.95886284e-01 4.11371643e-03]
 [9.99929907e-01 7.00925198e-05]
 [9.99998592e-01 1.40782351e-06]]
