<a href="https://colab.research.google.com/github/ShubhamP1028/DeepLearningTute/blob/main/Backpropagation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Backpropagation
Forward pass:
Input → Layer 1 → Activation → Layer 2 → Activation → Output → Loss

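<center><u>Backpropagation walks this path in the opposite direction, from the loss back to the weights
</u>

<b> weight (w) → raw neuron output (z) → final prediction (ŷ) → Loss (L) </b>

A weight influences the loss through this chain; backpropagation follows it from right to left.

</center>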

### Formula : ∂L/∂w = (∂L/∂ŷ) ⋅ (∂ŷ/∂z) ⋅ (∂z/∂w)

In words: the total effect of a weight on the final loss is the product of three factors: how much the loss reacts to the prediction (∂L/∂ŷ), how much the prediction reacts to the neuron's raw output (∂ŷ/∂z), and how much the neuron's raw output reacts to the weight (∂z/∂w).



As an example, take a single neuron:

#### z = w⋅x + b
#### ŷ = σ(z)
#### L = 1/2 ⋅ (ŷ − y)²

Now backprop: ∂L/∂w = ∂L/∂ŷ ⋅ ∂ŷ/∂z ⋅ ∂z/∂w = (ŷ − y) ⋅ ŷ(1 − ŷ) ⋅ x
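To make the chain rule concrete, here is a minimal sketch that evaluates the three factors for the single neuron above and compares their product with a finite-difference estimate. The values of `w`, `x`, `b` and `y` are made up for illustration; they are not from the notebook.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, b, y):
    """L = 1/2 * (y_hat - y)^2 for a single sigmoid neuron."""
    y_hat = sigmoid(w * x + b)
    return 0.5 * (y_hat - y) ** 2

# Illustrative values (assumed, not taken from the text)
w, x, b, y = 0.5, 2.0, 0.1, 1.0

# Forward pass
z = w * x + b
y_hat = sigmoid(z)

# The three chain-rule factors
dL_dyhat = y_hat - y              # how the loss reacts to the prediction
dyhat_dz = y_hat * (1.0 - y_hat)  # how the prediction reacts to z (sigmoid slope)
dz_dw = x                         # how z reacts to the weight

dL_dw = dL_dyhat * dyhat_dz * dz_dw

# Independent check: central finite difference on the loss itself
eps = 1e-6
numeric = (loss(w + eps, x, b, y) - loss(w - eps, x, b, y)) / (2 * eps)

print(f"analytic dL/dw = {dL_dw:.6f}")
print(f"numeric  dL/dw = {numeric:.6f}")
```

The two printed numbers should agree to several decimal places. The gradient is negative here because the prediction is below the target, so increasing `w` would reduce the loss.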
​


<center>

### Building a mini neural net

*  2 input neurons

*  1 hidden layer (3 neurons, sigmoid)
*  1 output neuron (sigmoid; a multi-class problem would instead use one output neuron per class with softmax)
</center>
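Before the full implementation, a quick shape check can help. The sketch below pushes four example rows through a 2-3-1 network and prints the shape at each layer. The names `W1`, `b1`, `W2`, `b2` and the fixed seed are assumptions for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed here for reproducibility

# Four example rows with 2 features each
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # shape (4, 2)

W1 = rng.uniform(size=(2, 3))  # input -> hidden weights
b1 = rng.uniform(size=(1, 3))  # hidden biases
W2 = rng.uniform(size=(3, 1))  # hidden -> output weights
b2 = rng.uniform(size=(1, 1))  # output bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = sigmoid(X @ W1 + b1)      # hidden activations, shape (4, 3)
y_hat = sigmoid(h @ W2 + b2)  # predictions, shape (4, 1)

n_params = W1.size + b1.size + W2.size + b2.size
print(h.shape, y_hat.shape, n_params)  # → (4, 3) (4, 1) 13
```

The biases are stored as row vectors so that NumPy broadcasting adds them to every row of the batch.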

In [6]:
# without any deep learning library
import numpy as np
# Activation function
def sigmoid(x):  # squashes any real input into the open interval (0, 1)
    return 1 / (1 + np.exp(-x))

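# Derivative of the sigmoid, written in terms of the sigmoid's OUTPUT:
# if a = sigmoid(z), then da/dz = a * (1 - a). Pass in the activation a, not z.
# We need this for backpropagation to calculate the gradient.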
def sigmoid_derivative(x):
    return x * (1 - x)

# The Neural Network class that will encapsulate our model.
class SimpleNeuralNetwork:
    def __init__(self, input_nodes, hidden_nodes, output_nodes):
        """
        Initializes the network's architecture and weights.
        - input_nodes: 2
        - hidden_nodes: 3
        - output_nodes: 1
        """
        # --- Initialize weights and biases with random values ---
        # Weights connecting the input layer to the hidden layer (2x3 matrix)
        self.weights_input_hidden = np.random.uniform(size=(input_nodes, hidden_nodes))
        # Weights connecting the hidden layer to the output layer (3x1 matrix)
        self.weights_hidden_output = np.random.uniform(size=(hidden_nodes, output_nodes))

        # Biases for the hidden layer (3 neurons)
        self.bias_hidden = np.random.uniform(size=(1, hidden_nodes))
        # Bias for the output layer (1 neuron)
        self.bias_output = np.random.uniform(size=(1, output_nodes))

        print("Network initialized with random weights and biases.")

    def feedforward(self, inputs):
        """
        Calculates the network's prediction for a given input.
        This is the "forward pass".
        """
        # 1. Calculate the signal into the hidden layer
        hidden_layer_input = np.dot(inputs, self.weights_input_hidden) + self.bias_hidden
        # 2. Apply the activation function
        hidden_layer_output = sigmoid(hidden_layer_input)

        # 3. Calculate the signal into the output layer
        output_layer_input = np.dot(hidden_layer_output, self.weights_hidden_output) + self.bias_output
        # 4. Apply the activation function to get the final prediction
        predicted_output = sigmoid(output_layer_input)

        return predicted_output, hidden_layer_output

    def train(self, training_inputs, training_outputs, learning_rate, epochs):
        """
        The main training loop. This is where the learning happens.
        """
        print(f"\nStarting training for {epochs} epochs...")
        for epoch in range(epochs):
            # --- Step 1: Forward Pass ---
            predicted_output, hidden_layer_output = self.feedforward(training_inputs)

            # --- Step 2: Calculate the Error ---
            # This is (actual - predicted), i.e. the negative of dL/dŷ for
            # L = 1/2 * (ŷ - y)^2. The "+=" updates in Step 4 rely on this sign.
            output_error = training_outputs - predicted_output

            # --- Step 3: Backpropagation ---
            # This is where we apply the chain rule to find the gradients.

            # Gradient for the output layer
            # This is (error * derivative_of_activation)
            # Here it equals (y - ŷ) * ŷ(1 - ŷ), the negative of the
            # (ŷ - y) * ŷ(1 - ŷ) term from the formula
            output_delta = output_error * sigmoid_derivative(predicted_output)

            # How much did the hidden layer contribute to the output error?
            # Propagate the error backward to the hidden layer
            hidden_error = output_delta.dot(self.weights_hidden_output.T)

            # Gradient for the hidden layer
            hidden_delta = hidden_error * sigmoid_derivative(hidden_layer_output)

            # --- Step 4: Update Weights and Biases (Gradient Descent) ---
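            # We move the weights in the opposite direction of their gradient.
            # The deltas were built from (y - ŷ), so they already carry the
            # minus sign; adding them with "+=" is a gradient descent step.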

            # Update weights for hidden-to-output connections
            self.weights_hidden_output += hidden_layer_output.T.dot(output_delta) * learning_rate
            # Update weights for input-to-hidden connections
            self.weights_input_hidden += training_inputs.T.dot(hidden_delta) * learning_rate

            # Update biases
            self.bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
            self.bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

        print("Training complete!")

    def predict(self, inputs):
        """
        Make a prediction on new, unseen data.
        """
        prediction, _ = self.feedforward(inputs)
        return prediction

# --- Example Usage ---

# 1. Define the training data
# We'll use the XOR problem, a classic non-linear problem.
training_inputs = np.array([[0,0], [0,1], [1,0], [1,1]])
training_outputs = np.array([[0], [1], [1], [0]]) # XOR outputs

# 2. Create the neural network
# 2 inputs, 3 hidden neurons, 1 output neuron
nn = SimpleNeuralNetwork(input_nodes=2, hidden_nodes=3, output_nodes=1)

# 3. Train the network
# epochs: The number of times to loop through the entire training set.
# learning_rate: How big of a step to take during gradient descent.
nn.train(training_inputs, training_outputs, learning_rate=0.5, epochs=10000)

# 4. Make predictions on the training data to see if it learned
print("\nPredictions after training:")
for i, input_data in enumerate(training_inputs):
    print(f"Input: {input_data},  Actual: {training_outputs[i][0]},  Predicted: {nn.predict(input_data)[0][0]:.4f}")

Network initialized with random weights and biases.

Starting training for 10000 epochs...
Training complete!

Predictions after training:
Input: [0 0],  Actual: 0,  Predicted: 0.0148
Input: [0 1],  Actual: 1,  Predicted: 0.9801
Input: [1 0],  Actual: 1,  Predicted: 0.9899
Input: [1 1],  Actual: 0,  Predicted: 0.0173
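One detail worth checking is the sign convention: `train()` computes `training_outputs - predicted_output` and then updates with `+=`. The sketch below reproduces the same delta computations outside the class and compares them with a finite-difference gradient of L = 1/2 Σ(ŷ − y)². The helper names (`forward`, `loss`, `numeric_grad`) and the seed are assumptions for this check, not part of the class above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    return h, y_hat

def loss(X, y, W1, b1, W2, b2):
    _, y_hat = forward(X, W1, b1, W2, b2)
    return 0.5 * np.sum((y_hat - y) ** 2)

rng = np.random.default_rng(42)  # assumed seed
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1 = rng.uniform(size=(2, 3)), rng.uniform(size=(1, 3))
W2, b2 = rng.uniform(size=(3, 1)), rng.uniform(size=(1, 1))

# Same quantities as in train(), written out explicitly
h, y_hat = forward(X, W1, b1, W2, b2)
output_delta = (y - y_hat) * y_hat * (1 - y_hat)
hidden_delta = (output_delta @ W2.T) * h * (1 - h)
step_W2 = h.T @ output_delta   # what train() adds to weights_hidden_output
step_W1 = X.T @ hidden_delta   # what train() adds to weights_input_hidden

def numeric_grad(param, eps=1e-6):
    """Central finite-difference gradient of the loss w.r.t. one parameter array."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        plus = loss(X, y, W1, b1, W2, b2)
        param[idx] = original - eps
        minus = loss(X, y, W1, b1, W2, b2)
        param[idx] = original  # restore the entry
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

# The update steps should equal the NEGATIVE gradient of the loss
print(np.allclose(step_W2, -numeric_grad(W2), atol=1e-7))
print(np.allclose(step_W1, -numeric_grad(W1), atol=1e-7))
```

Both lines should print `True`: the quantity added to each weight matrix is the negative gradient, so `+=` with a positive learning rate is ordinary gradient descent.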


### Using TensorFlow

In [7]:
import tensorflow as tf
import numpy as np

# --- 1. The Model Architecture ---
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=3, activation='sigmoid', input_shape=(2,)),
    tf.keras.layers.Dense(units=1, activation='sigmoid')
])

# --- 2. The Data ---
# XOR problem: output is 1 if inputs are different, otherwise 0.
X = np.array([[0,0], [0,1], [1,0], [1,1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

# --- 3. Compile the Model ---
# This configures the model for training.
# - optimizer='adam': Adam, an adaptive-step variant of gradient descent.
# - loss='binary_crossentropy': A loss function designed for binary (0 or 1)
#   classification problems. It's the standard partner for a sigmoid output.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

print("Model compiled. Starting training...\n")

# --- 4. Train the Model ---
# - X, y: Our training data and labels.
# - epochs=2000: The model will see the entire dataset 2000 times.
# - verbose=0: Set to 0 to keep the output clean for this example.
model.fit(X, y, epochs=2000, verbose=0)

print("Training complete!\n")

# --- 5. Make Predictions ---
# Let's see how well it learned.
predictions = model.predict(X)
print("Predictions on training data:")
for i in range(len(X)):
    print(f"Input: {X[i]}, Actual: {y[i][0]}, Predicted: {predictions[i][0]:.4f}")

Model compiled. Starting training...

Training complete!

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 63ms/step
Predictions on training data:
Input: [0. 0.], Actual: 0.0, Predicted: 0.2522
Input: [0. 1.], Actual: 1.0, Predicted: 0.6163
Input: [1. 0.], Actual: 1.0, Predicted: 0.8094
Input: [1. 1.], Actual: 0.0, Predicted: 0.3499


In [1]:
# Hidden Layer: 3 neurons
#   Math for this layer (forward pass for one neuron):
#   z1 = (input_1 * w1) + (input_2 * w2) + b
#   h1 = sigmoid(z1)
#   In matrix form for the whole layer: h = sigmoid(X · W1 + b1)

# Output Layer: 1 neuron
#   Math for this layer (forward pass):
#   z2 = (h1 * w_h1) + (h2 * w_h2) + (h3 * w_h3) + b
#   y_hat = sigmoid(z2)
#   In matrix form: y_hat = sigmoid(h · W2 + b2)


# 'adam': An efficient optimization algorithm (a form of Gradient Descent).
# 'binary_crossentropy': The loss function.
# Math for Loss (L): L = -[y*log(y_hat) + (1-y)*log(1-y_hat)]
# This function heavily penalizes confident wrong predictions.
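
The loss formula above can be tried out directly. This is a minimal NumPy sketch, not the Keras implementation: the function name `binary_crossentropy` and the clipping constant `eps` are assumptions added here to avoid `log(0)`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y, y_hat, eps=1e-7):
    """L = -[y*log(y_hat) + (1-y)*log(1-y_hat)], clipped to avoid log(0)."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# True label is 1: the loss grows sharply as the prediction moves toward 0
y_true = 1.0
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"y_hat = {p:<4}  loss = {binary_crossentropy(y_true, p):.4f}")

# With a sigmoid output, the chain rule collapses to dL/dz = y_hat - y.
z = 0.3  # illustrative pre-activation value (assumed)
analytic = sigmoid(z) - y_true
h = 1e-6
numeric = (binary_crossentropy(y_true, sigmoid(z + h))
           - binary_crossentropy(y_true, sigmoid(z - h))) / (2 * h)
print(f"analytic dL/dz = {analytic:.6f}, numeric dL/dz = {numeric:.6f}")
```

A prediction of 0.9 for a true label of 1 costs about 0.105, while a confident wrong prediction of 0.01 costs about 4.605. The simple form of dL/dz is one reason sigmoid outputs are usually paired with this loss.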

In [3]:
# The .fit() method encapsulates the training loop.
# For each epoch, it performs the following for batches of data:
#
#   a) Forward Propagation:
#      - The input data `X` is passed through the network to get a prediction `y_hat`.
#      - This involves the matrix multiplications and sigmoid activations defined above.
#
#   b) Loss Calculation:
#      - The `binary_crossentropy` loss `L` is calculated between the
#        prediction `y_hat` and the true label `y`.
#
#   c) Backward Propagation (Backpropagation):
#      - TensorFlow calculates the gradient of the Loss with respect to every
#        weight and bias in the network. This is the chain rule in action.
#      - e.g., For a weight `w` in the output layer:
#        ∂L/∂w = (∂L/∂y_hat) * (∂y_hat/∂z2) * (∂z2/∂w)
#
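#   d) Parameter Update (Gradient Descent):
#      - The optimizer moves each weight and bias slightly in the direction
#        opposite to its calculated gradient.
#      - Plain gradient descent: new_weight = old_weight - learning_rate * gradient
#      - 'adam' follows the same idea, but rescales each step using running
#        averages of past gradients and squared gradients.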
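
To illustrate the update step, here is a rough sketch of a plain gradient descent step next to one Adam step for a single scalar weight. The function names and hyperparameter defaults are assumptions for illustration; this is the textbook Adam update, not TensorFlow's internal code.

```python
import math

def sgd_step(w, grad, learning_rate=0.1):
    """Plain gradient descent: step size is proportional to the gradient."""
    return w - learning_rate * grad

def adam_step(w, grad, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar weight; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    w = w - learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Compare the very first step for a tiny and a huge gradient
for grad in (0.01, 100.0):
    w_sgd = sgd_step(1.0, grad, learning_rate=0.1)
    w_adam, _, _ = adam_step(1.0, grad, m=0.0, v=0.0, t=1, learning_rate=0.1)
    print(f"grad = {grad:>6}: sgd moved {1.0 - w_sgd:.4f}, adam moved {1.0 - w_adam:.4f}")
```

With plain gradient descent the two steps differ by a factor of 10,000. Adam's first step is close to the learning rate in both cases, because the gradient is divided by an estimate of its own magnitude.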