In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# 🧠 Backpropagation: Step-by-Step with Math + Real-Life Analogy 🍰

---

## 🔁 What is Backpropagation?

Backpropagation is the core algorithm used in training neural networks. It computes the **gradient of the loss function** with respect to each weight using the **chain rule**; **gradient descent** then uses those gradients to update the weights.

---

## 📦 Real-Life Analogy: Baking a Cake 🍰

Imagine you're baking a cake and it tastes bad (high loss). To fix it, you trace backward:

- Too much sugar?
- Not enough baking powder?

You adjust ingredients based on output — that’s backpropagation! The weights are ingredients, and the taste (loss) tells you what to fix.

---

## 🔢 Step-by-Step Process

Assume a simple neural network:

- Input layer  
- One hidden layer with ReLU  
- One output layer with Sigmoid  
- Binary Cross Entropy Loss

---

### 🔹 Step 1: Forward Pass

1. Hidden layer:

$$
Z^{[1]} = W^{[1]} X + b^{[1]}
$$

$$
A^{[1]} = \text{ReLU}(Z^{[1]})
$$

2. Output layer:

$$
Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}
$$

$$
A^{[2]} = \hat{Y} = \sigma(Z^{[2]})
$$
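
The forward pass above can be sketched in NumPy as follows. This is a minimal illustration, assuming one example per column of `X`; the layer sizes, the random values, and the helper names `sigmoid` and `forward` are assumptions made for the example, not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Forward pass: input -> ReLU hidden layer -> sigmoid output."""
    Z1 = W1 @ X + b1            # hidden pre-activation, shape (h, m)
    A1 = np.maximum(0.0, Z1)    # ReLU
    Z2 = W2 @ A1 + b2           # output pre-activation, shape (1, m)
    A2 = sigmoid(Z2)            # predicted probability Y_hat
    return Z1, A1, Z2, A2

# Illustrative sizes: 2 inputs, 3 hidden units, 4 examples
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 4))
W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))

Z1, A1, Z2, A2 = forward(X, W1, b1, W2, b2)
print(A2.shape)  # (1, 4)
```

Returning the intermediate values `Z1` and `A1` alongside the prediction is deliberate: the backward pass needs them.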

---

### 🔹 Step 2: Loss Calculation (Binary Cross-Entropy)

$$
\mathcal{L} = - \left( Y \log(\hat{Y}) + (1 - Y) \log(1 - \hat{Y}) \right)
$$
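
To get a feel for the loss, here is a small sketch that evaluates it for a confident correct prediction and a confident wrong one. The function name `binary_cross_entropy`, the averaging over examples, and the `eps` clipping (a guard against `log(0)`) are assumptions added for the example.

```python
import numpy as np

def binary_cross_entropy(Y, Y_hat, eps=1e-12):
    """Mean binary cross-entropy; clipping is an added safeguard, not part of the formula."""
    Y_hat = np.clip(Y_hat, eps, 1.0 - eps)
    losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    return float(np.mean(losses))

Y = np.array([[1.0, 0.0]])
confident_right = binary_cross_entropy(Y, np.array([[0.9, 0.1]]))
confident_wrong = binary_cross_entropy(Y, np.array([[0.1, 0.9]]))
print(confident_right, confident_wrong)
```

The wrong prediction is penalised far more heavily than the right one is rewarded, which is what drives the large gradients early in training.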

---

### 🔹 Step 3: Backward Pass (Gradient Calculation)

#### Output Layer:

$$
\frac{\partial \mathcal{L}}{\partial A^{[2]}} = -\left(\frac{Y}{A^{[2]}} - \frac{1 - Y}{1 - A^{[2]}} \right)
$$

Derivative of Sigmoid:

$$
\frac{\partial A^{[2]}}{\partial Z^{[2]}} = A^{[2]} (1 - A^{[2]})
$$

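Combine the two factors with the chain rule to get the output-layer error term:

$$
\delta^{[2]} = \frac{\partial \mathcal{L}}{\partial Z^{[2]}} = A^{[2]} - Y
$$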
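
The combined result is compact because the sigmoid derivative cancels the denominators of the loss derivative. Taking $\delta^{[2]}$ to mean $\partial \mathcal{L} / \partial Z^{[2]}$, the intermediate steps are:

$$
\begin{aligned}
\delta^{[2]} = \frac{\partial \mathcal{L}}{\partial Z^{[2]}}
&= \frac{\partial \mathcal{L}}{\partial A^{[2]}} \cdot \frac{\partial A^{[2]}}{\partial Z^{[2]}} \\
&= -\left(\frac{Y}{A^{[2]}} - \frac{1 - Y}{1 - A^{[2]}}\right) A^{[2]} (1 - A^{[2]}) \\
&= -Y (1 - A^{[2]}) + (1 - Y) A^{[2]} \\
&= A^{[2]} - Y
\end{aligned}
$$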

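#### Gradients w.r.t. Weights and Bias:

$$
\frac{\partial \mathcal{L}}{\partial W^{[2]}} = \delta^{[2]} \cdot (A^{[1]})^T
$$

$$
\frac{\partial \mathcal{L}}{\partial b^{[2]}} = \delta^{[2]}
$$

For a batch, sum $\delta^{[2]}$ over the examples to get the bias gradient.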

#### Hidden Layer:

Backpropagated error ($\odot$ denotes element-wise multiplication):

$$
\delta^{[1]} = (W^{[2]})^T \delta^{[2]} \odot \text{ReLU}'(Z^{[1]})
$$

Derivative of ReLU:

$$
\text{ReLU}'(z) = \begin{cases} 
1 & \text{if } z > 0 \\
0 & \text{otherwise}
\end{cases}
$$

Gradients w.r.t. Weights and Bias:

$$
\frac{\partial \mathcal{L}}{\partial W^{[1]}} = \delta^{[1]} \cdot X^T
$$

$$
\frac{\partial \mathcal{L}}{\partial b^{[1]}} = \delta^{[1]}
$$
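
One way to gain confidence in the formulas above is to compare them against a finite-difference estimate. The sketch below does that for one entry of each weight matrix. It assumes the loss is summed over the examples (so no 1/m factor appears), and the helper names `loss_and_grads` and `numeric_grad`, the layer sizes, and the random data are all illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(X, Y, W1, b1, W2, b2):
    """Return the summed BCE loss and the weight gradients from the formulas above."""
    Z1 = W1 @ X + b1
    A1 = np.maximum(0.0, Z1)                 # ReLU
    A2 = sigmoid(W2 @ A1 + b2)
    loss = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

    delta2 = A2 - Y                          # output-layer error
    dW2 = delta2 @ A1.T
    delta1 = (W2.T @ delta2) * (Z1 > 0)      # element-wise product with ReLU'
    dW1 = delta1 @ X.T
    return loss, dW1, dW2

def numeric_grad(f, W, idx, eps=1e-6):
    """Central finite difference of f with respect to the entry W[idx]."""
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[idx] += eps
    W_minus[idx] -= eps
    return (f(W_plus) - f(W_minus)) / (2 * eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))
Y = rng.integers(0, 2, size=(1, 5)).astype(float)
W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))

loss, dW1, dW2 = loss_and_grads(X, Y, W1, b1, W2, b2)
num_dW1 = numeric_grad(lambda W: loss_and_grads(X, Y, W, b1, W2, b2)[0], W1, (0, 0))
num_dW2 = numeric_grad(lambda W: loss_and_grads(X, Y, W1, b1, W, b2)[0], W2, (0, 0))
print("dW1[0,0]:", dW1[0, 0], "numeric:", num_dW1)
print("dW2[0,0]:", dW2[0, 0], "numeric:", num_dW2)
```

The analytic and numeric values should agree to several decimal places. A finite-difference check can disagree when a perturbation moves a hidden pre-activation across zero, where ReLU is not differentiable.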

---

### 🔁 Weight Update Rule

Using learning rate $\eta$:

$$
W^{[l]} = W^{[l]} - \eta \cdot \frac{\partial \mathcal{L}}{\partial W^{[l]}}
$$

$$
b^{[l]} = b^{[l]} - \eta \cdot \frac{\partial \mathcal{L}}{\partial b^{[l]}}
$$
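
The update rule can be applied to every parameter at once. Below is a minimal sketch assuming parameters and gradients are stored in dictionaries keyed by name (`W1`, `b1`, ...); the helper name `sgd_step` and the numbers are made up for illustration.

```python
import numpy as np

def sgd_step(params, grads, lr=0.1):
    """Return new parameters after one gradient-descent step (inputs are not modified)."""
    return {name: value - lr * grads[name] for name, value in params.items()}

params = {"W1": np.array([[0.1, 0.1]]), "b1": np.array([[0.0]])}
grads = {"W1": np.array([[0.5, -0.2]]), "b1": np.array([[1.0]])}

updated = sgd_step(params, grads, lr=0.1)
print(updated["W1"])  # [[0.05 0.12]]
print(updated["b1"])  # [[-0.1]]
```

A positive gradient lowers the parameter and a negative gradient raises it, so each parameter moves against its own slope.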

---

## 🏁 Summary Table

| Layer         | Error Term $\delta$                       | Gradient w.r.t. Weight             |
|---------------|-------------------------------------------|------------------------------------|
| Output Layer  | $\delta^{[2]} = A^{[2]} - Y$              | $\delta^{[2]} (A^{[1]})^T$         |
| Hidden Layer  | $\delta^{[1]} = (W^{[2]})^T \delta^{[2]} \odot \text{ReLU}'(Z^{[1]})$ | $\delta^{[1]} X^T$ |

---

## 💡 Real-Life Example Revisited: Smart Cake Baking

- Bake a cake → 🍰  
- It tastes too sweet → 😖  
- Trace back ingredients  
- Adjust sugar, bake again  
- 🎯 Backpropagation in action!

---


# 🔁 Backpropagation in Deep Learning – Complete Math Walkthrough

## 🧠 Objective:
Train a neural network by minimizing a loss function using gradients computed via backpropagation.

---

## 🏗️ Architecture (Simple 1 Hidden Layer Network)

Let’s define:
- Input: **x** ∈ ℝⁿ  
- Hidden layer weights: **W₁** ∈ ℝ^(h × n), biases: **b₁** ∈ ℝ^h  
- Activation (sigmoid): **a = σ(z₁)**  
- Output layer weights: **W₂** ∈ ℝ^(1 × h), biases: **b₂** ∈ ℝ  
- Output: **ŷ = z₂ = W₂a + b₂**  
- True output: **y** ∈ ℝ  
- Loss function: Mean Squared Error (MSE)  
  **L = ½ (y - ŷ)²**

---

## 🚀 Step 1: Forward Pass

1. Compute linear transformation of hidden layer:  
   **z₁ = W₁x + b₁**

2. Apply activation:  
   **a = σ(z₁)**, where **σ(z) = 1 / (1 + e^(−z))**

3. Compute output:  
   **z₂ = W₂a + b₂ = ŷ**

4. Compute loss:  
   **L = ½ (y - ŷ)²**

---

## 🔄 Step 2: Backward Pass (Gradient Computation using Chain Rule)

### ⬅️ Gradient at Output Layer

Loss:  
**L = ½ (y - ŷ)²**  
So:  
**∂L/∂ŷ = -(y - ŷ)**

Since:  
**ŷ = z₂ = W₂a + b₂**,  
we have **∂L/∂z₂ = ∂L/∂ŷ = (ŷ - y)**, and therefore:  
**∂L/∂W₂ = ∂L/∂z₂ × ∂z₂/∂W₂ = (ŷ - y) × aᵀ**  
**∂L/∂b₂ = ∂L/∂z₂ × ∂z₂/∂b₂ = (ŷ - y)**

---

### ⬅️ Gradient at Hidden Layer

We propagate error back:

1. From output to hidden activation:  
   **∂L/∂a = W₂ᵀ × ∂L/∂z₂ = W₂ᵀ × (ŷ - y)**  
   (W₂ is 1 × h, so W₂ᵀ × (ŷ - y) is h × 1, the same shape as **a**)

2. Apply derivative of sigmoid:  
   Recall:  
   **σ(z₁) = a**  
   **σ'(z₁) = a ⊙ (1 - a)**

   So:  
   **∂L/∂z₁ = ∂L/∂a ⊙ σ'(z₁) = (W₂ᵀ × (ŷ - y)) ⊙ a ⊙ (1 - a)**  
   (⊙ = element-wise multiplication)

3. Compute gradients of W₁ and b₁:  
   **∂L/∂W₁ = ∂L/∂z₁ × ∂z₁/∂W₁ = (∂L/∂z₁) × xᵀ**  
   **∂L/∂b₁ = ∂L/∂z₁**

---

## 🧮 Final Gradient Summary

- **∂L/∂W₂ = (ŷ - y) × aᵀ**  
- **∂L/∂b₂ = (ŷ - y)**  
- **∂L/∂W₁ = ((W₂ᵀ × (ŷ - y)) ⊙ a ⊙ (1 - a)) × xᵀ**  
- **∂L/∂b₁ = (W₂ᵀ × (ŷ - y)) ⊙ a ⊙ (1 - a)**
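
These four gradients translate almost line for line into NumPy. The sketch below evaluates them for a single example and checks one entry against a finite-difference estimate. It assumes the first row of this notebook's table (cgpa = 8, profile = 8, lpa = 4) and 0.1 weights with zero biases; the function name `forward_backward` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(x, y, W1, b1, W2, b2):
    """Loss and gradients for one example; x has shape (n, 1), y is a scalar."""
    z1 = W1 @ x + b1                      # (h, 1)
    a = sigmoid(z1)                       # (h, 1)
    y_hat = (W2 @ a + b2).item()          # linear output
    loss = 0.5 * (y - y_hat) ** 2

    err = y_hat - y                       # dL/dz2
    dz1 = (W2.T * err) * a * (1 - a)      # (h, 1)
    grads = {
        "W2": err * a.T,                  # (1, h)
        "b2": np.array([[err]]),          # (1, 1)
        "W1": dz1 @ x.T,                  # (h, n)
        "b1": dz1,                        # (h, 1)
    }
    return loss, grads

# Assumed inputs: first table row, with every weight 0.1 and every bias 0
x, y = np.array([[8.0], [8.0]]), 4.0
W1, b1 = np.full((2, 2), 0.1), np.zeros((2, 1))
W2, b2 = np.full((1, 2), 0.1), np.zeros((1, 1))
loss, grads = forward_backward(x, y, W1, b1, W2, b2)

# Central finite-difference check on W1[0, 0]
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (forward_backward(x, y, W1_plus, b1, W2, b2)[0]
           - forward_backward(x, y, W1_minus, b1, W2, b2)[0]) / (2 * eps)
print(loss, grads["W1"][0, 0], numeric)
```

Note that `W2.T` appears in `dz1`: the transpose is what turns the scalar output error into one error value per hidden unit.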

---

## 🛠️ Step 3: Weight Update (Gradient Descent)

Using learning rate **η**:
- **W₁ = W₁ - η × ∂L/∂W₁**
- **b₁ = b₁ - η × ∂L/∂b₁**
- **W₂ = W₂ - η × ∂L/∂W₂**
- **b₂ = b₂ - η × ∂L/∂b₂**
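
Putting the forward pass, backward pass, and update together gives a complete training loop. The sketch below runs it on the four-row cgpa/profile/lpa table used in this notebook's code cells. The learning rate, the epoch count, the 0.1 initial weights, and the averaging of gradients over the batch are assumptions chosen for illustration, not tuned values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The notebook's four rows: columns are examples, rows are (cgpa, profile)
X = np.array([[8.0, 7.0, 6.0, 5.0],
              [8.0, 9.0, 10.0, 12.0]])
y = np.array([[4.0, 5.0, 6.0, 7.0]])      # lpa
m = X.shape[1]

W1, b1 = np.full((2, 2), 0.1), np.zeros((2, 1))
W2, b2 = np.full((1, 2), 0.1), np.zeros((1, 1))
eta = 0.01                                 # assumed learning rate

losses = []
for epoch in range(500):                   # assumed epoch count
    # Forward pass
    A = sigmoid(W1 @ X + b1)               # (2, m)
    y_hat = W2 @ A + b2                    # (1, m)
    losses.append(float(np.mean(0.5 * (y - y_hat) ** 2)))

    # Backward pass, averaged over the batch
    err = (y_hat - y) / m
    dW2 = err @ A.T
    db2 = err.sum(axis=1, keepdims=True)
    dZ1 = (W2.T @ err) * A * (1 - A)
    dW1 = dZ1 @ X.T
    db1 = dZ1.sum(axis=1, keepdims=True)

    # Gradient-descent update
    W1 -= eta * dW1
    b1 -= eta * db1
    W2 -= eta * dW2
    b2 -= eta * db2

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

All gradients are computed before any parameter is changed, so the hidden-layer gradient uses the same `W2` that produced the prediction.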

---

## 📌 Notes:

- Vectorization is crucial for efficiency.
- You can use ReLU instead of sigmoid:  
  ReLU(z) = max(0, z),  
  ReLU′(z) = 1 if z > 0 else 0
- Use Cross-Entropy + Softmax for multi-class tasks.

---

## 🧠 Intuition:

- Forward pass gives prediction.
- Loss measures how far prediction is from truth.
- Backprop computes how much each weight contributed to the error.
- Weights are updated along the negative gradient, the direction in which the loss locally decreases fastest.



In [2]:
import numpy as np 
import pandas as pd 

In [3]:
df=pd.DataFrame([[8,8,4],[7,9,5],[6,10,6],[5,12,7]],columns=['cgpa','profile','lpa'])
df

Unnamed: 0,cgpa,profile,lpa
0,8,8,4
1,7,9,5
2,6,10,6
3,5,12,7


In [4]:
import numpy as np

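def initialize_parameters(layer_dims):
    """Build the weight and bias arrays for the layer sizes listed in layer_dims."""
    parameters = {}
    L = len(layer_dims)  # number of layers, counting the input layer

    for i in range(1, L):
        # Every weight starts at 0.1 and every bias at 0, so no random seed is needed
        parameters[f'W{i}'] = np.ones((layer_dims[i], layer_dims[i - 1])) * 0.1
        parameters[f'b{i}'] = np.zeros((layer_dims[i], 1))

    return parameters

# Example call: 2 inputs, 2 hidden units, 1 output
params = initialize_parameters([2, 2, 1])
print(params)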


{'W1': array([[0.1, 0.1],
       [0.1, 0.1]]), 'b1': array([[0.],
       [0.]]), 'W2': array([[0.1, 0.1]]), 'b2': array([[0.]])}


In [5]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# Step 1: XOR Input & Output
X = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = np.array([[0], [1], [1], [0]])

# Step 2: Define the model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))  # Hidden layer
model.add(Dense(1, activation='sigmoid'))            # Output layer

# Step 3: Compile the model
model.compile(loss='binary_crossentropy', optimizer=SGD(learning_rate=0.1), metrics=['accuracy'])

# Step 4: Train the model
model.fit(X, Y, epochs=500, verbose=0)

# Step 5: Evaluate
predictions = model.predict(X)
print("Predictions:\n", np.round(predictions))




1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 66ms/step
Predictions:
 [[0.]
 [1.]
 [1.]
 [0.]]


In [6]:
import tensorflow as tf
import numpy as np

# Step 1: Sample data (XOR logic)
X = tf.constant([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=tf.float32)
Y = tf.constant([[0], [1], [1], [0]], dtype=tf.float32)

# Step 2: Define model weights manually
W1 = tf.Variable(tf.random.normal((2, 4)), dtype=tf.float32)  # (input_dim, hidden_dim)
b1 = tf.Variable(tf.zeros((4,)), dtype=tf.float32)

W2 = tf.Variable(tf.random.normal((4, 1)), dtype=tf.float32)  # (hidden_dim, output_dim)
b2 = tf.Variable(tf.zeros((1,)), dtype=tf.float32)

# Step 3: Define learning rate
learning_rate = 0.1

# Step 4: Use GradientTape for forward + backward pass
with tf.GradientTape() as tape:
    # Forward pass
    Z1 = tf.matmul(X, W1) + b1         # Shape: (4, 4)
    A1 = tf.nn.relu(Z1)                # Shape: (4, 4)
    Z2 = tf.matmul(A1, W2) + b2        # Shape: (4, 1)
    A2 = tf.nn.sigmoid(Z2)             # Final output

    # Loss (binary crossentropy)
    loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(Y, A2))

# Step 5: Compute gradients
grads = tape.gradient(loss, [W1, b1, W2, b2])

# Step 6: Print gradients and their shapes
print("Loss:", loss.numpy())
print("Gradients:")
print("dW1 shape:", grads[0].shape)  # (2, 4)
print("db1 shape:", grads[1].shape)  # (4,)
print("dW2 shape:", grads[2].shape)  # (4, 1)
print("db2 shape:", grads[3].shape)  # (1,)

# Step 7: Optional - Update weights manually
W1.assign_sub(learning_rate * grads[0])
b1.assign_sub(learning_rate * grads[1])
W2.assign_sub(learning_rate * grads[2])
b2.assign_sub(learning_rate * grads[3])


Loss: 0.9508767
Gradients:
dW1 shape: (2, 4)
db1 shape: (4,)
dW2 shape: (4, 1)
db2 shape: (1,)


<tf.Variable 'UnreadVariable' shape=(1,) dtype=float32, numpy=array([0.02521864], dtype=float32)>

# Code using Keras

In [7]:
import tensorflow
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense  # import from tensorflow.keras consistently

In [8]:
df

Unnamed: 0,cgpa,profile,lpa
0,8,8,4
1,7,9,5
2,6,10,6
3,5,12,7
