##### **Deep Learning**

This is a simple mock deep learning example, written in pure Python with no external libraries, to illustrate core concepts such as:

*   Forward pass
*   Simple activation (ReLU)
*   Loss calculation (MSE)
*   Backpropagation (manually coded)
*   Weight update (gradient descent)

⚠️ Note: This is for educational/demo purposes only; real deep learning is done with libraries like PyTorch or TensorFlow.

🧠 **Mock Deep Learning Example: Train a Single Neuron to Learn y = 2x + 1**

In [2]:
# ================
# MOCK DEEP LEARNING
# ================

import random


# Helper functions
def relu(x):
    return max(0, x)


def relu_derivative(x):
    return 1 if x > 0 else 0


def mean_squared_error(pred, target):
    # Squared error for a single sample; the training loop sums these per epoch
    return (pred - target) ** 2


def mse_derivative(pred, target):
    return 2 * (pred - target)


# Our mock training data
# Target function: y = 2x + 1
training_data = [(x, 2 * x + 1) for x in range(-5, 6)]

# Initialize weights and bias
weight = random.uniform(-1, 1)
bias = random.uniform(-1, 1)
learning_rate = 0.01

print(f"Training Data: {training_data}")
print(f"Weight: {weight:.4f}, Bias: {bias}, Learning Rate: {learning_rate}")

# Training loop
for epoch in range(100):
    total_loss = 0

    for x, y_true in training_data:
        # Forward pass
        z = weight * x + bias
        y_pred = relu(z)

        # Loss
        loss = mean_squared_error(y_pred, y_true)
        total_loss += loss

        # Backward pass (manual gradients)
        dL_dy = mse_derivative(y_pred, y_true)
        dy_dz = relu_derivative(z)
        dz_dw = x
        dz_db = 1

        # Chain rule
        dL_dw = dL_dy * dy_dz * dz_dw
        dL_db = dL_dy * dy_dz * dz_db

        # Gradient descent
        weight -= learning_rate * dL_dw
        bias -= learning_rate * dL_db

    # Print progress
    if epoch % 10 == 0:
        # Use a separate name so the loop variable `epoch` is not overwritten
        progress = f"Epoch {epoch}: Loss = {total_loss:.4f}, Weight = {weight:.4f}, Bias = {bias:.4f}"
        print(progress)

# Test final model
print("\n=== Testing Trained Model ===")
for x in [-2, 0, 2, 4]:
    y_pred = relu(weight * x + bias)
    print(f"x = {x} => y_pred = {y_pred:.2f} (expected ~ {2*x + 1})")

Training Data: [(-5, -9), (-4, -7), (-3, -5), (-2, -3), (-1, -1), (0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]
Weight: -0.8689, Bias: -0.28509397937197534, Learning Rate: 0.01
Epoch 0: Loss = 365.7803, Weight = 1.8227, Bias = -0.1464
Epoch 10: Loss = 166.1227, Weight = 2.1734, Bias = 0.2690
Epoch 20: Loss = 165.5371, Weight = 2.1200, Bias = 0.4944
Epoch 30: Loss = 165.2569, Weight = 2.0830, Bias = 0.6503
Epoch 40: Loss = 165.1229, Weight = 2.0574, Bias = 0.7581
Epoch 50: Loss = 165.0588, Weight = 2.0397, Bias = 0.8327
Epoch 60: Loss = 165.0281, Weight = 2.0275, Bias = 0.8843
Epoch 70: Loss = 165.0135, Weight = 2.0190, Bias = 0.9200
Epoch 80: Loss = 165.0064, Weight = 2.0131, Bias = 0.9446
Epoch 90: Loss = 165.0031, Weight = 2.0091, Bias = 0.9617

=== Testing Trained Model ===
x = -2 => y_pred = 0.00 (expected ~ -3)
x = 0 => y_pred = 0.97 (expected ~ 1)
x = 2 => y_pred = 4.99 (expected ~ 5)
x = 4 => y_pred = 9.00 (expected ~ 9)


---
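In the output above, the loss levels off near 165 instead of approaching zero, and the prediction for `x = -2` is 0.00 rather than -3. Both are caused by the ReLU: its output is never negative, so the five training points with negative targets (`x = -5` to `x = -1`) always contribute at least 81 + 49 + 25 + 9 + 1 = 165 to the summed loss. Where `2x + 1` is non-negative the neuron fits well (weight ≈ 2.01 and bias ≈ 0.96 by epoch 90).

We’ll now go step by step through how the gradients are computed and how the model learns using **gradient descent**.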

#### 🎯 GOAL

We're training a **single neuron** (i.e., a basic linear model with an activation) to learn this function:

```
y = 2x + 1
```

The neuron:

* Takes input `x`
* Applies a linear function: `z = weight * x + bias`
* Passes it through ReLU: `y_pred = relu(z)`
* Compares it to the true value `y_true` using MSE loss
* Computes gradients of the loss w.r.t. `weight` and `bias`
* Updates them using gradient descent
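
Because `relu` never returns a negative value, the lowest loss this neuron can reach on the training data is not zero. The following is a minimal sketch that computes that floor, assuming the same `training_data` as in the example; the helper name `best_case_loss` is hypothetical and not part of the original code.

```python
# Same training data as in the example: y = 2x + 1 for x in -5..5
training_data = [(x, 2 * x + 1) for x in range(-5, 6)]


def best_case_loss(data):
    """Summed squared error when every prediction equals max(0, y_true).

    Hypothetical helper: max(0, y_true) is the closest value to y_true
    that a ReLU output can take, so this is a lower bound on the loss.
    """
    return sum((max(0, y_true) - y_true) ** 2 for _, y_true in data)


floor = best_case_loss(training_data)
print(f"Lowest reachable summed loss: {floor}")  # 81 + 49 + 25 + 9 + 1 = 165
```

The bound is attained at `weight = 2`, `bias = 1`, which is consistent with the logged loss settling just above 165.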

---

#### 🧮 Forward Pass

```python
z = weight * x + bias
y_pred = relu(z)
loss = mean_squared_error(y_pred, y_true)
```
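
To make the forward pass concrete, here is a small worked example. The values `weight = 0.5` and `bias = 0.1` are assumptions chosen for illustration, not values from the run above, and `forward` is a hypothetical wrapper around the three lines shown.

```python
def relu(x):
    return max(0, x)


def mean_squared_error(pred, target):
    return (pred - target) ** 2


def forward(weight, bias, x, y_true):
    """Return (z, y_pred, loss) for one training sample."""
    z = weight * x + bias
    y_pred = relu(z)
    loss = mean_squared_error(y_pred, y_true)
    return z, y_pred, loss


# Illustrative parameters (assumptions)
weight, bias = 0.5, 0.1

# One sample with a positive pre-activation, one with a negative one
for x in [2, -2]:
    y_true = 2 * x + 1
    z, y_pred, loss = forward(weight, bias, x, y_true)
    print(f"x = {x}: z = {z:.2f}, y_pred = {y_pred:.2f}, loss = {loss:.2f}")
```

For `x = 2` the pre-activation is positive and passes through unchanged; for `x = -2` it is negative, so ReLU clamps the prediction to 0.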

---

#### 🔁 Backward Pass (Manual Backpropagation)

We want to compute how much the **loss** changes with respect to each parameter:

* `dL/dw` (change in loss with respect to weight)
* `dL/db` (change in loss with respect to bias)

We’ll do this using the **chain rule** of calculus.

---

#### 🔄 1. Loss Derivative (MSE)

```python
dL_dy = mse_derivative(y_pred, y_true)  # dL/dy
```

For MSE:

$$
\text{Loss} = (y_{\text{pred}} - y_{\text{true}})^2  
\Rightarrow \frac{dL}{dy_{\text{pred}}} = 2 \cdot (y_{\text{pred}} - y_{\text{true}})
$$
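
The sign of this derivative tells you which way the prediction has to move. A short sketch with illustrative predictions around a target of 5 (the sample values are assumptions):

```python
def mse_derivative(pred, target):
    return 2 * (pred - target)


target = 5
for pred in [3, 5, 7]:
    grad = mse_derivative(pred, target)
    if grad < 0:
        hint = "prediction too low"
    elif grad > 0:
        hint = "prediction too high"
    else:
        hint = "prediction matches the target"
    print(f"pred = {pred}: dL/dy = {grad} ({hint})")
```

The magnitude also grows with the size of the error, so larger mistakes produce larger corrections.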

---

#### 🔄 2. ReLU Derivative

```python
dy_dz = relu_derivative(z)  # dy/dz
```

For ReLU:

$$
\text{ReLU}(z) = \max(0, z)  
\Rightarrow \frac{dy}{dz} =
\begin{cases}
1 & \text{if } z > 0 \\
0 & \text{otherwise}
\end{cases}
$$
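
A practical consequence is that a sample whose pre-activation `z` is not positive produces no gradient at all, however large its error. The sketch below shows this with assumed parameter values (`weight = -0.5`, `bias = -0.2`) chosen so that `z` is negative.

```python
def relu(x):
    return max(0, x)


def relu_derivative(x):
    return 1 if x > 0 else 0


# Illustrative values (assumptions) that give a negative pre-activation
weight, bias = -0.5, -0.2
x, y_true = 3, 7

z = weight * x + bias               # negative, so ReLU outputs 0
dL_dy = 2 * (relu(z) - y_true)      # large error signal
dy_dz = relu_derivative(z)          # 0: the ReLU blocks the signal

dL_dw = dL_dy * dy_dz * x
dL_db = dL_dy * dy_dz * 1
print(f"z = {z:.2f}, dL/dy = {dL_dy}, dL/dw = {dL_dw}, dL/db = {dL_db}")
# Both parameter gradients are zero, so this sample causes no update
```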

---

#### 🔄 3. Derivatives of z

```python
dz_dw = x
dz_db = 1
```

Because:

* $z = w \cdot x + b$
* $\frac{dz}{dw} = x$
* $\frac{dz}{db} = 1$

---

#### 🔗 Combine with Chain Rule

```python
dL_dw = dL_dy * dy_dz * dz_dw
dL_db = dL_dy * dy_dz * dz_db
```

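These follow from:

$$
\frac{dL}{dw} = \frac{dL}{dy} \cdot \frac{dy}{dz} \cdot \frac{dz}{dw}
= (2 \cdot (y_{\text{pred}} - y_{\text{true}})) \cdot \text{ReLU}'(z) \cdot x
$$

$$
\frac{dL}{db} = \frac{dL}{dy} \cdot \frac{dy}{dz} \cdot \frac{dz}{db}
= (2 \cdot (y_{\text{pred}} - y_{\text{true}})) \cdot \text{ReLU}'(z) \cdot 1
$$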
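
One way to gain confidence in hand-derived gradients is to compare them with a finite-difference estimate. This is a sketch under assumed values (`weight = 0.5`, `bias = 0.1`, `x = 2`, `y_true = 5`); the helper names `loss_fn` and `analytic_gradients` are hypothetical.

```python
def relu(x):
    return max(0, x)


def loss_fn(weight, bias, x, y_true):
    return (relu(weight * x + bias) - y_true) ** 2


def analytic_gradients(weight, bias, x, y_true):
    """Chain-rule gradients (dL/dw, dL/db), as derived above."""
    z = weight * x + bias
    dL_dy = 2 * (relu(z) - y_true)
    dy_dz = 1 if z > 0 else 0
    return dL_dy * dy_dz * x, dL_dy * dy_dz * 1


# Illustrative values (assumptions)
weight, bias, x, y_true = 0.5, 0.1, 2, 5
eps = 1e-6

# Central differences: nudge one parameter at a time
num_dw = (loss_fn(weight + eps, bias, x, y_true)
          - loss_fn(weight - eps, bias, x, y_true)) / (2 * eps)
num_db = (loss_fn(weight, bias + eps, x, y_true)
          - loss_fn(weight, bias - eps, x, y_true)) / (2 * eps)

dL_dw, dL_db = analytic_gradients(weight, bias, x, y_true)
print(f"dL/dw: analytic = {dL_dw:.4f}, numeric = {num_dw:.4f}")
print(f"dL/db: analytic = {dL_db:.4f}, numeric = {num_db:.4f}")
```

The two estimates should agree closely as long as `z` is not right at the ReLU kink (`z = 0`), where the derivative is not defined.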

---

#### 🧠 Weight Update (Gradient Descent)

```python
weight -= learning_rate * dL_dw
bias   -= learning_rate * dL_db
```

You move **against the gradient** to reduce the loss.
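
How far you move matters as well. The sketch below applies one update to a single sample with two different learning rates; the starting values and the larger rate of `0.3` are assumptions for illustration, and `loss_after_one_step` is a hypothetical helper.

```python
def relu(x):
    return max(0, x)


def loss_after_one_step(weight, bias, x, y_true, learning_rate):
    """Apply one gradient descent update and return the loss afterwards."""
    z = weight * x + bias
    dL_dz = 2 * (relu(z) - y_true) * (1 if z > 0 else 0)
    weight -= learning_rate * dL_dz * x
    bias -= learning_rate * dL_dz
    return (relu(weight * x + bias) - y_true) ** 2


# Illustrative values (assumptions)
weight, bias, x, y_true = 0.5, 0.1, 2, 5
loss_before = (relu(weight * x + bias) - y_true) ** 2
loss_small = loss_after_one_step(weight, bias, x, y_true, learning_rate=0.01)
loss_large = loss_after_one_step(weight, bias, x, y_true, learning_rate=0.3)

print(f"before:                    {loss_before:.2f}")
print(f"after, learning_rate=0.01: {loss_small:.2f}")
print(f"after, learning_rate=0.3:  {loss_large:.2f}")
```

With these values the small step lowers the loss, while the large step overshoots the target and leaves the loss higher than before.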

---

#### 📌 Complete Backprop Code Block in Context:

```python
# Forward pass
z = weight * x + bias
y_pred = relu(z)

# Loss
loss = mean_squared_error(y_pred, y_true)
total_loss += loss

# Backward pass
dL_dy = mse_derivative(y_pred, y_true)    # ∂L/∂y
dy_dz = relu_derivative(z)                # ∂y/∂z
dz_dw = x                                 # ∂z/∂w
dz_db = 1                                 # ∂z/∂b

dL_dw = dL_dy * dy_dz * dz_dw             # ∂L/∂w
dL_db = dL_dy * dy_dz * dz_db             # ∂L/∂b

# Update weights
weight -= learning_rate * dL_dw
bias   -= learning_rate * dL_db
```
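
The same steps can be packaged as a function, which makes a single update easy to call and inspect. This is one possible arrangement rather than part of the original example; the name `train_step` and the starting values are assumptions.

```python
def relu(x):
    return max(0, x)


def train_step(weight, bias, x, y_true, learning_rate=0.01):
    """Run one forward/backward/update cycle and return (weight, bias, loss)."""
    # Forward pass
    z = weight * x + bias
    y_pred = relu(z)
    loss = (y_pred - y_true) ** 2

    # Backward pass via the chain rule
    dL_dy = 2 * (y_pred - y_true)
    dy_dz = 1 if z > 0 else 0
    dL_dw = dL_dy * dy_dz * x
    dL_db = dL_dy * dy_dz * 1

    # Gradient descent update
    weight -= learning_rate * dL_dw
    bias -= learning_rate * dL_db
    return weight, bias, loss


# Illustrative starting values (assumptions, not taken from the run above)
w, b, loss = train_step(weight=0.5, bias=0.1, x=2, y_true=5)
print(f"weight = {w:.3f}, bias = {b:.3f}, loss = {loss:.2f}")
```

Note that the returned `loss` is the value before the update, matching how `total_loss` is accumulated in the training loop.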

---

#### ✅ Summary

| Step                  | What It Does                           |
| --------------------- | -------------------------------------- |
| Forward Pass          | Compute prediction `y_pred` and loss   |
| Loss Derivative       | Compute how loss changes w.r.t. output |
| Activation Derivative | Account for ReLU turning off gradients |
| Chain Rule            | Propagate error to weight and bias     |
| Update                | Apply gradients via gradient descent   |

---

Possible extensions:

* Add a **hidden layer**
* Switch to **sigmoid** or **no activation**
* Visualize how the loss decreases over time
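
As a sketch of the "no activation" variant: with the ReLU removed, the neuron is a plain linear model and can also fit the negative targets. The fixed starting point `weight = 0.0`, `bias = 0.0` is an assumption made for reproducibility; the original example initializes both with `random.uniform(-1, 1)`.

```python
# Linear neuron (no activation): y_pred = weight * x + bias
training_data = [(x, 2 * x + 1) for x in range(-5, 6)]

# Fixed starting point (assumption, for reproducibility)
weight, bias = 0.0, 0.0
learning_rate = 0.01

for epoch in range(100):
    for x, y_true in training_data:
        y_pred = weight * x + bias            # forward pass, identity activation
        dL_dy = 2 * (y_pred - y_true)         # squared-error derivative
        weight -= learning_rate * dL_dy * x   # dy/dz = 1, dz/dw = x
        bias -= learning_rate * dL_dy         # dy/dz = 1, dz/db = 1

print(f"weight = {weight:.4f}, bias = {bias:.4f}")
for x in [-2, 0, 2, 4]:
    print(f"x = {x} => y_pred = {weight * x + bias:.2f} (target {2 * x + 1})")
```

The only change to the backward pass is that the activation derivative is always 1, so no sample is ever switched off.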
