We often need to organize our code into modules that PyTorch can track, save, and move to GPUs.

## Problem 1: The "MyLinear" Layer from Scratch

In Notebook 03, we used `w` and `b` as loose tensors. Now, we will build a proper Layer. You are **not** allowed to use `nn.Linear`. You must use `nn.Parameter`.



**Your Challenge:**
Implement a class `MyLinear` that mimics `nn.Linear`.



```python
import torch
import torch.nn as nn

class MyLinear(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        # Initialize smaller weights for stability
        self.weights = nn.Parameter(torch.randn(in_dim, out_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        # b = batch, i = in_dim, j = out_dim
        y_pred = torch.einsum('bi,ij->bj', x, self.weights) + self.bias
        return y_pred

# Test it with a batch of 5:
model = MyLinear(10, 3)
x = torch.randn(5, 10)  # 5 samples, 10 features each
y_pred = model(x)
print(f"Output shape: {y_pred.shape}")  # Should be (5, 3)

print(f"model state keys: {list(model.state_dict().keys())}")
print(f"no of model params: {len(list(model.parameters()))}")
```

```
Output shape: torch.Size([5, 3])
model state keys: ['weights', 'bias']
no of model params: 2
```
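
The einsum in `forward` is an ordinary batched matrix product. A minimal sketch, assuming the `(in_dim, out_dim)` weight layout used in `MyLinear`, checks that it matches `x @ W + b`. The names `W`, `b`, `via_einsum`, and `via_matmul` are illustrative stand-ins, not part of the exercise.

```python
import torch

torch.manual_seed(0)  # fixed seed so the check is reproducible

# Illustrative stand-ins for MyLinear's parameters: weights are (in_dim, out_dim)
W = torch.randn(10, 3) * 0.01
b = torch.zeros(3)
x = torch.randn(5, 10)  # batch of 5 samples, 10 features each

via_einsum = torch.einsum('bi,ij->bj', x, W) + b
via_matmul = x @ W + b  # same contraction over the feature axis i

print(via_einsum.shape)
print(torch.allclose(via_einsum, via_matmul, atol=1e-6))
```

Note that `nn.Linear` stores its weight the other way around, with shape `(out_features, in_features)`, and computes `x @ weight.T + bias`.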


<details>
<summary>Post-Mortem (Problem 1)</summary>

#### 1. Why is the count only 2?
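In PyTorch, `model.parameters()` returns an iterator over `nn.Parameter` objects (a subclass of `torch.Tensor`), one per registered tensor. Wrapping it in `list(...)` and taking `len` therefore counts tensors, not numbers.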
1.  `self.weights` is **one** tensor object (of shape $10 \times 3$).
2.  `self.bias` is **one** tensor object (of shape $3$).

Even though there are 33 individual floating-point numbers (elements) inside, PyTorch counts them as 2 "tensors to be optimized." 

**To see the total number of individual numbers (scalar elements), you have to do this:**
```python
total_elements = sum(p.numel() for p in model.parameters())
print(f"Total scalar elements: {total_elements}") # This will be 33
```
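
To see where the 33 comes from, one option is to break the count down per parameter with `named_parameters()`. This is a sketch that repeats the `MyLinear` definition from Problem 1 so it runs on its own; the `breakdown` dictionary is an illustrative name.

```python
import torch
import torch.nn as nn

class MyLinear(nn.Module):
    """Same layer as in Problem 1, repeated so this snippet is self-contained."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(in_dim, out_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        return torch.einsum('bi,ij->bj', x, self.weights) + self.bias

model = MyLinear(10, 3)

# named_parameters() yields (name, nn.Parameter) pairs, one per registered tensor
breakdown = {
    name: (tuple(p.shape), p.numel()) for name, p in model.named_parameters()
}
for name, (shape, count) in breakdown.items():
    print(f"{name}: shape={shape}, elements={count}")
```

The weight contributes $10 \times 3 = 30$ elements and the bias contributes 3, which gives the 33 above.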

#### 2. The "Ghost" Experiment Analysis
If you added `self.noise = torch.randn(10)`:
*   It is **NOT** in the `state_dict`.
*   It is **NOT** in `parameters()`.
*   **Why?** Because it is just a standard Python attribute. `nn.Module` overrides `__setattr__`: when you write `self.x = ...`, it checks whether the value is an `nn.Parameter` or an `nn.Module`. If so, it registers the value in an internal dictionary (`_parameters` or `_modules`). If not, it stores the value as an ordinary attribute, which `parameters()` and `state_dict()` never see.
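
The registration behaviour can be checked directly. The sketch below uses a hypothetical `GhostDemo` module (not part of the exercise) with one parameter, one plain tensor attribute, and, for contrast, one tensor registered through `register_buffer`, which is PyTorch's mechanism for tracked but non-trainable state.

```python
import torch
import torch.nn as nn

class GhostDemo(nn.Module):  # hypothetical class, for illustration only
    def __init__(self):
        super().__init__()
        # nn.Parameter: tracked and trainable
        self.weights = nn.Parameter(torch.randn(10, 3))
        # Plain tensor attribute: stored on the object but not tracked
        self.noise = torch.randn(10)
        # Buffer: tracked in state_dict, but not returned by parameters()
        self.register_buffer("running_mean", torch.zeros(3))

demo = GhostDemo()
param_names = [name for name, _ in demo.named_parameters()]
state_keys = list(demo.state_dict().keys())

print(param_names)  # only the nn.Parameter appears here
print(state_keys)   # the parameter plus the buffer; "noise" is absent
```

A plain tensor attribute is also left behind by `model.to(device)`, which only moves parameters and buffers. That is the usual reason to prefer `register_buffer` for non-trainable tensors.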

</details>

## Problem 2: The Multi-Layer Perceptron (Stacking)



**Your Challenge:**
Use your `MyLinear` class to build a 2-layer network.

```python
class SimpleMLP(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        # 1. Define two layers using your MyLinear class
        self.fc1 = MyLinear(in_dim, hidden_dim)
        self.fc2 = MyLinear(hidden_dim, out_dim)

    def forward(self, x):
        # 2. Implement: Input -> Layer 1 -> ReLU -> Layer 2
        # Use torch.relu() or torch.nn.functional.relu()
        x = self.fc1(x)
        x = torch.relu(x)
        x = self.fc2(x)
        return x

# 3. Create the MLP: 10 inputs -> 20 hidden -> 1 output
mlp = SimpleMLP(10, 20, 1)

# TEST: Run the parameter check again!
print(f"MLP state keys: {list(mlp.state_dict().keys())}")
print(f"Total MLP parameter objects: {len(list(mlp.parameters()))}")
```

```
MLP state keys: ['fc1.weights', 'fc1.bias', 'fc2.weights', 'fc2.bias']
Total MLP parameter objects: 4
```
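
Because `fc1` and `fc2` are registered as submodules, their parameters are counted and saved with the parent. As a rough sketch, the snippet below repeats both class definitions so it runs on its own, counts the scalar elements, and round-trips the `state_dict` through an in-memory buffer. Using `io.BytesIO` instead of a file path is an assumption made to keep the example self-contained.

```python
import io
import torch
import torch.nn as nn

class MyLinear(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(in_dim, out_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        return torch.einsum('bi,ij->bj', x, self.weights) + self.bias

class SimpleMLP(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.fc1 = MyLinear(in_dim, hidden_dim)
        self.fc2 = MyLinear(hidden_dim, out_dim)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

mlp = SimpleMLP(10, 20, 1)

# Scalar elements: (10*20 + 20) for fc1 plus (20*1 + 1) for fc2
total = sum(p.numel() for p in mlp.parameters())
print(f"Total scalar elements: {total}")  # 241

# Save the weights to an in-memory buffer, then load them into a fresh model
buffer = io.BytesIO()
torch.save(mlp.state_dict(), buffer)
buffer.seek(0)

restored = SimpleMLP(10, 20, 1)  # starts with different random weights
restored.load_state_dict(torch.load(buffer))

x = torch.randn(5, 10)
out = mlp(x)
same = torch.allclose(out, restored(x))
print(f"Output shape: {tuple(out.shape)}, outputs match after reload: {same}")
```

The keys in the saved dictionary (`fc1.weights`, `fc2.bias`, and so on) are what `load_state_dict` matches against, so the restored model must use the same attribute names.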
