## Two ways to save models in PyTorch

### 1. `torch.save(model.state_dict(), PATH)`

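This saves **only the model's state dict**: its learnable parameters (weights and biases) plus any registered buffers, such as BatchNorm running statistics. The architecture itself is not saved.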

```python
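import torch

# Save
torch.save(model.state_dict(), "model_weights.pth")

# Load
model = TheModelClass(*args, **kwargs)  # must recreate the same architecture first
# weights_only=True restricts unpickling to tensors and plain containers
model.load_state_dict(torch.load("model_weights.pth", weights_only=True))
model.eval()  # switch dropout / batch-norm layers to inference mode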
```

* **Pros:**

  * Recommended by the PyTorch documentation and the most common approach.
  * Slightly smaller file (it contains only tensors and their names).
  * Flexible: you can change parts of the model code and still load the weights, as long as the parameter names and shapes still match (or you load with `strict=False`).
  * More portable across machines and PyTorch versions.
* **Cons:**

  * You must have the model class definition available and re-create the model object yourself before loading the weights.
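
The flexibility point can be illustrated with `load_state_dict(..., strict=False)`. The sketch below is a minimal example under stated assumptions: `old_model` and `new_model` are hypothetical stand-ins for two versions of your model code, and an in-memory buffer replaces the file path.

```python
import io

import torch
from torch import nn

# Hypothetical "old" and "new" versions of a model: the new one adds a head.
old_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
new_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Save the old weights to an in-memory buffer (a file path works the same way).
buffer = io.BytesIO()
torch.save(old_model.state_dict(), buffer)
buffer.seek(0)

# strict=False loads every matching key and reports the rest instead of raising.
state = torch.load(buffer, weights_only=True)
result = new_model.load_state_dict(state, strict=False)

print(result.missing_keys)     # keys the new model has but the file lacks
print(result.unexpected_keys)  # keys in the file that the new model lacks
```

With the default `strict=True`, the same call would raise a `RuntimeError` listing the missing keys.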

---

### 2. `torch.save(model, PATH)`

This pickles the **entire model object**, which includes:

* Parameters and buffers
* A reference to the model class (its module path and class name), not the class source code

```python
# Save
torch.save(model, "entire_model.pth")

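# Load (the model class must be importable under the same module path)
# PyTorch 2.6+ defaults to weights_only=True, which rejects full-model pickles;
# pass weights_only=False only for files you trust.
model = torch.load("entire_model.pth", weights_only=False)
model.eval()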
```

* **Pros:**

  * Loading is a single call: you don't need to instantiate the model yourself (the class must still be importable under the same module path).
  * Useful for quick experiments, or for sharing with colleagues who already have the same code available.
* **Cons:**

  * Not portable: tightly coupled to the code (class name, location in module, etc.).
  * May break if:

    * Code changes (e.g., you rename a class or move files).
    * Different PyTorch versions are used.
  * Slightly larger file, since the pickled object structure is stored alongside the tensors.

---

## Which one is more common?

**Best practice (most common in research & production):**

```python
torch.save(model.state_dict(), PATH)
```

because it’s more **robust, portable, and reproducible**.

**When to use `torch.save(model, PATH)`:**

* Quick experiments, debugging, or toy projects.
* When you’re not planning to refactor the code and just want to “freeze” everything.

---

**Summary:**

* Use `state_dict()` by default (production, research, sharing).
* Use `torch.save(model)` only if you want the *entire object snapshot* and you control the environment.

---

The rest of these notes use the following MNIST classifier as a running example:

```python
from torch import nn


class SimpleMNISTModel(nn.Module):
    def __init__(self, num_classes=10, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Feature extraction layers (conv layers) - can be frozen/unfrozen
        self.features = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32,
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 28x28 -> 14x14

            nn.Conv2d(in_channels=32, out_channels=64,
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 14x14 -> 7x7

            nn.Conv2d(in_channels=64, out_channels=128,
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4))  # 7x7 -> 4x4
        )

        # Classifier layers (MLP)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=128*4*4, out_features=256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(in_features=256, out_features=128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(in_features=128, out_features=num_classes)
        )

    def forward(self, x):
        # Feature extraction
        x = self.features(x)
        # Classification
        x = self.classifier(x)
        return x
```
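
Entries like the ones explained next can be listed by iterating over a model's `state_dict()`. The sketch below is self-contained, so it uses a hypothetical `FeaturesOnly` class that copies only the convolutional stack from the model above; the same loop works on `SimpleMNISTModel()` and would additionally list the `classifier.*` entries.

```python
from torch import nn


class FeaturesOnly(nn.Module):
    """Hypothetical trimmed copy of the model above (conv stack only)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )


model = FeaturesOnly()

# state_dict() maps "attribute.index.parameter" names to tensors.
shapes = {name: tuple(t.shape) for name, t in model.state_dict().items()}
for name, shape in shapes.items():
    print(name, shape)
```

Each printed line pairs a parameter name with its shape; the entries are read as follows.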


**`features.0.weight torch.Size([32, 1, 3, 3])`**
- This is the **first convolutional layer** (the first `nn.Conv2d` in `self.features`)
- **Shape breakdown**: `[out_channels, in_channels, kernel_height, kernel_width]`
  - `32`: Output channels (number of feature maps this layer produces)
  - `1`: Input channels (grayscale MNIST images have 1 channel)
  - `3, 3`: Kernel size (3×3 convolution filter)

**`features.0.bias torch.Size([32])`**
- **Bias terms** for the first conv layer
- One bias value per output channel (32 biases for 32 output channels)

**`features.3.weight torch.Size([64, 32, 3, 3])`**
- This is the **second convolutional layer** (the second `nn.Conv2d` in `self.features`)
- **Shape breakdown**:
  - `64`: Output channels
  - `32`: Input channels (matches the output of the previous layer)
  - `3, 3`: Kernel size

**`features.3.bias torch.Size([64])`**
- **Bias terms** for the second conv layer (64 biases)

**`features.6.weight torch.Size([128, 64, 3, 3])`**
- This is the **third convolutional layer** (the third `nn.Conv2d` in `self.features`)
- **Shape breakdown**:
  - `128`: Output channels
  - `64`: Input channels (from previous layer)
  - `3, 3`: Kernel size

**`features.6.bias torch.Size([128])`**
- **Bias terms** for the third conv layer (128 biases)

---

The indices (0, 3, 6) correspond to the position of each Conv2d layer in your `nn.Sequential`:
- Index 0: First `nn.Conv2d`
- Index 1: `nn.ReLU()` (no parameters)
- Index 2: `nn.MaxPool2d()` (no parameters)
- Index 3: Second `nn.Conv2d`
- Index 4: `nn.ReLU()` (no parameters)
- Index 5: `nn.MaxPool2d()` (no parameters)
- Index 6: Third `nn.Conv2d`
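
The index mapping above can be checked directly. This is a minimal sketch that rebuilds the same layer sequence as a standalone `nn.Sequential` (the variable name `features` is chosen here for illustration) and records which positions hold parameters.

```python
from torch import nn

# Same layer order as self.features in the model above.
features = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)),
)

param_indices = []
for index, layer in features.named_children():
    # nn.Sequential names its children "0", "1", "2", ...
    n_params = sum(p.numel() for p in layer.parameters())
    print(index, type(layer).__name__, n_params)
    if n_params > 0:
        param_indices.append(int(index))

print(param_indices)
```

Only the convolution layers report a non-zero parameter count, which is why only their indices appear in the state dict keys.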

---

Notice how the channels flow through your network:
```
Input: 1 channel (grayscale)
   ↓
Conv1: 1 → 32 channels
   ↓
Conv2: 32 → 64 channels
   ↓
Conv3: 64 → 128 channels
```
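
To see the channel flow and the spatial sizes together, one option is to push a dummy input through the layers one at a time. The sketch below assumes a single 28x28 grayscale image and rebuilds the convolutional stack as a standalone `nn.Sequential`; the variable names are illustrative.

```python
import torch
from torch import nn

# Same layer order as self.features in the model above.
features = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)),
)

x = torch.zeros(1, 1, 28, 28)  # (batch, channels, height, width)
shapes = []
with torch.no_grad():  # shape tracing only, no gradients needed
    for layer in features:
        x = layer(x)
        shapes.append(tuple(x.shape))
        print(f"{type(layer).__name__:<18} {tuple(x.shape)}")
```

The final shape `(1, 128, 4, 4)` flattens to `128*4*4 = 2048` values, which matches the `in_features` of the first linear layer in the classifier.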


### 1. **Automatic Parameter Registration**

When you create layers like `nn.Conv2d` and `nn.Linear`, they internally create `nn.Parameter` objects for their weights and biases. These parameters are registered automatically with the parent module as soon as the layer is assigned as an attribute.
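
A minimal sketch of this behavior, using a hypothetical toy module named `Scale`: an `nn.Parameter` attribute and the parameters of a child layer are registered, while a plain tensor attribute is not.

```python
import torch
from torch import nn


class Scale(nn.Module):
    """Hypothetical toy module used only to illustrate registration."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3))  # registered as a parameter
        self.offset = torch.zeros(3)               # plain tensor: NOT registered
        self.linear = nn.Linear(3, 2)              # child module: its parameters are registered

    def forward(self, x):
        return self.linear(x * self.weight + self.offset)


model = Scale()
names = [name for name, _ in model.named_parameters()]
print(names)
```

Because `offset` is a plain tensor, it is absent from `named_parameters()` and from `state_dict()`, and an optimizer built from `model.parameters()` would never update it.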



### 2. **Module Hierarchy and Parameter Discovery**

When you assign modules to `self.features` and `self.classifier`, PyTorch's `nn.Module` class:

1. **Detects all child modules** recursively
2. **Collects all parameters** from each child module
3. **Makes them accessible** via `model.parameters()` and `model.named_parameters()`

### **Key Points**

✅ **No manual tracking needed**: You don't need to register parameters yourself

✅ **Automatic `requires_grad=True`**: All layer parameters are trainable by default

✅ **Hierarchical access**: You can access parameters at different levels:
- `model.parameters()` - all parameters
- `model.features.parameters()` - only conv layer parameters
- `model.classifier.parameters()` - only linear layer parameters

✅ **Works with any nn.Module structure**: Whether you use `nn.Sequential`, individual layers, or custom modules
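
Hierarchical access is what makes it easy to freeze one part of a network, as the comment on `self.features` in the model code suggests. The sketch below uses a hypothetical, much smaller `TinyNet` with the same `features` / `classifier` split; the helper `count` is also introduced here for illustration.

```python
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical small model with the same features/classifier split."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(4 * 28 * 28, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


def count(params):
    """Total number of scalar values in an iterable of parameters."""
    return sum(p.numel() for p in params)


model = TinyNet()
total = count(model.parameters())
feature_params = count(model.features.parameters())
classifier_params = count(model.classifier.parameters())

# Freeze the feature extractor: its parameters stop receiving gradients.
for p in model.features.parameters():
    p.requires_grad = False

trainable = count(p for p in model.parameters() if p.requires_grad)
print(total, feature_params, classifier_params, trainable)
```

Setting `requires_grad = True` in the same loop unfreezes the layers again.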
