# 1. Saving Weights
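To make the examples below reproducible, seed PyTorch's random number generators before creating the model or any random tensors. For CPU,

```python
torch.manual_seed(42)
```

is enough. In current PyTorch releases this call also seeds the CUDA generators. If you're using CUDA (GPU tensors) and want to seed the current GPU explicitly, you can also call:

```python
torch.cuda.manual_seed(42)
```

On multi-GPU machines, `torch.cuda.manual_seed_all(42)` seeds every GPU.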
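As a quick check of what seeding buys you, the sketch below re-seeds the generator and draws the same tensor twice. The variable names `first` and `second` are illustrative only, and the example runs on CPU so it does not need a GPU.

```python
import torch

# Seed PyTorch's generators, then draw a small random tensor.
torch.manual_seed(42)
first = torch.randn(3)

# Re-seeding with the same value restarts the random sequence.
torch.manual_seed(42)
second = torch.randn(3)

print(torch.equal(first, second))  # True
```

The same idea applies to weight initialization: layers created after the seed call start from the same values on every run.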




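```python
import os

import torch
import torch.nn as nn

# Seed before building the model so the initial weights are reproducible.
# torch.manual_seed seeds the CPU and CUDA generators.
torch.manual_seed(42)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Flatten(), nn.Linear(2, 10),
                      nn.ReLU(),    nn.Linear(10, 1))
model.to(device)

# Reuse saved weights if a checkpoint exists; otherwise keep the fresh initialization.
if os.path.isfile("model.pth"):
    print("Loading model from model.pth")
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine.
    model.load_state_dict(torch.load("model.pth", map_location=device, weights_only=True))
else:
    print("Creating new model")

N, C, H, W = 5, 1, 2, 1
inputs = torch.randn(N, C, H, W, device=device)

# nn.Flatten already reshapes (N, C, H, W) to (N, C*H*W), so no manual view is needed.
print(model(inputs))

# Save only the weights (the state_dict), not the model object.
torch.save(model.state_dict(), "model.pth")
```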


Output:

```
Loading model from model.pth
tensor([[ 0.5498],
        [ 0.2914],
        [-0.0424],
        [-0.1927],
        [-0.2770]], device='cuda:0', grad_fn=<AddmmBackward0>)
```
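To confirm that a saved `state_dict` really restores the model, one option is a round-trip check like the sketch below. It assumes the same layer layout as the cell above; the helper `make_model` and the temporary file path are introduced here for illustration only.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    # Same layer layout as the cell above.
    return nn.Sequential(nn.Flatten(), nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 1))


torch.manual_seed(0)
trained = make_model()   # stands in for a trained model
restored = make_model()  # fresh instance with different random weights

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pth")  # temporary path for this sketch
    torch.save(trained.state_dict(), path)
    restored.load_state_dict(torch.load(path, weights_only=True))

x = torch.randn(5, 1, 2, 1)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
print(same)  # True
```

After loading, both models produce the same outputs for the same input, which is the property the save/load cycle is meant to preserve.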


# 2. Saving Model Architecture + Weights

There are **two main ways** to save both **model architecture + weights** together in PyTorch:

---

### 2.1. Save the **whole model** (architecture + weights)

You can simply do:

```python
torch.save(model, "model_full.pth")
```

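And then **load** it like this:

```python
# PyTorch 2.6+ defaults to weights_only=True, which refuses to unpickle a full model,
# so weights_only=False is needed here.
# Only do this with files you trust: unpickling can execute arbitrary code.
model = torch.load("model_full.pth", weights_only=False)
```

This way, you **don't need to manually recreate the model** in the loading script. Note that the file does not contain the class source code: it is a pickle that references the model's classes by import path, so those classes must still be importable when you load it.

**BUT** this makes it less flexible, because the file is tied to the exact Python code and environment where you saved it (especially class definitions). It's not very portable across different codebases.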

---

### 2.2. Save **architecture and weights separately** (better practice)

A **cleaner, safer** way (especially for serious projects) is to save the `state_dict` together with the hyperparameters needed to rebuild the model:

**When saving:**

```python
# Save the weights together with the hyperparameters needed to rebuild the model
# (input size, hidden size, number of classes, etc.)
torch.save({
    'model_state_dict': model.state_dict(),
    'input_size': 784,
    'hidden_size': 500,
    'output_size': 10
}, 'model_checkpoint.pth')
```

**When loading:**

```python
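# Load the checkpoint (weights_only=True is enough for tensors and plain Python values)
checkpoint = torch.load('model_checkpoint.pth', weights_only=True)

# Recreate the model using saved parameters
# (MyModel is your own nn.Module subclass; it must be defined or imported here)
model = MyModel(checkpoint['input_size'], checkpoint['hidden_size'], checkpoint['output_size'])

# Load the weights
model.load_state_dict(checkpoint['model_state_dict'])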
```

This is more flexible: you can refactor the model code later and still restore the trained model, as long as the parameter names and shapes in the `state_dict` still match the rebuilt model.
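The snippets above leave `MyModel` undefined. As a minimal end-to-end sketch, the example below assumes a hypothetical two-layer MLP for `MyModel` (not the author's model) and writes the checkpoint to a temporary directory, so it runs on its own.

```python
import os
import tempfile

import torch
import torch.nn as nn


class MyModel(nn.Module):
    """Hypothetical two-layer MLP; the original text does not define MyModel."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x):
        return self.net(x)


hparams = {"input_size": 784, "hidden_size": 500, "output_size": 10}
model = MyModel(**hparams)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_checkpoint.pth")
    # Store the weights and the constructor arguments in one dictionary.
    torch.save({"model_state_dict": model.state_dict(), **hparams}, path)
    checkpoint = torch.load(path, weights_only=True)

# Rebuild the model from the saved hyperparameters, then restore the weights.
rebuilt = MyModel(
    checkpoint["input_size"], checkpoint["hidden_size"], checkpoint["output_size"]
)
rebuilt.load_state_dict(checkpoint["model_state_dict"])
rebuilt.eval()

x = torch.randn(4, 784)
with torch.no_grad():
    match = torch.allclose(model(x), rebuilt(x))
print(match)  # True
```

Because the checkpoint holds only tensors and plain Python values, it loads with `weights_only=True` and does not depend on pickled class definitions.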

---


**If your model is big and complex** (many convolution layers, BatchNorms, Dropouts, special blocks like ResNet, etc.), it's **better to move the architecture description into a separate config file** (like **YAML** or **JSON**).

---

### Typical structure for big models:

- `model_weights.pth`: stores only the **state_dict** (weights and BatchNorm statistics).
- `model_config.yaml` or `model_config.json`: describes the **model architecture** (layer sizes, number of blocks, activation types, etc.).
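To see what "weights and BatchNorm statistics" means in practice, the short sketch below lists the entries of a `state_dict` for a small conv block. The block itself is an arbitrary example chosen for illustration.

```python
import torch.nn as nn

# Conv + BatchNorm + ReLU; ReLU has no parameters or buffers.
block = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8), nn.ReLU())

keys = list(block.state_dict().keys())
for name, tensor in block.state_dict().items():
    print(f"{name:24s} {tuple(tensor.shape)}")
```

The listing includes the learnable `weight` and `bias` tensors plus the BatchNorm buffers `running_mean`, `running_var`, and `num_batches_tracked`. Nothing in it records the layer types or how they are connected, which is why the architecture has to be described elsewhere.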


For example, the YAML file might look like:

```yaml
model_type: "resnet"
num_layers: 50
input_channels: 3
num_classes: 1000
block_type: "bottleneck"
```

Then you **reconstruct the model from the YAML/JSON** and **load the weights from the `.pth` file**.
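One way to organize the reconstruction step is a small registry that maps `model_type` to a builder function. The sketch below is an assumption about how this could look, not a PyTorch API: `BUILDERS`, `build_mlp`, and `build_model` are hypothetical names, and it uses a tiny MLP and an inline JSON string instead of the ResNet config above to stay short.

```python
import json

import torch
import torch.nn as nn


def build_mlp(input_size, hidden_size, num_classes):
    # Hypothetical builder used only for this sketch.
    return nn.Sequential(
        nn.Linear(input_size, hidden_size), nn.ReLU(), nn.Linear(hidden_size, num_classes)
    )


# Hypothetical registry mapping model_type strings to builder functions.
BUILDERS = {"mlp": build_mlp}


def build_model(config):
    config = dict(config)  # copy so the caller's dict is left untouched
    model_type = config.pop("model_type")
    if model_type not in BUILDERS:
        raise ValueError(f"unknown model_type: {model_type!r}")
    # The remaining keys are passed to the builder as keyword arguments.
    return BUILDERS[model_type](**config)


config = json.loads(
    '{"model_type": "mlp", "input_size": 4, "hidden_size": 8, "num_classes": 3}'
)
model = build_model(config)
out_shape = tuple(model(torch.randn(2, 4)).shape)
print(out_shape)  # (2, 3)
```

After building the model this way, the weights would be restored with `model.load_state_dict(...)` as in the earlier sections.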

---

As a smaller, complete example, the `model_config.yaml` used by the code below describes a simple CNN:

```yaml
model_type: simple_cnn
input_channels: 3
num_classes: 10
conv_layers: 
  - out_channels: 32
    kernel_size: 3
    stride: 1
  - out_channels: 64
    kernel_size: 3
    stride: 1
```

---


```python
import os

import torch
import torch.nn as nn
import yaml

# 1. Define a model class that can build dynamically based on config
class SimpleCNN(nn.Module):
    def __init__(self, input_channels, num_classes, conv_layers):
        super().__init__()
        layers = []
        in_channels = input_channels
        for layer_cfg in conv_layers:
            layers.append(nn.Conv2d(
                in_channels,
                layer_cfg['out_channels'],
                kernel_size=layer_cfg['kernel_size'],
                stride=layer_cfg['stride'],
            ))
            layers.append(nn.BatchNorm2d(layer_cfg['out_channels']))
            layers.append(nn.ReLU(inplace=True))
            in_channels = layer_cfg['out_channels']
        self.conv = nn.Sequential(*layers)
        # Global average pooling reduces each feature map to 1x1, so the flattened
        # size equals in_channels for any input resolution. It has no parameters,
        # so the state_dict keys are unchanged.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        x = self.conv(x)
        x = self.pool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

# 2. Load config
with open("model_config.yaml", "r") as f:
    config = yaml.safe_load(f)

# 3. Create the model
model = SimpleCNN(
    input_channels=config['input_channels'],
    num_classes=config['num_classes'],
    conv_layers=config['conv_layers']
)

# 4. Load weights (if a checkpoint exists)
if os.path.isfile("model_weights.pth"):
    print("Loading model from model_weights.pth")
    model.load_state_dict(
        torch.load("model_weights.pth", map_location="cpu", weights_only=True)
    )
    # Switch BatchNorm layers to inference behavior
    model.eval()
    print("Model is ready")
else:
    print("model_weights.pth not found; the model keeps its random initialization")
```



