
# Create & Save a Custom CNN (TorchScript) for **fakevoicefinder**

This notebook shows how to define a **custom CNN** that:
- Works with **variable input sizes** (independent of spectrogram width/height) thanks to **AdaptiveAvgPool2d**.
- Outputs **2 classes** (real vs fake).
- Is exported as **TorchScript** (`.pt`) so the library can load it as a **user model** (`usermodel_jit`).

> **Save location**: by default we save to `../models/SimpleCNN_scripted.pt` (assuming this notebook lives in a `notebooks/` folder).


In [1]:

# If you need to install PyTorch, follow your platform instructions at https://pytorch.org/
import torch
import torch.nn as nn
from pathlib import Path



## Define a size-agnostic CNN

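Key idea: place **`nn.AdaptiveAvgPool2d((1, 1))`** before the final classifier so the model
does **not** depend on the input spatial size. This suits spectrograms, whose time axis (width) varies from clip to clip.
The only constraint is that each side is at least 4 pixels, so the two 2×2 max-pools in the model still leave a non-empty feature map.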
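
To make the key idea concrete, here is a minimal sketch of what `nn.AdaptiveAvgPool2d((1, 1))` does on its own. The tensor shapes are illustrative assumptions (they mimic 128-channel feature maps of two different widths), not values taken from the library.

```python
import torch
import torch.nn as nn

# Global average pooling: the output is always 1x1, whatever H x W comes in.
gap = nn.AdaptiveAvgPool2d((1, 1))

# Two feature maps with the same channel count but different widths
# (illustrative shapes, chosen for this sketch only).
narrow = torch.randn(2, 128, 32, 75)
wide = torch.randn(2, 128, 32, 138)

pooled_narrow = torch.flatten(gap(narrow), 1)  # [2, 128]
pooled_wide = torch.flatten(gap(wide), 1)      # [2, 128]
print(pooled_narrow.shape, pooled_wide.shape)

# With a (1, 1) target, the pooling is just the mean over H and W.
same_as_mean = torch.allclose(pooled_wide, wide.mean(dim=(2, 3)), atol=1e-5)
print("Equals mean over H and W:", same_as_mean)
```

Because the pooled vector length depends only on the channel count, the `nn.Linear` layer that follows can have a fixed input size.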


In [4]:

class SimpleCNN(nn.Module):
    """A small, size-agnostic CNN for 2-class classification.

    Design goals
    ------------
    - Accepts inputs shaped [B, C, H, W] where C is typically 1 (mel/log spectrograms).
    - Uses AdaptiveAvgPool2d((1,1)) so the spatial dimensions do not constrain the model.
    - Final output has **2 logits** (real=0, fake=1). Apply Softmax externally if you need probabilities.

    TorchScript-compatibility:
    - The module is simple (pure nn layers) and can be exported with torch.jit.script.
    """
    def __init__(self, in_channels: int = 1, hidden_channels: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, hidden_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(hidden_channels),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # downsample by 2

            nn.Conv2d(hidden_channels, hidden_channels * 2, kernel_size=3, padding=1),
            nn.BatchNorm2d(hidden_channels * 2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # downsample by 2

            nn.Conv2d(hidden_channels * 2, hidden_channels * 4, kernel_size=3, padding=1),
            nn.BatchNorm2d(hidden_channels * 4),
            nn.ReLU(inplace=True),
            # No third max-pool: AdaptiveAvgPool2d below collapses the remaining H' x W'
        )
        self.gap = nn.AdaptiveAvgPool2d((1, 1))     # size-agnostic pooling
        self.classifier = nn.Linear(hidden_channels * 4, 2)  # 2 classes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)        # [B, C*, H', W']
        x = self.gap(x)             # [B, C*, 1, 1]
        x = torch.flatten(x, 1)     # [B, C*]
        logits = self.classifier(x) # [B, 2]
        return logits
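
The docstring notes that the model returns raw logits and that softmax is applied externally. A minimal sketch of that step follows; the logits tensor is made-up example data standing in for a model output, and the label order (real=0, fake=1) follows the docstring above.

```python
import torch

# Made-up logits for a batch of two inputs, shaped [B, 2] like the model output.
logits = torch.tensor([[2.0, 0.0],
                       [-1.0, 1.0]])

# Softmax over the class dimension turns logits into probabilities.
probs = torch.softmax(logits, dim=1)

# The predicted class is the index of the largest probability
# (0 = real, 1 = fake, per the docstring).
pred = probs.argmax(dim=1)

print("Probabilities:", probs)
print("Predicted classes:", pred.tolist())
```

Keeping softmax outside the module means the same logits can be passed directly to `nn.CrossEntropyLoss` during training, which expects unnormalized scores.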



## Sanity check with variable input sizes

We test two spectrogram-like tensors with **different widths**; output must always be `[B, 2]`.


In [7]:

# Instantiate
model = SimpleCNN(in_channels=1, hidden_channels=32)
model.eval()

# Two inputs with different spatial sizes (H=128, W varies)
x1 = torch.randn(1, 1, 128, 300)
x2 = torch.randn(1, 1, 128, 552)

with torch.no_grad():
    y1 = model(x1)
    y2 = model(x2)

print("Output shapes:", y1.shape, y2.shape)  # both should be [1, 2]
assert y1.shape == (1, 2) and y2.shape == (1, 2), "Model output should be [B, 2] regardless of input size."
print("✅ Size-agnostic forward pass OK")


Output shapes: torch.Size([1, 2]) torch.Size([1, 2])
✅ Size-agnostic forward pass OK



## Export to TorchScript and save under `../models`

The library looks for user models in your **`models/`** folder. We save a `*.pt`
TorchScript file there, which the loader detects and copies into the experiment.


In [10]:

save_dir = Path("../models")
save_dir.mkdir(parents=True, exist_ok=True)
save_path = save_dir / "SimpleCNN_scripted.pt"

# Script and save
scripted = torch.jit.script(model)
scripted.save(str(save_path))

print(f"✅ TorchScript saved to: {save_path.resolve()}")


✅ TorchScript saved to: D:\UMNG-2025\FakeVoice\FakeVoice\models\SimpleCNN_scripted.pt



## Reload test (optional)


In [None]:

reloaded = torch.jit.load(str(save_path), map_location="cpu")
reloaded.eval()
with torch.no_grad():
    y = reloaded(torch.randn(2, 1, 128, 400))
print("Reloaded output shape:", y.shape)  # [2, 2]
assert y.shape == (2, 2), "Reloaded model should return [B, 2] logits."
print("✅ Reload test OK")
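
The reload test above checks only the output shape. A stricter check compares the reloaded TorchScript outputs with the original eager model. The sketch below is an assumption-laden illustration: it uses a small `nn.Sequential` stand-in instead of `SimpleCNN` and an in-memory buffer instead of `../models`, so it runs on its own without touching the disk.

```python
import io

import torch
import torch.nn as nn

torch.manual_seed(0)

# Small stand-in model (hypothetical, not SimpleCNN) with the same
# conv -> global average pool -> linear structure.
eager = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(4, 2),
).eval()

# Script the model and round-trip it through an in-memory buffer.
buffer = io.BytesIO()
torch.jit.save(torch.jit.script(eager), buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer, map_location="cpu")
reloaded.eval()

# Compare eager and reloaded outputs on the same input.
x = torch.randn(2, 1, 128, 400)
with torch.no_grad():
    expected = eager(x)
    actual = reloaded(x)

outputs_match = torch.allclose(expected, actual, atol=1e-5)
print("Eager and reloaded outputs match:", outputs_match)
```

To apply the same check to the notebook's model, replace `eager` with `model` and load from `save_path` instead of the buffer.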



## Use this model in **fakevoicefinder**

Once saved in `models/`, your experiment can load it as a *user TorchScript* model:

```python
from fakevoicefinder.experiment import CreateExperiment
from fakevoicefinder.model_loader import ModelLoader

# (1) Build your experiment as usual (`cfg` is your already-validated config, created elsewhere)
exp = CreateExperiment(cfg, experiment_name=cfg.run_name)
exp.build()

# (2) Prepare benchmarks (optional)
loader = ModelLoader(exp)
loader.prepare_benchmarks(add_softmax=False, input_channels=1)  # or 3 if your transforms are RGB-like

# (3) Prepare user models (TorchScript)
#    This will detect ../models/SimpleCNN_scripted.pt and register it as 'usermodel_<file>'
loader.prepare_user_models()

# (4) Proceed to training with Trainer(exp) ...
```
