## Table of Contents
- [Table of Contents](#table-of-contents)
- [1) Environment & Imports](#environment-imports)
- [2) Experiment Configuration](#experiment-configuration)
- [3) Decide Clip Length (Optional)](#decide-clip-length-optional)
- [4) Create Experiment Layout](#create-experiment-layout)
- [5) Prepare Dataset (Load / Split / Save Originals / Transform)](#prepare-dataset-load-split-save-originals)
- [6) Prepare Models](#prepare-models)
- [7) Sanity Checks](#sanity-checks)
- [8) Train & Evaluate (Optional)](#train-evaluate-optional)
- [9) Metrics & Plots (Optional)](#metrics-plots-optional)
- [Appendix: Tips & Troubleshooting](#appendix-tips-troubleshooting)

<a id="basic-example"></a>
# Basic Example

This notebook prepares data from the ZIPs (`real.zip`, `fake.zip`), loads **benchmark** models from torchvision (VGG, ResNet, AlexNet) according to your configuration, and trains for **3 epochs** per transformation.

**Prerequisites**:
- Notebook folder: `notebooks/`
- ZIPs in `../dataset/` (at the repo root): `real.zip`, `fake.zip`
- (Optional) user-provided TorchScript models in `../models/`


In [4]:

import sys, os
from pathlib import Path

lib_path = os.path.abspath(os.path.join(os.getcwd(), ".."))
if lib_path not in sys.path:
    sys.path.insert(0, lib_path)
print("Project root added to sys.path:", lib_path)


Project root added to sys.path: D:\FakeVoiceFinder


<a id="environment-imports"></a>
## 1) Environment & Imports
<a id='sec1'></a>

**Goal:** Make sure the environment has the required packages and import the project modules.

**Requirements:**
- Python 3.9+ (recommended)
- `numpy`, `pandas`, `matplotlib`
- `librosa`, `soundfile`, `PyWavelets`
- `scikit-learn`, `torch`, `torchvision`

**Install (example):**
```bash
pip install numpy pandas matplotlib librosa soundfile PyWavelets scikit-learn torch torchvision
```
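Before importing the project modules, it can help to confirm that every requirement is discoverable. The sketch below is not part of `fakevoicefinder`; `missing_packages` and `REQUIRED` are illustrative names. It uses `importlib.util.find_spec`, which locates a module without importing it.

```python
from importlib.util import find_spec

# Import names differ from the pip names for two packages:
# PyWavelets is imported as "pywt" and scikit-learn as "sklearn".
REQUIRED = [
    "numpy", "pandas", "matplotlib", "librosa", "soundfile",
    "pywt", "sklearn", "torch", "torchvision",
]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", missing if missing else "none")
```

If the list is not empty, install the missing packages with the `pip` command above and restart the kernel.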
Run the next cell(s) to import `ExperimentConfig`, `CreateExperiment`, and helpers.

In [6]:
# Main imports
from pprint import pprint
from fakevoicefinder import (
    ExperimentConfig, CreateExperiment, ModelLoader, Trainer,
    ConfigError, shortest_audio_seconds,
)


<a id="experiment-configuration"></a>
## 2) Experiment Configuration
<a id='sec2'></a>

**Goal:** Define models, transforms, and hyperparameters.

**Key fields in `ExperimentConfig`:**
- `models_list`: e.g., `['alexnet', 'resnet18', 'convnext_tiny']`
- `transform_list`: any of `['mel', 'log', 'dwt', 'cqt']`
- `mel_params`, `log_params`, `dwt_params`, `cqt_params`: optional **overrides** (dicts). Defaults are used if not set.
- `clip_seconds`: window length (seconds) for each audio (pad/trim). Default is 3.0 s.
- `image_size`: **optional** resize for MEL/LOG (e.g., `224` for ViT).

Tip: Keep defaults first; only override if you need a different setting.
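The "defaults plus overrides" idea can be sketched as a dictionary merge that rejects unknown keys. This is an illustration only, not how `ExperimentConfig` is implemented: `merge_params` is a hypothetical helper and the values in `MEL_DEFAULTS` are made up (only the key names come from this notebook).

```python
def merge_params(defaults, overrides, name="params"):
    """Return defaults updated with overrides, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"Unknown {name} keys: {unknown}")
    # A new dict is built, so the defaults are never mutated.
    return {**defaults, **overrides}


# Hypothetical default values, for illustration only.
MEL_DEFAULTS = {
    "n_mels": 128, "n_fft": 2048, "hop_length": 512,
    "win_length": None, "fmin": 0, "fmax": None,
}

mel = merge_params(MEL_DEFAULTS, {"n_mels": 68}, name="mel_params")
print(mel["n_mels"], mel["hop_length"])  # 68 512
```

Only the overridden key changes; every other key keeps its default, which is why starting from the defaults is usually the safer choice.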

In [8]:
from fakevoicefinder import ExperimentConfig, ConfigError

cfg = ExperimentConfig()

# Experiment name (folder under outputs/)
cfg.run_name = "exp_newv5"

# Paths (repo-relative)
cfg.data_path = "../dataset"   # where real.zip and fake.zip are
cfg.models_path = "../models"  # user TorchScript models

# Transforms to generate
cfg.transform_list = ["mel", "dwt", "log", "cqt"]   # MEL, DWT, LOG and CQT

# --- mel: valid keys ---
# n_mels, n_fft, hop_length, win_length, fmin, fmax
cfg.mel_params = {
    "n_mels": 68,
    "n_fft": 2048,
    "hop_length": 512,
    # "win_length": None,
    # "fmin": 0,
    # "fmax": None,
}

# --- log: valid keys ---
# n_fft, hop_length, win_length
cfg.log_params = {
    "n_fft": 2048,
    "hop_length": 256,
    # "win_length": None,
}

# --- dwt: valid keys ---
# wavelet, level, mode
cfg.dwt_params = {
    "wavelet": "db6",
    "level": 5,
    "mode": "symmetric",
}

# --- cqt: valid keys ---
# hop_length, n_bins, bins_per_octave, fmin, scale
cfg.cqt_params = {
    "hop_length": 256,      # good time-frequency tradeoff
    "n_bins": 96,           # recommended range ~84-120
    "bins_per_octave": 24,  # 12 or 24; 24 gives more detail on formants
    "scale": True,          # more stable spectral distribution
    # "fmin": 32.70319566,  # C1 (this is the default in the code)
    # "fmin": 65.40639133,  # C2 if you want to shift focus to vocal range
}

cfg.image_size = 224

# Benchmark models to test
cfg.models_list = [
    "alexnet","resnet18","vgg16","vit_b_16","convnext_tiny"
]

# Quick smoke-test training
cfg.type_train = "both"   # 'scratch' | 'pretrain' | 'both'
cfg.epochs = 3
cfg.batch_size = 8
cfg.learning_rate = 0.0001
cfg.patience = 5

# Input channels for spectrograms (.npy): 1 channel
cfg.input_channels = 1 

# Config validation
try:
    cfg.validate()
    print("Config validation ✅")
except ConfigError as e:
    print("Config validation error:", e)
    raise

print(cfg.summary())



Config validation ✅
ExperimentConfig:
  batch_size     : 8
  cache_features : True
  clip_seconds   : None
  cqt_params     : {'hop_length': 256, 'n_bins': 96, 'bins_per_octave': 24, 'scale': True}
  data_path      : ../dataset
  device         : gpu
  dwt_params     : {'wavelet': 'db6', 'level': 5, 'mode': 'symmetric'}
  epochs         : 1
  eval_metric    : ['accuracy', 'F1']
  fake_zip       : fake.zip
  flag_train     : True
  image_size     : 224
  input_channels : 1
  learning_rate  : 0.0001
  log_params     : {'n_fft': 2048, 'hop_length': 256}
  mel_params     : {'n_mels': 68, 'n_fft': 2048, 'hop_length': 512}
  models_list    : ['alexnet']
  models_path    : ../models
  num_workers    : 4
  optimizer      : Adam
  outputs_path   : outputs
  patience       : 5
  real_zip       : real.zip
  run_name       : exp_newv5
  save_best_only : True
  save_models    : True
  seed           : 23
  transform_list : ['mel', 'cqt']
  type_train     : both


<a id="decide-clip-length-optional"></a>
## 3) Decide Clip Length (Optional)
<a id='sec3'></a>

**Goal:** Choose the time window (`clip_seconds`) to use for all audios.

Use `shortest_audio_seconds(cfg)` to scan `real.zip` and `fake.zip` and return the shortest duration in seconds. Then either:
- **A)** set `cfg.clip_seconds = min_duration` to avoid truncation; or
- **B)** choose a fixed value (e.g., 3.0 s). Short files will be **zero-padded** automatically.
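To make the scan concrete, here is a minimal standard-library sketch of finding the shortest clip in a set of ZIP archives. It is not the implementation of `shortest_audio_seconds`; the name `shortest_wav_seconds` is hypothetical, and the sketch assumes the archives contain uncompressed PCM `.wav` files (other formats would need a decoder such as `soundfile`).

```python
import io
import wave
import zipfile


def shortest_wav_seconds(zip_paths):
    """Return the shortest duration (seconds) among .wav members of the ZIPs."""
    shortest = None
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.namelist():
                if not member.lower().endswith(".wav"):
                    continue
                # wave needs a seekable file, so read the member into memory.
                with wave.open(io.BytesIO(archive.read(member)), "rb") as wav:
                    seconds = wav.getnframes() / wav.getframerate()
                if shortest is None or seconds < shortest:
                    shortest = seconds
    if shortest is None:
        raise ValueError("No .wav files found in the given archives")
    return shortest


# Example call, assuming the ZIPs are in ../dataset:
# shortest_wav_seconds(["../dataset/real.zip", "../dataset/fake.zip"])
```

The duration is computed from the header (frame count divided by sample rate), so no audio is decoded.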

In [10]:
# Choose the audio window (clip_seconds)
min_sec = int(shortest_audio_seconds(cfg))
print(f"Shortest duration found in the ZIPs: {min_sec}")

# Option A: use exactly the shortest duration found
cfg.clip_seconds = min_sec

# Option B: use a fixed value of your choice (e.g., 3.0 s)
# cfg.clip_seconds = 3.0

# Note: if the value is longer than many clips, they are zero-padded (the pipeline already does this).


Shortest duration found in the ZIPs: 4


<a id="create-experiment-layout"></a>
## 4) Create Experiment Layout
<a id='sec4'></a>

**Goal:** Initialize the experiment folder structure and manifest (`experiment.json`).

Run `CreateExperiment(cfg).build()` to set up:
- `outputs/<RUN>/datasets/{train,test}/...`
- `outputs/<RUN>/models/{loaded,trained}/`
- `outputs/<RUN>/reports/`

The manifest stores paths and metadata for reproducibility.
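As a rough picture of what this step produces, the sketch below creates a similar folder tree with `pathlib` and writes a minimal `experiment.json`. It is not the code behind `CreateExperiment.build()`; `build_layout` is a hypothetical helper, and the manifest keys simply mirror the ones printed in this notebook.

```python
import json
from pathlib import Path


def build_layout(outputs_root, run_name, transforms):
    """Create the run folders and write a minimal experiment.json."""
    root = Path(outputs_root) / run_name
    manifest = {"reports": {"path": (root / "reports").as_posix()}}
    for split in ("train", "test"):
        original = root / "datasets" / split / "original"
        original.mkdir(parents=True, exist_ok=True)
        entry = {
            "original_dataset": {"path": original.as_posix(), "num_items": 0},
            "transforms_dataset": {},
        }
        for name in transforms:
            folder = root / "datasets" / split / "transforms" / name
            folder.mkdir(parents=True, exist_ok=True)
            entry["transforms_dataset"][name] = {"path": folder.as_posix(), "params": {}}
        manifest[f"{split}_data"] = entry
    (root / "models" / "loaded").mkdir(parents=True, exist_ok=True)
    (root / "reports").mkdir(parents=True, exist_ok=True)
    (root / "experiment.json").write_text(json.dumps(manifest, indent=2))
    return manifest


# Example call (writes under ./outputs/demo_run):
# manifest = build_layout("outputs", "demo_run", ["mel", "cqt"])
```

Storing POSIX-style paths in the manifest keeps it readable on both Windows and Linux.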

In [12]:
exp = CreateExperiment(cfg, experiment_name=cfg.run_name)
exp.build()

{'models': {'alexnet': {'loaded_path': None,
   'trained_path': None,
   'train_parameters': {'epochs': 1,
    'learning_rate': 0.0001,
    'batch_size': 8,
    'optimizer': 'Adam',
    'patience': 5,
    'device': 'gpu',
    'seed': 23,
    'type_train': 'both',
    'num_workers': 4,
    'transform': None}}},
 'train_data': {'original_dataset': {'path': 'outputs/exp_newv5/datasets/train/original',
   'num_items': 0},
  'transforms_dataset': {'mel': {'path': 'outputs/exp_newv5/datasets/train/transforms/mel',
    'params': {}},
   'cqt': {'path': 'outputs/exp_newv5/datasets/train/transforms/cqt',
    'params': {}}}},
 'test_data': {'original_dataset': {'path': 'outputs/exp_newv5/datasets/test/original',
   'num_items': 0},
  'transforms_dataset': {'mel': {'path': 'outputs/exp_newv5/datasets/test/transforms/mel',
    'params': {}},
   'cqt': {'path': 'outputs/exp_newv5/datasets/test/transforms/cqt',
    'params': {}}}},
 'reports': {'path': 'outputs/exp_newv5/reports'}}

<a id="prepare-dataset-load-split-save-originals"></a>
## 5) Prepare Dataset (Load / Split / Save Originals / Transform)
<a id='sec5'></a>

**Goal:**
1) Read `real.zip` and `fake.zip`.
2) Stratified split into train/test.
3) Extract original audio files into the experiment folders.
4) Generate the transforms listed in `cfg.transform_list` (MEL / LOG / DWT / CQT).
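A stratified split shuffles and cuts each class separately, so the real/fake ratio is the same in train and test. The sketch below illustrates the idea with the standard library; it is not the code inside `prepare_data`, and the file names are placeholders.

```python
import random


def stratified_split(items_by_class, train_ratio=0.8, seed=23):
    """Split each class separately so both splits keep the class balance."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = int(round(len(shuffled) * train_ratio))
        train[label], test[label] = shuffled[:cut], shuffled[cut:]
    return train, test


# Placeholder file names standing in for the ZIP members.
files = {
    "real": [f"real_{i:03d}.wav" for i in range(600)],
    "fake": [f"fake_{i:03d}.wav" for i in range(600)],
}
train, test = stratified_split(files, train_ratio=0.8, seed=23)
print({k: len(v) for k, v in train.items()})  # {'real': 480, 'fake': 480}
print({k: len(v) for k, v in test.items()})   # {'real': 120, 'fake': 120}
```

With 600 files per class and `train_ratio=0.8`, each class contributes 480 training and 120 test files. Fixing the seed makes the split reproducible.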

In [14]:
summary = exp.prepare_data(train_ratio=0.8, seed=cfg.seed, transforms=cfg.transform_list)
print("Data prep summary:")
pprint(summary)

print("Manifest:", (exp.root / "experiment.json").as_posix())

Data prep summary:
{'load': {'fake': 600, 'real': 600},
 'save_original': {'test': 240, 'train': 960},
 'split': {'test': {'fake': 120, 'real': 120, 'total': 240},
           'train': {'fake': 480, 'real': 480, 'total': 960}},
 'transforms': {'cqt': {'test': 240, 'train': 960},
                'mel': {'test': 240, 'train': 960}}}
Manifest: D:/FakeVoiceFinder/outputs/exp_newv5/experiment.json


<a id="prepare-models"></a>
## 6) Prepare Models

In [16]:
loader = ModelLoader(exp)
bench = loader.prepare_benchmarks(add_softmax=False, input_channels=getattr(cfg, "input_channels", 1))
print("Benchmarks saved under models/loaded:")
pprint(bench)

# User models (if any .pt/.pth under cfg.models_path)
user = loader.prepare_user_models(add_softmax=False, input_channels=cfg.input_channels)
print("User models saved:")
pprint(user)

Benchmarks saved under models/loaded:
{'alexnet': {'pretrain': 'outputs/exp_newv5/models/loaded/alexnet_pretrain.pt',
             'scratch': 'outputs/exp_newv5/models/loaded/alexnet_scratch.pt'}}
User models saved:
{'SimpleCNN_scripted.pt': 'outputs/exp_newv5/models/loaded/SimpleCNN_scripted_usermodel_jit.pt'}


In [17]:
print(exp.loaded_models)
print(exp.loaded_models.exists())

D:\FakeVoiceFinder\outputs\exp_newv5\models\loaded
True


In [18]:
cfg.models_path

'../models'

<a id="sanity-checks"></a>
## 7) Sanity Checks
<a id='sec7'></a>

**Goal:** Verify shapes and parameters saved to the manifest.

- Load one `.npy` per class and print its shape.
- Inspect `experiment.json` parameters under `train_data.transforms_dataset[<name>].params`.
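The manifest check can be done with a few lines of `json`. This is a sketch that assumes the manifest structure printed in section 4 (`train_data` → `transforms_dataset` → `<name>` → `params`); `transform_params` is a hypothetical helper, not part of the library.

```python
import json
from pathlib import Path


def transform_params(manifest_path, split="train_data"):
    """Map each transform name to the params recorded in experiment.json."""
    manifest = json.loads(Path(manifest_path).read_text())
    transforms = manifest[split]["transforms_dataset"]
    return {name: info.get("params", {}) for name, info in transforms.items()}


# Usage inside the notebook (exp.root is the run folder):
# print(transform_params(exp.root / "experiment.json"))
```

An empty `params` dict for a transform you have already generated is worth investigating, since it suggests the manifest was not updated after the transform step.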

In [31]:
def print_tree(root: Path, max_depth: int = 3, prefix: str = ""):
    if max_depth < 0:
        return
    try:
        entries = sorted(root.iterdir(), key=lambda p: (p.is_file(), p.name.lower()))
    except FileNotFoundError:
        return
    for e in entries:
        print(prefix + ("📄 " if e.is_file() else "📁 ") + e.name)
        if e.is_dir():
            print_tree(e, max_depth - 1, prefix + "   ")

print_tree(exp.root, max_depth=3)

📁 datasets
   📁 test
      📁 original
         📁 fake
         📁 real
      📁 transforms
         📁 cqt
         📁 mel
   📁 train
      📁 original
         📁 fake
         📁 real
      📁 transforms
         📁 cqt
         📁 mel
📁 models
   📁 loaded
      📄 alexnet_pretrain.pt
      📄 alexnet_scratch.pt
      📄 SimpleCNN_scripted_usermodel_jit.pt
   📁 trained
📁 reports
📄 experiment.json


<a id="train-evaluate-optional"></a>
## 8) Train & Evaluate (Optional)
<a id='sec8'></a>

**Goal:** Train your selected models and compute metrics on the test split.

- Use your training loop or the provided trainer to fit each model.
- Evaluate with `MetricsReporter` to build a DataFrame of scores (Accuracy, F1).
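Both scores can be derived from the four confusion-matrix counts. The sketch below is a plain-Python illustration, not the `MetricsReporter` code; it assumes F1 is computed for the positive class and returns 0.0 when the denominator is zero.

```python
def accuracy_and_f1(tn, fp, fn, tp):
    """Compute accuracy and positive-class F1 from confusion counts."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total if total else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# A model that predicts the negative class for every sample of a balanced test set.
print(accuracy_and_f1(tn=240, fp=0, fn=240, tp=0))  # (0.5, 0.0)
```

The example shows why accuracy alone can mislead: a classifier that never predicts the positive class still reaches 50% accuracy on balanced data, while its F1 is 0.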

In [34]:
exp.loaded_models

WindowsPath('D:/FakeVoiceFinder/outputs/exp_newv5/models/loaded')

In [None]:
trainer = Trainer(exp)
train_results = trainer.train_all()
print("Training results (repo-relative paths):")
pprint(train_results)

print("Best checkpoints stored in:", (exp.trained_models).as_posix())


[Trainer] Using device: cuda
[Trainer] Transforms to train: ['mel', 'cqt']
[Trainer] Models found: ['alexnet', 'usermodel_SimpleCNN_scripted.pt']

=== MODEL: alexnet ===
[alexnet] Hyperparams -> epochs=1, lr=0.0001, bs=8, optimizer=Adam, patience=5, seed=23, num_workers=4
[alexnet][mel] Dataset sizes -> train: 1920, test: 480
[alexnet][mel] Batches -> train: 240, test: 60
[alexnet][mel][scratch] Loading checkpoint: D:\FakeVoiceFinder\outputs\exp_newv5\models\loaded\alexnet_scratch.pt
[alexnet][mel][scratch] Loaded pickled module.
[alexnet][mel][scratch] Start training for 1 epochs
[alexnet][mel][scratch] Epoch 1/1 - loss=0.6964 acc=0.5000
[alexnet][mel][scratch] Confusion matrix (test):
[[TN= 240, FP=   0],
 [FN= 240, TP=   0]]
[alexnet][mel][scratch] ✅ New best acc=0.5000 at epoch 1
[alexnet][mel] Saved best checkpoint -> alexnet_scratch_mel_seed23_epoch001_acc0.50.pt
[alexnet][mel][pretrain] Loading checkpoint: D:\FakeVoiceFinder\outputs\exp_newv5\models\loaded\alexnet_pretrain.pt


In [None]:
# print_tree is defined in section 7; list the run folder again after training.
print_tree(exp.root, max_depth=3)

<a id="metrics-plots-optional"></a>
## 9) Metrics & Plots (Optional)
<a id='sec9'></a>

**Goal:** Visualize results.

- `plot_architectures_for_transform`: bar chart of `(model+variant)` for a single transform.
- `plot_variants_for_model`: bar chart of `(transform)` for a single `(model, variant)`.
- `plot_heatmap_models_transforms`: heatmap over `(models×variants) × transforms`.

**Color map defaults:** worst value → red, better → green, max at 100%. Each cell shows its value in %.

In [None]:
from fakevoicefinder.config import ExperimentConfig
from fakevoicefinder.experiment import CreateExperiment
from fakevoicefinder.metrics import MetricsReporter

EXP_NAME = cfg.run_name

# Fresh config that reuses the existing run name
cfg = ExperimentConfig()
cfg.run_name = EXP_NAME
exp = CreateExperiment(cfg, experiment_name=cfg.run_name)

rep = MetricsReporter(exp)                    # reads reports/ from the manifest
df = rep.evaluate_all("metrics_summary.csv")  # saves the CSV to outputs/<exp>/reports/

In [None]:
df

In [None]:
# Figures (saved to 'reports/' when out_name is given)
rep.plot_architectures_for_transform(df, transform="mel", metric="accuracy",
                                     y_min=0, y_max=100, out_name="fig_arch_mel_acc.png")


In [None]:
rep.plot_variants_for_model(df, model="alexnet", variant="pretrain", metric="accuracy",
                            y_min=0, y_max=100, out_name="fig_alexnet_pretrain_accuracy.png")


In [None]:
rep.plot_heatmap_models_transforms(df, metric="accuracy", vmin=50, vmax=100,
                                   out_name="fig_all.png")


<a id="appendix-tips-troubleshooting"></a>
## Appendix: Tips & Troubleshooting
<a id='sec10'></a>

- If using ViT/ConvNeXt, prefer **224×224** inputs. Set `cfg.image_size = 224` (MEL/LOG) or use **DWT**.
- If audio files vary in length, pick an appropriate `clip_seconds`. Shorter files are zero-padded.
- If you change transform hyperparameters, rerun the transform step to regenerate features.
- Ensure CUDA is available if `cfg.device='gpu'`; otherwise training falls back to CPU.
- Check `outputs/<RUN>/reports/` for figures and CSVs.
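The pad/trim behavior mentioned above can be pictured with a small sketch. It is not the pipeline's own code: `fit_to_length` is a hypothetical helper, it trims from the end of the clip (an assumption), and it uses plain lists to stay dependency-free. With NumPy the same idea is a slice followed by `np.pad`.

```python
def fit_to_length(samples, sample_rate, clip_seconds):
    """Trim or zero-pad a sequence of samples to exactly clip_seconds."""
    target = int(round(sample_rate * clip_seconds))
    clipped = list(samples[:target])
    # Pad with zeros only when the clip is shorter than the target.
    return clipped + [0.0] * (target - len(clipped))


short = fit_to_length([0.1, 0.2], sample_rate=4, clip_seconds=1.0)
long = fit_to_length([0.1] * 10, sample_rate=4, clip_seconds=1.0)
print(short, len(long))  # [0.1, 0.2, 0.0, 0.0] 4
```

Every clip ends up with `sample_rate * clip_seconds` samples, which is what gives all spectrograms of one transform the same width.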