# Quality Metrics

Let's now discuss measures of quality for the generated images, specifically three of them:

- **BPD (Bits Per Dimension):** Measures how well a model compresses data; lower BPD means better likelihood of the data under the model. Common in likelihood-based models.
- **FID (Fréchet Inception Distance):** Compares real and generated image distributions using features from an Inception network; lower FID means more realistic and diverse images.
- **IS (Inception Score):** Evaluates image quality and diversity using the Inception model’s output; higher IS means images are both sharp (confident labels) and diverse (many classes).
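To make the definitions above concrete, here is a minimal sketch of BPD and IS in plain Python. It is not the code used by `main.py`; the function names `bits_per_dim` and `inception_score` are hypothetical, and the class probabilities are assumed to be already computed by a classifier (Inception V3 in the real metric).

```python
import math

def bits_per_dim(nll_nats, num_dims):
    """Convert a negative log-likelihood in nats into bits per dimension.

    num_dims is the number of data dimensions, e.g. C * H * W for an image.
    """
    return nll_nats / (num_dims * math.log(2))

def inception_score(class_probs):
    """IS = exp(mean over samples of KL(p(y|x) || p(y))).

    class_probs is a list of per-image probability vectors p(y|x).
    """
    n = len(class_probs)
    k = len(class_probs[0])
    # Marginal class distribution p(y), averaged over all images
    marginal = [sum(row[j] for row in class_probs) / n for j in range(k)]
    kl_sum = 0.0
    for row in class_probs:
        # Terms with p == 0 contribute nothing to the KL divergence
        kl_sum += sum(
            p * (math.log(p) - math.log(marginal[j]))
            for j, p in enumerate(row)
            if p > 0
        )
    return math.exp(kl_sum / n)

# Confident predictions spread evenly over 3 classes give the maximum score for 3 classes
print(inception_score([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))
# Identical, uninformative predictions give the minimum score
print(inception_score([[0.25, 0.25, 0.25, 0.25]] * 5))
```

The score ranges from 1 (every image gets the same prediction) up to the number of classes (each image is classified with full confidence and the classes are covered evenly).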

Please note that calculating FID and IS requires loading Inception V3, a convolutional neural network from the Inception family of models developed by Google, and performing many forward passes. This can be computationally intensive and require significant GPU memory, especially with larger `num_eval_samples`.

Also, the first time you run evaluation, torchvision might need to download the pre-trained weights for the InceptionV3 model.
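Once Inception features have been extracted for the real and generated images, FID itself is a closed-form distance between two Gaussians fitted to those features. The sketch below assumes the features are already available as NumPy arrays; `frechet_distance` is a hypothetical helper, not the function used by the evaluation script, and the random arrays only stand in for real Inception activations.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each input has shape (num_samples, feature_dim).
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr(sqrt(cov_r @ cov_g)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory;
    # clipping guards against small numerical errors.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))   # stand-in for Inception features
shifted = real + 3.0               # same covariance, mean shifted by 3 per dimension

print(frechet_distance(real, real))
print(frechet_distance(real, shifted))
```

The first value is essentially zero, and the second is approximately 4 × 3² = 36, because only the mean term differs between the two sets.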

We'll start with the conditional model and, for instance, class 0.

<font color='red'>The cell below downloads pretrained models</font>

In [None]:
import sys
import os

# Get the parent directory
parent_dir = os.path.abspath(os.path.join(os.getcwd(), '..'))

# Add the parent directory to sys.path
sys.path.append(parent_dir)

from fetch_models import main
main()

In [1]:
from quality_utils import get_controls
from IPython.display import display

display(get_controls())



Using device: cpu


VBox(children=(HTML(value='<b>Evaluation Configuration:</b>'), Dropdown(description='Dataset:', options=('mnis…

It can also be run as a standalone command. This time we'll use a conditional model.

In [1]:
!cd .. && python main.py \
 --mode evaluate \
 --dataset mnist \
 --image_channels 3 \
 --process ve \
 --model_path mnist_model/ve_50_conditional.pth \
 --use_class_condition \
 --target_class 0 \
 --num_eval_samples 100 \
 --sampler euler \
 --gen_steps 1000 \
 --pc_snr 0.1

Using device: cuda
Selected Dataset: MNIST
Selected mode: evaluate
Selected diffusion process: ve
Initializing VE diffusion process with sigma=25.0, T=1.0
--- Running Model Evaluation (MNIST) ---
Starting evaluation for model: mnist_model/ve_50_conditional.pth on MNIST
Target Class:  for class 0
Process: ve, Sampler: euler, Steps: 1000, Eval Samples: 100
Loading score model...
Evaluation will generate samples conditioned on class 0.
[DEBUG run_evaluation] Passing to generic_load_model: num_classes=10, image_channels=3
[DEBUG run_evaluation ENTRY] Received num_classes_eval=10, args.use_class_condition=True, load_as_conditional=True
[DEBUG run_evaluation] Setting load_image_channels=1 for MNIST model.
Loading model from: mnist_model/ve_50_conditional.pth
Getting ScoreModel configured for VE process.
Model loaded successfully.
Loading real test dataset (MNIST)...
Loading MNIST dataset...
Converting MNIST to 3 channels.
Using all 60000 MNIST training samples.
Loading standard MNIST test se

We can see that the values here are worse, both because VE gives worse results in this case and because filtering the real dataset by class leaves fewer images to compare against.
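The effect of sample size can be illustrated with a one-dimensional analogue of FID. This is only a sketch under simplifying assumptions (1-D Gaussian data instead of Inception features, hypothetical helper names): even when both sets come from the same distribution, the estimated distance is not zero, and it shrinks as more samples are used.

```python
import random
import statistics

def frechet_1d(a, b):
    """Fréchet distance between 1-D Gaussians fitted to samples a and b."""
    mean_gap = statistics.fmean(a) - statistics.fmean(b)
    std_gap = statistics.stdev(a) - statistics.stdev(b)
    return mean_gap ** 2 + std_gap ** 2

def mean_distance_same_source(n, trials=200, seed=0):
    """Average estimated distance between two size-n samples of N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += frechet_1d(a, b)
    return total / trials

# The true distance is 0; the estimate approaches it as n grows
for n in (10, 100, 1000):
    print(n, mean_distance_same_source(n))
```

With few reference images, part of the reported score therefore reflects estimation error rather than model quality, so scores computed with different sample counts are not directly comparable.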

In [2]:
display(get_controls())

VBox(children=(HTML(value='<b>Evaluation Configuration:</b>'), Dropdown(description='Dataset:', options=('mnis…

In [None]:
!cd .. && python main.py \
 --mode evaluate \
 --dataset mnist \
 --image_channels 3 \
 --process vp --schedule cosine \
 --model_path mnist_model/vp_linear_40_conditional.pth \
 --use_class_condition \
 --target_class 0 \
 --num_eval_samples 100 \
 --sampler ei \
 --gen_steps 1000 \
 --pc_snr 0.1

Using device: cpu
Selected Dataset: MNIST
Selected mode: evaluate
Selected diffusion process: vp
Using VP Schedule: cosine
Applying MNIST-specific cosine beta clamp [1.0e-04, 20.0]
Initializing VP diffusion process with cosine schedule, T=1.0.
  Cosine schedule using beta clamp: [1.0e-04, 20.0]
--- Running Model Evaluation (MNIST) ---
Starting evaluation for model: mnist_model/vp_linear_40_conditional.pth on MNIST
Target Class:  for class 0
Process: vp, Sampler: ei, Steps: 1000, Eval Samples: 100
Loading score model...
Evaluation will generate samples conditioned on class 0.
[DEBUG run_evaluation] Passing to generic_load_model: num_classes=10, image_channels=3
[DEBUG run_evaluation ENTRY] Received num_classes_eval=10, args.use_class_condition=True, load_as_conditional=True
[DEBUG run_evaluation] Setting load_image_channels=1 for MNIST model.
Loading model from: mnist_model/vp_linear_40_conditional.pth
Getting ScoreModel configured for VP process.
Model loaded successfully.
Loading real

Next, we compare the same model after different numbers of training epochs.

In [3]:
for i in [10, 20, 30]:
    !cd .. && python main.py \
         --mode evaluate \
         --dataset cifar10 \
         --image_channels 3 \
         --process vp --schedule cosine \
         --model_path cifar_model/vp_cosine_100_filtered0_epoch{i}.pth \
         --target_class 0 \
         --num_eval_samples 100 --eval_batch_size 100 \
         --sampler ei \
         --gen_steps 1000 \
         --pc_snr 0.1

Using device: cpu
Selected Dataset: CIFAR10
Selected mode: evaluate
Selected diffusion process: vp
Using VP Schedule: cosine
Applying default cosine beta clamp [1.0e-07, 1.0]
Initializing VP diffusion process with cosine schedule, T=1.0.
  Cosine schedule using beta clamp: [1.0e-07, 1.0]
--- Running Model Evaluation (CIFAR-10) ---
Starting evaluation for model: cifar_model/vp_cosine_100_filtered0_epoch10.pth on CIFAR-10
Target Class:  for class 0
Process: vp, Sampler: ei, Steps: 1000, Eval Samples: 100
Loading score model...
[DEBUG run_evaluation] Passing to generic_load_model: num_classes=None, image_channels=3
[DEBUG run_evaluation ENTRY] Received num_classes_eval=10, args.use_class_condition=False, load_as_conditional=False
[DEBUG run_evaluation] Setting load_image_channels=3 for CIFAR10 model.
Loading model from: cifar_model/vp_cosine_100_filtered0_epoch10.pth
Getting ScoreModel configured for VP process.
Model loaded successfully.
Loading real test dataset (CIFAR-10)...
Loading CI