<a href="https://colab.research.google.com/github/SzymonNowakowski/Machine-Learning-2025/blob/master/Lab15-EDM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 15 - EDM, Working in ICM

### Author: Szymon Nowakowski

# Introduction

Today I will talk about executing code from the following paper:

Karras, T., Aittala, M., Aila, T., & Laine, S. (2022). Elucidating the Design Space of Diffusion-Based Generative Models. In Proceedings of the Neural Information Processing Systems (NeurIPS) Conference.
https://arxiv.org/abs/2206.00364

It has 2.7k citations as of October 2025.

### EDM Codebase

The code itself can be found here:

https://github.com/NVlabs/edm

Although the codebase is 3 years old, it and the pretrained networks it provides remain a widely used reference point for diffusion-based image generation.

# Presentation Plan

## Cluster Computation (ICM Example)

## Working with Containers

## Best (Not Only Programming) Practices

## EDM

### Pretrained Networks

### Generating Images

### Calculating FID

# Cluster Computation (ICM Example)



## SLURM

In ICM they use ***SLURM***. SLURM stands for **Simple Linux Utility for Resource Management** — it's a workload manager widely used on **HPC (High-Performance Computing)** clusters to schedule and manage jobs.

It handles job submission, queuing, scheduling, monitoring, and resource allocation.

In SLURM, each command starts with the letter `s`:

    {bash}
    # Submit a job
    sbatch myjob.sh

    # Check job queue
    squeue -u $USER

    # Check all jobs
    squeue

    # Cancel a job
    scancel 12345


I will show you an example of a SLURM batch job file (`myjob.sh` in the examples above) in a moment.

Discussing SLURM in detail is beyond the scope of this class:

- A great resource that I recommend (and often use myself) is the [ICM help page on SLURM](https://kdm.icm.edu.pl/Tutorials/HPC-intro/slurm_intro/).

- The resources available at ICM, i.e. server configurations, installed GPUs, and available memory, can be checked on the [ICM help page on computing resources](https://kdm.icm.edu.pl/Zasoby/komputery_w_icm.pl/).




## LSF


Other HPC workload managers exist; the most popular alternative to SLURM is ***LSF*** (**Load Sharing Facility**), originally developed by Platform Computing (currently delivered by IBM).

In LSF, each command starts with the letter `b`:

    {bash}
    # Submit a job
    bsub < myjob.lsf

    # Check your jobs
    bjobs

    # Check all jobs
    bjobs -u all

    # Show information about available queues
    bqueues

    # Kill a job
    bkill 12345


# Working with Containers

In many HPC environments (including ICM), it is hard to install your own dependencies. While it is possible to install additional Python packages, it is sometimes hard or impossible to install additional binaries.

In another cluster I work with, at WIM, there is no Internet access at all, so one cannot even install additional packages.

**The way out is to use containers.** But there is another obstacle: for security reasons, you cannot run Docker directly on a cluster.

Instead, you typically **build Docker images locally** (on your workstation or laptop) and then **transfer them** to the cluster, where they can be converted to the *Apptainer* (formerly Singularity) format.

Below are the steps to create, save, and convert a Docker container image.


## `Dockerfile`

The `Dockerfile` describes your environment — e.g., which base image to use, which Python packages to install, etc.

    {bash}
    # Example Dockerfile
    FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

    # Set working directory
    WORKDIR /workspace

    # Copy your project
    COPY . /workspace

    # Install extra dependencies
    RUN pip install -r requirements.txt

    # Default command
    CMD ["python", "train.py"]



## Building the Docker Image

    {bash}
    docker build -t myproject:latest .

This creates a local image named `myproject:latest`. Note the `.` (dot): it sets the build context to the current directory, which is also where Docker looks for the `Dockerfile` by default.

Now, when you run

    {bash}
    docker run myproject
what happens is that the command `python train.py` gets executed within the container.

There can only be one effective `CMD` instruction in a `Dockerfile` (if there are multiple, only the last one is used).

Often, `CMD` is used to launch the main script or application of the container. This command can be overridden, so you can run some other code (and even pass arguments) by executing

    {bash}
    docker run myproject python some_other_code.py --epochs 10 --lr 1e-3

## Saving the Image to a `.tar` File

    {bash}
    docker save myproject:latest -o myproject_latest.tar


What happens is that the file `myproject_latest.tar` gets written to the current directory.

## Copying the container image to ICM

Use `scp` (secure copy) to transfer the image file to your home directory at ICM.  
Replace `username` with your ICM login name.

    {bash}
    scp myproject_latest.tar username@hpc.icm.edu.pl:/lu/tetyda/home/username/

`/lu/tetyda/home/username/` is the global path of your RYSY home directory as seen from the `hpc.icm.edu.pl` server.


## Converting the Image to Apptainer Format on ICM

Here is an example of the SLURM configuration file that will rebuild the docker container `myproject_latest.tar` into apptainer's `myproject_latest.sif`:

    {bash}
    # after `ssh`-ing to RYSY
    more rebuild_container.slurm

    #!/bin/bash
    #SBATCH --job-name=docker2apptainer
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --gres=gpu:1
    #SBATCH --time=48:00:00
    #SBATCH --account=g99-4302
    #SBATCH --output=slurm-%j.out

    export APPTAINER_TMPDIR=/home/$USER/tmp

    apptainer build ./myproject_latest.sif docker-archive:///home/$USER/myproject_latest.tar

Two lines of this script deserve a comment:

- `export APPTAINER_TMPDIR=/home/$USER/tmp` - this line ensures that there is enough space for the temporary files created during the build. Obviously, you need to create the `~/tmp` directory first.
- Note the triple `///` in `docker-archive:///home/...`: it is the `docker-archive://` prefix followed by an absolute path, which itself starts with `/`. It cost me a few days to figure it out. These days, ChatGPT can supply such hints instantly.

Now one needs to execute on RYSY

    {bash}
    sbatch rebuild_container.slurm
    squeue

to see something like

    {bash}
    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    127248       gpu setup_en   ljanis PD       0:00      1 (AssocGrpCPUMinutesLimit)
    134837       gpu      OLA tomaszsu  R 12-08:46:28     1 rysy-n6
    134838       gpu docker2a   szymon  R       0:18      1 rysy-n1
    135515        ve     bash   herman  R    8:13:47      1 pbaran

And after a few minutes or hours, depending on the size of the Docker image, the file `myproject_latest.sif` is written to the current directory. **This is the Apptainer container file**.

    



## Building the EDM Dockerfile

The original EDM `Dockerfile` from `https://github.com/NVlabs/edm/blob/main/Dockerfile` is the following

    {bash}
    # Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    #
    # This work is licensed under a Creative Commons
    # Attribution-NonCommercial-ShareAlike 4.0 International License.
    # You should have received a copy of the license along with this
    # work. If not, see http://creativecommons.org/licenses/by-nc-sa/4.0/

    FROM nvcr.io/nvidia/pytorch:22.10-py3

    ENV PYTHONDONTWRITEBYTECODE 1
    ENV PYTHONUNBUFFERED 1

    RUN pip install imageio imageio-ffmpeg==0.4.4 pyspng==0.1.0

    WORKDIR /workspace

    RUN (printf '#!/bin/bash\nexec \"$@\"\n' >> /entry.sh) && chmod a+x /entry.sh
    ENTRYPOINT ["/entry.sh"]

The original container file gave me errors: I was not able to execute the code from within it, most likely because of dependencies that have broken over time (the code is 3 years old), and I had no access to the original NVIDIA image, which would presumably have worked fine. I needed to upgrade the `pillow` package, which I did in my fork **`https://github.com/SzymonNowakowski/edm/blob/main/Dockerfile`:**

    {bash}
    # Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    #
    # This work is licensed under a Creative Commons
    # Attribution-NonCommercial-ShareAlike 4.0 International License.
    # You should have received a copy of the license along with this
    # work. If not, see http://creativecommons.org/licenses/by-nc-sa/4.0/

    FROM nvcr.io/nvidia/pytorch:22.10-py3

    ENV PYTHONDONTWRITEBYTECODE 1
    ENV PYTHONUNBUFFERED 1

    RUN pip install imageio imageio-ffmpeg==0.4.4 pyspng==0.1.0
    RUN pip install --upgrade pillow

    WORKDIR /workspace

    RUN (printf '#!/bin/bash\nexec \"$@\"\n' >> /entry.sh) && chmod a+x /entry.sh
    ENTRYPOINT ["/entry.sh"]

From there the standard sequence would let me build the docker container and ship it to ICM:

    {bash}
    docker build -t edm:latest .
    docker save edm:latest -o edm_latest.tar
    scp edm_latest.tar username@hpc.icm.edu.pl:/lu/tetyda/home/username/

Now we need to rewrite the SLURM script to include different filenames (it assumes the `~/tmp` directory is available - if not - create it!) so it looks like this:

    {bash}
    # while on your terminal
    ssh username@hpc.icm.edu.pl

    # while on HPC computer
    ssh rysy

    # while on RYSY computer
    more rebuild_container.slurm
    #!/bin/bash
    #SBATCH --job-name=docker2apptainer
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --gres=gpu:1
    #SBATCH --time=48:00:00
    #SBATCH --account=g99-4302
    #SBATCH --output=slurm-%j.out

    export APPTAINER_TMPDIR=/home/$USER/tmp

    apptainer build ./edm_latest.sif docker-archive:///home/szymon/edm_latest.tar

Now one needs to execute

    {bash}
    sbatch rebuild_container.slurm
    squeue

to see something like

    {bash}
    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    127248       gpu setup_en   ljanis PD       0:00      1 (AssocGrpCPUMinutesLimit)
    134837       gpu      OLA tomaszsu  R 12-08:46:28     1 rysy-n6
    134839       gpu docker2a   szymon  R       0:18      1 rysy-n1
    135515        ve     bash   herman  R    8:13:47      1 pbaran

And after a few hours the file `edm_latest.sif` gets written to the current directory.

**This is the apptainer container file we shall need**.

# Best (Not Only Programming) Practices



## Configure Your `ssh` and `scp` Connection

As you need a stable `ssh` and `scp` connection with the cluster, you will need to configure it with a file `~/.ssh/config`. An example of such a config file I use:

    {bash}
    cat ~/.ssh/config

    Host icm
        ForwardX11 yes
        ForwardAgent yes
        UserKnownHostsFile ~/.ssh/known_hosts
        Hostname hpc.icm.edu.pl
        LocalForward 8022 rysy:22
        ServerAliveInterval 120
        ServerAliveCountMax 2
        User szymon

    Host fizyk1.fuw.edu.pl
        Hostname fizyk1.fuw.edu.pl
        ServerAliveInterval 120
        ServerAliveCountMax 2
        User snowakowski

    Host dg1
        ForwardX11 yes
        ForwardAgent yes
        UserKnownHostsFile ~/.ssh/known_hosts
        Hostname 10.21.2.118
        LocalForward 5900 localhost:5900
        LocalForward 8000 localhost:8000
        LocalForward 8001 localhost:8001
        LocalForward 8008 localhost:8008
        LocalForward 8888 localhost:8888
        LocalForward 6006 localhost:6006
        LocalForward 1433 localhost:1433
        RequestTTY yes
        ServerAliveInterval 120
        ServerAliveCountMax 2
        ProxyJump  fizyk1.fuw.edu.pl:22
        User szym

- `ServerAliveInterval 120` - instructs your SSH client to send a small, encrypted keep-alive message to the server every 120 seconds (2 minutes).
So every 2 minutes, your client pings the server silently to say “I'm still here.”

- `ServerAliveCountMax 2` - defines how many unanswered keep-alive messages the client will tolerate before disconnecting.
With `ServerAliveCountMax 2`, if the server fails to respond to two consecutive keep-alives, the client assumes the connection is dead and terminates it.

The `dg1` via `fizyk1` connection is to show you how to configure the proxy jump.

Now, instead of

    {bash}
    # while on your terminal
    ssh -l snowakowski fizyk1.fuw.edu.pl

    # while on fizyk1
    ssh -l szym 10.21.2.118

you can connect in a single step, using a host alias (instead of the IP address) and without typing a username:

    {bash}
    ssh dg1

The same applies to `scp`.

**Next I will show you how to enable the tab-completion of remote paths in `scp`**.



## Configure Your `ssh` and `scp` Certificate

As you will frequently need to `ssh` and `scp` to and from the cluster, the second thing to consider is to set up a password-less connection. You not only skip typing your password every time, but you also unlock extra conveniences such as **tab-completion of remote paths** in `scp`.

To generate a public/private key pair execute:

    {bash}
    ssh-keygen

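It generates (with the default file names for an RSA key):
- your private key: `~/.ssh/id_rsa`
- your public key: `~/.ssh/id_rsa.pub`

Recent OpenSSH versions generate Ed25519 keys by default, in which case the files are named `~/.ssh/id_ed25519` and `~/.ssh/id_ed25519.pub`.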

To copy the public key to the server (for instance to ICM) do it with

    {bash}
    ssh-copy-id icm

(assuming you have your `~/.ssh/config` file already created).

After that

    {bash}
    ssh icm

gets you to the `hpc.icm.edu.pl` server with only the OTP (one-time password). That you cannot avoid, or at least I don't know how. On `hpc`, you can likewise set up a key-based (password-less) connection to `rysy`, so that typing the following is enough (no password needed):

    {bash}
    ssh rysy

## Smart Use of GitHub

Using GitHub is always a good idea — it provides independent backup, transparent version control, and collaboration features with very little extra effort.  

While demonstrating how to use `git` in detail is beyond the scope of this class, there are a few important practices worth mentioning.

- Since we want to extend the EDM repository from `https://github.com/NVlabs/edm`, it is best to **create your own fork** (as I did with `https://github.com/SzymonNowakowski/edm`).  
  This allows you to modify the code freely - for instance, by adding a new `Dockerfile` instruction - and push those changes safely to your own repository.

- With `git`, every time you run your code you are working on a **specific, uniquely identified version** that can always be restored.  
  The *commit hash* serves as that unique version identifier, and using just the first 6–10 characters is typically sufficient.

- You can automatically retrieve the current version hash using a small Python helper:

  ```{Python}
  import subprocess

  def get_git_revision_short_hash() -> str:
      return (
          subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD'])
          .decode('ascii')
          .strip()
      )
  ```
  This function reads the current Git commit hash, converts it from bytes to a string, and trims the trailing newline character.

- You can prefix this short hash to all outputs — file names, directories, or logs. This guarantees that results can always be reproduced: simply retrieve the same code version and rerun it on the same data.

- Sometimes I even use two version identifiers: one for the code that trained the neural network, and another for the code version used to test or evaluate it.

- **The real issue is forgetting** — you must remember to commit your changes before executing the code.

- **Working in a remote environment actually helps prevent that**. When developing locally (e.g., in PyCharm) and running jobs on ICM, the proper workflow is to synchronize code using `git push` and `git pull`, not `scp`. This way, **forgetting to commit becomes impossible** — if you don't push and pull your latest version, the remote host will simply run the old code!
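
The hash-prefixing practice described above could be sketched as follows. This is only an illustration and not part of the EDM codebase: the helper `tagged_output_dir` and the `<hash>_<run name>` directory layout are assumptions made for this example.

```python
import subprocess
from pathlib import Path


def get_git_revision_short_hash() -> str:
    """Return the short hash of the current commit (needs a git checkout)."""
    return (
        subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD'])
        .decode('ascii')
        .strip()
    )


def tagged_output_dir(base: str, run_name: str, short_hash: str) -> Path:
    """Build an output path of the (hypothetical) form <base>/<hash>_<run_name>."""
    return Path(base) / f"{short_hash}_{run_name}"


# Typical use inside a training or generation script (hypothetical names):
#   out_dir = tagged_output_dir("out", "generation", get_git_revision_short_hash())
#   out_dir.mkdir(parents=True, exist_ok=True)
```

Keeping the hash as an explicit argument makes the path-building helper easy to test without a repository at hand.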




# EDM

## Pretrained Networks

### Parametrization used

The usual way to train a neural network in the diffusion papers is by generating samples
$$
Y_t = \alpha_t X + \sigma_t \epsilon,
$$
with (for all possible $t$ we consider):
- $\alpha_t$ the signal schedule,
- $\sigma_t$ the noise schedule,
- $\epsilon \sim N(0,1)$.

In EDM they use $\alpha \equiv 1$. They use the denoiser $D$ estimating the true image $X$ from the noised image $X+\sigma \epsilon$ with $\epsilon \sim N(0,1)$.

$$
\hat X = D (X+\sigma \epsilon, \sigma)
$$

However, $D$ is not a neural network. Rather, it is a function which uses a neural network $F$ and four fixed functions of $\sigma$: $c_{skip}, c_{out}, c_{in}\text{ and }c_{noise}$. With

$$
Y=X+\sigma \epsilon
$$
we have
$$
\hat X = D (Y, \sigma) := c_{skip}(\sigma)Y + c_{out}(\sigma) F \left[ c_{in}(\sigma) Y, c_{noise}(\sigma) \right]
$$

The four fixed functions of $\sigma$, namely $c_{skip}, c_{out}, c_{in}\text{ and }c_{noise}$, are chosen in such a way that
- the input of the $F$ network has unit variance; this fixes $c_{in}(\sigma)$, which is applied before the input is passed to $F$:

  $$
  c_{in}(\sigma) = \frac{1}{\sqrt{\sigma^2 + \sigma_{data}^2}}
  $$
- the training target of the $F$ network has unit variance; this expresses $c_{out}(\sigma)$, applied after the network output, in terms of $c_{skip}(\sigma)$:
   $$
    c_{out}(\sigma)^2 = \left(1- c_{skip}(\sigma) \right)^2 \sigma_{data}^2 + c_{skip}(\sigma)^2 \sigma^2
   $$

- then $c_{skip}$ is selected to minimize $c_{out}(\sigma)$, so that the errors of $F$ are amplified as little as possible:

  $$
    c_{skip}(\sigma) = \frac{\sigma_{data}^2}{\sigma^2 + \sigma_{data}^2},
  $$
  and
  $$
  c_{out}(\sigma) = \frac{\sigma \cdot \sigma_{data}}{\sqrt{\sigma^2 + \sigma_{data}^2}}
  $$

- $c_{noise}$ is selected empirically to be $c_{noise} = \frac{1}{4}\ln{\sigma}$.

See Appendix B.6 of the supplementary material to the paper for the derivations.

When using the EDM code, you don't have direct access to $F$. You use the denoiser ($D$) directly.

The denoiser ($D$) gets implemented in `EDMPrecond` class. You can see $F$ and the four functions we discussed applied to the input and output of $F$ [in the `forward` method of `EDMPrecond` class](https://github.com/NVlabs/edm/blob/main/training/networks.py#L654).

From the code you'll also note that $\sigma_{data}=0.5$. For most natural image datasets normalized to [-1, 1], the standard deviation of the pixel values is close to 0.5, which is why it is used as the default. However, for other data types (audio, different normalization schemes, synthetic data), you may need to adjust this value based on your actual data statistics.
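
To make the preconditioning concrete, here is a minimal scalar sketch of the four functions and of the wrapper $D$. It is not the `EDMPrecond` implementation: it works on plain floats, and `toy_F` is a hypothetical stand-in for the trained network that always outputs zero.

```python
import math

SIGMA_DATA = 0.5  # the default discussed above


def c_skip(sigma, sigma_data=SIGMA_DATA):
    return sigma_data**2 / (sigma**2 + sigma_data**2)


def c_out(sigma, sigma_data=SIGMA_DATA):
    return sigma * sigma_data / math.sqrt(sigma**2 + sigma_data**2)


def c_in(sigma, sigma_data=SIGMA_DATA):
    return 1.0 / math.sqrt(sigma**2 + sigma_data**2)


def c_noise(sigma):
    return math.log(sigma) / 4.0


def denoise(F, y, sigma):
    """D(y, sigma) = c_skip * y + c_out * F(c_in * y, c_noise)."""
    return c_skip(sigma) * y + c_out(sigma) * F(c_in(sigma) * y, c_noise(sigma))


def toy_F(scaled_input, noise_label):
    # Hypothetical placeholder for the neural network: always predicts zero.
    return 0.0


for sigma in (0.002, 0.5, 80.0):
    # Check the identity that ties c_out to c_skip.
    lhs = c_out(sigma) ** 2
    rhs = (1 - c_skip(sigma)) ** 2 * SIGMA_DATA**2 + c_skip(sigma) ** 2 * sigma**2
    assert math.isclose(lhs, rhs, rel_tol=1e-9)
    print(f"sigma={sigma:g}: c_skip={c_skip(sigma):.4f}, c_out={c_out(sigma):.4f}, "
          f"c_in={c_in(sigma):.4f}, c_noise={c_noise(sigma):.4f}")
```

With `toy_F` returning zero, `denoise` reduces to $c_{skip}(\sigma) Y$: for small $\sigma$ the input passes through almost unchanged, and for large $\sigma$ it is almost entirely suppressed, so the network has to supply nearly all of the prediction.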

### Loss Function


For completeness, EDM minimizes the loss computed in [the `EDMLoss` class](https://github.com/NVlabs/edm/blob/main/training/loss.py#L66) and expressed as

$$
 \mathbb{E}_{\sigma, X\sim data, \epsilon \sim N(0,1)} \left[ \lambda(\sigma) \cdot \| D(X+\sigma \epsilon, \sigma) - X\| _2^2\right],
$$
with the weight
$$
\lambda(\sigma) = \frac{\sigma ^ 2 + \sigma_{data}^ 2}{\sigma ^2 \cdot \sigma_{data}^2}
$$
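
One way to read this weight is that $\lambda(\sigma) = 1 / c_{out}(\sigma)^2$, so it undoes the output scaling and keeps the effective loss on the network output balanced across noise levels. The scalar sketch below checks this numerically. It assumes $\sigma_{data} = 0.5$ and represents images as flat Python lists; `weighted_sq_error` is a hypothetical helper, not a function from `loss.py`.

```python
import math

SIGMA_DATA = 0.5  # assumed default


def loss_weight(sigma, sigma_data=SIGMA_DATA):
    """lambda(sigma) = (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2."""
    return (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2


def c_out(sigma, sigma_data=SIGMA_DATA):
    return sigma * sigma_data / math.sqrt(sigma**2 + sigma_data**2)


def weighted_sq_error(denoised, clean, sigma):
    """lambda(sigma) * ||denoised - clean||_2^2 for flat lists of pixel values."""
    sq_norm = sum((d - x) ** 2 for d, x in zip(denoised, clean))
    return loss_weight(sigma) * sq_norm


for sigma in (0.01, 0.5, 10.0):
    # The product lambda(sigma) * c_out(sigma)^2 should be 1 at every noise level.
    print(sigma, loss_weight(sigma) * c_out(sigma) ** 2)
```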

You'll note that $\sigma$ plays the role of time coordinate in their formulation (recall, that $\alpha \equiv 1$).

The other parameter that must be chosen is the distribution of $\sigma$. The EDM authors choose the lognormal distribution with

$$
  \ln{\sigma} \sim N(P_{mean},\, P_{std}^2), \quad P_{mean} = -1.2,\ P_{std} = 1.2
$$

With $\ln(\sigma) \sim \mathcal{N}(-1.2,\, 1.2^2)$:

**Median:**  
$$
\mathrm{med}(\sigma) = e^{-1.2} \approx 0.301
$$

**Mean:**  
$$
\mathbb{E}[\sigma] = e^{-1.2 + 1.2^2 / 2} = e^{-0.48} \approx 0.619
$$

The distribution covers roughly  
$$
\sigma \in [e^{-1.2 - 3(1.2)},\, e^{-1.2 + 3(1.2)}] = [e^{-4.8},\, e^{2.4}] \approx [0.008,\, 11.02]
$$
(i.e., within ±3 standard deviations in log-space).


Also, recall that $c_{\text{noise}} = \ln\sigma / 4$, so  
$$
c_{\text{noise}} \sim \mathcal{N}(-0.3,\, 0.3^2).
$$
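
The numbers above can be reproduced with a few lines of Python. This is just a sanity check of the arithmetic, followed by a small Monte Carlo experiment; the sample size and the seed are arbitrary choices made for this sketch.

```python
import math
import random

P_MEAN, P_STD = -1.2, 1.2  # mean and standard deviation of ln(sigma)

median = math.exp(P_MEAN)
mean = math.exp(P_MEAN + P_STD**2 / 2)
low = math.exp(P_MEAN - 3 * P_STD)
high = math.exp(P_MEAN + 3 * P_STD)
print(f"median={median:.3f}, mean={mean:.3f}, range=[{low:.3f}, {high:.2f}]")

# Monte Carlo check: draw ln(sigma) from the normal distribution and exponentiate.
rng = random.Random(0)
samples = sorted(math.exp(rng.gauss(P_MEAN, P_STD)) for _ in range(100_000))
empirical_median = samples[len(samples) // 2]
fraction_inside = sum(low <= s <= high for s in samples) / len(samples)
print(f"empirical median={empirical_median:.3f}, inside +-3 std: {fraction_inside:.4f}")
```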

### Pretrained Networks Provided

The list of pretrained models can be accessed [here](https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/) and it shows:

    {bash}
    LICENSE.txt	21359	10/30/2022, 8:07:15 AM
    _all_files.zip	2859235624	10/30/2022, 11:27:48 AM
    edm-afhqv2-64x64-uncond-ve.pkl	251340663	10/30/2022, 11:14:22 AM
    edm-afhqv2-64x64-uncond-vp.pkl	247513130	10/30/2022, 11:15:36 AM
    edm-cifar10-32x32-cond-ve.pkl	225848751	10/30/2022, 11:15:51 AM
    edm-cifar10-32x32-cond-vp.pkl	223183453	10/30/2022, 11:15:52 AM
    edm-cifar10-32x32-uncond-ve.pkl	225833012	10/30/2022, 11:15:52 AM
    edm-cifar10-32x32-uncond-vp.pkl	223173327	10/30/2022, 11:17:03 AM
    edm-ffhq-64x64-uncond-ve.pkl	251340661	10/30/2022, 11:17:11 AM
    edm-ffhq-64x64-uncond-vp.pkl	247513128	10/30/2022, 11:17:12 AM
    edm-imagenet-64x64-cond-adm.pkl	1183892888	10/30/2022, 11:17:12 AM


## Generating Images

The images are generated in the function [`edm_sampler(...)` in `generate.py`](https://github.com/NVlabs/edm/blob/main/generate.py#L25).

Below is the annotated version of the code, with comments explaining the purpose of each line to make it easier to follow.

```{Python}
import numpy as np
import torch


def edm_sampler(
    net, latents, class_labels=None, randn_like=torch.randn_like,
    num_steps=18, sigma_min=0.002, sigma_max=80, rho=7,
    S_churn=0, S_min=0, S_max=float('inf'), S_noise=1,
):
    # Adjust noise levels based on what's supported by the network.
    # (Ensure sigmas stay within the range the network was trained for.)
    sigma_min = max(sigma_min, net.sigma_min)
    sigma_max = min(sigma_max, net.sigma_max)

    # Time step discretization.
    step_indices = torch.arange(num_steps, dtype=torch.float64, device=latents.device)
    # Create indices [0, ..., num_steps-1] on the same device; we use them to build the noise schedule.

    t_steps = (sigma_max ** (1 / rho) + step_indices / (num_steps - 1) * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
    # Karras (EDM) sigma schedule: linearly interpolate between sigma_max^(1/ρ) and sigma_min^(1/ρ), then raise to ρ.
    # Result: a monotone decreasing sequence from sigma_max to sigma_min, denser at small sigmas when ρ>1 (default: ρ=7).

    t_steps = torch.cat([net.round_sigma(t_steps), torch.zeros_like(t_steps[:1])]) # t_N = 0
    # Round each σ to the network’s supported grid (round_sigma) so preconditioning matches training.
    # Append an extra 0 so the list length is num_steps+1 and the final step lands at zero noise.

    # Main sampling loop.
    x_next = latents.to(torch.float64) * t_steps[0]
    # Initialize at the highest noise level: latents ~ N(0, I) -> x ~ N(0, sigma_max^2 I).

    for i, (t_cur, t_next) in enumerate(zip(t_steps[:-1], t_steps[1:])): # 0, ..., N-1
        x_cur = x_next

        # Increase noise temporarily.
        gamma = min(S_churn / num_steps, np.sqrt(2) - 1) if S_min <= t_cur <= S_max else 0
        # Decide how much extra noise (“churn”) to add this step; capped by √2-1, active only on [S_min, S_max].

        t_hat = net.round_sigma(t_cur + gamma * t_cur)
        # Compute the “churned” sigma: t_hat = (1 + gamma) * t_cur, then round to the supported σ grid.

        x_hat = x_cur + (t_hat ** 2 - t_cur ** 2).sqrt() * S_noise * randn_like(x_cur)
        # Add just enough Gaussian noise so the total variance grows from t_cur^2 to t_hat^2 (scaled by S_noise).

        # Euler step.
        denoised = net(x_hat, t_hat, class_labels).to(torch.float64)
        # In EDM, net(·, σ) returns an estimate of the clean image \hat X_0 (pre/post-scaling internal).
        # Use float64 for a bit more numerical stability.

        # For the EDM probability-flow ODE, dx/dσ = (x - X0)/σ. Replacing X0 by denoised gives this slope.
        d_cur = (x_hat - denoised) / t_hat

        x_next = x_hat + (t_next - t_hat) * d_cur
        # Explicit Euler update from σ = t_hat down to the next scheduled σ = t_next.

        # Apply 2nd order correction.
        if i < num_steps - 1:
            denoised = net(x_next, t_next, class_labels).to(torch.float64)
            d_prime = (x_next - denoised) / t_next
            # Heun correction (prediction–correction): re-evaluate slope at the end of the interval.

            x_next = x_hat + (t_next - t_hat) * (0.5 * d_cur + 0.5 * d_prime)
            # Trapezoidal rule: average of start/end slopes times step size, applied from x_hat.

    return x_next
```

The line
```{Python}
denoised = net(x_hat, t_hat, class_labels).to(torch.float64)
```
calls the denoiser ($D$) function which, under the hood, calls the $F$ network with the scaled input and scales the output, just like we have discussed a moment ago.

In case you want to change the generator this is the function you want to work with.


Now, let's check the number of function evaluations it makes:

- in iterations 1..17 it calls the network twice: once in the Euler step and once in the Heun correction,
- in the last (18th) iteration, it calls the network in the Euler step only.

In total it makes $2 \cdot 18 - 1 = 35$ network calls, and this is the value you will find throughout the Karras paper (e.g. in Table 2).
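
The noise schedule and the call count can be reproduced without a GPU. The sketch below re-implements only the time-step discretization in plain Python; it skips `net.round_sigma` and the clamping to the network's supported range, and the function names are chosen for this example.

```python
def karras_sigmas(num_steps=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels t_0 > t_1 > ... > t_{N-1}, followed by t_N = 0."""
    max_inv_rho = sigma_max ** (1 / rho)
    min_inv_rho = sigma_min ** (1 / rho)
    sigmas = [
        (max_inv_rho + i / (num_steps - 1) * (min_inv_rho - max_inv_rho)) ** rho
        for i in range(num_steps)
    ]
    return sigmas + [0.0]


def count_network_calls(num_steps):
    """Count denoiser evaluations made by the Heun sampler loop."""
    calls = 0
    for i in range(num_steps):
        calls += 1                # Euler step
        if i < num_steps - 1:
            calls += 1            # Heun correction (skipped in the last iteration)
    return calls


sigmas = karras_sigmas()
print([round(s, 3) for s in sigmas])
print("network calls:", count_network_calls(18))
```

Printing the schedule shows how the steps concentrate at low noise levels for $\rho = 7$.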











### Deterministic ODE Update in Euler Step

In the deterministic ODE setting, i.e. when there is no churn, `x_hat == x_cur` and `t_hat == t_cur`, and EDM makes the following update:
```{Python}
d_cur = (x_cur - denoised) / t_hat
x_next = x_cur + (t_next - t_hat) * d_cur
```
This can be simplified further to:
```{Python}
x_next = x_cur + (t_next - t_hat) * (x_cur - denoised) / t_hat
```

In mathematical notation, with $\hat X_{cur}$ denoting the denoiser output, this reads
$$
X_{next} = X_{cur} + (\sigma_{next} - \sigma_{cur}) \cdot \frac{X_{cur} - \hat X_{cur}}{\sigma_{cur}},
$$
which rearranges to a weighted average of the current sample and the denoised estimate:
$$
X_{next} = r X_{cur} + (1-r)\hat X_{cur},\quad\quad\text{with }
r=\frac{\sigma_{next}}{\sigma_{cur}}.
$$

Consider the standard discrete ancestral process with the denoiser parametrisation $\hat X_{cur} = \hat X_{cur}(X_{cur}) := D(X_{cur}, \sigma_{cur})$:
$$
X_{next} = \sigma_{next} \left( \sqrt{\rho_{next}} - \sqrt{\rho_{cur}} \sqrt{1 - \eta_{next}^2} \right) \hat X_{cur} + \frac{\sigma_{next}}{\sigma_{cur}} \sqrt{1 - \eta_{next}^2} X_{cur} + \sigma_{next} \eta_{next} \tilde\epsilon_{next},
$$
with (for all possible $t$ we consider):
- $\alpha_t$ the signal schedule,
- $\sigma_t$ the noise schedule,
- $\eta_t  \geq 0$ the diffusion schedule,
- SNR ratio $\rho_t = \frac{\alpha_t^2}{\sigma_t^2}$
- $\tilde \epsilon_t \sim N(0, I)$.

It will be introduced formally and in more detail in the next class; for today, take it for granted.

If you set $\alpha_{cur}=\alpha_{next}=1$, $\eta \equiv 0$ (no diffusion, deterministic schedule), you will get exactly
the EDM Euler update (**check it!**)

$$
X_{next} = r X_{cur} + (1-r)\hat X_{cur},\quad\quad\text{with }
r=\frac{\sigma_{next}}{\sigma_{cur}}.
$$
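
A quick numerical check of this equivalence is sketched below. It works on scalars instead of images, and the function names and test values are arbitrary choices for this example; it is not a replacement for doing the algebra.

```python
import math


def ancestral_step(x_cur, denoised, alpha_cur, alpha_next,
                   sigma_cur, sigma_next, eta_next, noise=0.0):
    """One step of the discrete ancestral process in denoiser parametrisation."""
    rho_cur = alpha_cur**2 / sigma_cur**2
    rho_next = alpha_next**2 / sigma_next**2
    keep = math.sqrt(1 - eta_next**2)
    return (
        sigma_next * (math.sqrt(rho_next) - math.sqrt(rho_cur) * keep) * denoised
        + sigma_next / sigma_cur * keep * x_cur
        + sigma_next * eta_next * noise
    )


def edm_euler_step(x_cur, denoised, sigma_cur, sigma_next):
    """Deterministic EDM Euler update (no churn)."""
    d_cur = (x_cur - denoised) / sigma_cur
    return x_cur + (sigma_next - sigma_cur) * d_cur


x_cur, denoised = 3.7, 1.2  # arbitrary scalar "images"
for sigma_cur, sigma_next in [(80.0, 57.6), (1.0, 0.5), (0.01, 0.002)]:
    a = ancestral_step(x_cur, denoised, 1.0, 1.0, sigma_cur, sigma_next, eta_next=0.0)
    b = edm_euler_step(x_cur, denoised, sigma_cur, sigma_next)
    r = sigma_next / sigma_cur
    c = r * x_cur + (1 - r) * denoised
    assert math.isclose(a, b, rel_tol=1e-9) and math.isclose(b, c, rel_tol=1e-9)
    print(sigma_cur, sigma_next, a, b, c)
```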



### Executing Image Generation from the Container

To calculate FID reliably you need to generate 50000 images.

Execute the following SLURM script. Note there are **TWO** places where you specify the number of GPUs to be used:

    {bash}
    #!/bin/bash
    #SBATCH --job-name=edm_generation
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    # Place 1 of 2: the number of GPUs requested from SLURM
    #SBATCH --gres=gpu:1
    #SBATCH --time=48:00:00
    #SBATCH --account=g99-4302
    #SBATCH --output=generate-%j.out

    # Run generate.py inside the container using Apptainer and torchrun.
    # Place 2 of 2: --nproc_per_node should match the number of GPUs requested above.
    apptainer exec --nv --bind "$PWD" ./edm_latest.sif \
      torchrun --standalone --nproc_per_node=1 \
        generate.py \
        --outdir=out_50k \
        --seeds=0-49999 \
        --batch=128 \
        --steps=18 \
        --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl

Before the final run, it's a good idea to **calibrate the batch size** so that the **available GPU memory is fully utilized**, or at least used as efficiently as possible. To do that, I would perform a short interactive run (interactive jobs are beyond the scope of this class; read more about them on the [ICM help page on SLURM](https://kdm.icm.edu.pl/Tutorials/HPC-intro/slurm_intro/)):

    {bash}
    srun -A g99-4302 -J edm_generation \
     -N 1 -n 1 --gres=gpu:1 \
     --time=48:00:00 \
     --output=generate-%j.out \
     --pty /bin/bash -l  # cluster: rysy

    # after I get allocated to the particular computing node
    apptainer exec --nv --bind "$PWD" ./edm_latest.sif \
      torchrun --standalone --nproc_per_node=1 \
        generate.py \
        --outdir=out_50k \
        --seeds=0-49999 \
        --batch=128 \
        --steps=18 \
        --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl &
      
    # (note the ampersand & which sends the job to the background,
    #  so I can invoke nvidia-smi in the foreground)

    nvidia-smi   # it lets me monitor the GPU load

After the batch size has been established, you can put it into the SLURM script and execute the batch computation.

Some clusters have no internet access from the compute nodes (this is not the case at ICM). In that case you can download the network in advance and point the `--network` argument to the local file (e.g. `--network=downloaded_networks/edm-cifar10-32x32-cond-vp.pkl`).

## Calculating FID

The **Fréchet Inception Distance (FID)** is a widely used metric for evaluating the quality of images generated by generative models such as GANs and diffusion models.

FID was introduced in the paper *GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium* by Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter, published in NeurIPS 2017 (~20k citations as of Oct 2025).


Its main idea is to measure how close the **distribution of generated images** is to the **distribution of real images**, but since comparing high-dimensional image distributions directly is intractable, FID does this comparison in a **feature space** learned by a pretrained neural network — typically the Inception-v3 network.

Specifically, both real and generated images are passed through the network, and the resulting feature vectors (usually from a late pooling layer) are modeled as **multivariate Gaussians** with estimated means and covariances $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$. The FID is then computed as the **Fréchet distance** (or **2-Wasserstein distance**) between these two Gaussians:

$$
\mathrm{FID} = \|\mu_r - \mu_g\|_2^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right)
$$

Intuitively, it measures both the **difference in mean features** (capturing overall content) and **differences in covariance** (capturing diversity and structure).
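
To get a feeling for the formula, consider the special case of **diagonal covariances**, where the matrix square root reduces to an element-wise square root. The sketch below covers only this simplified case and is not how `fid.py` computes FID: the real computation uses full covariance matrices of the Inception features and a genuine matrix square root. The function name is chosen for this example.

```python
import math


def fid_diagonal(mu_r, var_r, mu_g, var_g):
    """Frechet distance between two Gaussians with diagonal covariances.

    mu_* are lists of means, var_* are lists of per-dimension variances.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    # Tr(S_r + S_g - 2 (S_r S_g)^{1/2}) for diagonal S_r, S_g
    cov_term = sum(vr + vg - 2 * math.sqrt(vr * vg) for vr, vg in zip(var_r, var_g))
    return mean_term + cov_term


print(fid_diagonal([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # identical statistics
print(fid_diagonal([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # shifted mean
print(fid_diagonal([0.0, 0.0], [4.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # different spread
```

Identical statistics give a distance of zero, while a shift in the mean or a mismatch in the spread each contribute a positive term.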




### Lower FID Is Better

Lower FID values indicate that generated images are more similar to real ones, both in quality and variety.  


### FID Calculation

To calculate FID, execute the following SLURM script `fid.slurm`. The calculation is relatively fast, so I request only one GPU and do not use `torchrun` to parallelize the computation, nor did I calibrate the batch size (the default seems to be 64):

    {bash}
    #!/bin/bash
    #SBATCH --job-name=fid
    #SBATCH --nodes=1                     
    #SBATCH --ntasks=1
    #SBATCH --gres=gpu:1              
    #SBATCH --time=48:00:00
    #SBATCH --account=g99-4302
    #SBATCH --output=fid-%j.out

    # Run fid.py inside the container using Apptainer
    apptainer exec --nv --bind "$PWD" ./edm_latest.sif \
        python fid.py calc --images=out_50k \
            --ref=https://nvlabs-fi-cdn.nvidia.com/edm/fid-refs/cifar10-32x32.npz

`https://nvlabs-fi-cdn.nvidia.com/edm/fid-refs/cifar10-32x32.npz` contains precomputed Inception-v3 feature statistics (the mean and covariance) for the original CIFAR-10 dataset.

Some clusters have no internet access during computations (this is not the case at ICM). In that case you can download both the statistics and the Inception-v3 network in advance. For the statistics, you can point the `--ref` argument to the local file (e.g. `--ref=fid/cifar10-32x32.npz`); for the Inception-v3 network, however, you will need to make a surgical change in the code in [this line](https://github.com/NVlabs/edm/blob/main/fid.py#L34).

    {bash}
    szymon@rysy ~/edm $ sbatch fid.slurm
    Submitted batch job 135531
    szymon@rysy ~/edm $ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            127248       gpu setup_en   ljanis PD       0:00      1 (AssocGrpCPUMinutesLimit)
            135531       gpu      fid   szymon  R       0:02      1 rysy-n3
            134837       gpu      OLA tomaszsu  R 13-07:04:02      1 rysy-n6
            135529       gpu sys/dash    mdzik  R      58:05      1 rysy-n3
            135515        ve     bash   herman  R 1-06:31:21      1 pbaran

    # After 1:30 minutes

    szymon@rysy ~/edm $ more fid-135531.out
    INFO:    underlay of /etc/localtime required more than 50 (94) bind mounts
    INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (471) bind mounts
    13:4: not a valid test operator: (
    13:4: not a valid test operator: 530.30.02
    Loading dataset reference statistics from "https://nvlabs-fi-cdn.nvidia.com/edm/fid-refs/cifar10-32x32.npz"...
    Loading Inception-v3 model...
    Loading images from "out_50k"...
    /opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py:563: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 1, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      warnings.warn(_create_warning_msg(
    Calculating statistics for 50000 images...
     5%|▍         | 39/782 [00:09<00:58, 12.75batch/s]
    10%|█         | 79/782 [00:12<01:10,  9.95batch/s]
    15%|█▍        | 117/782 [00:15<00:51, 12.88batch/s]
    20%|█▉        | 155/782 [00:18<00:48, 13.01batch/s]
    25%|██▍       | 193/782 [00:21<00:45, 12.89batch/s]
    29%|██▉       | 229/782 [00:24<00:42, 12.64batch/s]
    34%|███▍      | 265/782 [00:27<00:40, 12.61batch/s]
    38%|███▊      | 301/782 [00:30<00:37, 12.88batch/s]
    43%|████▎     | 335/782 [00:32<00:34, 12.93batch/s]
    47%|████▋     | 369/782 [00:35<00:31, 12.92batch/s]
    52%|█████▏    | 403/782 [00:38<00:29, 12.99batch/s]
    56%|█████▌    | 437/782 [00:40<00:26, 13.14batch/s]
    60%|██████    | 471/782 [00:43<00:24, 12.95batch/s]
    65%|██████▍   | 503/782 [00:45<00:21, 13.03batch/s]
    69%|██████▊   | 535/782 [00:48<00:19, 12.88batch/s]
    73%|███████▎  | 567/782 [00:50<00:16, 12.95batch/s]
    77%|███████▋  | 599/782 [00:53<00:14, 12.94batch/s]
    81%|████████  | 631/782 [00:55<00:11, 12.94batch/s]
    85%|████████▍ | 661/782 [00:58<00:09, 12.94batch/s]
    89%|████████▊ | 691/782 [01:00<00:07, 12.93batch/s]
    92%|█████████▏| 723/782 [01:02<00:04, 12.83batch/s]
    96%|█████████▌| 751/782 [01:05<00:02, 12.88batch/s]
    100%|█████████▉| 781/782 [01:07<00:00, 13.35batch/s]
    100%|██████████| 782/782 [01:12<00:00, 10.73batch/s]
    Calculating FID...
    1.96999
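
The progress bar in this log also lets us confirm the batch size: 50000 images were processed in 782 batches. The small check below searches for batch sizes consistent with that count. The search range of 1 to 256 is an arbitrary assumption of this sketch, and it assumes a single process, as in the run above.

```python
import math

num_images = 50_000
num_batches = 782  # taken from the progress bar in the log above

# Batch sizes b for which ceil(num_images / b) equals the observed batch count.
candidates = [b for b in range(1, 257) if math.ceil(num_images / b) == num_batches]
print(candidates)
```

The only candidate in that range is 64, which agrees with the default batch size mentioned earlier.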

