# An Example

Start with an example:

The cell below loads the checkpoint of the **pre-trained CNF model** for Case 4 and inspects the layers stored in its state dictionary:

In [4]:
import numpy as np
import torch

checkpoint_path = "/Users/minghan/Library/CloudStorage/GoogleDrive-john.chuicandoit@gmail.com/My Drive/CoNFiLD/ConditionalDiffusionGeneration/inference_scripts/Case4/random_sensor/input/cnf_model/checkpoint_20000.pt"
checkpoint = torch.load(checkpoint_path, map_location="cpu")

for key in checkpoint.keys():
    print(key)  # Look for something related to 'channel_mult'

# Extract model state dict
state_dict = checkpoint["model_state_dict"]

# Print all layer names
print("Model layers:", state_dict.keys())

# Print every parameter name with its shape (an output layer is often named 'out' or 'conv_out')
for key, value in state_dict.items():
    print(f"Layer: {key}, Shape: {value.shape}")

# Take the last entry of the state dict (keys follow registration order, which usually matches layer order)
last_layer = list(state_dict.keys())[-1]
print(f"Final layer: {last_layer}, Shape: {state_dict[last_layer].shape}")

epoch
model_state_dict
optim_states_dict
optim_net_dec_dict
scheduler_states_dict
scheduler_net_dec_dict
hidden_states_model
loss
log_epoches
eval_losses
psnrs
Model layers: odict_keys(['net1.0.weight', 'net1.0.bias', 'net1.1.weight', 'net1.1.bias', 'net1.2.weight', 'net1.2.bias', 'net1.3.weight', 'net1.3.bias', 'net1.4.weight', 'net1.4.bias', 'net1.5.weight', 'net1.5.bias', 'net1.6.weight', 'net1.6.bias', 'net1.7.weight', 'net1.7.bias', 'net1.8.weight', 'net1.8.bias', 'net1.9.weight', 'net1.9.bias', 'net1.10.weight', 'net1.10.bias', 'net1.11.weight', 'net1.11.bias', 'net1.12.weight', 'net1.12.bias', 'net1.13.weight', 'net1.13.bias', 'net1.14.weight', 'net1.14.bias', 'net1.15.weight', 'net1.15.bias', 'net1.16.weight', 'net1.16.bias', 'net2.0.weight', 'net2.1.weight', 'net2.2.weight', 'net2.3.weight', 'net2.4.weight', 'net2.5.weight', 'net2.6.weight', 'net2.7.weight', 'net2.8.weight', 'net2.9.weight', 'net2.10.weight', 'net2.11.weight', 'net2.12.weight', 'net2.13.weight', 'net2.14.weigh


The `torch.Size([384, 384])` of the final layer in your CNF model is the shape of a square weight matrix that maps 384 features to 384 features. This suggests that the model's final operation does not act directly in **physical space** but in **latent space**.
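To see where a `[384, 384]` entry in a state dictionary can come from, here is a minimal sketch. The two-layer `toy` network is a hypothetical stand-in, not the actual CNF architecture; it only relies on the fact that `nn.Linear` stores its weight as `[out_features, in_features]`.

```python
import torch
from torch import nn

# Hypothetical stand-in for the tail of a network; NOT the actual CNF architecture.
toy = nn.Sequential(
    nn.Linear(3, 384),    # 3 input features -> 384 hidden features
    nn.Linear(384, 384),  # 384 -> 384: the weight is a square [384, 384] matrix
)

# Same inspection pattern as in the cell above.
for name, tensor in toy.state_dict().items():
    print(f"{name}: {tuple(tensor.shape)}")

# nn.Linear stores its weight as [out_features, in_features].
final_weight = toy.state_dict()["1.weight"]

x = torch.randn(10, 384)  # batch of 10 feature vectors
y = toy[1](x)             # the output keeps the 384-wide feature axis
print(tuple(final_weight.shape), tuple(y.shape))
```

Note that the last key of this toy state dictionary is `1.bias`, not `1.weight`, so the last key of a state dictionary is not always a weight matrix.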

## Physical Space vs Latent Space

A latent tensor has a shape such as `[384, 384]` rather than a grid-like shape such as `[Nx, Ny, Nz, Channels]` (typical for physical space) because the structure of the data changes in the **transition from physical space to latent space**.

| **Space** | **Typical shape** | **What it represents** |
|-----------|-------------------|------------------------|
| **Physical space (CFD grid)** | `[Nx, Ny, Nz, Channels]` in 3D, `[Nx, Ny, Channels]` in 2D (e.g., `[384, 384, 3]`) | Velocity, pressure, etc. defined on a physical grid |
| **Latent space** | `[Feature_Dim1, Feature_Dim2]` (e.g., `[384, 384]`) | Encoded feature representation, not directly spatial |

$\underline{\textbf{Physical Space}}$

In physical space, data is typically structured by its spatial dimensions:

3D Flow Example -> `[Nx, Ny, Nz, Channels]`
- `Nx, Ny, Nz` = grid resolution in the `x, y, z` directions.
- Channels = flow quantities (e.g., velocity components `u, v, w`, pressure `p`).

2D Flow Example -> `[Nx, Ny, Channels]`
- Example: `[384, 384, 3]` could mean a 2D grid that stores 3 velocity components (`u, v, w`) at each point.
- Example: `[384, 384, 2]` means a `2D` periodic hill case, where only `u, v` are considered.

**Physical space is explicitly tied to spatial locations.**
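As a small illustration of this layout, the sketch below builds a synthetic 2D field of shape `[Nx, Ny, Channels]`. The uniform unit-square grid and the sine/cosine velocity components are assumptions made purely for illustration; they are not the periodic hill data.

```python
import numpy as np

nx, ny = 384, 384  # grid resolution, matching the example shape above

# Hypothetical uniform coordinates on a unit square; a real CFD grid may be non-uniform.
x = np.linspace(0.0, 1.0, nx)
y = np.linspace(0.0, 1.0, ny)
X, Y = np.meshgrid(x, y, indexing="ij")

# Synthetic velocity components, for illustration only.
u = np.sin(2 * np.pi * X) * np.cos(2 * np.pi * Y)
v = -np.cos(2 * np.pi * X) * np.sin(2 * np.pi * Y)

field = np.stack([u, v], axis=-1)  # shape [Nx, Ny, Channels]
print(field.shape)                 # (384, 384, 2)

# Indexing is spatial: field[i, j] holds (u, v) at the point (x[i], y[j]).
i, j = 100, 200
print(x[i], y[j], field[i, j])
```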

$\underline{\textbf{Latent Space}}$

When transitioning to **latent space**, we move from **explicit spatial values** to **compressed feature representations**:

- The numbers `384 × 384` in latent space no longer refer to physical grid points.
- Instead, they represent **abstract features** that describe the **flow statistically**.
- The model learns to **encode complex flow interactions into these feature vectors**.

In this specific case the latent shape, `[384, 384]`, happens to match the physical grid resolution, but in general:
- The specific size depends on how the CNF encoder structures the data.
- Some latent spaces compress more aggressively (e.g., `[256, 256]`).
- Some keep similar dimensions but store different types of information (e.g., `[384, 384]` is not a spatial map but **a learned representation of flow dynamics**).

**Latent space is not tied to physical locations but instead captures compressed relationships between flow features.**
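One way to picture this is a toy coordinate-based decoder: the same latent vector can be queried on grids of different resolution, because the latent entries are not attached to grid points. Everything in this sketch is an assumption for illustration: the latent length of 384, the hidden width, the random (untrained) weights, and the helper names `decode` and `grid_coords`. It is not the CNF used in this project.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, hidden = 384, 64  # hypothetical sizes

# Random weights stand in for a trained network.
W1 = rng.normal(size=(2 + latent_dim, hidden)) / np.sqrt(2 + latent_dim)
W2 = rng.normal(size=(hidden, 2)) / np.sqrt(hidden)

def decode(latent, coords):
    """Map one latent vector plus query coordinates [N, 2] to (u, v) values [N, 2]."""
    tiled = np.broadcast_to(latent, (coords.shape[0], latent.shape[0]))
    inputs = np.concatenate([coords, tiled], axis=1)
    return np.tanh(inputs @ W1) @ W2

def grid_coords(n):
    """Return the [n*n, 2] coordinates of a uniform n x n grid on the unit square."""
    axis = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([X.ravel(), Y.ravel()], axis=1)

latent = rng.normal(size=latent_dim)  # one snapshot's latent code
coarse = decode(latent, grid_coords(16)).reshape(16, 16, 2)
fine = decode(latent, grid_coords(64)).reshape(64, 64, 2)
print(latent.shape, coarse.shape, fine.shape)
```

The latent vector keeps its length while the decoded field changes resolution, which is the sense in which the latent numbers are features rather than grid values.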

$\underline{\textbf{In short}}$

- Physical space follows grid-based discretization -> `[Nx, Ny, Nz, Channels]`
- Latent space follows abstract feature encoding -> `[Feature_Dim1, Feature_Dim2]`
- The **diffusion model learns in latent space**, then decodes back to **physical space**.

## Encoder (CNF)

The **Conditional Neural Field (CNF)** acts as a **feature extractor** that **condenses the data** before it is fed into the **diffusion model**. This helps the diffusion model focus on learning the most **essential flow patterns** rather than being overwhelmed by **redundant information**.

An encoder such as the CNF is important for diffusion models because it:

1. Filters out redundancy:
- The raw DNS flow fields contain many correlated data points that don’t all contribute independently to the physics.
- The CNF compresses the input so that only the **most relevant turbulence structures remain**.

2. Reduces dimensionality (compression):
- Instead of storing every single grid point and every minor fluctuation, the CNF encodes the global flow structure in a more compact way.
- This prevents the diffusion model from learning unnecessary details that don’t generalize well.

3. Extracts important patterns:
- Instead of dealing with raw velocity values per grid point, the CNF finds latent features that summarize the flow structure.
- These latent features are **much easier for the diffusion model to process than raw CFD data**.

4. Speeds up learning:
- Without encoding, the diffusion model would take much longer to converge because it would need to learn both the **essential structures and the redundant noise**.
- With CNF, the model already gets **“pre-processed” flow structures**, making training much more efficient.
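To get a feel for the size reduction, here is a back-of-the-envelope count. The grid shape `[384, 384, 2]` is the 2D example used earlier; the per-snapshot latent length of 384 is an assumption for illustration, not a value read from the checkpoint.

```python
nx, ny, channels = 384, 384, 2  # 2D example grid from above
latent_size = 384               # hypothetical per-snapshot latent length

values_per_snapshot = nx * ny * channels
ratio = values_per_snapshot / latent_size
print(values_per_snapshot, ratio)
```

Under these assumptions, each snapshot shrinks from 294,912 stored values to 384, a factor of 768.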

$\underline{\textbf{Impact of the choice of encoder}}$

The **choice of encoder is very important because different encoders extract different types of information from the raw data.**

1. Good Encoder (like CNF) -> Captures the correct flow features
- If the encoder is well-designed, it extracts **meaningful flow structures** that help the diffusion model focus on learning useful patterns.
- Example: CNF is trained to preserve the **spatial-temporal coherence of flow features**, which is critical for turbulence modeling.

2. Bad Encoder (Poorly Chosen) -> Might **lose important physics**
- If an encoder **removes too much information** or **doesn’t retain the right flow structures**, the diffusion model might miss important physics.
- Example: If an encoder **is too aggressive in compression**, it might **discard small-scale vortices** that are important in turbulence modeling.

3. Different Encoders Extract Different Features
- Some encoders might focus more on **spatial structure** (good for smooth flows).
- Others might focus more on **temporal evolution** (better for **unsteady turbulence**).
- **The best choice depends on what kind of flow you are modeling**.
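The effect of overly aggressive compression can be sketched with a simple spectral truncation. This is a stand-in technique chosen for clarity, not how the CNF compresses data: a 1D signal with one large-scale and one small-scale component is compressed by keeping only its lowest Fourier modes. The signal, the cutoffs, and the helper `truncate` are all illustrative assumptions.

```python
import numpy as np

n = 256
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)

large_scale = np.sin(2 * x)         # wavenumber 2
small_scale = 0.2 * np.sin(40 * x)  # wavenumber 40, a stand-in for small vortices
signal = large_scale + small_scale

def truncate(sig, keep):
    """Keep only Fourier modes with wavenumber below `keep`, then invert."""
    coeffs = np.fft.rfft(sig)
    coeffs[keep:] = 0.0
    return np.fft.irfft(coeffs, n=sig.size)

mild = truncate(signal, keep=64)       # retains both scales
aggressive = truncate(signal, keep=8)  # retains only the large scale

print(np.max(np.abs(mild - signal)))             # error of the mild compression
print(np.max(np.abs(aggressive - large_scale)))  # distance to the large-scale part
print(np.max(np.abs(aggressive - signal)))       # what the aggressive cutoff lost
```

With the mild cutoff the signal is recovered to round-off, while the aggressive cutoff returns only the large-scale component and drops the small-scale one entirely.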

### Summary Table

| Type of encoder | How it works | Suitable for flow fields? |
|:--------:|:--------:|:--------:|
| CNF (Neural Field Encoder) | Uses coordinate-based representations to encode continuous flow features | Yes, because it preserves physics and allows flexible reconstructions |
| Autoencoder (AE) | Compresses spatial data into a bottleneck latent space | Sometimes, but might lose fine turbulence details |
| Fourier-based Encoders | Uses spectral methods to capture periodic structures | Yes, if dealing with periodic turbulence (like Kolmogorov turbulence) |
| CNN (Convolutional Neural Network) Encoder | Extracts spatial patterns using convolution filters | Yes, but struggles with irregular geometries |
| PCA (Principal Component Analysis) Encoder | Uses linear transformations to reduce dimensionality | Too simple for complex turbulence |

**Different encoders extract different features, and the choice of encoder impacts what the diffusion model learns.**
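As a concrete instance of the simplest row in the table, here is a minimal PCA encoder and decoder built from an SVD. The snapshots are synthetic mixtures of three sine modes, and the helper name `pca_roundtrip` is hypothetical; the point is only to show that the number of retained components controls what the latent space can represent.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snapshots, n_points = 50, 200
x = np.linspace(0.0, 2 * np.pi, n_points, endpoint=False)

# Synthetic snapshots: random mixtures of three fixed spatial modes.
modes = np.stack([np.sin(x), np.sin(2 * x), np.sin(3 * x)])
snapshots = rng.normal(size=(n_snapshots, 3)) @ modes

mean = snapshots.mean(axis=0)
_, _, vt = np.linalg.svd(snapshots - mean, full_matrices=False)

def pca_roundtrip(k):
    """Encode with the first k principal directions, decode, and return the relative error."""
    basis = vt[:k]                         # [k, n_points]
    latent = (snapshots - mean) @ basis.T  # encode: [n_snapshots, k]
    recon = latent @ basis + mean          # decode back to physical space
    error = np.linalg.norm(recon - snapshots) / np.linalg.norm(snapshots)
    return latent, error

latent2, err2 = pca_roundtrip(2)
latent3, err3 = pca_roundtrip(3)
print(latent2.shape, err2, err3)
```

Three components reproduce this synthetic data to round-off because it contains exactly three modes, while two components leave a visible error. Real turbulence has no such small exact basis, which is why a purely linear encoder tends to fall short.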
