# A fundamental distinction between stochastic flow realizations, statistical similarity, and deterministic constraints in conditional generation

## DNS

DNS Runs (Multiple Simulations with Same Conditions):
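- If you run DNS multiple times with the **same initial and boundary conditions**, the flow fields should be **almost identical** at each time step, because the **Navier-Stokes equations** are **deterministic**. Any residual differences come from floating-point round-off (for example, a different order of operations in parallel runs), which a turbulent flow can amplify over long integration times.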
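
As a rough illustration of this determinism, the sketch below uses a 1D viscous Burgers equation as a stand-in for a full Navier-Stokes DNS. The solver `run_burgers`, the grid size, the viscosity, and the time step are all assumptions chosen for brevity, not part of any particular DNS code.

```python
import numpy as np

def run_burgers(u0, nu=0.05, dt=1e-3, steps=500):
    """Advance 1D viscous Burgers on a periodic grid with explicit Euler steps."""
    u = u0.copy()
    dx = 2 * np.pi / u.size
    for _ in range(steps):
        dudx = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)          # central first derivative
        d2udx2 = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2   # central second derivative
        u = u + dt * (-u * dudx + nu * d2udx2)
    return u

x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
u0 = np.sin(x)

run_a = run_burgers(u0)
run_b = run_burgers(u0)                          # same initial condition
run_c = run_burgers(u0 + 1e-6 * np.cos(3 * x))   # slightly perturbed initial condition

same_ic_diff = np.max(np.abs(run_a - run_b))
perturbed_diff = np.max(np.abs(run_a - run_c))
print(f"same IC: {same_ic_diff:.1e}, perturbed IC: {perturbed_diff:.1e}")
```

In this serial toy the two identical runs agree bit for bit. A production DNS code with parallel reductions may differ at round-off level between runs, which is why "almost identical" is the safer statement.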

## Unconditional Generation vs. DNS
- Unconditional Generation: When generating flow fields **without any sensor constraints**, the realizations may **differ from specific instantaneous DNS flow fields**, but they can **still match DNS statistics** (e.g., mean, variance, spectra).

## Stochastic vs. Statistical Similarity
Looking at the randomness in generated flow fields, we can distinguish three levels of stochasticity and constraint:

a. Fully Random (No Statistical Similarity)
- If you generate flow fields at random, without a model that has learned from DNS data, they can have arbitrary structure and will generally not match DNS statistics.
- This would be like generating random noise: it does not resemble DNS in any meaningful way.

b. Stochastic but Retains Statistical Similarity (Unconditional Generation)
- The generated flow fields do not match any specific DNS **snapshot** at a given time step, but **their statistical properties** (mean, variance, correlation lengths, energy spectra) match DNS distributions.
- This is what happens in unconditional generation using diffusion models.
- Each realization differs from any instantaneous DNS field but still looks “DNS-like” in a statistical sense.
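
The difference between cases (a) and (b) can be sketched with synthetic 1D fields. Here a phase-randomized surrogate stands in for a trained generative model, and a `k^(-5/3)` energy spectrum is assumed as the target statistic. The helper names `random_phase_field` and `energy_spectrum` are hypothetical.

```python
import numpy as np

def random_phase_field(amplitude, rng):
    """Real periodic 1D field with prescribed Fourier amplitudes and random phases."""
    phases = rng.uniform(0.0, 2.0 * np.pi, amplitude.size)
    coeffs = amplitude * np.exp(1j * phases)
    return np.fft.irfft(coeffs, n=2 * (amplitude.size - 1))

def energy_spectrum(u):
    """Squared magnitude of the one-sided Fourier coefficients."""
    return np.abs(np.fft.rfft(u)) ** 2

n = 256
k = np.arange(n // 2 + 1)
amplitude = np.zeros(k.size)
# Energy ~ k^(-5/3) (assumed target); the mean and Nyquist modes are left at zero.
amplitude[1:-1] = k[1:-1] ** (-5.0 / 6.0)

rng = np.random.default_rng(0)
u1 = random_phase_field(amplitude, rng)       # case (b): one realization
u2 = random_phase_field(amplitude, rng)       # case (b): another realization
noise = rng.standard_normal(n) * u1.std()     # case (a): same variance, no spectral structure

print("pointwise correlation u1 vs u2:", np.corrcoef(u1, u2)[0, 1])
print("spectra match (u1 vs u2):   ", np.allclose(energy_spectrum(u1), energy_spectrum(u2)))
print("spectra match (noise vs u1):", np.allclose(energy_spectrum(noise), energy_spectrum(u1)))
```

The two surrogate fields differ point by point yet share the same variance and spectrum, while the white-noise field matches the variance only.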

c. Constrained Stochasticity (Conditional Generation with Sensors)
- Here, the generated flow fields match both the statistics and certain instantaneous flow features due to sensor constraints.
- The stochasticity is now restricted by the measurement inputs, meaning the generated flow fields:
  1. Retain the correct statistical properties (mean, variance, spectra), as in unconditional generation.
  2. Conform to specific observed features (the sensor measurements), so the flow is more DNS-like at particular locations.
  3. Still have some randomness in the unmeasured regions but remain physically plausible thanks to the learned generative model.
- This is like a **stochastic reconstruction problem**, where the model predicts missing details while ensuring global consistency.
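
A minimal sketch of constrained stochasticity, assuming a Gaussian process with a squared-exponential covariance as a stand-in for the trained conditional generator. The kernel, the correlation length, and the sensor layout are illustrative assumptions. Conditional samples are drawn with Matheron's rule, which corrects each prior sample so that it honours the sensor values.

```python
import numpy as np

def rbf_kernel(xa, xb, length=0.5):
    """Squared-exponential covariance; `length` plays the role of a correlation length."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 101)
sensor_idx = np.array([10, 15, 20])                  # hypothetical sensors at x = 1.0, 1.5, 2.0

K = rbf_kernel(x, x)
L = np.linalg.cholesky(K + 1e-6 * np.eye(x.size))    # small jitter for numerical stability

truth = L @ rng.standard_normal(x.size)              # stand-in for one "DNS" snapshot
y = truth[sensor_idx]                                # sensor measurements

K_ss = K[np.ix_(sensor_idx, sensor_idx)] + 1e-6 * np.eye(sensor_idx.size)
gain = np.linalg.solve(K_ss, K[sensor_idx, :]).T     # K_xs @ inv(K_ss)

def conditional_sample():
    """Matheron's rule: shift a prior sample so that it matches the sensors."""
    prior = L @ rng.standard_normal(x.size)
    return prior + gain @ (y - prior[sensor_idx])

samples = np.stack([conditional_sample() for _ in range(200)])
spread = samples.std(axis=0)                         # ensemble spread at each grid point
print("spread at sensors:", spread[sensor_idx].max())
print("spread far away  :", spread[90])
```

Every sample passes through the sensor values, so the ensemble spread collapses there, while far from the sensors the spread stays close to that of the unconditional prior.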

## Conditional generation
The regions outside sensor locations start with a **high level of stochasticity**, but as conditional inference progresses, the **influence of sensor-constrained information propagates**, gradually steering the generated flow fields **toward DNS-like behavior**.

1. Initial Stochasticity in Unobserved Regions
- At the start, the regions without sensor data **retain the randomness** dictated by the generative model.
- Since the model has been trained to capture DNS statistics, these unobserved regions already look physically plausible but remain stochastic.

2. Gradual Influence of Sensor Constraints
- Through conditioning, the sensor-constrained regions act as “anchors,” forcing the generated fields to be consistent with real DNS data.
- Due to spatial correlations in turbulence, information from sensors propagates through the domain via the **generative model’s learned structures**.
- This **gradually reduces the randomness** in the unobserved regions, aligning them more closely with DNS.

3. How Stochasticity Evolves
- Near Sensors: The flow quickly conforms to DNS because the direct observations constrain the generative process.
- Far from Sensors: Initially, the flow retains more randomness, but with each inference step (especially in diffusion-based models), the constraints push the field closer to DNS, to the extent that the learned spatial correlations reach that far.
- Large-Scale vs. Small-Scale Structures: **Large coherent structures are influenced faster, while smaller turbulent details take longer to align due to their chaotic nature.**

4. Analogy: Diffusing Information Like a PDE

Think of it like solving an inverse problem:
- The generative model starts from an unconstrained state.
- Sensor constraints act as boundary conditions.
- As inference progresses, the unobserved regions are “corrected” over time, similar to how diffusion propagates corrections in a PDE.
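
The analogy can be made concrete with a toy relaxation: a noisy 1D field is smoothed by Jacobi iterations of the Laplace equation while a few points are pinned to fixed values. This is only the PDE side of the analogy, not a diffusion-model sampler, and the sensor locations and readings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
sensor_idx = np.array([0, 20, 45, 63])          # hypothetical sensor locations
sensor_val = np.array([0.0, 1.0, -0.5, 0.0])    # hypothetical sensor readings

start = rng.standard_normal(n)                  # unconstrained, noisy starting state
start[sensor_idx] = sensor_val

def relax(u, iters):
    """Jacobi iterations of the 1D Laplace equation with sensor values pinned."""
    u = u.copy()
    for _ in range(iters):
        u[1:-1] = 0.5 * (u[:-2] + u[2:])        # each point moves toward its neighbours' mean
        u[sensor_idx] = sensor_val              # re-impose the constraints
    return u

# Steady state of this toy problem: linear interpolation between the pinned points.
target = np.interp(np.arange(n), sensor_idx, sensor_val)
errors = {iters: np.max(np.abs(relax(start, iters) - target)) for iters in (10, 100, 5000)}
print(errors)
```

The error shrinks as iterations accumulate, with the pinned values spreading outward into the unpinned points. Unlike a generative model, this toy converges to a single deterministic field, so it illustrates only how constraints propagate, not the residual stochasticity.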

5. Final Outcome
- In highly constrained cases, the entire field closely matches DNS.
- In weakly constrained cases, some stochasticity persists in unobserved regions, but statistical consistency with DNS remains.
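
How much stochasticity survives depends on sensor density. The sketch below again assumes a unit-variance Gaussian field with a squared-exponential covariance as a stand-in for the learned model, and reports the domain-averaged conditional variance for a few sensor counts. The function `mean_posterior_variance` and all parameter values are illustrative.

```python
import numpy as np

def mean_posterior_variance(n_sensors, n_grid=101, length=0.5):
    """Domain-averaged conditional variance of a unit-variance Gaussian field
    observed at `n_sensors` evenly spaced grid points."""
    x = np.linspace(0.0, 10.0, n_grid)
    idx = np.unique(np.round(np.linspace(0, n_grid - 1, n_sensors)).astype(int))
    d = x[:, None] - x[None, :]
    K = np.exp(-0.5 * (d / length) ** 2)
    K_ss = K[np.ix_(idx, idx)] + 1e-6 * np.eye(idx.size)   # jitter for numerical stability
    K_xs = K[:, idx]
    # Diagonal of K_xs @ inv(K_ss) @ K_xs.T: the variance explained by the sensors.
    explained = np.einsum("ij,ji->i", K_xs, np.linalg.solve(K_ss, K_xs.T))
    return float(np.mean(np.diag(K) - explained))

variances = {n: mean_posterior_variance(n) for n in (2, 5, 20, 50)}
for n, v in variances.items():
    print(f"{n:3d} sensors -> mean conditional variance {v:.4f}")
```

With only a few sensors most of the prior variance remains (the weakly constrained case). Once the sensor spacing drops to about the correlation length or below, the conditional variance becomes small and every sample is close to the reference field (the highly constrained case).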
