# Machine learning (ML) methods integrated with CFD

| **Method**                     | **How It Enforces Physical Constraints?**                                                                 | **Solves Model Equations?**                                   | **Strength**                                                                 |
|----------------------------|-------------------------------------------------------------------------------------------------------|----------------------------------------------------------|--------------------------------------------------------------------------|
| **PPNN**                       | Embeds physics into the neural network structure via **residual-based correction** and **temporal rollout consistency**. | No, but **enforces PDE consistency through architecture and training.** | Reduces long-term error accumulation in simulations.                                        |
| **Kalman Filter (DAFI, EnKF, etc.)** | Directly solves the model equations alongside updating state estimates using real-time data.          | **Yes, explicitly solves PDEs** at each step.                | Ensures dynamically consistent corrections based on observations.                              |
| **PiML (Physics-Informed ML)** | Incorporates **physical residuals** in the loss function, penalizing violations of governing equations.    | No, **only enforces constraints implicitly** via loss function minimization. | Works even when training data is limited, but may not fully respect physics in complex flows.                        |

$\underline{\textbf{PPNN vs. Kalman Filter}}$

## PPNN (PDE-Preserved Neural Networks)

### Does NOT solve PDEs explicitly but modifies the NN framework to ensure physics-consistent predictions.

It focuses on **learning architecture design** to improve the robustness, stability, and generalizability of data-driven next-step prediction models. Such models commonly suffer from considerable **error accumulation** because of their auto-regressive formulation, and they often fail in long-span rollouts.

In PPNN, the **convolutional neural network (CNN)** layers are used to represent **finite difference (FD) stencils**, effectively embedding PDE operators directly into the network.

A **finite difference stencil** is a pattern of points used to approximate derivatives in numerical methods. For example, a central difference stencil for the first derivative is:


$$\frac{\partial u}{\partial x} \approx \frac{u_{i+1} - u_{i-1}}{2\Delta x}$$


For a second derivative:


$$\frac{\partial^2 u}{\partial x^2} \approx \frac{u_{i+1} - 2u_i + u_{i-1}}{\Delta x^2}$$
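
As a quick numerical check of the two stencils above, the following sketch applies them to $u = \sin(x)$ on a periodic grid and compares against the known derivatives. The grid size and the use of `np.roll` for periodic neighbors are illustrative assumptions, not part of any particular solver.

```python
import numpy as np

# Uniform periodic grid on [0, 2*pi); u = sin(x) has known derivatives.
n = 200
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
u = np.sin(x)

# np.roll(u, -1)[i] is u[i+1] and np.roll(u, 1)[i] is u[i-1] (periodic wrap-around).
u_plus = np.roll(u, -1)
u_minus = np.roll(u, 1)

du_dx = (u_plus - u_minus) / (2.0 * dx)            # central first-derivative stencil
d2u_dx2 = (u_plus - 2.0 * u + u_minus) / dx**2     # central second-derivative stencil

# Both errors shrink like dx^2 for these second-order stencils.
err_first = np.max(np.abs(du_dx - np.cos(x)))
err_second = np.max(np.abs(d2u_dx2 + np.sin(x)))
```

Halving `dx` should reduce both errors by roughly a factor of four, which is the expected behavior of second-order accurate stencils.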


In traditional CFD solvers, these stencils are **explicitly applied in the numerical scheme**. Note that **traditional CFD solvers**, whether based on **finite difference (FDM), finite volume (FVM), or finite element (FEM)** methods, are typically **NOT fully differentiable** in the automatic-differentiation sense, mainly for the following reasons:

1. Discrete Iterative Solving (Non-Differentiable Operations)

	- Traditional solvers construct **a large system of nonlinear equations from the Navier-Stokes (NS) equations**.
	- They then solve this system **iteratively** (e.g., using Newton’s method, multigrid solvers, or Krylov subspace methods).
	- The solving step (e.g., LU decomposition, iterative solvers like GMRES, or non-smooth algebraic operations) is **not differentiable in the machine learning sense**.
    
2. Implicit Time Integration (Non-Differentiable Updates)

	- Many solvers use **implicit schemes** (e.g., backward Euler, Crank-Nicolson) requiring **matrix inversion** or linear/nonlinear system solutions at each time step.
	- These numerical solvers involve **discrete updates** that break **automatic differentiation paths**.
    
3. Mesh Adaptation and Discretization Errors

	- Traditional CFD uses **adaptive meshes, turbulence models, and artificial dissipation techniques**. These introduce discontinuous operations (e.g., upwinding, limiter functions) that further prevent differentiation.
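
To make the implicit-time-integration point concrete, here is a minimal sketch of backward Euler for the 1D heat equation $u_t = \alpha u_{xx}$ with zero boundary values. The problem setup, grid size, and time step are assumptions chosen for illustration; the only point is that every time step requires solving a coupled linear system.

```python
import numpy as np

# Hypothetical setup: 1D heat equation u_t = alpha * u_xx on (0, 1), u = 0 at both ends.
n = 50                                   # interior grid points (assumed)
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
dx = 1.0 / (n + 1)
alpha, dt, n_steps = 1.0, 0.01, 10

# Second-difference matrix built from the [1, -2, 1] / dx^2 stencil.
L = (np.diag(-2.0 * np.ones(n))
     + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / dx**2

# Backward Euler: (I - dt * alpha * L) u_new = u_old, a coupled linear system per step.
M = np.eye(n) - dt * alpha * L

u = np.sin(np.pi * x)
for _ in range(n_steps):
    u = np.linalg.solve(M, u)            # one global solve per time step

exact = np.exp(-alpha * np.pi**2 * dt * n_steps) * np.sin(np.pi * x)
max_error = np.max(np.abs(u - exact))
explicit_dt_limit = dx**2 / (2.0 * alpha)   # forward-Euler stability bound, for comparison
```

The time step used here is far larger than `explicit_dt_limit`, which is the usual motivation for implicit schemes; the price is the linear solve inside the loop.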

| Feature | Traditional CFD solvers | CNN-Preserved Stencils (PPNN) |
|:--------:|:--------:|:--------:|
|  Solves a large system of equations?   |  Yes, solves a global matrix equation at each time step    |  No, uses local operations without forming a global system  |
| Iterative solution required?   | Yes, requires iterative methods (e.g., Newton-Raphson, GMRES)  |  No, CNN operates locally without solving a full system  |
|  Differentiability   |  Not fully differentiable due to iterative solvers  |   Fully differentiable since CNN layers are composed of smooth functions  |
|  Automatic Differentiation Compatible?  |  No, due to matrix solvers and non-smooth steps  |  Yes, because CNN-based FD operations are differentiable   |
|  Computational Cost   |  High due to large system solving  |   Lower since no global matrix inversion is required   |


On the other hand, a **CNN-preserved FD solver** does not explicitly solve a PDE system in the way traditional CFD solvers do. Instead, it represents **PDE differential operators** using **convolutional neural networks (CNNs)**.

1. Encoding PDE Operators as CNN Convolutions
	- CNN filters (kernels) mimic finite difference stencils (e.g., second-order central differences for Laplacian terms).
	- These filters are fixed (not trainable) and preserve PDE structure during training.
	- Instead of solving for the unknowns by **inverting a large matrix**, CNN layers apply **local stencil** operations.

2. Solving PDEs Without a Global System Matrix
	- Traditional CFD solvers solve  $A u = b$ , where $A$ is a large sparse matrix derived from discretized NS equations.
	- CNN-based solvers approximate the **same differential operators using convolutional layers**.
	- Instead of **matrix inversion**, CNNs process local patches, avoiding a **global system of equations**.


$\underline{\textbf{A global system of equations}}$

Note that **a global system of equations** refers to the large set of algebraic equations that arise when **discretizing a partial differential equation (PDE)** over an entire computational domain in traditional numerical methods like finite difference (FDM), finite volume (FVM), or finite element methods (FEM). **These equations must be solved simultaneously** to obtain the solution for all points in the domain.

Consider a PDE, such as the Poisson equation:

$$\nabla^2 u = f$$

When discretized using **finite difference methods** (e.g., central difference), this equation turns into a linear system:

$$A u = b$$

where:

1. $A$  is a large sparse matrix representing the finite difference discretization of the PDE.
2. $u$  is the vector of unknowns at all grid points.
3. $b$  is the known forcing term (e.g., boundary conditions, source terms).

To solve for $u$, the entire system of equations **must be solved together**, typically using:

- Direct solvers (LU decomposition, Gaussian elimination)
- Iterative solvers (Jacobi, Gauss-Seidel, multigrid, Krylov-based methods like GMRES)
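
As an illustration of the system $A u = b$ described above, the sketch below assembles the matrix for the 1D Poisson equation $u'' = f$ with zero boundary values and solves it with a direct solver. The choice of $f = -\pi^2 \sin(\pi x)$ (so that the exact solution is $\sin(\pi x)$) and the dense matrix storage are assumptions made to keep the example small; production solvers would use sparse storage.

```python
import numpy as np

n = 50                                   # number of interior grid points (assumed)
x = np.linspace(0.0, 1.0, n + 2)
dx = x[1] - x[0]
x_interior = x[1:-1]

# A: tridiagonal matrix from the [1, -2, 1] / dx^2 stencil (dense here for brevity).
A = (np.diag(-2.0 * np.ones(n))
     + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / dx**2

# b: source term at interior points; zero boundary values contribute nothing.
b = -np.pi**2 * np.sin(np.pi * x_interior)

# All unknowns are obtained together from one coupled solve.
u = np.linalg.solve(A, b)
max_error = np.max(np.abs(u - np.sin(np.pi * x_interior)))
```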

$\textbf{It is considered "global" because:}$

- Each equation at a point depends **on its neighbors**, leading to **a large coupled system**.
- The solution for **one point depends on the entire domain**, meaning **a change at one location propagates globally**.
- **Traditional solvers** need to solve the **entire matrix equation at each time step**, making them **computationally expensive**.
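
The global coupling can be seen in a small experiment: each row of $A$ touches only three neighbors, yet changing the right-hand side at a single grid point changes the solution everywhere. The sketch below assumes the same 1D second-difference matrix with zero boundary values; the grid size and the tolerance used to count affected points are arbitrary choices.

```python
import numpy as np

n = 21
dx = 1.0 / (n + 1)
A = (np.diag(-2.0 * np.ones(n))
     + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / dx**2

b = np.zeros(n)
b_perturbed = b.copy()
b_perturbed[n // 2] = 1.0                # change the forcing at one grid point only

# Difference between the two solutions, i.e. the response to the local change.
delta_u = np.linalg.solve(A, b_perturbed) - np.linalg.solve(A, b)

stencil_neighbors = np.count_nonzero(A[n // 2])              # local coupling in one equation
affected_points = np.count_nonzero(np.abs(delta_u) > 1e-12)  # reach of the perturbation
```

Here `stencil_neighbors` is 3 while `affected_points` equals the full grid size, which is what "global" means in this context.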

$\underline{\textbf{How CNN-Based Finite Difference (PPNN) Avoids a Global System}}$

**Important concepts in CNN:** CNN is a **specialized neural network architecture** designed for **spatial data processing**, and it can be used within **other architectures**, such as **Autoencoders and Transformers**. CNN convolution operations consist of:

1. **Kernels (Filters)** – Small matrices that slide over input data to extract features.
2. **Feature Encoding** – The process of transforming input data into a hierarchical representation using learned features.
3. **Convolutional Layers** – Layers where kernels are applied to extract spatial patterns.
    
Clarifications on CNN and Neural Network (NN) Architectures:

- CNN is a **neural network architecture, not just a method**. It defines a specific way of processing data using **convolutional layers, pooling layers, and fully connected layers**.
-  CNN can be used as **a component within another neural network architecture** (e.g., **Autoencoders and Transformers**). For example, CNNs can serve as an **encoder in an Autoencoder**.
-  **CNNs are customizable** – You can modify components like kernel size, number of layers, activation functions, etc.
  
**Note:** The convolution operation belongs only to the convolutional layers. Pooling and fully connected layers, which come after the convolution operation, are separate mechanisms that work together with the convolutional layers in CNN architectures:

- **Pooling** reduces the spatial dimensions of feature maps while preserving key information, which helps reduce computational cost and increase translational invariance. Max pooling takes the maximum value in a region; average pooling computes the average of a region.
- **Fully connected layers** convert the extracted spatial features into a final classification or regression output. Each neuron is connected to every neuron in the previous layer.
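
For readers unfamiliar with pooling, here is a minimal NumPy sketch of max and average pooling over non-overlapping windows. The helper name `pool2d` and the 4×4 test array are hypothetical; deep learning libraries provide their own pooling layers with more options (stride, padding).

```python
import numpy as np

def pool2d(feature_map, size, mode="max"):
    """Pool a 2D feature map over non-overlapping size x size windows."""
    h, w = feature_map.shape
    # Drop any rows/columns that do not fill a complete window.
    trimmed = feature_map[:h - h % size, :w - w % size]
    blocks = trimmed.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

feature_map = np.arange(16, dtype=float).reshape(4, 4)
max_pooled = pool2d(feature_map, 2, mode="max")     # [[5, 7], [13, 15]]
avg_pooled = pool2d(feature_map, 2, mode="mean")    # [[2.5, 4.5], [10.5, 12.5]]
```

Both outputs are 2×2, showing how pooling halves each spatial dimension of the 4×4 input.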

CNN-based approaches do NOT construct and solve a global matrix equation. Instead:

- **CNN kernels (filters)** represent **finite difference stencils** locally through **CNN convolutions**.
- Instead of forming and solving $A u = b$, CNN layers update the solution **pointwise** using local differential operations.
- The solution evolves **step-by-step** without requiring the **inversion of a large matrix**.
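
A minimal sketch of such a step-by-step, matrix-free update is shown below for the 1D heat equation $u_t = \alpha u_{xx}$ on a periodic grid, using forward Euler in time. The equation, parameters, and function name `explicit_heat_step` are illustrative assumptions; note that explicit updates of this kind are only stable for sufficiently small time steps.

```python
import numpy as np

def explicit_heat_step(u, alpha, dx, dt):
    """One forward-Euler step of u_t = alpha * u_xx on a periodic 1D grid."""
    laplacian = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2   # local stencil only
    return u + dt * alpha * laplacian

n = 128
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
alpha = 0.1
dt = 0.4 * dx**2 / alpha        # inside the explicit stability limit dt <= dx^2 / (2 * alpha)
n_steps = 500

u = np.sin(x)
for _ in range(n_steps):
    u = explicit_heat_step(u, alpha, dx, dt)   # no matrix is formed or inverted

# For u0 = sin(x) the exact solution decays as exp(-alpha * t) * sin(x).
exact = np.exp(-alpha * n_steps * dt) * np.sin(x)
max_error = np.max(np.abs(u - exact))
```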

CNNs apply **convolution filters (kernels)** to extract **spatial features from input data**. These filters operate like finite difference stencils by performing **local weighted sums of neighboring values**. There is a distinction between CNN in a traditional sense and the CNN in **PPNN** architecture:

**PPNN** primarily needs **filtering** rather than **traditional encoding**! Here’s why:

Key Differences Between Filtering and Encoding in PPNN

1. Filtering (Convolution with Fixed Kernels for PDEs)
	- PPNN represents PDE differentials using CNN kernels, which act as **finite difference stencils.**
	- This step is similar to applying a discretized differential operator (e.g., $\partial u/\partial x$ via finite differences).
	- The CNN filters operate directly on the numerical grid to approximate derivatives, preserving PDE structures.
	- These filters remain fixed and do not require encoding.

2. Encoding (Transforming Features into a Compressed Representation)
	- In typical CNNs, encoding helps convert raw spatial data into a lower-dimensional, high-level feature space.
	- However, PPNN does not need traditional encoding because it doesn’t aim to extract abstract features but rather to enforce PDE constraints directly.


A **CNN kernel** (filter) can represent finite difference operations directly. For example, the five-point stencil for the 2D Laplacian $\nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}$ on a uniform grid (up to the factor $1/\Delta x^2$) is:

$$W = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$

This **CNN kernel** is the discrete **Laplacian operator** used in solving **diffusion equations**. Instead of being learned, these **filters are FIXED (predefined based on numerical differentiation rules)**.
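
To see the kernel $W$ act as a Laplacian, the sketch below slides it over a 2D field the way a convolutional layer without padding would, then divides by the squared grid spacing. The helper `conv2d_valid` is a hypothetical stand-in for a framework convolution layer, and the test field $u = x^2 + y^2$ is chosen because its Laplacian is exactly 4.

```python
import numpy as np

def conv2d_valid(field, kernel):
    """Slide a small kernel over a 2D field without padding (CNN-style cross-correlation)."""
    kh, kw = kernel.shape
    out_h = field.shape[0] - kh + 1
    out_w = field.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * field[i:i + out_h, j:j + out_w]
    return out

# Fixed (non-trainable) five-point Laplacian kernel.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, -4.0, 1.0],
              [0.0, 1.0, 0.0]])

coords = np.linspace(0.0, 1.0, 11)
h = coords[1] - coords[0]                      # uniform grid spacing in x and y
X, Y = np.meshgrid(coords, coords, indexing="ij")
u = X**2 + Y**2                                # analytic Laplacian is 4 everywhere

laplacian = conv2d_valid(u, W) / h**2          # interior points only
```

Because the stencil is exact for quadratic fields, `laplacian` equals 4 at every interior point up to round-off.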

$\underline{\textbf{CNNs are naturally suited for processing structured grid data}}$

CNNs are **naturally suited for processing structured grid data (like CFD grids)**. Instead of explicitly coding FD stencils in a separate solver, we embed them directly in the neural network.

CNN Convolutions $\approx$ Finite Difference Operations. CNNs **slide over the computational domain**, applying the same stencil everywhere, ensuring local PDE constraints are enforced.

Unlike traditional CNNs that learn arbitrary filters, in **PPNN**, the weights of **PDE-preserving layers are fixed to match FD formulas.**


$\underline{\textbf{Why CNN is less expensive}}$

CNN-based PDE solvers (e.g., PPNN) also solve **all points in the computational domain**, not just **collocation points**. The CNN acts as a **local solver**, applying finite difference stencils as **convolution filters** at every grid point, enabling the solution to be computed **across the entire domain**.
	
Unlike **Physics-Informed Neural Networks (PINNs)**, which **enforce physics at discrete collocation points**, CNN-based PDE solvers operate on **structured grids** and perform **pointwise updates at all grid locations**.

The convolution operation in CNNs **sweeps through the entire computational domain**, applying the same set of **filters at each point**, making it similar to traditional finite difference methods. **Solving locally** with CNN-based PDE solvers is nevertheless **often faster** than solving a global system, for two key reasons:

$\textbf{Avoiding Global Matrix Inversion}$

a. Traditional CFD solvers construct a large sparse matrix from discretized PDEs and solve it globally using iterative solvers (e.g., GMRES, Multigrid).
	
b. CNN-based solvers update the solution locally, using **only neighboring grid points** (like finite difference stencils), which avoids the need to **solve a coupled system of equations**. No matrix inversion, which can dominate the computational time, is involved.

$\textbf{Efficient Parallel Computation with CNNs}$

a. CNN-based approaches use **convolution layers**, which are highly **parallelizable on GPUs**.

b. Traditional solvers rely on iterations that must run one after another (e.g., Jacobi or Gauss-Seidel sweeps repeated until convergence), **whereas CNNs update all points simultaneously using batch computations.**
    
**This makes CNN-based methods easier to differentiate and more computationally efficient, since they avoid the expensive global system-solving step in traditional CFD solvers.**
  

### Residual connection

Residual connections (as in **ResNet**) **skip one or more layers** by directly passing information from an earlier layer to a later layer, **bypassing intermediate transformations.** Instead of completely relying on deep layers to learn the mapping from input to output, the network learns only the **necessary corrections** while **preserving important information from previous layers.** 


In **traditional neural networks (NNs)**, each layer produces an output that serves as input to the next layer, and the network **does not explicitly compute or track residuals between layers.** In residual networks, explicit residuals allow networks to learn corrections **instead of full transformations**, improving training stability. PPNN extends this idea by using a physics-based model as a residual component, enforcing PDE constraints while **learning unknown corrections**.

In deep learning, a residual connection (as in **ResNet**) refers to a direct shortcut that connects layers, allowing information to bypass some parts of the network.

$\textbf{An example showing idea of ResNet}$

**Residual networks (ResNets)** introduce **explicit residuals via skip connections**, which allow deeper networks to be trained more effectively.

1. Residual Formulation:

- Instead of learning a direct mapping $H(x)$, ResNets learn the residual function $F(x)$:

$$H(x) = x + F(x)$$

- Here, $x$ is the original input, and $F(x)$ is the learned transformation.
- This means that later layers receive both the modified information $F(x)$ and the original input $x$.

2. How Residuals Are Used:

- The network only learns the difference (correction) instead of the full transformation.
- This helps **mitigate vanishing gradients** because the **identity mapping** allows gradients to propagate more easily.
- It enables training of very deep networks, as information is preserved across layers.
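
The residual formulation $H(x) = x + F(x)$ can be sketched in a few lines of NumPy. The two-layer form of $F$, the `tanh` activation, and all sizes and weights below are assumptions for illustration only; they are not taken from any specific ResNet variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_block(x, W1, W2):
    """H(x) = x + F(x), with F a small two-layer transformation (tanh activation)."""
    f_x = np.tanh(x @ W1) @ W2          # F(x): the learned correction
    return x + f_x                      # skip connection adds the input back

dim, hidden = 4, 8
x = rng.normal(size=(3, dim))           # batch of 3 inputs
W1 = rng.normal(size=(dim, hidden))

# With a zero output layer, F(x) = 0 and the block is exactly the identity mapping.
W2_zero = np.zeros((hidden, dim))
h_identity = residual_block(x, W1, W2_zero)

# With small output weights, the block applies only a small correction to x.
W2_small = 0.01 * rng.normal(size=(hidden, dim))
h_corrected = residual_block(x, W1, W2_small)
max_correction = np.max(np.abs(h_corrected - x))
```

The zero-weight case shows why residual blocks make the identity mapping easy to represent: the network starts from "do nothing" and only has to learn the deviation.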

#### Identity Mapping

**Identity mapping** in neural networks refers to a **transformation** where the **input remains unchanged** as it passes through a layer. Mathematically, it is represented as:

$$H(x) = x$$

where:
- $x$  is the input,
- $H(x)$  is the output,
- The function simply returns the input without applying any transformation.

This concept is crucial in residual connections used in deep networks like ResNet.

In the example $H(x) = x + F(x)$, the roles of the **physics-preserving** part and the **trainable correction** are:

$x$ : **Physics-Preserving Part**

- This term represents the **hardcoded physics-based relationships**, meaning it carries **prior knowledge** from a PDE-based model (e.g., finite difference stencils).
- It ensures that the network retains **fundamental physical consistency** across layers.
- In a traditional **ResNet**, $x$ would be the **identity input**, but in **PPNN**, it corresponds to the **physics-driven solution component** (CNN-represented FD operators).

$F(x)$: **Trainable Correction (Residual Learning Part)**

- This term represents the part that the neural network learns from data to correct discrepancies in the physics model.
- It accounts for missing physics (e.g., turbulence closure models, empirical adjustments, or subgrid-scale effects).
- The network only learns the necessary corrections, reducing computational burden and improving generalization.




$\textbf{Skipping layers is beneficial (avoids the Vanishing Gradient Problem)}$

In deep networks, **gradients can become extremely small (vanishing gradient problem)**, **making it hard to update early layers**. **Residual connections** allow gradients to flow directly through the **skip path**, helping maintain meaningful updates across all layers. Since each layer only needs to learn a **small correction** instead of a full transformation, the **network converges faster**. Without skip connections, optimization gets harder because each layer must transform its input entirely from scratch. **Skipping layers** also lets the network rely on **both deep transformations and shallow features**, **preventing over-reliance on any single depth level**. This promotes generalization, especially when training data is limited.
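
The effect of the skip path on gradients can be illustrated with a toy scalar network. In the sketch below, each plain layer computes `weight * tanh(x)` and each residual layer computes `x + weight * tanh(x)`; the depth, weight, and starting point are arbitrary assumptions, and real networks with matrices and varied weights behave less cleanly than this example.

```python
import numpy as np

def input_gradient(x0, weight, depth, residual):
    """d(output)/d(input) of a depth-layer scalar network, accumulated by the chain rule."""
    x, grad = x0, 1.0
    for _ in range(depth):
        local = weight * (1.0 - np.tanh(x) ** 2)   # derivative of weight * tanh(x)
        if residual:
            grad *= 1.0 + local                    # layer: x + weight * tanh(x)
            x = x + weight * np.tanh(x)
        else:
            grad *= local                          # layer: weight * tanh(x)
            x = weight * np.tanh(x)
    return grad

# Toy settings (assumed): 50 layers, positive weight 0.5, input 0.3.
plain_grad = input_gradient(0.3, 0.5, 50, residual=False)
residual_grad = input_gradient(0.3, 0.5, 50, residual=True)
```

In the plain chain every factor is at most 0.5, so the gradient shrinks geometrically with depth, while each residual factor is `1 + local` and here never drops below 1.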



In **PPNN**, the **physics-preserving part** plays a role similar to **residual connections** by effectively “skipping” layers because the weights in this part are fixed to enforce the physical constraints (i.e., the PDE structure). This means that instead of requiring every layer to learn from scratch, this component preserves **physics-based relationships across layers**, allowing the network to focus on **learning only the unknown corrections**, which is the **trainable network**. 

**PPNN** features a **residual connection** that consists of **two parts:** a **trainable network** and a **PDE-preserving network**. In the PDE-preserving network, the right-hand side of the governing PDE, discretized on a **finite difference grid** (e.g., second-order finite differences for the Laplacian), is represented by a **convolutional neural network (CNN)**. The weights of this PDE-preserving convolutional residual component are determined by the **discretization scheme and remain constant during training**.

Note that the second part (the PDE-preserving residual connection), which represents the PDE, is NOT **trainable**. This ensures that the PPNN automatically **respects the underlying physics during training**. (In **PiML**, the PDE residual is only enforced **weakly** by penalizing violations in the loss function.)
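
For contrast, the weak (loss-based) enforcement used in physics-informed ML can be sketched as a data misfit plus a penalty on the PDE residual. The example below assumes the 1D Poisson equation $u'' = f$ and evaluates the residual with a finite difference stencil on a grid; actual PINNs usually compute the residual with automatic differentiation at collocation points, so this is a simplified stand-in. The function name and the weighting parameter are hypothetical.

```python
import numpy as np

def physics_informed_loss(u_pred, u_data, f, dx, weight=1.0):
    """Data misfit plus a penalty on the residual of u'' = f (interior points only)."""
    data_loss = np.mean((u_pred - u_data) ** 2)
    residual = (u_pred[2:] - 2.0 * u_pred[1:-1] + u_pred[:-2]) / dx**2 - f[1:-1]
    physics_loss = np.mean(residual ** 2)
    return data_loss + weight * physics_loss, data_loss, physics_loss

x = np.linspace(0.0, 1.0, 101)
dx = x[1] - x[0]
u_exact = np.sin(np.pi * x)
f = -np.pi**2 * np.sin(np.pi * x)          # so that u_exact'' = f

# A prediction that satisfies the PDE incurs almost no physics penalty.
total_good, data_good, phys_good = physics_informed_loss(u_exact, u_exact, f, dx)

# A prediction with a spurious oscillation is penalized, but nothing forbids it outright.
u_bad = u_exact + 0.05 * np.sin(3.0 * np.pi * x)
total_bad, data_bad, phys_bad = physics_informed_loss(u_bad, u_exact, f, dx)
```

The penalty only discourages violations in proportion to `weight`; the optimizer may still trade physics error against data error, which is the sense in which the constraint is weak.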

### Comparison between PPNN and DNN

In **PPNN**, the PDE is directly embedded into the model's architecture, making the predictions naturally follow the PDE without needing explicit penalties. The **fundamental idea behind the PDE-preserving method** is that **deep neural networks (DNNs) can be embedded within a differentiable PDE solver**, ensuring that the learned representations **adhere to the governing physics** rather than merely approximating the data. Instead of being treated as **black-box** approximators, DNNs are integrated with a **differentiable PDE solver**. This allows the DNN to **directly interact with the PDE discretization scheme**, ensuring physics consistency.

- The PDE operators (e.g., $\nabla^2u$, $\nabla \cdot u$) are represented as **differentiable layers** in the neural network: fixed-weight CNN layers approximate the PDE differential terms (i.e., spatial derivatives) with finite difference stencils. **Automatic differentiation** then enables gradients to flow through the PDE discretization, allowing **end-to-end learning** constrained by physics. Each **PPNN update combines the learned correction with the PDE-preserving term**, improving stability and accuracy.
- In comparison, a PINN (Physics-Informed Neural Network) imposes physics by **penalizing residuals in the loss function**, which does not guarantee strict enforcement of the PDE structure.
- **PDE-Preserving Networks (PPNNs)** explicitly enforce PDE constraints within the **architecture by formulating part of the network as a differentiable PDE solver**.
- PPNN leads to better **stability, generalization, and physical accuracy**, especially for long-term time-dependent problems.

$\underline{\textbf{Mathematical Formulation of PDE-Preserving Networks}}$

A general PDE can be written as:

$$
\mathcal{N}(u) = f
$$

where:

- $u$ is the solution field (e.g., velocity, temperature).
- $\mathcal{N}$ is a differential operator (e.g., Navier-Stokes equations).
- $f$ is a forcing term.

**In PDE-Preserving Networks (PPNN)**:

-  The neural network **learns a correction term** or **solution mapping**, while the PDE structure is **hardcoded** into a part of the network.

- The network consists of:
    1. Trainable DNN $\mathcal{F}_\theta$ that predicts unknowns.
    2. Fixed convolutional layers (CNNs) representing discretized PDE operators (e.g., finite difference stencils for derivatives).
    3. Residual connection enforcing the PDE constraint:

$$
u^{(t+1)} = u^{(t)} + \mathcal{F}_\theta(u^{(t)}) + \mathcal{N}(u^{(t)})
$$

Here, $\mathcal{F}_\theta(u^{(t)})$ learns deviations from the PDE solution. The fixed **CNN layer** represents $\mathcal{N}(u^{(t)})$, ensuring physics consistency. In this update, $\mathcal{N}(u^{(t)})$ denotes the discretized right-hand side of the governing PDE evaluated at the current state (any time-step factor is taken to be absorbed into the operator).
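
The update $u^{(t+1)} = u^{(t)} + \mathcal{F}_\theta(u^{(t)}) + \mathcal{N}(u^{(t)})$ can be sketched with a fixed stencil branch and a small trainable branch. Everything specific below is an assumption for illustration: a 1D diffusion term stands in for $\mathcal{N}$ with the time step absorbed into the kernel, $\mathcal{F}_\theta$ is a toy two-layer convolution with untrained weights, and the names `ppnn_step` and `theta` are hypothetical. It is not the architecture of any published PPNN implementation.

```python
import numpy as np

def conv1d_periodic(u, kernel):
    """Apply a 3-point kernel (left, center, right weights) on a periodic 1D grid."""
    return kernel[0] * np.roll(u, 1) + kernel[1] * u + kernel[2] * np.roll(u, -1)

def ppnn_step(u, pde_kernel, theta):
    """u_next = u + F_theta(u) + N(u) with a fixed PDE branch and a trainable branch."""
    pde_term = conv1d_periodic(u, pde_kernel)                  # N(u): weights never trained
    hidden = np.tanh(conv1d_periodic(u, theta["w_in"]))        # F_theta(u): toy two-layer
    correction = conv1d_periodic(hidden, theta["w_out"])       # convolutional network
    return u + correction + pde_term

n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
alpha, dt = 0.1, 0.01                      # assumed diffusion coefficient and time step
pde_kernel = alpha * dt / dx**2 * np.array([1.0, -2.0, 1.0])   # fixed by the FD scheme

rng = np.random.default_rng(0)
theta = {"w_in": rng.normal(size=3), "w_out": np.zeros(3)}     # zero output: F_theta = 0

u0 = np.sin(x)
u1 = ppnn_step(u0, pde_kernel, theta)

# With F_theta = 0 the step is exactly the explicit finite difference update.
fd_step = u0 + alpha * dt * (np.roll(u0, -1) - 2.0 * u0 + np.roll(u0, 1)) / dx**2

# Non-zero trainable weights add a correction on top of the physics-based step.
theta_active = {"w_in": theta["w_in"], "w_out": 0.01 * rng.normal(size=3)}
u1_corrected = ppnn_step(u0, pde_kernel, theta_active)
```

In training, only `theta` would be updated by gradient descent, while `pde_kernel` stays as prescribed by the discretization; the sketch omits the training loop entirely.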


