# Phase 0: Introduction, Overview, and Overall Implementation

## The Paper: Evaluation of Machine Learning Algorithms for Predictive Reynolds Stress Transport Modeling - J. P. Panda & H. V. Warrior (Department of Ocean Engineering and Naval Architecture, Indian Institute of Technology, Kharagpur)

---

# 1. Introduction: The Turbulence Modeling Bottleneck
### Computational Fluid Dynamics (CFD) utilizes numerical techniques to model and solve problems involving fluid flows. While CFD has become indispensable in modern engineering, from designing F1 cars to predicting weather patterns, its success is largely dependent on our ability to accurately model turbulence.

### Turbulence is characterized by chaotic changes in pressure and flow velocity. It spans a vast range of spatial and temporal scales, which makes direct simulation of every eddy computationally infeasible for most engineering problems.

## 1.1 The Hierarchy of Turbulence Simulation
### To manage this complexity, engineers rely on a hierarchy of modeling approaches, trading fidelity for computational cost:
#### 1. Direct Numerical Simulation (DNS): Resolves all scales of motion down to the Kolmogorov microscales. It introduces no turbulence-modeling error but is computationally prohibitive at high Reynolds numbers.
#### 2. Large Eddy Simulation (LES): Resolves large energy-containing eddies and models the smaller sub-grid scales. It is accurate but still too expensive for many complex industrial flows.
#### 3. Reynolds Averaged Navier-Stokes (RANS): Models all turbulent fluctuations, solving only for the mean flow fields. This is the industry workhorse due to its low computational cost.
### The Problem: The most common RANS models (like $k-\epsilon$ or $k-\omega$) are Eddy Viscosity Models (EVM) built on the Boussinesq eddy viscosity hypothesis. They treat turbulence as an added viscosity, so the anisotropic part of the turbulent stress is linearly proportional to the mean strain rate.
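### The eddy viscosity relation can be sketched in a few lines. The example below is a minimal illustration, assuming the standard Boussinesq form $\overline{u_i u_j} = \frac{2}{3} k \delta_{ij} - 2 \nu_t S_{ij}$; the function name `boussinesq_stress` and the numerical values are hypothetical and not taken from the paper.

```python
def boussinesq_stress(grad_u, k, nu_t):
    """Reynolds stresses from the Boussinesq hypothesis.

    grad_u[i][j] is the mean velocity gradient dU_i/dx_j,
    k is the turbulent kinetic energy, nu_t the eddy viscosity.
    """
    stress = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            # Mean strain rate S_ij = 0.5 * (dU_i/dx_j + dU_j/dx_i)
            s_ij = 0.5 * (grad_u[i][j] + grad_u[j][i])
            delta = 1.0 if i == j else 0.0
            stress[i][j] = (2.0 / 3.0) * k * delta - 2.0 * nu_t * s_ij
    return stress


# Simple shear flow: only dU/dy is non-zero (illustrative values)
simple_shear = [[0.0, 10.0, 0.0],
                [0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]]
tau = boussinesq_stress(simple_shear, k=1.5, nu_t=0.01)

print("normal stresses:", tau[0][0], tau[1][1], tau[2][2])
print("shear stress uv:", tau[0][1])
```

### In simple shear this relation returns three identical normal stresses ($2k/3$ each), whereas DNS of sheared turbulence shows clearly unequal normal stresses. That gap is the anisotropy a scalar eddy viscosity cannot represent.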

## 1.2 Limitations of Eddy Viscosity Models (EVM)
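### While simple, Eddy Viscosity Models fail in complex scenarios because a single scalar eddy viscosity forces the Reynolds stress anisotropy to align with the mean strain rate, which cannot represent the directional (anisotropic) character of real turbulence. This assumption breaks down in flows with: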
#### - Streamline Curvature: e.g., flow inside a cyclone or U-bend.
#### - Flow Separation: e.g., stalling airfoils.
#### - System Rotation: e.g., turbomachinery or geophysical flows.
#### - Secondary Flows: Flows driven by stress anisotropy, such as in square ducts.
### To capture these phenomena, we must move beyond simple viscosity models to Reynolds Stress Transport Models (RSTM).

---

# 2. Reynolds Stress Transport Modeling (RSTM)
## 2.1 The Concept
### Unlike EVMs, which approximate the turbulent stress ($\overline{u_i u_j}$) using a scalar viscosity, RSTM solves a separate transport equation for each independent component of the Reynolds stress tensor (six in three dimensions, since the tensor is symmetric).
### This means RSTM explicitly computes how turbulent fluctuations transport momentum in different directions. It naturally accounts for:
#### - Directional effects of Reynolds stresses (Anisotropy).
#### - Effects of flow stratification and buoyancy.
#### - The "return to isotropy" in decaying turbulence.
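### Anisotropy is usually quantified with the normalized anisotropy tensor $b_{ij} = \overline{u_i u_j}/(2k) - \delta_{ij}/3$, with $k = \frac{1}{2}\overline{u_i u_i}$. The sketch below assumes that standard definition; the function name `anisotropy` and the sample stress values are hypothetical.

```python
def anisotropy(stress):
    """Return (k, b) for a 3x3 Reynolds stress tensor given as nested lists."""
    # Turbulent kinetic energy is half the trace of the stress tensor
    k = 0.5 * (stress[0][0] + stress[1][1] + stress[2][2])
    b = [[stress[i][j] / (2.0 * k) - (1.0 / 3.0 if i == j else 0.0)
          for j in range(3)]
         for i in range(3)]
    return k, b


# Illustrative shear-flow-like stress state: uu > ww > vv, negative uv
reynolds_stress = [[2.0, -0.5, 0.0],
                   [-0.5, 0.6, 0.0],
                   [0.0, 0.0, 1.0]]
k_example, b_example = anisotropy(reynolds_stress)

# Isotropic turbulence has equal normal stresses and no shear stress
k_iso, b_iso = anisotropy([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0],
                           [0.0, 0.0, 1.0]])

print("k =", k_example)
print("b_uv =", b_example[0][1])
```

### By construction $b_{ij}$ has zero trace and vanishes for isotropic turbulence, so any non-zero entry measures a departure from isotropy.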

## 2.2 The Reynolds Stress Transport Equation
### The evolution of the Reynolds stress tensor is governed by the following transport equation:
### $$\underbrace{\frac{\partial \overline{u_i u_j}}{\partial t} + U_k \frac{\partial \overline{u_i u_j}}{\partial x_k}}_{\text{Convection}} = \underbrace{P_{ij}}_{\text{Production}} + \underbrace{\phi_{ij}}_{\text{Pressure-Strain}} - \underbrace{\epsilon_{ij}}_{\text{Dissipation}} - \underbrace{\frac{\partial T_{ijk}}{\partial x_k}}_{\text{Diffusive Transport}}$$
### Where the key terms are:
#### - Production ($P_{ij}$): The generation of turbulence by mean velocity gradients. This term is exact and requires no modeling.
#### $$P_{ij} = -\overline{u_k u_j}\frac{\partial U_i}{\partial x_k} - \overline{u_i u_k}\frac{\partial U_j}{\partial x_k}$$
#### - Dissipation ($\epsilon_{ij}$): The destruction of turbulence into heat by viscous forces.
#### - Pressure-Strain Correlation ($\phi_{ij}$): The redistribution of energy among the Reynolds stress components. This is the most critical and difficult term to model.
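### Because the production term is closed, it can be evaluated directly from the Reynolds stresses and the mean velocity gradients. The following sketch implements the $P_{ij}$ expression given above for a simple shear flow; the function name `production` and the numerical values are hypothetical.

```python
def production(stress, grad_u):
    """P_ij = -<u_k u_j> dU_i/dx_k - <u_i u_k> dU_j/dx_k (sum over k).

    stress[i][j] is the Reynolds stress <u_i u_j>,
    grad_u[i][k] is the mean velocity gradient dU_i/dx_k.
    """
    P = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            P[i][j] = -sum(stress[k][j] * grad_u[i][k]
                           + stress[i][k] * grad_u[j][k]
                           for k in range(3))
    return P


# Illustrative stress state and a simple shear with dU/dy = 10
stress_state = [[2.0, -0.5, 0.0],
                [-0.5, 0.6, 0.0],
                [0.0, 0.0, 1.0]]
shear = [[0.0, 10.0, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
P = production(stress_state, shear)

print("P_uu =", P[0][0])  # equals -2 <uv> dU/dy
print("P_uv =", P[0][1])  # equals -<vv> dU/dy
print("P_vv =", P[1][1], " P_ww =", P[2][2])
```

### In simple shear, production feeds only the streamwise normal stress and the shear stress: $P_{vv} = P_{ww} = 0$. The wall-normal and spanwise components receive their energy through the pressure-strain term, which is why that term matters so much.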

---

# 3. The "Closure Problem" & Machine Learning Solution
## 3.1 The Pressure-Strain Correlation ($\phi_{ij}$)
### The pressure-strain term does not change the total energy of the turbulence (its trace is zero in incompressible flow), but it dictates how energy moves between components, for example, transferring energy from the streamwise direction to the wall-normal direction.
### $$\phi_{ij} = \overline{ \frac{p}{\rho} \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right) }$$
### Standard physics-based models (like the LRR or SSG models) approximate $\phi_{ij}$ with low-order tensor expansions in the Reynolds stress anisotropy and the mean velocity gradients. However, these algebraic models often fail to satisfy realizability constraints or capture non-local flow physics.
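### To make the redistributive role concrete, the sketch below evaluates Rotta's linear return-to-isotropy model, $\phi_{ij} = -C_1 \frac{\epsilon}{k}\left(\overline{u_i u_j} - \frac{2}{3} k \delta_{ij}\right)$, which is the slow part used in LRR-type closures. It is only one ingredient of a full pressure-strain model, and the constant $C_1 = 1.8$, the function name `rotta_slow_term`, and the input values are assumptions chosen for illustration.

```python
def rotta_slow_term(stress, eps, c1=1.8):
    """Rotta's linear return-to-isotropy model for the slow pressure-strain term."""
    k = 0.5 * (stress[0][0] + stress[1][1] + stress[2][2])
    phi = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            delta = 1.0 if i == j else 0.0
            phi[i][j] = -c1 * (eps / k) * (stress[i][j] - (2.0 / 3.0) * k * delta)
    return phi


# Illustrative anisotropic stress state and dissipation rate
stress_in = [[2.0, -0.5, 0.0],
             [-0.5, 0.6, 0.0],
             [0.0, 0.0, 1.0]]
phi = rotta_slow_term(stress_in, eps=0.9)

trace = phi[0][0] + phi[1][1] + phi[2][2]
print("phi_uu, phi_vv, phi_ww =", phi[0][0], phi[1][1], phi[2][2])
print("trace =", trace)
```

### The trace vanishes, so no net energy is created or destroyed. The model drains the largest normal stress ($\phi_{uu} < 0$) and feeds the smaller ones ($\phi_{vv}, \phi_{ww} > 0$), pushing the turbulence back toward isotropy.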

## 3.2 Data-Driven Turbulence Modeling
### Recent research focuses on using high-fidelity DNS data to "learn" these complex terms directly, bypassing the limitations of algebraic assumptions. By training Machine Learning algorithms on DNS data, we can build a surrogate model that predicts $\phi_{ij}$ from local flow features.
### Why Machine Learning?
#### - Universality: ML models trained on massive datasets can potentially generalize to new flow types better than tuned coefficients.
#### - Non-Linearity: Neural Networks and Gradient Boosted Trees can capture complex, non-linear dependencies that linear algebraic models miss.

---

# 4. Project Methodology & Objectives
## 4.1 Functional Mapping
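### We aim to approximate the shear component of the pressure-strain term, $\phi_{uv}$, as a function of local flow quantities: the Reynolds stress anisotropy $b_{uv}$, the dissipation rate $\epsilon$, the mean velocity gradient $dU/dy$, and the turbulent kinetic energy $k$: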
### $$\phi_{uv} = f(b_{uv}, \epsilon, \frac{dU}{dy}, k)$$
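### The four inputs can be assembled from wall-normal profiles. The sketch below is one possible way to do this, assuming the profiles are available as plain lists; the helper names `gradient` and `build_features` and the sample values are hypothetical, and the real DNS files may need different preprocessing.

```python
def gradient(values, coords):
    """First derivative on a 1-D grid: central differences inside, one-sided at the ends."""
    n = len(values)
    grad = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        grad.append((values[hi] - values[lo]) / (coords[hi] - coords[lo]))
    return grad


def build_features(y, U, uv, k, eps):
    """One row [b_uv, eps, dU/dy, k] per grid point."""
    dUdy = gradient(U, y)
    rows = []
    for i in range(len(y)):
        # Off-diagonal anisotropy component: b_uv = <uv> / (2k)
        b_uv = uv[i] / (2.0 * k[i])
        rows.append([b_uv, eps[i], dUdy[i], k[i]])
    return rows


# Synthetic profiles for illustration only (not DNS data)
y = [0.0, 0.5, 1.0, 1.5]
U = [0.0, 0.25, 1.0, 2.25]      # U = y**2
uv = [-0.1, -0.2, -0.3, -0.4]
k = [1.0, 1.0, 2.0, 2.0]
eps = [0.5, 0.4, 0.3, 0.2]

features = build_features(y, U, uv, k, eps)
for row in features:
    print(row)
```

### Each row of `features` is one training sample; the matching target would be the value of $\phi_{uv}$ at the same grid point.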

## 4.2 Algorithms Evaluated
### We will implement and compare three distinct regression algorithms proposed in the study:

#### 1. Artificial Neural Networks (MLP): Modeled using PyTorch. Good for smooth function approximation but prone to overfitting on small data.

#### 2. Random Forests (RF): Modeled using Scikit-learn. Excellent for checking feature importance.

#### 3. Gradient Boosted Decision Trees (GBDT): Modeled using XGBoost. A strong and widely used choice for tabular regression; each new tree is fit to the residual errors of the current ensemble.
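### The idea of correcting residual errors sequentially can be shown without any library. The sketch below is a toy gradient boosting loop for squared error, using one-split regression stumps on a single feature. It is not XGBoost, which adds regularization, second-order information, and multi-feature trees; all names and data here are hypothetical.

```python
def fit_stump(x, residual):
    """Best single-split regression stump on 1-D inputs (least squares)."""
    best = None
    order = sorted(set(x))
    for lo, hi in zip(order, order[1:]):
        threshold = 0.5 * (lo + hi)
        left = [r for xi, r in zip(x, residual) if xi <= threshold]
        right = [r for xi, r in zip(x, residual) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Fit stumps to successive residuals; return the model and the training MSE history."""
    base = sum(y) / len(y)
    prediction = [base] * len(y)
    stumps, history = [], []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the squared-error loss
        residual = [yi - pi for yi, pi in zip(y, prediction)]
        stump = fit_stump(x, residual)
        stumps.append(stump)
        prediction = [pi + learning_rate * predict_stump(stump, xi)
                      for xi, pi in zip(x, prediction)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, prediction)) / len(y))
    return base, stumps, history


# Toy 1-D regression problem for illustration
x_train = [i / 19 for i in range(20)]
y_train = [xi ** 2 for xi in x_train]
base, stumps, history = boost(x_train, y_train)

print("training MSE after round 1:", history[0])
print("training MSE after round 50:", history[-1])
```

### Each round fits a weak learner to what the current ensemble still gets wrong, and the learning rate scales how much of that correction is accepted.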

## 4.3 Novel Contributions of this Implementation
### While adhering to the physics of the original paper, this project introduces modern engineering practices:


#### - Bayesian Optimization with Optuna: Replacing manual hyperparameter tuning with an automated, efficient Bayesian search to find optimal model architectures.
#### - Scalable Implementation: Utilizing XGBoost for industry-standard speed and scalability.
#### - Out-of-Distribution Testing: Rigorous validation against Couette Flow to ensure the model has learned physics, not just memorized the Channel Flow training data.

---

# 5. Dataset Description
### The project utilizes high-fidelity Direct Numerical Simulation (DNS) data from the Oden Institute Turbulence File Server.
## Training Data: Turbulent Channel Flow
#### - Physics: Pressure-driven flow between two infinite parallel plates.
#### - Re Numbers ($Re_\tau$): 550, 1000, 2000, 5200.
#### - Usage: Used to teach the model the fundamental relationship between anisotropy, shear, and pressure-strain.
## Testing Data: Turbulent Couette Flow
#### - Physics: Shear-driven flow caused by moving walls (no pressure gradient).
#### - Re Number ($Re_\tau$): $\approx 500$ (Domain $L_x = 100\pi$).
#### - Usage: A "blind test" to check if the model generalizes to a flow driven by different physics.
## Training Strategy (Leave-One-Out)
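### We perform four distinct training experiments to assess robustness. Each one holds out a single channel-flow Reynolds number for testing and trains on the other three, as defined in Table 1: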

In [1]:
import pandas as pd    # for tabular data exploration

In [2]:
# Load the leave-one-out case definitions (Table 1) and index them by case number
Table_1_df = pd.read_csv("../Tables/Table 1.csv")
Table_1_df.set_index("Case", inplace=True)
Table_1_df

| Case | Training Set ($Re_\tau$) | Testing Set ($Re_\tau$) |
|---|---|---|
| 1 | 550, 1000, 2000 | 5200 |
| 2 | 550, 1000, 5200 | 2000 |
| 3 | 550, 2000, 5200 | 1000 |
| 4 | 1000, 2000, 5200 | 550 |
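### The four cases in Table 1 can also be generated programmatically rather than read from a file. The sketch below reproduces the same leave-one-out splits; the function name `leave_one_out_splits` and the dictionary layout are hypothetical choices.

```python
RE_TAU_CASES = [550, 1000, 2000, 5200]


def leave_one_out_splits(cases):
    """Hold out one Reynolds number per case, starting with the largest (as in Table 1)."""
    splits = []
    for case_id, held_out in enumerate(reversed(cases), start=1):
        train = [re_tau for re_tau in cases if re_tau != held_out]
        splits.append({"case": case_id, "train": train, "test": held_out})
    return splits


splits = leave_one_out_splits(RE_TAU_CASES)
for split in splits:
    print(split["case"], split["train"], split["test"])
```

### Generating the splits in code keeps the experiment loop and the table consistent if another Reynolds number is added later.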


---

# End of Notebook

## Further explanation of the turbulence modelling techniques, the physics behind them, and the Machine Learning algorithms and their hyperparameter tuning techniques can be found in the original research paper (path - '../Research Paper/Research Paper - J.P Panda and H.V Warrior.pdf')

---