### Introduction & Vision
Welcome to this **prototype notebook**, which illustrates a **foundational workflow** for **Physics-AI** modeling, using **Darcy Flow** (and an **FNO** approach) as a **concrete example**. The goals of this notebook are **multi-faceted**:

1. **Showcase a workflow** for **Physics-AI training** with **Darcy Flow** data, emphasizing how easily we can integrate **FNO** models in **NVIDIA Modulus** or similar frameworks.  
2. **Demonstrate a general-purpose AutoML approach**—one that systematically searches for optimal **hyperparameters** (learning rate, channels, modes, etc.) across *any* PDE or neural operator.  
3. **Preview Active Learning** as a complementary strategy to guide data acquisition based on model **uncertainty**. (In this notebook, I’ll outline how to do it; the *actual active learning code* is placed in a second notebook, [`darcy_active_learning.ipynb`](darcy_active_learning.ipynb).)

Beyond these **current technical** accomplishments, this notebook also hints at **inspirational** next steps—extending from a **single PDE** toward a broad **Physics-AI solution** that includes:

#### I. Technical (Software Engineering)
- **Ontology-based data transformations**: A structured, automated pipeline for bridging different PDE data shapes (mesh ↔ grid ↔ point cloud). It reduces manual conversions and helps unify HPC outputs (e.g., AMReX) with ML-ready arrays.  
- **Ontology engine**: A framework that **detects** dataset geometry (e.g., uniform grid vs. unstructured mesh) and picks the right operator or transformation step. This paves the way for “one-click” PDE model building, especially when integrated with AutoML.  
- **AutoML for candidate selection**: Not just hyperparameters, but also *which* neural operator (FNO, AFNO, WNO, PINNs, etc.) is best for a given domain geometry or PDE. This accelerates experimentation by automatically ranking model architectures.  
- **Advanced workflow, pipelines, and training**: Encompasses **HPC synergy** (ingesting large or partially refined HPC data), orchestrated pipelines (e.g., Kedro, Airflow), and **accelerated PDE surrogate training** (multi-GPU, distributed). Together, these let us efficiently handle time-evolving PDE snapshots and large-scale parameter sweeps in real engineering environments.

#### II. Model & Physics Content
- **Models: WNO, NUFNO, etc.**:
  Beyond standard FNO, options like **Wavelet Neural Operator (WNO)** capture sharper local features via wavelet transforms, while **Non-Uniform FNO (NUFNO)** accommodates partial refinements or semi-structured domains. These advanced architectures can improve accuracy without a complete shift to graph-based methods.

- **Customized operator designs**:
  Domain-specific enhancements—e.g., a plasma-tailored FNO or specialized boundary treatments—boost performance on PDEs with unique constraints (sharp separatrices, anisotropy). This ensures surrogates match real-world physics more precisely than generic operators.

- **Ensembles / partial refinement & local closures**:
  In HPC settings with variable mesh refinement or subgrid phenomena, a hybrid approach (e.g., FNO + DiffusionNet) can handle global PDE patterns while focusing local operators on high-resolution patches. This preserves large-scale coverage and detail where it matters most.

- **Multi-scale patterns**:
  Many PDEs combine broad wave modes with fine-edged phenomena (e.g., subgrid turbulence). Leveraging wavelet-based or ensemble architectures means each scale can be tackled effectively—ensuring no critical features get lost.

- **Multi-physics regimes**:
  Real engineering tasks often blend multiple physics (e.g., fluid–structure interaction, electromagnetic–thermal coupling). By composing or extending neural operators for each sub-physics domain, we can solve coupled PDE sets under one pipeline.

- **Physics-Informed Loss (added to current physics-informed architecture)**:
  Incorporating PDE constraints directly into the training objective ensures surrogates adhere to known physics. This is invaluable for **inverse problem solving** (where data can be sparse) and for overall stability/robustness when extrapolating to new parameter regimes.

#### III. Downstream Applications
- **Inverse Problem Solving**:
  Quickly invert PDE relationships to find **which input conditions** yield a desired output (e.g., a permeability field that produces a target pressure distribution). This can substantially shorten design cycles compared to iterative HPC solves.

- **Optimization**:
  Plug surrogates into parametric optimization loops (shape optimization, operational parameter tuning). The surrogate’s fast inference replaces expensive HPC calls at each iteration, speeding up design exploration.

- **Deployment, HPC workflows, and MLOps**:
  Once the model is trained, **deploy** it alongside HPC codes for real-time PDE updates and for controlling or monitoring processes. MLOps features (monitoring, versioning) support reliability, traceability, and easy model updates in production or research HPC clusters.

I’m calling it a **“kick-off”** project because, even though it’s built around Darcy Flow and FNO, the underlying design can **readily scale**—both in terms of PDE complexity (multi-scale turbulence, advanced HPC data) and in terms of **workflow** (AutoML, HPC integration, interactive active learning, etc.). By adopting these modular components, we set the stage for a future in which **Physics-AI** model development becomes more automated, adaptable, and robust—serving a wide range of scientific and engineering challenges.

## Table of Contents
1. [00_Generate_Data](#00_generate_data)  
   - [00_01 Darcy Flow Simulation + Data Descriptor Creation](#00_01-darcy-flow-simulation--data-descriptor-creation)  
     - Demonstrates **generating synthetic Darcy flow data** (via `Darcy2D`) and creating a **data descriptor**. This lays the groundwork for PDE data ingestion, transformations, and future AutoML usage.

2. [01_Build_Surrogate_Model](#01_build_surrogate_model)  
   - [01_00 AutoMLCandidateModelSelection](#01_00-automlcandidatemodelselection)  
     - Introduces how we pick an operator (e.g., FNO or AFNO) for potential multi-model searches.

   - [01_01 Data Loading and (Optional) Data Transformation](#01_01-data-loading-and-optional-data-transformation)  
     - Covers loading raw `.pt` files or synthetic data, plus transformations like normalization or boundary labeling.  
       - [01_01_01 LoadRawData](#01_01_01-loadrawdata)  
         - Shows how `.pt` data is read, with minimal Exploratory Data Analysis (EDA).  
       - [01_01_03 TransformRawData](#01_01_03-transformrawdata)  
         - Applies any coordinate expansions, normalization, or shape fixes.  
       - [01_01_04 Preprocessing](#01_01_04-preprocessing)  
         - Optional steps for data quality checks or outlier removal.  
       - [01_01_05 FeaturePreparation](#01_01_05-featurepreparation)  
         - Final feature engineering, e.g., boundary-channel additions.

   - [01_02 Model Definition](#01_02-model-definition)  
     - **Implements** the PDE surrogate networks (e.g., `FNOWithDropout`, AFNO). Explains class architecture and relevant config fields.

   - [01_03 Model Factory](#01_03-model-factory)  
     - Demonstrates a single function `get_model(cfg)` that returns a chosen operator based on `model_name` in the config.

   - [01_04 Configuring Hyperparameters](#01_04-configuring-hyperparameters)  
     - Discusses reading or overriding hyperparams (Fourier modes, widths, learning rate, etc.) from `config.yaml`. Also references HPC or local usage.

   - [01_05 Model Training Loop](#01_05-model-training-loop)  
     - Outlines the core training logic: optimizer, loss, epoch iteration, logging (potentially with MLFlow).

   - [01_06 Model Training Execution](#01_06-model-training-execution)  
     - **Brings it together**: builds a `model`, obtains a `dataloader`, and runs the main training loop. For example:
       ```python
       model = get_model(cfg)
       train_loader = get_darcy_data_loader(cfg)
       final_val_loss = run_modulus_training_loop(cfg, model, train_loader)
       ```
     - Presents how we might do single-run or multi-model iteration.

   - [01_07 AutoML and Hyperparameter Tuning](#01_07-automl-and-hyperparameter-tuning)  
     - Demonstrates **Optuna** or similar libraries for PDE hyperparameter search (e.g., `modes`, `width`, `depth`, `lr`). Also covers multi-model tuning (FNO vs. AFNO).

   - [01_08 Visualizing Performance and Results](#01_08-visualizing-performance-and-results)  
     - Shows how to **plot** training/validation curves or produce PDE field comparisons (predicted vs. ground truth). Possibly lists best trials from AutoML.

3. [Offline Active Learning (Short Overview)](#offline-active-learning-short-overview)
   - **Note**: Active Learning steps (MC-Dropout for uncertain PDE samples) are covered in a separate notebook. We only summarize here.  
   - [Active Learning Notebook](darcy_active_learning.ipynb) — The second file demonstrates:
     1. Loading a **dropout-enabled** operator,
     2. Running multiple forward passes for uncertainty,
     3. Selecting top-K uncertain PDE inputs,
     4. (Optionally) saving them for partial retraining or HPC PDE solves.

> *If you only need to see the AL approach, jump directly to [Active Learning Notebook](darcy_active_learning.ipynb).* This first notebook focuses on data generation, model building, and AutoML. 

### 01_00 AutoMLCandidateModelSelection

In this section, we load our dataset descriptor file (`data_desc.json`) and define a few example model descriptors—such as one for FNO and another for DiffusionNet. Then, by calling our `automl_candidate_model_selection(...)` function, we filter out which models are *compatible* with the dataset based on dimensions, geometry type, and other metadata from the descriptor.

We'll then save the chosen candidates to a JSON file for subsequent steps in the pipeline.
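As a rough illustration of the compatibility filter described above, the sketch below matches a dataset descriptor against a couple of model descriptors. The field names (`dimension`, `geometry_type`, `supported_dims`, `accepted_geometries`), the descriptor values, and the function name `select_candidates` are assumptions made for this example; the project's actual `automl_candidate_model_selection(...)` and the `data_desc.json` schema may differ.

```python
import json

# Hypothetical dataset descriptor; the real data_desc.json schema may differ.
data_desc = {"dimension": 2, "geometry_type": "grid", "uniform": True}

# Hypothetical model descriptors listing what each operator can consume.
model_descs = [
    {"name": "FNO", "supported_dims": [1, 2, 3], "accepted_geometries": ["grid"]},
    {"name": "DiffusionNet", "supported_dims": [3],
     "accepted_geometries": ["mesh", "point_cloud"]},
]


def select_candidates(data_desc, model_descs):
    """Return (model_name, candidate_key) pairs for compatible models."""
    selected = []
    for desc in model_descs:
        dims_ok = data_desc["dimension"] in desc["supported_dims"]
        geom_ok = data_desc["geometry_type"] in desc["accepted_geometries"]
        if dims_ok and geom_ok:
            # Keys are numbered in the order candidates are accepted.
            selected.append((desc["name"], f"candidate{len(selected)}"))
    return selected


candidates = select_candidates(data_desc, model_descs)
# Serialize for the next pipeline step; tuples become JSON arrays.
candidates_json = json.dumps(candidates)
print(candidates_json)
```

With these made-up descriptors only the grid-based operator passes the filter, so a single candidate is written out.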

### 01_01 Data Loading and (Optional) Data Transformation

In this section, we **import and organize** the raw PDE data files (e.g., `.pt` files) produced by the previous step (“00_Generate_Data”). We’ll place them in a new folder such as `01_01_LoadRawData` or apply a minimal transform if necessary (like renaming, verifying shapes, or normalizing file structures).

We rely on simple helper functions from **`src/data_loading.py`**, which provides:

- **`copy_raw_data(src_folder, dst_folder)`** to systematically copy `.pt` files from a source directory to the target folder in our pipeline.
- (Optionally) **`maybe_load_pt_files(folder_path)`** to scan or validate `.pt` files before use.

These routines keep our data workflow **modular** and consistent with the approach introduced in “01_00 AutoMLCandidateModelSelection.” Later steps (like `TransformRawData`, `Preprocessing`, and `FeaturePreparation`) may build on these same I/O functions to tailor data for different candidate models or transformations.

#### 01_01_01 LoadRawData

In this subsection, we focus on **loading or copying** the raw `.pt` files produced during our earlier data generation step. Specifically:

- We **read** `.pt` files from the folder `data/00_Generate_Data/`.  
- We **create** a new folder, `data/01_01_LoadRawData/`, dedicated to housing these files for this stage in the pipeline.  
- We **preserve** (or replicate) the existing dataset descriptor (`data_desc.json`) so that the data structure and metadata remain consistent.  
- We conduct **minimal exploratory data analysis (EDA)** by loading at least one `.pt` file and confirming the presence/shapes of `"permeability"`, `"darcy"`, or other relevant keys.  

This ensures we keep a clear, structured approach: each step or stage in our data pipeline has its own folder, so we can better manage transformations, track candidate model logic, and later apply further preprocessing steps if needed. We will rely on the helper functions defined in `src/data_loading.py` (such as `copy_raw_data(...)`) to carry out these tasks with minimal code in the notebook.
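The copy step can be sketched with the standard library alone. This is only one plausible shape for `copy_raw_data`: the `pattern` and `descriptor` keyword arguments and the returned list of file names are assumptions added for illustration, and the version in `src/data_loading.py` may behave differently.

```python
import shutil
from pathlib import Path


def copy_raw_data(src_folder, dst_folder, pattern="*.pt", descriptor="data_desc.json"):
    """Copy files matching `pattern` (plus the descriptor, if present) to dst_folder.

    Returns the sorted list of copied data file names.
    """
    src, dst = Path(src_folder), Path(dst_folder)
    dst.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(src.glob(pattern)):
        shutil.copy2(path, dst / path.name)  # copy2 also preserves timestamps
        copied.append(path.name)

    # Keep the dataset descriptor next to the data so metadata stays consistent.
    desc_path = src / descriptor
    if desc_path.is_file():
        shutil.copy2(desc_path, dst / descriptor)
    return copied
```

In the notebook this would be called along the lines of `copy_raw_data("data/00_Generate_Data", "data/01_01_LoadRawData")`, followed by loading one of the copied files and checking the shapes stored under keys such as `"permeability"` and `"darcy"`.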


#### 01_01_03 TransformRawData

In this section, we take the **candidate models** from our previous step (where we determined which architectures are compatible with the dataset). Each candidate (e.g., `"candidate0"`, `"candidate1"`) gets its own subfolder, in which we:

1. **Copy** the raw `.pt` data batches from the previous step (01_01_LoadRawData), and
2. Optionally **transform** them (e.g., add boundary channels, adjust resolution, or other PDE-specific modifications).

Right now, we’ll keep it minimal—essentially creating a placeholder transform step. In a real pipeline, you might:
- **Re-grid** data to match each model’s expected resolution,
- **Add** boundary-condition channels,
- **Normalize** or augment the data in different ways per candidate.

We’ll demonstrate this by reading the JSON file listing our chosen candidates as model/key pairs (e.g., `[["FNO", "candidate0"], ...]`), then copying or performing a trivial transformation on each batch of `.pt` data. After this, each model’s training directory can remain distinct, simplifying further data preprocessing or feature engineering.
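A minimal sketch of this per-candidate step is shown below. The function name `transform_for_candidates`, its arguments, and the assumption that the candidates file holds a list of `[model_name, candidate_key]` pairs are all illustrative; the placeholder "transform" is a plain copy, as in the prototype.

```python
import json
import shutil
from pathlib import Path


def transform_for_candidates(candidates_json, src_folder, dst_root):
    """Give each candidate its own subfolder holding a copy of the raw batches.

    Returns a {candidate_key: model_name} mapping of the folders created.
    """
    candidates = json.loads(Path(candidates_json).read_text())
    created = {}
    for model_name, candidate_key in candidates:
        cand_dir = Path(dst_root) / candidate_key
        cand_dir.mkdir(parents=True, exist_ok=True)
        for path in sorted(Path(src_folder).glob("*.pt")):
            # Placeholder transform: a plain copy. Re-gridding or boundary
            # channels specific to `model_name` would replace this line.
            shutil.copy2(path, cand_dir / path.name)
        created[candidate_key] = model_name
    return created
```

Keeping one folder per candidate means a later model-specific transform only touches that candidate's copy of the data.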

#### 01_01_04 Preprocessing

Real-world PDE workflows often include additional data modifications before final model training. These steps might include:

- **Geometry augmentation** (e.g., random rotations or domain cropping).  
- **Cleaning** or **Filtering** out invalid or noisy samples (like NaNs or out-of-bounds values).  
- **Domain-specific** preprocessing (e.g., boundary labeling, domain parameterization).

However, in this prototype, we keep the preprocessing step **minimal**. Our goal here is simply to:

1. Copy the data from our **transformed** candidates (from the previous step) into a new “Preprocessing” folder.  
2. Potentially add a placeholder for any minor shape checks or corrections.

In a real production pipeline, this function (`do_preprocessing_for_candidates`) could be expanded to handle domain-specific transformations. Here, we illustrate only the basic structure and docstring, leaving advanced logic as a future exercise.
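To make the "cleaning" idea concrete, here is a small sketch that drops samples containing NaN or infinite values. It is not the body of `do_preprocessing_for_candidates`; the helper name `drop_invalid_samples` is hypothetical, and NumPy arrays of shape `(N, C, H, W)` stand in for the tensors stored in the `.pt` files.

```python
import numpy as np


def drop_invalid_samples(permeability, darcy):
    """Keep only samples whose input and target fields are entirely finite.

    Both arrays are assumed to share the same number of dimensions, with the
    sample (batch) index on axis 0.
    """
    # Reduce over every axis except the batch axis.
    axes = tuple(range(1, permeability.ndim))
    valid = np.isfinite(permeability).all(axis=axes) & np.isfinite(darcy).all(axis=axes)
    return permeability[valid], darcy[valid], valid


k = np.ones((3, 1, 4, 4))
u = np.ones((3, 1, 4, 4))
u[1, 0, 2, 2] = np.nan  # corrupt one target field

k_clean, u_clean, valid_mask = drop_invalid_samples(k, u)
print(valid_mask.tolist(), k_clean.shape)
```

Returning the boolean mask alongside the filtered arrays makes it easy to log how many samples were removed.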

#### 01_01_05 FeaturePreparation

In many PDE workflows, we may need additional feature engineering before the data is fully ready for model training. 
Examples include:

- **Adding boundary channels** that mark where the domain boundaries lie, so the model can distinguish boundary or interface regions.
- **Coordinate expansions** (e.g., adding \(x, y\) grids as separate input channels) for certain PDE operator methods.
- **Combining** multiple PDE fields into a single input tensor or rearranging channels for specialized architectures.

For our purposes, we keep feature preparation **minimal**—essentially copying the data 
to a new location to illustrate how you *could* add extra feature-engineering steps later 
(should your PDE problem require boundary masks, domain embedding, multi-physics channels, etc.).

We'll call a function from `src/feature_engineering.py`, like 
`prepare_features_for_candidates(...)`, which currently does minimal or no transformations, 
but can be extended to handle domain-specific feature expansions in the future.
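As an example of what such an extension might look like, the sketch below appends normalized \(x, y\) coordinate grids as two extra input channels. The helper `add_coordinate_channels` is hypothetical (it is not part of `src/feature_engineering.py` as described), it assumes a `(N, C, H, W)` layout on the unit square, and it uses NumPy arrays in place of the stored tensors.

```python
import numpy as np


def add_coordinate_channels(field):
    """Append normalized x and y grids to a (N, C, H, W) array.

    The result has shape (N, C + 2, H, W); channel C holds x (varies along
    the width axis) and channel C + 1 holds y (varies along the height axis).
    """
    n, _, h, w = field.shape
    ys, xs = np.meshgrid(np.linspace(0.0, 1.0, h), np.linspace(0.0, 1.0, w), indexing="ij")
    coords = np.stack([xs, ys])                     # (2, H, W)
    coords = np.broadcast_to(coords, (n, 2, h, w))  # same grid for every sample
    return np.concatenate([field, coords.astype(field.dtype)], axis=1)


permeability = np.zeros((3, 1, 4, 5), dtype=np.float32)
features = add_coordinate_channels(permeability)
print(features.shape)
```

If a model adds its own coordinate features internally, this step would be redundant for that candidate, which is one reason to keep feature preparation per candidate.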

### Conclusion: Data Pipeline Ready

We have successfully completed the **data pipeline** for our Darcy Flow project, including:

1. **Dataset Descriptor Logic**  
   - Loaded the PDE data descriptor (`data_desc.json`) to verify dimension, geometry type, uniformity, and other required fields.  
   - Checked candidate model compatibility through the `automl_candidate_model_selection(...)` logic.

2. **Raw Data Loading**  
   - Copied `.pt` files from `data/00_Generate_Data` to `data/01_01_LoadRawData`, preserving the original descriptor.  
   - Performed minimal Exploratory Data Analysis (EDA) to confirm shapes and contents.

3. **Transforming Raw Data**  
   - Created subfolders for each candidate (e.g., `candidate0`, `candidate1`) to store transformed data in `data/01_03_TransformRawData`.  
   - Demonstrated how to adapt or augment data further if needed.

4. **Preprocessing Steps**  
   - Showed a placeholder for tasks like geometry augmentation or data cleaning, placing output in `data/01_04_Preprocessing`.

5. **Feature Preparation**  
   - Demonstrated how to finalize input features (e.g., adding boundary channels or coordinate expansions) in `data/01_05_FeaturePreparation`.  
   - Verified results with a quick shape and visualization check.

**Outcome**  
- At this stage, **all data is prepared** for the next steps, whether a single-model training or a more complex *AutoML* hyperparameter tuning approach. Our directories now look something like this:

```
data/
 ├─ 00_Generate_Data/
 ├─ 01_00_AutoMLCandidateModelSelection/
 ├─ 01_01_LoadRawData/
 ├─ 01_03_TransformRawData/
 ├─ 01_04_Preprocessing/
 └─ 01_05_FeaturePreparation/
```

### 01_02 Model Definition

In this section, we introduce our primary PDE surrogate model definitions. We focus on two main variants:

1. **FNOWithDropout** – A custom subclass of Modulus’s Fourier Neural Operator (FNO) that injects dropout. This allows us to do Monte Carlo Dropout–based uncertainty estimation or simply add a regularization mechanism.
2. **AFNO** – NVIDIA Modulus’s Adaptive Fourier Neural Operator, which uses an adaptive frequency gating approach for improved spectral flexibility.

Both surrogates rely on hyperparameter definitions stored in our `config.yaml` under `cfg.arch.fno.*` or `cfg.arch.afno.*`. By default, we’ll pull settings like `in_channels`, `out_channels`, `latent_channels`, `drop` (dropout rate), and so on directly from `config.yaml`. You can override these values in the notebook if needed—just edit the `cfg` object before creating the models.

We’ll keep the actual model classes (and any helper functions) in `src/models.py` (or sub-files like `fno_dropout.py`, `afno.py`), each thoroughly documented with docstrings. Then, in the next cells, we’ll show how to use these classes in conjunction with the config fields.

### 01_03 Model Factory

This section focuses on **merging our user configuration** (especially the field `cfg.model_name`) with the model definitions created in “01_02 Model Definition.” By doing so, we can **automate** which PDE surrogate to build—be it an FNO-based model, AFNO, or a future extension (like a PINN or DiffusionNet). 

**Why a Factory?** It lets us keep a **single** entry point (`get_model(cfg)`), which reads the relevant parameters (`cfg.arch.fno.*`, `cfg.arch.afno.*`, etc.) and returns the correct PyTorch module. This modular approach also makes it straightforward to **add** new model variants (e.g., a different neural operator) without changing the notebook workflow. 

In the following steps, we’ll:
1. Create a new file, `model_factory.py`, that defines `get_model(cfg)` (with docstrings).
2. Demonstrate how we **import** and **use** this factory function in the notebook.
3. Confirm it works by instantiating a model and optionally running a quick shape check.

This pattern helps maintain a **clean separation** between model definitions and the logic that decides **which** model to instantiate—making the pipeline easier to scale and adapt for new PDE surrogates.
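The factory pattern itself can be sketched without any deep-learning dependency. In the example below, the placeholder classes, the `MODEL_REGISTRY` dictionary, and the use of `SimpleNamespace` as a stand-in for the config object are all assumptions for illustration; the real `get_model(cfg)` in `model_factory.py` would return the actual FNO or AFNO modules.

```python
from types import SimpleNamespace


class PlaceholderFNO:
    """Stand-in for the real FNO class; it only stores its constructor arguments."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class PlaceholderAFNO(PlaceholderFNO):
    """Stand-in for the real AFNO class."""


# Maps cfg.model_name to (constructor, name of the cfg.arch sub-tree).
MODEL_REGISTRY = {
    "FNO": (PlaceholderFNO, "fno"),
    "AFNO": (PlaceholderAFNO, "afno"),
}


def get_model(cfg):
    """Build the model named by cfg.model_name from its cfg.arch.<key> settings."""
    try:
        constructor, arch_key = MODEL_REGISTRY[cfg.model_name]
    except KeyError:
        raise ValueError(f"Unknown model_name: {cfg.model_name!r}") from None
    arch_cfg = getattr(cfg.arch, arch_key)
    return constructor(**vars(arch_cfg))


cfg = SimpleNamespace(
    model_name="FNO",
    arch=SimpleNamespace(
        fno=SimpleNamespace(in_channels=1, out_channels=1, num_fno_modes=12)
    ),
)
model = get_model(cfg)
print(type(model).__name__, model.kwargs)
```

Adding a new operator then amounts to one more registry entry, and the notebook code that calls `get_model(cfg)` stays unchanged.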

### 01_04 Configuring Hyperparameters

In this section, we outline how to configure the hyperparameters for our PDE surrogate models. 
Recall that we store default values (like `epochs`, `learning_rate`, `batch_size`, etc.) in our
[`config.yaml`](./config.yaml). 

For instance, here are a few default hyperparameters you might see in that file:

| Hyperparameter           | Default Value | Description / Notes                          |
|--------------------------|---------------|----------------------------------------------|
| `training.epochs`        | 10            | Number of training epochs                    |
| `training.lr`            | 1e-3          | Initial learning rate for the optimizer      |
| `training.batch_size`    | 16            | Mini-batch size for training loops           |
| `arch.fno.num_fno_modes` | 12            | Number of Fourier modes (FNO-specific)       |
| `arch.afno.drop`         | 0.1           | Dropout rate for AFNO gating (AFNO-specific) |

**Overriding Hyperparams Locally**  
You can update these hyperparameters within the notebook before training or tuning. For example:
```python
cfg.training.lr = 5e-4
cfg.training.epochs = 30
print("Updated training config:", cfg.training)
```

**Using MLFlow**  
We also demonstrate how to log hyperparameters to MLFlow, so each run’s configuration is 
stored alongside its metrics and artifacts. In a typical flow, you might do:

```python
import mlflow

# The context manager closes the run even if training raises an exception.
with mlflow.start_run(run_name="Experiment_FNO"):
    # log hyperparams (helper defined in this project)
    log_hyperparams_mlflow(cfg)

    # proceed with training...
```

In subsequent cells, we’ll show how to integrate these hyperparameters into the training loop, 
as well as how to override them for AutoML or HPC use cases if you wish. 
This approach ensures a **reproducible** pipeline—where each run can be traced back 
to its exact configuration and settings.
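One detail worth illustrating is how a nested configuration becomes a flat set of logged parameters. The sketch below is not the project's `log_hyperparams_mlflow`; the helper name `flatten_config` is hypothetical, and it assumes the config has already been converted to a plain nested `dict`.

```python
def flatten_config(cfg, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} pairs suitable for logging."""
    flat = {}
    for key, value in cfg.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-trees, extending the dotted prefix.
            flat.update(flatten_config(value, name))
        else:
            flat[name] = value
    return flat


cfg = {
    "training": {"epochs": 10, "lr": 1e-3, "batch_size": 16},
    "arch": {"fno": {"num_fno_modes": 12}},
}
params = flatten_config(cfg)
print(params)
```

Inside an active run, the resulting dictionary could be passed to `mlflow.log_params(params)`, which keeps the dotted names aligned with the keys used in `config.yaml`.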

### 01_05 Model Training Loop

In this section, we implement a **generic PDE training loop** that references our **configuration parameters** (like epochs, learning rate, batch size, etc.) from `config.yaml`. This training loop can be used for:

- **Single-Run Training**: Train a single model with a chosen set of hyperparameters (e.g., an FNO or AFNO).
- **Multi-Run/AutoML** scenarios: Called multiple times with different hyperparameter overrides for hyperparameter tuning (we’ll see this usage in a later section).

We incorporate:
- **Progress Bars** with `tqdm`, to get live feedback on training progress (especially helpful in notebooks).
- **MLFlow Logging** (optional), so each epoch’s train and validation loss is recorded for future analysis.
- **Device Handling** (CPU vs. GPU via a `device` parameter).

If you’re running on **HPC or distributed** environments, you may want to disable the tqdm progress bars (for performance/logging reasons) and/or integrate distributed managers from Modulus or PyTorch. We’ll point out where those hooks go, but keep them minimal for this prototype.

Below, we’ll demonstrate how to use our training loop, pass in a config object, and see the relevant progress bar and MLFlow logs.
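For orientation, a stripped-down version of such a loop might look like the following. This is a sketch under several assumptions rather than the notebook's `run_modulus_training_loop`: it uses plain PyTorch with Adam and an MSE loss, a 1x1 convolution stands in for the neural operator, the data are synthetic, and the progress bar and MLFlow logging are left out to keep it short.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, train_loader, val_loader, epochs=3, lr=1e-3, device="cpu"):
    """Train with Adam + MSE and return a list of (train_loss, val_loss) per epoch."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    history = []
    for _ in range(epochs):
        model.train()
        train_total = 0.0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            train_total += loss.item() * x.size(0)  # weight by batch size

        model.eval()
        val_total = 0.0
        with torch.no_grad():
            for x, y in val_loader:
                x, y = x.to(device), y.to(device)
                val_total += loss_fn(model(x), y).item() * x.size(0)

        history.append((train_total / len(train_loader.dataset),
                        val_total / len(val_loader.dataset)))
    return history


torch.manual_seed(0)
x = torch.randn(32, 1, 8, 8)  # synthetic "permeability" fields
y = 2.0 * x                   # synthetic target the stand-in model can fit
loader = DataLoader(TensorDataset(x, y), batch_size=8)
model = nn.Conv2d(1, 1, kernel_size=1)  # stand-in for FNO/AFNO
history = train(model, loader, loader, epochs=5, lr=0.05)
print(history[0], history[-1])
```

Returning the per-epoch history keeps the loop reusable: a single run can plot it, while a tuning run can read only the final validation loss.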

### 01_06 Model Training Execution

In this section, we bring together all of the moving parts from our pipeline:
- **Data pipeline**: The raw data has been generated, transformed, and preprocessed in the earlier steps.
- **Model factory**: We can instantiate our chosen model (e.g., FNO or AFNO) using the config-based logic from “01_03 Model Factory.”
- **Hyperparameter settings**: From “01_04 Configuring Hyperparameters,” we have default (or overridden) values for epochs, learning rate, batch size, and so on.
- **Training loop**: As defined in “01_05 Model Training Loop,” which handles epochs, mini-batches, loss calculation, optional validation, and more.

By **combining** these steps, we now present a **user-facing script or function** (`execute_training` or similar) that performs the **end-to-end** training process:
1. **Pull** the final data loader(s),  
2. **Create** or load the model,  
3. **Train** using our training loop,  
4. **Track** progress in a notebook progress bar (using `tqdm` by default),  
5. **Log** metrics to MLFlow (if desired),  
6. **Save** checkpoints according to the user’s preference (final, best, or every epoch).

We’ll also briefly show how to adjust or disable certain features for HPC usage—such as turning off the progress bar or hooking in distributed training if needed. The remainder of this section walks through a Python function and example usage in the notebook to carry out this consolidated training flow.

### 01_07 AutoML and Hyperparameter Tuning
In the previous section (“01_06 Model Training Execution”), we demonstrated how to train our PDE surrogate (FNO or AFNO) with a chosen set of hyperparameters—either from our default `config.yaml` or via simple overrides. Now, we turn to a more **systematic** approach: **hyperparameter tuning** or **AutoML**.

Here, we’ll leverage a search method (grid, random, or Bayesian; in Python, commonly via **Optuna**) to explore the hyperparameter space. Our `config.yaml` already contains default parameter values and additional fields (under `cfg.automl`) specifying **ranges** (e.g., Fourier modes from 8 to 20, learning rate from 1e-4 to 5e-3).

**MLFlow Logging**  
Just as in our normal training, we’ll integrate MLFlow to log each hyperparam trial’s configuration and final metrics. By doing so, we can easily compare many trials in a single, consolidated UI. 

**Progress Bars**  
For each trial, we can still rely on our PDE training loop’s `tqdm` progress bar, although for a large number of trials it may be practical to reduce the number of training epochs or the amount of training data to speed up each run.

---

**Key Points in This Section**
1. **Hyperparameter Range Setup**  
   We confirm or update the `config.yaml` sub-tree (`cfg.automl`) that defines the search space for FNO (e.g., `modes`, `width`, `depth`) and, if relevant, for AFNO (e.g., `drop`, `gating_strength`).

2. **AutoML Logic**  
   We’ll create or review a new file, `src/automl.py`, which contains code to parse those search ranges and define an **Optuna objective** function.

3. **Partial vs. Full Training**  
   In each trial, we might do a reduced set of epochs or data to expedite the search. Once the best params are found, we’ll do a **full** retraining using the discovered configuration.

4. **MLFlow**  
   We’ll log each trial’s hyperparams and final validation metrics under separate nested runs, so you can open MLFlow and compare them.

By the end of this section, you’ll have seen how to run multiple hyperparam search trials—**automatically** adjusting FNO or AFNO parameters—before picking the best discovered setup for a final training pass.
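The search loop can be illustrated with plain random search from the standard library, used here as a simple stand-in for Optuna. Everything in the sketch is an assumption for illustration: the `SEARCH_SPACE` layout, the helper names, and especially `toy_objective`, which replaces a short training run that would return a validation loss.

```python
import math
import random

# Hypothetical search space mirroring the ranges mentioned above.
SEARCH_SPACE = {
    "modes": ("int", 8, 20),
    "lr": ("log_float", 1e-4, 5e-3),
}


def sample_params(rng, space):
    """Draw one configuration: uniform integers, log-uniform floats."""
    params = {}
    for name, (kind, low, high) in space.items():
        if kind == "int":
            params[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
    return params


def random_search(objective, space, n_trials=20, seed=0):
    """Return (best_params, best_value, all_trials), minimizing `objective`."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = sample_params(rng, space)
        trials.append((params, objective(params)))
    best_params, best_value = min(trials, key=lambda trial: trial[1])
    return best_params, best_value, trials


def toy_objective(params):
    """Placeholder for a short training run that returns a validation loss."""
    return (params["modes"] - 12) ** 2 + (math.log10(params["lr"]) + 3.0) ** 2


best_params, best_value, trials = random_search(toy_objective, SEARCH_SPACE)
print(best_params, best_value)
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude. In the real pipeline, the objective would run a reduced training pass, and the best configuration would then be retrained in full.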

### 01_08 Visualizing Performance and Results

After training our PDE surrogate models (and possibly using AutoML to tune hyperparameters),
we now want to **examine** how well they perform. In this section, we will:

1. **Load** the training/validation metrics from our logs (or MLFlow, if enabled).
2. **Plot** these metrics (e.g., loss curves over epochs).
3. **Compare** model predictions to ground-truth solutions for a few test samples—especially
   valuable in Darcy flow, where we can visualize the predicted pressure fields vs. the true
   solution.
4. **Summarize** errors (e.g., MSE, absolute difference) across a set of test samples to
   get a sense of overall accuracy, variance, and potential failure cases.

We rely on the utility functions we placed in **`src/visualization.py`**:

- `plot_train_val_loss(...)`: For plotting training/validation loss curves.
- `plot_prediction_comparison(...)`: Side-by-side visualization of **input** (permeability),
  **predicted** (pressure), **ground truth** (pressure), and a simple **error map**.
- `plot_error_distribution(...)`: Quick histogram or boxplot of errors across many samples.
- `summarize_metrics_table(...)`: A small table summarizing results from multiple runs.

Finally, we’ll also **load** a saved model checkpoint (if we have one) or pick a final/best-epoch
checkpoint to run inference on sample PDE inputs. By the end, we should have a clear picture
of how our model is performing and any areas for improvement.
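For the error summary, one common choice is the per-sample relative L2 error. The sketch below shows that metric with NumPy; the helper names `relative_l2_error` and `summarize_errors` are hypothetical and are not the functions in `src/visualization.py`, and the arrays are synthetic.

```python
import numpy as np


def relative_l2_error(pred, target, eps=1e-12):
    """Per-sample relative L2 error ||pred - target|| / ||target||."""
    n = pred.shape[0]
    diff = np.linalg.norm((pred - target).reshape(n, -1), axis=1)
    ref = np.linalg.norm(target.reshape(n, -1), axis=1)
    return diff / (ref + eps)  # eps guards against an all-zero target


def summarize_errors(pred, target):
    """Aggregate per-sample errors and flag the worst sample for inspection."""
    errors = relative_l2_error(pred, target)
    return {
        "mean": float(errors.mean()),
        "std": float(errors.std()),
        "max": float(errors.max()),
        "worst_index": int(errors.argmax()),
    }


target = np.ones((2, 1, 4, 4))
pred = target.copy()
pred[1] *= 1.1  # second sample is off by 10% everywhere

errors = relative_l2_error(pred, target)
summary = summarize_errors(pred, target)
print(errors, summary)
```

The `worst_index` entry points at the sample most worth plotting with a side-by-side comparison, since failure cases are often more informative than the average.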

#### Concluding the Visualization & Pipeline
We’ve now completed a full pass through our PDE surrogate pipeline—from data preparation, 
model definition, and hyperparameter tuning, to final training and results visualization.

- **Final Observations**:  
  - For instance, using FNO with `modes=12` and `width=64` yielded approximately **X%** relative error on the test set.  
  - The predicted Darcy fields show close alignment with the ground truth solutions, as seen in our 2D plots.

- **HPC Readiness**:  
  - If you plan to run larger resolutions or more epochs, the same notebook logic can scale to HPC environments. 
  - You may disable the progress bar or use a distributed manager (e.g., `DistributedManager` in Modulus) to parallelize training.

- **Advanced Features**:  
  - In real-world scenarios, consider adding PDE constraints, subgrid modeling, or multi-objective optimization if the use-case demands more advanced physics fidelity.
  - **Active Learning** can be integrated to select new PDE samples, especially if generating or simulating data is expensive.

- **MLFlow or Other Logs**:  
  - If you recorded metrics in MLFlow, open the MLFlow UI (or your logging interface) to view interactive charts, parameter comparisons, and artifacts (e.g., model checkpoints, images).

**Next Steps**:
1. **Refine the Model**: Increase epochs, tweak hyperparameters further, or incorporate additional PDE constraints.
2. **Deploy or Save** the pipeline: Export your final model to an inference engine or deploy it in an HPC environment.
3. **Explore** expansions like deeper AFNO gating, multi-physics PDE coupling, or more advanced domain transformations.

With these steps, you have a **functioning** pipeline that can be adapted for **larger HPC** usage, 
more sophisticated PDE tasks, or integrated with **AutoML** strategies to systematically refine hyperparameters.