<a href="https://colab.research.google.com/github/andrewjustin/anemoi-house-workflow/blob/master/colab-anemoi-workflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Anemoi Workflow Demo

##### This notebook will guide you through training an AI4NWP model with the Anemoi framework.

##### For questions, please contact andrew.justin@noaa.gov.

## 1) Environment Setup (10 minutes)

**TODO**: Remove *ufs2arco* from environment and use wget to retrieve YAMLs and pre-generated dataset.

**NOTE:** You will receive a popup after all packages are installed. Click "**restart session**" on the popup and continue on to the next step.

In [7]:
!pip install ufs2arco==0.6 mpi4py anemoi-datasets==0.5.23 anemoi-graphs==0.5.2 anemoi-models==0.5.0 anemoi-training==0.4.0 anemoi-inference 'numpy<2.3' 'earthkit-data<0.14.0' --force-reinstall


Collecting flash-attn
  Using cached flash_attn-2.8.0.post2-cp311-cp311-linux_x86_64.whl
Collecting torch (from flash-attn)
  Using cached torch-2.7.1-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (29 kB)
Collecting einops (from flash-attn)
  Using cached einops-0.8.1-py3-none-any.whl.metadata (13 kB)
Collecting filelock (from torch->flash-attn)
  Using cached filelock-3.18.0-py3-none-any.whl.metadata (2.9 kB)
Collecting typing-extensions>=4.10.0 (from torch->flash-attn)
  Using cached typing_extensions-4.14.0-py3-none-any.whl.metadata (3.0 kB)
Collecting sympy>=1.13.3 (from torch->flash-attn)
  Using cached sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch->flash-attn)
  Using cached networkx-3.5-py3-none-any.whl.metadata (6.3 kB)
Collecting jinja2 (from torch->flash-attn)
  Using cached jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB)
Collecting fsspec (from torch->flash-attn)
  Using cached fsspec-2025.5.1-py3-none-any.whl.metadata (11 kB)
Collecting nvi
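
After restarting the session, it can help to confirm that the pinned versions were actually installed. The following is a minimal sketch using only the standard library. The names `EXPECTED`, `normalize`, and `check_versions` are illustrative and not part of Anemoi or ufs2arco; the pins are copied from the pip command above, and packages installed without an exact pin are left out.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the pip command above (assumed to be the intended versions).
EXPECTED = {
    "ufs2arco": "0.6",
    "anemoi-datasets": "0.5.23",
    "anemoi-graphs": "0.5.2",
    "anemoi-models": "0.5.0",
    "anemoi-training": "0.4.0",
}


def normalize(ver):
    """Drop trailing '.0' segments so that '0.6' and '0.6.0' compare equal."""
    parts = ver.split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return ".".join(parts)


def check_versions(expected):
    """Return {package: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found is None or normalize(found) != normalize(wanted):
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in check_versions(EXPECTED).items():
    print(f"{name}: expected {wanted}, found {found}")
```

If the loop prints nothing, every pinned package matches its expected version.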

## 2) Upload and Extract ZIP Folder containing YAML files

- Upload the provided *anemoi.zip* folder to your current Colab session, then run the cell below.

In [1]:
!unzip anemoi.zip -d .

Archive:  anemoi.zip
   creating: ./data/
  inflating: ./data/zarr.yaml        
   creating: ./dataloader/
  inflating: ./dataloader/native_grid.yaml  
   creating: ./datamodule/
  inflating: ./datamodule/single.yaml  
   creating: ./diagnostics/
   creating: ./diagnostics/benchmark_profiler/
  inflating: ./diagnostics/benchmark_profiler/detailed.yaml  
  inflating: ./diagnostics/benchmark_profiler/simple.yaml  
   creating: ./diagnostics/callbacks/
  inflating: ./diagnostics/callbacks/placeholder.yaml  
  inflating: ./diagnostics/callbacks/pretraining.yaml  
  inflating: ./diagnostics/callbacks/rollout_eval.yaml  
  inflating: ./diagnostics/evaluation.yaml  
   creating: ./diagnostics/plot/
  inflating: ./diagnostics/plot/detailed.yaml  
  inflating: ./diagnostics/plot/none.yaml  
   creating: ./graph/
  inflating: ./graph/encoder_decoder_only.yaml  
  inflating: ./graph/multi_scale.yaml  
   creating: ./hardware/
  inflating: ./hardware/example.yaml  
   creating: ./hardware/files/
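
To confirm that the configuration files were extracted where expected, the YAML files under the working directory can be listed from Python. This is a small sketch; `list_yaml_files` is a hypothetical helper, not part of any Anemoi package.

```python
from pathlib import Path


def list_yaml_files(root="."):
    """Return the sorted relative paths of every .yaml file below root."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.yaml"))


for path in list_yaml_files("."):
    print(path)
```

Entries such as `data/zarr.yaml` and `graph/multi_scale.yaml` should appear in the listing if the archive was unpacked into the current directory.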
 

## 3) Dataset Generation

### 3.1) Define 'Recipe' YAML Paths

*ufs2arco* generates each dataset from a 'recipe' YAML file that defines the dataset's structure. This notebook uses one recipe each for the training, validation, and testing datasets.

In [1]:
train_yaml_path = 'training.yaml'  # training YAML path
valid_yaml_path = 'validation.yaml'  # validation YAML path
test_yaml_path = 'testing.yaml'  # testing YAML path

The sample datasets in this notebook will include data for the following timeframes at 3-hourly intervals:
- **Training**: 0z 1 Jan 1994 - 21z 2 Jan 1994
- **Validation**: 0z 3 Jan 1994 - 21z 4 Jan 1994
- **Testing**: 0z 5 Jan 1994 - 21z 6 Jan 1994
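
As a quick check on how many samples each split contains, the timestamps can be enumerated directly from the ranges listed above. This sketch uses only the standard library; `SPLITS` and `time_steps` are illustrative names, and the date ranges are assumed to be inclusive at both ends.

```python
from datetime import datetime, timedelta

STEP = timedelta(hours=3)  # 3-hourly interval stated above

# Start and end times copied from the list above (end times assumed inclusive).
SPLITS = {
    "training": (datetime(1994, 1, 1, 0), datetime(1994, 1, 2, 21)),
    "validation": (datetime(1994, 1, 3, 0), datetime(1994, 1, 4, 21)),
    "testing": (datetime(1994, 1, 5, 0), datetime(1994, 1, 6, 21)),
}


def time_steps(start, end, step=STEP):
    """Return every timestamp from start to end (inclusive) at the given step."""
    count = (end - start) // step + 1
    return [start + i * step for i in range(count)]


for name, (start, end) in SPLITS.items():
    steps = time_steps(start, end)
    print(f"{name}: {len(steps)} time steps")
```

Each split spans two days at eight steps per day, so each one contains 16 time steps.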

### 3.2) Create the Training Dataset (1-2 minutes)

In [2]:
!ufs2arco {train_yaml_path}

Traceback (most recent call last):
  File "/usr/local/bin/ufs2arco", line 5, in <module>
    from ufs2arco.cli import main
  File "/usr/local/lib/python3.11/dist-packages/ufs2arco/__init__.py", line 7, in <module>
    from .cice6dataset import CICE6Dataset
  File "/usr/local/lib/python3.11/dist-packages/ufs2arco/cice6dataset.py", line 5, in <module>
    import xarray as xr
  File "/usr/local/lib/python3.11/dist-packages/xarray/__init__.py", line 3, in <module>
    from xarray import coders, groupers, testing, tutorial, ufuncs
  File "/usr/local/lib/python3.11/dist-packages/xarray/coders.py", line 6, in <module>
    from xarray.coding.times import CFDatetimeCoder, CFTimedeltaCoder
  File "/usr/local/lib/python3.11/dist-packages/xarray/coding/times.py", line 12, in <module>
    import pandas as pd
  File "/usr/local/lib/python3.11/dist-packages/pandas/__init__.py", line 49, in <module>
    from pandas.core.api import (
  File "/usr/local/lib/python3.11/dist-packages/pandas/core/api.py", 

### 3.3) Create the Validation Dataset (1-2 minutes)

In [4]:
!ufs2arco {valid_yaml_path}

  xds = xr.open_zarr(


### 3.4) Create the Testing Dataset (1-2 minutes)

In [5]:
!ufs2arco {test_yaml_path}

  xds = xr.open_zarr(


## 4) Model Setup & Training

### 4.1) Environment Variables

- Anemoi requires a "base seed" and a SLURM job ID.
  - The base seed is used to initialize model weights. Changing the seed will result in different initial model parameters.
  - The SLURM job ID is required, even if you are not on SLURM (just leave it as "0").
- Setting *HYDRA_FULL_ERROR* to "1" makes Hydra print complete tracebacks, which helps with debugging.


In [2]:
import os

### Required ###
os.environ["ANEMOI_BASE_SEED"] = "42"
os.environ["SLURM_JOB_ID"] = "0"

### Optional ###
os.environ["HYDRA_FULL_ERROR"] = "1"  # for debugging
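
Because the session is restarted during setup, it may be worth confirming that the required variables are still set before launching training. This is a minimal sketch; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and the list of required variables is taken from the cell above.

```python
import os

# Variables described as required in the cell above.
REQUIRED_VARS = ("ANEMOI_BASE_SEED", "SLURM_JOB_ID")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Unset variables:", ", ".join(missing))
else:
    print("All required variables are set.")
```

Note that the string "0" counts as set, so the placeholder SLURM job ID passes this check.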

### 4.2) Train the Model

In [3]:
!anemoi-training train --config-name=model-config.yaml

2025-06-24 18:48:13 INFO Running anemoi training command with overrides: ['--config-name=model-config.yaml']
2025-06-24 18:48:20 INFO NumExpr defaulting to 2 threads.
2025-06-24 18:48:23 INFO Prepending current user directory (/content) to the search path.
2025-06-24 18:48:23 INFO Search path is now: [provider=anemoi-cwd-searchpath-plugin, path=/content, provider=hydra, path=pkg://hydra.conf, provider=main, path=pkg://anemoi.training/config]
[2025-06-24 18:48:24,044][anemoi.training.train.train][INFO] - Config validated.
[2025-06-24 18:48:24,044][anemoi.training.train.train][INFO] - Run id: 9439c3c1-0d21-4532-8d40-0217feb9d337
[2025-06-24 18:48:24,045][anemoi.training.train.train][INFO] - Checkpoints path: p1/training-output/checkpoint/9439c3c1-0d21-4532-8d40-0217feb9d337
[2025-06-24 18:48:24,045][anemoi.training.train.train][INFO] - Plots path: p1/training-output/plots/9439c3c1-0d21-4532-8d40-0217feb9d337
[2025-06-24 18:48:24,854][anemoi.graphs.nodes.builders.from_file][INFO] - Readin
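
The training log reports a run id that also appears in the checkpoint and plot paths. If the log text has been captured to a string or file, the run id can be pulled out with a regular expression. This is only a sketch: `find_run_id` is a hypothetical helper, and the pattern assumes the log keeps the "Run id: <UUID>" wording shown above.

```python
import re

# Matches the "Run id: <UUID>" text seen in the training log above (assumed format).
RUN_ID_PATTERN = re.compile(
    r"Run id: ([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def find_run_id(log_text):
    """Return the first run id found in the log text, or None if absent."""
    match = RUN_ID_PATTERN.search(log_text)
    return match.group(1) if match else None


sample = (
    "[2025-06-24 18:48:24,044][anemoi.training.train.train][INFO] - "
    "Run id: 9439c3c1-0d21-4532-8d40-0217feb9d337"
)
print(find_run_id(sample))
```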