<a href="https://colab.research.google.com/github/andrewjustin/anemoi-house-workflow/blob/master/colab-anemoi-workflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Anemoi Training Workflow Demo

##### This notebook will guide you through training an AI4NWP model with the Anemoi framework.

##### For questions, please contact andrew.justin@noaa.gov.

##### **Acknowledgments:**  Tim Smith (PSL) https://github.com/NOAA-PSL/anemoi-house

# 1) Environment Setup (4 minutes)

**NOTES:**
- Use an **A100 runtime instance** to successfully run this notebook.
- You will receive a popup after all packages are installed. Click "**restart session**" on the popup and continue on to the next step.
- *--force-reinstall* is added to *pip install* to prevent an environment conflict with the *flash-attn* package.

In [None]:
!pip install anemoi-datasets==0.5.23 anemoi-graphs==0.5.2 anemoi-models==0.5.0 anemoi-training==0.4.0 anemoi-inference flash-attn wget 'numpy<2.3' 'earthkit-data<0.14.0' --force-reinstall

Collecting anemoi-datasets==0.5.23
  Downloading anemoi_datasets-0.5.23-py3-none-any.whl.metadata (16 kB)
Collecting anemoi-graphs==0.5.2
  Downloading anemoi_graphs-0.5.2-py3-none-any.whl.metadata (15 kB)
Collecting anemoi-models==0.5.0
  Downloading anemoi_models-0.5.0-py3-none-any.whl.metadata (16 kB)
Collecting anemoi-training==0.4.0
  Downloading anemoi_training-0.4.0-py3-none-any.whl.metadata (15 kB)
Collecting anemoi-inference
  Downloading anemoi_inference-0.6.3-py3-none-any.whl.metadata (16 kB)
Collecting flash-attn
  Downloading flash_attn-2.8.0.post2.tar.gz (7.9 MB)
  Preparing metadata (setup.py) ... done
Collecting wget
  Downloading wget-3.2.zip (10 kB)
  Preparing metadata (setup.py) ... done
Collecting numpy<2.3
  Downloading numpy-2.2.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)

# 2) Retrieve ZIP folder containing dataset and required YAML files (30 seconds)

In [None]:
import wget

wget.download('https://epic-noaa.s3.us-east-1.amazonaws.com/anemoi.zip')
!unzip anemoi.zip -d .

Archive:  anemoi.zip
   creating: ./data/
  inflating: ./data/zarr.yaml        
   creating: ./dataloader/
  inflating: ./dataloader/native_grid.yaml  
   creating: ./datamodule/
  inflating: ./datamodule/single.yaml  
   creating: ./diagnostics/
   creating: ./diagnostics/benchmark_profiler/
  inflating: ./diagnostics/benchmark_profiler/detailed.yaml  
  inflating: ./diagnostics/benchmark_profiler/simple.yaml  
   creating: ./diagnostics/callbacks/
  inflating: ./diagnostics/callbacks/placeholder.yaml  
  inflating: ./diagnostics/callbacks/pretraining.yaml  
  inflating: ./diagnostics/callbacks/rollout_eval.yaml  
  inflating: ./diagnostics/evaluation.yaml  
   creating: ./diagnostics/plot/
  inflating: ./diagnostics/plot/detailed.yaml  
  inflating: ./diagnostics/plot/none.yaml  
   creating: ./graph/
  inflating: ./graph/encoder_decoder_only.yaml  
  inflating: ./graph/multi_scale.yaml  
   creating: ./hardware/
  inflating: ./hardware/example.yaml  
   creating: ./hardware/files/
 

# 3) Model Training

Model training with Anemoi is performed using the *anemoi-training* module: https://anemoi.readthedocs.io/projects/training/en/latest/

The sample datasets in this notebook include data for the following timeframes at 3-hourly intervals (each with 16 timesteps):
- **Training**: 0z 1 Jan 1994 - 21z 2 Jan 1994
- **Validation**: 0z 3 Jan 1994 - 21z 4 Jan 1994
- **Testing**: 0z 5 Jan 1994 - 21z 6 Jan 1994
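
The timestep count for each split can be checked with a few lines of standard-library Python. This is a minimal sketch: the `timesteps` helper and the `splits` dictionary are illustrative names that are not part of Anemoi, and the dates are copied from the list above.

```python
from datetime import datetime, timedelta


def timesteps(start, end, step_hours=3):
    """Return every timestamp from start to end (inclusive) at a fixed interval."""
    step = timedelta(hours=step_hours)
    count = (end - start) // step + 1  # number of whole steps, plus the start time
    return [start + i * step for i in range(count)]


# Start and end times of each sample dataset, as listed above.
splits = {
    "training": (datetime(1994, 1, 1, 0), datetime(1994, 1, 2, 21)),
    "validation": (datetime(1994, 1, 3, 0), datetime(1994, 1, 4, 21)),
    "testing": (datetime(1994, 1, 5, 0), datetime(1994, 1, 6, 21)),
}

for name, (start, end) in splits.items():
    print(name, len(timesteps(start, end)))
# training 16
# validation 16
# testing 16
```

Each split spans 45 hours, which gives 15 three-hour steps plus the starting time, for 16 timesteps in total.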

### 3.1) Environment Variables

Anemoi requires a "base seed" and a SLURM job ID.
- The base seed is used to initialize model weights. Changing the seed will result in different initial model parameters.
- The SLURM job ID is required, even if you are not on SLURM (just leave it as "0").

*Hydra* can be configured to output more complete tracebacks for debugging purposes.


In [None]:
import os

### Required ###
os.environ["ANEMOI_BASE_SEED"] = "42"
os.environ["SLURM_JOB_ID"] = "0"

### Optional ###
os.environ['HYDRA_FULL_ERROR'] = "1"  # for debugging

### 3.2) Train the Model (4 minutes, may vary if parameters are modified)

The model and training process are configured through a config YAML file.

Arguments for model architecture and training configuration can be overridden via the command line.
- e.g., *!anemoi-training train --config-name=model-config.yaml model.num_channels=32*
  - The *model.num_channels=32* argument overrides the number of channels in the model to 32 (the provided config YAML sets this to 128)
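
When experimenting with overrides, it can help to assemble them in Python and check them before launching a run. The sketch below is one possible way to do that: `build_overrides` is a hypothetical helper, not part of *anemoi-training*, and the default of 16 heads is an assumption taken from the comment in the training cell of this notebook. The override keys are the ones used in that cell.

```python
def build_overrides(batch_size, num_channels, num_layers, num_heads=16):
    """Build the command-line overrides used in this notebook's training cell.

    num_heads=16 is an assumption based on the provided config; adjust it if
    the config YAML sets a different number of heads.
    """
    if num_channels % num_heads != 0:
        raise ValueError(
            f"num_channels={num_channels} is not evenly divisible by num_heads={num_heads}"
        )
    return [
        f"dataloader.batch_size.training={batch_size}",
        f"dataloader.batch_size.validation={batch_size}",
        f"dataloader.batch_size.test={batch_size}",
        f"model.num_channels={num_channels}",
        f"model.processor.num_layers={num_layers}",
    ]


overrides = " ".join(build_overrides(batch_size=8, num_channels=128, num_layers=8))
print(overrides)
```

In a notebook cell, the resulting string could then be passed along as *!anemoi-training train --config-name=model-config.yaml {overrides}*. Checking divisibility up front catches an invalid channel count before the training command starts.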

The cell below will train a model for 3 epochs.

In [None]:
batch_size = 8  # batch size, will be applied to training, validation, and testing datasets
num_channels = 128  # number of channels in the model (must be evenly divisible by the number of heads, which is currently set to 16)
num_layers = 8  # number of layers in the model processor

!anemoi-training train --config-name=model-config.yaml dataloader.batch_size.training={batch_size} dataloader.batch_size.validation={batch_size} dataloader.batch_size.test={batch_size} \
model.num_channels={num_channels} model.processor.num_layers={num_layers}

2025-06-25 20:58:37 INFO Running anemoi training command with overrides: ['--config-name=model-config.yaml', 'dataloader.batch_size.training=8', 'dataloader.batch_size.validation=8', 'dataloader.batch_size.test=8', 'model.num_channels=128', 'model.processor.num_layers=8']
2025-06-25 20:58:42 INFO NumExpr defaulting to 12 threads.
2025-06-25 20:58:44 INFO Prepending current user directory (/content) to the search path.
2025-06-25 20:58:44 INFO Search path is now: [provider=anemoi-cwd-searchpath-plugin, path=/content, provider=hydra, path=pkg://hydra.conf, provider=main, path=pkg://anemoi.training/config]
[2025-06-25 20:58:45,496][anemoi.training.train.train][INFO] - Config validated.
[2025-06-25 20:58:45,497][anemoi.training.train.train][INFO] - Run id: efabcde5-8397-4521-9e21-a614c4dbffd7
[2025-06-25 20:58:45,497][anemoi.training.train.train][INFO] - Checkpoints path: p1/training-output/checkpoint/efabcde5-8397-4521-9e21-a614c4dbffd7
[2025-06-25 20:58:45,497][anemoi.training.train.trai

# 4) Inference

Model inference with Anemoi is performed with the *anemoi-inference* module: https://anemoi.readthedocs.io/projects/inference/en/latest/index.html#index-page

### 4.1) Retrieve Model Runs and Load Checkpoint
Each model run is saved in a folder with a random hash identifier.

In [None]:
model_runs = os.listdir('p1/training-output/checkpoint')
print('Available model runs:')
for run in model_runs:
    print(run + '\n')

Available model runs:
efabcde5-8397-4521-9e21-a614c4dbffd7



Select a model run from the list above and load the checkpoint.

In [None]:
model_run = 'efabcde5-8397-4521-9e21-a614c4dbffd7'  # model run hash identifier

## Do not change this ##
checkpoint = f'p1/training-output/checkpoint/{model_run}/inference-last.ckpt'
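
If several runs exist, copying the identifier by hand can be avoided by picking the most recently modified run folder. This is a rough sketch: `latest_run` is a hypothetical helper, and folder modification time is only an approximation of which run finished last.

```python
import os


def latest_run(checkpoint_root):
    """Return the name of the most recently modified run folder."""
    runs = [
        name for name in os.listdir(checkpoint_root)
        if os.path.isdir(os.path.join(checkpoint_root, name))
    ]
    if not runs:
        raise FileNotFoundError(f"no model runs found in {checkpoint_root}")
    # Compare folders by their last-modified timestamp.
    return max(runs, key=lambda name: os.path.getmtime(os.path.join(checkpoint_root, name)))


checkpoint_root = 'p1/training-output/checkpoint'
if os.path.isdir(checkpoint_root):  # skipped if no training run has been made yet
    model_run = latest_run(checkpoint_root)
    checkpoint = f'{checkpoint_root}/{model_run}/inference-last.ckpt'
    print(checkpoint)
```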

### 4.2) Configure and Run Model Inference (6 seconds)
Select an initialization time from the **testing dataset** and set a forecast lead time.
**NOTE:** Make sure that the valid time (i.e., time of the forecast) is **within the testing dataset**.

You can also put the inference settings in a config YAML file and pass that file to the command; however, all settings can easily be passed through the command line.
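
The valid time is the initialization time plus the lead time, so the check described in the note can be done before running inference. The sketch below assumes the testing period listed in Section 3 (0z 5 Jan 1994 to 21z 6 Jan 1994); `valid_time` and `in_testing_window` are illustrative helpers, not part of *anemoi-inference*, and the check does not cover any earlier input steps the model may need before the initialization time.

```python
from datetime import datetime, timedelta

# Testing dataset period from Section 3 (assumed here, not read from the dataset).
TEST_START = datetime(1994, 1, 5, 0)
TEST_END = datetime(1994, 1, 6, 21)


def valid_time(init_time, lead_hours):
    """Return the forecast valid time for an init time given as YYYY-MM-DDTHH."""
    init = datetime.strptime(init_time, "%Y-%m-%dT%H")
    return init + timedelta(hours=lead_hours)


def in_testing_window(init_time, lead_hours):
    """Check that both the init time and the valid time fall in the testing period."""
    init = datetime.strptime(init_time, "%Y-%m-%dT%H")
    return TEST_START <= init and valid_time(init_time, lead_hours) <= TEST_END


print(valid_time("1994-01-05T21", 18))         # 1994-01-06 15:00:00
print(in_testing_window("1994-01-05T21", 18))  # True
print(in_testing_window("1994-01-06T12", 18))  # False
```

With the settings used in this notebook, an 18-hour forecast initialized at 21z 5 Jan is valid at 15z 6 Jan, which is inside the testing period.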

In [None]:
init_time = '1994-01-05T21'  # initialization time [YYYY]-[MM]-[DD]T[HH]
lead_time = 18  # hours

## Do not change these ##
inference_dataset = 'p1/dataset/testing.zarr'
output_file = 'forecast.nc'  # output file containing the model forecast

!anemoi-inference run checkpoint={checkpoint} date={init_time} lead_time={lead_time} input.dataset={inference_dataset} output.netcdf={output_file}

                No post_processors defined. Accumulations will be accumulated from the beginning of the forecast.

                🚧🚧🚧 In a future release, the default will be to NOT accumulate from the beginning of the forecast. 🚧🚧🚧
                Update your config if you wish to keep accumulating from the beginning.
                https://github.com/ecmwf/anemoi-inference/issues/131
                
2025-06-25 21:06:05 INFO Pre processors: []
2025-06-25 21:06:05 INFO Accumulating fields []
2025-06-25 21:06:05 INFO Post processors: [Accumulate([])]
2025-06-25 21:06:05 INFO Using DefaultRunner runner, device=cuda
2025-06-25 21:06:05 INFO Input: DatasetInput(('p1/dataset/testing.zarr',), {})
2025-06-25 21:06:05 INFO Output: NetCDFOutput(forecast.nc)
2025-06-25 21:06:06 INFO 🚧🚧🚧🚧🚧🚧 XXXXXX cos_julian_day, 0, (73728,)
2025-06-25 21:06:06 INFO 🚧🚧🚧🚧🚧🚧 XXXXXX cos_local_time, 0, (73728,)
2025-06-25 21:06:06 INFO 🚧🚧🚧🚧🚧🚧 XXXXXX cos_longitude, 0, (73728,)
2025-06-25 21:06:06 INFO 🚧🚧🚧🚧🚧🚧 XXXXXX

# **If you have reached the end of this notebook, congratulations! You have successfully trained an AI4NWP model using the Anemoi framework!**