# Training AI4NWP Models with the Anemoi Framework

##### This notebook steps through the process of training an AI-driven numerical weather prediction (NWP) model using the Anemoi framework, then running inference with the trained model.

##### For questions, please contact andrew.justin@noaa.gov.

## 1) Environment Setup

#### 1.1) Imports

In [1]:
import os

#### 1.2) CUDA
- The Anemoi framework uses the *flash-attn* package for transformer models. *Flash-attn* requires GPUs of the **Ampere architecture** or newer.
- *Flash-attn* requires the **CUDA_HOME** environment variable to point to your CUDA installation.
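
Before installing anything, it can be worth confirming that your GPU meets the architecture requirement. The sketch below assumes PyTorch is already available (it is not installed explicitly in this notebook) and relies on the fact that Ampere GPUs report a CUDA compute capability of 8.0 or higher. The helper names `is_ampere_or_newer` and `check_visible_gpus` are hypothetical and not part of Anemoi.

```python
def is_ampere_or_newer(capability):
    """Return True if a (major, minor) compute capability is 8.0 (Ampere) or newer."""
    major, _minor = capability
    return major >= 8


def check_visible_gpus():
    """Print the compute capability of each visible GPU (needs PyTorch)."""
    try:
        import torch  # assumption: PyTorch is installed in this environment
    except ImportError:
        print("PyTorch is not installed; skipping the GPU check.")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible.")
        return
    for index in range(torch.cuda.device_count()):
        capability = torch.cuda.get_device_capability(index)
        status = "OK" if is_ampere_or_newer(capability) else "older than Ampere"
        print(f"GPU {index}: {torch.cuda.get_device_name(index)}, "
              f"compute capability {capability[0]}.{capability[1]} ({status})")


check_visible_gpus()
```

A GPU flagged as "older than Ampere" here will most likely be unable to run the *flash-attn* kernels.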

In [2]:
CUDA_VERSION = "12.6"  # installed CUDA version as a string ("x.y"); a float would turn 12.10 into 12.1

os.environ["CUDA_HOME"] = f'/usr/local/cuda-{CUDA_VERSION}'
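
A wrong **CUDA_HOME** usually only surfaces later as a build or import error, so a quick sanity check can save time. The following is a minimal sketch, assuming a Linux-style CUDA toolkit layout in which the compiler lives at `bin/nvcc`; `check_cuda_home` is a hypothetical helper, not an Anemoi or *flash-attn* function.

```python
import os


def check_cuda_home(cuda_home):
    """Return a list of problems found with a CUDA installation path."""
    problems = []
    if not os.path.isdir(cuda_home):
        problems.append(f"{cuda_home} is not a directory")
    elif not os.path.isfile(os.path.join(cuda_home, "bin", "nvcc")):
        # Assumption: the toolkit ships its compiler as <CUDA_HOME>/bin/nvcc.
        problems.append(f"nvcc not found under {cuda_home}/bin")
    return problems


# "/usr/local/cuda" is only a fallback for when CUDA_HOME is unset.
for problem in check_cuda_home(os.environ.get("CUDA_HOME", "/usr/local/cuda")):
    print("WARNING:", problem)
```

No output means the path looks plausible; it does not guarantee that the toolkit version matches your driver.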

#### 1.3) Install Packages
- **NOTE**: *numpy* versions >=2.3 have compatibility issues with some *numba* distributions, so we pin *numpy* below 2.3 to avoid potential environment issues.

In [3]:
!pip install anemoi-datasets==0.5.23 anemoi-graphs==0.5.2 anemoi-models==0.5.0 anemoi-training==0.4.0 anemoi-inference flash-attn 'numpy<2.3'
!pip install 'earthkit-data<0.14.0'
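
After the install finishes, you may want to confirm which versions actually ended up in the environment, since *pip* can resolve pins differently than expected. This sketch uses the standard-library `importlib.metadata` module; the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the pip commands in this section.
packages = ["anemoi-datasets", "anemoi-graphs", "anemoi-models",
            "anemoi-training", "anemoi-inference", "flash-attn",
            "numpy", "earthkit-data"]
for name, installed in installed_versions(packages).items():
    print(f"{name}: {installed or 'NOT INSTALLED'}")
```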



#### 1.4) Other Environment Variables
- Anemoi requires a "base seed" and a SLURM job ID.
  - The base seed is used to initialize model weights. Changing the seed will result in different initial model parameters.
  - The SLURM job ID is required, even if you are not on SLURM (just leave it as "0").
- *Hydra* can be configured to output more complete tracebacks for debugging purposes.

In [4]:
### Required ###
os.environ["ANEMOI_BASE_SEED"] = "42"
os.environ["SLURM_JOB_ID"] = "0"

### Optional ###
os.environ["HYDRA_FULL_ERROR"] = "1"  # full Hydra tracebacks, for debugging
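
If you restart the kernel or launch training from a different session, these variables may no longer be set. A small guard like the one below, run just before training, can catch that. It is only a sketch; `missing_env_vars` is a hypothetical helper.

```python
import os

# Variables this notebook treats as required before training.
REQUIRED_VARS = ("ANEMOI_BASE_SEED", "SLURM_JOB_ID")


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Set these before training:", ", ".join(missing))
```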

## 2) Model Training

#### 2.1) Train the Model

In [10]:
!anemoi-training train --config-name=config.yaml

2025-06-23 16:14:30 INFO Running anemoi training command with overrides: ['--config-name=config.yaml']
2025-06-23 16:14:33 INFO Prepending current user directory (/mnt/c/users/andrew/pycharmprojects/anemoi-house) to the search path.
2025-06-23 16:14:33 INFO Search path is now: [provider=anemoi-cwd-searchpath-plugin, path=/mnt/c/users/andrew/pycharmprojects/anemoi-house, provider=hydra, path=pkg://hydra.conf, provider=main, path=pkg://anemoi.training/config]
[2025-06-23 16:14:34,119][anemoi.training.train.train][INFO] - Config validated.
[2025-06-23 16:14:34,119][anemoi.training.train.train][INFO] - Run id: c9d05d14-4198-4e88-8ad7-269ee17217f3
[2025-06-23 16:14:34,120][anemoi.training.train.train][INFO] - Checkpoints path: p1/training-output/checkpoint/c9d05d14-4198-4e88-8ad7-269ee17217f3
[2025-06-23 16:14:34,120][anemoi.training.train.train][INFO] - Plots path: p1/training-output/plots/c9d05d14-4198-4e88-8ad7-269ee17217f3
[2025-06-23 16:14:34,911][anemoi.graphs.nodes.builders.from_file
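
The run ID printed in the log above is what identifies the checkpoint folder later on. If you save the training log to a file or a string, the ID can be pulled out with a regular expression. This is a sketch that assumes the `Run id: <UUID>` line format shown above stays the same; `extract_run_id` is a hypothetical helper.

```python
import re

# Matches the "Run id: <UUID>" line as it appears in the training log above.
RUN_ID_PATTERN = re.compile(
    r"Run id: ([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def extract_run_id(log_text):
    """Return the first run ID found in the log text, or None."""
    match = RUN_ID_PATTERN.search(log_text)
    return match.group(1) if match else None


sample = ("[2025-06-23 16:14:34,119][anemoi.training.train.train][INFO] "
          "- Run id: c9d05d14-4198-4e88-8ad7-269ee17217f3")
print(extract_run_id(sample))
```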

## 3) Inference

#### 3.1) Retrieve Model Runs and Load Checkpoint

Each model run is saved in a folder named after its run ID, a randomly generated UUID (the `Run id` reported in the training log).

In [11]:
model_runs = os.listdir('p1/training-output/checkpoint')
print('Available model runs:')
for run in model_runs:
    print(run + '\n')

Available model runs:
c9d05d14-4198-4e88-8ad7-269ee17217f3



Select a model run from the list above and load the checkpoint.
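
When several runs exist, the UUIDs alone do not tell you which one is newest. One option is to pick the most recently modified folder. The sketch below assumes folder modification time is a reasonable proxy for recency; `latest_run` is a hypothetical helper.

```python
import os


def latest_run(checkpoint_root):
    """Return the name of the most recently modified run folder, or None."""
    with os.scandir(checkpoint_root) as entries:
        runs = [entry for entry in entries if entry.is_dir()]
    if not runs:
        return None
    return max(runs, key=lambda entry: entry.stat().st_mtime).name


root = "p1/training-output/checkpoint"
if os.path.isdir(root):
    print("Most recent run:", latest_run(root))
```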

In [12]:
model_run = 'c9d05d14-4198-4e88-8ad7-269ee17217f3'  # model run hash identifier

## Do not change this ##
checkpoint = f'p1/training-output/checkpoint/{model_run}/inference-last.ckpt'
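
A mistyped run ID only shows up once inference is launched, so it can help to confirm that the checkpoint file exists first. The following sketch builds the same path as the cell above and fails early if the file is missing; `checkpoint_path` is a hypothetical helper.

```python
import os


def checkpoint_path(model_run,
                    root="p1/training-output/checkpoint",
                    name="inference-last.ckpt"):
    """Build the checkpoint path for a run and verify that the file exists."""
    path = os.path.join(root, model_run, name)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No checkpoint found at {path}")
    return path


try:
    print(checkpoint_path("c9d05d14-4198-4e88-8ad7-269ee17217f3"))
except FileNotFoundError as error:
    print(error)
```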

#### 3.2) Configure and Run Model Inference
- Select an initialization time from the **testing dataset** and set a forecast lead time.
- **NOTE**: Make sure that the valid time (i.e., the initialization time plus the lead time) is **within the testing dataset**.
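
The valid time is simple to compute by hand, but a helper avoids off-by-one-day mistakes near midnight. The sketch below uses the same `YYYY-MM-DDTHH` format as the initialization time. The dataset bounds `first` and `last` are hypothetical placeholders: replace them with the actual first and last dates of your testing dataset.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H"  # e.g. 1994-01-05T21


def valid_time(init_time, lead_time_hours):
    """Return the valid time (initialization time plus lead time) as a string."""
    start = datetime.strptime(init_time, TIME_FORMAT)
    return (start + timedelta(hours=lead_time_hours)).strftime(TIME_FORMAT)


def within(time, first, last):
    """Return True if `time` lies between `first` and `last`, inclusive."""
    parse = lambda text: datetime.strptime(text, TIME_FORMAT)
    return parse(first) <= parse(time) <= parse(last)


forecast_valid = valid_time("1994-01-05T21", 18)
print(forecast_valid)  # 1994-01-06T15

# Hypothetical bounds, for illustration only.
first, last = "1994-01-01T00", "1994-12-31T21"
print(within(forecast_valid, first, last))
```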

In [16]:
init_time = '1994-01-05T21'  # initialization time [YYYY]-[MM]-[DD]T[HH]
lead_time = 18  # hours

## Do not change these ##
inference_dataset = 'p1/dataset/testing.zarr'
output_file = 'forecast.nc'  # output file containing the model forecast

!anemoi-inference run checkpoint={checkpoint} date={init_time} lead_time={lead_time} input.dataset={inference_dataset} output.netcdf={output_file}

                No post_processors defined. Accumulations will be accumulated from the beginning of the forecast.

                🚧🚧🚧 In a future release, the default will be to NOT accumulate from the beginning of the forecast. 🚧🚧🚧
                Update your config if you wish to keep accumulating from the beginning.
                https://github.com/ecmwf/anemoi-inference/issues/131
                
2025-06-23 16:21:59 INFO Pre processors: []
2025-06-23 16:21:59 INFO Accumulating fields []
2025-06-23 16:21:59 INFO Post processors: [Accumulate([])]
2025-06-23 16:21:59 INFO Using DefaultRunner runner, device=cuda
2025-06-23 16:21:59 INFO Input: DatasetInput(('p1/dataset/testing.zarr',), {})
2025-06-23 16:22:00 INFO Output: NetCDFOutput(forecast.nc)
2025-06-23 16:22:00 INFO 🚧🚧🚧🚧🚧🚧 XXXXXX cos_julian_day, 0, (73728,)
2025-06-23 16:22:00 INFO 🚧🚧🚧🚧🚧🚧 XXXXXX cos_local_time, 0, (73728,)
2025-06-23 16:22:00 INFO 🚧🚧🚧🚧🚧🚧 XXXXXX cos_longitude, 0, (73728,)
2025-06-23 16:22:00 INFO 🚧🚧🚧🚧🚧🚧 XXXXXX
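
Once inference completes, the forecast file can be inspected from Python. This is a sketch, assuming *xarray* and a NetCDF backend (such as *netCDF4*) are available in the environment; neither is installed explicitly in this notebook. The variable and dimension names inside the file depend on your dataset, so the helper only lists what it finds. `summarize_forecast` is a hypothetical helper.

```python
import os


def summarize_forecast(path):
    """Return the dimension sizes and variable names stored in a NetCDF file."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    import xarray as xr  # assumption: xarray and a NetCDF backend are installed
    with xr.open_dataset(path) as ds:
        return {"dims": dict(ds.sizes), "variables": sorted(ds.data_vars)}


if os.path.isfile("forecast.nc"):
    print(summarize_forecast("forecast.nc"))
```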

## If you have reached the end of this notebook, congratulations! You have successfully trained an AI4NWP model using the Anemoi framework!