[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NOAA-EPIC/global-eagle/blob/feature/hello_world/examples/getting_started/colab_notebook_demo/pipeline_demo.ipynb)

# Welcome to the ufs2arco + anemoi + wxvx pipeline!

This notebook will guide you through this entire ML pipeline. Steps include:
1) `ufs2arco` to create training and validation datasets
2) `anemoi-core` modules to train a graph-based model
3) `anemoi-inference` to create a forecast from a model checkpoint
4) `wxvx` to verify that forecast against GFS

If possible, use an A100 runtime instance to run this notebook. Otherwise, try to use a T4. A CPU instance will work if a GPU is not available, but it will be very slow.
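To confirm which accelerator the runtime actually has, a small check like the one below can help. This is only a sketch: the helper name `describe_gpu` is made up for illustration, and it assumes that the `nvidia-smi` tool is on the path whenever a GPU runtime is attached.

```python
import shutil
import subprocess


def describe_gpu():
    """Return a one-line description of the first visible GPU, or None if none is found."""
    # nvidia-smi ships with the NVIDIA driver; if it is missing, assume a CPU-only runtime.
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but could not report a device; treat this as "no GPU".
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


gpu = describe_gpu()
print(gpu if gpu else "No GPU detected; the notebook will run on CPU (slowly).")
```

If the message reports no GPU, the runtime type can be changed from the Colab menu before continuing.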

### Step 1: Environment Setup
Runtime: 3 minutes

You will receive a popup after all packages are installed. Click "restart session" on the popup and continue on to the next step.

In [None]:
!pip install anemoi-datasets==0.5.25 anemoi-graphs==0.6.2 anemoi-models==0.8.1 anemoi-training==0.5.1 anemoi-inference==0.6.3 trimesh 'numpy<2.3' 'earthkit-data<0.14.0' ufs2arco

Clone repository:

In [None]:
!git clone -b feature/hello_world https://github.com/NOAA-EPIC/global-eagle.git

#TODO -- right before merging to main we need to update this to not load branch.

### Step 2: Create training and validation datasets with ufs2arco

Runtime: 3 minutes

`ufs2arco` is a Python package designed to make NOAA forecast, reanalysis, and reforecast datasets more accessible for scientific analysis and machine learning model development. The name stems from its original intent, which was to transform output from the Unified Forecast System (UFS) into Analysis Ready, Cloud Optimized (ARCO; Abernathey et al., 2021) format. However, the package now pulls data from a number of non-UFS sources, including GFS/GEFS output that predates UFS, and even ECMWF's ERA5 dataset.

To learn more about ufs2arco, check out the documentation: https://ufs2arco.readthedocs.io/en/latest/index.html

While this cell is running, go into the `global-eagle/examples/getting_started/colab_notebook_demo/data` folder and look at `logs/logs.serial.out`. This will provide more insight into the dataset creation. 

In [None]:
!ufs2arco global-eagle/examples/getting_started/colab_notebook_demo/data/replay.yaml
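Once the cell has finished, the end of the log can also be printed from the notebook itself. The sketch below uses a hypothetical helper, `tail`, and assumes the log sits at `data/logs/logs.serial.out` under the demo folder, as described above; adjust the path if your run writes it elsewhere.

```python
from collections import deque
from pathlib import Path


def tail(path, n=20):
    """Return the last n lines of a text file, or an empty list if it does not exist."""
    path = Path(path)
    if not path.is_file():
        return []
    with path.open("r", errors="replace") as f:
        # A deque with maxlen keeps only the final n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


# Assumed log location, relative to the directory where the repository was cloned.
log_path = "global-eagle/examples/getting_started/colab_notebook_demo/data/logs/logs.serial.out"
for line in tail(log_path):
    print(line)
```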

After the dataset has been created, let's view it!

You will notice that this format looks different from a "typical" NetCDF or Zarr file. The gridded data is flattened to 1D, and we have calculated various statistics that will be used for normalization during training.

In [None]:
import xarray as xr

ufs2arco_ds = xr.open_dataset("global-eagle/examples/getting_started/colab_notebook_demo/data/replay.zarr")
ufs2arco_ds
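To make the flattened layout more concrete, here is a toy illustration with NumPy. It does not use the dataset above: the grid size, the values, and the z-score normalization are all assumptions chosen for the example, and the statistics stored in the real dataset may be defined differently.

```python
import numpy as np

# Hypothetical toy field: 2 time steps on a 3 x 4 latitude/longitude grid.
rng = np.random.default_rng(0)
n_time, n_lat, n_lon = 2, 3, 4
field = 280.0 + 10.0 * rng.standard_normal((n_time, n_lat, n_lon))

# Flatten the two horizontal dimensions into a single "cell" dimension.
flat = field.reshape(n_time, n_lat * n_lon)

# Each cell keeps its own coordinates, stored as 1D arrays of the same length.
lat = np.linspace(-60.0, 60.0, n_lat)
lon = np.linspace(0.0, 270.0, n_lon)
lat2d, lon2d = np.meshgrid(lat, lon, indexing="ij")
cell_lat, cell_lon = lat2d.ravel(), lon2d.ravel()

# Statistics over all times and cells, then a z-score normalization
# (one common choice, assumed here for illustration).
mean, std = flat.mean(), flat.std()
normalized = (flat - mean) / std

print(flat.shape, cell_lat.shape, cell_lon.shape)
```

Because latitude and longitude are stored per cell, a flattened field can still be plotted with a scatter plot even though it has no 2D grid shape.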

### Step 3: Train a model with anemoi-core modules

In [None]:
import os
os.environ["ANEMOI_BASE_SEED"] = "42"
os.environ["SLURM_JOB_ID"] = "0"

In [None]:
%cd global-eagle/examples/getting_started/colab_notebook_demo/train/

In [None]:
!anemoi-training train --config-name=config

### Step 4: Create a forecast with anemoi-inference

In [None]:
%cd /content/global-eagle/examples/getting_started/colab_notebook_demo/inference/

In [None]:
!anemoi-inference run inference_config.yaml

View the inference output:

In [None]:
import xarray as xr
ds = xr.open_dataset("2022-01-03T00.nc")
ds

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fhr = 1
temp = ds['tmp2m'].isel(time=fhr).values
lat = ds['latitude'].values
lon = ds['longitude'].values

# Plot
plt.figure(figsize=(10, 6))
plt.scatter(lon, lat, c=temp, s=10, cmap='coolwarm')
plt.colorbar(label='2m Temperature')
plt.title(f'2m Temperature at {ds["time"][fhr].values}')
plt.show()

Postprocess the inference output:

In [None]:
!python postprocess.py

In [None]:
ds_post = xr.open_dataset("2022-01-03T00_postprocessed.nc")
ds_post

### Step 5: Verify the forecast against GFS with wxvx

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install() 

Go to your terminal and run the following commands:

`conda install -y -c ufs-community -c paul.madden wxvx`

`conda activate wxvx`

`cd global-eagle/examples/getting_started/colab_notebook_demo/verification/`

`wxvx -c wxvx_config.yaml -t plots`