# Ice Station Zebra Pipeline Demo

This demonstration showcases the complete Ice Station Zebra ML pipeline capabilities through CLI commands. 

**Target Audience:** Developer teams and future team members who want to understand our design decisions, 
trade-offs, and flexible experimentation capabilities.

**You'll learn how to:**
- Run our training pipeline end-to-end in three lines of code
- Swap between different modelling paradigms
- Reproduce runs and inspect the outputs
- Evaluate the performance of the models in line with community standards on sea ice forecasting

## Demo Structure

**Section 1: End-to-End Training**
- Run the full zebra pipeline end-to-end using a minimal configuration and dataset
- Inspect training artifacts and see evaluation outputs

**Section 2: Model Flexibility**
- Switch between the Encode-Process-Decode paradigm and a standalone persistence model
- Explore Encoder module functionality (Multimodality)

**Section 3: Evaluation Framework**
- Evaluate and compare model performance using a pretrained model checkpoint
- Explore different plotting formats and metrics

**Section 4: Train it yourself - Advanced Example**
- Use Anemoi functionality to fetch and inspect standard datasets
- Write your own config to train a model on a full dataset
- See our pipeline data checks and validation in action

# Section 1: End-to-End Training Pipeline

In this section, we'll demonstrate the complete training pipeline using a simple **Naive encoder model** trained on a few days of data.
For the purposes of this notebook, we have created a minimal config file and uploaded a small subset of the data.
The dataset contains a few days of sea ice concentration data (OSISAF) and corresponding atmospheric data (ERA5).
We don't expect the model to do well, but it will give us a sense of the pipeline.

You can install the repo by running the following commands in your terminal:

```bash
git clone https://github.com/alan-turing-institute/ice-station-zebra
cd ice-station-zebra
pip install .
```

### Environment Verification

Let's verify that the `zebra` CLI tools are available and working.

To run this notebook, you'll need a kernel (e.g. from a `.venv` or conda environment) with the ice_station_zebra repo and Jupyter installed.
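
Before calling the CLI from a notebook cell, it can help to confirm that the active kernel can actually see the `zebra` executable. The following is a minimal sketch using only the Python standard library; the helper name `cli_available` is hypothetical and not part of the repo.

```python
import shutil


def cli_available(name: str = "zebra") -> bool:
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


if cli_available():
    print("zebra CLI found on PATH")
else:
    # Most likely cause: the notebook kernel is not the environment
    # in which the repo was installed.
    print("zebra CLI not found; check which kernel the notebook is using")
```

If the check fails, the install step above probably ran in a different environment from the one backing the notebook kernel.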

In [10]:
!zebra --help

 Usage: zebra [OPTIONS] COMMAND [ARGS]...

 Entrypoint for zebra application commands

╭─ Options ───────────────────────────────────────────────────────────────────╮
│ --install-completion            Install completion for the current shell.    │
│ --show-completion               Show completion for the current shell, to    │
│                                 copy it or customize the installation.       │
│ --help                -h

## Download the dataset for running the model

This assumes you have a folder called `my_data/` in the root of the repo.

In [13]:
!zebra datasets create --config-name=demo_nb.yaml

Working on samp-sicsouth-osisaf-25k-2017-2019-24h-v1.
Inspecting dataset samp-sicsouth-osisaf-25k-2017-2019-24h-v1 at /Users/ifenton/Documents/Projects/SeaIce/ice-station-zebra/my_data/data/anemoi/samp-sicsouth-osisaf-25k-2017-2019-24h-v1.zarr.
📦 Path          : /Users/ifenton/Documents/Projects/SeaIce/ice-station-zebra/my_data/data/anemoi/samp-sicsouth-osisaf-25k-2017-2019-24h-v1.zarr
🔢 Format version: 0.30.0

📅 Start      : 2017-01-01 00:00
📅 End        : 2019-01-31 00:00
⏰ Frequency  : 1d
🚫 Missing    : 0
🌎 Resolution : None
🌎 Field shape: [432, 432]

📐 Shape      : 761 × 1 × 1 × 186,624 (541.8 MiB)
💽 Size       : 51.5 MiB (51.5 MiB)
📁 Files      : 811

   Index │ Variable │ Min │ Max │      Mean │    Stdev
   ──────┼──────────┼─────┼─────┼───────────┼─────────
       0 │ ice_conc │   0 │   1 │ 0.0715942 │ 0.237269
   ──────┴──────────┴─────┴─────┴───────────┴─────────

  2025-10-16 16:56:11.459960 : initialised
  2025-10-16 16:56:11.460823 : tmp_statistics_initialised (version=3)
 

In [None]:
!zebra datasets inspect --config-name=demo_nb.yaml

Working on samp-sicsouth-osisaf-25k-2017-2019-24h-v1.
Inspecting dataset samp-sicsouth-osisaf-25k-2017-2019-24h-v1 at /Users/ifenton/Documents/Projects/SeaIce/ice-station-zebra/my_data/data/anemoi/samp-sicsouth-osisaf-25k-2017-2019-24h-v1.zarr.
📦 Path          : /Users/ifenton/Documents/Projects/SeaIce/ice-station-zebra/my_data/data/anemoi/samp-sicsouth-osisaf-25k-2017-2019-24h-v1.zarr
🔢 Format version: 0.30.0

📅 Start      : 2017-01-01 00:00
📅 End        : 2019-01-31 00:00
⏰ Frequency  : 1d
🚫 Missing    : 0
🌎 Resolution : None
🌎 Field shape: [432, 432]

📐 Shape      : 761 × 1 × 1 × 186,624 (541.8 MiB)
💽 Size       : 51.5 MiB (51.5 MiB)
📁 Files      : 811

   Index │ Variable │ Min │ Max │      Mean │    Stdev
   ──────┼──────────┼─────┼─────┼───────────┼─────────
       0 │ ice_conc │   0 │   1 │ 0.0715942 │ 0.237269
   ──────┴──────────┴─────┴─────┴───────────┴─────────

  2025-10-16 16:56:11.459960 : initialised
  2025-10-16 16:56:11.460823 : tmp_statistics_initialised (version=3)
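
The numbers in the inspection output can be cross-checked against each other, which is a quick sanity test that the dataset was built as expected. The sketch below copies the values from the output above; the 4-bytes-per-value figure is an assumption (float32 storage) that happens to reproduce the reported uncompressed size.

```python
import math
from datetime import date

# Values copied from the inspection output above.
start, end = date(2017, 1, 1), date(2019, 1, 31)
field_shape = (432, 432)
reported_shape = (761, 1, 1, 186_624)

# Daily frequency with both end points included.
n_days = (end - start).days + 1

# Each 2D field is flattened into a single grid dimension.
n_cells = field_shape[0] * field_shape[1]

# Assumption: values are stored as 4-byte floats (float32).
bytes_per_value = 4
size_mib = math.prod(reported_shape) * bytes_per_value / 2**20

print(n_days, n_cells, round(size_mib, 1))  # prints: 761 186624 541.8
```

The first and last entries of the reported shape are therefore the number of daily time steps and the number of grid cells, and the 541.8 MiB figure is the uncompressed size, compared with 51.5 MiB on disk.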
 

In [18]:
!zebra train --config-name=demo_nb.yaml

Found 2 dataset_groups.
wandb: Currently logged in as: ifenton (turing-seaice) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Waiting for wandb.init()...
wandb: setting up run kn1ult31 (0.2s)
wandb: Tracking run with wandb version 0.22.2
wandb: Run data is saved locally in ../my_data/training/wandb/run-20251027_154250-kn1ult31
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run smart-rain-67
wandb: ⭐️ View project at https://wandb.ai/turing-seaice/naive-unet-naive
wandb: 🚀 View run at https://wandb.ai/turing-se

In [19]:
!zebra evaluate --config-name=demo_nb.yaml --checkpoint="../my_data/training/wandb/latest-run/checkpoints/epoch=9-step=1810.ckpt"

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /Users/ifenton/Documents/Projects/SeaIce/ice-station-zebra/ice_station_zebra │
│ /cli/hydra.py:41 in wrapper                                                  │
│                                                                              │
│   38 │   ) -> RetType:                                                       │
│   39 │   │   with initialize(config_path="../config", version_base=None):    │
│   40 │   │   │   config = compose(config_name=config_name, overrides=overrid │
│ ❱ 41 │   │   return function(*args, config=config, **kwargs)                 │

# Section 2: Model Flexibility

In this section, we'll demonstrate how easy it is to switch between different model architectures.
We'll show the difference between standalone models and the encode-process-decode paradigm.
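
For readers unfamiliar with the term, a persistence baseline in forecasting usually means "predict that the most recent observation will not change". The sketch below illustrates that idea in plain Python; it is not the repository's implementation, and the function names and toy values are hypothetical.

```python
def persistence_forecast(history, n_steps):
    """Repeat the most recent observed field for every forecast step."""
    if not history:
        raise ValueError("history must contain at least one observed field")
    last_field = history[-1]
    return [list(last_field) for _ in range(n_steps)]


def mean_absolute_error(forecast, truth):
    """Average absolute difference over all steps and grid cells."""
    errors = [
        abs(f - t)
        for f_step, t_step in zip(forecast, truth)
        for f, t in zip(f_step, t_step)
    ]
    return sum(errors) / len(errors)


# Toy example: three grid cells of sea ice concentration in [0, 1].
history = [[0.0, 0.5, 1.0], [0.0, 0.25, 1.0]]  # two observed days
truth = [[0.0, 0.25, 1.0], [0.0, 0.0, 1.0]]    # the next two days
forecast = persistence_forecast(history, n_steps=2)
print(forecast)
print(mean_absolute_error(forecast, truth))
```

A model of this kind has nothing to learn, which makes it a cheap reference point when judging whether a trained model adds any skill.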

In [20]:
# TODO: Add model swapping demonstration
!zebra train --config-name=persistence.yaml ++base_path="../my_data"

Found 2 dataset_groups.
wandb: Currently logged in as: ifenton (turing-seaice) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Waiting for wandb.init()...
wandb: setting up run kq0bdeti (0.2s)
wandb: Tracking run with wandb version 0.22.2
wandb: Run data is saved locally in ../my_data/training/wandb/run-20251027_155821-kq0bdeti
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run polar-brook-56
wandb: ⭐️ View project at https://wandb.ai/turing-seaice/persistence
wandb: 🚀 View run at https://wandb.ai/turing-seaice

In [21]:
!zebra evaluate --config-name=persistence.yaml ++base_path="../my_data" --checkpoint="../my_data/training/wandb/latest-run/checkpoints/epoch=1-step=0.ckpt"

/Users/ifenton/Documents/Projects/SeaIce/ice-station-zebra/.venv_demo/lib/python3.11/site-packages/lightning/pytorch/core/saving.py:94: The state dict in PosixPath('/Users/ifenton/Documents/Projects/SeaIce/ice-station-zebra/my_data/training/wandb/run-20251027_155821-kq0bdeti/checkpoints/epoch=1-step=0.ckpt') contains no parameters.
Found 2 dataset_groups.
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Assigning 9 workers for data loading.
wandb: Currently logged in as: ifenton (turing-seaice) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Waiting for wandb.init()...
wandb: Waiting f

# Section 3: Evaluation Framework

Here we'll dive deep into the evaluation capabilities, comparing different models
and exploring various plotting formats and metrics.
For more interesting visualisations we will load a pretrained model checkpoint.

!uv run zebra evaluate --config-name=demo --checkpoint PATH_TO_CHECKPOINT
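
Rather than typing the checkpoint path by hand, you can look it up. The sketch below assumes checkpoints follow the `epoch=<n>-step=<n>.ckpt` naming seen earlier in this notebook and live in a `checkpoints/` folder under the run directory; the helper names are hypothetical and not part of the `zebra` CLI.

```python
import re
from pathlib import Path

# Matches names such as "epoch=9-step=1810.ckpt".
CKPT_PATTERN = re.compile(r"epoch=(\d+)-step=(\d+)\.ckpt$")


def parse_checkpoint_name(name):
    """Return (epoch, step) for a checkpoint file name, or None if it does not match."""
    match = CKPT_PATTERN.search(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def latest_checkpoint(checkpoint_dir):
    """Return the checkpoint with the highest (epoch, step), or None if there is none."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None
    candidates = []
    for path in directory.glob("*.ckpt"):
        key = parse_checkpoint_name(path.name)
        if key is not None:
            candidates.append((key, path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


def evaluate_command(config_name, checkpoint):
    """Build the argument list for the evaluate call shown above."""
    return ["zebra", "evaluate", f"--config-name={config_name}", "--checkpoint", str(checkpoint)]


# Directory taken from the training runs earlier in this notebook.
ckpt = latest_checkpoint("../my_data/training/wandb/latest-run/checkpoints")
if ckpt is None:
    print("No checkpoint found; run training first or adjust the path")
else:
    print(" ".join(evaluate_command("demo", ckpt)))
```

The printed command can be pasted into a cell (prefixed with `!`) or passed to `subprocess.run` as a list.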

In [7]:
# TODO: Add evaluation framework demonstration

# Section 4: Train it yourself - Advanced Example

This section shows how to use the pipeline on your own data. The pipeline builds on Anemoi functionality to fetch and inspect standard datasets.
You will also write your own config and see the pipeline's data checks and validation in action.
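
As a rough illustration of what "write your own config" can look like, the sketch below generates a small Hydra config that inherits from `base` and overrides `base_path`. The `defaults` list and `_self_` entry are standard Hydra; the file name `my_local.yaml` and the example path are hypothetical, and the exact keys your config needs should be checked against the repo's README.

```python
import tempfile
from pathlib import Path


def local_config_text(base_path: str) -> str:
    """Return YAML for a config that inherits from `base` and overrides `base_path`."""
    return (
        "defaults:\n"
        "  - base\n"      # inherit everything from base.yaml
        "  - _self_\n"    # then apply the overrides in this file
        "\n"
        f"base_path: {base_path}\n"
    )


# Written to a temporary directory here so the sketch has no side effects;
# in practice the file would sit alongside the repo's other config files.
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "my_local.yaml"  # hypothetical file name
    config_file.write_text(local_config_text("../my_data"))
    print(config_file.read_text())
```

A config like this would then be selected with `--config-name`, in the same way as `demo_nb.yaml` earlier in the notebook.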

In [8]:
# Configuration Management with Hydra
# Following the README instructions, we'll create a local config file that inherits from base.yaml
# This demonstrates Zebra's config-driven approach and Hydra's inheritance system

# First, let's see what the default base path is configured to.
# Note: this path is relative to the repo root, so run the cell from there or adjust the path.
!cat ice_station_zebra/config/base.yaml

cat: ice_station_zebra/config/base.yaml: No such file or directory


In [9]:
# Let's examine our local configuration file
# This file inherits from base.yaml and overrides the base_path for local development
# Following the README instructions for creating local configs

!cat ice_station_zebra/config/demo.yaml

cat: ice_station_zebra/config/demo.yaml: No such file or directory
