# Surprise storms

On Thursday at 12pm we will release 10 new storms for each task and ask you to evaluate your models on them.

By Friday 12pm, you need to provide:

1. A Jupyter notebook presenting your results and justifying your predictions on these surprise storms.
2. A set of 10 numpy files (.npy) for each task, containing your 10 storm predictions. These files will be used to rank the teams in a competition for the best prediction for each task (using L1 (absolute) error).
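
For a local sanity check of the ranking metric, the sketch below computes a mean absolute (L1) error between a prediction and a reference array. The function name `mean_l1_error` and the choice to average over all pixels and frames are assumptions; the organisers' exact scoring code is not shown in this notebook.

```python
import numpy as np

def mean_l1_error(pred, target):
    """Mean absolute error over all pixels and frames (hypothetical helper)."""
    pred = np.asarray(pred, dtype=np.float32)
    target = np.asarray(target, dtype=np.float32)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    return float(np.mean(np.abs(pred - target)))

# toy example using the task 1 output shape
target = np.zeros((384, 384, 12), dtype=np.float32)
pred = np.full((384, 384, 12), 2.0, dtype=np.float32)
l1_example = mean_l1_error(pred, target)
print(l1_example)
```

Running this on a held-out slice of the training data gives a rough idea of how a model might rank, under the assumption above about how the error is aggregated.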

This notebook explains:

1. How to access the surprise data.
2. What the format of the numpy prediction files should be.

> IMPORTANT: use the submission checker function below to check that your numpy files are in the correct submission format before uploading! If your files do not pass, **your predictions won't be included in the final ranking!**
>
> Upload your final predictions to your team folder here: https://imperiallondon-my.sharepoint.com/:f:/g/personal/bm1417_ic_ac_uk/Et_sPeZqVCVAk9r7waf4p7kBsxCkmkT5gexrhFS45QphAw?e=fCKWui

# 1. How to access the surprise data

The data will be uploaded to Hugging Face on Thursday.

There is a .csv file and .h5 file *for each task*.

The .csv and .h5 files for each task have *exactly the same format and structure* as the training data (events.csv and train.h5), *except that some of the data is missing in the .h5 file (the data you must predict for each task)*.

Use the code below to download the surprise data.

In [2]:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_task1.h5", repo_type="dataset", local_dir="data")
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_task2.h5", repo_type="dataset", local_dir="data")
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_task3.h5", repo_type="dataset", local_dir="data")
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_events1.csv", repo_type="dataset", local_dir="data")
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_events2.csv", repo_type="dataset", local_dir="data")
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_events3.csv", repo_type="dataset", local_dir="data")


'data/surprise_events3.csv'

In [16]:
import pandas as pd
import h5py
import numpy as np

# example reading in the csv for surprise task 1
df = pd.read_csv("data/surprise_events1.csv", parse_dates=["start_utc"])
print(f"Number of unique events: {len(df.id.unique())}")
df.head()

FileNotFoundError: [Errno 2] No such file or directory: 'data/surprise_events1.csv'

In [3]:
with h5py.File(f'data/surprise_task1.h5','r') as f:
    event = {img_type: f["S844398"][img_type][:] for img_type in ['vis', 'ir069', 'ir107', 'vil']}
for img_type in event:
    print(f"{img_type}: {event[img_type].shape} ({event[img_type].dtype})")
print()
# note: surprise_task1 has only 12 frames - you need to predict the 12 next "vil" frames.

with h5py.File(f'data/surprise_task2.h5','r') as f:
    event = {img_type: f["S852507"][img_type][:] for img_type in ['vis', 'ir069', 'ir107']}
for img_type in event:
    print(f"{img_type}: {event[img_type].shape} ({event[img_type].dtype})")
print()
# note: surprise_task2 has the "vil" image missing - you need to predict it.

with h5py.File(f'data/surprise_task3.h5','r') as f:
    event = {img_type: f["S844280"][img_type][:] for img_type in ['vis', 'ir069', 'ir107', 'vil']}
for img_type in event:
    print(f"{img_type}: {event[img_type].shape} ({event[img_type].dtype})")
# note: surprise_task3 has the "lght" array missing - you need to predict it.

vis: (384, 384, 12) (int16)
ir069: (192, 192, 12) (int16)
ir107: (192, 192, 12) (int16)
vil: (384, 384, 12) (uint8)

vis: (384, 384, 36) (int16)
ir069: (192, 192, 36) (int16)
ir107: (192, 192, 36) (int16)

vis: (384, 384, 36) (int16)
ir069: (192, 192, 36) (int16)
ir107: (192, 192, 36) (int16)
vil: (384, 384, 36) (uint8)


# 2. What the format of the numpy prediction files should be

You should upload 10 numpy arrays for each of the 4 tasks, containing your 10 storm predictions. Each array of each task should use the following filename and shape:

| Task      | Filename                     | Numpy Array Shape   | dtype |
|-----------|------------------------------|---------------------|---------|
| task 1a   | `<team-name>-task1a-vil-<storm-id>.npy` | (384, 384, 12)      | float32 |
| task 1b   | `<team-name>-task1b-vil-<storm-id>.npy` | (384, 384, 12)      | float32 |
| task 2    | `<team-name>-task2-vil-<storm-id>.npy`  | (384, 384, 36)      | float32 |
| task 3    | `<team-name>-task3-lght-<storm-id>.npy` | (N, 3)              | float32 |

> IMPORTANT: each `lght` array for task 3 should have 3 columns, where column 0 = time in seconds, column 1 = vil pixel x, column 2 = vil pixel y. The number of rows (lightning flashes), N, for each array can vary and is up to your model.
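
As an illustration of the task 3 layout, the sketch below stacks per-flash times and pixel coordinates into an `(N, 3)` float32 array with the column order described above. The helper name `make_lght_array` and the toy values are hypothetical; how the flashes are predicted is up to your model.

```python
import numpy as np

def make_lght_array(times_s, xs, ys):
    """Stack flash time (s), vil pixel x and vil pixel y into an (N, 3) float32 array.

    Hypothetical helper: column 0 = time in seconds, column 1 = x, column 2 = y.
    """
    if not (len(times_s) == len(xs) == len(ys)):
        raise ValueError("times_s, xs and ys must have the same length")
    return np.column_stack([times_s, xs, ys]).astype(np.float32)

# three made-up flashes
lght = make_lght_array([0.0, 300.0, 300.0], [10, 200, 383], [5, 190, 0])
print(lght.shape, lght.dtype)
```

An empty prediction (no flashes) still has two dimensions, with shape `(0, 3)`, so it keeps the required column count.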


> IMPORTANT: use the submission checker function below to check that your numpy files are in the correct submission format before uploading! If your files do not pass, **your predictions won't be included in the final ranking!**


In [3]:
# replace YOUR_GITHUB_TOKEN with a personal access token; never store a real token in a shared notebook
!git clone https://YOUR_GITHUB_TOKEN:x-oauth-basic@github.com//ese-ada-lovelace-2024/acds-storm-prediction-claudette
%cd /content/acds-storm-prediction-claudette
!git checkout main

Cloning into 'acds-storm-prediction-claudette'...
remote: Enumerating objects: 830, done.
remote: Counting objects: 100% (110/110), done.
remote: Compressing objects: 100% (76/76), done.
remote: Total 830 (delta 59), reused 73 (delta 32), pack-reused 720 (from 1)
Receiving objects: 100% (830/830), 20.62 MiB | 50.75 MiB/s, done.
Resolving deltas: 100% (536/536), done.
/content/acds-storm-prediction-claudette
Already on 'main'
Your branch is up to date with 'origin/main'.


In [4]:
%cd /content/acds-storm-prediction-claudette
!pip install -r /content/acds-storm-prediction-claudette/requirments.txt

/content/acds-storm-prediction-claudette
Collecting torchmetrics (from -r /content/acds-storm-prediction-claudette/requirments.txt (line 5))
  Downloading torchmetrics-1.6.1-py3-none-any.whl.metadata (21 kB)
Collecting pytorch-lightning (from -r /content/acds-storm-prediction-claudette/requirments.txt (line 6))
  Downloading pytorch_lightning-2.5.0.post0-py3-none-any.whl.metadata (21 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch->-r /content/acds-storm-prediction-claudette/requirments.txt (line 3))
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch->-r /content/acds-storm-prediction-claudette/requirments.txt (line 3))
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch->-r /content/acds-storm-prediction-claudette/requirments.txt (line 3))
  Downloading nvidia_cuda_cu

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [19]:
%cd /content/acds-storm-prediction-claudette
!git checkout main
!git pull
%ls
!python -m scripts.test_tas1a configs/task1a.yaml

/content/acds-storm-prediction-claudette
Already on 'main'
Your branch is up to date with 'origin/main'.
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Total 4 (delta 3), reused 4 (delta 3), pack-reused 0 (from 0)
Unpacking objects: 100% (4/4), 344 bytes | 344.00 KiB/s, done.
From https://github.com//ese-ada-lovelace-2024/acds-storm-prediction-claudette
   b887305..3085811  task3      -> origin/task3
Already up to date.
acds-storm-prediction-claudette/  data/    notebooks/  README.md        scripts/
configs/                          models/  outputs/    requirments.txt  utils/
Running inference on 10 storms.
First storm ID: S844398
Processing storm_id: S844398, frames: 12
Processing storm_id: S851491, frames: 12
Processing storm_id: S851111, frames: 12
Processing storm_id: S837416, frames: 12
Processing storm_id: S849444, frames: 12
Pro

In [20]:
test = np.load("/content/acds-storm-prediction-claudette/outputs/task1a/predictions/S837416_pred.npy")
test.shape

(384, 384, 12)
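
The prediction files written by the scripts above are named `<storm-id>_pred.npy`, whereas the submission format expects `<team-name>-<tag>-<storm-id>.npy` with dtype float32. A minimal sketch of that conversion step follows; `export_predictions` is a hypothetical helper, and the `_pred.npy` suffix is an assumption based on the file loaded above.

```python
import tempfile
from pathlib import Path

import numpy as np

def export_predictions(src_dir, dst_dir, team_name, tag, suffix="_pred.npy"):
    """Copy `<storm-id><suffix>` files to submission names, cast to float32 (hypothetical helper)."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob(f"*{suffix}")):
        storm_id = src.name[: -len(suffix)]           # e.g. "S837416"
        arr = np.load(src).astype(np.float32)          # checker requires float32
        dst = dst_dir / f"{team_name}-{tag}-{storm_id}.npy"
        np.save(dst, arr)
        written.append(dst)
    return written

# demo in a throwaway directory with one fake prediction
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "outputs"
    src.mkdir()
    np.save(src / "S837416_pred.npy", np.zeros((384, 384, 12), dtype=np.float64))
    written = export_predictions(src, Path(tmp) / "predictions", "your-team-name", "task1a-vil")
    demo_name = written[0].name
    demo_dtype = np.load(written[0]).dtype
print(demo_name, demo_dtype)
```

In the notebook this might be called as `export_predictions("outputs/task1a/predictions", "predictions", team_name, "task1a-vil")`, with the paths adjusted to wherever the scripts actually write their output.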

In [10]:
%cd /content/acds-storm-prediction-claudette
!git pull
%ls
!python -m scripts.test_task1b configs/task1b.yaml

/content/acds-storm-prediction-claudette
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 4 (delta 3), reused 4 (delta 3), pack-reused 0 (from 0)
Unpacking objects: 100% (4/4), 379 bytes | 379.00 KiB/s, done.
From https://github.com//ese-ada-lovelace-2024/acds-storm-prediction-claudette
   677d975..9c529a8  main       -> origin/main
Updating 677d975..9c529a8
Fast-forward
 utils/data_loader_1b_1.py | 2 ++
 1 file changed, 2 insertions(+)
configs/  data/  models/  notebooks/  outputs/  README.md  requirments.txt  scripts/  utils/
(4, 11, 384, 384)
(4, 11, 384, 384)
(4, 11, 384, 384)
(4, 11, 384, 384)
(4, 11, 384, 384)
(4, 11, 384, 384)
(4, 11, 384, 384)
(4, 11, 384, 384)
(4, 11, 384, 384)
(4, 11, 384, 384)
[INFO] Saved VIL visualization at: visualizations/train/batch_0_sample_0_vil.png
[IN

In [22]:
%cd /content/acds-storm-prediction-claudette
!git pull
%ls
!python -m scripts.test_task2 configs/task2.yaml

/content/acds-storm-prediction-claudette
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 4 (delta 3), reused 4 (delta 3), pack-reused 0 (from 0)
Unpacking objects: 100% (4/4), 376 bytes | 376.00 KiB/s, done.
From https://github.com//ese-ada-lovelace-2024/acds-storm-prediction-claudette
   1c3b2e1..515ccfd  main       -> origin/main
Updating 1c3b2e1..515ccfd
Fast-forward
 utils/preprocess_production.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
configs/  models/     outputs/   requirments.txt  utils/
data/     notebooks/  README.md  scripts/         visualizations/
Testing on 10 storms
  model.load_state_dict(torch.load(ckpt_path, map_location=torch.device('cpu')))
100% 5/5 [00:46<00:00,  9.28s/it]

Test Results:
MSE: 41.4650
MAE: 5.9077


In [23]:
ls = np.load("/content/acds-storm-prediction-claudette/outputs/task4/predictions/S834438_pred222.npy")
ls.shape

(384, 384, 36)

For task 3, inference was run in the separate Task3 notebook because of the size of the model. A summary of the workflow is given below:

In [None]:
# example reading in the csv for surprise task 3
df = pd.read_csv("data/surprise_events3.csv", parse_dates=["start_utc"])
print(f"Number of unique events: {len(df.id.unique())}")
df.head()

In [None]:
def getting_test_data_id(df):
  """Return the unique ids of all events that have a valid `start_utc`."""
  # Ensure `start_utc` is in datetime format
  df['start_utc'] = pd.to_datetime(df['start_utc'])

  df = df.sort_values(by='start_utc')

  # Keep only rows with a valid `start_utc`
  image_data = df[df['start_utc'].notna()]

  # Use every remaining event: the surprise set is test-only, so no train/validation split is made
  test_data_id = image_data['id'].unique()

  return test_data_id

test_data_id = getting_test_data_id(df)

In [None]:
path = '/content/data'
event_test = load_multiple_events(10, train_val='test', filename='surprise_task3.h5', test_set=True, test_num='task3')

Please see Task3 Notebook for more details.

In [None]:
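import copy

# stacking, predicting_distributions, flashes_per_second and mvn_predictions
# are helper functions from the Task3 notebook
scales_test, standard_test = stacking(event_test)
test_preds = predicting_distributions(standard_test)
event_preds = flashes_per_second(test_preds)
distance_images = scales_test
mvn_preds = copy.deepcopy(event_preds)  # keep event_preds unchanged
mvn_predictions(mvn_preds, distance_images)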

In [None]:
import torch

def chamfer_loss(array1, array2):
    """
    Compute the bi-directional Chamfer loss between two point clouds (array1 and array2).

    Parameters:
        array1 (torch.Tensor): Tensor of shape [n1, 3], the first set of 3D vectors.
        array2 (torch.Tensor): Tensor of shape [n2, 3], the second set of 3D vectors.

    Returns:
        torch.Tensor: The bi-directional Chamfer loss (a scalar).
    """
    # Compute pairwise distances
    diff_1_to_2 = torch.cdist(array1, array2, p=2)  # Shape: [n1, n2]
    diff_2_to_1 = torch.cdist(array2, array1, p=2)  # Shape: [n2, n1]

    # Compute the forward Chamfer distance
    forward_loss = torch.mean(torch.min(diff_1_to_2, dim=1).values)  # Min over array2, mean over array1

    # Compute the backward Chamfer distance
    backward_loss = torch.mean(torch.min(diff_2_to_1, dim=1).values)  # Min over array1, mean over array2

    # Total Chamfer loss (bi-directional)
    total_loss = forward_loss + backward_loss

    return total_loss
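
To make the definition concrete, the sketch below recomputes the same bi-directional Chamfer distance with NumPy on a toy pair of point sets. It is an illustrative re-implementation, not the code used for training; `chamfer_distance_np` is a hypothetical name, and the brute-force pairwise matrix is only practical for small arrays.

```python
import numpy as np

def chamfer_distance_np(array1, array2):
    """Bi-directional Chamfer distance between point sets of shape [n1, 3] and [n2, 3]."""
    a = np.asarray(array1, dtype=np.float64)
    b = np.asarray(array2, dtype=np.float64)
    # pairwise Euclidean distances, shape [n1, n2]
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    forward = d.min(axis=1).mean()   # nearest point in array2 for each point in array1
    backward = d.min(axis=0).mean()  # nearest point in array1 for each point in array2
    return float(forward + backward)

# toy example: two points versus one point
a = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
b = [[0.0, 0.0, 0.0]]
chamfer_example = chamfer_distance_np(a, b)
print(chamfer_example)  # forward = (0 + 1) / 2, backward = 0, so 0.5
```

Because the columns of a `lght` array mix seconds and pixels, the value depends on how those axes are scaled relative to each other.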

## SUBMISSION CHECKER

In [None]:
#### CHANGE THIS ###################

team_name = "your-team-name"  # CHANGE TO YOUR TEAM NAME
prediction_directory = "predictions/"  # CHANGE TO WHERE YOUR PREDICTION FILES ARE
# you should have 4 * 10 = 40 submission .npy files!

####################################



# DO NOT CHANGE THIS FUNCTION, MAKE SURE IT PASSES BEFORE UPLOADING!
def submission_checker(team_name, prediction_directory):
  "Checks your submission files exist and are in the correct format"

  import numpy as np

  task1_ids = ['S844398', 'S851491', 'S851111', 'S837416', 'S849444',
               'S843931', 'S858827', 'S856118', 'S849552', 'S854791']
  task2_ids = ['S852507', 'S834438', 'S847775', 'S838836', 'S851858',
               'S851835', 'S849415', 'S847917', 'S855381', 'S843625']
  task3_ids = ['S844280', 'S849688', 'S852994', 'S843281', 'S839048',
               'S846513', 'S847595', 'S840965', 'S849871', 'S848806']

  for ids, shape, tag in zip(
      [task1_ids, task1_ids, task2_ids, task3_ids],
      ((384, 384, 12), (384, 384, 12), (384, 384, 36), None),
      ["task1a-vil", "task1b-vil", "task2-vil", "task3-lght"]):

    for id_ in ids:

      # 1. check file exists
      try:
        file = f"{prediction_directory.rstrip('/')}/{team_name}-{tag}-{id_}.npy"
        x = np.load(file)
      except:
        raise Exception(f"ERROR: unable to load submission file: {file}")

      # 2. check shape of array
      if shape is not None:
        assert x.shape == shape, f"ERROR: array has wrong shape: {file}"
      else:
        assert x.ndim == 2 and x.shape[1] == 3, f"ERROR: array has wrong shape: {file}"
        if x.shape[0] > 1e6:
          print(f"WARNING: seems like too many events for lightning prediction? - check your model: {file}")

      # 3. check dtype
      assert x.dtype == np.float32, f"ERROR: array has wrong dtype: {file}"

  print("Submission files passed - please now upload them \U0001F600")

submission_checker(team_name, prediction_directory)

Submission files passed - please now upload them 😀
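
The checker stops at the first file it cannot load. As a complementary sketch, the snippet below lists every expected filename and reports which ones are missing, so all gaps can be seen at once. It does not replace `submission_checker`; the helper names are hypothetical, and the storm ids are copied from the checker above.

```python
import os

TASK1_IDS = ['S844398', 'S851491', 'S851111', 'S837416', 'S849444',
             'S843931', 'S858827', 'S856118', 'S849552', 'S854791']
TASK2_IDS = ['S852507', 'S834438', 'S847775', 'S838836', 'S851858',
             'S851835', 'S849415', 'S847917', 'S855381', 'S843625']
TASK3_IDS = ['S844280', 'S849688', 'S852994', 'S843281', 'S839048',
             'S846513', 'S847595', 'S840965', 'S849871', 'S848806']

# (filename tag, storm ids) in the same order as the checker
EXPECTED = [("task1a-vil", TASK1_IDS), ("task1b-vil", TASK1_IDS),
            ("task2-vil", TASK2_IDS), ("task3-lght", TASK3_IDS)]

def expected_filenames(team_name):
    """All 40 submission filenames for a team (hypothetical helper)."""
    return [f"{team_name}-{tag}-{id_}.npy" for tag, ids in EXPECTED for id_ in ids]

def missing_files(team_name, prediction_directory):
    """Expected filenames that are not present in prediction_directory."""
    return [name for name in expected_filenames(team_name)
            if not os.path.isfile(os.path.join(prediction_directory, name))]

missing = missing_files("your-team-name", "predictions/")
print(f"{len(missing)} of {len(expected_filenames('your-team-name'))} files missing")
```

This only checks that files exist; shapes and dtypes are still validated by `submission_checker`.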


In [None]:
# example files which pass above
import os
os.makedirs("predictions", exist_ok=True)

task1_ids = ['S844398', 'S851491', 'S851111', 'S837416', 'S849444',
              'S843931', 'S858827', 'S856118', 'S849552', 'S854791']
task2_ids = ['S852507', 'S834438', 'S847775', 'S838836', 'S851858',
              'S851835', 'S849415', 'S847917', 'S855381', 'S843625']
task3_ids = ['S844280', 'S849688', 'S852994', 'S843281', 'S839048',
              'S846513', 'S847595', 'S840965', 'S849871', 'S848806']

for id_ in task1_ids:
  np.save(f"predictions/your-team-name-task1a-vil-{id_}.npy", np.zeros((384, 384, 12), dtype=np.float32))
for id_ in task1_ids:
  np.save(f"predictions/your-team-name-task1b-vil-{id_}.npy", np.zeros((384, 384, 12), dtype=np.float32))
for id_ in task2_ids:
  np.save(f"predictions/your-team-name-task2-vil-{id_}.npy", np.zeros((384, 384, 36), dtype=np.float32))
for id_ in task3_ids:
  np.save(f"predictions/your-team-name-task3-lght-{id_}.npy", np.zeros((1000000, 3), dtype=np.float32))

!du -sh predictions

453M	predictions
