
# Surprise storms

On Thursday at 12pm we will release 10 new storms for each task and ask you to evaluate your models on them.

By Friday 12pm, you need to provide:

1. A Jupyter notebook presenting your results and justifying your predictions on these surprise storms.
2. A set of 10 numpy files (.npy) for each task, containing your predictions for the 10 storms. These files will be used to rank the teams on each task by the L1 (absolute) error of their predictions.
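
As a rough illustration of the L1 (absolute) error used for ranking, the sketch below computes the mean absolute difference between a predicted array and a target array of the same shape. The function name `l1_error` is hypothetical, and averaging over all pixels and frames is an assumption: the exact aggregation used for the ranking is not stated here. The arrays are cast to float64 first so that integer inputs (such as uint8 `vil` frames) cannot wrap around when subtracted.

```python
import numpy as np

def l1_error(prediction, target):
    """Mean absolute (L1) error between two arrays of the same shape."""
    # cast first: subtracting uint8 arrays directly would wrap around
    prediction = np.asarray(prediction, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if prediction.shape != target.shape:
        raise ValueError(f"shape mismatch: {prediction.shape} vs {target.shape}")
    return float(np.mean(np.abs(prediction - target)))

# toy example with made-up values (not real storm data)
pred = np.zeros((384, 384, 12), dtype=np.float32)
true = np.full((384, 384, 12), 2, dtype=np.uint8)
print(l1_error(pred, true))
```

This form only applies to the image tasks, where prediction and target have the same shape; it does not cover the variable-length lightning arrays of task 3.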

This notebook explains:
1. how to access the surprise data.
2. what the format of the numpy prediction files should be.
> IMPORTANT: use the submission checker function below to check that your numpy files are in the correct submission format before uploading! If your files do not pass, **your predictions won't be included in the final ranking!**
>
> Upload your final predictions to your team folder here: https://imperiallondon-my.sharepoint.com/:f:/g/personal/bm1417_ic_ac_uk/Et_sPeZqVCVAk9r7waf4p7kBsxCkmkT5gexrhFS45QphAw?e=fCKWui

# 1. How to access the surprise data

The data will be uploaded to Hugging Face on Thursday.

There is a .csv file and .h5 file *for each task*.

The .csv and .h5 files for each task have *exactly the same format and structure* as the training data (events.csv and train.h5), *except that some of the data is missing in the .h5 file (the data you must predict for each task)*.

Use the code below to download the surprise data.

In [None]:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_task1.h5", repo_type="dataset", local_dir="data")
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_task2.h5", repo_type="dataset", local_dir="data")
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_task3.h5", repo_type="dataset", local_dir="data")
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_events1.csv", repo_type="dataset", local_dir="data")
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_events2.csv", repo_type="dataset", local_dir="data")
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_events3.csv", repo_type="dataset", local_dir="data")


In [None]:
import pandas as pd
import h5py
import numpy as np

# example reading in the csv for surprise task 1
df = pd.read_csv("data/surprise_events1.csv", parse_dates=["start_utc"])
print(f"Number of unique events: {len(df.id.unique())}")
df.head()

Number of unique events: 10


| Unnamed: 0 | id | img_type | start_utc | llcrnrlat | llcrnrlon | urcrnrlat | urcrnrlon | proj | height_m | width_m |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | S844398 | ir069 | 2019-07-29 19:49:00+00:00 | 33.568521 | -99.143269 | 36.986895 | -94.869356 | +proj=laea +lat_0=38 +lon_0=-98 +units=m +a=63... | 384000.0 | 384000.0 |
| 1 | S844398 | ir107 | 2019-07-29 19:49:00+00:00 | 33.568521 | -99.143269 | 36.986895 | -94.869356 | +proj=laea +lat_0=38 +lon_0=-98 +units=m +a=63... | 384000.0 | 384000.0 |
| 2 | S844398 | lght | NaT | 33.568521 | -99.143269 | 36.986895 | -94.869356 | +proj=laea +lat_0=38 +lon_0=-98 +units=m +a=63... | 384000.0 | 384000.0 |
| 3 | S844398 | vil | 2019-07-29 19:50:00+00:00 | 33.568521 | -99.143269 | 36.986895 | -94.869356 | +proj=laea +lat_0=38 +lon_0=-98 +units=m +a=63... | 384000.0 | 384000.0 |
| 4 | S844398 | vis | 2019-07-29 19:49:00+00:00 | 33.568521 | -99.143269 | 36.986895 | -94.869356 | +proj=laea +lat_0=38 +lon_0=-98 +units=m +a=63... | 384000.0 | 384000.0 |


In [None]:
# read one example event per task and print the shape and dtype of each image type
with h5py.File('data/surprise_task1.h5', 'r') as f:
    event = {img_type: f["S844398"][img_type][:] for img_type in ['vis', 'ir069', 'ir107', 'vil']}
for img_type in event:
    print(f"{img_type}: {event[img_type].shape} ({event[img_type].dtype})")
print()
# note: surprise_task1 has only 12 frames - you need to predict the next 12 "vil" frames.

with h5py.File('data/surprise_task2.h5', 'r') as f:
    event = {img_type: f["S852507"][img_type][:] for img_type in ['vis', 'ir069', 'ir107']}
for img_type in event:
    print(f"{img_type}: {event[img_type].shape} ({event[img_type].dtype})")
print()
# note: surprise_task2 has the "vil" image missing - you need to predict it.

with h5py.File('data/surprise_task3.h5', 'r') as f:
    event = {img_type: f["S844280"][img_type][:] for img_type in ['vis', 'ir069', 'ir107', 'vil']}
for img_type in event:
    print(f"{img_type}: {event[img_type].shape} ({event[img_type].dtype})")
# note: surprise_task3 has the "lght" array missing - you need to predict it.

vis: (384, 384, 12) (int16)
ir069: (192, 192, 12) (int16)
ir107: (192, 192, 12) (int16)
vil: (384, 384, 12) (uint8)

vis: (384, 384, 36) (int16)
ir069: (192, 192, 36) (int16)
ir107: (192, 192, 36) (int16)

vis: (384, 384, 36) (int16)
ir069: (192, 192, 36) (int16)
ir107: (192, 192, 36) (int16)
vil: (384, 384, 36) (uint8)
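
Because each task's file is missing a different image type, it can help to read events with a helper that skips whatever is absent rather than hard-coding the list per task. The sketch below is one way to do that; `read_event` is a hypothetical helper, not part of the provided code. It accepts an open `h5py.File` or any nested mapping with the same layout, and the example uses a plain dictionary of zero-filled arrays as a stand-in so that it runs without the download.

```python
import numpy as np

def read_event(h5_file, event_id, img_types=("vis", "ir069", "ir107", "vil")):
    """Return {img_type: array} for one event, skipping image types that are absent."""
    group = h5_file[event_id]
    return {t: group[t][:] for t in img_types if t in group}

# Stand-in for an open h5py.File (made-up zeros, not real data);
# it mimics a task 2 event, where "vil" is missing.
fake_file = {
    "S852507": {
        "vis": np.zeros((384, 384, 36), dtype=np.int16),
        "ir069": np.zeros((192, 192, 36), dtype=np.int16),
        "ir107": np.zeros((192, 192, 36), dtype=np.int16),
    }
}

event = read_event(fake_file, "S852507")
for img_type, array in event.items():
    print(f"{img_type}: {array.shape} ({array.dtype})")
```

With the real data, the dictionary would be replaced by a file opened with `h5py.File("data/surprise_task2.h5", "r")`, and the event IDs could be taken from the `id` column of the matching .csv file.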


# 2. What the format of the numpy prediction files should be

You should upload 10 numpy arrays for each of the 4 tasks (1a, 1b, 2 and 3), one array per storm. Each array should use the following filename, shape and dtype:

| Task      | Filename                     | Numpy Array Shape   | dtype |
|-----------|------------------------------|---------------------|-------|
| task 1a   | `<team-name>-task1a-vil-<storm-id>.npy` | (384, 384, 12)      | float32 |
| task 1b   | `<team-name>-task1b-vil-<storm-id>.npy` | (384, 384, 12)      | float32 |
| task 2    | `<team-name>-task2-vil-<storm-id>.npy`  | (384, 384, 36)      | float32 |
| task 3    | `<team-name>-task3-lght-<storm-id>.npy` | (N, 3)              | float32 |
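
A common way to fail the format check is to save an array in numpy's default float64. The sketch below shows one way to build the filename from the pattern in the table and cast to float32 before saving. `save_prediction` is a hypothetical helper, and the example writes to a temporary directory so it does not touch any real submission files.

```python
import os
import tempfile
import numpy as np

def save_prediction(array, team_name, tag, storm_id, directory="predictions"):
    """Save one prediction as float32 using the filename pattern from the table."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{team_name}-{tag}-{storm_id}.npy")
    np.save(path, np.asarray(array, dtype=np.float32))
    return path

# np.zeros defaults to float64, which would fail the dtype check if saved directly
prediction = np.zeros((384, 384, 36))

with tempfile.TemporaryDirectory() as tmp:
    path = save_prediction(prediction, "your-team-name", "task2-vil", "S852507", directory=tmp)
    saved = np.load(path)

print(os.path.basename(path), saved.shape, saved.dtype)
```

The `tag` argument is expected to be one of `task1a-vil`, `task1b-vil`, `task2-vil` or `task3-lght`, matching the filenames above.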

> IMPORTANT: each `lght` array for task 3 should have 3 columns, where column 0 = time in seconds, column 1 = vil pixel x, column 2 = vil pixel y. The number of rows (lightning flashes), N, for each array can vary and is up to your model.
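
As a minimal sketch of the task 3 layout, the example below builds an `(N, 3)` float32 array from a list of flashes. The flash values are made up for illustration, and the bounds check assumes pixel coordinates lie in the 384 x 384 `vil` grid, which is an assumption rather than a stated requirement.

```python
import numpy as np

# hypothetical flashes as (time in seconds, vil pixel x, vil pixel y)
flashes = [(12.5, 100, 200), (340.0, 101, 198), (1275.25, 250, 37)]

# one row per flash, columns in the order described above
lght = np.array(flashes, dtype=np.float32)
print(lght.shape, lght.dtype)

# sanity check (assumption: pixel coordinates fall inside the 384 x 384 vil grid)
in_bounds = bool(((lght[:, 1:] >= 0) & (lght[:, 1:] < 384)).all())
print(in_bounds)

# a storm with no predicted flashes still needs a 2D array with 3 columns
empty = np.zeros((0, 3), dtype=np.float32)
print(empty.shape)
```

Keeping the array two-dimensional even when N is 0 or 1 matters, because a flat array of length 3 does not have the `(N, 3)` shape.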


IMPORTANT: use the submission checker function below to check that your numpy files are in the correct submission format before uploading! If they do not pass, **your predictions won't be included in the final ranking!**


## SUBMISSION CHECKER

In [None]:
#### CHANGE THIS ###################

team_name = "your-team-name"  # CHANGE TO YOUR TEAM NAME
prediction_directory = "predictions/"  # CHANGE TO WHERE YOUR PREDICTION FILES ARE
# you should have 4 * 10 = 40 submission .npy files!

####################################



# DO NOT CHANGE THIS FUNCTION, MAKE SURE IT PASSES BEFORE UPLOADING!
def submission_checker(team_name, prediction_directory):
  "Checks your submission files exist and are in the correct format"

  import numpy as np

  task1_ids = ['S844398', 'S851491', 'S851111', 'S837416', 'S849444',
               'S843931', 'S858827', 'S856118', 'S849552', 'S854791']
  task2_ids = ['S852507', 'S834438', 'S847775', 'S838836', 'S851858',
               'S851835', 'S849415', 'S847917', 'S855381', 'S843625']
  task3_ids = ['S844280', 'S849688', 'S852994', 'S843281', 'S839048',
               'S846513', 'S847595', 'S840965', 'S849871', 'S848806']

  for ids, shape, tag in zip(
      [task1_ids, task1_ids, task2_ids, task3_ids],
      ((384, 384, 12), (384, 384, 12), (384, 384, 36), None),
      ["task1a-vil", "task1b-vil", "task2-vil", "task3-lght"]):

    for id_ in ids:

      # 1. check file exists
      try:
        file = f"{prediction_directory.rstrip('/')}/{team_name}-{tag}-{id_}.npy"
        x = np.load(file)
      except:
        raise Exception(f"ERROR: unable to load submission file: {file}")

      # 2. check shape of array
      if shape is not None:
        assert x.shape == shape, f"ERROR: array has wrong shape: {file}"
      else:
        assert x.ndim == 2 and x.shape[1] == 3, f"ERROR: array has wrong shape: {file}"
        if x.shape[0] > 1e6:
          print(f"WARNING: seems like too many events for lightning prediction? - check your model: {file}")

      # 3. check dtype
      assert x.dtype == np.float32, f"ERROR: array has wrong dtype: {file}"

  print("Submission files passed - please now upload them \U0001F600")

submission_checker(team_name, prediction_directory)

Submission files passed - please now upload them 😀


In [None]:
# example files which pass above
import os
os.makedirs("predictions", exist_ok=True)

task1_ids = ['S844398', 'S851491', 'S851111', 'S837416', 'S849444',
              'S843931', 'S858827', 'S856118', 'S849552', 'S854791']
task2_ids = ['S852507', 'S834438', 'S847775', 'S838836', 'S851858',
              'S851835', 'S849415', 'S847917', 'S855381', 'S843625']
task3_ids = ['S844280', 'S849688', 'S852994', 'S843281', 'S839048',
              'S846513', 'S847595', 'S840965', 'S849871', 'S848806']

for id_ in task1_ids:
  np.save(f"predictions/your-team-name-task1a-vil-{id_}.npy", np.zeros((384, 384, 12), dtype=np.float32))
for id_ in task1_ids:
  np.save(f"predictions/your-team-name-task1b-vil-{id_}.npy", np.zeros((384, 384, 12), dtype=np.float32))
for id_ in task2_ids:
  np.save(f"predictions/your-team-name-task2-vil-{id_}.npy", np.zeros((384, 384, 36), dtype=np.float32))
for id_ in task3_ids:
  np.save(f"predictions/your-team-name-task3-lght-{id_}.npy", np.zeros((1000000, 3), dtype=np.float32))

!du -sh predictions

453M	predictions
