# Surprise storms - Task 2

A surprise storm is coming 🤯🌪️! Let's see how well we can predict it!

## Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Import packages

In [None]:
import os
import h5py
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from skimage.transform import resize
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, TensorDataset
import matplotlib.pyplot as plt

In [None]:
# Load the packaged API
import sys
sys.path.append('/content/drive/MyDrive/ACDS-Barry/barry/')

import task_2

## Download the surprise storms data

In [None]:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_task2.h5", repo_type="dataset", local_dir="data")
hf_hub_download(repo_id="benmoseley/ese-dl-2024-25-group-project", filename="surprise_events2.csv", repo_type="dataset", local_dir="data")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


'data/surprise_events2.csv'

## Prepare the dataset

In [None]:
# Initialize the data preparer
data_preparer = task_2.Task2DataPreparer(
    file_path='data/surprise_task2.h5',
    test_size=0.2,
    val_size=0.1,
    sample_ratio=1,
    random_state=42,
    is_real_data=True,
)


# Prepare data set
X_data, norm_params = data_preparer.prepare_surprise_datasets()

dataset = data_preparer.WeatherDataset(X_data)

batch_size = 128
num_workers = 8

data_loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=batch_size,
    shuffle=False,
    num_workers=num_workers,
    pin_memory=True
)
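Because the loader is built with `shuffle=False`, samples keep their stored order. The save step later in this notebook relies on that order, addressing frame `j` of storm `i` as `i * 36 + j`. The sketch below spells out that index arithmetic. The helper names `flat_index` and `storm_and_frame` are hypothetical and not part of `task_2`, and the layout of 36 consecutive frames per storm is an assumption taken from the save loop, not something verified against the HDF5 file.

```python
FRAMES_PER_STORM = 36  # assumption: taken from the save loop in this notebook


def flat_index(storm_idx, frame_idx, frames_per_storm=FRAMES_PER_STORM):
    """Map (storm, frame) to a position in the unshuffled dataset."""
    if not 0 <= frame_idx < frames_per_storm:
        raise IndexError(f"frame_idx {frame_idx} outside 0..{frames_per_storm - 1}")
    return storm_idx * frames_per_storm + frame_idx


def storm_and_frame(index, frames_per_storm=FRAMES_PER_STORM):
    """Inverse mapping: dataset position back to (storm, frame)."""
    return divmod(index, frames_per_storm)


print(flat_index(2, 5))      # 77
print(storm_and_frame(77))   # (2, 5)
print(storm_and_frame(359))  # (9, 35)
```

If the loader were ever switched to `shuffle=True`, iterating over batches would no longer follow this order, although indexing `data_loader.dataset` directly would.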

## Load the model and predict
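The cell below restores trained weights into `task_2.Task2UNet` with `load_state_dict`. As a minimal sketch of that pattern, the example here saves and reloads the weights of a tiny stand-in network. `TinyNet` and the temporary file are hypothetical and used only for illustration, and the 192 x 192 input size is an assumption based on the shape comment in the save step. `map_location` and `weights_only` are standard `torch.load` arguments.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for Task2UNet: 3 input channels, 1 output channel."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.pth")
    torch.save(TinyNet().state_dict(), path)

    # map_location="cpu" lets a checkpoint saved on a GPU load on a CPU-only runtime.
    # weights_only=True restricts unpickling to tensors and plain containers.
    state = torch.load(path, map_location="cpu", weights_only=True)

model = TinyNet()
result = model.load_state_dict(state)  # reports missing and unexpected keys
model.eval()

with torch.no_grad():
    out = model(torch.zeros(1, 3, 192, 192))

print(result.missing_keys, result.unexpected_keys, tuple(out.shape))
```

Checking that both key lists are empty is the programmatic equivalent of the `<All keys matched successfully>` message.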

In [None]:
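# Load the trained U-Net weights
# map_location='cpu' allows loading on a CPU-only runtime; weights_only=True avoids unpickling arbitrary objects
model = task_2.Task2UNet(input_channels=3, output_channels=1)
model.load_state_dict(torch.load('/content/drive/MyDrive/ACDS-Barry/supporting_materials/Task2/Unet_model.pth', map_location='cpu', weights_only=True))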


<All keys matched successfully>

In [None]:
model.eval()  # Set the model to evaluation mode

# Plot the input channels and the predicted output for the events in `data_loader`
task_2.Task2DataVisualization.plot_event(data_loader, model, output_gif=True, save_gif=True)

Output hidden; open in https://colab.research.google.com to view.

Each row represents a different storm event, with three input channels and a corresponding predicted output. The channels likely correspond to different meteorological variables, such as visible light, water vapor infrared, and cloud-top temperature, which serve as inputs for the model's prediction.

The predicted storm patterns (rightmost images in each row) show a reasonable correlation with the input storm structures. The model effectively captures the general storm morphology, particularly the regions with high storm intensity. The brightest areas in the predicted results correspond to regions with higher storm activity, suggesting that the model has learned some meaningful spatial correlations from the input features.

However, there are noticeable artifacts in the predictions, such as the presence of high-frequency noise and structured patterns that do not align perfectly with the input data. This could indicate issues such as overfitting, inadequate spatial feature learning, or sensitivity to specific input textures.
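One way to put a number on the high-frequency noise mentioned above is to measure how much of an image's spectral power lies above a chosen spatial frequency. The sketch below is an assumption about how such a check could look, not part of the `task_2` package. The helper `high_freq_fraction`, the cutoff of 0.25 cycles per pixel, and the synthetic test images are all hypothetical.

```python
import numpy as np


def high_freq_fraction(image, cutoff=0.25):
    """Share of spectral power at radial frequencies above `cutoff` (cycles/pixel)."""
    image = np.asarray(image, dtype=np.float64)
    image = image - image.mean()  # remove the constant (DC) component
    power = np.abs(np.fft.fft2(image)) ** 2
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = power.sum()
    return float(power[radius > cutoff].sum() / total) if total > 0 else 0.0


# Synthetic stand-ins for a predicted frame: a smooth blob, with and without noise
yy, xx = np.mgrid[0:192, 0:192]
smooth = np.exp(-((xx - 96) ** 2 + (yy - 96) ** 2) / (2 * 30.0 ** 2))
rng = np.random.default_rng(0)
noisy = smooth + 0.2 * rng.standard_normal(smooth.shape)

print(f"smooth: {high_freq_fraction(smooth):.4f}")
print(f"noisy:  {high_freq_fraction(noisy):.4f}")
```

Comparing this fraction between predicted frames and reference VIL frames, where references are available, would indicate whether the model adds fine-scale structure that the data does not contain.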

## Save predictions
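The cell below collects the predicted frames of each storm, upsamples them from 192 x 192 to 384 x 384, forces the frame count to 36, and stores the result as a `(384, 384, 36)` `float32` array. The sketch here reproduces only that array handling on random stand-in frames. The helper `to_submission_array` is hypothetical, and nearest-neighbour repetition is used as a stand-in for `Task2DataVisualization.resize_image`, whose interpolation method is not shown in this notebook.

```python
import numpy as np


def to_submission_array(frames, n_frames=36, scale=2):
    """Hypothetical helper: (N, H, W) frames -> (H*scale, W*scale, n_frames) float32."""
    frames = np.asarray(frames, dtype=np.float32)
    # Nearest-neighbour upsampling; a stand-in for the notebook's resize_image helper
    frames = frames.repeat(scale, axis=1).repeat(scale, axis=2)
    n = frames.shape[0]
    if n < n_frames:
        # Too few frames: pad the time axis with zero frames
        pad = np.zeros((n_frames - n,) + frames.shape[1:], dtype=np.float32)
        frames = np.concatenate([frames, pad], axis=0)
    else:
        # Too many (or exactly enough) frames: keep the first n_frames
        frames = frames[:n_frames]
    # Move the time axis last: (n_frames, H, W) -> (H, W, n_frames)
    return frames.transpose(1, 2, 0)


rng = np.random.default_rng(0)
short = to_submission_array(rng.random((30, 192, 192)))
long = to_submission_array(rng.random((40, 192, 192)))
print(short.shape, long.shape, short.dtype)
```

Zero padding keeps the file shape valid but writes empty frames into the forecast, so a warning such as the one printed in the cell below is worth keeping whenever padding occurs.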

In [None]:
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Storm IDs, assumed to be in the same order as the storms in `data_loader`
storm_ids = data_preparer.load_event_ids()

# Initialize a dictionary to store predictions for each storm
predictions = {storm_id: [] for storm_id in storm_ids}

# NOTE: this overrides the `norm_params` returned by `prepare_surprise_datasets()` above.
# (0.0, 1.0) is a placeholder: replace it with the actual min and max values for VIL.
norm_params = {
    'vil': (0.0, 1.0)
}

# Predict one frame at a time: 10 storms with 36 consecutive frames each
num_storms = 10
frames_per_storm = 36
with torch.no_grad():
    for i in range(num_storms):
        storm_id = storm_ids[i]
        for j in range(frames_per_storm):
            index = i * frames_per_storm + j
            channels = data_loader.dataset[index]
            channels = channels.unsqueeze(0).to(device)  # Add batch dimension and move to device
            result = model(channels).squeeze(0).squeeze(0).cpu().numpy()  # Predict and drop batch/channel dimensions
            result = task_2.Task2DataVisualization.denormalize(result, norm_params['vil'])  # Denormalize the result
            predictions[storm_id].append(result)

# Concatenate and save predictions for each storm
team_name = "barry"
for storm_id in storm_ids:
    # Check if predictions exist for this storm
    if len(predictions[storm_id]) == 0:
        print(f"Warning: No predictions found for storm {storm_id}. Skipping.")
        continue

    # Concatenate all frames for this storm
    storm_predictions = np.stack(predictions[storm_id], axis=0)  # Shape: (N, 192, 192)

    # Resize each frame to (384, 384)
    resized_predictions = np.array([task_2.Task2DataVisualization.resize_image(frame) for frame in storm_predictions])  # Shape: (N, 384, 384)

    # Ensure we have exactly 36 frames
    if resized_predictions.shape[0] < 36:
        print(f"Warning: Only {resized_predictions.shape[0]} frames found for storm {storm_id}. Padding with zeros.")
        padding = np.zeros((36 - resized_predictions.shape[0], 384, 384), dtype=np.float32)
        resized_predictions = np.concatenate([resized_predictions, padding], axis=0)
    elif resized_predictions.shape[0] > 36:
        print(f"Warning: {resized_predictions.shape[0]} frames found for storm {storm_id}. Truncating to 36 frames.")
        resized_predictions = resized_predictions[:36]

    # Transpose to (384, 384, 36)
    resized_predictions = resized_predictions.transpose(1, 2, 0)

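    # Save as .npy file (create the output directory first if it does not exist)
    project_path = "/content/drive/MyDrive/ACDS-Barry"
    output_dir = os.path.join(project_path, "predictions")
    os.makedirs(output_dir, exist_ok=True)
    filename = os.path.join(output_dir, f"{team_name}-task2-vil-{storm_id}.npy")
    np.save(filename, resized_predictions.astype(np.float32))
    print(f"Saved predictions for {storm_id} to {filename}")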

Saved predictions for S834438 to /content/drive/MyDrive/ACDS-Barry/predictions/barry-task2-vil-S834438.npy
Saved predictions for S838836 to /content/drive/MyDrive/ACDS-Barry/predictions/barry-task2-vil-S838836.npy
Saved predictions for S843625 to /content/drive/MyDrive/ACDS-Barry/predictions/barry-task2-vil-S843625.npy
Saved predictions for S847775 to /content/drive/MyDrive/ACDS-Barry/predictions/barry-task2-vil-S847775.npy
Saved predictions for S847917 to /content/drive/MyDrive/ACDS-Barry/predictions/barry-task2-vil-S847917.npy
Saved predictions for S849415 to /content/drive/MyDrive/ACDS-Barry/predictions/barry-task2-vil-S849415.npy
Saved predictions for S851835 to /content/drive/MyDrive/ACDS-Barry/predictions/barry-task2-vil-S851835.npy
Saved predictions for S851858 to /content/drive/MyDrive/ACDS-Barry/predictions/barry-task2-vil-S851858.npy
Saved predictions for S852507 to /content/drive/MyDrive/ACDS-Barry/predictions/barry-task2-vil-S852507.npy
Saved predictions for S855381 to /con