### Deep Learning for Computer Vision  
### Multi-Task Regression with the Digital Typhoon Dataset

This notebook demonstrates a **supervised multi-task regression** workflow for remote sensing with **TorchGeo** and the Digital Typhoon dataset, which consists of infrared (IR) satellite imagery of tropical cyclones paired with meteorological measurements.

The objective is to predict multiple continuous typhoon intensity variables from satellite imagery using a deep learning model.  

#### Dataset Overview
The [Digital Typhoon](https://torchgeo.readthedocs.io/en/stable/api/datasets.html#digital-typhoon) dataset is derived from hourly infrared channel observations captured by multiple generations of the Himawari meteorological satellites, spanning 1978 to the present. The satellite measurements have been converted to brightness temperatures and normalized across sensors, resulting in a consistent spatio-temporal dataset covering more than four decades.

**Dataset features:**
- Infrared (IR) satellite imagery of 512 × 512 pixels at ~5 km resolution
- Auxiliary metadata including wind speed, pressure, and additional typhoon-related attributes
- 1,099 typhoons and 189,364 images

**References**  
Digital Typhoon Dataset: *A Large-Scale Benchmark for Tropical Cyclone Analysis*: [arXiv:2411.16421](https://arxiv.org/pdf/2411.16421); [arXiv:2311.02665](https://arxiv.org/pdf/2311.02665)

In [1]:
# import libraries
import os
import shutil
import pandas as pd
import torch

from torch.utils.data import DataLoader
from torchgeo.datasets import DigitalTyphoon


In [2]:
# load dataset
root = "/home/ogallo/DL4CV/DigitalTyphoon"

dataset = DigitalTyphoon(
    root=root,
    features=["wind", "pressure"],
    targets=["wind", "pressure"],
    sequence_length=1,
    download=False
)


### Subset the dataset
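The full dataset is subsampled to a fixed number of typhoons, stratified by year, and typhoons with too few images are dropped. (Open question: should the sampling also account for typhoon grade and lifecycle?)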
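The idea of year-stratified sampling followed by a minimum-image filter can be sketched as follows. This is only an illustration on toy data, not the contents of `sample_v2.py`: the function `sample_typhoons`, its arguments, and the `(typhoon_id, year)` row layout are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def sample_typhoons(image_rows, target_typhoons, min_images, seed=0):
    """Pick typhoons spread across years, then drop those with few images.

    image_rows: iterable of (typhoon_id, year) pairs, one pair per image.
    This layout is an assumption made for the sketch.
    """
    image_rows = list(image_rows)
    rng = random.Random(seed)
    images_per_typhoon = Counter(tid for tid, _ in image_rows)

    # Assign each typhoon to the first year it appears in.
    year_of = {}
    for tid, year in image_rows:
        year_of.setdefault(tid, year)

    # Group typhoon ids by year and shuffle within each year.
    ids_by_year = defaultdict(list)
    for tid, year in sorted(year_of.items()):
        ids_by_year[year].append(tid)
    pools = []
    for year in sorted(ids_by_year):
        rng.shuffle(ids_by_year[year])
        pools.append(ids_by_year[year])

    # Draw round-robin across years until the target count is reached.
    selected = []
    while len(selected) < target_typhoons and any(pools):
        for pool in pools:
            if pool and len(selected) < target_typhoons:
                selected.append(pool.pop())

    # Keep only typhoons with enough images.
    return [tid for tid in selected if images_per_typhoon[tid] >= min_images]


# Toy data: 3 years x 4 typhoons, with 10, 20, 30, and 40 images each.
rows = []
for year in (1980, 1990, 2000):
    for k in range(4):
        rows += [(f"{year}{k:02d}", year)] * (10 * (k + 1))

full = sample_typhoons(rows, target_typhoons=6, min_images=0)
kept = sample_typhoons(rows, target_typhoons=6, min_images=20)
print(len(full), "sampled;", len(kept), "kept after filtering")
```

Because filtering happens after sampling, the final subset can contain fewer typhoons than the requested target, which matches the pattern in the script output (180 sampled, 170 kept).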

In [3]:
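# sample the data
target_typhoons = 180          # number of typhoons to draw
min_images_per_typhoon = 40    # drop typhoons with fewer images than this
root = "/home/ogallo/DL4CV/DigitalTyphoon/WP"
output_dir = "/home/ogallo/DL4CV/WP_sampled"

# Note: plain %run executes the script in a fresh namespace;
# use `%run -i sample_v2.py` if the script should read the variables above.
%run sample_v2.py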

Loading data...
Full dataset: 1099 typhoons | 189,364 images
Average images per typhoon: 172.3

Sampling 180 typhoons (year-stratified)...
→ Sampled 180 unique typhoons
Collecting all images for selected typhoons...
Filtering typhoons with too few images...
After filtering ≥ 50 images:
→ Kept 170 typhoons | 30,606 images
→ Average: 180.0 images/typhoon

Saving auxiliary CSV...
Copying images (this may take a while)...
→ Copied 30,606 images | Not found: 0
Handling metadata...

Done! Subset created successfully.
Output directory: /home/ogallo/DL4CV/WP_sampled


In [None]:
# zip the sampled subset for upload (the leading "!" runs a shell command from the notebook)
!zip -r -q WP_sampled.zip WP_sampled

In [4]:
# load the sampled subset
dataset = DigitalTyphoon(
    root="/home/ogallo/DL4CV/WP_sampled",
    features=["wind", "pressure"],
    targets=["wind", "pressure"],
    sequence_length=1,
    download=False
)
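To show how a dataset like this could feed a multi-task regression model, here is a minimal batching sketch with `DataLoader`. It uses a hypothetical stand-in class, `FakeTyphoonDataset`, filled with random tensors so that it runs without the data on disk. The `"image"` and `"label"` keys and the tensor shapes are assumptions for illustration and should be checked against a real sample from `dataset`.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class FakeTyphoonDataset(Dataset):
    """Hypothetical stand-in that mimics an image/label sample dict."""

    def __init__(self, num_samples=6, image_size=512, num_targets=2):
        generator = torch.Generator().manual_seed(0)
        # One IR channel per image; random values replace real imagery.
        self.images = torch.rand(
            num_samples, 1, image_size, image_size, generator=generator
        )
        # Two regression targets per image, standing in for wind and pressure.
        self.labels = torch.rand(num_samples, num_targets, generator=generator)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        return {"image": self.images[index], "label": self.labels[index]}


# The default collate function stacks each dict entry along a new batch axis.
loader = DataLoader(FakeTyphoonDataset(), batch_size=4, shuffle=False)
batch = next(iter(loader))
print(batch["image"].shape)  # torch.Size([4, 1, 512, 512])
print(batch["label"].shape)  # torch.Size([4, 2])
```

With the real data, the same `DataLoader` call would take `dataset` in place of the stand-in. Printing the shapes of one batch is a quick check that each image carries one value per target before a model is attached.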