# 0. Directory tree

In [21]:
.
├── create_datasets.py
├── create_hrrr_uid_grid_mapping.py
├── data
│   ├── interim
│   ├── processed
│   └── raw
├── download_hrrr_forecasts.py
├── download_satellite_data.py
├── environment.yml
├── main.ipynb
├── test.py
└── train.py


# 1. Download datasets

## Option 1: Download preprocessed data

In [6]:
# Download the preprocessed archives from Google Drive by file ID.
!gdown -O satellite_data.zip 1pIi1ypZ0r1lfqKkyc_2BOl4va7LJYPuD
!gdown -O meta.zip 13DSr0C9gC9cjUze-MbUsWpyNsWjOkhwb
!gdown -O hrrr.zip 1-170AoILkG-N9Vism_F4dim6iG7Z9dpM
!gdown -O processed.zip 13dWxOusuDyTIzVDfnZ8KY4ReDFnjF3Oh
!gdown -O model.zip 1nAGnprRQcT9gtNTxYLui5Z1U1oUNS6aS

# Unpack quietly (-qq) into the target directory (-d) without overwriting existing files (-n).
!mkdir -p data
!unzip -qqnd data/raw meta
!unzip -qqnd data/interim satellite_data
!unzip -qqnd data/interim hrrr
!unzip -qqnd data/processed processed
!unzip -qqn model

Downloading...
From: https://drive.google.com/uc?id=1pIi1ypZ0r1lfqKkyc_2BOl4va7LJYPuD
To: /home/karel/Desktop/bloom/src/satellite_data.zip
100%|██████████████████████████████████████| 1.15G/1.15G [08:40<00:00, 2.21MB/s]
Downloading...
From: https://drive.google.com/uc?id=13DSr0C9gC9cjUze-MbUsWpyNsWjOkhwb
To: /home/karel/Desktop/bloom/src/meta.zip
100%|█████████████████████████████████████████| 533k/533k [00:00<00:00, 866kB/s]
Downloading...
From: https://drive.google.com/uc?id=1-170AoILkG-N9Vism_F4dim6iG7Z9dpM
To: /home/karel/Desktop/bloom/src/hrrr.zip
100%|██████████████████████████████████████| 55.2M/55.2M [00:23<00:00, 2.38MB/s]
Downloading...
From: https://drive.google.com/uc?id=13dWxOusuDyTIzVDfnZ8KY4ReDFnjF3Oh
To: /home/karel/Desktop/bloom/src/processed.zip
100%|██████████████████████████████████████| 5.89M/5.89M [00:02<00:00, 2.18MB/s]
Downloading...
From: https://drive.google.com/uc?id=1nAGnprRQcT9gtNTxYLui5Z1U1oUNS6aS
To: /home/karel/Desktop/bloom/src/model.zip
100%|██████████
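
The `unzip -n` calls above never overwrite a file that already exists, so re-running the cell is safe. As a rough Python equivalent, here is a minimal sketch using the standard `zipfile` module; the helper name `extract_missing` is hypothetical and not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_missing(zip_path, dest_dir):
    """Extract only the archive members that do not yet exist under dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if (dest / member).exists():
                continue  # like unzip -n: keep the file that is already there
            archive.extract(member, dest)
            extracted.append(member)
    return extracted
```

Calling `extract_missing("hrrr.zip", "data/interim")` would mirror `unzip -qqnd data/interim hrrr`, assuming the archive sits in the current directory.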

## Option 2: Download data from multiple sources and preprocess

> _DrivenData note: The download scripts below use [PQDM](https://github.com/niedakh/pqdm) to parallelize downloads. If your download runs crash, you may need to modify the scripts to reduce the number of parallel jobs or to switch from multiprocessing to multithreading._
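
How the download scripts call PQDM is not shown in this notebook, so the following is only a sketch of the idea in the note: run the work in threads rather than processes and cap the number of concurrent jobs. It uses the standard library's `ThreadPoolExecutor` instead of PQDM, and `run_in_threads` and `fake_download` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_threads(tasks, worker, n_jobs=4):
    """Apply `worker` to every task using at most `n_jobs` threads.

    Results are returned in the same order as `tasks`.
    """
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(worker, tasks))


def fake_download(uid):
    # Placeholder for a real download; it only builds a file name.
    return f"{uid}.tif"


# A small n_jobs keeps memory use and the number of open connections low.
results = run_in_threads(["aabn", "aair", "aajw"], fake_download, n_jobs=2)
```

Threads share one process, so they avoid the per-process memory cost of multiprocessing, which is usually acceptable for network-bound downloads.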

In [None]:
# Download satellite data for the train and test sets.
# lsat = Landsat, snel = Sentinel
!python download_satellite_data.py train lsat
!python download_satellite_data.py train snel
!python download_satellite_data.py test lsat
!python download_satellite_data.py test snel

In [None]:
# Create a reference file that maps sample locations to HRRR grid cells.
!python create_hrrr_uid_grid_mapping.py

In [None]:
# Download HRRR temperature (TMP) and specific humidity (SPFH) forecasts for the locations and dates in the train and test metadata.
!python download_hrrr_forecasts.py 'TMP' '2 m above ground' 
!python download_hrrr_forecasts.py 'SPFH' '2 m above ground' 

In [None]:
# Merge the data to create the final train and test datasets.
!python create_datasets.py

426880
707340
425092
702528
24791 16867
41400 10855
lsat:  (16867, 28) (40714, 28)
13405 6430
21224 4803
snel:  (6430, 27) (20978, 27)


# 2. Model

In [17]:
# Train the models. They are saved in the model directory.
!python train.py 

16880
180
88
67
100%|█████████████████████████████████████| 2180/2180 [00:00<00:00, 3041.64it/s]
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: midwest 0, best score: 0.816496580927726
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: midwest 1, best score: 0.7488308644489767
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: midwest 2, best score: 0.7588831362323394
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: midwest 3, best score: 0.8020853182721962
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: midwest 4, best score: 0.8357108940373449
100%|█████████████████████████████████████| 1142/1142 [00:00<00:00, 2976.60it/s]
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: northeast 0, best score: 0.9706434774573919
You can set `force_col_wise=true` to remove the overhead.
f

fininshed training region: south 2, best score: 0.7917484901417817
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: south 3, best score: 0.8166342120422185
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: south 4, best score: 0.777423945323351
100%|█████████████████████████████████████| 9872/9872 [00:03<00:00, 2887.69it/s]
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: west 0, best score: 0.5033027882027534
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: west 1, best score: 0.5021367618718294
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: west 2, best score: 0.5285806213578249
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: west 3, best score: 0.5481163192796085
You can set `force_col_wise=true` to remove the overhead.
fininshed training region: west 4, best score: 0.
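
The training log prints one best score per region and run index (0 to 4). To compare regions at a glance, the scores can be averaged per region. The sketch below parses the log format shown above; it assumes the lines keep exactly this wording (including the `fininshed` spelling that `train.py` prints), and it makes no assumption about which metric the score is. The function name `mean_score_by_region` is hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches e.g. "fininshed training region: midwest 0, best score: 0.8164..."
# A score must have digits after the decimal point, so a truncated
# value such as "0." is skipped rather than read as zero.
LINE = re.compile(r"training region: (\w+) (\d+), best score: (\d+\.\d+)")


def mean_score_by_region(log_text):
    """Return {region: mean of the best scores logged for that region}."""
    scores = defaultdict(list)
    for region, _run, score in LINE.findall(log_text):
        scores[region].append(float(score))
    return {region: mean(values) for region, values in scores.items()}


log = """\
fininshed training region: midwest 0, best score: 0.816496580927726
fininshed training region: midwest 1, best score: 0.7488308644489767
fininshed training region: midwest 2, best score: 0.7588831362323394
fininshed training region: midwest 3, best score: 0.8020853182721962
fininshed training region: midwest 4, best score: 0.8357108940373449
fininshed training region: west 4, best score: 0.
"""
summary = mean_score_by_region(log)
```

Here `summary` holds a single entry for `midwest`; the truncated `west` line is ignored.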

In [18]:
# Run inference on the test data and write the predictions to solution.csv.
!python test.py
!head solution.csv

(6510, 13) (6430, 27) (20978, 27)
6433
77
88
67
100%|█████████████████████████████████████| 1565/1565 [00:00<00:00, 3070.72it/s]
100%|█████████████████████████████████████| 1042/1042 [00:00<00:00, 3141.35it/s]
100%|█████████████████████████████████████| 1507/1507 [00:00<00:00, 3136.75it/s]
100%|█████████████████████████████████████| 2316/2316 [00:00<00:00, 3157.13it/s]
100%|█████████████████████████████████████| 5592/5592 [00:01<00:00, 2930.13it/s]
100%|█████████████████████████████████████| 3717/3717 [00:01<00:00, 2929.90it/s]
100%|█████████████████████████████████████| 5038/5038 [00:01<00:00, 2951.14it/s]
100%|█████████████████████████████████████| 6631/6631 [00:02<00:00, 2959.40it/s]
filling 77 test samples w/o data with region average
  test = test.append(test_null)
[4 2 3 1]
uid,region,severity
aabn,west,4
aair,west,4
aajw,northeast,2
aalr,midwest,3
aalw,west,4
aamp,west,2
aapj,west,4
aaqf,northeast,2
aauy,south,1
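
The log line `filling 77 test samples w/o data with region average` indicates that samples without features receive a per-region fallback value. The code in `test.py` is not shown here, so the following is only an illustrative sketch of such a fallback in plain Python. Rounding to an integer severity and averaging over the predicted samples are assumptions, and `fill_with_region_average` and the uid `zzzz` are hypothetical.

```python
from collections import defaultdict


def fill_with_region_average(predicted, missing):
    """Give each missing sample the rounded mean severity of its region.

    predicted: list of (uid, region, severity) for samples with features
    missing:   list of (uid, region) for samples without features
    """
    by_region = defaultdict(list)
    for _uid, region, severity in predicted:
        by_region[region].append(severity)
    # Assumption: the fallback is rounded to the nearest integer severity.
    averages = {r: round(sum(v) / len(v)) for r, v in by_region.items()}
    # Raises KeyError if a region has no predicted samples at all.
    return [(uid, region, averages[region]) for uid, region in missing]


predicted = [
    ("aabn", "west", 4), ("aair", "west", 4), ("aalw", "west", 4),
    ("aamp", "west", 2), ("aapj", "west", 4),
    ("aajw", "northeast", 2), ("aaqf", "northeast", 2),
]
filled = fill_with_region_average(predicted, [("zzzz", "west")])
```

The stray `test = test.append(test_null)` line in the output looks like the source line echoed by a pandas deprecation warning; in pandas 2.0 and later `DataFrame.append` no longer exists and `pd.concat([test, test_null])` is the usual replacement.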
