In [8]:
%load_ext autoreload
%autoreload 2





# Temporal Out-of-Sample Validation Tutorial

In this tutorial, we'll demonstrate how to perform temporal out-of-sample validation using the full CAMELS-Australia dataset instead of sample data. We'll leverage the built-in tools from the `hydroml` package to automatically handle data loading and execute the training and evaluation pipeline.

The workflow will:
1. Load the CAMELS-Australia dataset
2. Set up temporal splits for calibration and validation
3. Train and fine-tune a model
4. Evaluate model performance

Let's get started!


In [9]:
from hydroml.config.config import Config
from hydroml.workflow.evaluation import train_finetune_evaluate

# Set up the path mapping

Because this workflow loads the dataset automatically from CAMELS-Australia, we need to set up the path mapping first. The procedure is the same as in the `03_build_a_path_mapping.ipynb` tutorial.


In [10]:
# We assume that the path mapping is already set up with the platform name 'win_2'.
platform='win_2'

We also need a list of basins for calibration and validation. These catchments must already be available in the CAMELS-Australia postprocessed dataset and the awara postprocessed dataset, so that the pipeline can load their data automatically.

In [11]:
basins_file = '../sample_data/basins.txt'
with open(basins_file, 'r') as f:
    catchment_ids = f.read().splitlines()
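
The basins file is assumed to hold one catchment ID per line. A slightly more defensive read is sketched below: it strips surrounding whitespace and skips blank lines, so a trailing newline or stray space does not end up as an empty or malformed ID. `read_catchment_ids` is a hypothetical helper written for this illustration, not part of `hydroml`.

```python
from pathlib import Path


def read_catchment_ids(path):
    """Return the non-empty, whitespace-stripped lines of a text file.

    Hypothetical helper for illustration; not part of hydroml.
    """
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]
```

It could be used in place of the cell above, for example `catchment_ids = read_catchment_ids('../sample_data/basins.txt')`.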

Using this information, we can set up the config object.

For a full explanation of the config object, please refer to the README documentation.

In [12]:
# We only use 2 catchments for calibration and validation in this tutorial.
config = Config(
    cal={'periods': [['1991-01-01', '2014-01-01']], 'catchment_ids': catchment_ids[:2]},
    val={'periods': [['1985-01-01', '1990-01-01']], 'catchment_ids': catchment_ids[:2]},
    name='evaluation_tutorial',
    lstm_hidden_size=4,  # a small LSTM size, chosen for this tutorial
    device='cpu',
    platform=platform,  # to introduce the paths stored in config/path_mapping/win.yaml
    max_epochs=2,  # the number of epochs is reduced to 2 for this tutorial
)

# We need to call this function to set up the version name; alternatively, you can set it manually.
config.set_new_version_name()
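
Temporal out-of-sample validation relies on the calibration and validation periods being disjoint. A minimal sketch of such a check, using only the standard library, is shown below. `periods_overlap` is a hypothetical helper, not part of `hydroml`, and treating the end dates as inclusive is an assumption of this sketch.

```python
from datetime import date


def periods_overlap(periods_a, periods_b):
    """Return True if any period in periods_a overlaps any period in periods_b.

    Each period is a [start, end] pair of ISO date strings, the same shape
    as the 'periods' entries passed to Config. End dates are treated as
    inclusive, which is an assumption of this sketch.
    """
    for start_a, end_a in periods_a:
        a0, a1 = date.fromisoformat(start_a), date.fromisoformat(end_a)
        for start_b, end_b in periods_b:
            b0, b1 = date.fromisoformat(start_b), date.fromisoformat(end_b)
            # Two closed intervals overlap when each starts before the other ends.
            if a0 <= b1 and b0 <= a1:
                return True
    return False


cal_periods = [['1991-01-01', '2014-01-01']]
val_periods = [['1985-01-01', '1990-01-01']]
print(periods_overlap(cal_periods, val_periods))  # False
```

With the periods used in this tutorial the check returns `False`, since the validation period ends before the calibration period begins.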


Now we can run the **training and evaluation pipeline**. This pipeline works as follows:

1. Trains a **continental model** and evaluates it on the validation set.  
2. Starts the **fine-tuning process**, where the continental model is fine-tuned for each catchment and then evaluated on that specific catchment.

The results will be stored in the following structure:

| **File/Folder**                          | **Description**                                                      | **Path**                                                          |
|------------------------------------------|----------------------------------------------------------------------|-------------------------------------------------------------------|
| **Model Weights**                        | Weights of the trained continental model.                           | `root_path/VERSION_NAME/last.ckpt`                                |
| **Config File**                          | Configuration settings for the pipeline.                            | `root_path/VERSION_NAME/config.yaml`                              |
| **Transform Parameters**                 | Parameters of the data transformation, calculated during training.  | `root_path/VERSION_NAME/params.yaml`                              |
| **Simulation Results (Continental Model)** | Simulation output for the continental model.                        | `root_path/VERSION_NAME/simulation.nc`                            |
| **Metrics**                              | Evaluation metrics for the continental model.                       | `root_path/VERSION_NAME/metrics.nc`                               |
| **Fine-tuned Models**                    | Fine-tuned models for each catchment.                               | `root_path/VERSION_NAME/finetune_all/catchment_id/VERSION_NAME/last.ckpt` |

- Note 1: the fine-tuned models are stored with the same structure as the continental model.
- Note 2: the fine-tuned models do not have a `params.yaml` file, as they are configured to use the transform parameters of the continental model.
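
The layout in the table can be sketched with `pathlib`. The two helpers below are hypothetical and not part of `hydroml`; they follow the table literally, so the real layout may differ (for instance, the last cell of this tutorial reads the simulation from a `results` subfolder). In the logs further down, each fine-tuned model gets its own version name, which is why `finetune_version` is a separate argument here.

```python
from pathlib import Path


def expected_outputs(root_path, version_name):
    """Map each continental-model output listed in the table to its path.

    Hypothetical helper; follows the table literally.
    """
    base = Path(root_path) / version_name
    return {
        'weights': base / 'last.ckpt',
        'config': base / 'config.yaml',
        'params': base / 'params.yaml',
        'simulation': base / 'simulation.nc',
        'metrics': base / 'metrics.nc',
    }


def finetuned_weights(root_path, version_name, catchment_id, finetune_version):
    """Path to the weights of the model fine-tuned for one catchment.

    Hypothetical helper; finetune_version is the version name of the
    fine-tuning run itself, not of the continental model.
    """
    return (Path(root_path) / version_name / 'finetune_all'
            / catchment_id / finetune_version / 'last.ckpt')
```

A quick way to see which outputs a run produced is to loop over `expected_outputs(...)` and call `.exists()` on each path.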


In [13]:
train_finetune_evaluate(config)

                                                                        

Transforming data: calculating transform parameters and saving to \\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\params.yaml




valid data points per catchment {0: 5381, 1: 2219}


                                                                        

Transforming data: loading transform parameters from \\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\params.yaml
\\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e


GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py:654: Checkpoint directory \\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e exists and is not empty.

  | Name              | Type       | Params | Mode 
---------------------------------------------------------
0 | static_embedding  | Linear     | 15     | train
1 | dynamic_embedding | Linear     | 5      | train
2 | lstm              | LSTM       | 128    | train
3 | dropout           | Identity   | 0      | train
4 | head              | Sequential | 61     | train
---------------------------------------------------------
209       Trainable params
0         Non-trainable params
209       Total params
0.001     Total estimated model params size (MB)
9         M

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\utilities\data.py:78: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 512. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\loops\fit_loop.py:298: The number of training batches (15) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Training: |          | 0/? [00:00<?, ?it/s]

c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\utilities\data.py:78: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 432. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py:384: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['lr-Adam', 'train_loss', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`?
`Trainer.fit` stopped: `max_epochs=2` reached.
                                                                        

Transforming data: loading transform parameters from \\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\params.yaml


                                                                        

Transforming data: loading transform parameters from \\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\params.yaml


GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py:654: Checkpoint directory \\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\finetune_all\912101A\241217143420_1376 exists and is not empty.

  | Name              | Type       | Params | Mode 
---------------------------------------------------------
0 | static_embedding  | Linear     | 15     | train
1 | dynamic_embedding | Linear     | 5      | train
2 | lstm              | LSTM       | 128    | train
3 | dropout           | Identity   | 0      | train
4 | head              | Sequential | 61     | train
---------------------------------------------------------
209       Trainable params
0         Non-trainable params
209       Total params
0.001     Total estim

Transforming data: loading transform parameters from \\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\params.yaml
\\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\finetune_all\912101A\241217143420_1376


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\loops\fit_loop.py:298: The number of training batches (11) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Training: |          | 0/? [00:00<?, ?it/s]

c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\utilities\data.py:78: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 261. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.


Validation: |          | 0/? [00:00<?, ?it/s]

c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\utilities\data.py:78: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 72. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_epochs=15` reached.


Transforming data: loading transform parameters from \\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\params.yaml


                                                                        

Transforming data: loading transform parameters from \\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\params.yaml


GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py:654: Checkpoint directory \\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\finetune_all\912105A\241217143613_4c35 exists and is not empty.

  | Name              | Type       | Params | Mode 
---------------------------------------------------------
0 | static_embedding  | Linear     | 15     | train
1 | dynamic_embedding | Linear     | 5      | train
2 | lstm              | LSTM       | 128    | train
3 | dropout           | Identity   | 0      | train
4 | head              | Sequential | 61     | train
---------------------------------------------------------
209       Trainable params
0         Non-trainable params
209       Total params
0.001     Total estim

Transforming data: loading transform parameters from \\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\params.yaml
\\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\finetune_all\912105A\241217143613_4c35


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\utilities\data.py:78: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 296. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\loops\fit_loop.py:298: The number of training batches (5) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Training: |          | 0/? [00:00<?, ?it/s]

c:\Users\sho108\AppData\Local\pypoetry\Cache\virtualenvs\hydroml-dFLAodHf-py3.11\Lib\site-packages\pytorch_lightning\utilities\data.py:78: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 171. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_epochs=15` reached.


Transforming data: loading transform parameters from \\fs1-cbr.nexus.csiro.au\{d61-coastal-forecasting-wp3}\work\sho108_handover\models\evaluation_tutorial\241217143238_a69e\params.yaml


WindowsPath('//fs1-cbr.nexus.csiro.au/{d61-coastal-forecasting-wp3}/work/sho108_handover/models/evaluation_tutorial/241217143238_a69e')

The simulation results of the continental model over the validation period are stored at the following path:


In [14]:
import xarray as xr

p = config.current_path / config.version / 'results' / 'simulation.nc'
xr.open_dataset(p)
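
The variable names inside `simulation.nc` are not shown in this section, so the sketch below does not assume any. It only illustrates, on plain Python lists with made-up numbers, how a skill score could be computed once observed and simulated series have been extracted from the dataset. The Nash-Sutcliffe efficiency (NSE) is used as a common hydrological example; whether `metrics.nc` contains this score is an assumption.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError('series must be non-empty and of equal length')
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError('observed series is constant; NSE is undefined')
    return 1.0 - residual / variance


# Made-up values, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 3.0, 4.0]
print(nse(obs, sim))  # 1.0 for a perfect simulation
```

An NSE of 1 indicates a perfect match, while 0 means the simulation is no better than predicting the mean of the observations.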
