
Code used in the study "Evaluation and interpretation of convolutional long short-term memory networks for regional hydrological modelling"


andersonsam/cnn_lstm_era


CNN-LSTMs for regional hydrological modelling

This repository contains the code used in the study:

Anderson, Sam and Radic, Valentina. "Evaluation and interpretation of convolutional long short-term memory neural networks for regional hydrological modelling". Hydrology and Earth System Sciences, 2022.

Contact: Sam Anderson
Email: sanderson@eoas.ubc.ca


Overview

The code in this repository can reproduce all figures and findings in the study. All data used are publicly accessible, and instructions for downloading the data are given below. This repository contains the following files:

  • main_publish.ipynb: Defines functions, loads preprocessed data, builds/trains CNN-LSTM model, evaluates performance, interprets model learning, creates figures
  • preprocessing.ipynb: Loads raw data (temperature, precipitation, streamflow, and basin outlines) and preprocesses it into the format used in main_publish.ipynb
  • figure_study_region.ipynb: Creates Figure 1 (study region).
  • era5_download_P_075grid.py: Connects to the ERA5 API and downloads raw precipitation data
  • era5_download_T2m_075grid.py: Connects to the ERA5 API and downloads raw temperature data
  • non_contributing_areas.ipynb: Calculates non-contributing areas of basins in the eastern cluster
  • mini.ipynb: A miniature version of main_publish.ipynb, which loads and structures one year of input/target data, clusters stream gauge stations, makes heat maps, and perturbs input temperature

Model structure

We use a sequential CNN–LSTM model to map weather predictors to streamflow at multiple stream gauge stations simultaneously throughout the region of southwestern Canada.

As input data, we use the past 365 days of weather, covering the whole study region, in order to predict the streamflow of the next day at N stream gauge stations. The CNN learns the spatial features in each day of the input, while the LSTM model learns the temporal relationship between these features in order to predict streamflow.
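The pairing of 365-day weather sequences with next-day streamflow can be sketched with NumPy as below. This is a minimal illustration, not the repository's preprocessing code: the array layout (days, rows, columns, variables), the function name `make_windows`, and the exact alignment (the 365 days strictly before the target day) are assumptions. In a Keras-style CNN-LSTM, the CNN would typically be applied to each of the 365 daily grids (for example via a `TimeDistributed` wrapper) before the LSTM reads the resulting feature sequence.

```python
import numpy as np

def make_windows(x_daily, y_daily, lookback=365):
    """Pair each target day with the preceding `lookback` days of gridded weather.

    x_daily: shape (n_days, height, width, n_vars), e.g. gridded temperature and precipitation
    y_daily: shape (n_days, n_stations), daily streamflow at N stations
    Returns x of shape (n_samples, lookback, height, width, n_vars)
    and y of shape (n_samples, n_stations).
    """
    n_days = x_daily.shape[0]
    # Sample k uses weather on days t-lookback .. t-1 to predict flow on day t.
    target_days = np.arange(lookback, n_days)
    x = np.stack([x_daily[t - lookback:t] for t in target_days])
    y = y_daily[target_days]
    return x, y

# Toy data (sizes are arbitrary): 400 days on a 6 x 8 grid, 2 variables, 5 stations.
rng = np.random.default_rng(0)
x_daily = rng.normal(size=(400, 6, 8, 2)).astype(np.float32)
y_daily = rng.normal(size=(400, 5)).astype(np.float32)
x, y = make_windows(x_daily, y_daily)
print(x.shape, y.shape)  # (35, 365, 6, 8, 2) (35, 5)
```

With 400 days of data and a 365-day lookback, only the last 35 days can serve as targets, which is why a long record is needed before training samples are available.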


Model evaluation and interpretation

The model performs well overall, with a median Nash-Sutcliffe efficiency (NSE) of 0.68 across the region.
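For reference, NSE = 1 - Σ(Q_obs - Q_sim)² / Σ(Q_obs - mean(Q_obs))², computed per station and then summarized by the median across stations. The sketch below uses toy arrays (not study data), and the function names are hypothetical; which evaluation period the study's median refers to is not specified here.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) / (sum of squared deviations of obs from its mean)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def median_nse(obs, sim):
    """Median NSE across stations; obs and sim have shape (n_days, n_stations)."""
    per_station = [nse(obs[:, j], sim[:, j]) for j in range(obs.shape[1])]
    return float(np.median(per_station)), per_station

# Toy example: station 0 is simulated perfectly, station 1 has two small errors.
obs = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
sim = np.array([[1.0, 2.0], [2.0, 5.0], [3.0, 5.0], [4.0, 8.0]])
score, per_station = median_nse(obs, sim)
print(round(score, 2))  # 0.95
```

An NSE of 1 means a perfect fit, while an NSE of 0 means the simulation is no better than predicting the observed mean flow.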

Interpretation test 1: Spatial sensitivity

Over the test period, the model is typically most sensitive to spatial perturbations near and within the basins where streamflow is being predicted. Fine-tuning the model on each individual subregion improves its ability to focus on the area of that subregion. Notably, for the eastern, north-eastern, and north-western clusters, the model appears to have two sensitive areas.
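The idea behind a spatial sensitivity heat map can be illustrated with an occlusion-style sketch: perturb the input in one small block of grid cells at a time and record how much the predicted streamflow changes. The perturbation used here (adding a constant `amplitude` to a square block on every day), the function names, and the toy model are assumptions for illustration; the study's actual perturbation scheme may differ.

```python
import numpy as np

def perturbation_sensitivity(predict, x, radius=1, amplitude=1.0):
    """Occlusion-style spatial sensitivity map.

    predict: callable mapping x of shape (n_samples, n_days, height, width, n_vars)
             to predicted streamflow of shape (n_samples, n_stations)
    radius: half-width of the square block of grid cells perturbed around each cell
    Returns shape (height, width): mean absolute change in predicted streamflow
    when the block centred on each cell is perturbed.
    """
    height, width = x.shape[2], x.shape[3]
    baseline = predict(x)
    heat = np.zeros((height, width))
    for i in range(height):
        for j in range(width):
            i0, i1 = max(i - radius, 0), min(i + radius + 1, height)
            j0, j1 = max(j - radius, 0), min(j + radius + 1, width)
            x_pert = x.copy()
            # Add a constant to every day and variable inside the block.
            x_pert[:, :, i0:i1, j0:j1, :] += amplitude
            heat[i, j] = np.mean(np.abs(predict(x_pert) - baseline))
    return heat

# Hypothetical toy "model": one station whose flow is the 365-day mean
# of variable 0 at grid cell (2, 3).
def toy_predict(x):
    return x[:, :, 2, 3, 0].mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 365, 6, 8, 2))
heat = perturbation_sensitivity(toy_predict, x)
print(np.argwhere(heat > 0.5))  # the 3 x 3 block of cells around (2, 3)
```

For the toy model, only perturbations that cover cell (2, 3) change the prediction, so the heat map lights up around that cell. For a trained CNN-LSTM, the analogous map highlights the regions the model relies on.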

Interpretation test 2: Response to warming/cooling temperatures

When the input temperature data are made warmer or cooler overall, the model responds by adjusting the timing and magnitude of the spring freshet in accordance with the timing of the transition to above-freezing temperatures.
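A uniform warming or cooling test can be set up by adding a constant offset to the temperature channel of the input while leaving precipitation untouched. The sketch below assumes temperature is channel 0 (a hypothetical layout) and uses a toy temperature ramp to show how the offset moves the first above-freezing day; the actual freshet response would come from passing the shifted inputs through the trained model.

```python
import numpy as np

def shift_temperature(x, delta, t_channel=0):
    """Return a copy of x with `delta` (deg C) added to the temperature channel only."""
    x_shift = x.copy()
    x_shift[..., t_channel] += delta
    return x_shift

def first_thaw_day(temps):
    """Index of the first day with above-freezing temperature, or None if there is none."""
    above = np.flatnonzero(temps > 0.0)
    return int(above[0]) if above.size else None

# Toy input (hypothetical): 30 days on a 4 x 4 grid; channel 0 = temperature
# rising by 1 deg C per day from -10 deg C, channel 1 = constant precipitation.
x = np.zeros((30, 4, 4, 2))
x[..., 0] = np.arange(-10.0, 20.0)[:, None, None]
x[..., 1] = 5.0

for delta in (-3.0, 0.0, 3.0):
    region_mean_t = shift_temperature(x, delta)[..., 0].mean(axis=(1, 2))
    print(delta, first_thaw_day(region_mean_t))
# -3.0 14
# 0.0 11
# 3.0 8
```

Warming by 3 deg C brings the transition 3 days earlier in this toy ramp and cooling delays it, which is the input-side change the model's freshet timing is described as tracking.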

Interpretation test 3: Summer streamflow sensitivity to temperature in glacierized basins

When August temperatures are made warmer (cooler), glacierized basins are predicted to have more (less) streamflow. Streamflow in more heavily glacierized basins is more sensitive to August temperature.
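A month-restricted perturbation like this one can be sketched by masking the daily input by calendar month before building the 365-day input sequences. The function name, the temperature channel index, and the date range below are assumptions for illustration.

```python
import numpy as np

def perturb_month(x, dates, delta, month=8, t_channel=0):
    """Add `delta` (deg C) to the temperature channel on days in `month` only.

    x: daily inputs of shape (n_days, height, width, n_vars)
    dates: numpy datetime64[D] array of length n_days
    """
    # Months since 1970-01, converted to calendar month 1..12 (1 = January).
    months = dates.astype("datetime64[M]").astype(int) % 12 + 1
    in_month = months == month
    x_pert = x.copy()
    x_pert[in_month, ..., t_channel] += delta
    return x_pert, in_month

# Toy input: July to September 2015 on a 4 x 4 grid with 2 variables.
dates = np.arange("2015-07-01", "2015-10-01", dtype="datetime64[D]")
x = np.zeros((dates.size, 4, 4, 2))
x_warm, in_aug = perturb_month(x, dates, delta=2.0)
x_cool, _ = perturb_month(x, dates, delta=-2.0)
print(in_aug.sum())  # 31 August days
```

One way to compare basins would be to pass `x_warm` and `x_cool` through the trained model and compute, for each station, the mean August difference in predicted flow divided by the 4 deg C spread. This is a suggested summary, not necessarily the study's exact metric.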


How to run code

In practice, main_publish.ipynb runs best on a GPU, which trains the models much faster, and it is set up to run in Google Colab. Google Colab cannot directly access locally saved files, but it can access files on GitHub and Google Drive. So, main_publish.ipynb can be run on Colab via GitHub, and all outputs and required data can be saved and organized in Google Drive as outlined in the notebook. The other files (preprocessing.ipynb, figure_study_region.ipynb, era5_download_P_075grid.py, era5_download_T2m_075grid.py, non_contributing_areas.ipynb) can be run locally. The instructions below replicate the results in Google Colab.
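Switching between Colab and a local run mostly comes down to where the data and model folders live. The sketch below uses the standard `google.colab.drive.mount` call when it is available and otherwise falls back to the local `Data/` and `Models/` folders from the file organization. The function name and the fallback behaviour are assumptions, not code taken from the notebooks.

```python
from pathlib import Path

def get_dirs(project="cnn_lstm_era"):
    """Return (data_dir, models_dir) for Colab (Google Drive) or for a local run."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        # Local run: assume the working directory is the cloned cnn_lstm_era/ repository.
        base = Path(".")
        return base / "Data", base / "Models"
    drive.mount("/content/drive")  # Drive contents appear under /content/drive/MyDrive
    base = Path("/content/drive/MyDrive/Colab Notebooks") / project
    return base / "data", base / "models"

data_dir, models_dir = get_dirs()
print(data_dir, models_dir)
```

Note that the local folders are capitalized (`Data/`, `Models/`) while the Drive folders are lower case (`data/`, `models/`), following the two file organizations listed in this README.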

  1. Download ERA5 data:

    • Locally, run era5_download_P_075grid.py and era5_download_T2M_075grid.py; save output files (ERA5_T_1979_2015_6hourly_075_grid_AB_BC.nc and ERA5_P_1979_2015_6hourly_075_grid_AB_BC.nc) locally in cnn_lstm_era/Data/ERA5/
  2. Download streamflow data:

    • See here for instructions to download available data for all active and naturalized stream gauge stations in Alberta and BC. ABActNatFlowAll.csv and BCActNatFlowAll.csv list the stations that should be downloaded. Save streamflow data in ./Data/Flow/ (Alberta stations in AB/ and BC stations in BC/, as shown under File organization).
  3. Download basin outline data:

    • From here, download the folder WSC_Basins.gdb. A direct download link and other information can be found here. Save this folder as ./Data/WSC_Basins.gdb/
  4. Download provincial border shapefiles:

    • From Statistics Canada, download the "Provinces/Territories Cartographic Boundary File - 2016 Census" shapefile (SHP). Note: This data is not necessary for the analysis, but it is used for making maps.
  5. Download glacier data:

    • Download the file 02_rgi60_WesternCanadaUS.shp by clicking 'Western Canada and USA' from the Randolph Glacier Inventory V6.0. Save in Google Drive at './data/RGI/'.
  6. Preprocess the raw ERA5, streamflow, and basin outline data using preprocessing.ipynb

  7. Upload preprocessed files from Step 6 to Google Drive in folder './data/'. Upload shapefiles from Step 4 to Google Drive in folder ./data/province_borders/ (e.g. ./data/province_borders/lpr_000b16a_e.shp)

  8. Upload trained models (from './Models/') to Google Drive in folder './models/'.

  9. Run main_publish.ipynb in Colab.
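The ERA5 download scripts in step 1 are the reference; purely for orientation, a request for 6-hourly ERA5 single-level data on a 0.75-degree grid can be sketched with the `cdsapi` client as below. The bounding box is a hypothetical placeholder (not the study's exact domain), the request keys follow the CDS API as commonly used with `cdsapi` (newer CDS versions may expect `data_format` instead of `format`), and very large multi-year requests may need to be split to stay within CDS limits. The download only runs if `DOWNLOAD` is set to True and CDS credentials are configured.

```python
DOWNLOAD = False  # set to True to actually submit the request (needs cdsapi and ~/.cdsapirc)

def era5_request(variable, years, area):
    """Build a CDS API request for 6-hourly ERA5 single-level data on a 0.75-degree grid."""
    return {
        "product_type": "reanalysis",
        "variable": variable,                       # e.g. "2m_temperature", "total_precipitation"
        "year": [str(y) for y in years],
        "month": [f"{m:02d}" for m in range(1, 13)],
        "day": [f"{d:02d}" for d in range(1, 32)],  # invalid dates (e.g. 31 Feb) are ignored
        "time": ["00:00", "06:00", "12:00", "18:00"],
        "area": area,                               # [north, west, south, east] in degrees
        "grid": [0.75, 0.75],
        "format": "netcdf",
    }

# Hypothetical bounding box roughly around BC and Alberta; not taken from the study.
AREA = [60, -140, 48, -110]

if DOWNLOAD:
    import cdsapi
    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-single-levels",
        era5_request("2m_temperature", range(1979, 2016), AREA),
        "ERA5_T_1979_2015_6hourly_075_grid_AB_BC.nc",
    )
```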

If interested in non-contributing areas in the eastern cluster:

  1. Download non-contributing area data:

    • From here, download the folder "HYD_AAFC_TOTAL_NON_CTRB_DRAIN.gdb" by clicking 'Pre-packaged FGDB files (Bilingual)' --> 'Access'. Save this folder as './Data/HYD_AAFC_TOTAL_NON_CTRB_DRAIN.gdb/'.
  2. Run non_contributing_areas.ipynb

If interested in the Reference Hydrometric Basin Network (RHBN) and how stations in the RHBN overlap with those in this study:

  1. Download 'RHBN_Metadata.xlsx' from Environment and Climate Change Canada. Save in Google Drive in './data/'. This file is used in main_publish.ipynb.
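Checking which study stations also belong to the RHBN reduces to a set intersection of station IDs. In the sketch below, the column name shown in the comment and the function name are assumptions (check the spreadsheet's actual headers), and the toy IDs only illustrate the ID format, not real RHBN membership.

```python
def rhbn_overlap(study_ids, rhbn_ids):
    """Sorted station IDs present in both lists (whitespace stripped, upper-cased)."""
    def normalize(ids):
        return {str(s).strip().upper() for s in ids}
    return sorted(normalize(study_ids) & normalize(rhbn_ids))

# In practice the RHBN IDs might be read with pandas, e.g.
#   rhbn = pd.read_excel("data/RHBN_Metadata.xlsx")
#   rhbn_ids = rhbn["STATION_NUMBER"]  # hypothetical column name
study_ids = ["05AA004", "07EA004", "10DA001"]  # toy IDs
rhbn_ids = [" 07ea004", "11AA026"]             # toy IDs, deliberately messy formatting
print(rhbn_overlap(study_ids, rhbn_ids))  # ['07EA004']
```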

Miniature code

To reproduce some of the key results without downloading and structuring the full datasets in Steps 1-9 above, you can use mini.ipynb. This notebook loads enough preprocessed data to structure 1 year of climate reanalysis and streamflow data, then loads trained models, makes sensitivity heat maps, and perturbs input temperature data. It uses data saved in './Data/mini/', which can be uploaded to Google Drive (for access in Colab) in the folder './data_mini/'. While mini.ipynb can be run locally, predicting streamflow under temperature perturbations (to identify the freshet response) or spatial perturbations (to make heat maps) is much faster when predictions can be made in batches on a GPU (e.g. on Colab).
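The mini data files listed under the local file organization are Python pickles, so a loader can be as simple as the sketch below. The function name is hypothetical, and the objects stored in each pickle are not described here; unpickling may require the same libraries (e.g. NumPy) that were used to create the files.

```python
import pickle
from pathlib import Path

# File stems as listed under Data/Mini/ in the file organization.
MINI_FILES = ["x_intermediate_mini", "y_mini", "flowseason_norm", "station_info", "stationBasins"]

def load_mini(data_dir):
    """Load each mini pickle into a dict keyed by its file stem."""
    data = {}
    for name in MINI_FILES:
        with open(Path(data_dir) / f"{name}.pickle", "rb") as f:
            data[name] = pickle.load(f)
    return data

# Example (paths as described above):
#   mini = load_mini("Data/mini")  # locally
#   mini = load_mini("/content/drive/MyDrive/Colab Notebooks/cnn_lstm_era/data_mini")  # Colab
```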


File organization

Local organization:

  • cnn_lstm_era/
    • main_publish.ipynb
    • preprocessing.ipynb
    • figure_study_region.ipynb
    • era5_download_P_075grid.py
    • era5_download_T2M_075grid.py
    • Models/
      • All trained bulk and fine-tuned models (.h5)
    • Data/
      • ERA5/
        • ERA5_T_1979_2015_6hourly_075_grid_AB_BC.nc
        • ERA5_P_1979_2015_6hourly_075_grid_AB_BC.nc
      • Flow/
        • AB/
          • ABActNatFlowAll.csv
          • 05AA004_Daily_Flow_ts.csv
          • ...
          • 11AA026_Daily_Flow_ts.csv
        • BC/
          • BCActNatFlowAll.csv
          • 07EA004_Daily_Flow_ts.csv
          • ...
          • 10DA001_Daily_flow_ts.csv
      • WSC_Basins.gdb/
        • ...
      • Mini/
        • x_intermediate_mini.pickle
        • y_mini.pickle
        • flowseason_norm.pickle
        • station_info.pickle
        • stationBasins.pickle

Google Drive organization (for Colab access)

  • My Drive/
    • Colab Notebooks/
      • cnn_lstm_era/
        • models/
        • output/
        • heat_maps/
        • data/
          • province_borders/
          • RGI/
        • heat_maps_mini/
        • data_mini/
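The Google Drive folder layout above can be created in one go with `pathlib`. This is a convenience sketch (the function name is hypothetical); in Colab it would be run after mounting Drive, with the root set to the project folder inside `Colab Notebooks`.

```python
from pathlib import Path

# Subfolders of cnn_lstm_era/ in the Google Drive organization listed above.
SUBFOLDERS = [
    "models",
    "output",
    "heat_maps",
    "data/province_borders",
    "data/RGI",
    "heat_maps_mini",
    "data_mini",
]

def make_skeleton(root):
    """Create the expected folder tree under `root` (existing folders are left as-is)."""
    root = Path(root)
    for sub in SUBFOLDERS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root

# Example in Colab, after drive.mount("/content/drive"):
#   make_skeleton("/content/drive/MyDrive/Colab Notebooks/cnn_lstm_era")
```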
