# Hindcasting with CaSPAr-Archived ECCC forecasts

This notebook shows how to perform a streamflow hindcast using weather forecasts archived in CaSPAr. It generates the hindcasts and plots them.

CaSPAr (Canadian Surface Prediction Archive) is an archive of historical ECCC forecasts developed by Juliane Mai at the University of Waterloo, Canada. More details on CaSPAr can be found at https://caspar-data.ca/ and in the following reference:


Mai, J., Kornelsen, K.C., Tolson, B.A., Fortin, V., Gasset, N., Bouhemhem, D., Schäfer, D., Leahy, M., Anctil, F. and Coulibaly, P., 2020. The Canadian Surface Prediction Archive (CaSPAr): A Platform to Enhance Environmental Modeling in Canada and Globally. Bulletin of the American Meteorological Society, 101(3), pp.E341-E356.


In [None]:
# This section is boilerplate: it imports the required packages and prepares the temporary writing space.
import datetime as dt
import json
import tempfile
from pathlib import Path

import xarray as xr
import xskillscore
from clisops.core import average, subset
from matplotlib import pyplot as plt

from ravenpy import Emulator, RavenWarning
from ravenpy.extractors.new_config.forecasts import get_CASPAR_dataset
from ravenpy.new_config import commands as rc
from ravenpy.new_config.emulators import GR4JCN
from ravenpy.utilities.new_config import forecasting
from ravenpy.utilities.testdata import get_file

tmp = Path(tempfile.mkdtemp())

## Run the model simulations

Here we set the model parameters somewhat arbitrarily, but you can use the calibrated parameters obtained in the "06_Raven_calibration" notebook instead. We then choose the forecast (hindcast) date, which is the start date of the hindcast simulations, and run the simulations. Data are available from May 2017 onwards.

In [None]:
# Date of the hindcast
hdate = dt.datetime(2018, 6, 1)

# Get the forecast data from GEPS via CaSPAr
ts_hindcast, _ = get_CASPAR_dataset("GEPS", hdate)

# Subset the data for the region of interest and take the mean to get a single vector
ts_subset = subset.subset_shape(ts_hindcast, "salmon_river.geojson").mean(
    dim=("rlat", "rlon")
)
ts_subset = ts_subset.resample(time="6H").nearest(
    tolerance="1H"
)  # To make the timesteps identical across the entire duration
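
To make the resampling step above more concrete, here is a minimal plain-Python sketch of the "nearest value within a tolerance" logic. It is only an illustration and not how xarray implements it: the helper `resample_nearest` and the sample timestamps are hypothetical, and gaps are returned as `None`, whereas xarray would mark them as missing values.

```python
import datetime as dt


def resample_nearest(times, values, start, step, n_steps, tolerance):
    """For each regular target time, keep the nearest source value within `tolerance`."""
    out = []
    for k in range(n_steps):
        target = start + k * step
        # Index of the source timestamp closest to this target time
        i = min(range(len(times)), key=lambda j: abs(times[j] - target))
        within = abs(times[i] - target) <= tolerance
        out.append((target, values[i] if within else None))
    return out


start = dt.datetime(2018, 6, 1)
# Hypothetical forecast timestamps: one is 30 minutes late, and the 12:00 step is missing
times = [
    start,
    start + dt.timedelta(hours=6, minutes=30),
    start + dt.timedelta(hours=18),
]
values = [1.0, 2.0, 4.0]

regular = resample_nearest(
    times, values, start, dt.timedelta(hours=6), 4, dt.timedelta(hours=1)
)
for t, v in regular:
    print(t, v)
```

The late 06:30 value is snapped onto the 06:00 step, while the 12:00 step stays empty because no source value lies within one hour of it.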

In [None]:
# See how many members we have available
len(ts_subset.members)

Now that we have the correct weather forecasts, we can set up the hydrological model for a warm-up run:

In [None]:
%%capture --no-display
# Adding this to avoid spamming warning messages for overwriting files.

# Prepare a RAVEN model run using historical data, GR4JCN in this case.
# This is a dummy run to get initial states. In a real forecast situation,
# this run would end on the day before the forecast, but the process is the same.

# Here we need a file of observation data to run a simulation to generate initial conditions for our forecast.
ts = str(
    get_file("raven-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily.nc")
)

# This is the model start date, on which the simulation will be launched for a certain duration
# to set up the initial states. We will then save the final states as a launching point for the
# forecasts.

start_date = dt.datetime(2000, 1, 1)
end_date = dt.datetime(2002, 6, 1)

# Define HRU to build the hydrological model
hru = dict(
    area=4250.6,
    elevation=843.0,
    latitude=54.4848,
    longitude=-123.3659,
    hru_type="land",
)

# Set alternative names for netCDF variables
alt_names = {
    "TEMP_MIN": "tmin",
    "TEMP_MAX": "tmax",
    "RAINFALL": "rain",
    "SNOWFALL": "snow",
}

# Data types to extract from netCDF
data_type = ["TEMP_MAX", "TEMP_MIN", "RAINFALL", "SNOWFALL"]

# Model configuration
model_config_warmup = GR4JCN(
    params=[0.529, -3.396, 407.29, 1.072, 16.9, 0.947],
    Gauge=rc.Gauge.from_nc(
        ts,
        data_type=data_type,
        alt_names=alt_names,
        extra={
            1: {
                "elevation": hru[
                    "elevation"
                ],  # No need for lat/lon as they are included in the netcdf file already
            }
        },
    ),
    HRUs=[hru],
    StartDate=start_date,
    EndDate=end_date,
    RunName="NB12_warmup_run",
    GlobalParameter={"AVG_ANNUAL_RUNOFF": 208.480},
)
"""
# TODO: WE NEED THIS TO DECUMULATE PRECIP AND ADJUST SCALE.

pr={  # This part is to scale the precipitation and temperature as well as align the UTC time zone differences
    "scale": 1000.0,
    "offset": 0.0,
    "time_shift": -0.25,
    "deaccumulate": True,
},
"""
# Run the model and get the outputs.
out1 = Emulator(config=model_config_warmup, workdir="/tmp/run_results_NB12_part_1").run(
    overwrite=True
)


# Extract the path to the final states file that will be used as the next initial states
hotstart = out1.files["solution"]
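
Why a warm-up run matters can be shown with a toy example. The sketch below uses a hypothetical single linear reservoir (`run_reservoir`), not GR4JCN or Raven, to compare a forecast started from an empty store (cold start) with one started from the final state of a warm-up run (hot start).

```python
def run_reservoir(storage, inflows, k=0.1):
    """Toy linear reservoir: each step adds the inflow, then releases a fraction k of storage."""
    flows = []
    for p in inflows:
        storage += p
        q = k * storage
        storage -= q
        flows.append(q)
    return flows, storage


# Warm-up run: start from an arbitrary (empty) state and keep only the final state
_, final_state = run_reservoir(0.0, [5.0] * 200)

# Forecast run with identical forcing, from a cold start and from the warm-up state
cold, _ = run_reservoir(0.0, [5.0] * 3)
hot, _ = run_reservoir(final_state, [5.0] * 3)

print("cold start:", cold)
print("hot start: ", hot)
```

With identical forcing, the cold-start flows are far lower than the hot-start flows because the store is still filling. This is the bias that reusing the final states of the warm-up run is meant to avoid.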

We now have the initial states ready for the next step, which is to launch the forecasts in hindcasting mode:

In [None]:
%%capture --no-display

# Configure and run a new model by setting the initial states (equal to the previous run's final states) and prepare
# the configuration for the forecasts (including forecast start date, which should be equal to the final simulation
# date + 1, as well as the forecast duration.)

# Model configuration for forecasting, including correct start date and forecast duration
model_config_fcst = GR4JCN(
    params=[0.529, -3.396, 407.29, 1.072, 16.9, 0.947],
    Gauge=rc.Gauge.from_nc(
        ts,
        data_type=data_type,
        alt_names=alt_names,
        extra={
            1: {
                "elevation": hru[
                    "elevation"
                ],  # No need for lat/lon as they are included in the netcdf file already
            }
        },
    ),
    HRUs=[hru],
    StartDate=end_date + dt.timedelta(days=1),
    Duration=10,
    RunName="NB12_forecast_run",
    GlobalParameter={"AVG_ANNUAL_RUNOFF": 208.480},
)

# Update the initial states
model_config_fcst = model_config_fcst.set_solution(hotstart)


# We need to write the netcdf data as a file for Raven to be able to access it.
ts_subset.to_netcdf(tmp / "hindcast.nc")

hindcast = forecasting.hindcast_from_meteo_forecast(
    model_config_fcst,
    forecast=ts_subset,
    path="/tmp/run_results_NB12_part_2",
    overwrite=True,
)

Explore the hindcast data:

In [None]:
hindcast.hydrograph

In [None]:
"""
# KEEPING THIS TO HAVE A MEMORY OF TAS/PR scale/offset
model(
    ts=str(tmp / "hindcast.nc"),
    nc_index=range(number_members),
    start_date=hdate,
    end_date=hdate + dt.timedelta(days=duration),
    hrus=hrus,
    params=(0.529, -3.396, 407.29, 1.072, 16.9, 0.947),
    overwrite=True,
    pr={  # This part is to scale the precipitation and temperature as well as align the UTC time zone differences
        "scale": 1000.0,
        "offset": 0.0,
        "time_shift": -0.25,
        "deaccumulate": True,
    },
    tas={"time_shift": -0.25},
)
"""
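
The `scale`, `offset`, `time_shift` and `deaccumulate` settings kept in the note above can be illustrated with a small plain-Python sketch. It rests on assumptions that are not confirmed by this notebook: that the forecast precipitation is stored as a running total in metres, that `scale` and `offset` are applied linearly, and that `time_shift` is expressed in days. The sample values and the helper `deaccumulate` are hypothetical.

```python
import datetime as dt


def deaccumulate(cumulative):
    """Turn a running total into per-step increments (the first step keeps its own value)."""
    increments = [cumulative[0]]
    for prev, cur in zip(cumulative, cumulative[1:]):
        increments.append(cur - prev)
    return increments


# Hypothetical precipitation accumulated since the start of the forecast, in metres
pr_cumulative_m = [0.0, 0.002, 0.002, 0.005]
times = [dt.datetime(2018, 6, 1) + dt.timedelta(hours=6 * i) for i in range(4)]

# Same values as in the note above; time_shift is assumed to be in days
scale, offset, time_shift_days = 1000.0, 0.0, -0.25

# Per-step precipitation converted from metres to millimetres
pr_mm = [scale * v + offset for v in deaccumulate(pr_cumulative_m)]

# Timestamps moved back by a quarter of a day (6 hours)
shifted = [t + dt.timedelta(days=time_shift_days) for t in times]

for t, v in zip(shifted, pr_mm):
    print(t, round(v, 3))
```

Under these assumptions, a scale of 1000 converts metres to millimetres, and a shift of -0.25 moves each timestamp six hours earlier.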


Finally, plot the forecasts for a visual representation:


In [None]:
# Stand-in for an observed streamflow time series: here we take a member from the ensemble,
# but you should use your own observed time series.
qq = hindcast.hydrograph.q_sim[0, :, 0]

# This is to be replaced with a call to the forecast graphing WPS as soon as merged.
# model.q_sim.plot.line("b", x="time")
hindcast.hydrograph.q_sim[:, :, 0].plot.line("b", x="time", add_legend=False)
hindcast.hydrograph.q_sim[1, :, 0].plot.line("b", x="time", label="forecasts")
qq.plot.line("r", x="time", label="observations")
plt.legend(loc="lower left")
plt.show()