# 04 - Emulating hydrological models

## Using Raven to emulate an existing hydrological model

In this notebook, we will demonstrate the versatility of the Raven modelling framework to emulate one of eight hydrological models that are currently supported. We will walk through the different configuration parameters required to build the model and simulate streamflow on a catchment.

## A note on datasets

There are numerous ways to run a Raven model and to pass its required input data. For this introduction to RavenPy, we will use the ERA5 data we generated in the previous notebook and configure the Raven model instance on the fly! In the next tutorials, we will see how users can import and use their own datasets to make the entire process flexible and tailored to their needs.


## Using templated model emulators
The first thing we need to run a Raven model is... a Raven model! Raven is not a model per se, but a modelling framework that can be used to build hydrological models from their underlying components. For now, PAVICS-Hydro allows building a set of predetermined models. The Python wrapper currently offers eight model emulators: GR4J-CN, HMETS, MOHYSE, HBV-EC, Canadian Shield, HYPR, Sacramento and Blended. For each of these, templated configuration files are available to facilitate launching the model with options passed by Python at run time.

In the next cell, we are going to configure and run the GR4J-CN model. Please see the documentation for more details on the mandatory vs optional parameters, and what they represent. A small glimpse is provided here.

In [None]:
# Import the list of possible model templates.
from ravenpy.models import (
    BLENDED,
    CANADIANSHIELD,
    GR4JCN,
    HBVEC,
    HMETS,
    HYPR,
    MOHYSE,
    SACSMA,
)

# Generate a GR4JCN-configured Raven model instance.
# By replacing "GR4JCN()" with "HMETS()", we would be running the HMETS model emulator instead.
# Also, by adding the "workdir" path, the data used to run the model (Raven .rvX files) will be made available
# from the PAVICS Jupyter environment. The default "workdir" setting puts model outputs in the temporary directory ("/tmp"),
# which is not visible from the Jupyter file explorer. You can change the last subfolder,
# but '/notebook_dir/writable-workspace/' must be the beginning of the path when running on the PAVICS platform.

# model = GR4JCN(workdir="/notebook_dir/writable-workspace/run_results")  # Value for the PAVICS server; adjust as needed
model = GR4JCN(workdir="/home/ets/src/run_results")
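The path rule described in the comments of the cell above can be checked before creating the model. The following is a minimal sketch using only the standard library; `is_visible_on_pavics` is a hypothetical helper written for this illustration and is not part of RavenPy.

```python
from pathlib import PurePosixPath

# Root folder named in the comments above; folders outside it are not shown
# in the PAVICS Jupyter file explorer.
PAVICS_ROOT = PurePosixPath("/notebook_dir/writable-workspace")


def is_visible_on_pavics(workdir):
    """Return True if `workdir` is a subfolder of the PAVICS writable workspace."""
    return PAVICS_ROOT in PurePosixPath(workdir).parents


print(is_visible_on_pavics("/notebook_dir/writable-workspace/run_results"))
print(is_visible_on_pavics("/tmp/run_results"))
```

The check is purely textual: it does not create the folder or verify that it is writable.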

The GR4JCN model has now been created, and is ready to be parameterized. In a nutshell, when the Raven executable is launched, it will look for configuration files and use those to run the desired model components to generate streamflow. This is done behind the scenes in PAVICS-Hydro: All we need to do is provide the information, and we will write the text files as required.

In the next cell, we will provide the forcing data. This can be precipitation, max and min temperatures, evapotranspiration, rainfall, snowfall, and other variables that could be used by your Raven model. Note that forcing data can also include observed streamflow if you also want to compute an objective function, or as we will see in the tutorial notebook 06, for model calibration. Forcing data must be a path to one or multiple files. Here we will pass the three meteorological input files generated in the previous notebook:

In [None]:
# Generate a list of forcing data files. You can add the files you need! As long as there are timestamps
# associated with each value in the netcdf files, the code will accept them and use what it needs.
forcing = ("ERA5_tmax.nc", "ERA5_tmin.nc", "ERA5_pr.nc")
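Since the forcing data is passed as file paths, it can help to confirm that the files exist before launching the model. This is a small sketch using the standard library only; `missing_files` is a hypothetical helper, and the file names are the ones used in the cell above.

```python
from pathlib import Path


def missing_files(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not Path(p).is_file()]


forcing = ("ERA5_tmax.nc", "ERA5_tmin.nc", "ERA5_pr.nc")

missing = missing_files(forcing)
if missing:
    print("Missing forcing files:", missing)
else:
    print("All forcing files found.")
```

If files are reported missing, re-run the previous notebook or adjust the paths so they point to where the ERA5 files were saved.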

In this next step, we will define the hydrological response unit (HRU). For lumped models, there is only one unit so the following structure should be good. However, for distributed modelling, there will be more than one HRU, so we would use another tool to help us build the HRUs in that case. The HRU provides information on the area, elevation, and location of the catchment. 

For now, let's provide the basin properties such that Raven can run. These are the minimal values that must always be provided, but some models might require other inputs. Please see the documentation for more information.

In [None]:
# Define the hydrological response unit. We can use the information from the tutorial notebook #02! Here we are using
# arbitrary data for a test catchment.
hru = GR4JCN.LandHRU(
    area=4250.6, elevation=843.0, latitude=54.4848, longitude=-123.3659
)
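Before passing basin properties to the model, a quick sanity check on their ranges can catch unit or sign mistakes. The sketch below is an illustration only: `check_hru` is a hypothetical helper, not a RavenPy function, and it assumes area in km² and coordinates in decimal degrees.

```python
import math


def check_hru(area, elevation, latitude, longitude):
    """Return a list of problems found in basic HRU properties (empty if none)."""
    problems = []
    values = (area, elevation, latitude, longitude)
    if not all(math.isfinite(v) for v in values):
        problems.append("all values must be finite numbers")
    if area <= 0:
        problems.append("area must be positive (km²)")
    if not -90.0 <= latitude <= 90.0:
        problems.append("latitude must be within [-90, 90] degrees")
    if not -180.0 <= longitude <= 180.0:
        problems.append("longitude must be within [-180, 180] degrees")
    return problems


# Same values as the test catchment defined above.
print(check_hru(area=4250.6, elevation=843.0, latitude=54.4848, longitude=-123.3659))
```

Elevation is only checked for being a finite number, since valid elevations can span a wide range.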

The next required inputs are the start and end dates for the simulation. The `start_date` and `end_date` arguments indicate when a simulation should start and end. As long as the forcing data covers the simulation period, it should work. If these parameters are not defined, then start and end dates default to the start and end of the driving data. 

To keep things simple, we will use a short 5-year period. Note that the dates are Python `datetime.datetime` objects.

In [None]:
import datetime as dt

start_date = dt.datetime(1985, 1, 1)
end_date = dt.datetime(1990, 1, 1)
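The requirement that the forcing data cover the simulation period can be expressed as a simple comparison of dates. The following sketch uses a hypothetical helper, `covers`, and an assumed forcing period chosen for illustration; in practice the forcing period should be read from your own files.

```python
import datetime as dt


def covers(forcing_start, forcing_end, start_date, end_date):
    """Return True if [start_date, end_date] lies within the forcing period."""
    return forcing_start <= start_date <= end_date <= forcing_end


# Assumed forcing period, for illustration only.
forcing_start = dt.datetime(1980, 12, 31)
forcing_end = dt.datetime(1991, 1, 1)

start_date = dt.datetime(1985, 1, 1)
end_date = dt.datetime(1990, 1, 1)

print("Simulation period covered:", covers(forcing_start, forcing_end, start_date, end_date))
print("Days between start and end:", (end_date - start_date).days)
```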

Finally, we also need to provide some parameters to our model. We could calibrate them, but let's start by using some arbitrary parameters just to get the model to run. A more detailed overview is given towards the end of this notebook, but for now, you can get a glimpse of the required parameters by using the "help" command.

In [None]:
help(GR4JCN.Params)

You can see in the first lines that GR4JCN has 6 parameters: four GR4J parameters (GR4J_X1 to GR4J_X4) and two CemaNeige parameters (CEMANEIGE_X1 and CEMANEIGE_X2). Let's create a tuple with those parameters, in order, for our model:

In [None]:
params = (3.9, 1.396, 200.29, 10.072, 16.9, 0.947)
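Because the parameters are passed by position, it is easy to lose track of which value belongs to which parameter. One way to keep this readable is to label the values, as in the sketch below. The `GR4JCNParams` namedtuple is a stand-in built for this illustration (it is not the RavenPy class), and the field order is assumed to follow the order in which the names are listed above.

```python
from collections import namedtuple

# Illustrative container only; field names and their order are assumptions
# based on the parameter names described in the text above.
GR4JCNParams = namedtuple(
    "GR4JCNParams",
    ["GR4J_X1", "GR4J_X2", "GR4J_X3", "GR4J_X4", "CEMANEIGE_X1", "CEMANEIGE_X2"],
)

params = (3.9, 1.396, 200.29, 10.072, 16.9, 0.947)
named = GR4JCNParams(*params)

# Print each parameter next to its name.
for name, value in named._asdict().items():
    print(f"{name}: {value}")
```

Since a namedtuple is still a tuple, `tuple(named)` gives back the same six values in the same order.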

Here is where we launch the model using the configuration we specified. We simply call the "model" object we created earlier and pass the required inputs. You might see some warning messages. These are usually for information only and can be disregarded, but check to make sure they are not impacting your simulations in an unexpected way!

In [None]:
# Run the model by passing the configuration variables we just established.
model(
    ts=forcing,
    start_date=start_date,
    end_date=end_date,
    hrus=(
        hru,
    ),  # Careful how this must be passed! This is due to the capability of running in distributed mode as well.
    params=params,
    run_name="test_basin",  # OPTIONAL: You can give your run a specific name to identify the results more easily. Files will contain the run_name as a prefix.
    overwrite=True,  # OPTIONAL: We can do this to overwrite old files with the new ones generated in this run (output files, etc.)
)

We can now explore the model outputs that have been generated, by using the following command:

In [None]:
# Display model output choices
model.outputs

In [14]:
# Merge the three ERA5 forcing files into a single dataset and save it as one netCDF file that can be passed to Raven.
import xarray as xr

ds1 = xr.open_dataset("ERA5_tmax.nc")
ds2 = xr.open_dataset("ERA5_tmin.nc")
ds3 = xr.open_dataset("ERA5_pr.nc")

ds4 = xr.merge([ds1, ds2, ds3])

ds4.to_netcdf("ERA5_weather_data.nc")

In [24]:
print(ds4)

<xarray.Dataset>
Dimensions:  (time: 3654)
Coordinates:
  * time     (time) datetime64[ns] 1980-12-31 1981-01-01 ... 1991-01-01
Data variables:
    tmax     (time) float32 ...
    tmin     (time) float32 ...
    pr       (time) float32 ...
Attributes:
    long_name:       2 metre temperature
    nameCDM:         2_metre_temperature_surface
    nameECMWF:       2 metre temperature
    product_type:    analysis
    shortNameECMWF:  2t
    standard_name:   air_temperature
    units:           degC
    grid_mapping:    crs


In [30]:
import datetime as dt

from ravenpy.new_config import commands as rc
from ravenpy.new_config.emulators import GR4JCN

hru = dict(
    area=4250.6,
    elevation=843.0,
    latitude=54.4848,
    longitude=-123.3659,
    hru_type="land",
)

data_type = ["TEMP_MAX", "TEMP_MIN", "PRECIP"]

alt_names = {
    "RAINFALL": "rain",
    "TEMP_MIN": "tasmin",
    "TEMP_MAX": "tasmax",
    "PET": "pet",
    "HYDROGRAPH": "qobs",
    "SNOWFALL": "snow",
    "PRECIP": "pr",
}

m = GR4JCN(
    params=[0.529, -3.396, 407.29, 1.072, 16.9, 0.947],
    Gauge=rc.Gauge.from_nc(
        "ERA5_weather_data.nc",
        data_type=data_type,
        alt_names=alt_names,
        extra={
            1: {"elevation": hru["elevation"], "latitude": 45.00, "longitude": -123.00}
        },
    ),
    HRUs=[hru],
    StartDate=dt.datetime(1985, 1, 1),
    EndDate=dt.datetime(1987, 1, 1),
    RunName="test",
    CustomOutput=rc.CustomOutput("YEARLY", "AVERAGE", "PRECIP", "ENTIRE_WATERSHED"),
    GlobalParameter={"AVG_ANNUAL_RUNOFF": 208.480},
)

In [44]:
m.build(workdir="/home/ets/src/run_results2", overwrite=True)

AttributeError: 'GR4JCN' object has no attribute 'run'

The outputs are as follows:

- hydrograph: The actual simulated hydrograph (q_sim), in netCDF format. It also contains the observed discharge (q_obs) if observed streamflow was provided as a forcing file.
- storage: The state variables over the simulation duration, in netCDF format.
- solution: The state variables at the end of the simulation, which are saved as a ".rvc" file that can be used to hot-start a model (for forecasting, for example)
- rv_config: The model and data files packaged into a zip file that can be downloaded to run this exact model setup elsewhere on PAVICS-Hydro or on your local machine.

You can explore the outputs using one of the following two syntaxes. One provides a path to the data, one actually loads the data into memory to be used directly in another cell:


In [None]:
# This syntax will allow you to get the path to where the output files are located on the server.
print(model.outputs["hydrograph"])
print(model.outputs["storage"])
print(model.outputs["solution"])
print(model.outputs["rv_config"])

In [None]:
# The model outputs are actually already loaded as Python objects in memory, thus we can access the data directly.
# Note that there is no way to work with the zip-file directly, so only the objects that contain useful data can be read-in this way:
print("----------------HYDROGRAPH----------------")
print(model.hydrograph)
print("")
print("-----------------STORAGE------------------")
print(model.storage)
print("")
print("-----------------SOLUTION-----------------")
print(model.solution)
print("")

We can see that the model has generated a simulation using the forcing data we provided, but it only used the period between the start_date and end_date. We can confirm this by looking at the forcing data. Since we have 3 forcing data files, we will want to open them using the 'open_mfdataset' method in xarray:


In [None]:
import xarray as xr

dataset = xr.open_mfdataset(forcing)
print(dataset)
dataset.close()

We can see that the dates cover the period 1980-12-31 to 1991-01-01. In our simulation, we only asked to run over the period from 1985-01-01 to 1990-01-01. Raven takes care of subsetting the data for the required period. We can look at the simulated streamflow from Raven to confirm this:

In [None]:
# Import the graphing utility built to handle Raven model outputs
from ravenpy.utilities.nb_graphs import hydrographs

hydrograph_objects = model.hydrograph
hydrographs(hydrograph_objects)

As you can see, the simulated flow covers only the period we asked for. The results probably don't look good, but that's OK! We will soon calibrate our model to get reasonable parameters.

We could also simply do basic plots using:

In [None]:
model.hydrograph.q_sim.plot()

Finally, we can inspect and work with other state variables in the model outputs. For example, say we want to investigate the snow water equivalent timeseries. We can first get the list of available state variables: 

In [None]:
print(list(model.storage.keys()))

And then plot the variable of interest:


In [None]:
# Plot the "Snow" variable
model.storage["Snow"].plot()

As you can see, PAVICS-Hydro makes it easy to build a hydrological model, run it with forcing data, and then interact with the results! In the next notebooks, we will see how to use model configuration files (the .rvX files) to set up and run a model, and also how to calibrate its parameters.



## Supplementary information on Hydrological response unit definition
Raven requires a description of the watershed in which streamflow is simulated. Different models require different parameters, but at a minimum, area, elevation, latitude and longitude are required. These data need to be provided for a few reasons:
* Area is required since the size of the watershed will directly influence the simulated streamflow. Units are in square kilometers (km²).
* Elevation (average elevation of the watershed) is required, although in many models the value is not actually used and therefore can be set to an arbitrary number. We strongly recommend using the real elevation as that will ensure that the value is present if you decide to switch to another model that requires elevation. Elevation is expressed in meters above mean sea level.
* Latitude and longitude refer to the catchment centroid and are used, among other things, for evapotranspiration computations. They are expressed in decimal degrees (°), with longitudes within [-180, 180].

These values should be either precomputed externally, or they can be computed using the PAVICS-Hydro geophysical extraction toolbox that we used in the second tutorial notebook.
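To see why area matters, consider converting a runoff depth into a mean discharge: the same depth of water over a larger basin gives a proportionally larger flow. The sketch below works through this unit conversion with a hypothetical helper, `depth_to_discharge`. The runoff depth of 208.48 mm over one year is an illustrative value, not a model result.

```python
SECONDS_PER_DAY = 86_400


def depth_to_discharge(depth_mm, area_km2, n_days):
    """Convert a runoff depth (mm) accumulated over n_days into mean discharge (m³/s)."""
    # mm -> m for depth, km² -> m² for area, giving a volume in m³.
    volume_m3 = (depth_mm / 1000.0) * (area_km2 * 1.0e6)
    return volume_m3 / (n_days * SECONDS_PER_DAY)


# Same runoff depth applied to the full test catchment and to half its area.
q_full = depth_to_discharge(208.48, 4250.6, 365)
q_half = depth_to_discharge(208.48, 4250.6 / 2, 365)

print(f"Full area: {q_full:.1f} m³/s, half area: {q_half:.1f} m³/s")
```

Halving the area halves the discharge, which is why an incorrect area translates directly into a biased simulated streamflow.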

## Supplementary information on model parameters

Each model requires a set of tuning parameters to represent and compensate for unknown quantities in certain hydrological processes. Some models have more parameters than others, for example:

* GR4JCN = 6 parameters
* HMETS = 21 parameters
* MOHYSE = 10 parameters
* HBVEC = 21 parameters

These parameters are found through calibration, by tuning their values until the simulated streamflow matches the observations as closely as possible. PAVICS-Hydro provides an integrated calibration toolbox that will be explored in the 6th step of this tutorial. For now, we simply provided a set of parameters that is not yet calibrated. This explains the poor quality of the simulated hydrograph.

## Explore!
With this information in mind, you can now explore running different models with different parameters over different periods, and display the simulated hydrographs. You can change the start and end dates, the area, the latitude, and even add other options that you might find in the documentation or in later tutorials.

If you want to run other models than GR4JCN, you can use these parameter sets:

#### HMETS: 
params = (9.5019, 0.2774, 6.3942, 0.6884, 1.2875, 5.4134, 2.3641, 0.0973, 0.0464, 0.1998, 0.0222, -1.0919, 2.6851, 0.3740, 
          1.0000, 0.4739, 0.0114, 0.0243, 0.0069, 310.7211, 916.1947)
       
#### MOHYSE:
params = (1.0, 0.0468, 4.2952, 2.658, 0.4038, 0.0621, 0.0273, 0.0453, 0.9039, 5.6167)

#### HBVEC:
params = (0.059845, 4.07223, 2.00157, 0.034737, 0.09985, 0.506, 3.4385, 38.32455, 0.46066, 0.06304, 2.2778, 4.8737,
          0.5718813, 0.04505643, 0.877607, 18.94145, 2.036937, 0.4452843, 0.6771759, 1.141608, 1.024278)
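When switching between models, a common mistake is to pass a parameter set of the wrong length. The sketch below checks the parameter sets given in this notebook against the parameter counts listed in the supplementary section. The dictionaries are plain Python structures built for this illustration and do not call RavenPy.

```python
# Number of parameters per emulator, as listed in this notebook.
expected_counts = {"GR4JCN": 6, "HMETS": 21, "MOHYSE": 10, "HBVEC": 21}

# Parameter sets copied from this notebook.
parameter_sets = {
    "GR4JCN": (3.9, 1.396, 200.29, 10.072, 16.9, 0.947),
    "HMETS": (
        9.5019, 0.2774, 6.3942, 0.6884, 1.2875, 5.4134, 2.3641, 0.0973,
        0.0464, 0.1998, 0.0222, -1.0919, 2.6851, 0.3740, 1.0000, 0.4739,
        0.0114, 0.0243, 0.0069, 310.7211, 916.1947,
    ),
    "MOHYSE": (1.0, 0.0468, 4.2952, 2.658, 0.4038, 0.0621, 0.0273, 0.0453, 0.9039, 5.6167),
    "HBVEC": (
        0.059845, 4.07223, 2.00157, 0.034737, 0.09985, 0.506, 3.4385,
        38.32455, 0.46066, 0.06304, 2.2778, 4.8737, 0.5718813, 0.04505643,
        0.877607, 18.94145, 2.036937, 0.4452843, 0.6771759, 1.141608, 1.024278,
    ),
}

# Report whether each set has the expected number of values.
for name, values in parameter_sets.items():
    status = "OK" if len(values) == expected_counts[name] else "MISMATCH"
    print(f"{name}: {len(values)} values, expected {expected_counts[name]} ({status})")
```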