# 02_Emulating_hydrological_models.ipynb

## Using Raven to emulate a well-known hydrological model

In this notebook, we will use the power of PAVICS-Hydro and the versatility of the Raven modelling framework to emulate one of four hydrological models that are currently supported. We will detail the different configuration parameters to build the model and simulate streamflow on a catchment.


In [None]:
import os
from glob import glob
import datetime as dt
from pathlib import Path
from ravenpy.utilities.testdata import get_file

## A note on datasets

For this introduction to RavenPy, we will use pre-existing datasets hosted on the PAVICS-Hydro servers, as we did in the previous example notebook. This time, however, the model will not be pre-configured: we will configure it on the fly! In the next tutorials, we will see how users can import and use their own datasets to make the entire process flexible and tailored to their needs.

In [None]:
forcing = get_file("raven-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily.nc")

# Display the path to the forcing dataset that we will be using
display(forcing)

## Using templated model emulators

Raven's framework can be used to emulate existing hydrological models. The Python wrapper currently offers four emulated models: GR4JCN, HMETS, MOHYSE and HBVEC. For each of these, templated configuration files are available so that the model can be launched with options passed from Python at run time. In this block of code, we run the GR4JCN model and provide a list of configuration parameters. Please see the documentation for more details on which parameters are mandatory, which are optional, and what they represent. A glimpse is provided here.
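To make the idea of a templated configuration file more concrete, here is a toy sketch using Python's standard `string.Template`. It is not RavenPy's actual template or mechanism: the template text and the `render_rvi` helper are assumptions made for illustration, and only the three Raven commands shown (`:RunName`, `:StartDate`, `:EndDate`) are taken from Raven's input file syntax.

```python
from datetime import datetime
from string import Template

# Toy template, NOT the file shipped with RavenPy. It only illustrates how
# placeholders in a configuration file can be filled in at run time.
RVI_TEMPLATE = Template(
    ":RunName    $run_name\n"
    ":StartDate  $start_date\n"
    ":EndDate    $end_date\n"
)


def render_rvi(run_name, start_date, end_date):
    """Fill the toy template with values supplied from Python (hypothetical helper)."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return RVI_TEMPLATE.substitute(
        run_name=run_name,
        start_date=start_date.strftime(fmt),
        end_date=end_date.strftime(fmt),
    )


rvi_text = render_rvi("Salmon", datetime(2000, 1, 1), datetime(2002, 1, 1))
print(rvi_text)
```

The emulator classes used in the next cell take care of this step for you, so you never need to write such a template yourself.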

In [None]:
# Import the list of possible model templates. 
from ravenpy.models import GR4JCN, HMETS, MOHYSE, HBVEC

# Generate a GR4JCN-configured Raven model instance. Replacing "GR4JCN()" with "HMETS()" would run an HMETS model emulator instead.
# Passing this path makes the data used to run the model (Raven .rv files) available directly in your workspace, where they can be downloaded for future use. You can change the last subfolder, but the path must begin with '/notebook_dir/writable-workspace/'.
model = GR4JCN('/notebook_dir/writable-workspace/run_results')

# Launch the model with a set of configuration parameters. We run the code first; the parameters are explained below.
model(
    forcing,
    start_date=dt.datetime(2000, 1, 1),
    end_date=dt.datetime(2002, 1, 1),
    area=4250.6,
    elevation=843.0,
    latitude=54.4848,
    longitude=-123.3659,
    params=(0.529, -3.396, 407.29, 1.072, 16.9, 0.947),
    run_name="Salmon",
)

## Model configuration overview
As can be seen in the block of code above, we provided a series of parameters and arguments to the model class. These configure the model run to produce exactly what we want. We will explore the different parameters one by one here.

### start_date and end_date

One of the strengths of Raven is that the forcing data period can be much longer than the simulation period: Raven simply takes what it needs to run the model over the requested period. There is therefore no need to subset the forcing data by date beforehand. For example, we have not specified the forcing data period, but we can explore the dataset to find out:

In [None]:
import xarray as xr
ds = xr.open_dataset(forcing)
print(ds)

We can see that the dates cover the period 1954-01-01 to 2010-12-31. However, in our simulation, we only asked for the period:
* start_date = 2000-01-01 (simulation start)
* end_date = 2002-01-01 (simulation end)

Raven takes care of subsetting the data for the required period. The user does not need to do so independently.
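The subsetting can only work if the requested simulation period falls inside the forcing record. Here is a minimal sketch of that check in plain Python, using the dates quoted above. The `period_is_covered` helper is hypothetical and is not part of RavenPy.

```python
import datetime as dt


def period_is_covered(sim_start, sim_end, data_start, data_end):
    """Return True if [sim_start, sim_end] lies inside [data_start, data_end]."""
    if sim_start >= sim_end:
        raise ValueError("start_date must be earlier than end_date")
    return data_start <= sim_start and sim_end <= data_end


# Forcing period reported by the dataset above.
forcing_start = dt.datetime(1954, 1, 1)
forcing_end = dt.datetime(2010, 12, 31)

# The period used in this notebook is covered by the forcing data.
covered = period_is_covered(
    dt.datetime(2000, 1, 1), dt.datetime(2002, 1, 1), forcing_start, forcing_end
)

# A period that runs past the end of the forcing data is not.
not_covered = period_is_covered(
    dt.datetime(2010, 1, 1), dt.datetime(2012, 1, 1), forcing_start, forcing_end
)

print(covered, not_covered)
```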

We can look at the simulated streamflow from Raven to confirm this:

In [None]:
from ravenpy.utilities.nb_graphs import hydrographs

hydrographs(model.hydrograph)

As you can see, the simulated flow covers only the period we asked for.

### area, elevation, latitude and longitude
Raven requires these data to be provided for a few reasons:
* Area is required since the size of the watershed will directly influence the simulated streamflow. Units are in square kilometers (km<sup>2</sup>).
* Elevation (average elevation of the watershed) is required, although in many models the value is not actually used and therefore can be set to an arbitrary number. We strongly recommend using the real elevation as that will ensure that the value is present if you decide to switch to another model that requires elevation at a future time. Elevation is expressed in meters above mean sea level (MSL).
* Latitude and Longitude refer to the catchment centroid, and are used for evapotranspiration and other module computations. They are expressed in decimal degrees (°) using the [-180; 180] longitude system.

These values should be either precomputed (as is done here), or they can be computed using the PAVICS-Hydro geophysical extraction toolbox that will be presented later on in this tutorial.
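If your catchment coordinates come from a source that uses the [0; 360] longitude system, they need to be converted before being passed to the model. The sketch below shows one way to do the conversion, along with a basic sanity check of the values. Both helpers are hypothetical and are not part of RavenPy or PAVICS-Hydro.

```python
def to_signed_longitude(longitude):
    """Map a longitude in degrees onto the [-180, 180) range."""
    return (longitude + 180.0) % 360.0 - 180.0


def check_catchment(area, latitude, longitude):
    """Raise ValueError if a value is outside its plausible range."""
    if area <= 0:
        raise ValueError(f"area must be positive (km2), got {area}")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude must be within [-90, 90], got {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude must be within [-180, 180], got {longitude}")


# The values used in this notebook pass the check.
check_catchment(area=4250.6, latitude=54.4848, longitude=-123.3659)

# The same centroid expressed in the [0; 360] system, converted back.
converted = to_signed_longitude(236.6341)
print(round(converted, 4))
```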

### params

Each model requires a set of tuning parameters to represent and compensate for unknown quantities in certain hydrological processes. Some models have more parameters than others:

* GR4JCN = 6 parameters
* HMETS = 21 parameters
* MOHYSE = 10 parameters
* HBVEC = 21 parameters
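The counts listed above can be checked before launching a run, which catches a common mistake: passing a parameter set intended for another model. This is a minimal sketch. The `check_param_count` helper is hypothetical, and RavenPy may perform its own validation.

```python
# Number of parameters expected by each emulator, as listed above.
EXPECTED_PARAM_COUNTS = {"GR4JCN": 6, "HMETS": 21, "MOHYSE": 10, "HBVEC": 21}


def check_param_count(model_name, params):
    """Raise ValueError if params has the wrong length for model_name."""
    expected = EXPECTED_PARAM_COUNTS[model_name]
    if len(params) != expected:
        raise ValueError(
            f"{model_name} expects {expected} parameters, got {len(params)}"
        )
    return True


# The GR4JCN parameter set used in this notebook has the expected length.
gr4jcn_params = (0.529, -3.396, 407.29, 1.072, 16.9, 0.947)
gr4jcn_ok = check_param_count("GR4JCN", gr4jcn_params)
print(gr4jcn_ok)
```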

These parameters are found through calibration, by tuning their values until the simulated streamflow matches the observations as closely as possible. PAVICS-Hydro provides an integrated calibration toolbox that will be explored in the 5th step of this tutorial. For now, we simply provided a parameter set that is not yet fully calibrated. This explains the poor quality of the simulated hydrograph.

### run_name (optional)
The run name is an optional parameter that identifies the model simulation when it runs. The exported files will have the run name in their filenames, so users can find and classify them easily.
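Because the run name appears in the exported filenames, it can be used to collect the files belonging to one simulation. The sketch below searches a folder for such files. The `find_run_files` helper and the file names in the demonstration are placeholders, and a temporary folder stands in for the real workspace.

```python
import tempfile
from pathlib import Path


def find_run_files(workdir, run_name):
    """Return the files under workdir whose names contain run_name."""
    return sorted(
        path
        for path in Path(workdir).rglob("*")
        if path.is_file() and run_name in path.name
    )


# Demonstration on a temporary folder with placeholder file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Salmon_Hydrographs.nc", "Salmon_solution.rvc", "other.txt"):
        (Path(tmp) / name).touch()
    found_names = [path.name for path in find_run_files(tmp, "Salmon")]

print(found_names)
```

In this notebook, the folder to search would be the workspace path passed to the model class.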

## Explore!
With this information in mind, you can now explore running different models with different parameters over different periods, and display the simulated hydrographs. You can change the start and end dates, the area, the latitude, and even add other options that you might find in the documentation or in later tutorials.

If you want to run models other than GR4JCN, you can use these parameter sets:

#### GR4JCN: 
params = (0.529, -3.396, 407.29, 1.072, 16.9, 0.947)

#### HMETS: 
params = (9.5019, 0.2774, 6.3942, 0.6884, 1.2875, 5.4134, 2.3641, 0.0973, 0.0464, 0.1998, 0.0222, -1.0919, 2.6851, 0.3740, 
          1.0000, 0.4739, 0.0114, 0.0243, 0.0069, 310.7211, 916.1947)
       
#### MOHYSE:
params = (1.0, 0.0468, 4.2952, 2.658, 0.4038, 0.0621, 0.0273, 0.0453, 0.9039, 5.6167)

#### HBVEC:
params = (0.059845, 4.07223, 2.00157, 0.034737, 0.09985, 0.506, 3.4385, 38.32455, 0.46066, 0.06304, 2.2778, 4.8737,
          0.5718813, 0.04505643, 0.877607, 18.94145, 2.036937, 0.4452843, 0.6771759, 1.141608, 1.024278)
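
One way to switch between emulators is to keep the parameter sets above in a dictionary keyed by model name. The sketch below does that. The `run_emulator` helper is hypothetical: it assumes that all four emulator classes accept the same workspace path and keyword arguments as the GR4JCN call shown earlier. RavenPy is imported inside the function, so the dictionary can be inspected without it.

```python
# Parameter sets from this notebook, keyed by emulator class name.
PARAM_SETS = {
    "GR4JCN": (0.529, -3.396, 407.29, 1.072, 16.9, 0.947),
    "HMETS": (9.5019, 0.2774, 6.3942, 0.6884, 1.2875, 5.4134, 2.3641,
              0.0973, 0.0464, 0.1998, 0.0222, -1.0919, 2.6851, 0.3740,
              1.0000, 0.4739, 0.0114, 0.0243, 0.0069, 310.7211, 916.1947),
    "MOHYSE": (1.0, 0.0468, 4.2952, 2.658, 0.4038, 0.0621, 0.0273, 0.0453,
               0.9039, 5.6167),
    "HBVEC": (0.059845, 4.07223, 2.00157, 0.034737, 0.09985, 0.506, 3.4385,
              38.32455, 0.46066, 0.06304, 2.2778, 4.8737, 0.5718813,
              0.04505643, 0.877607, 18.94145, 2.036937, 0.4452843,
              0.6771759, 1.141608, 1.024278),
}


def run_emulator(model_name, forcing, workdir, **config):
    """Run the named emulator with its parameter set (hypothetical helper).

    Assumes every emulator is called like GR4JCN earlier in this notebook.
    """
    import ravenpy.models as models  # imported lazily; requires RavenPy

    emulator_class = getattr(models, model_name)
    model = emulator_class(workdir)
    model(forcing, params=PARAM_SETS[model_name], **config)
    return model


# Summarize the available parameter sets (does not require RavenPy).
for name, params in PARAM_SETS.items():
    print(f"{name}: {len(params)} parameters")
```

In the notebook, a call such as `run_emulator("HMETS", forcing, "/notebook_dir/writable-workspace/run_results", start_date=..., end_date=..., area=..., elevation=..., latitude=..., longitude=..., run_name=...)` would then mirror the GR4JCN example, with the same values filled in.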

## Exporting a hydrological model configuration (Raven .rvX files)
PAVICS-Hydro includes a tool to extract Raven configuration files that can be used locally by users. This means that you can build your model on PAVICS-Hydro and run it on your own computer/server. You might also want to use this functionality if you want to explore different setups than what is currently available in PAVICS-Hydro.

We can access the configuration files easily, as they are automatically generated in the workspace we defined earlier. You can find the path here:

In [None]:
# Print the path to the generated Raven configuration archive
print(model.outputs['rv_config'])

You can navigate to that path and find the rv.zip file ready to download and run locally. You can also find the actual data used to do the runs in the exec/model/p00/ folder.
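
Once downloaded, the archive can be unpacked with Python's standard `zipfile` module. This is a minimal sketch: the `extract_rv_archive` helper is hypothetical, and the demonstration builds a small stand-in archive with placeholder file names, since the real contents depend on your model run.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_rv_archive(zip_path, target_dir):
    """Extract a zip archive into target_dir and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


# Demonstration with a stand-in archive; the file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "rv.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("Salmon.rvi", ":RunName Salmon\n")
        archive.writestr("Salmon.rvp", "")
    extracted = extract_rv_archive(demo_zip, Path(tmp) / "local_model")

print(extracted)
```

In this notebook, the path printed from `model.outputs['rv_config']` would take the place of `demo_zip`.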