# Exercise 3: Run HBV model with ERA5 forcing and GRDC observation

In this notebook you will run your own HBV model using ERA5 forcing data (from the previous notebook) and GRDC observation data. You will have to change a few settings below. Read carefully and decide which inputs and lines you need to change.

In [None]:
# Load all dependencies, including your own model through ewatercycle_wrapper_HBV
import ewatercycle.forcing
import ewatercycle.observation.grdc
import ewatercycle.analysis
from pathlib import Path
from cartopy.io import shapereader
import pandas as pd
import numpy as np
import xarray as xr  # needed for xr.merge further down
from rich import print
import matplotlib.pyplot as plt

from ewatercycle_wrapper_HBV import HBV
 


Add the name of your region in the cell below:

In [None]:
# Name of your shapefile/region without extension:
own_region = None  # for example: "Rhine"

if own_region is None:  # if nothing is provided, the Rhine shapefile will be used
    own_region = "Rhine"

In [None]:
# Shapefile that describes the basin we want to study.
path = Path.cwd()
forcing_path = path / "Forcing"
shapeFile = forcing_path / f"{own_region}.shp"

# Location of the forcing results saved in the previous notebook
forcingLocation = forcing_path / f"{own_region}Forcing2000-2002"

# GRDC station ID for the observation station
grdc_station_id = "6335020"  # GRDC station ID
basin_name = own_region

# Period of interest. Make sure that GRDC data is available for this period.
experiment_start_time = "2000-01-01T00:00:00Z"
experiment_end_time = "2002-12-31T00:00:00Z"


The forcing was created in the previous notebook and is loaded here.

In [None]:
ERA5_forcing = ewatercycle.forcing.sources["LumpedMakkinkForcing"].load(forcingLocation)
print(ERA5_forcing)

Now we can prepare the configuration files just as in the first notebook. Note that we are using the same parameters as in the first notebook, which will not be adequate for your new area (but might be pretty OK, let's see!):

In [None]:
s_0 = np.array([0,  100,  0,  5])

p_min_initial = np.array([0,   0.2,  40,    .5,   .001,   1,     .01,  .0001])
p_max_initial = np.array([8,    1,  800,   4,    .3,     10,    .1,   .01])

p_names = ["$I_{max}$",  "$C_e$",  "$Su_{max}$", "β",  "$P_{max}$",  "$T_{lag}$",   "$K_f$",   "$K_s$"]
S_names = ["Interception storage", "Unsaturated Rootzone Storage", "Fastflow storage", "Groundwater storage"]

param_names = ["Imax","Ce",  "Sumax", "beta",  "Pmax",  "Tlag",   "Kf",   "Ks"]

par_0 = (p_min_initial + p_max_initial)/2
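The starting parameter set is simply the midpoint of each parameter range. The sketch below repeats that calculation with plain Python lists so each value can be read next to its name. The helper `midpoint_parameters` is hypothetical and not part of eWaterCycle or the HBV wrapper; the bounds are copied from the cell above.

```python
# Hypothetical helper (not part of eWaterCycle): pair each parameter
# name with the midpoint of its range, mirroring par_0 above.
param_names = ["Imax", "Ce", "Sumax", "beta", "Pmax", "Tlag", "Kf", "Ks"]
p_min = [0, 0.2, 40, 0.5, 0.001, 1, 0.01, 0.0001]
p_max = [8, 1, 800, 4, 0.3, 10, 0.1, 0.01]


def midpoint_parameters(names, lower, upper):
    """Return {name: (lower + upper) / 2} for every parameter."""
    if not (len(names) == len(lower) == len(upper)):
        raise ValueError("names and bounds must have the same length")
    return {n: (lo + hi) / 2 for n, lo, hi in zip(names, lower, upper)}


midpoints = midpoint_parameters(param_names, p_min, p_max)
for name, value in midpoints.items():
    print(f"{name:>6}: {value:g}")
```

Printing the values this way makes it easier to spot a parameter that is implausible for your own catchment before running the model.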

In [None]:
model = HBV(forcing=ERA5_forcing)

In [None]:
config_file, _ = model.setup(
    parameters=','.join([str(p) for p in par_0]),
    initial_storage=','.join([str(s) for s in s_0]),
)

In [None]:
model.initialize(config_file)

Now it is time again to run the HBV model:

In [None]:
Q_m = []
time = []
while model.time < model.end_time:
    model.update()
    discharge_this_timestep = model.get_value("Q")
    Q_m.append(discharge_this_timestep[0])
    time.append(pd.Timestamp(model.time_as_datetime.date()))

In [None]:
df = pd.DataFrame(data=Q_m,columns=["Modeled discharge"], index=time)

We can plot the output discharge directly using the DataFrame.

In [None]:
fig, ax = plt.subplots(1,1)

df.plot(ax=ax,label="Modeled discharge HBV-bmi")
plt.ylabel(f"Discharge ({model.bmi.get_var_units('Q')})")
plt.xlabel("Time")

## Analyse results
We can also use the ```hydrograph``` function from eWaterCycle. This will make a hydrograph that compares model output to observations. For this we need to load observations and make sure that the observations and model output are in the same units. Observations typically are in m$^3$/s. 

Note that the unit of discharge from this model is in mm/d. Conversion to m$^3$/s requires the area of the catchment.

In [None]:
shapeObject = shapereader.Reader(shapeFile.absolute())
record = next(shapeObject.records())
# The factor 1e6 converts the SUB_AREA attribute from km^2 to m^2
shape_area = record.attributes["SUB_AREA"] * 1e6
print("The catchment area in m^2 is:", shape_area)

The hydrograph function requires xarray objects. We take the DataFrame ```df```, convert the discharge to m$^3$/s, and then convert the result to an ```xarray``` object.

In [None]:
df['model output'] = df['Modeled discharge'] * shape_area / (1000 * 86400)
sim_data = df['model output'].to_xarray().rename({'index': 'time'}) 
sim_data.name = 'Simulated data'
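The conversion above divides by 1000 to go from mm to m and by 86400 to go from days to seconds, then scales by the catchment area in m$^2$. As a minimal sketch, the same arithmetic can be wrapped in a small function and checked with round numbers. The function name and the example area are illustrative assumptions, not values from this exercise.

```python
SECONDS_PER_DAY = 86400
MM_PER_M = 1000


def mm_per_day_to_m3_per_s(q_mm_per_day, area_m2):
    """Convert an area-averaged discharge depth (mm/d) to a flow (m^3/s)."""
    return q_mm_per_day * area_m2 / (MM_PER_M * SECONDS_PER_DAY)


# Illustrative numbers only: 1 mm/d over a hypothetical 86 400 km^2 catchment
example_area_m2 = 86_400 * 1e6  # km^2 -> m^2
example_flow = mm_per_day_to_m3_per_s(1.0, example_area_m2)
print(f"{example_flow:.1f} m^3/s")
```

A catchment of 86 400 km$^2$ is chosen because every mm/d of runoff then corresponds to roughly 1000 m$^3$/s, which makes the order of magnitude easy to verify by hand.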

The observation data is loaded using the ```get_grdc_data()``` function built into eWaterCycle, which returns the GRDC data as an xarray object. The observations and the simulated discharge are then merged into one xarray object. Note that the discharge data is re-indexed so that both series share the same timestamps.

In [None]:
observations = ewatercycle.observation.grdc.get_grdc_data(
    station_id=grdc_station_id,
    start_time=experiment_start_time,
    end_time=experiment_end_time,
    column='Observations from GRDC',
)

Have a look at the DataFrame with the model output and the observations from GRDC:

In [None]:
discharge = xr.merge([sim_data, observations["Observations from GRDC"]]).to_dataframe()
hydro_data = discharge[["Observations from GRDC", "Simulated data"]].dropna()
hydro_data

Finally, plot the hydrograph. It is remarkable to see how well a simple model like HBV, without calibration, is already able to predict discharge in the Rhine. Is it also good for your own area?

In [None]:
# Plot hydrograph and show metrics
ewatercycle.analysis.hydrograph(
    hydro_data,
    reference='Observations from GRDC',
    filename='experiment_hydrograph.png',
)
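To put a number on "how well" the model performs, one commonly used score is the Nash-Sutcliffe efficiency (NSE): 1 means a perfect fit, and 0 means the model is no better than always predicting the mean of the observations. The sketch below is a plain-Python illustration under that definition. The function is hypothetical, and the metrics the hydrograph function itself reports may differ depending on the eWaterCycle version.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """Return NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined for constant observations")
    return 1 - residual / variance


# Made-up discharge values, purely for illustration
obs = [1.0, 2.0, 3.0]
perfect_score = nash_sutcliffe_efficiency(obs, [1.0, 2.0, 3.0])
mean_only_score = nash_sutcliffe_efficiency(obs, [2.0, 2.0, 2.0])
print(perfect_score, mean_only_score)
```

With the DataFrame from this notebook, the two columns of ```hydro_data``` could be passed in as lists, for example ```hydro_data["Observations from GRDC"].tolist()``` and ```hydro_data["Simulated data"].tolist()```.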

It is good practice to remove a model object when done using ```.finalize()```. For small models like this, it doesn't matter too much, but larger models that run in containers keep using resources when not ```finalized```.

In [None]:
model.finalize()