![image](https://www.ewatercycle.org/assets/logo.png)

# Case study 2: replace subbasin in PCRGlobWB2.0 with a MARRMoT model
This notebook demonstrates how to use eWaterCycle to combine the output of two very different models in a single experiment. We run PCRGlobWB2.0 for the Rhine basin, but we replace the Moselle sub-basin with a MARRMoT model.

PCRGlobWB2.0 (Edwin 20XX) is a distributed model written in Python and MARRMoT (Knoben 20XX) is a suite of conceptual models written in MATLAB. To make the difference as large as possible, we choose the simplest model available within MARRMoT: m01, a single leaky bucket.

This example use case assumes that the eWaterCycle platform has been installed and configured on your system. If this is not the case, see our [system setup documentation](https://ewatercycle.readthedocs.io/en/latest/system_setup.html) for instructions.

## Import statements
We'll be using the following modules:

In [1]:
import warnings

warnings.filterwarnings("ignore", category=UserWarning)

import logging

logging.getLogger("esmvalcore").setLevel(logging.WARNING)

import ewatercycle.analysis
import ewatercycle.forcing
import ewatercycle.models
import ewatercycle.observation.grdc
import ewatercycle.parameter_sets
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr

import configparser
from shutil import copyfile

from cartopy import crs
from cartopy import feature as cfeature
from cartopy.io import shapereader

## List of files created for this experiment

### for PCRGlobWB2.0

- `rhine_05min_era5.ini` This file is identical to the file with the same name used in the big comparison study, except that the time period (variables `startTime` and `endTime`) has been set to 2002-01-01 and 2002-12-31, respectively. ***PK: you can probably just use the default now, possibly modifying the start and end date in model.setup()***
- `rhine_05min_era5_without_moselle.ini` This file is identical to `rhine_05min_era5.ini`, except that the `landmask` variable points to `no_moselle_landmask_05min_rhine.map` in the input directory.
- `no_moselle_landmask_05min_rhine.map` This file is based on `rhine_05min.map`. Using a shapefile of the Moselle catchment (see MARRMoT below), all pixels that are part of the Moselle catchment have been removed (set to zero) in this landmask. ***PK: this file is now available on the jupyter machine in the default PCRGlob parameter set dir, but this is not very FAIR...***

### for MARRMoT

- `marrmot_ERA5_Moselle_2001_2016.mat` This file contains the forcing that MARRMoT needs to run. It was created by running the ESMValTool recipe (TODO Jerom to provide details). ***PK: pre-generated forcing is now available, see the updated forcing load cell***

## Settings and parameters to run this experiment
The settings below are separated into those that belong to the experiment, those that belong to PCRGlobWB, and those that belong to MARRMoT.

In [24]:
# Settings for GRDC station for final comparison of streamflow
station_id = "6335020"  # GRDC station ID
basin_name = "Rhine"

# Location of the mouth of the Moselle.
lat_moselle_mouth = [50.366852]
lon_moselle_mouth = [7.609666]

experimentLandMaskLocation = "./settingFiles"
experimentLandMaskName = "no_moselle_landmask_05min_rhine.map"
####################### PK: I think you can also use the defaults instead; also see comments below
# Custom setting files for PCRGlobWB2.0  
#pcrglob_ref_setting_file = "/settingFiles/rhine_05min_era5.ini"
#pcrglob_exp_setting_file = "/settingFiles/rhine_05min_era5_without_moselle.ini"

####################### PK: for better reproducibility, I think you should add the custom landmask map here, and ship it with the repo. 
# It is now located in: /mnt/data/parameter-sets/pcrglobwb_global/global_05min/cloneMaps/no_moselle_landmask_05min_rhine.map

The PCRGlobWB pixel closest to the location of the Moselle mouth will be used to 'dump' the output of MARRMoT into the `channel_storage` of PCRGlobWB. The coordinates were taken from Google Maps.
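
To illustrate what "closest pixel" means, here is a minimal sketch of a nearest-cell lookup on a regular latitude/longitude grid. It is not the eWaterCycle implementation; the helper `nearest_index` and the small 5 arcmin grid are hypothetical and only serve to show the idea.

```python
def nearest_index(centres, value):
    """Return the index of the cell centre closest to value."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - value))


# Hypothetical 5 arcmin grid of cell centres, for illustration only
res = 5 / 60
grid_lats = [51.0 - res * i for i in range(24)]  # north to south
grid_lons = [7.0 + res * i for i in range(24)]   # west to east

# Coordinates of the Moselle mouth as used in this notebook
row = nearest_index(grid_lats, 50.366852)
col = nearest_index(grid_lons, 7.609666)

print(f"row={row}, col={col}, "
      f"centre=({grid_lats[row]:.4f}, {grid_lons[col]:.4f})")
```

On a real model grid the cell centres would come from the model itself rather than from a hand-built list.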

In [5]:
spinup_start_date = "2001-01-01T00:00:00Z"
spinup_end_date = "2001-12-31T00:00:00Z"
experiment_start_date = "2002-01-01T00:00:00Z"
experiment_end_date = "2002-12-31T00:00:00Z"

# variable of interest to get out of the model
marrmot_output_variable = "flux_out_Q"

# flux_out_Q unit conversion factor from mm/day to m3/s
conversion_mmday2m3s = 1 / (1000 * 86400)

# parameters, in this case max soil moisture storage (in mm), ranging between 100 and 2000
# https://github.com/wknoben/MARRMoT/blob/dev-docker-BMI/MARRMoT/Models/Parameter%20range%20files
maximum_soil_moisture_storage = 1500.0
initial_soil_moisture_storage = 0.9 * maximum_soil_moisture_storage
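
Note that the factor `conversion_mmday2m3s` on its own turns a depth rate in mm/day into m/s; to obtain a discharge in m3/s it still has to be multiplied by the catchment area in m2. A minimal sketch of the full conversion, using a hypothetical area of 1000 km2 (not the Moselle area) and a helper name that is not part of eWaterCycle:

```python
SECONDS_PER_DAY = 86400
MM_PER_M = 1000


def mm_per_day_to_m3_per_s(flux_mm_per_day, area_m2):
    """Convert an area-averaged flux in mm/day to a discharge in m3/s."""
    return flux_mm_per_day * area_m2 / (MM_PER_M * SECONDS_PER_DAY)


# Hypothetical catchment area, for illustration only
example_area_m2 = 1000 * 1e6  # 1000 km2 expressed in m2

discharge = mm_per_day_to_m3_per_s(1.0, example_area_m2)
print(f"{discharge:.3f} m3/s")
```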

## Setting up the model objects
For MARRMoT we first need to generate a config file. After that is done, the three BMI model objects are created.

In [None]:
# Pre-generated forcing is available for this experiment:
marrmot_forcing = ewatercycle.forcing.load("/mnt/data/forcing/marrmot-m01_ERA5_2001-2016_moselle")
print(marrmot_forcing)

In [None]:
# A shapefile is included with the forcing. Use it to find the Moselle area
shape = shapereader.Reader(marrmot_forcing.shape)
record = next(shape.records())
moselle_area = record.attributes["SUB_AREA"] * 1e6
print("The catchment area is:", moselle_area)

In [None]:
marrmot_model = ewatercycle.models.MarrmotM01(
    version="2020.11", forcing=marrmot_forcing
)

# Create config file and write to work directory (cfg_dir)
# Start up a container for MARRMoT
marrmot_cfg_file, marrmot_cfg_dir = marrmot_model.setup(
    # No need to specify start and end date, using dates from forcing_output
    maximum_soil_moisture_storage=maximum_soil_moisture_storage,
    initial_soil_moisture_storage=initial_soil_moisture_storage,
)
marrmot_cfg_file, marrmot_cfg_dir

In [None]:
# Initialize using the created config file
marrmot_model.initialize(marrmot_cfg_file)

In [None]:
# MARRMoT needs to spin up, so we run it until the end of the spin-up period defined above
while marrmot_model.time_as_isostr < spinup_end_date:
    marrmot_model.update()
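
The spin-up loop compares timestamps as plain strings. That only gives the chronological order when both sides use the same fixed-width ISO 8601 format, which is assumed here for `time_as_isostr`. A small check of that assumption with made-up timestamps:

```python
from datetime import datetime

fmt = "%Y-%m-%dT%H:%M:%SZ"
spinup_end = "2001-12-31T00:00:00Z"

# Made-up model times around the end of the spin-up period
stamps = [
    "2001-12-30T00:00:00Z",
    "2001-12-31T00:00:00Z",
    "2002-01-01T00:00:00Z",
]

# Comparison as strings, as done in the loop above
by_string = [s < spinup_end for s in stamps]

# Comparison as parsed datetimes, for reference
end = datetime.strptime(spinup_end, fmt)
by_datetime = [datetime.strptime(s, fmt) < end for s in stamps]

print(by_string == by_datetime)
```

If the two formats ever differ (for example a missing `Z` or a different separator), parsing both sides to `datetime` before comparing is the safer choice.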

In [4]:
pcrglob_parameter_set = ewatercycle.parameter_sets.get_parameter_set('pcrglobwb_rhine_05min')

In [6]:
# sometimes it times out: just retry
pcrglob_ref_model = ewatercycle.models.PCRGlobWB(
    version="setters", parameter_set=pcrglob_parameter_set
)
# Start up a container for PCRGlob for the reference run
pcrglob_ref_cfg_file, pcrglob_Ref_cfg_dir = pcrglob_ref_model.setup(
    start_date = experiment_start_date, 
    end_date = experiment_end_date)
pcrglob_ref_cfg_file, pcrglob_Ref_cfg_dir

Running /mnt/data/singularity-images/ewatercycle-pcrg-grpc4bmi_setters.sif singularity container on port 59221


('/mnt/home/user42/technicalPaperExampleNotebooks/ewatercycle_output/pcrglobwb_20210827_145840/pcrglobwb_ewatercycle.ini',
 '/mnt/home/user42/technicalPaperExampleNotebooks/ewatercycle_output/pcrglobwb_20210827_145840')

In [7]:
############################ PK: I think you can use the default cfg file here as well, as long as you pass start_date and end_date in model.setup() in the cell above

# Initialize the reference model using the config file generated by setup()
pcrglob_ref_model.initialize(pcrglob_ref_cfg_file)

In [8]:
# Print available output variable names of PCRGlob model
pd.Series(sorted(pcrglob_ref_model.output_var_names))

0                   accumulated_land_surface_baseflow
1                               bare_soil_evaporation
2                                            baseflow
3                                     channel_storage
4     consumptive_water_use_for_non_irrigation_demand
                           ...                       
92                                 upper_soil_storage
93                           upper_soil_transpiration
94                      water_body_actual_evaporation
95                    water_body_evaporation_fraction
96                   water_body_potential_evaporation
Length: 97, dtype: object

In [20]:
# Start up a container for PCRGlob for the experiment run, using the same parameter set object
pcrglob_exp_model = ewatercycle.models.PCRGlobWB(
    version="setters", parameter_set=pcrglob_parameter_set
)
# Start up a container for PCRGlob for the experiment run
pcrglob_exp_cfg_file, pcrglob_exp_cfg_dir = pcrglob_exp_model.setup(
    start_date = experiment_start_date, 
    end_date = experiment_end_date)

pcrglob_exp_cfg_file, pcrglob_exp_cfg_dir

Running /mnt/data/singularity-images/ewatercycle-pcrg-grpc4bmi_setters.sif singularity container on port 60443


('/mnt/home/user42/technicalPaperExampleNotebooks/ewatercycle_output/pcrglobwb_20210827_150723/pcrglobwb_ewatercycle.ini',
 '/mnt/home/user42/technicalPaperExampleNotebooks/ewatercycle_output/pcrglobwb_20210827_150723')

### changing the configuration of the model for the experiment
For this experiment we want PCRGlobWB to use a different `landmask`, i.e. a map that the model uses to identify which part of the Earth within the `clonemap` needs to be modelled. In the changed landmask we have removed the Moselle sub-basin from the Rhine basin. Since PCRGlobWB expects the landmask file to be in the `inputdir` location, we have to copy the changed landmask there. Note that this only works when users have write access to the `inputdir` location, which is not necessarily the case on every infrastructure.

In [21]:
config = configparser.ConfigParser()
config.read(pcrglob_exp_cfg_file)
print(config.sections())

['globalOptions', 'meteoOptions', 'meteoDownscalingOptions', 'landSurfaceOptions', 'forestOptions', 'grasslandOptions', 'irrPaddyOptions', 'irrNonPaddyOptions', 'groundwaterOptions', 'routingOptions', 'reportingOptions']


In [22]:
for key in config['globalOptions']:
    print(key)

outputdir
inputdir
clonemap
landmask
institution
title
description
starttime
endtime
maxspinupsinyears
minconvforsoilsto
minconvforgwatsto
minconvforchansto
minconvfortotlsto


In [23]:
print(config['globalOptions']['inputdir'])
print(config['globalOptions']['landmask'])

/mnt/data/parameter-sets/pcrglobwb_global
global_05min/cloneMaps/rhine_05min.map


In [25]:
copyfile(experimentLandMaskLocation + "/" + experimentLandMaskName, config['globalOptions']['inputdir'] + "/" + experimentLandMaskName)

PermissionError: [Errno 13] Permission denied: '/mnt/data/parameter-sets/pcrglobwb_global/no_moselle_landmask_05min_rhine.map'

In [26]:
# Point the landmask (not the clonemap) at the map without the Moselle
config['globalOptions']['landmask'] = experimentLandMaskName
with open(pcrglob_exp_cfg_file,"w") as f:
    config.write(f)

In [27]:
#################### PK: alternatively to using a custom cfg file, you can fiddle with the settings here. 
# The land mask map is already present in the parameter set on our Jupyter machine, but that's not really FAIR. 
# Alternatively, you can ship it with the repo. In that case, also make sure to move the custom landuse map to the cfg directory so the container can see it

# Initialize using the custom setting file
pcrglob_exp_model.initialize(pcrglob_exp_cfg_file)

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Exception calling application: 'inputDir'"
	debug_error_string = "{"created":"@1630076939.767142818","description":"Error received from peer ipv6:[::1]:60443","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Exception calling application: 'inputDir'","grpc_status":2}"
>
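
A likely cause of the `'inputDir'` error (not verified against the PCRGlobWB source) is that `configparser` lower-cases option names by default, so the rewritten file contains `inputdir` where the model looks up `inputDir`. The lower-case keys printed above are consistent with this. A minimal sketch of the behaviour and of the standard way to preserve case, using a made-up two-line config:

```python
import configparser
import io

# Made-up config fragment, for illustration only
ini_text = """
[globalOptions]
inputDir = /some/input/dir
landmask = None
"""

# Default parser: option names are converted to lower case
default_parser = configparser.ConfigParser()
default_parser.read_string(ini_text)
default_keys = list(default_parser["globalOptions"])
print(default_keys)

# Case-preserving parser: set optionxform before reading
case_parser = configparser.ConfigParser()
case_parser.optionxform = str
case_parser.read_string(ini_text)
case_parser["globalOptions"]["landmask"] = "no_moselle_landmask_05min_rhine.map"

# Write to an in-memory buffer instead of a file
buffer = io.StringIO()
case_parser.write(buffer)
written = buffer.getvalue()
print(written)
```

Setting `optionxform = str` before `config.read(...)` in the cell above would keep the original key names when the file is written back.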

In [29]:
pcrglob_exp_model.finalize()

In [28]:
pcrglob_ref_model.finalize()

### Import GRDC observations

Read the GRDC data for the period of the experiment, for the station given above.

In [None]:
observations_df, metadata = ewatercycle.observation.grdc.get_grdc_data(
    station_id,
    start_time=experiment_start_date,
    end_time=experiment_end_date,
)
grdc_obs = observations_df.rename(columns={"streamflow": "Observations from GRDC"})
grdc_lon = metadata["grdc_longitude_in_arc_degree"]
grdc_lat = metadata["grdc_latitude_in_arc_degree"]

In the PCRGlobWB grid, the GRDC station location is offset from the river by one pixel, so we shift the latitude 0.05° south to sample the river pixel.

In [None]:
gridded_grdc_lat = grdc_lat - 0.05
gridded_grdc_lon = grdc_lon

## Running the experiment
With all pieces in place, we can now run the actual experiment. In each timestep, the reference model (PCRGlobWB 2.0 with the Moselle) is run first and its output is stored. Next, MARRMoT is run, its output is stored, and that output is added to the experiment model (PCRGlobWB 2.0 without the Moselle). Finally, the experiment model is run for one timestep, its output is stored, and we move on to the next timestep.

In [None]:
# Create variables needed during the experiment run
time_range = []
pcrglob_ref_output = []
pcrglob_exp_output = []
marrmot_output = []

In [None]:
print("Running experiment", flush=True)

# the experiment model is used as central 'time keeper'
while pcrglob_exp_model.time < pcrglob_exp_model.end_time:
    print(f"Current time: {pcrglob_exp_model.time_as_isostr}", end="\r")
    time_range.append(pcrglob_exp_model.time_as_datetime.date())

    # run the reference model for one timestep, store the output at grdc station
    pcrglob_ref_model.update()
    pcrglob_ref_discharge = pcrglob_ref_model.get_value_at_coords(
        "discharge", lat=[gridded_grdc_lat], lon=[gridded_grdc_lon]
    )[0]
    pcrglob_ref_output.append(pcrglob_ref_discharge)

    # run MARRMoT and store the output, note that MARRMoT output is in mm!
    marrmot_model.update()
    marrmot_output.append(marrmot_model.get_value(marrmot_output_variable)[0])

    # add the output of MARRMoT to PCRGlob Experiment model. Note that channel storage
    # is in m3, while MARRMoT output is in mm per (daily) timestep, so we need to convert
    water_to_add_to_pcrglob = marrmot_output[-1] * moselle_area / 1000
    # lat_moselle_mouth and lon_moselle_mouth are already lists, so pass them as-is
    current_value_in_pcrglob = pcrglob_exp_model.get_value_at_coords(
        "channel_storage", lat=lat_moselle_mouth, lon=lon_moselle_mouth
    )
    value_to_set_in_pcrglob = water_to_add_to_pcrglob + current_value_in_pcrglob
    pcrglob_exp_model.set_value_at_coords(
        "channel_storage",
        lat=lat_moselle_mouth,
        lon=lon_moselle_mouth,
        values=value_to_set_in_pcrglob,
    )

    # Run the experiment PCRGlobWB model one timestep, store the output at grdc station
    pcrglob_exp_model.update()
    pcrglob_exp_discharge = pcrglob_exp_model.get_value_at_coords(
        "discharge", lat=[gridded_grdc_lat], lon=[gridded_grdc_lon]
    )[0]
    pcrglob_exp_output.append(pcrglob_exp_discharge)

print("")

In [None]:
# Capture all discharge of last timestep of experiment
data_exp_last = pcrglob_exp_model.get_value_as_xarray("discharge")

### clean up after the model run
The models have to be 'finalized', which deletes any temporary files and the containers have to be shut down.

In [None]:
pcrglob_ref_model.finalize()

In [None]:
pcrglob_exp_model.finalize()

In [None]:
marrmot_model.finalize()

Combine values of each time step into single dataset

In [None]:
data_exp_at_grdc_location = pd.DataFrame(
    {"PCRGlobWB Moselle replaced by MARRMoT-m01": pcrglob_exp_output},
    index=pd.to_datetime(time_range),
)
data_ref_at_grdc_location = pd.DataFrame(
    {"PCRGlobWB normal": pcrglob_ref_output}, index=pd.to_datetime(time_range)
)

In [None]:
# Store results on disk
data_exp_at_grdc_location.to_csv(pcrglob_exp_cfg_dir + '/data_exp.csv')
data_ref_at_grdc_location.to_csv(pcrglob_Ref_cfg_dir + '/data_ref.csv')
data_exp_last.to_netcdf(pcrglob_exp_cfg_dir + '/data_exp_last.nc')

In [None]:
# Retrieve results from disk when the models were run in a separate session; adjust paths to the previous run
# data_exp_at_grdc_location = pd.read_csv('/mnt/home/user36/temp/Case1/pcrglobwb_20210426_100036/data_exp.csv', index_col=0, parse_dates=True)
# data_ref_at_grdc_location = pd.read_csv('/mnt/home/user36/temp/Case1/pcrglobwb_20210426_095636/data_ref.csv', index_col=0, parse_dates=True)
# data_exp_last = xr.open_dataset('/mnt/home/user36/temp/Case1/pcrglobwb_20210426_100036//data_exp_last.nc')

## Plot the results

First we draw a map of the discharge at the final timestep of the model run. We add a blue dot at the location of the GRDC observation gauge and we add the shape of the Moselle sub-basin as well.

In [None]:
# Use matplotlib to make the figure slightly nicer
fig = plt.figure(dpi=120)
ax = fig.add_subplot(111, projection=crs.PlateCarree())

# Add the Moselle catchment to the map
ax.add_feature(
    cfeature.ShapelyFeature(
        shape.geometries(),
        crs.PlateCarree(),
        edgecolor="green",
        facecolor="none",
    )
)

# Plotting the model field is a one-liner
data_exp_last.plot(ax=ax, cmap="GnBu", robust=True)

# Also plot the station location
ax.scatter(grdc_lon, grdc_lat, s=25, c="b")
# TODO plot the grid cell of model for which hydrograph is made
# ax.scatter(model_longitude, model_latitude, s=25, c='r')

# Overlay ocean and coastlines
ax.add_feature(cfeature.OCEAN, zorder=2)
ax.coastlines(zorder=3)
fig.savefig("pcrglobwb_RolfTestRhine_discharge_map", bbox_inches="tight", dpi=300)

In [None]:
# Combine timeseries of the experiment run, the reference run and the GRDC observations in a pandas dataframe
df = data_exp_at_grdc_location.join(data_ref_at_grdc_location).join(grdc_obs)

In [None]:
ewatercycle.analysis.hydrograph(
    discharge=df,
    reference="Observations from GRDC",
    filename="case2CouplingModels",
)