# Hillslope Scale Calibration Tutorial

This notebook outlines how to perform model calibration for a selected subset of REWs. It is assumed that the selected subset is a unique sub-watershed of the full model watershed. It is furthermore assumed that channel effects are not important at the scale of the sub-watershed, so that the model can be compared to streamflow data simply by up-scaling the hillslopes output. 

Two files are required for hillslope scale calibration:

1. A shapefile corresponding to the sub-basin to be calibrated must be stored in `raw_data/watershed_poly`. 
2. Streamflow data (in units of cm/day) stored in the `calibration_data` folder. This must be gapless, daily streamflow data spanning at least the time period from `spinup_date` to `stop_date`. 

Full model calibration including channel transport is overviewed in the [Network Scale Calibration Tutorial](./network_scale_calibration.ipynb). 

In [1]:

import os
import sys
from os.path import dirname
parent_dir = dirname(dirname(os.getcwd()))
sys.path.append(os.path.join(parent_dir,'StreamflowTempModel','2_hillslope_discharge'))
sys.path.append(os.path.join(parent_dir,'StreamflowTempModel','3_channel_routing'))

from vadoseZone import *
import glob
from groundwaterZone import *
from REW import REW
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sns
import pickle
from datetime import date
import pandas as pd
import numpy as np
import geopandas as gp
import mpld3
import time
import sys
import shapely
import fiona
from pyDOE import *
import folium
from ast import literal_eval as make_tuple

# Load config files, forcing file, and paramters for each group
parent_dir = os.path.dirname(os.path.dirname(os.getcwd()))

sys.path.append(os.path.join(parent_dir, 'StreamflowTempModel', '1_data_preparation'))
from prep import rew_params
rew_params()

# These dictionaries contain the all the data we'll need to instantiate 
rew_config = pickle.load( open( os.path.join(parent_dir,'model_data','rew_config.p'), "rb" ) )
climate_group_forcing = pickle.load( open( os.path.join(parent_dir,'model_data','climate_group_forcing.p'), "rb" ) )
parameter_group_params = pickle.load( open( os.path.join(parent_dir,'model_data','parameter_group_params.p'), "rb" ))
model_config = pickle.load( open( os.path.join(parent_dir, 'model_data', 'model_config.p'), 'rb'))
parameter_ranges = pickle.load( open( os.path.join(parent_dir, 'model_data', 'parameter_ranges.p'), 'rb'))

start_date = model_config['start_date']
stop_date = model_config['stop_date']
spinup_date = model_config['spinup_date']
Tmax = model_config['Tmax']
dt = model_config['dt_hillslope']
t = model_config['t_hillslope']
resample_freq_hillslope = model_config['resample_freq_hillslope']
timestamps_hillslope = model_config['timestamps_hillslope']



## Get REWs located within calibration sub-watershed 

Here, we use the representative REW points to determine which REWs are located within the sub-watershed that we are calibrating. We want to make sure to run the model only for the REWs that are relevant for calibration. 

In [2]:
subwatershed_name = 'elder'
shapefile_path = os.path.join(parent_dir, 'raw_data','watershed_poly', subwatershed_name + '.shp')
points = pd.read_csv(os.path.join(parent_dir, 'raw_data','basins_centroids', 'points.csv')).set_index('cat')

# get coordinate tuples corresponding to each REW
for index, row in points.iterrows():
    new_tuple = make_tuple(points['coords'].loc[index])
    points['coords'].loc[index] = new_tuple

# check to see which REWs fall within sub-watershed
ids_in_subwatershed = []
with fiona.open(shapefile_path) as fiona_collection:
    for shapefile_record in fiona_collection:
        # Use Shapely to create the polygon
        shape = shapely.geometry.Polygon( shapefile_record['geometry']['coordinates'][0] )

        for index, row in points.iterrows(): 
            point = shapely.geometry.Point(row[0][0], row[0][1]) # longitude, latitude
            # Alternative: if point.within(shape)
            if shape.contains(point):
                ids_in_subwatershed.append(index)

ids_in_subwatershed = list(set(ids_in_subwatershed))

# if no REWs found inside sub-watershed, assume the sub-watershed is contained within a single REW. 
# find the id of that REW. First get centroid of sub-watershed
if len(ids_in_subwatershed)==0:
    subwatershed_shape = gp.GeoDataFrame.from_file(shapefile_path)
    subwatershed_point = subwatershed_shape['geometry'].apply(lambda x: x.representative_point().coords[:])
    basins = glob.glob(os.path.join(parent_dir,'raw_data','basins_poly','*.shp'))[0]
    with fiona.open(basins) as fiona_collection:
        for shapefile_record in fiona_collection:
            # Use Shapely to create the polygon
            shape = shapely.geometry.Polygon( shapefile_record['geometry']['coordinates'][0] )
            if shape.contains(point):
                ids_in_subwatershed.append(shapefile_record['properties']['cat'])
    
groups_to_calibrate = []
for rew_id in ids_in_subwatershed:
    groups_to_calibrate.append(rew_config[rew_id]['group'])
    
    
print('REWs %s are located within the calibration sub-watershed' % str(ids_in_subwatershed))
print('The groups %s will be run for calibration purposes' % str(groups_to_calibrate))



REWs [5] are located within the calibration sub-watershed
The groups [(0, 4)] will be run for calibration purposes


## Monte Carlo procedure

The hillslope calibrator will perform a simple Monte Carlo calibration on any parameters found in the `parameter_ranges` dictionary loaded above. For each iteration, parameters will be chosen using [latin hypercube sampling](https://en.wikipedia.org/wiki/Latin_hypercube_sampling) where the sampling endpoints of each parameter are determined from the values in `parameter_ranges`. For this particular calibration, the log Nash Sutcliffe will be computed: 

$$\log NSE = 1 - \frac{\sum_t \left|\log O_t - \log M_t \right|}{ \sum_t \left| \log O_t - \log \overline{O} \right| } $$

where $O_t$ denotes the observation at time $t$, $M_t$ denotes the corresponding modeled value, and $\overline{O}$ denotes the overall observation mean.


First, we specify the number (`N`) of calibration runs to perform. Each parameter realization is obtained using  to ensure adequate exploration of parameter space. 


In [3]:
def objective_function(modeled, observed):
    inds = ((modeled != 0) & (observed != 0))
    return 1 - np.sum(np.abs(np.log(observed.loc[inds]) - np.log(modeled.loc[inds])))/np.sum(np.abs(np.log(observed.loc[inds]) - np.log(np.mean(observed.loc[inds]))))

In [4]:
# specify the number of parameter sets to generate
N = 10

num_params = 0
for parameter_group in parameter_ranges.values():
    num_params += len(parameter_group)
lhd = lhs(num_params,samples=N)

# for each parameter realization
solved_subwatersheds = []
for i in range(len(lhd)):
    solved_groups = {}
    
    # alter parameter set dictionary for each parameter set realization
    # the lhs function produces parameter samples between 0 and 1, which 
    # must be re-scaled to the specified range for each parameter
    for j, parameter_group in enumerate(parameter_ranges.keys()):
        for k, parameter in enumerate(parameter_ranges[parameter_group].keys()):
            new_value = lhd[i,j*k]*(parameter_ranges[parameter_group][parameter][1] - parameter_ranges[parameter_group][parameter][0]) + parameter_ranges[parameter_group][parameter][0]
            parameter_group_params[parameter_group][parameter] = new_value

            
    solved_group_hillslopes_dict = {}
    for group_id in groups_to_calibrate:

        parameter_group_id = group_id[0]
        climate_group_id = group_id[1]

        vz = parameter_group_params[parameter_group_id]['vz'](**parameter_group_params[parameter_group_id])
        gz = parameter_group_params[parameter_group_id]['gz'](**parameter_group_params[parameter_group_id])    

        rew = REW(vz, gz,  **{'pet':climate_group_forcing[climate_group_id].pet, 'ppt':climate_group_forcing[climate_group_id].ppt, 'aspect':90})

        storage    = np.zeros(np.size(t))
        groundwater     = np.zeros(np.size(t))
        discharge       = np.zeros(np.size(t))
        leakage         = np.zeros(np.size(t))
        ET              = np.zeros(np.size(t))

        # Resample pet and ppt to integration timestep
        ppt = np.array(rew.ppt[start_date:stop_date].resample(resample_freq_hillslope).ffill())
        pet = np.array(rew.pet[start_date:stop_date].resample(resample_freq_hillslope).ffill())

        # Solve group hillslope
        for l in range(len(t)):
            rew.vz.update(dt,**{'ppt':ppt[l],'pet':pet[l]})
            storage[l] = rew.vz.storage
            leakage[l]      = rew.vz.leakage
            ET[l]           = rew.vz.ET   
            rew.gz.update(dt,**{'leakage':leakage[l]})
            groundwater[l] = rew.gz.storage
            discharge[l] = rew.gz.discharge

        # resample as daily data
        solved_groups[group_id] = pd.DataFrame({'discharge':discharge}, index=timestamps_hillslope).resample('D').mean()
        
    total_area = 0
    for rew_id in ids_in_subwatershed:
        total_area += rew_config[rew_id]['area_sqkm']
    
    name = str(i) + 'discharge'
    solved_subwatershed = pd.DataFrame({name:np.zeros(len(timestamps_hillslope))}, index=timestamps_hillslope).resample('D').mean()
 
    solved_subwatershed_array = np.zeros(int(len(solved_subwatershed)))
    for rew_id in ids_in_subwatershed:
        solved_subwatershed_array += rew_config[rew_id]['area_sqkm']/total_area*solved_groups[rew_config[rew_id]['group']]['discharge']
    
    solved_subwatershed[name] = solved_subwatershed_array
    solved_subwatersheds.append(solved_subwatershed)
    
solved_subwatersheds = pd.concat(solved_subwatersheds,axis=1)

## Model goodness of fit

Here, each model run is compared to calibration data using the objective function as defined above. The user must specify the pickled dataframe with calibration runoff data in units of cm/day. Calibration data must be available at least from `spinup_date` to `stop_date`. 

In [5]:
calibration_data_filename = 'elder_runoff.p'

calibration_data = pickle.load( open(os.path.join(parent_dir,'calibration_data',calibration_data_filename)))
calibration_data = calibration_data[spinup_date:stop_date]
col_name = calibration_data.columns[0]
calibration_data.columns = ['calibration_data']
df = pd.concat([calibration_data, solved_subwatersheds],1)

nses = []
for i in range(len(lhd)):
    name = str(i) + 'discharge'
    if int(np.sum(df[name][spinup_date:stop_date])) == 0:
        nses.append(-1)
    else:
        nses.append(objective_function( df['calibration_data'][spinup_date:stop_date], df[name][spinup_date:stop_date]))

In [6]:
best_column = str(np.argmax(nses)) + 'discharge'
i = int(best_column.replace('discharge',''))
fig = plt.figure(figsize=(8,4))
plt.plot(df[['calibration_data',best_column]][spinup_date:stop_date])
plt.legend(['Calibration data', 'Best model run (NSE = %0.2f)' % np.max(nses)])
plt.xlabel('Date')
plt.ylabel('Runoff [cm/day]')
plt.title( subwatershed_name + ' subwatershed calibration results')
html = mpld3.fig_to_html(fig)

In [7]:
print 'The best fit parameter set has an NSE of %0.2f' % (np.max(nses))

for j, parameter_group in enumerate(parameter_ranges.keys()):
    for k, parameter in enumerate(parameter_ranges[parameter_group].keys()):
        new_value = lhd[i,j*k]*(parameter_ranges[parameter_group][parameter][1] - parameter_ranges[parameter_group][parameter][0]) + parameter_ranges[parameter_group][parameter][0]
        print 'The best fit value for parameter %s in parameter group %s is %f' % (parameter, parameter_group, new_value)




The best fit parameter set has an NSE of 0.80
The best fit value for parameter a in parameter group 0 is 0.000333
The best fit value for parameter nR in parameter group 0 is 0.033277
The best fit value for parameter zrS in parameter group 0 is 40.690558
The best fit value for parameter zrR in parameter group 0 is 1146.579937


In [8]:
watershed_name = 'sf_miranda'
subwatershed_name = 'elder'

shapefile_path = os.path.join(parent_dir, 'raw_data','watershed_poly', subwatershed_name + '.shp')
basins_shape = gp.GeoDataFrame.from_file(shapefile_path)
basins_shape['coords'] = basins_shape['geometry'].apply(lambda x: x.representative_point().coords[:])
basins_shape['coords'] = [coords[0] for coords in basins_shape['coords']]
basins = basins_shape.to_crs(epsg='4326').to_json()

mapa = folium.Map([basins_shape['coords'][0][1], basins_shape['coords'][0][0]],
                  zoom_start=11,
                  tiles='Stamen Terrain')

folium.GeoJson(
    basins,
    style_function=lambda feature: {
        'color' : '#FF0000'
        }
    ).add_to(mapa)

iframe = folium.element.IFrame(html=html, width=650, height=400)
popup = folium.Popup(iframe, max_width=2650)
folium.Marker([basins_shape['coords'][0][1], basins_shape['coords'][0][0]], popup=popup, icon=folium.Icon(color='red',icon='info-sign')).add_to(mapa)

streams_path = glob.glob(os.path.join(parent_dir,'raw_data','streams_poly','*.shp'))[0]
streams_shape = gp.GeoDataFrame.from_file(streams_path).to_crs(epsg='4326')
streams = gp.GeoDataFrame(streams_shape['geometry'], crs=streams_shape.crs)
streams['RGBA'] = '#0000ff'
streams = streams.to_crs(epsg='4326').to_json()
colors = []
folium.GeoJson(
    streams,
    style_function=lambda feature: {
        'color' : feature['properties']['RGBA'],
        'weight' : 4, 
        'opacity': 1
        }
    ).add_to(mapa)

#Add watershed
shapefile_path = os.path.join(parent_dir, 'raw_data','watershed_poly', watershed_name + '.shp')
basins_shape = gp.GeoDataFrame.from_file(shapefile_path)
basins_shape['coords'] = basins_shape['geometry'].apply(lambda x: x.representative_point().coords[:])
basins_shape['coords'] = [coords[0] for coords in basins_shape['coords']]
basins = basins_shape.to_crs(epsg='4326').to_json()

folium.GeoJson(
    basins,
    style_function=lambda feature: {
        'color' : '#00ff00',
        'fillOpacity': .05
        }
    ).add_to(mapa)


calibration_output_name = subwatershed_name + '_calibration.html'
mapa.save(os.path.join(parent_dir, 'calibration_output', calibration_output_name))
mapa.save(os.path.join(calibration_output_name))

In [12]:
%%HTML
<iframe width="100%" height="600" src="elder_calibration.html"></iframe>
