# Module 08: Model Calibration – Assignment

In this assignment, you will extend the work we have done on model calibration to explore the following questions:

1. When we calibrate the model on root mean square error (RMSE), how sensitive are our optimized parameters to our initial guess?
2. How different are the results when we calibrate the model on mean absolute error (MAE)?

To address these questions, you will calibrate the model for four sets of initial guesses that correspond effectively to the four "corners" of our feasible space of parameters `DD` and `Tt`. You will plot the resulting calibrated values of these parameters to investigate the degree to which your calibrated parameters depend on where on the RMSE objective function surface you started. Then you'll repeat the analysis, but instead of calibrating to RMSE, you'll write code to calibrate the Snow-17 model to MAE. 
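The multi-start idea can be sketched on a toy problem before applying it to Snow-17. The example below is an illustration only and is not part of the assignment: it minimizes Himmelblau's function, a standard test function with four minima, from four starting points using `scipy.optimize.minimize`. The names `himmelblau`, `starts`, and `solutions`, and the starting values, are hypothetical.

```python
import numpy as np
from scipy import optimize

def himmelblau(params):
    """Toy objective with four minima (a stand-in for an error surface)."""
    x, y = params
    return (x**2 + y - 11.0)**2 + (x + y**2 - 7.0)**2

# Four starting points, one per quadrant (hypothetical values)
starts = np.array([[4.0, 4.0], [-4.0, 4.0], [-4.0, -4.0], [4.0, -4.0]])

# One row per start: the parameter pair the optimizer converged to
solutions = np.zeros_like(starts)

for k, x0 in enumerate(starts):
    # Gradients are estimated by finite differences because none is supplied
    result = optimize.minimize(himmelblau, x0, method='CG')
    solutions[k, :] = result.x
    print("start:", x0, "-> optimum:", np.round(result.x, 3))
```

On a surface with several minima, starts in different regions may end at different optima. Comparing the rows of `solutions` is the same kind of check you will make for `DD` and `Tt`.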

## 1. Notebook Setup

Below we load the libraries we will need and initialize important variables.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy import optimize
import numbers

# The name of the file that contains forcing data and observed SWE for every day of water years 2001-2020
forcing_fname = 'EastRiver_hydro_data_2001-2020.csv'

# The name of the file containing parameter combinations and associated error metric values
# from our sensitivity analysis
saved_error_metric_values_fname = 'Snow17_sensitivity_analysis.csv'

date_beg = '2000-10-01' # This is the first day of water year 2001
date_end = '2020-09-30' # This is the last day of water year 2020

DD_i = np.array([2.0, 2.0, 9.0, 9.0]) # Initial guesses for degree-day factor
Tt_i = np.array([0.0, 6.0, 6.0, 0.0]) # Initial guesses for temperature threshold


## 2. Load the Forcing Data

These are the same East River watershed data we have been using throughout this module.

In [None]:
# Read in the forcing data
df_forcing = pd.read_csv(forcing_fname)

# Reindex so that the index of the dataframe is a datetime64 object
df_forcing['Date'] = pd.to_datetime(df_forcing['Date'],format='%Y-%m-%d')
df_forcing.index = df_forcing['Date']

ForcingDates = df_forcing[date_beg:date_end]['Date'].values
P_exp = df_forcing[date_beg:date_end]['pcp'].values
Ta_exp = df_forcing[date_beg:date_end]['tair'].values
SWE_o = df_forcing[date_beg:date_end]['SWE'].values

t = pd.date_range(start=date_beg, end=date_end, freq='1D')

# Here's what a pandas "dataframe" looks like:
df_forcing

## 3. Define Some Critical Functions

### 3.1 Create a Function Defining the Snow-17 Model

In [None]:
def Snow17(Ta,P,DD,Tt):
    
    assert Ta.shape == P.shape, 'Precipitation and Temperature vectors must have the same shape'
    assert isinstance(DD, numbers.Number), 'Degree day coefficient must be a scalar'
    assert isinstance(Tt, numbers.Number), 'Temperature threshold must be a scalar'

    Nt = np.max(Ta.shape)
    
    SWE_s17 = np.zeros(Ta.shape)
    Sm_s17 = np.zeros(Ta.shape)
    Pliq_s17 = np.zeros(Ta.shape)
    
    for i in np.arange(Nt):

        P_i  = P[i] # The value of precipitation on this date
        Ta_i = Ta[i] # The value of average air temperature on this date

        # Initial conditions: we are starting when there should not be any appreciable snow in the watershed, 
        # so we will assume that SWE = 0. If you decide to run another date when there might be snow (e.g., Jan. 1)
        # then you would need a more realistic value of SWE.
        if(i==0):
            SWE_i = 0.0 
        else:
            SWE_i = SWE_s17[i-1] # The initial SWE on these dates is simply the SWE from the day before. We will add snow or subtract melt.
            
        # If SWE is greater than zero, there *may* be snowmelt
        if(SWE_i>0.0):
            if(Ta_i>Tt): # If the air temperature is greater than the threshold, there **will** be melt
                Sm_i = DD*(Ta_i-Tt) # Snowmelt via degree-day factor
            else: # If the air temperature is below the threshold, there is no melt
                Sm_i = 0.0 # No snowmelt if temperature does not exceed threshold
        else: # If there is no SWE, by definition there is no snowmelt
            Sm_i = 0.0
        
        # If there is precipitation, figure out its phase
        if((P_i>0.0) and (Ta_i<=Tt)):
            SWE_i += P_i # All precip will be added to SWE storage
            Pliq_i = 0.0 # There is no liquid precipitation
        elif((P_i>0.0) and (Ta_i>Tt)):
            Pliq_i = P_i # All precipitation falls as liquid. NOTE: We are assuming rain does not melt snow!!!
        else: # If there is no precipitation, there is nothing to accumulate
            Pliq_i = 0.0
        
        SWE_s17[i] = np.max([SWE_i - Sm_i,0.0]) # Make sure we can only melt as much SWE as there is. This only matters at low SWE
        Sm_s17[i] = Sm_i # Save the snowmelt... QUESTION: Is this something we can observe?!?!?!?!
        Pliq_s17[i] = Pliq_i
        
        
    return SWE_s17, Sm_s17, Pliq_s17
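To make the melt and precipitation-phase logic concrete, the sketch below applies a single day of the same degree-day rules to hand-picked numbers. The helper `one_day_update` and all input values are hypothetical, and units are assumed to be consistent between `DD`, temperature, and SWE.

```python
def one_day_update(swe_prev, P_i, Ta_i, DD, Tt):
    """Apply one daily step of the degree-day logic used in Snow17."""
    # Melt only if there is snow on the ground and it is warmer than the threshold
    melt = DD * (Ta_i - Tt) if (swe_prev > 0.0 and Ta_i > Tt) else 0.0
    # Precipitation is snow at or below the threshold, rain above it
    snowfall = P_i if (P_i > 0.0 and Ta_i <= Tt) else 0.0
    rain = P_i if (P_i > 0.0 and Ta_i > Tt) else 0.0
    # SWE cannot go negative
    swe_new = max(swe_prev + snowfall - melt, 0.0)
    return swe_new, melt, rain

# Warm, dry day: 3 degrees above the threshold melts 5 * 3 = 15 units
print(one_day_update(100.0, 0.0, 4.0, DD=5.0, Tt=1.0))    # (85.0, 15.0, 0.0)
# Cold, snowy day: all precipitation is added to the snowpack
print(one_day_update(100.0, 10.0, -2.0, DD=5.0, Tt=1.0))  # (110.0, 0.0, 0.0)
```

Raising `DD` makes the pack melt faster on warm days, while raising `Tt` both delays melt and classifies more precipitation as snow, which is why the two parameters can trade off against each other during calibration.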

### 3.2 Create a Function to Calculate RMSE

In [None]:
def RMSE(y_m,y_o):
    
    # Inputs: 
    # y_m: Modeled time series
    # y_o: Observed time series
    
    RMSE = np.sqrt(np.nanmean((y_m - y_o)**2))
    
    return RMSE
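Because `RMSE` uses `np.nanmean`, days with a missing observation are skipped rather than turning the whole score into `NaN`. The small check below uses made-up numbers; the arrays `y_model` and `y_obs` are hypothetical, and the function is repeated only so the snippet runs on its own.

```python
import numpy as np

def RMSE(y_m, y_o):
    # Same formula as above: NaN differences are ignored by nanmean
    return np.sqrt(np.nanmean((y_m - y_o)**2))

y_model = np.array([1.0, 2.0, 3.0, 4.0])
y_obs = np.array([1.0, 4.0, np.nan, 2.0])  # third observation is missing

# Squared errors on the three valid days are 0, 4, and 4, so the mean is 8/3
print(RMSE(y_model, y_obs))  # about 1.633
```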

### 3.3 Create an Objective Function Based on RMSE

In [None]:
def objective_function_rmse(params):
    DD_exp, Tt_exp = params # Get DD and Tt parameters
    
    # 1. Call Snow-17 model 
    SWE_m, Sm_m, Pliq_m = Snow17(Ta_exp,P_exp,DD_exp,Tt_exp)
    
    # 2. Get RMSE value for simulated SWE
    RMSE_exp = RMSE(SWE_m,SWE_o)

    # 3. Return RMSE because the optimization function we're using seeks minimization
    return RMSE_exp
    

## 4. Calibrate on RMSE

Below, find the lines with comments labelled `TODO:` and insert or modify the code as needed.

In [None]:
N_ig = DD_i.size # The number of initial guesses being considered

# TODO: Create containers to store optimized DD and Tt 


for i in np.arange(N_ig):
    initial_guess = # TODO: Get the initial guess (DD, Tt) for this combination

    # Calibrate the model based on this initial guess
    optimized_params_rmse = optimize.minimize(
        objective_function_rmse,
        initial_guess, 
        method='CG',
        jac='3-point',
        options={
            'disp': True,
            'maxiter': 2000,
        }
        )

    # Print the values of the optimized parameters to the screen
    print("Optimized Parameters:", optimized_params_rmse.x)

    # TODO: Store the calibrated parameters in the array you created above
     = optimized_params_rmse.x[0]
     = optimized_params_rmse.x[1]



### 4.1 Plot the Results

In [None]:
# TODO: 
# 1. Read in the RMSE surface we created in the sensitivity analysis/brute force calibration notebook
# 2. Use the contour() and contourf() to create a plot of the RMSE surface
# 3. Plot the initial guesses on the plot as large Xs (see the markersize property in matplotlib)
# 4. Plot the optimized guesses on the plot as large Os
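The general plotting pattern can be sketched on a synthetic surface. Everything in the example below is made up for illustration: the bowl-shaped `err_surface`, the grids, and the `starts` and `ends` arrays are hypothetical and do not come from the saved sensitivity analysis file, whose column layout you will need to check yourself.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic bowl-shaped "error surface" on a regular parameter grid (made-up values)
dd_grid = np.linspace(2.0, 9.0, 50)
tt_grid = np.linspace(0.0, 6.0, 50)
DD_mesh, Tt_mesh = np.meshgrid(dd_grid, tt_grid)
err_surface = (DD_mesh - 5.0)**2 + 2.0 * (Tt_mesh - 3.0)**2

# Hypothetical start and end points of four calibration runs
starts = np.array([[2.0, 0.0], [2.0, 6.0], [9.0, 6.0], [9.0, 0.0]])
ends = np.array([[5.0, 3.0], [5.1, 3.0], [5.0, 2.9], [4.9, 3.1]])

fig, ax = plt.subplots(figsize=(7, 5))
# Filled contours for the surface, thin black lines on top for readability
filled = ax.contourf(DD_mesh, Tt_mesh, err_surface, levels=20, cmap='viridis')
ax.contour(DD_mesh, Tt_mesh, err_surface, levels=20, colors='k', linewidths=0.5)
fig.colorbar(filled, ax=ax, label='Error metric (synthetic)')

# Large Xs for the initial guesses, large open Os for the optimized values
ax.plot(starts[:, 0], starts[:, 1], 'rx', markersize=14, markeredgewidth=3,
        label='Initial guess')
ax.plot(ends[:, 0], ends[:, 1], 'o', markersize=14, markerfacecolor='none',
        markeredgecolor='w', markeredgewidth=2, label='Optimized')
ax.set_xlabel('DD')
ax.set_ylabel('Tt')
ax.legend()
```

If the saved surface is stored as one row per parameter combination, it would first need to be reshaped onto a two-dimensional grid before `contour()` and `contourf()` can use it.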

## 5. Now Calibrate to MAE

Now you will repeat the above analysis, but instead calibrate the model to the mean absolute error (MAE), an error metric we also examined in our sensitivity analysis. 
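For reference, the two metrics can be written as follows, assuming $N$ days with valid observations, modeled values $y_{m,i}$, and observed values $y_{o,i}$ (notation chosen here to match the `y_m` and `y_o` arguments used in the code):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{m,i} - y_{o,i}\right)^2},
\qquad
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_{m,i} - y_{o,i}\right|
$$

Because RMSE squares each residual before averaging, it weights large errors more heavily than MAE does.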

### 5.1 Define a Function to Calculate MAE

In [None]:
def MAE(y_m,y_o):
    # TODO: Add code here
    
    return MAE

### 5.2 Create an Objective Function Based on MAE

In [None]:
def objective_function_mae(params):
    # TODO: Get the parameters
    
    # TODO: Calculate SWE with input parameters
    
    # TODO: Call function to calculate MAE
    
    return MAE_exp

### 5.3 Calibrate the Model to MAE

Adapt the code from Section 4 above to calibrate the model to MAE. Use the same four initial guesses we defined at the top of the notebook.

In [None]:
# TODO: Insert code to call `optimize.minimize()` for each initial guess and using MAE

### 5.4 Plot the Results

In [None]:
# TODO: 
# 1. Read in the MAE surface we created in the sensitivity analysis/brute force calibration notebook
# 2. Use the contour() and contourf() to create a plot of the MAE surface
# 3. Plot the initial guesses on the plot as large Xs 
# 4. Plot the optimized guesses on the plot as large Os

## 6. Reflection Questions

Answer the following reflection questions:

1. How sensitive are the optimized parameters to the initial guess? Did calibrating to MAE result in appreciably different results when compared to calibrating to RMSE? 
2. What are the implications of the above results? How would you coach a colleague who is considering creating a Snow-17 model for a different location in, for example, Washington or Idaho?
3. Based on what we've done in this module, how generalizable do you think your conclusions are for __*other*__ models? How might you approach calibration of a model that represents a completely different process but is of similar complexity? For example, say you needed to calibrate the advection-dispersion code to a time series of contaminant concentration in an observation well some known distance away from an oil spill that occurred on a known date.