# Lesson 4 - Interpret Output // STATUS - EDIT MODE

## Calibration Diagnostics:
At each calibration iteration, all the metrics listed in the table below are calculated and a number of plots are generated to help assess the progress made at that iteration. These plots help check whether the calibration is heading in the right direction and diagnose possible bugs and disruptions in the calibration process. The table below summarizes all the metrics currently calculated as part of the calibration process.

**PATRICK: INSERT TABLE HERE**

After the first model (WRF-Hydro) iteration completes, and after every subsequent one, a script called `calib_workflow` performs the following steps:
* Read the model-simulated flows
* Pair the model simulations with the observed streamflow
* Calculate the error metrics and the objective function
* Generate a new parameter set for the next iteration
* Generate a series of diagnostic plots. A directory called `plots` is populated in the `RUN.CALIB` directory with plots that depend on which options are activated in the `setup.parm` file. The example in the previous lessons used only streamflow observations for calibration, so there are only plots of the parameter progress, the streamflow error metrics, and the hydrographs.
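The pairing step can be sketched in Python as below. This is a minimal illustration, not the `calib_workflow` implementation: it assumes each series is a dictionary mapping timestamps to flow values and that missing observations are stored as `None` or NaN. The helper name `pair_flows` is hypothetical.

```python
import math
from datetime import datetime


def pair_flows(sim, obs):
    """Pair simulated and observed flows on their shared timestamps.

    sim, obs: dicts mapping timestamp -> flow. Timestamps missing from either
    series, or whose observation is missing (None or NaN), are dropped.
    """
    common = sorted(
        t for t in sim.keys() & obs.keys()
        if obs[t] is not None and not math.isnan(obs[t])
    )
    return [sim[t] for t in common], [obs[t] for t in common]


if __name__ == "__main__":
    t0, t1, t2 = (datetime(2010, 1, d) for d in (1, 2, 3))
    sim = {t0: 5.0, t1: 6.0, t2: 7.0}
    obs = {t0: 4.5, t1: float("nan"), t2: 7.5}  # t1 has no valid observation
    print(pair_flows(sim, obs))  # ([5.0, 7.0], [4.5, 7.5])
```

Only the paired values are passed on to the metric and objective-function calculations, so gaps in the gage record do not bias the statistics.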

Let's take a look at a few of these plots.


In [None]:
%%bash 
# List all the plots generated while calibration is progressing 
ls /home/docker/example_case/Calibration/output/example1/01447720/RUN.CALIB/plots

As seen above, some of the plots are duplicated with the tag `outlier` in their names. This was added to the calibration workflow to remove very large values (outliers) from the plots so that they remain readable. Figures with the `_outlier` tag contain all the iterations, and they are the ones we will check here. Let's begin by checking how the objective function progresses with iterations.
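If you prefer to stay in Python, the `_outlier` figures can be listed with the standard library's `pathlib`. This is a minimal sketch: `outlier_plots` is a hypothetical helper, and it assumes the tag appears in the filename as `_outlier` with a `.png` extension, as in the plot filenames used in this lesson.

```python
from pathlib import Path


def outlier_plots(plot_dir):
    """Return the sorted names of PNG plots whose filename contains `_outlier`."""
    return sorted(p.name for p in Path(plot_dir).glob("*_outlier*.png"))


if __name__ == "__main__":
    plot_dir = "/home/docker/example_case/Calibration/output/example1/01447720/RUN.CALIB/plots"
    for name in outlier_plots(plot_dir):
        print(name)
```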

In [None]:
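# Display the objective-function progress plot generated during calibration
from IPython.display import Image, display

display(Image(filename="/home/docker/example_case/Calibration/output/example1/01447720/RUN.CALIB/plots/01447720_calib_run_obj_outlier.png"))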

The plot above tracks how the objective function defined in the `setup.parm` file progresses with iterations. Note that DDS is a minimization algorithm. If the user selects an objective function such as KGE, whose ideal value is its maximum, the calibration code uses 1 - KGE as the objective function. The same applies to the different variants of NSE, the correlation coefficient, and lbem. In a healthy calibration, the objective function levels off (asymptotes) at the higher iteration numbers as the run approaches the user-defined number of iterations. The red star marks the iteration with the best result so far.
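To make the minimization concrete, here is a minimal sketch of KGE in its common form from Gupta et al. (2009), $\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}$, where $r$ is the correlation, $\alpha$ the ratio of simulated to observed standard deviation, and $\beta$ the ratio of simulated to observed mean, converted to 1 - KGE for DDS. The exact KGE variant used by the calibration code is not stated in this lesson, so treat this form as an assumption. The `best_so_far` helper is a hypothetical illustration of how the best iteration (the red star) can be tracked.

```python
import math
from statistics import fmean, pstdev


def kge(sim, obs):
    """Kling-Gupta efficiency (Gupta et al., 2009 form); 1 is a perfect fit."""
    if len(sim) != len(obs) or len(sim) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mu_s, mu_o = fmean(sim), fmean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    # Pearson correlation from the population covariance and standard deviations
    cov = fmean([(s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)])
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o  # variability ratio
    beta = mu_s / mu_o   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def objective(sim, obs):
    """DDS minimizes, so a metric whose ideal value is its maximum becomes 1 - KGE."""
    return 1.0 - kge(sim, obs)


def best_so_far(objective_values):
    """For each iteration, the index of the lowest objective seen up to that point."""
    best, out = None, []
    for i, value in enumerate(objective_values):
        if best is None or value < objective_values[best]:
            best = i
        out.append(best)
    return out
```

For example, `best_so_far([0.9, 0.5, 0.7, 0.3])` returns `[0, 1, 1, 3]`: iteration 2 is worse than iteration 1, so the best result stays at iteration 1 until iteration 3 improves on it.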

Next, we will look at how the other error metrics progress with iterations.

In [None]:
# Display the progress of the streamflow error metrics with iterations
from IPython.display import Image, display

display(Image(filename="/home/docker/example_case/Calibration/output/example1/01447720/RUN.CALIB/plots/01447720_metric_calib_run_outlier.png"))

The figure above shows how the different streamflow metrics evolve with iterations. Ideally, as the objective function improves, all other metrics improve as well. However, that is not always the case, so we keep an eye on model performance to make sure we are not degrading other aspects of it during calibration.

**AREZOO**: Let's point out the details on categorical metrics and event-based ones if not specified clearly in the table.

The plot below is the equivalent plot from the official NWMv21 calibration. Note that the objective function in NWMv21 was 1 minus a weighted combination of NSE and log NSE, while the objective function used in this exercise is 1 - KGE. As part of NWMv30 R&D, we tested several objective functions and decided to use KGE for NWMv30 onboarding, which is why it is used in this training. Also note that a number of metrics reported in NWMv30 did not exist in the NWMv21 calibration. For this gage in NWMv21, the improvement in the objective function came with improvements in other metrics, such as the correlation coefficient and KGE, which is the desired outcome.

<p style="text-align:center;">
<img src="./images/01447720_metric_calib_run_outlier.png" width="600" height="600" />
</p>
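The NWMv21 objective can be sketched as below, under the assumption that it takes the form $1 - (w \cdot \mathrm{NSE} + (1 - w) \cdot \mathrm{logNSE})$, where log NSE is NSE computed on log-transformed flows. The weight `w` is a hypothetical parameter here (its NWMv21 value is not given in this lesson), `log_nse` assumes strictly positive flows so the logarithm is defined, and the function names are illustrative rather than taken from the calibration code.

```python
import math
from statistics import fmean


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 is no better than the observed mean."""
    mean_obs = fmean(obs)
    sq_err = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sq_dev = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sq_err / sq_dev


def log_nse(sim, obs):
    """NSE on log-transformed flows; assumes every flow is strictly positive."""
    return nse([math.log(s) for s in sim], [math.log(o) for o in obs])


def weighted_nse_objective(sim, obs, w):
    """Hypothetical NWMv21-style objective for DDS (lower is better); w is the NSE weight."""
    return 1.0 - (w * nse(sim, obs) + (1.0 - w) * log_nse(sim, obs))
```

The log transform compresses high peaks, so the log NSE term gives more weight to low flows than plain NSE does. A perfect simulation gives an objective of 0 for any `w`.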

Let's take a look at how the parameters have evolved as the calibration progresses.


In [None]:
# Display how each calibration parameter changes with iterations
from IPython.display import Image, display

display(Image(filename="/home/docker/example_case/Calibration/output/example1/01447720/RUN.CALIB/plots/01447720_parameters_calib_run_outlier.png"))

**AREZOO** revisit this in case you add DDS description in lesson1. 

The plot above shows how the parameters change with iterations; it is one way to make sure you are calibrating all the parameters that you flagged. Note that in the early iterations a larger subset of parameters is perturbed, and as the calibration approaches its end (higher iteration numbers), fewer parameters are perturbed. This may be more obvious in the plot below, which is for the same gage from NWMv21.

<p style="text-align:center;">
<img src="./images/01447720_parameters_calib_run_outlier.png" width="600" height="600" />
</p>

In the initial iterations, the DDS algorithm searches globally, and as the procedure approaches the maximum user-defined number of iterations, the search transitions from global to local. This transition is achieved by dynamically and probabilistically reducing the search dimension, that is, the subset of calibration parameters updated in a given iteration.

The probability that a parameter is chosen for inclusion in the search is $P(i) = 1 - \ln(i)/\ln(m)$, where $i$ is the iteration number and $m$ is the maximum number of iterations. Therefore, the probability that a parameter is chosen decreases as the iteration number increases. In the initial iterations almost all the parameters are modified, and as the calibration approaches the maximum number of iterations, only a few parameters, or only one, are modified. The parameters selected in each iteration are perturbed within the defined parameter range. The suggested lower and upper limits were shown in Lesson 1; they were selected based on a literature review and expert opinion.
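A minimal Python sketch of this selection step follows, assuming the standard DDS formulation of Tolson and Shoemaker (2007): each parameter is included with probability $P(i)$, at least one parameter is always perturbed, and the perturbation is a normal step scaled by a fraction `r` of the parameter range (0.2 in that paper), reflected back inside the bounds. The function names are hypothetical, and the NWM calibration code may differ in detail.

```python
import math
import random


def inclusion_probability(i, m):
    """P(i) = 1 - ln(i)/ln(m) for iteration i in 1..m (requires m > 1)."""
    return 1.0 - math.log(i) / math.log(m)


def select_parameters(names, i, m, rng):
    """Pick the parameters to perturb at iteration i; always at least one."""
    p = inclusion_probability(i, m)
    chosen = [name for name in names if rng.random() < p]
    if not chosen:
        chosen = [rng.choice(names)]
    return chosen


def perturb(value, lower, upper, rng, r=0.2):
    """Normal step scaled by r * range, reflected at the bounds (DDS-style)."""
    new = value + r * (upper - lower) * rng.gauss(0.0, 1.0)
    if new < lower:
        new = lower + (lower - new)  # reflect off the lower bound
        if new > upper:
            new = lower
    elif new > upper:
        new = upper - (new - upper)  # reflect off the upper bound
        if new < lower:
            new = upper
    return new


if __name__ == "__main__":
    rng = random.Random(42)
    for i in (1, 150, 300):
        print(i, round(inclusion_probability(i, 300), 3))
```

With `m = 300`, every parameter is included at the first iteration ($P(1) = 1$), each parameter has a probability of roughly 0.12 of being included at iteration 150, and at the last iteration $P(m) = 0$, so only the single forced parameter is perturbed.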

The maximum number of iterations used in previous versions of the NWM calibration was 300, except for domains that were too large (> 5000 km²), for which 150 iterations were used.

Next, we will look at the streamflow hydrograph and scatter plots.

In [None]:
# Display the calibration hydrograph and scatter plots
from IPython.display import Image, display

plot_dir = "/home/docker/example_case/Calibration/output/example1/01447720/RUN.CALIB/plots"
for name in ["01447720_hydrograph.png", "01447720_scatter.png"]:
    display(Image(filename=f"{plot_dir}/{name}"))

These plots show only the model simulations with the default parameters (first iteration), the best parameters, and the last iteration. We do not plot all iterations so that the picture stays clean. However, all the time series are saved in an Rdataset and can be used afterward to plot any other iteration if required. We will describe this Rdataset and how to pull information from it in a separate lesson. The plots below are the equivalent plots from NWMv21 for the same gage.

<p style="text-align:center;">
<img src="./images/01447720_hydrograph.png" width="600" height="600" />
<img src="./images/01447720_scatter.png" width="600" height="600" />
</p>

## Validation Diagnostics:

After calibration is finished, the model is run with both the default and the best parameters (from the calibration step) for the full duration (calibration plus validation periods), and metrics are calculated for the calibration, validation, and full periods. Finally, a set of diagnostic plots is generated depending on the options the user selected in the `setup.parm` file. We calibrated using only streamflow in the example in Lessons 2 and 3, so only streamflow plots are shown here. Let's take a look at the metrics first.


In [None]:
# Display the validation metrics summary plot
from IPython.display import Image, display

display(Image(filename="/home/docker/example_case/Calibration/output/example1/01447720/RUN.VALID/plots/01447720_valid_metrics.png"))

This plot summarizes how model performance changed from the default parameters to the best calibration iteration, and how both perform during an independent time period (validation). Usually, performance in the validation period is not as good as in the calibration period.
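The comparison described above, computing the same metric for the calibration, validation, and full periods once with the default and once with the best parameters, can be sketched as below. This is a minimal illustration under stated assumptions: dates are comparable values (for example `datetime.date`), periods are inclusive `(start, end)` tuples, the full period spans from the earliest start to the latest end, and `metrics_by_period` is a hypothetical helper that accepts any metric function. The flows in the usage block are toy values.

```python
from datetime import date


def metrics_by_period(dates, sim, obs, calib, valid, metric):
    """Apply metric(sim, obs) to the calibration, validation, and full periods.

    calib, valid: inclusive (start, end) tuples comparable with the entries of dates.
    """
    def subset(start, end):
        idx = [k for k, d in enumerate(dates) if start <= d <= end]
        return [sim[k] for k in idx], [obs[k] for k in idx]

    full = (min(calib[0], valid[0]), max(calib[1], valid[1]))
    periods = {"calibration": calib, "validation": valid, "full": full}
    return {name: metric(*subset(*bounds)) for name, bounds in periods.items()}


def total_bias(sim, obs):
    """Toy metric for illustration: simulated minus observed volume."""
    return sum(sim) - sum(obs)


if __name__ == "__main__":
    dates = [date(2010, 1, d) for d in range(1, 5)]
    obs = [1.0, 1.0, 1.0, 1.0]
    calib = (date(2010, 1, 1), date(2010, 1, 2))
    valid = (date(2010, 1, 3), date(2010, 1, 4))
    # Evaluate once per parameter set (default vs. best) and compare the results.
    for label, sim in [("default", [2.0, 2.0, 3.0, 3.0]), ("best", [1.0, 1.5, 1.5, 2.0])]:
        print(label, metrics_by_period(dates, sim, obs, calib, valid, total_bias))
```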

The main task after the calibration and validation workflow finishes is classifying the calibration basins into Donor, Keep, and Drop basins, which are then used for regionalization. These classes are defined as follows:

* **Donor basins**: basins where the calibrated model performs well and the parameters are good enough to be transferred to other ungaged/uncalibrated locations.
* **Keep basins**: basins where calibration improved the model statistics, but we do not consider the parameters good enough to be used at ungaged/uncalibrated locations. The calibrated parameters are therefore kept for the basin itself but are not donated to other uncalibrated locations.
* **Drop basins**: basins where calibration was not beneficial, so the calibrated parameters are not kept. Like uncalibrated areas, these basins receive parameters from a donor basin during regionalization.

**AREZOO** Check whether we want to discuss the criteria used for the selection of donors or not ...

Below is the same plot from NWMv21. As one can see, all the metrics improved relative to the default for both the calibration and validation periods.

<p style="text-align:center;">
<img src="./images/01447720_valid_metrics.png" width="600" height="600" />
</p>


Let's also check the hydrograph and scatter plots for our current experiment.

In [None]:
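# Display the validation hydrograph and scatter plots
from IPython.display import Image, display

plot_dir = "/home/docker/example_case/Calibration/output/example1/01447720/RUN.VALID/plots"
for name in ["01447720_valid_hydrogr.png", "01447720_valid_scatter.png"]:
    display(Image(filename=f"{plot_dir}/{name}"))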

Here are the equivalent plots from NWMv21.


<p style="text-align:center;">
<img src="./images/01447720_valid_hydrogr.png" width="600" height="600" />
<img src="./images/01447720_valid_scatter.png" width="600" height="600" />
</p>


**AREZOO** Add description on how things were tuned in NWMv21 .... change in parameters and how they impacted change in the hydrograph ...

## Conclusion:

We reviewed the plots generated during calibration and after validation and described the information they provide.

In [None]:
### DEV_ END

In [None]:
# table template

| Filename | Description | Source | Required for NWM V2.0 |
| ------------- | ------------- | ------------- | ------------- |