# Lesson 1 - Calibration Overview

# Dynamically Dimensioned Search:
We need some general content on the DDS ... 
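
As a rough orientation, the core loop of DDS as it is commonly described can be sketched in a few lines of Python. This is not the code used by the calibration workflow: the function name, arguments, and bound-handling details below are illustrative assumptions, and `r` here only plays the role that the `ddsR` option appears to play in `setup.parm`.

```python
import math
import random


def dds(objective, x_init, x_min, x_max, num_iter, r=0.2, seed=0):
    """Minimise `objective` with a basic dynamically dimensioned search."""
    rng = random.Random(seed)
    n = len(x_init)
    x_best = list(x_init)
    f_best = objective(x_best)

    for i in range(1, num_iter + 1):
        # The chance of perturbing each parameter shrinks with iteration count,
        # so the search moves from many dimensions (global) to few (local).
        p_select = (1.0 - math.log(i) / math.log(num_iter)) if num_iter > 1 else 1.0
        chosen = [j for j in range(n) if rng.random() < p_select]
        if not chosen:
            chosen = [rng.randrange(n)]  # always perturb at least one parameter

        x_new = list(x_best)
        for j in chosen:
            span = x_max[j] - x_min[j]
            value = x_best[j] + r * span * rng.gauss(0.0, 1.0)
            # Reflect at the bounds; fall back to the bound if still outside.
            if value < x_min[j]:
                value = x_min[j] + (x_min[j] - value)
                if value > x_max[j]:
                    value = x_min[j]
            elif value > x_max[j]:
                value = x_max[j] - (value - x_max[j])
                if value < x_min[j]:
                    value = x_max[j]
            x_new[j] = value

        f_new = objective(x_new)
        if f_new <= f_best:  # greedy: keep only non-worsening candidates
            x_best, f_best = x_new, f_new

    return x_best, f_best


def sum_of_squares(x):
    # Toy objective used only for this illustration.
    return sum(v * v for v in x)


best_x, best_f = dds(sum_of_squares, [4.0, -3.0], [-5.0, -5.0], [5.0, 5.0], num_iter=200)
```

Because candidates are only accepted when they do not worsen the objective, the best value can never be worse than the starting point. The number of iterations corresponds to the budget you would set with `numIter`.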


# Programs & Workflow:
The calibration workflow is run through a series of six individual Python scripts, typically run from the command line (terminal). The scripts must be run in order, each only after the previous script has completed. Brief descriptions of each script are provided below; more detailed explanations are provided in the upcoming lessons.

<p style="text-align:center;">
<img src="./images/Calibration_workflow_2021_image.png" width="600" height="600" />
</p>

*initDB.py:* 

This program is run once to initialize and set up the calibration database and associated tables used during the experiment. Completing this step will create an empty database with all the relevant tables that will be filled with information provided during the calibration procedure. More information will be provided in Lesson 2. 


*inputDomainMeta.py:* 

This program reads in a CSV file, which you will need to fill out, that describes the modeling domains to be used for calibration. This information is entered into the database for later use by the workflow. The CSV is described in more detail in the setup section (Lesson 2).

*jobInit.py:* 

This program is run to establish a calibration ‘experiment’. The program reads a config file (explained in depth below), sets up the necessary run directories and paths to necessary files, and enters the associated metadata into the database. Upon successful completion, the program will return a unique job ID value which you will use in subsequent programs to run the calibration.

*spinOrchestrator.py:*

This is the first program that is run to initialize the calibration experiment. The only mandatory argument to this program is the unique job ID for the calibration experiment. The main purpose of this program is to run the NWM/WRF-Hydro spinup for all domains being calibrated. This program needs to complete successfully before moving on to the next step.


*calibOrchestrator.py:* 

This is the second program that is run in the calibration workflow. As with spinOrchestrator.py, the only mandatory argument is the job ID value. This program runs the main workflow to adjust parameter values, execute interim model simulations, evaluate model output against observations, and further adjust parameter values. This program must complete successfully before moving on to the next step.

*runValidOrchestrator.py:*

This is the third and final main program in the calibration workflow. The only mandatory argument is the unique job ID value associated with the calibration experiment. This program manages running the model with the final calibrated parameters over a specified evaluation period for the evaluation of the parameters.
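
The "run in order, and only once the previous script has completed" rule for the three orchestrator programs can be sketched as a small Python helper. This is only an illustration: the text above says the job ID is the only mandatory argument, but passing it as a single positional argument is our assumption, so check each script's help output for the actual invocation form.

```python
import subprocess

ORCHESTRATORS = [
    "spinOrchestrator.py",
    "calibOrchestrator.py",
    "runValidOrchestrator.py",
]


def orchestrator_commands(job_id):
    # Assumed invocation form: job ID as one positional argument.
    return [["python", script, str(job_id)] for script in ORCHESTRATORS]


def run_in_order(commands, runner=None):
    """Run commands one at a time and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd).returncode
    done = []
    for cmd in commands:
        if runner(cmd) != 0:
            break  # do not start later steps after a failure
        done.append(cmd)
    return done
```

Calling `run_in_order(orchestrator_commands(job_id))` would launch the real programs; passing your own `runner` function lets you try the logic without executing anything.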


# Pre Calibration Step 1: Configure Setup File: "setup.parm"
The primary file you will be editing in preparation for setting up a calibration workflow job is the `setup.parm` file. It is best to think of this file as a master configuration file to guide the workflow. This file contains multiple options that define how the workflow will submit jobs for models/analysis, which basins to calibrate from the database, methods for reporting errors to the user, model physics options, and paths to general parameter files and executables.
The `setup.parm` file is divided up into sections: 
* logistics: contains all the options related to calibration and calibration job submission.
* sensitivity: contains all the options related to sensitivity analysis.
* gageInfo: specifies which gages to calibrate.
* lsmPhysics: contains all NoahMP physics options.
* forcing: contains the forcing type.
* modelTime: contains all the options related to model timesteps and to output and restart frequency.
* hydroIO: contains all the options related to hydro output.
* hydroPhysics: contains all the hydro physics options.

Let's take a look at this file and its content:

```bash
cat /home/docker/PyWrfHydroCalib/setup_files/setup.parm
```

Note: to view `setup.parm` yourself, open a terminal session in Jupyter and copy/paste the following command:

`vi setup.parm`
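
If you prefer to inspect the file from Python, a minimal sketch is shown below. It assumes `setup.parm` uses INI-style `[section]` headers and `key = value` lines that Python's `configparser` can read; the sample text is made up for illustration and is far shorter than a real file.

```python
import configparser

EXPECTED_SECTIONS = [
    "logistics", "sensitivity", "gageInfo", "lsmPhysics",
    "forcing", "modelTime", "hydroIO", "hydroPhysics",
]

# Hypothetical, heavily shortened stand-in for a real setup.parm file.
SAMPLE_TEXT = """
[logistics]
expName = demo
numIter = 10

[gageInfo]
gageListFile = gages.txt
"""


def missing_sections(text):
    """Return the expected section names that are absent from `text`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names such as outDir case-sensitive
    parser.read_string(text)
    return [name for name in EXPECTED_SECTIONS if not parser.has_section(name)]


print(missing_sections(SAMPLE_TEXT))
```

To check a real file, read it with `open(path).read()` and pass the text to `missing_sections`.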


# Description of the setup file content
As mentioned above, the setup file has several sections, each targeting a specific part of the model calibration. Below we provide a short description of each section and its arguments.

### logistics:
The first section of the setup.parm file is ‘logistics’, which guides the workflow. 

|Argument|Required|Description|
|-|-|-|
| **outDir** | Req. |Where do you want your calibration experiment to be constructed? |
|**expName** | Req. |What is the name of your experiment? This can be anything you want. |
|**acctKey** | Opt. |If you are running on a queue system that requires credentials to submit a job, specify your account key here. |
|**optQueName** | Opt. |If you need to direct model simulations/jobs to a specific queue, you can specify that here. |
|**nCoresModel** | Req. |How many CPUs are you running your model simulations over? |
|**nNodesModel** | Req.| If you are running across multiple nodes on a queue system, specify the number of nodes here.| 
|**nCoresPerNode** | Req. |If you are running on a queue system, specify how many CPU cores you have available per node. |
|**runSens** | Req.| Are we running sensitivity analysis? 0 - No, 1 - Yes |
|**sensParmTbl** | Req.| Path to the table of parameter values to use in sensitivity analysis. Same format as the calibration parameter table. |
|**runCalib** | Req. |Are we running calibration? 0 - No, 1 - Yes |
|**calibParmTbl** | Req.| Path to the table of parameter values to use in calibration. |
|**dailyStats** | Req. | Flag to direct the workflow to calculate error statistics on a daily scale, instead of the hourly default. Specify 1 to activate. |
|**dbBackup** |Req. | Flag to turn on/off database backup. If on, the database will be locked and backed up once an hour during execution to the job directory output file.|
|**coldStart** | Req. |Flag to direct the calibration workflow to cold start your model simulation for each iteration during calibration. Specify 1 to activate. |
|**optSpinFlag** | Req. | Flag to direct the workflow to use an alternative spinup file already in place in the input directory for the basins being calibrated. This allows the user to bypass the spinup step. Specify 1 to activate. |
|**stripCalibOutputs** | Req. |If you want to omit outputs during an initial window of each model iteration, specify 1 here to activate this feature. This was designed to minimize the model I/O burden. |
|**stripCalibHours** | Req. |Specify an initial window in hours to strip outputs. This is only used if stripCalibOutputs has been activated. |
|**jobRunType** | Req. |Specifies how the model simulations and calibration code are executed. |
|**mpiCmd** | Req. |What is the MPI command being used to execute the model simulations? This is required, as the MPI command is also used in job scheduler scripts. |
|**cpuPinCmd** |Opt. |If you are running on a job scheduler, how do you want to pin specific model simulations on a compute node? This was added to allow multiple basins on one node. |
|**numIter** | Req. |How many model iterations would you like to run your calibration experiment over? |
|**calibMethod** | Req.| Right now only DDS is allowed. Future upgrades will incorporate additional calibration methods.|
|**enableStreamflowCalib**| Req.| Specify whether we are calibrating to streamflow: 0 - not calibrating, 1 - calibrating. | 
|**enableSnowCalib**| Req. | Specify whether snow input data will be used in calibration: 0 - not calibrating, 1 - calibrating.|
|**enableSoilMoistureCalib**| Req. | Specify whether soil moisture input data will be used in calibration: 0 - not calibrating, 1 - calibrating.|
|**streamflowObjectiveFunction** | Req.|Select which error metric for streamflow to minimize during the calibration experiment. |
|**snowObjectiveFunction** | Req.|Select which error metric for snow to minimize during the calibration experiment. |
|**soilMoistureObjectiveFunction** |Req.|Select which error metric for soil moisture to minimize during the calibration experiment. |
|**streamflowWeight**| Req. | Multiplier to streamflow objective function|
|**snowWeight**| Req.|Multiplier to the snow objective function|
|**soilMoistureWeight** |Req.|Multiplier to soil moisture objective function|
|**basinType**| Req. | Basin type used in event identification: 0 - snowy, 1 - slow, 2 - flashy, 3 - regular.|
|**weight1Event**| Req. | Weight for peak bias used in the event-based metric (combined peak bias and volume bias).|
|**weight2Event**|Req.  | Weight for volume bias used in the event-based metric (combined peak bias and volume bias).|
|**ddsR** | Req.| This is a DDS-specific parameter that tunes how random values are generated for each iteration. |
|**enableMask**|Req.| Specify 1 to apply a mask that excludes part of the basin from calibration, or 0 to calibrate the whole basin.|
|**enableMultiSites**|Req.| Specify 1 to calibrate to more than one stream gage simultaneously.|
|**email** | Req. |Where do you want status and error messages to be directed to?|
|**wrfExe** | Req.| Path to the WRF-Hydro executable to be used in the workflow. |
|**genParmTbl** | Req. |Path to the GENPARM.TBL file used by the model. |
|**mpParmTbl** | Req. |Path to the MPTABLE.TBL file used by the model. |
|**urbParmTbl** |Opt. |Path to the URBPARM.TBL file used by the model. |
|**vegParmTbl** | Opt. |Path to the VEGPARM.TBL file used by the model. |
|**soilParmTbl** | Req. |Path to the SOILPARM.TBL file used by the model. |
|**bSpinDate** | Opt. |Beginning date for the spinup. |
|**eSpinDate** | Opt. |Ending date for the spinup. |
|**bCalibDate** | Req. |Beginning date for each calibration iteration. |
|**eCalibDate** | Req. |Ending date for each calibration iteration. |
|**bCalibEvalDate** | Req.| The date within each calibration iteration to begin analysis. |
|**bValidDate** | Opt. |Beginning date for the validation simulation. |
|**eValidDate** | Opt. |Ending date for the validation simulation. |
|**bValidEvalDate** | Opt. |The date within the validation simulation to begin analysis. |
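
To make the role of the `enable*Calib` flags and the `*Weight` multipliers more concrete, here is a minimal sketch. It assumes the individual objective functions are combined as a simple weighted sum over the enabled variables; that combination rule, the function name, and the dictionary keys are our assumptions for illustration, not a description of the workflow's internal code.

```python
def combined_objective(scores, weights, enabled):
    """Weighted sum of objective-function values for the enabled variables.

    scores  -- objective-function value per variable (lower is better)
    weights -- multiplier per variable (streamflowWeight, snowWeight, ...)
    enabled -- 1 if the variable is used in calibration, 0 otherwise
    """
    total = 0.0
    for name, score in scores.items():
        if enabled.get(name, 0) == 1:
            total += weights[name] * score
    return total


# Hypothetical values for one calibration iteration.
scores = {"streamflow": 0.4, "snow": 0.2, "soilMoisture": 0.9}
weights = {"streamflow": 1.0, "snow": 0.5, "soilMoisture": 2.0}
enabled = {"streamflow": 1, "snow": 1, "soilMoisture": 0}

print(combined_objective(scores, weights, enabled))
```

In this example soil moisture is switched off, so its score and weight have no effect on the value being minimized.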

### Sensitivity: 
Contains all the options relevant to the sensitivity analysis. 

|Argument|Required|Description|
|-|-|-|
|**sensParmSample** | Opt. |Sensitivity parameter sample size|
|**sensBatchNum** | Opt. | How many sensitivity simulations to run at once. |
|**bSensDate** | Opt. |Beginning date for the sensitivity model simulation. |
|**eSensDate** | Opt. |Ending date for the sensitivity model simulation. |
|**bSensEvalDate** | Opt. |The date within the sensitivity simulation to begin analysis. |
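
Each simulation window in `setup.parm` is described by a beginning date, an ending date, and a date within the window at which analysis begins. A quick sanity check on such a triplet might look like the sketch below. The `YYYY-MM-DD` date format and the function name are assumptions for illustration; adjust `fmt` to match the format used in your file.

```python
from datetime import datetime


def window_is_consistent(begin, eval_start, end, fmt="%Y-%m-%d"):
    """True if begin <= eval_start < end, i.e. analysis starts inside the run."""
    b, s, e = (datetime.strptime(text, fmt) for text in (begin, eval_start, end))
    return b <= s < e


# Hypothetical sensitivity window: one year of simulation, first month skipped.
print(window_is_consistent("2010-10-01", "2010-11-01", "2011-10-01"))
```

The same check applies to the calibration (`bCalibDate`, `bCalibEvalDate`, `eCalibDate`) and validation (`bValidDate`, `bValidEvalDate`, `eValidDate`) windows.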

### gageInfo: 
Specifies the subset of gages to be calibrated.

|Argument|Required|Description|
|-|-|-|
|**gageListSQL** | Req. |SQL command to extract basins for calibration out of the database file. |
|**gageListFile** | Opt. |Alternative list of basins to calibrate instead of using an SQL command. |
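
The idea behind `gageListSQL` is that a single SQL query selects the basins to calibrate from the database. The sketch below demonstrates this with an in-memory SQLite database. The table name `demo_domains`, its columns, and the gage IDs are entirely hypothetical and only serve to show the mechanism; the real table layout is created by the workflow in Lesson 2.

```python
import sqlite3


def select_gages(conn, gage_list_sql):
    """Run the query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(gage_list_sql)]


# Build a throwaway database with a hypothetical domain table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_domains (gage_id TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO demo_domains VALUES (?, ?)",
    [("01447720", "east"), ("06730200", "west"), ("09107000", "west")],
)

query = "SELECT gage_id FROM demo_domains WHERE region = 'west' ORDER BY gage_id"
gages = select_gages(conn, query)
print(gages)
```

Testing a query this way, against a small copy or stand-in of the database, is a cheap way to confirm that it returns only the basins you intend to calibrate.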

### lsmPhysics:
LSM physics options as well as other LSM related options. 

|Argument|Required|Description|
|-|-|-|
|**SplitOutputCount**| Req.| Output Options: 1 - output 1 file per output time step, 0 - append all the timesteps to one file called LDASOUT_DOMAIN1.nc |
|**dynVegOption** | Req. |DYNAMIC_VEG_OPTION for NoahMP. |
|**canStomResOption** | Req. |CANOPY_STOMATAL_RESISTANCE_OPTION for NoahMP. |
|**btrOption** | Req. | BTR_OPTION for NoahMP. |
|**runoffOption** | Req.| RUNOFF_OPTION for NoahMP. |
|**sfcDragOption** | Req. |SURFACE_DRAG_OPTION for NoahMP. |
|**frzSoilOption** | Req. |FROZEN_SOIL_OPTION for NoahMP. |
|**supCoolOption** | Req. |SUPERCOOLED_WATER_OPTION for NoahMP. |
|**radTransferOption** | Req. |RADIATIVE_TRANSFER_OPTION for NoahMP. |
|**snAlbOption** | Req.| SNOW_ALBEDO_OPTION for NoahMP. |
|**pcpPartOption** | Req. |PCP_PARTITION_OPTION for NoahMP. |
|**tbotOption** | Req. |TBOT_OPTION for NoahMP. |
|**tempTimeScOption** | Req.| TEMP_TIME_SCHEME_OPTION for NoahMP. |
|**sfcResOption** | Req. |SURFACE_RESISTANCE_OPTION for NoahMP. |
|**glacierOption** | Req. |GLACIER_OPTION for NoahMP. |
|**soilThick** | Req. |Soil thicknesses for specified soil layers in NoahMP. |
|**zLvl** | Req.| Level of wind speeds in NoahMP. |

### forcing:
User defined forcing type: 

|Argument|Required|Description|
|-|-|-|
|**forceType** | Req. |Specified forcing type. |

### modelTime:
Contains all the options that control the model (LSM) timestep, the model output frequency, and the restart file frequency.

|Argument|Required|Description|
|-|-|-|
|**forceDt** | Req. |Input forcing timestep in seconds. |
|**lsmDt** | Req. |NoahMP timestep in seconds. |
|**lsmOutDt** | Req. |NoahMP output timestep in seconds. |
|**lsmRstFreq** | Req. |NoahMP restart frequency in seconds. |
|**hydroRstFreq** | Req. |WRF-Hydro restart frequency in seconds. |
|**hydroOutDt**  |Req. |WRF-Hydro output frequency in seconds. |
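
Since all of these values are given in seconds, it is easy to check that they fit together before launching a job. The sketch below assumes that the forcing and output intervals should be whole multiples of the NoahMP timestep; that rule of thumb and the function name are our assumptions, not requirements stated by the workflow.

```python
def timestep_problems(opts):
    """Return human-readable problems found in modelTime-style options."""
    problems = []
    lsm_dt = opts["lsmDt"]
    for name in ("forceDt", "lsmOutDt"):
        if opts[name] % lsm_dt != 0:
            problems.append(
                f"{name}={opts[name]} is not a multiple of lsmDt={lsm_dt}"
            )
    return problems


# Hypothetical values: hourly forcing and model step, daily LSM output.
print(timestep_problems({"forceDt": 3600, "lsmDt": 3600, "lsmOutDt": 86400}))
```

An empty list means no problems were found under the assumed rule.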

### hydroIO
Contains all the options related to hydro output.

|Argument|Required|Description|
|-|-|-|
|**rstType** | Req. |Flag for overwriting accumulation variables in the restart file. |
|**SplitOutputCount**| Req.| 1 - output one CHANOBS file per timestep, 0 - output one file containing all timesteps: CHANOBS_DOMAIN1.nc |
|**ioConfigOutputs** | Req. |Output flag for variable grouping in WRF-Hydro. |
|**ioFormOutputs** | Req. |Flag specifying the output format. |
|**chrtoutDomain** | Req. |Flag to turn on CHRTOUT_DOMAIN files. |
|**chanObsDomain** | Req. |Flag to turn on CHANOBS_DOMAIN files. This needs to be set to 1 if enableStreamflowCalib is set to 1. |
|**chrtoutGrid** | Req. |Flag to turn on CHRTOUT_GRID files. |
|**lsmDomain** | Req. |Flag to turn on LSMOUT_DOMAIN files. |
|**rtoutDomain** | Req.| Flag to turn on RTOUT_DOMAIN files. |
|**gwOut** | Req. | Flag to turn on GWOUT_DOMAIN files. |
|**lakeOut** | Req. |Flag to turn on LAKEOUT_DOMAIN files. |
|**frxstOut** | Req. |Flag to turn on FRXST output text files. |
|**resetHydroAcc** | Req. |Flag to reset accumulation variables in the restart files. |
|**streamOrderOut** | Req.| Flag to specify the minimum Strahler order to output. |

### hydroPhysics
Hydro physics options, as well as a few other related options such as the model timestep.

|Argument|Required|Description|
|-|-|-|
|**dtChSec** | Req.| Channel routing timestep in seconds. |
|**dtTerSec** | Req. |Surface and subsurface routing timestep in seconds. |
|**subRouting** | Req.| Flag to turn on/off subsurface routing. |
|**ovrRouting** | Req. | Flag to turn on/off overland flow routing. |
|**channelRouting** | Req. | Flag to turn on/off channel routing. |
|**rtOpt** | Req. | Overland/subsurface routing option. |
|**chanRtOpt** | Req. |Channel routing option. |
|**udmpOpt** | Req. | User-defined spatial mapping flag to turn on/off. |
|**gwBaseSw** | Req. | Groundwater option. |
|**gwRestart** | Req. | Flag to use restart states in groundwater scheme. |
|**enableCompoundChannel** | Req. | Flag to activate compound channel in the hydro.namelist. 1 - on, 0 - off.|
|**compoundChannel** | Req. |Activation flag for compound channel. enableCompoundChannel must be on. |
|**enableGwBucketLoss** | Req. |Flag to activate groundwater bucket loss in hydro.namelist. 1 - on, 0 - off. |
|**bucket_loss** | Req.| Activation flag for groundwater bucket loss. enableGwBucketLoss must be on. |
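
Two of the options above only take effect when a companion flag is switched on. A small check for this kind of dependency could look like the sketch below. It assumes the options have already been read into a dictionary of integers where 1 means "on"; the function name and that representation are illustrative assumptions.

```python
# Option -> flag that must be on for the option to be used (from the table).
DEPENDENCIES = {
    "compoundChannel": "enableCompoundChannel",
    "bucket_loss": "enableGwBucketLoss",
}


def dependency_problems(opts):
    """List options that are on while their enabling flag is off."""
    return [
        f"{child} is on but {parent} is off"
        for child, parent in DEPENDENCIES.items()
        if opts.get(child, 0) == 1 and opts.get(parent, 0) != 1
    ]


# Hypothetical hydroPhysics values with one inconsistent pair.
example = {
    "enableCompoundChannel": 1, "compoundChannel": 1,
    "enableGwBucketLoss": 0, "bucket_loss": 1,
}
print(dependency_problems(example))
```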

## Pre Calibration Step 2: Configure Calibration Parameter Selection "calib_params.tbl"
In addition to the `setup.parm` file, the `calib_params.tbl` file tells the workflow which model parameters to calibrate, along with the range of values to search and the default value for each parameter. A template table is located under `PyWrfHydroCalib/setup_files/calib_params.tbl`, which you can copy and edit for your own calibration experiment.


```bash
cd /home/docker/PyWrfHydroCalib/setup_files
cat calib_params.tbl
```

Within this table, you will find:

- Parameter -- all the potential parameters to be calibrated
- calib_flag -- either 1 or 0. This flag turns calibration on (1) or off (0) for that parameter.
- minValues -- the lower bound of the calibration range for the parameter
- maxValues -- the upper bound of the calibration range for the parameter
- ini -- the default value, used either as the un-calibrated value or as the initial value going into the calibration workflow.
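
Before starting a job it can be useful to list which parameters are switched on and to confirm that each initial value lies inside its range. The sketch below does this for a small made-up table. It assumes the file is comma-separated with the header names shown in `SAMPLE_TABLE`; both the header spelling and the numbers are illustrative assumptions, so adjust them to match your copy of the table.

```python
import csv
import io

# Hypothetical excerpt; real tables list many more parameters.
SAMPLE_TABLE = """parameter,calib_flag,minValue,maxValue,ini
bexp,1,0.4,1.9,1.0
smcmax,1,0.8,1.2,1.0
slope,0,0.0,1.0,0.1
"""


def active_parameters(table_text):
    """Return {name: (min, max, ini)} for rows with calib_flag == 1."""
    active = {}
    for row in csv.DictReader(io.StringIO(table_text)):
        if int(row["calib_flag"]) != 1:
            continue  # parameter is not being calibrated
        low, high, ini = (float(row[k]) for k in ("minValue", "maxValue", "ini"))
        if not low <= ini <= high:
            raise ValueError(f"{row['parameter']}: ini {ini} outside [{low}, {high}]")
        active[row["parameter"]] = (low, high, ini)
    return active


print(active_parameters(SAMPLE_TABLE))
```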

The table below summarizes the parameters that were used in the NWMv21 calibration, along with the newly exposed parameters that we are currently testing as part of NWMv30 R&D.

|Name|Units|Multiplier|NWMv21|NWMv30|Description|
|-|-|-|-|-|-|
|**BEXP** | dimensionless| Yes | Yes | Yes | *Pore size distribution index*: BEXP controls the shape of the soil water retention curve, and therefore how slowly/quickly water will move through the soil column. It can take any positive value: small values indicate a medium with a wide range of pore sizes, and high values indicate a more uniform pore size distribution. Higher values of BEXP result in higher effective saturation values and consequently higher relative hydraulic conductivity. With higher BEXP values, water moves faster, and one would observe higher peak values in the hydrographs and lower baseflow values between events. |
|**SMCMAX** | volumetric fraction| Yes | Yes | Yes | *Saturation soil moisture content (i.e., porosity)*|
|**DKSAT** |m/s | Yes | Yes | Yes | *Saturated hydraulic conductivity*: As with most physically-based hydrological models, the soil saturated hydraulic conductivity (DKSAT) controls the speed at which water moves through the subsurface. This is a sensitive parameter in the model, and while easy to measure at the point scale, DKSAT is tricky to estimate at the scale of kilometers. Initial values are estimated based on soil texture class, but reported ranges have large (many orders of magnitude) variability.|
|**RSURFEXP** | dimensionless| No | Yes | Yes | *Exponent in the resistance equation for soil evaporation*: RSURFEXP controls the shape for the resistance curve as it relates to soil moisture, higher RSURFEXP will result in larger resistance for a given soil moisture and hence less soil evaporation.|
|**REFKDT** | unitless| No | Yes | No | *Surface runoff parameter*: REFKDT is a surface runoff parameter that controls how easily precipitation reaching the surface infiltrates into the soil column versus staying on the surface where it can become surface runoff. Higher values of REFKDT lead to more infiltration and less surface (fast) runoff. This tunable parameter can be set to a relatively high value (e.g., 3.0) suitable for running the column land surface model only. When activating terrain routing to explicitly model these processes, we often reduce this parameter. It should be also noted that this parameter is time step dependent and in the case of finer LSM timestep you may want to reduce this parameter to compensate for the more frequent calls to the vertical infiltration scheme.|
|**SLOPE** |unitless | No | Yes | Yes | *Linear scaling of "openness" of bottom drainage boundary*: This is another important parameter that affects water partitioning in Noah/NoahMP. Originally estimated based on land surface topography (hence the name SLOPE), the SLOPE parameter actually controls how open or closed the bottom boundary of the soil column is. Values range from 0 to 1, where 0 is a completely closed bottom boundary and 1 is completely open. Lower SLOPE values will keep more water in the soil column, while higher values will allow more water to drain to the channel or to deeper baseflow stores, depending on the selected baseflow physics options.|
|**RETDEPRTFAC** | unitless | No | Yes | Yes | *Multiplier on retention depth limit*: Ponded water on the surface above this retention depth threshold can be moved around the landscape via overland flow. Maximum retention depth is variable by terrain slope where higher slopes have lower maximum retention depth, while flat areas have higher initial values. The default value in the NWM code is quite small (~0.001mm), however, in many regions landscape features like wetlands, small detention ponds, and heavy vegetation litter/debris can trap water on the land surface. Increasing the RETDEPRTFAC multiplier will hold more ponded water on the surface before it becomes runoff.|
|**LKSATFAC** |unitless | No | Yes | Yes | *Multiplier on lateral hydraulic conductivity (controls anisotropy between vertical and lateral conductivity)*: By default, lateral conductivity matches vertical conductivity. However, in the real world we frequently see many orders of magnitude higher conductivities in the lateral direction vs. the vertical direction (due to soil stratigraphy, preferential flow paths caused by roots and animals, etc.). LKSATFAC is a parameter to adjust the lateral saturated hydraulic conductivity and account for this anisotropy.|
|**ZMAX** | mm | No | Yes | Yes | *Maximum groundwater bucket depth before "spilling" occurs*: It should be noted that ZMAX has no physical basis, as the bucket model is a simple conceptual model. |
|**EXPON** |dimensionless | No | Yes | Yes | *Exponent controlling rate of bucket drainage as a function of depth*|
|**LOSS** | unitless | No | No (used in Hi) | Testing | |
|**CWPVT** | 1/m | Yes | Yes | Yes | *Canopy wind parameter for canopy wind profile formulation*: It is a canopy wind absorption/attenuation parameter; wind helps to remove saturated air from the canopy. Higher values mean more wind attenuation by the canopy, which reduces transpiration. Higher CWPVT values also reduce snow sublimation, which results in more snow in the pack, increases SWE, and pushes the melt season slightly later. |
|**VCMX25** | umol/m2/s | Yes | Yes | Yes | *Maximum rate of carboxylation at 25°C*|
|**MP** | unitless | Yes | Yes | Yes | *Slope of Ball-Berry conductance relationship*: Larger values of MP indicate that the leaf consumes more water to produce the same carbon mass and will therefore have greater transpiration.|
|**MFSNO** | dimensionless | Yes | Yes | Yes | *Melt factor for snow depletion curve* (relationship between the snow covered fraction and snow depth) in the melting season. Increasing MFSNO yields a smaller snow cover fraction for the same snow height, reduces SWE in the snowpack and moves the timing of snowmelt forward in time. Therefore, the streamflow peaks are earlier in time and lower for larger values of MFSNO.|
|**SSI** | unitless | No | No | Testing | |
|**TAU0** | unitless | No | No | Testing | |
|**SCAMAX** | unitless | No | No | Testing | |
|**RSURFSNOW** | unitless | No | No | Testing | |
|**SNOWRETFAC** | unitless | No | No | Testing | |


### Sensitivity Analysis: 
We highly encourage performing a sensitivity analysis over your region of interest to help determine which parameters have a significant impact on the hydrologic response. A limited sensitivity study was performed on a subset of the above parameters, in which sensitivity was measured as the change in streamflow error metrics (such as bias and correlation coefficient) resulting from a parameter change. It showed that BEXP, SMCMAX, DKSAT and MP are the most sensitive parameters when sensitivity is measured using streamflow bias. ZMAX, EXPON, SLOPE, and REFKDT also become sensitive when looking at error metrics such as the correlation coefficient, where the timing of the flow matters. A separate sensitivity analysis focused on the snow parameters showed that CWPVT, MFSNO, TAU0, SNOWRETFAC, RSURFSNOW, SCAMAX and SSI are the most sensitive of the snow-related parameters.


It is up to you to determine what range is best for your calibration experiment; the suggested ranges above are based on a literature review and our team's best estimate.


### Absolute value versus multipliers: 
Some of these parameters are a function of vegetation type (CWPVT, VCMX25, MP) or soil type (BEXP, SMCMAX, DKSAT), so we calibrate a multiplier in order to preserve the spatial pattern of those parameters. MFSNO is also formulated as a multiplier in the calibration code, as it was in the earlier NWM version. The rest of the parameters are substitute values that overwrite the default values.
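
The difference between the two treatments can be illustrated with a short sketch. The multiplier set below follows the "Multiplier" column of the parameter table; the function name and the use of a plain list to stand in for a spatial parameter field are assumptions for illustration only.

```python
# Parameters calibrated as multipliers (from the "Multiplier" column above).
MULTIPLIER_PARAMS = {"BEXP", "SMCMAX", "DKSAT", "CWPVT", "VCMX25", "MP", "MFSNO"}


def apply_parameter(name, calibrated_value, default_field):
    """Return the parameter field after applying one calibrated value.

    Multipliers scale every default value, preserving the spatial pattern;
    all other parameters are replaced by the calibrated value everywhere.
    """
    if name in MULTIPLIER_PARAMS:
        return [value * calibrated_value for value in default_field]
    return [calibrated_value for _ in default_field]


# Hypothetical default fields for two grid cells.
print(apply_parameter("BEXP", 2.0, [1.0, 3.0]))   # scaled, pattern preserved
print(apply_parameter("SLOPE", 0.3, [0.1, 0.2]))  # overwritten everywhere
```

This is also why the range for a multiplier parameter is typically centered near 1, while the range for a substitute parameter is expressed in the parameter's own units.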


### Full Routing versus Long Range configuration: 
Note that the RETDEPRTFAC and LKSATFAC parameters are only active in the full routing (full physics) configuration. They are not used, and therefore not calibrated, in the long range configuration, since terrain routing is inactive there.

## Conclusion:

Once the `setup.parm` file and the calibration parameters are set and verified, we can begin the process of initializing the database for calibration. Proceed to *Lesson 2 - Create Database*.