# Lesson 2 - Initialization

## Overview:
Once you have the necessary library dependencies installed, the model domain set up, and a compiled WRF-Hydro executable, you are ready to begin setting up your calibration workflow. The next steps are to review the list of parameters you would like to calibrate and to finalize the options in the setup file. We reviewed both of these files in Lesson 1, and for this exercise we will use the default setup options given there.

In this lesson, we will go through the first 3 of the 6 steps required in the calibration process; the remaining steps will be covered in the next lesson (Lesson 3). Here, we will:

1. Create an empty SQL database to store all of the necessary and auxiliary metadata for the calibration experiments, and ultimately the final results of the basin calibration. 
2. Enter the basin metadata into the SQL database.
3. Initialize an experiment using the setup file reviewed in Lesson 1. 


## Step 1: Initialize Database:

Most of the information about the calibration (domains, experiments, job status, and calibration results) is stored in a database. The database contains six main tables:

| Database Table | Description | 
| ------------- | ------------- |
|*Domain_Meta*| contains metadata about each basin/domain you are using in your calibration efforts. There is no limit to the number of domains you can enter; it depends on the scope of the experiment. Information is entered into this table by the `inputDomainMeta.py` program. |
| *Job_Meta* | contains metadata about the calibration experiment being run. This information is entered into the table when `jobInit.py` runs successfully to completion. Key table variables (`calib_complete`, `valid_complete`, `su_complete`) are updated throughout the calibration workflow as specific tasks are completed. Other variables in this table are used during the workflow to create the namelist files and the symbolic links to the input files needed to execute the model. A few more columns, related to the sensitivity analysis available in the package, Slack messaging, and job submission properties, are not explained here. |
| *Job_Params* | describes the various parameters chosen for calibration for this particular experiment across all the basins. |
| *Calib_Params* | a dynamic table updated as the calibration workflow works through the model iterations. It stores the parameter values calculated after every model iteration. |
| *Calib_Stats* | a dynamic table updated as the calibration workflow works through the model iterations. It stores the analysis statistics for each model iteration, along with a status value that helps the workflow monitor jobs. |
| *Valid_Stats* | describes the error metrics associated with both the default parameter values chosen at the beginning of the experiment and the final calibrated values. |

To create a database with the above (empty) tables, execute the `initDB.py` program.

Run the following command:

```bash
%%bash

# Create a top-level folder for all calibration output (-p: no error if it already exists)
mkdir -p /home/docker/wrf-hydro-training/output/Calibration/

python /home/docker/wrf-hydro-training/PyWrfHydroCalib/initDB.py --optDbPath /home/docker/wrf-hydro-training/output/Calibration/DATABASE.db
```

Running the above command creates a database with the empty tables described above.
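To confirm what `initDB.py` created, you can list the tables in the database file. The sketch below assumes that the `--optDbPath` database is an SQLite file (the `.db` extension suggests this, but the lesson does not state it) and that `python3` with its standard-library `sqlite3` module is available in the container. `list_db_tables` is a hypothetical helper name, not part of PyWrfHydroCalib.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PyWrfHydroCalib): list the tables in a
# database file, ASSUMING the file is an SQLite database.
list_db_tables() {
    local db_path="$1"
    # sqlite3.connect() would silently create a missing file, so check first.
    if [ ! -f "$db_path" ]; then
        echo "No database found at: $db_path" >&2
        return 1
    fi
    # Query sqlite_master with Python's standard-library sqlite3 module,
    # so the separate sqlite3 command-line tool is not required.
    python3 - "$db_path" <<'EOF'
import sqlite3
import sys

conn = sqlite3.connect(sys.argv[1])
try:
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
finally:
    conn.close()
for (name,) in rows:
    print(name)
EOF
}

# Example (database path from Step 1):
# list_db_tables /home/docker/wrf-hydro-training/output/Calibration/DATABASE.db
```

If the assumption holds, the printed names should correspond to the tables described at the start of this step.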

**Important Note**: You only need to execute this program once. If you try to run it again, you will receive an error indicating you have already created the database:
`ERROR: PATH/TO/DATABASE.db Already Exists.`
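Because `initDB.py` refuses to overwrite an existing database, the cell above fails with that error if the notebook is re-run. A minimal guard, sketched below, only calls `initDB.py` when the file is absent. `init_db_once` and the `CALIB_HOME` variable are hypothetical names used for illustration; the script path and the `--optDbPath` flag are the ones from the command above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: create the calibration database only if it does not exist.
# CALIB_HOME is an assumed variable pointing at the PyWrfHydroCalib checkout.
CALIB_HOME="${CALIB_HOME:-/home/docker/wrf-hydro-training/PyWrfHydroCalib}"

init_db_once() {
    local db_path="$1"
    if [ -e "$db_path" ]; then
        # initDB.py would stop with "Already Exists", so skip it instead.
        echo "Database already exists, skipping: $db_path"
        return 0
    fi
    # Make sure the parent directory exists before initDB.py writes the file.
    mkdir -p "$(dirname "$db_path")"
    python "$CALIB_HOME/initDB.py" --optDbPath "$db_path"
}

# Example:
# init_db_once /home/docker/wrf-hydro-training/output/Calibration/DATABASE.db
```

Skipping rather than deleting the existing file is deliberate: the database accumulates the results of the whole experiment, so silently recreating it would discard them.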

## Step 2: Entering Basin Information into the Database
The next step is to add the domain information to the database by filling in the `Domain_Meta` table. Some fields of the `Domain_Meta` table are provided by the user and others are filled in by the Python workflow. The table below lists all the fields in the `Domain_Meta` table that should or could be provided by the user, with their descriptions.


| Field Name | Optional | Description |
| ------------- | ------------- | ------------- |
|**gage_id**|No|A character entry that describes the ID associated with whatever stream gauge the user is calibrating against. For example, a USGS ID, CA DWR ID, etc.|
|**link_id**| No|A unique integer feature_id value that associates a unique point in the streamflow output files with the stream gauge being used for calibration. The user will need to determine which feature_id in their output files corresponds to the observation point being calibrated.|
|**domain_path**|No| A character entry that points to a directory containing all necessary input domain files for this domain (e.g. geogrid file, wrfinput file, etc.). It is assumed that a ‘FORCING’ subdirectory within this path contains the necessary input forcing files; this can also be a symbolic link to the actual forcing directory. There should also be an OBS directory that has the obsStrData.Rdata file containing the streamflow observations. |
|gage_agency|Yes|A character entry that describes the agency in charge of the reporting stream gauge (e.g. USGS, CA DWR, etc.).|
|geo_e|Yes|An integer entry describing the cutout from the parent NWM domain. Specifically, the NWM conus 1 km domain column that specifies the eastern edge of your cutout domain. If the domain is not a NWM cutout, this value will be -9999.|
|geo_w|Yes|An integer entry describing the cutout from the parent NWM domain. Specifically, the NWM conus 1 km domain column that specifies the western edge of your cutout domain. If the domain is not a NWM cutout, this value will be -9999.|
|geo_s|Yes|An integer entry describing the cutout from the parent NWM domain. Specifically, the NWM conus 1 km domain row that specifies the southern edge of your cutout domain.  If the domain is not a NWM cutout, this value will be -9999.|
|geo_n|Yes|An integer entry describing the cutout from the parent NWM domain. Specifically, the NWM conus 1 km domain row that specifies the northern edge of your cutout domain. If the domain is not a NWM cutout, this value will be -9999.|
|hyd_e|Yes|An integer entry describing the cutout from the parent NWM domain. Specifically, the NWM conus 250 meter domain column that specifies the eastern edge of your cutout domain.  If the domain is not a NWM cutout, this value will be -9999.|
|hyd_w|Yes|An integer entry describing the cutout from the parent NWM domain. Specifically, the NWM conus 250 meter domain column that specifies the western edge of your cutout domain. If the domain is not a NWM cutout, this value will be -9999.|
|hyd_s|Yes|An integer entry describing the cutout from the parent NWM domain. Specifically, the NWM conus 250 meter domain row that specifies the southern edge of your cutout domain.  If the domain is not a NWM cutout, this value will be -9999.|
|hyd_n|Yes|An integer entry describing the cutout from the parent NWM domain. Specifically, the NWM conus 250 meter domain row that specifies the northern edge of your cutout domain. If the domain is not a NWM cutout, this value will be -9999.|
|site_name|Yes|A character entry giving a description of the stream gage|
|lat|Yes|The floating point latitude of the location of the stream gage|
|lon|Yes|The floating point longitude of the location of the stream gage|
|area_sqmi|Yes|The area of the watershed in square miles|
|area_sqkm|Yes|The area of the watershed in square kilometers|
|county-cd|Yes|An integer entry describing the county code for where the stream gage resides|
|state|Yes|The state the stream gage resides within|
|huc2|Yes|The HUC2 that this basin falls within|
|huc4|Yes|The HUC4 that the basin falls within|
|huc6|Yes|The HUC6 that the basin falls within|
|huc8|Yes|The HUC8 that the basin falls within|
|ecol3|Yes|The ecological level 3 region the basin falls within|
|ecol4|Yes|The ecological level 4 region the basin falls within|
|rfc|Yes|The NWS River Forecast Center the basin falls within|
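The `area_sqmi` and `area_sqkm` columns describe the same quantity in different units, so if you only know one of them you can derive the other (1 mi = 1.609344 km exactly, so 1 mi² = 2.589988110336 km²). The helpers below are a hypothetical convenience, not part of the workflow; they do the conversion with `awk` and round to two decimals.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: convert a watershed area between square miles and
# square kilometers (1 mi^2 = 2.589988110336 km^2), rounded to 2 decimals.
sqmi_to_sqkm() {
    awk -v a="$1" 'BEGIN { printf "%.2f\n", a * 2.589988110336 }'
}

# Inverse conversion, for when only area_sqkm is known.
sqkm_to_sqmi() {
    awk -v a="$1" 'BEGIN { printf "%.2f\n", a / 2.589988110336 }'
}
```

For example, `sqmi_to_sqkm 100` prints `259.00`.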

The above information should be collected for all the calibration basins and saved in a CSV file, for example `domainMeta.csv` in the `setup_files` directory. Optional entries can be left as NA or -9999. Let us review the content of the domainMeta file provided for our experiment.

```bash
%%bash
cat /home/docker/wrf-hydro-training/PyWrfHydroCalib/setup_files/domainMeta.csv
```

**NOTE** The workflow expects the domain files, forcing data, and observations to exist in the folder `/home/docker/wrf-hydro-training/output/Calibration/13010065`. Let's create that folder and place (symlink) all the required items there.



```bash
%%bash
# Create an empty folder for the basin (-p: no error if it already exists)
mkdir -p /home/docker/wrf-hydro-training/output/Calibration/13010065
cd /home/docker/wrf-hydro-training/output/Calibration/13010065

# Symlink the domain files from the subsetting exercise
ln -sf /home/docker/wrf-hydro-training/output/subsetting/13010065/* .

# Symlink the forcing files
# (-n: on a re-run, replace an existing FORCING link instead of creating a link inside it)
ln -sfn /home/docker/wrf-hydro-training/example_case/FORCING .

# Symlink the observation file used by the calibration workflow
mkdir -p /home/docker/wrf-hydro-training/output/Calibration/13010065/OBS
cd /home/docker/wrf-hydro-training/output/Calibration/13010065/OBS
ln -sf /home/docker/wrf-hydro-training/PyWrfHydroCalib/setup_files/obsStrData.Rdata .
```
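Before running `inputDomainMeta.py`, it can save a round-trip to check that the basin directory has the layout described in this step: a `FORCING` subdirectory (or symlink) and an `OBS` directory containing `obsStrData.Rdata`. The sketch below checks only those two items; the domain files themselves (geogrid, Fulldom, etc.) are left to `inputDomainMeta.py`, which reports missing files. `check_domain_dir` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check: report missing items that the workflow expects
# inside a basin's domain directory. Returns non-zero if anything is missing.
check_domain_dir() {
    local dir="$1"
    local missing=0
    # -d follows symlinks, so a symlinked FORCING directory also passes.
    if [ ! -d "$dir/FORCING" ]; then
        echo "Missing FORCING directory in $dir"
        missing=1
    fi
    # -e also follows symlinks, so a dangling link is reported as missing.
    if [ ! -e "$dir/OBS/obsStrData.Rdata" ]; then
        echo "Missing OBS/obsStrData.Rdata in $dir"
        missing=1
    fi
    if [ "$missing" -eq 0 ]; then
        echo "OK: $dir"
    fi
    return "$missing"
}

# Example:
# check_domain_dir /home/docker/wrf-hydro-training/output/Calibration/13010065
```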

Now let's add the above information to the `Domain_Meta` table in the database by calling the program `inputDomainMeta.py`.

```bash
%%bash
python /home/docker/wrf-hydro-training/PyWrfHydroCalib/inputDomainMeta.py /home/docker/wrf-hydro-training/PyWrfHydroCalib/setup_files/domainMeta.csv --optDbPath /home/docker/wrf-hydro-training/output/Calibration/DATABASE.db
```

Running the above command fills in the `Domain_Meta` table with the information provided in the domainMeta file (one entry for each basin) and also automatically fills out a few more columns. The `dirname` field is crucial, as this is where all the domain files associated with your basin are located (geogrid, Fulldom, etc.). Within that directory, the workflow expects a `FORCING` subdirectory containing all the necessary forcing files, or symbolic links to those files. There should also be an `OBS` directory that has the `obsStrData.Rdata` file containing the streamflow observations, as well as `obsSnowData.Rdata` and `obsSoilData.Rdata` if calibrating to snow and soil moisture observations, respectively. The table below lists all the fields that are automatically filled out by the Python workflow (most of them are filled out based on the directory name).


| Field Name | Description |
| ------------- | ------------- |
|domainID|A unique integer ID associated with a particular domain in the table. This value is automatically generated as the information is entered into the database.|
|geo_file|A character entry describing the path to the geogrid file necessary to run the model|
|land_spatial_meta_file|A character entry describing the path to the optional 2D NetCDF file created by the WRF-Hydro GIS pre-processing for creating CF-compliant land surface output. This entry will be -9999 if missing.|
|wrfinput_file|A character entry describing the path to the wrfinput file necessary to run the model|
|soil_file|A character entry describing the path to the soil_properties.nc file necessary to run the model. This is a key 2D file containing parameters that are adjusted during the calibration process.|
|fulldom_file|A character entry describing the path to the Fulldom.nc file necessary to run the model. This is a key 2D file containing parameters that are adjusted during the calibration process. |
|rtlink_file|A character entry describing the path to the route link file necessary to run the model. For non-NWM calibrations, this entry will be -9999. |
|spweight_file|A character entry describing the path to the spatial weight file necessary to run the model. For non-NWM calibrations, this entry will be -9999. |
|gw_file|A character entry describing the path to the groundwater bucket parameter file necessary to run the model. This is a key parameter file that gets adjusted during the calibration process. |
|gw_mask|A character entry describing the path to the groundwater mask file used for groundwater configuration. If not used, this entry will be -9999.|
|lake_file|A character entry describing the path to the lake parameter file necessary to run lakes within the model. If your domain does not contain lakes, this value will be set to -9999. |
|forcing_dir|A character entry describing the path to the directory containing necessary forcing files to run the model|
|obs_file|A character entry describing the path to the pre-processed R file containing the observed streamflow|
|dx_hydro|The grid spacing on the routing grid. This is calculated by the workflow.|
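To see what `inputDomainMeta.py` actually wrote, you can print the rows of `Domain_Meta`. As with any direct query of the database in this lesson, the sketch assumes the database is an SQLite file and uses Python's standard-library `sqlite3` module. `show_table` is a hypothetical helper; it prints whatever columns the table actually has rather than assuming the column names listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every row of one table as "column=value" pairs,
# ASSUMING the calibration database is an SQLite file.
show_table() {
    local db_path="$1" table="$2"
    # sqlite3.connect() would silently create a missing file, so check first.
    if [ ! -f "$db_path" ]; then
        echo "No database found at: $db_path" >&2
        return 1
    fi
    python3 - "$db_path" "$table" <<'EOF'
import sqlite3
import sys

db_path, table = sys.argv[1], sys.argv[2]
conn = sqlite3.connect(db_path)
try:
    # Table names cannot be bound as SQL parameters, so check the name
    # against the tables that actually exist before using it in the query.
    tables = {name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        sys.exit(f"No table named {table!r} in {db_path}")
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        print(", ".join(f"{c}={v}" for c, v in zip(columns, row)))
finally:
    conn.close()
EOF
}

# Example:
# show_table /home/docker/wrf-hydro-training/output/Calibration/DATABASE.db Domain_Meta
```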

**Potential errors may arise if:**

* You did not enter the correct number of columns into the .csv file
* The directory you entered for the input domain files does not exist.
* Expected files within the directory (FORCING subdirectory, geogrid, Fulldom, etc.) do not exist.
* The headers in the .csv file do not match the format the program expects, or contain incorrect column names. For this reason, it is recommended to make a copy of the included template file and edit it for your basins.
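The first error in this list, a row with the wrong number of columns, is easy to check before running `inputDomainMeta.py`. The sketch below is a hypothetical helper, not part of the workflow. It parses the file with Python's `csv` module, so quoted fields such as site names containing commas are handled correctly, and reports any row whose field count differs from the header's. It does not check the header names themselves; compare those against the template file.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check: verify that every data row in a domain metadata CSV
# has the same number of fields as the header row.
check_csv_columns() {
    local csv_path="$1"
    python3 - "$csv_path" <<'EOF'
import csv
import sys

path = sys.argv[1]
with open(path, newline="") as f:
    rows = list(csv.reader(f))
if not rows:
    sys.exit(f"{path} is empty")
expected = len(rows[0])
bad = 0
# Row numbers are CSV record numbers (header = 1); they match file line
# numbers unless a quoted field spans several lines.
for lineno, row in enumerate(rows[1:], start=2):
    if not row:
        continue  # csv.reader yields [] for blank lines; ignore them
    if len(row) != expected:
        print(f"Row {lineno}: {len(row)} fields, header has {expected}")
        bad += 1
if bad:
    sys.exit(1)
print(f"OK: all rows have {expected} fields")
EOF
}

# Example:
# check_csv_columns /home/docker/wrf-hydro-training/PyWrfHydroCalib/setup_files/domainMeta.csv
```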

Additionally, you may receive warning messages if certain optional files are not found. For example, the workflow looks for a lake parameter file; if your model domain does not contain lakes, this file is not necessary, and the workflow will simply warn that it was not found. Once this step is complete for all the basins you plan on calibrating, you are ready to finalize your configuration file, `setup.parm`, and initialize your experiment.

## Step 3: Initialize Model

Once you are satisfied with the `setup.parm` file and `calib_parms.tbl` (discussed in Lesson 1) and have finished Steps 1 and 2, you are ready to initialize your experiment using `jobInit.py`.

This program uses the parameter table, the `setup.parm` file, and a user-specified job ID (passed with `--optExpID`). After making any desired changes to the setup files, enter the following command:


```bash
%%bash
python /home/docker/wrf-hydro-training/PyWrfHydroCalib/jobInit.py /home/docker/wrf-hydro-training/PyWrfHydroCalib/setup_files/setup.parm --optExpID 1 --optDbPath /home/docker/wrf-hydro-training/output/Calibration/DATABASE.db
```

##### Important Note:

The job ID must be an integer. The program does some broad checking of the options entered in the `setup.parm` file to make sure they make reasonable sense before proceeding. However, it is up to you to ensure you are choosing the right modeling options for your experiment.
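Since the job ID must be an integer, a thin wrapper can reject anything else before `jobInit.py` is called. This is a sketch: `run_job_init` and `CALIB_HOME` are hypothetical names, while the script, the positional setup-file argument, and the `--optExpID`/`--optDbPath` flags are the ones used in the command above. It only accepts plain digit strings such as `1`, which is slightly stricter than "any integer".

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around jobInit.py that validates the job ID first.
CALIB_HOME="${CALIB_HOME:-/home/docker/wrf-hydro-training/PyWrfHydroCalib}"

run_job_init() {
    local setup_file="$1" job_id="$2" db_path="$3"
    # Accept only plain digit strings (e.g. "1"); reject "", "1.5", "abc".
    case "$job_id" in
        ''|*[!0-9]*)
            echo "Job ID must be an integer, got: '$job_id'" >&2
            return 2
            ;;
    esac
    if [ ! -f "$setup_file" ]; then
        echo "Setup file not found: $setup_file" >&2
        return 1
    fi
    python "$CALIB_HOME/jobInit.py" "$setup_file" \
        --optExpID "$job_id" --optDbPath "$db_path"
}

# Example:
# run_job_init "$CALIB_HOME/setup_files/setup.parm" 1 \
#     /home/docker/wrf-hydro-training/output/Calibration/DATABASE.db
```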


##### What Happens:
* Each domain in the `Domain_Meta` table now has a subdirectory, named by the `gage_id` listed in the `Domain_Meta` table, at the location specified in the `setup.parm` file. Each domain directory contains the following subdirectories:

| SubDirectory Name | Description |
| ------------- | ------------- |
| FORCING | A symbolic link to the forcing directory for this particular basin |
| OBS | The directory containing symbolic links to the observation files necessary for the calibration workflow |
| RUN.CALIB | The directory that contains output for the calibration iterations. |
|RUN.SPINUP|The directory that contains output for the calibration spinup. |
|RUN.VALID|The directory that contains output for the calibration validation.|

* The tables `Job_Meta` and `Job_Params` in the database are filled out using the information provided in the setup file. `Job_Params` is a table describing the various parameters chosen for calibration for this particular experiment across all the basins. It has the following columns:

| Field Name | Description |
| ------------- | ------------- |
|jobID|An integer entry connecting the table to the unique job ID created during initialization of your experiment|
|param|A character entry describing the name of the parameter being calibrated (e.g. `bexp`, `refkdt`)|
|defaultValue|Default value used to initialize the parameter of interest for the first model iteration|
|min|The minimum possible parameter value that can be searched for during the calibration workflow|
|max|The maximum possible parameter value that can be searched for during the calibration workflow|
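A quick consistency check on what `jobInit.py` stored is that every default value lies between its min and max. The sketch assumes an SQLite database and the `Job_Params` column names listed above (`jobID`, `param`, `defaultValue`, `min`, `max`); `check_param_ranges` is hypothetical, so verify the column names against your own database first (for example with SQLite's `PRAGMA table_info(Job_Params)`).

```bash
#!/usr/bin/env bash
# Hypothetical check: flag Job_Params rows whose default lies outside [min, max].
# ASSUMES an SQLite database and the column names documented in this lesson.
check_param_ranges() {
    local db_path="$1" job_id="$2"
    if [ ! -f "$db_path" ]; then
        echo "No database found at: $db_path" >&2
        return 1
    fi
    python3 - "$db_path" "$job_id" <<'EOF'
import sqlite3
import sys

db_path, job_id = sys.argv[1], int(sys.argv[2])
conn = sqlite3.connect(db_path)
try:
    # "min" and "max" are quoted so they are read as column names.
    rows = conn.execute(
        'SELECT param, defaultValue, "min", "max" FROM Job_Params WHERE jobID = ?',
        (job_id,),
    ).fetchall()
finally:
    conn.close()
# Each row is (param, default, min, max).
bad = [r for r in rows if not (float(r[2]) <= float(r[1]) <= float(r[3]))]
for param, default, lo, hi in bad:
    print(f"{param}: default {default} outside [{lo}, {hi}]")
print(f"Checked {len(rows)} parameter(s), {len(bad)} out of range")
sys.exit(1 if bad else 0)
EOF
}

# Example:
# check_param_ranges /home/docker/wrf-hydro-training/output/Calibration/DATABASE.db 1
```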

`Job_Meta` is a table that contains metadata about the calibration experiment being run. This information is entered into the table when `jobInit.py` runs successfully to completion. Key table variables (`calib_complete`, `valid_complete`, `su_complete`) are updated throughout the calibration workflow as specific tasks are completed. Other variables in this table are used during the workflow to create the namelist files and the symbolic links to the input files needed to execute the model. A few more columns, related to the sensitivity analysis available in the package, Slack messaging, and job submission properties, are not explained here. The table has the following columns:

| Field Name | Description |
| ------------- | ------------- |
|jobID|A unique integer value associated with the calibration experiment. This value is selected by the user at the time of the experiment initiation.|
|Job_Directory|A character entry describing the top-level directory containing all output for your calibration experiment. Each basin will be a sub-directory under this top-level directory. The jobInit program will create and populate sub-directories appropriately.|
|calib_flag|Flag indicating if this is a calibration experiment. Flag = 1 indicates this is a calibration experiment.|
|calib_table|Path to the calibration table containing the parameters to be calibrated, with their default, min, and max values.|
|date_su_start|A datetime entry that specifies the start of the model spinup period|
|date_su_end|A datetime entry that specifies the end of the model spinup period|
|su_complete|A 0/1 integer entry indicating if the spinup has completed for all basins in the experiment|
|date_calib_start|A datetime entry that specifies the start of the calibration period|
|date_calib_end|A datetime entry that specifies the end of the calibration period|
|date_calib_start_eval|The beginning date within the calibration period that will be used to perform analysis against observations for parameter adjustment. The analysis period will run until the end of the calibration period|
|num_iter|An integer entry indicating the number of model iterations to take place for calibrations|
|calib_complete|A 0/1 integer entry indicating if the calibration has completed for all basins in the experiment|
|valid_start_date|A datetime entry that specifies the start date for the validation simulation period|
|valid_end_date|A datetime entry that specifies the end date for the validation simulation period|
|valid_start_date_eval|A datetime entry that specifies the beginning date within the validation period to perform analysis on. Analysis will take place from this date until the end of the validation simulation period.|
|valid_complete|A 0/1 integer entry indicating if the validation has completed within all basins in the calibration experiment|
|acct_key|An optional character string indicating an account key to run the jobs on if you are running with BSUB/QSUB/Slurm on an HPC environment|
|que_name|An optional character string telling the workflow which queue to place model simulations into when submitting jobs.|
|num_cores_model|An integer entry indicating the number of CPU cores to execute the model over|
|num_nodes_model|An integer entry indicating the number of compute nodes running the model simulations over. If you are using mpiexec/mpirun, set this to 1 in the configuration file.|
|num_cores_per_node|Number of available cores per node.|
|job_run_type|An integer entry indicating the method of executing the model simulations|
|exe|A character entry pointing to the compiled NWM/WRF-Hydro executable to run the simulations|
|num_gages|An integer entry indicating the total number of basins in the experiment |
|owner|The local owner on the computer running the simulations. This is established during the `jobInit.py` phase.|
|email|An email to pipe status/error messages to during the calibration workflow|
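`jobInit.py` does some broad checking of `setup.parm` on its own, but it can help to confirm that the period dates are in a sensible order before initializing. The sketch below is hypothetical: it takes the eight dates as `YYYY-MM-DD` strings and checks that spinup start < end, calibration start ≤ evaluation start < end, and the same for validation. It does not read `setup.parm` or the database, whose date formats this lesson does not show.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check of the experiment periods (dates as YYYY-MM-DD).

# Convert YYYY-MM-DD to an integer YYYYMMDD so dates compare numerically;
# returns non-zero if the string is not in that format.
date_to_int() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])
            local y rest
            y=${1%%-*}          # "2008"
            rest=${1#*-}        # "01-01"
            echo "$y${rest%%-*}${rest#*-}"
            ;;
        *)
            echo "Bad date (expected YYYY-MM-DD): $1" >&2
            return 1
            ;;
    esac
}

# check_order A B C: succeed if A <= B < C (e.g. start <= eval start < end).
check_order() {
    local a b c
    a=$(date_to_int "$1") || return 1
    b=$(date_to_int "$2") || return 1
    c=$(date_to_int "$3") || return 1
    [ "$a" -le "$b" ] && [ "$b" -lt "$c" ]
}

check_periods() {
    local su_start="$1" su_end="$2"
    local cal_start="$3" cal_eval="$4" cal_end="$5"
    local val_start="$6" val_eval="$7" val_end="$8"
    local status=0
    # Spinup has no evaluation date, so pass the start twice (start <= start < end).
    check_order "$su_start" "$su_start" "$su_end" || { echo "Spinup period out of order"; status=1; }
    check_order "$cal_start" "$cal_eval" "$cal_end" || { echo "Calibration period out of order"; status=1; }
    check_order "$val_start" "$val_eval" "$val_end" || { echo "Validation period out of order"; status=1; }
    [ "$status" -eq 0 ] && echo "All periods in order"
    return "$status"
}
```

Arguments follow the order of the `Job_Meta` fields above: spinup start and end, then calibration start, evaluation start, and end, then the same three for validation.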

Let us check the run directory that was created:

```bash
%%bash
ls /home/docker/wrf-hydro-training/output/Calibration/IWAA_Calib
```

```bash
%%bash
ls /home/docker/wrf-hydro-training/output/Calibration/IWAA_Calib/13010065
```
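Rather than reading the `ls` output by eye, you can check a basin directory for the five subdirectories listed in the "What Happens" table of this step. The helper below is hypothetical; the example path is the basin directory used in the `ls` command above.

```bash
#!/usr/bin/env bash
# Hypothetical check: confirm that jobInit.py created the expected
# subdirectories in one basin directory. Prints any that are missing.
check_basin_layout() {
    local basin_dir="$1"
    local missing=0 sub
    for sub in FORCING OBS RUN.CALIB RUN.SPINUP RUN.VALID; do
        # -d follows symlinks, so the FORCING link counts as a directory.
        if [ ! -d "$basin_dir/$sub" ]; then
            echo "Missing: $basin_dir/$sub"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] && echo "OK: $basin_dir"
    return "$missing"
}

# Example:
# check_basin_layout /home/docker/wrf-hydro-training/output/Calibration/IWAA_Calib/13010065
```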

## Conclusion:

Once all input and setup files are prepared and verified, and the experiment has been initialized, we can begin running the calibration. Proceed to Lesson 3.