<img src="statics/iguide_logo.png" width=200 height=200 />

# National Water Model (WRF-Hydro) on I-GUIDE Platform for Scalable Modeling of Hydroclimatic Extremes

<h4 style="color:red;"> Note: Some cells require User Interaction; "Run cell-by-cell" is recommended; "Run All" will not work. </h4>

As a use case to support and drive the development of scalable modeling of hydroclimatic extremes, a workflow system for setting up and executing subsets of the [National Water Model (NWM)](https://water.noaa.gov/about/nwm) has been prototyped. The NWM is an implementation of the [Weather Research and Forecasting Model Hydrological modeling system (WRF-Hydro)](https://ral.ucar.edu/projects/wrf_hydro) at the US continental scale. The cyberinfrastructure and workflows needed to set up and configure the data product subsets for a modeling domain of interest are complex and demand a high level of skill from researchers, which limits broad applicability. [I-GUIDE](https://iguide.illinois.edu/) builds on modeling capabilities developed by the [HydroShare](https://www.hydroshare.org/) and [CyberGIS-Compute](https://cybergisxhub.cigi.illinois.edu/knowledge-base/components/cybergis-compute/what-is-cybergis-compute/) projects, which enable scalable computing for the research community, and integrates data connectors and processors from the [GeoEDF](https://github.com/geoedf) project. Together these provide easy-to-use workflows for setting up and running WRF-Hydro, configured like the NWM, for any watershed within the NWM modeling domain. The goal is to make it easy for researchers to select a watershed of interest, set up the inputs for WRF-Hydro configured equivalently to the NWM, and run the model to reproduce NWM results. This then serves as a starting point for in-depth evaluation of NWM results in the watershed of interest and for research to improve understanding and modeling of the hydrological processes in that watershed, which could be fed back into the NWM. It also serves as a starting point for broader collaborations connecting with social and environmental research, one example being the geospatial aspects of perspectives from river-related organizations elicited through surveys and interviews.
Our work serves as a prototype of a general methodology enabled by the I-GUIDE Platform that broadens participation in modeling by reducing the need for software installation and configuration, enabling easy access to high-performance computing, and supporting reproducibility and interoperability. Because the I-GUIDE Platform combines cloud and high-performance computing, this modeling workflow can scale up to larger and more complex domains than researchers could address with their own personal computing resources.

<center><img src="statics/wrfhydro_flow.png" width=600 height=400 /></center>

A typical WRF-Hydro model run is composed of four essential elements: model configurations, domain data, forcing data, and model execution. The I-GUIDE Platform takes care of the last three elements as much as possible, so users can focus on the model configurations to explore scientific questions of interest. The diagram below shows the setup for a typical WRF-Hydro model on the I-GUIDE Platform.

<center><img src="statics/wrfhydro.png" width=800 height=600 /></center>

## Case Study

This map layout represents the Logan River Watershed in Utah. For this example, we have selected the upstream watershed (the highlighted one).

<center><img src="statics/Layout.jpg" width=400 height=300 /></center>

## Setup Simulation Parameters

<div class="alert alert-block alert-success">
<b>Import all libraries needed to run this Jupyter notebook</b> 

</div>

In [None]:
import os  # interact with the operating system (paths, directories)
import time  # time-related functions
from datetime import datetime, timedelta  # specify the time domain for analysis
import shutil  # high-level operations on files and directories
import requests  # make HTTP requests
import xarray as xr  # work with multidimensional datasets such as the NWM outputs
import matplotlib.pyplot as plt  # create plots
from IPython.display import display, HTML  # display warnings in the notebook

<div class="alert alert-block alert-success">
<b>Specify the spatial and time domains as well as the WRF-Hydro version </b> 

</div>

In [None]:
# huc12 id
huc12 = "160102030302"  # huc 12 to identify the spatial domain of interest

# Start at 00:00 (12AM)
start_datetime = datetime(2016, 1, 1)     # specify the simulation start time  (Year, Month, Day)
#                          Y   M   D
# End at 00:00 (12AM)
end_datetime = datetime(2016, 1, 7)     # specify the simulation end time  (Year, Month, Day)
#                        Y   M   D

# Version of the WRF-Hydro codebase on GitHub (tag/release/commit_id)
wrfhydro_version = "v5.2.0"

In [None]:
params_subset_domain = {"huc12_id": huc12, 
                        "start_date": start_datetime.strftime("%m/%d/%Y"), 
                        "end_date": end_datetime.strftime("%m/%d/%Y")}
params_subset_domain
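Before submitting the job, it can help to sanity-check the inputs. The sketch below is not part of the original workflow: `validate_subset_params` is a hypothetical helper, and it assumes that a HUC12 identifier is a 12-digit string and that dates use the `%m/%d/%Y` format shown above.

```python
from datetime import datetime


def validate_subset_params(params):
    """Hypothetical helper: raise ValueError if the subset parameters look malformed."""
    huc = params["huc12_id"]
    # Assumption: a HUC12 code is exactly 12 digits, kept as a string to preserve leading zeros
    if not (isinstance(huc, str) and len(huc) == 12 and huc.isdigit()):
        raise ValueError(f"huc12_id should be a 12-digit string, got {huc!r}")
    # Parse the dates with the same format the notebook uses when building the parameters
    start = datetime.strptime(params["start_date"], "%m/%d/%Y")
    end = datetime.strptime(params["end_date"], "%m/%d/%Y")
    if end <= start:
        raise ValueError("end_date must be later than start_date")
    return start, end


example = {"huc12_id": "160102030302", "start_date": "01/01/2016", "end_date": "01/07/2016"}
start, end = validate_subset_params(example)
print(start.date(), end.date())
```

In the notebook, the same check could be applied to `params_subset_domain` before opening the job submission UI.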

## Subset DOMAIN Files with GeoEDF Data Connector on CyberGIS-Compute

The WRF-Hydro DOMAIN files come from the [CUAHSI Domain Subsetter](https://subset.cuahsi.org/) service. I-GUIDE provides a reusable [GeoEDF Data Connector](https://dl.acm.org/doi/10.1145/3311790.3396631) ([CUAHSISubsetterInput-Connector](https://github.com/I-GUIDE/cybergis-compute-cuahsisubsetterinput-connector)) that makes requests to the CUAHSI Domain Subsetter REST APIs and retrieves domain files ready for model use. The GeoEDF Data Connector has been integrated into [CyberGIS-Compute](https://cybergis.github.io/cybergis-compute-python-sdk/reference.html) as a job that users can invoke from a Jupyter environment and execute on supported HPC resources. The subset domain files staged remotely are ready for use by the WRF-Hydro model on HPC, and users can optionally download the files from HPC back to Jupyter for local manipulation.

<h4 style="color:red;"> User Interaction Required </h4>

- Run the cell below 
- Click on "Submit Job" on the "Your Job Status" tabpage 
- Wait until Job is finished (5-8 mins)
- Switch to "Download Job Result" tabpage
- Choose "/" and click on Download
- Wait until downloading is finished
- Proceed to the next cell

In [None]:
import cybergis_compute_client
from cybergis_compute_client import CyberGISCompute

cybergis = CyberGISCompute(url="cgjobsup.cigi.illinois.edu", isJupyter=True, protocol="HTTPS", port=443, suffix="v2")
cybergis.show_ui(defaultJob="CUAHSI_Subsetter_Connector", input_params=params_subset_domain)

<div class="alert alert-block alert-info">
<b> NOTE: </b> Before running the next cell click on the download tab above <b> "Download Job Result" </b> and <b> "Download" </b> the outputs using <b> "/" </b>
</div>

Retain Domain Subsetter JobID for later reference

In [None]:
jobid_cuahsi_subset_domain = cybergis.job.id
jobid_cuahsi_subset_domain

Ensure results have been downloaded for future analysis

In [None]:
domain_output = cybergis.recentDownloadPath
print(domain_output)
if not os.path.isfile(os.path.join(domain_output, "Route_Link.nc")):
    display(HTML('<h4 style="color:red;">It appears you did not download the job results per instruction above, please double check!</h4>'))
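The single-file check above can be generalized to several files. The following is a minimal sketch, not the notebook's own method: `missing_files` is a hypothetical helper, and the list of expected names simply mirrors the DOMAIN files that the namelists in this notebook reference, on the assumption that they sit at the top level of the download folder. It is demonstrated on a temporary directory rather than on real job output.

```python
import os
import tempfile


def missing_files(directory, expected):
    """Return the names in `expected` that are not regular files in `directory`."""
    return [name for name in expected if not os.path.isfile(os.path.join(directory, name))]


# Assumption: these names mirror the files the namelists point to;
# the actual contents of the download folder may differ.
expected_domain_files = ["Route_Link.nc", "wrfinput_d0x.nc", "geo_em.d0x.nc", "spatialweights.nc"]

# Demonstration on a throwaway directory that holds only one of the files
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "Route_Link.nc"), "w").close()
    demo_missing = missing_files(demo_dir, expected_domain_files)

print(demo_missing)
```

Passing `domain_output` as the directory would report which of the listed files are absent from the downloaded results.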

### Set up parameters for the next job in the workflow

In [None]:
params_subset_forcing = {"Domain_Path": jobid_cuahsi_subset_domain, 
                        "start_date": start_datetime.strftime("%m/%d/%Y"), 
                        "end_date": end_datetime.strftime("%m/%d/%Y")}
params_subset_forcing

## Subset FORCINGS with GeoEDF Data Processor on CyberGIS-Compute

The WRF-Hydro FORCING files come from the [AORC](https://registry.opendata.aws/nwm-archive/) dataset hosted on AWS. I-GUIDE provides a reusable [GeoEDF Data Processor](https://dl.acm.org/doi/10.1145/3311790.3396631) ([SubsetForcingData-Processor](https://github.com/I-GUIDE/cybergis-compute-subsetaorcforcingdata-processor)) that subsets forcing files spatially and temporally. This GeoEDF Data Processor has been integrated into [CyberGIS-Compute](https://cybergis.github.io/cybergis-compute-python-sdk/reference.html) as a job that users can invoke from a Jupyter environment and execute on supported HPC resources. The subset forcing files staged remotely are ready for use by the WRF-Hydro model on HPC, and users can optionally download the files from HPC back to Jupyter for local manipulation.

<h4 style="color:red;"> User Interaction Required </h4>

- Run the cell below 
- Click on "Submit Job" on the "Your Job Status" tabpage 
- Wait until Job is finished (2-4 mins)
- Proceed to the next cell

In [None]:
import cybergis_compute_client
from cybergis_compute_client import CyberGISCompute

cybergis = CyberGISCompute(url="cgjobsup.cigi.illinois.edu", isJupyter=True, protocol="HTTPS", port=443, suffix="v2")
cybergis.show_ui(defaultJob="Subset_AORC_Forcing_Data_Processor", input_params=params_subset_forcing)

Retain Forcing Processor JobID for later reference

In [None]:
jobid_subset_forcing = cybergis.job.id
jobid_subset_forcing

## Prepare Model Configurations

In [None]:
# Create a "Simulation" directory
workspace = os.getcwd()
simulation_dir = os.path.join(workspace, 'Simulation')
if os.path.exists(simulation_dir):
    shutil.rmtree(simulation_dir)
os.makedirs(simulation_dir)
#List of files
os.listdir(simulation_dir)

In [None]:
! wget https://raw.githubusercontent.com/NCAR/wrf_hydro_nwm_public/v5.2.0/trunk/NDHMS/template/setEnvar.sh
! sed -i '/export HYDRO_D=0/c\export HYDRO_D=1' ./setEnvar.sh
! sed -i '/export SPATIAL_SOIL=0/c\export SPATIAL_SOIL=1' ./setEnvar.sh
! cat ./setEnvar.sh | grep -E 'HYDRO_D|SPATIAL_SOIL'
! mv ./setEnvar.sh {simulation_dir}

In [None]:
# namelist.hrldas --> START_YEAR START_MONTH START_DAY START_HOUR START_MIN RESTART_FILENAME_REQUESTED
start_year = start_datetime.year
start_month = "{:02d}".format(start_datetime.month)
start_day = "{:02d}".format(start_datetime.day)
start_hour = "{:02d}".format(start_datetime.hour)
start_minute = "{:02d}".format(start_datetime.minute)
khour = (end_datetime - start_datetime) / timedelta(hours=1)
khour = "{}".format(int(khour))
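As a worked example of the `KHOUR` calculation, the sketch below repeats the arithmetic for the dates used in this notebook. The extra guard against a fractional number of hours is an addition for illustration and is not in the original cell.

```python
from datetime import datetime, timedelta

start = datetime(2016, 1, 1)
end = datetime(2016, 1, 7)

# Dividing one timedelta by another gives a float number of hours
hours = (end - start) / timedelta(hours=1)
khour = int(hours)

# Guard against a window that is not a whole number of hours,
# which int() would otherwise truncate silently
if hours != khour:
    raise ValueError("simulation window is not a whole number of hours")

print(khour)  # 6 days * 24 hours = 144
```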

In [None]:
! rm -rf namelist.hrldas
! wget https://raw.githubusercontent.com/NCAR/wrf_hydro_nwm_public/v5.2.0/trunk/NDHMS/template/NoahMP/namelist.hrldas
! sed -i  '/HRLDAS_SETUP_FILE/c\HRLDAS_SETUP_FILE = "./DOMAIN/wrfinput_d0x.nc"' ./namelist.hrldas
! sed -i  '/START_YEAR/c\START_YEAR = '"$start_year" ./namelist.hrldas
! sed -i  '/START_MONTH/c\START_MONTH = '"$start_month" ./namelist.hrldas
! sed -i  '/START_DAY/c\START_DAY = '"$start_day" ./namelist.hrldas
! sed -i  '/START_HOUR/c\START_HOUR = '"$start_hour" ./namelist.hrldas
! sed -i  '/START_MIN/c\START_MIN = '"$start_minute" ./namelist.hrldas
! sed -i  '/KHOUR =/c\KHOUR = '"$khour" ./namelist.hrldas
! sed -i  '/RESTART_FILENAME_REQUESTED/c\!RESTART_FILENAME_REQUESTED = ""' ./namelist.hrldas
! grep -E 'HRLDAS_SETUP_FILE|START_|KHOUR' ./namelist.hrldas
! mv ./namelist.hrldas {simulation_dir}

In [None]:
# hydro.namelist  --> RESTART_FILE
! rm -rf hydro.namelist
! wget https://raw.githubusercontent.com/NCAR/wrf_hydro_nwm_public/v5.2.0/trunk/NDHMS/template/HYDRO/hydro.namelist
! sed -i '/GEO_STATIC_FLNM/c\GEO_STATIC_FLNM = "./DOMAIN/geo_em.d0x.nc"' ./hydro.namelist
! sed -i '/RESTART_FILE/c\!RESTART_FILE = ""' ./hydro.namelist
! sed -i  '/GW_RESTART/c\GW_RESTART = 0' ./hydro.namelist
! sed -i  '/LSMOUT_DOMAIN/c\LSMOUT_DOMAIN = 1' ./hydro.namelist
! sed -i '/outlake/c\outlake  = 0' ./hydro.namelist
! sed -i '/output_gw/c\output_gw  = 0' ./hydro.namelist
! sed -i '/GWBASESWCRT/c\GWBASESWCRT  = 1' ./hydro.namelist
! sed -i '/output_channelBucket_influx/c\output_channelBucket_influx  = 2' ./hydro.namelist
! sed -i '/channel_option/c\channel_option  = 2' ./hydro.namelist
! sed -i '/route_link_f/c\route_link_f  = "./DOMAIN/Route_Link.nc"' ./hydro.namelist
! sed -i '/compound_channel/c\!compound_channel  = .FALSE.' ./hydro.namelist
! sed -i '/route_lake_f/c\!route_lake_f  = "./DOMAIN/LAKEPARM.nc"' ./hydro.namelist
! sed -i '/gwbasmskfil/c\!gwbasmskfil  = "./DOMAIN/GWBASINS.nc"' ./hydro.namelist
! sed -i '/UDMP_OPT/c\UDMP_OPT  = 1' ./hydro.namelist
! sed -i '/!udmap_file/c\udmap_file = "./DOMAIN/spatialweights.nc"' ./hydro.namelist
! grep -E "RESTART|outlake|GWBASESWCRT|route_lake_f|gwbasmskfil" ./hydro.namelist
! mv ./hydro.namelist {simulation_dir}
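The `sed` commands above use the `c` (change) command, which replaces every line matching a pattern with a new line. For readers less familiar with `sed`, the same idea can be sketched in plain Python. This is an illustrative equivalent, not the notebook's method: `replace_matching_lines` is a hypothetical helper and the namelist fragment is made up.

```python
import re


def replace_matching_lines(text, pattern, new_line):
    """Replace every line that matches `pattern` with `new_line`, like sed's `c` command."""
    regex = re.compile(pattern)
    out = [new_line if regex.search(line) else line for line in text.splitlines()]
    return "\n".join(out) + "\n"


# Made-up namelist fragment for illustration only
sample = (
    "&HYDRO_nlist\n"
    "RESTART_FILE = 'RESTART/HYDRO_RST.2011-08-26_00:00_DOMAIN1'\n"
    "GW_RESTART = 1\n"
    "/\n"
)

# Comment out the restart file and disable the groundwater restart
edited = replace_matching_lines(sample, r"RESTART_FILE", '!RESTART_FILE = ""')
edited = replace_matching_lines(edited, r"GW_RESTART", "GW_RESTART = 0")
print(edited)
```

Two properties of this approach are worth keeping in mind: the pattern matches anywhere in a line, so comment lines that mention the keyword are replaced too, and when the same keyword is replaced twice, the later replacement wins.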

## Run WRFHydro Model on HPC with CyberGIS-Compute

In [None]:
params_wrfhydro = {"Model_Version": wrfhydro_version,
                   "LSM_Type": "NoahMP",
                   "Forcing_Path": jobid_subset_forcing,
                   "Domain_Path": jobid_cuahsi_subset_domain,
                   "Merge_Output": "True"}
params_wrfhydro

<h4 style="color:red;"> User Interaction Required </h4>

- Run the cell below 
- Click on "Submit Job" on the "Your Job Status" tabpage 
- Wait until Job is finished (3-5 mins)
- Switch to "Download Job Result" tabpage
- Choose "Outputs_Merged/CHRTOUT" and click on Download
- Wait until downloading is finished
- Proceed to the next cell

In [None]:
import cybergis_compute_client
from cybergis_compute_client import CyberGISCompute

cybergis = CyberGISCompute(url="cgjobsup.cigi.illinois.edu", isJupyter=True, protocol="HTTPS", port=443, suffix="v2")
cybergis.create_job_by_ui(defaultJob="wrfhydro-5.x", defaultDataFolder=simulation_dir,input_params=params_wrfhydro)

<div class="alert alert-block alert-info">
<b> NOTE: </b> Before running the next cell click on the download tab above <b> "Download Job Result" </b> and <b> "Download" </b> the <b> "Outputs_Merged/CHRTOUT" </b>
</div>

Retain WRF-Hydro JobID for later reference

In [None]:
jobid_wrfhydro = cybergis.job.id
output_wrfhydro = cybergis.recentDownloadPath
!ls -LR {output_wrfhydro}

In [None]:
if not os.path.isfile(os.path.join(output_wrfhydro, "CHRTOUT_DOMAIN1_merged.nc")):
    display(HTML('<h4 style="color:red;">It appears you did not download the job results per instruction above, please double check!</h4>'))

## Data Visualization

In [None]:
# path of the merged file of channel routing "CHRTOUT_DOMAIN1_merged.nc"
ch_file = '{}/CHRTOUT_DOMAIN1_merged.nc'.format(output_wrfhydro)
print(ch_file)
# path of the route link file "Route_Link.nc" (inside the domain download folder checked earlier)
routelink = os.path.join(domain_output, 'Route_Link.nc')
print(routelink)

# convert the route link dataset to a dataframe
route_df = xr.open_dataset(routelink).to_dataframe()
route_df['gages'] = route_df['gages'].str.decode('utf-8').str.strip()  # decode byte strings and trim padding

# Re-name the "Route_Link.nc" variables
cols = ['order', 'link', 'gages', 'lat', 'lon', 'to', 'from']  # columns name of original routelink dataframe
route_re_df = route_df[cols].sort_values(by=['order'])      # reduce the size of routelink dataframe to include only these columns ['order', 'link', 'gages', 'lat', 'lon', 'to', 'from']

# rename the columns
route_renm_df = route_re_df.rename(index=str, columns={"order": "stream_order",
                                       "link": "comid",
                                       "from": 'upstream_comid',
                                       'to':'downstream_comid',
                                       "gages": 'usgs_gageid',
                                       "lat": 'lat-midpoint',
                                       "lon": 'lon-midpoint'})
route_renm_df.reset_index(inplace=True)
# data.drop('linkDim', axis=1, inplace=True)

########################################################################
########################################################################
# Load channel routing data "CHRTOUT_DOMAIN1_merged.nc"
channel_ds = xr.open_dataset(ch_file)
########################################################################
########################################################################

# Reduce the size of the dataset to only the essential variables.
reach_ds = channel_ds
reach_ds = reach_ds[['streamflow',        # Streamflow 
                     'q_lateral',         # lateral inflow through reach
                     'qSfcLatRunoff',     # runoff from terrain routing
                     'qBucket',           # flux from groundwater (gw) bucket
                     'qBtmVertRunoff',    # runoff from bottom of soil to bucket
                     'order',
                     'velocity',
                     'Head',
                     'elevation']]

# remove unnecessary data
reach_ds = reach_ds.reset_coords()
reach_ds = reach_ds.drop_vars(['latitude', 'longitude'])  # drop_vars replaces the deprecated Dataset.drop
reach_ds.attrs = {}

route_renm_df

#### Plot hydrograph for a reach within the watershed

In [None]:
reachid = 664168  # feature_id (comid) of the reach to plot

fig, ax = plt.subplots(1, 1, figsize=(12,6), sharex='col')

reach_ds.sel(feature_id=reachid)['streamflow'].plot(ax=ax,
                                                    label='Total Outflow ($m^3$/s)',
                                                    color='red',
                                                    linestyle='--')
# plot settings
plt.ylabel('Discharge, $m^3$/s')
plt.xlabel("Time")
# plt.title("")
plt.legend(loc="upper right")
plt.savefig('Hydrograph.png')

#### Plot hydrograph for all reaches (optional)

In [None]:
# Uncomment the code below to plot a hydrograph for every reach
## plot hydrographs for all reaches within the spatial domain

# for r in route_renm_df['comid']:
#     fig, ax = plt.subplots(1, 1, figsize=(12,6), sharex='col')
#     reach_ds.sel(feature_id=r)['streamflow'].plot(ax=ax,
#                                                         label='Total Outflow ($m^3$/s)',
#                                                         color='red',
#                                                         linestyle='--')
#     # plot settings
#     plt.ylabel('Discharge, $m^3$/s')
#     plt.xlabel("Time")
#     # plt.title("")
#     plt.legend(loc="upper right")

## The End