# YD 01 An Approach for Creating Immutable and Interoperable End-to-End Hydrological Modeling Computational Workflows


## Authors

- Author1 = {"name": "Young-Don Choi", "affiliation": "University of Virginia", "email": "yc5ef@virginia.edu", "orcid": "orcid"}
- Author2 = {"name": "Jonathan L Goodall", "affiliation": "University of Virginia", "email": "goodall@virginia.edu", "orcid": "https://orcid.org/0000-0002-1112-4522"}
- Author3 = {"name": "Iman Maghami", "affiliation": "University of Virginia", "email": "im3vp@virginia.edu", "orcid": "orcid"}
- Author4 = {"name": "Lawrence Band", "affiliation": "University of Virginia", "email": "leb3t@virginia.edu", "orcid": "https://orcid.org/0000-0003-0461-0503"}
- Author5 = {"name": "Raza Ahmad", "affiliation": "DePaul University", "email": "raza.ahmad@depaul.edu", "orcid": "https://orcid.org/0000-0001-7953-2269"}
- Author6 = {"name": "Tanu Malik", "affiliation": "DePaul University", "email": "tmalik1@depaul.edu", "orcid": "orcid"}
- Author7 = {"name": "Zhiyu Li", "affiliation": "University of Illinois at Urbana-Champaign", "email": "zyli2004@gmail.com", "orcid": "orcid"}
- Author8 = {"name": "Shaowen Wang", "affiliation": "University of Illinois at Urbana-Champaign", "email": "shaowen@illinois.edu", "orcid": "https://orcid.org/0000-0001-5848-590X"}
- Author9 = {"name": "David G. Tarboton", "affiliation": "Utah State University", "email": "dtarb@usu.edu", "orcid": "https://orcid.org/0000-0002-1998-3479"}

## Purpose

* Computational hydrology is a rapidly advancing field, with fast-evolving technologies supporting increasingly complex hydrologic models.

* The growing variety of software and cyberinfrastructure that these models depend on makes computational reproducibility extremely challenging. Recent reproducibility research has tried to integrate three components: 1) (meta)data, 2) computational environments, and 3) workflows. However, the components are still managed separately, and researchers must bridge them by hand, which makes end-to-end reproducibility hard to verify.

* Sciunit was developed to help scientists who are not programming experts encapsulate these three components in a container, enabling reproducibility in an immutable form. However, it had limited support for interoperable computational environments and for end-to-end solutions, which are the ultimate goal of reproducible hydrological modeling.

* The objective of this research is to advance the existing Sciunit capabilities to not only support immutable, but also interoperable computational environments and apply an end-to-end modeling workflow using the Regional Hydro-Ecologic Simulation System (RHESSys) hydrologic model as an example.

<img src="https://raw.githubusercontent.com/DavidChoi76/figures/main/earthcube_purpose.png" width="1100">

## Technical contributions

- Advance Sciunit, a tool for easily containerizing, sharing, and tracking deterministic computational applications, to minimally containerize reproducible hydrologic modeling workflow objects into the same container with immutability and interoperability

- Develop pyRHESSys (https://github.com/uva-hydroinformatics/pyRHESSys) to programmatically interact with RHESSys on the CyberGIS-Jupyter for Water platform (https://go.illinois.edu/cybergis-jupyter-water)

- Develop an approach to encapsulate all computational artifacts `1) data, 2) computational environments, and 3) modeling workflow` in an immutable container for `long-term reproducibility` using the self-containerized tool (Sciunit, https://sciunit.run).

- Develop an approach to `generate a list of software dependencies` to create a new computational environment for `extendable computational research` using Sciunit

- This research presents a detailed example of a user-centric case study demonstrating the application of an open and interoperable containerization approach from a hydrologic modeler’s perspective.

## Methodology

* We demonstrate how to encapsulate 1) data, 2) computational environments, and 3) modeling workflow for reproducibility using Sciunit.

* First, we create a Jupyter notebook to encapsulate the RHESSys modeling workflow. In particular, we use a shell script to efficiently encapsulate the GRASS GIS process in Sciunit.

* Second, we create a Sciunit container from the RHESSys end-to-end workflow on the CyberGIS-Jupyter for Water platform.

* Third, we create `a list of computational dependencies` using Sciunit to evaluate reproducibility in a different computational environment (MyBinder, https://mybinder.org).

* Fourth, we create HydroShare resources that include `the Sciunit container` and `the lists of computational dependencies` (MyBinder configuration files).

<img src="https://raw.githubusercontent.com/DavidChoi76/figures/main/earthcube_methodology1.png" width="1100">

## Results

* We evaluate the immutable container for `long-term reproducibility` using Sciunit

* We evaluate Sciunit capabilities to create a new computational environment for `extendable computational research`

## Funding

- Award1 = {"agency": "National Science Foundation", "award_code": "ICER-1639655, ICER-1639759, ICER-1639696", "award_URL": "https://www.nsf.gov/awardsearch/showAward?AWD_ID=1639655"}
- Award2 = {"agency": "National Science Foundation", "award_code": "OAC-1664061, OAC-1664018, OAC-1664119", "award_URL": "https://www.nsf.gov/awardsearch/showAward?AWD_ID=1664061"}

## Keywords

keywords=["hydrologic modeling", "computational reproducibility", "containerization", "open hydrology", "modeling version control"]

## Citation

- That, D.H.T., Fils, G., Yuan, Z., Malik, T., 2017. Sciunits: Reusable research objects, in: Proceedings - 13th IEEE International Conference on EScience, EScience 2017. https://doi.org/10.1109/eScience.2017.51

- Tague, C.L., Band, L.E., 2004. RHESSys: Regional Hydro-Ecologic Simulation System—An Object-Oriented Approach to Spatially Distributed Modeling of Carbon, Water, and Nutrient Cycling. Earth Interact. 8, 1–42. [https://doi.org/10.1175/1087-3562(2004)8<1:RRHSSO>2.0.CO;2](https://doi.org/10.1175/1087-3562(2004)8<1:RRHSSO>2.0.CO;2)

# Setup

## Import Library 

In [None]:
!pip install git+https://github.com/hydroshare/hsclient.git

In [None]:
# Data manipulation
import os, shutil
import sys
import pandas as pd
import numpy as np

# RHESSys execution
import pyrhessys as pr

# GRASS GIS execution
import subprocess
from subprocess import Popen, PIPE

# Visualization
import matplotlib.pyplot as plt
from IPython.display import Image

# Output performance evaluation
from hydroeval import evaluator, nse

# HydroShare resource creation
from hsmodels.schemas.fields import BoxCoverage, PointCoverage, PeriodCoverage
from datetime import datetime


# Raw Input Collection and RHESSys Compilation

## Create Modeling Directory

In [None]:
# Define Project name
PROJECT_NAME = "Coweeta_Sub18"
CURRENT_PATH = os.getcwd()
PROJECT_DIR = os.path.join(CURRENT_PATH, PROJECT_NAME)
# Define raw GIS and observation data directory downloaded from HydroShare
RAWGIS_DIR = os.path.join(PROJECT_DIR, "gis_data")
RAWOBS_DIR = os.path.join(PROJECT_DIR, "obs")
# Define RHESSys model input directory
MODEL_DIR = os.path.join(PROJECT_DIR, 'model')
# Create the RHESSys model input directories if they do not already exist
for subdir in ('defs', 'flows', 'worldfiles', 'clim', 'tecfiles', 'output'):
    os.makedirs(os.path.join(MODEL_DIR, subdir), exist_ok=True)

## Download Raw Input Data from HydroShare (Coweeta Subbasin18, North Carolina)


To start RHESSys modeling, we need raw spatial and time series input data. This workflow starts from that raw data (GIS data and time series data). Normally these data are downloaded from government or other organization websites, but for convenience we collected them in a HydroShare (https://www.hydroshare.org/) resource (https://www.hydroshare.org/resource/c7a3d9a914f54955877899389bb43ccb/) for this workflow.


Define the resource ID and download the HydroShare resource using the `utils` module in pyRHESSys. The resource contains the raw data: GIS data (DEM, land cover, soil, outlet, etc.) and time series data (climate and streamflow observations).
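
Before moving on, it can help to confirm that the download produced the files the later cells read. The sketch below is a hypothetical helper (`missing_inputs` is not part of pyRHESSys); the two file names are the ones this notebook opens from the `obs` folder, and the assumption is that the resource unpacks them under the project directory.

```python
import os

# Input files this notebook reads later (names taken from the cells below).
EXPECTED_INPUTS = [
    os.path.join("obs", "climate_coweeta_ws18.csv"),
    os.path.join("obs", "streamflow_coweeta_ws18.csv"),
]


def missing_inputs(project_dir, expected=EXPECTED_INPUTS):
    """Return the expected input paths that do not exist under project_dir."""
    return [rel for rel in expected
            if not os.path.isfile(os.path.join(project_dir, rel))]
```

After the download cell has run, `missing_inputs(PROJECT_DIR)` should return an empty list; any path it returns points to an input that still has to be fetched.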

In [None]:
# set HydroShare resource id of RHESSys Model instance for Coweeta subbasin18
resource_id = 'c7a3d9a914f54955877899389bb43ccb'
# download RHESSys Model instance of Coweeta subbasin18 from HydroShare
pr.utils.get_hs_resource(resource_id, PROJECT_DIR)

## Download RHESSys source code (East Coast version 7.2) and compile the RHESSys model

* RHESSys main branch (https://github.com/RHESSys/RHESSys) serves as a general purpose model, while this branch of RHESSys (https://github.com/laurencelin/RHESSysEastCoast) is repeatedly and heavily tested in several catchments (forested and urban) on the U.S. east coast in terms of hydrology, soil moisture pattern, forest ecosystem, and biochemistry cycle & transport.

In [None]:
!cd {MODEL_DIR}; git clone https://github.com/laurencelin/RHESSysEastCoast.git
# compile the rhessysEC.7.2 executable and store its path in EXECUTABLE
EXECUTABLE = pr.utils.complie(MODEL_DIR, version_option="rhessysEC.7.2")
EXECUTABLE

# Parameter definitions

* s1, s2, s3: horizontal hydraulic conductivity with depth and over the surface
* sv1, sv2: vertical hydraulic conductivity with depth and over the surface
* gw1, gw2: groundwater bypass flow and drainage rate

In [None]:
# Create the pyRHESSys simulation object r
EXECUTABLE = MODEL_DIR + '/RHESSysEastCoast/rhessysEC.7.2'
r = pr.Simulation(EXECUTABLE, MODEL_DIR)
# Set simulation start and end date
start_date = '2005 01 01 01'
end_date = '2008 12 31 01'
# Set observation climate and streamflow data
obs_clim = os.path.join(RAWOBS_DIR, 'climate_coweeta_ws18.csv')
obs_flow = os.path.join(RAWOBS_DIR, 'streamflow_coweeta_ws18.csv')
# Set model parameters
r.parameters['version'] = 'rhessysEC.7.2'
r.parameters['start_date'] = start_date
r.parameters['end_date'] = end_date
r.parameters['s1'] = '5.884455562440615'
r.parameters['s2'] = '310.52318375463875'
r.parameters['s3'] = '4.16584949108444'
r.parameters['sv1'] = '12.77961480830584'
r.parameters['sv2'] = '90.48308040876708'
r.parameters['gw1'] = '0.19564245134347147'
r.parameters['gw2'] = '0.400865955544775'
r.parameters['snowEs'] = '1.17543162591755'
r.parameters['snowTs'] = '0.527982610510662'
r.parameters['svalt1'] = '0.928032172983822'
r.parameters['svalt2'] = '0.955452497987305'
r.parameters['locationid'] = '0'

# GRASS GIS 7.8 Setting

Before using GRASS GIS and its Python library, we need to set up the GRASS database and environment.
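
The setup cell below this text checks for GRASS GIS 7.8 and falls back to 7.4. The same lookup can be sketched as a small function; `find_grass_base` is a hypothetical helper, and the two default paths are the ones used in this notebook, so other systems may need different candidates.

```python
import os


def find_grass_base(candidates=("/usr/lib/grass78", "/usr/lib/grass74")):
    """Return the first candidate directory that exists, or None if none do."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None
```

Returning `None` instead of silently picking a default makes a missing GRASS installation visible before `GISBASE` is set.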

In [None]:
# Set the directory to store preprocessing GRASS database
GRASS_DATA = "grass_dataset"
GISDBASE = os.path.join(PROJECT_DIR, GRASS_DATA)
# Set the full path to GRASS execution
GRASSEXE = "/usr/lib/grass78"
# Fall back to GRASS GIS 7.4 if 7.8 is not installed
if not os.path.isdir(GRASSEXE):
    GRASSEXE = "/usr/lib/grass74"
    os.environ["GDAL_DATA"] = "/usr/share/gdal/2.2"
# Set the command to start GRASS from shell
GRASS7BIN = "grass" 
# Define and create grass data folder, location, and mapset
if not os.path.exists(GISDBASE):
    os.mkdir(GISDBASE)
LOCATION = os.path.join(GISDBASE, PROJECT_NAME)
# Define mapset name which is a working directory for GRASS GIS
MAPSET = "PERMANENT"

In [None]:
# Set GISBASE environment variable
os.environ['GISBASE'] = GRASSEXE
# The following not needed with trunk
os.environ['PATH'] += os.pathsep + os.path.join(GRASSEXE, 'bin')
# Set GISDBASE environment variable
os.environ['GISDBASE'] = GISDBASE

# define GRASS-Python environment
gpydir = os.path.join(GRASSEXE, "etc", "python")
sys.path.append(gpydir)

# import GRASS Python library
import grass.script as gscript
import grass.script.setup as gsetup
gscript.core.set_raise_on_error(True)

In [None]:
# launch session
gsetup.init(GRASSEXE, GISDBASE, LOCATION, MAPSET)

In [None]:
# projection for spatial reference and resolution
# GIS spatial resolution and projection (UTM)
# look up from http://spatialreference.org/ref/epsg/?page=1
# EPSG:26917 = NAD83 UTM 17N
EPSGCODE='EPSG:26917'
location_path = os.path.join(GISDBASE, LOCATION)
# Create GRASS database for the project
if not os.path.exists(location_path):
    startcmd = GRASS7BIN + ' -c ' + EPSGCODE + ' -e ' + location_path
    print(startcmd)
    p = subprocess.Popen(startcmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    print(p.returncode)
    if p.returncode != 0:
        print('ERROR: %s' % err, file=sys.stderr)
        print('ERROR: Cannot generate location (%s)' % startcmd, file=sys.stderr)
        sys.exit(-1)

# Spatial RHESSys Input Preparation using GRASS GIS

## Create a Watershed Outlet textfile to Delineate the Watershed

In [None]:
%%writefile gage_latlon.txt
277826.69 3881430.25 1

## Create Spatial RHESSys Input using GRASS GIS with Shell Script

* In this process, we delineate a watershed using an outlet point and a DEM. Then we extract land cover and soil map attributes. Finally, we create the RHESSys spatial input files: worldfiles, definition files, and a flowtable. A worldfile describes the data properties and allows them to be represented in the landscape: Basin, Zone, Hillslope, Patch. A flowtable describes the connectivity between patches and hillslopes.


* Execution time depends on the state of the CyberGIS-Jupyter for Water platform. In general, the GIS process takes about 3 min.
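
Because the GIS step runs as an external shell script, a failure inside the script is easy to miss. A minimal sketch of a wrapper that returns the script output as text and raises on a non-zero exit code is shown here; `run_script` is a hypothetical helper, not part of pyRHESSys or Sciunit.

```python
import subprocess


def run_script(argv):
    """Run a command, return its decoded stdout, and raise if it fails."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            "command failed with exit code %d:\n%s"
            % (result.returncode, result.stderr)
        )
    return result.stdout
```

In this notebook it could be called as `print(run_script([os.path.join(os.getcwd(), "coweeta_sub18_workflow.sh")]))` once the script has been made executable.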

In [None]:
%%time
!chmod +x coweeta_sub18_workflow.sh
process = Popen([os.getcwd() + '/coweeta_sub18_workflow.sh'], stdout=PIPE, stderr=PIPE)
stdout, stderr = process.communicate()
print(stdout.decode())

## Visualize Spatial Attribute (Aspect, Slope, and Flow Direction)

In [None]:
# Default font displays
os.environ['GRASS_FONT'] = 'sans'
# Overwrite existing maps
os.environ['GRASS_OVERWRITE'] = '1'
os.environ['GRASS_RENDER_IMMEDIATE'] = 'cairo'
os.environ['GRASS_RENDER_FILE_READ'] = 'TRUE'
os.environ['GRASS_LEGEND_FILE'] = 'legend.txt'

### Visualize `Aspect` Map

In [None]:
os.environ['GRASS_RENDER_FILE'] = 'aspect.png'
!grass {LOCATION}/{MAPSET} --exec d.rast map="aspect"
!grass {LOCATION}/{MAPSET} --exec d.legend raster="aspect" fontsize='8' title='aspect' title_fontsize='12' at=5,50,0,5
image = os.path.join(CURRENT_PATH,'aspect.png')
Image(filename=image)

### Visualize `Slope` Map

In [None]:
os.environ['GRASS_RENDER_FILE'] = 'slope.png'
!grass {LOCATION}/{MAPSET} --exec d.rast map="slope"
!grass {LOCATION}/{MAPSET} --exec d.legend raster="slope" fontsize='8' title='slope' title_fontsize='12' at=5,50,0,5
image = os.path.join(CURRENT_PATH,'slope.png')
Image(filename=image)

### Visualize `Drain (Flow Direction)` Map

In [None]:
os.environ['GRASS_RENDER_FILE'] = 'drain.png'
!grass {LOCATION}/{MAPSET} --exec d.rast map="drain"
!grass {LOCATION}/{MAPSET} --exec d.legend raster="drain" fontsize='15' title='drain' title_fontsize='12' at=5,50,0,5
image = os.path.join(CURRENT_PATH,'drain.png')
Image(filename=image)

# Time-series and Other Model input Configurations

### Create RHESSys input from observed time-series data

In [None]:
obs_clim_df = pd.read_csv(obs_clim)

ClimDIR = os.path.join(MODEL_DIR, 'clim')

rain = obs_clim_df['rain'].values
np.savetxt(r'cwt.rain', obs_clim_df['rain'].values, fmt='%2.4f', header='1983 9 1 1', comments='')
shutil.copy('cwt.rain', ClimDIR)

tmax = obs_clim_df['tmax'].values
np.savetxt(r'cwt.tmax', obs_clim_df['tmax'].values, fmt='%2.1f', header='1983 9 1 1', comments='')
shutil.copy('cwt.tmax', ClimDIR)

tmin = obs_clim_df['tmin'].values
np.savetxt(r'cwt.tmin', obs_clim_df['tmin'].values, fmt='%2.1f', header='1983 9 1 1', comments='')
shutil.copy('cwt.tmin', ClimDIR)

vpd = pd.to_numeric(obs_clim_df['vpd'].values, errors='coerce')
np.savetxt(r'cwt.vpd', vpd, fmt='%3.2f', header='1983 9 1 1', comments='')
shutil.copy('cwt.vpd', ClimDIR)

rh = pd.to_numeric(obs_clim_df['rh'].values, errors='coerce')
np.savetxt(r'cwt.relative_humidity', rh, fmt='%2.1f', header='1983 9 1 1', comments='')
shutil.copy('cwt.relative_humidity', ClimDIR)

kdownDirect = pd.to_numeric(obs_clim_df['kdownDirect'].values, errors='coerce')
np.savetxt(r'cwt.Kdown_direct', kdownDirect, fmt='%5.1f', header='1983 9 1 1', comments='')
shutil.copy('cwt.Kdown_direct', ClimDIR)
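
Each climate file written above has the same layout: one header line with the start date (`1983 9 1 1` here), followed by one value per line. A standard-library sketch of that layout, using a hypothetical helper name and a default format chosen only for illustration, looks like this:

```python
def write_climate_series(path, values, start="1983 9 1 1", fmt="%.4f"):
    """Write a start-date header line, then one formatted value per line."""
    with open(path, "w") as f:
        f.write(start + "\n")
        for value in values:
            f.write((fmt % value) + "\n")
```

This mirrors what `np.savetxt(..., header=..., comments='')` produces in the cell above and can be useful for inspecting or regenerating a single series.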

### Create Climate RHESSys input

In [None]:
cwt_path = ClimDIR +"/cwt"
cwt_path1 = ClimDIR +"/na"
base = open("cwt.base","w") 
contents = ["101 base_station_id \n",
     "278391.71 x_coordinate \n",
     "3882439.5 y_coordinate \n",
     "638.0 z_coordinate \n",
     "2.0 effective_lai \n",
     "22.9 	screen_height \n",
     cwt_path1 + " annual_climate_prefix \n", "0 \n",
     cwt_path1 + " monthly_climate_prefix \n", "0 \n",
     cwt_path + " daily_climate_prefix \n", "0 \n",
     cwt_path1 + " hourly_climate_prefix \n", "0 "]  
base.writelines(contents) 
base.close() 
shutil.copy('cwt.base', ClimDIR)
climateBaseFile = os.path.join(ClimDIR, 'cwt.base')

### Create Tecfiles (Temporal Event Control File)

In [None]:
tecfilessDIR = os.path.join(MODEL_DIR, 'tecfiles')
tec_daily = open("tec_daily.txt","w") 
contents = ["1983 9 1 1 print_daily_on"]  
tec_daily.writelines(contents) 
tec_daily.close() 
shutil.copy('tec_daily.txt', tecfilessDIR)

### Set Current Directories in RHESSys climate and worldfile input

In [None]:
clim_file = ClimDIR + '/cwt.base'
world_hdr_file = r.worldfiles + '/worldfile.hdr'

# Replace the "defs" and "clim" strings in the worldfile header with this project's directories
pr.utils.replace_string(world_hdr_file, "defs", r.defs)
pr.utils.replace_string(world_hdr_file, "clim", r.clim)


# RHESSys Execution and Model Output Evaluation

## Execute RHESSys

* Simulating 4 years at a daily time step takes 2 to 4 min.

In [None]:
%%time
r.run("local")

## Plot Model Output: Comparison between Simulation and Observation Streamflow

In [None]:
# Load observed streamflow and RHESSys output into pandas DataFrames
obs_flow_f = os.path.join(RAWOBS_DIR, 'streamflow_coweeta_ws18.csv')
obs_flow = pd.read_csv(obs_flow_f)
obs_flow['Date'] = pd.to_datetime(obs_flow['date'])
obs_flow.set_index('Date', inplace=True)

analysis_obs_flow = obs_flow.loc['2005-01-01':'2008-12-31']
analysis_obs_flow = analysis_obs_flow.iloc[0:-1]

plot_data = pd.read_csv(r.output + "/rhessys_run" +'_basin.daily', delimiter=" ")
plot_data['Date'] = pd.to_datetime(analysis_obs_flow.index)

In [None]:
# compare simulation streamflow and observed streamflow
def ts_plot_obs(sim_data, sim_date_col_name, sim_output_variable, sim_label, obs_data, obs_variable: str="", obs_label: str="", pre_trim: int=0, post_trim: int=-1):
    # set output variables and variable description
    y_axis = "Total Stream Outflow(mm (normalized by basin area))"
    # Plotting 
    plt.figure(figsize=(17,7))
    ax = plt.gca()
    ax.plot(sim_data[sim_date_col_name][pre_trim:post_trim], sim_data[sim_output_variable][pre_trim:post_trim], label=sim_label)
    ax.plot(obs_data.index[pre_trim:post_trim], obs_data[obs_variable][pre_trim:post_trim], label=obs_label)
    ax.grid(True)
    ax.set_ylabel(y_axis, fontsize=18)
    plt.xticks(fontsize=18)
    plt.yticks(fontsize=18)
    ax.legend(loc='upper left', bbox_to_anchor=(1, 1), fontsize=10)
    plt.savefig("figure.png")
ts_plot_obs(sim_data=plot_data, sim_date_col_name='Date', sim_output_variable='streamflow', sim_label="sim_streamflow", obs_data=analysis_obs_flow, obs_variable="discharge (mm)", obs_label="obs_stream", pre_trim =100)

## Evaluate Model Performance

In [None]:
# set simulation and observation data to evaluate 
simulation_streamflow = plot_data["streamflow"].values
obs_streamflow = analysis_obs_flow["discharge (mm)"].values

# use the evaluator with the Nash Sutcliffe Efficiency
my_nse = evaluator(nse, simulation_streamflow[365*1:], obs_streamflow[365*1:])
my_nse
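
For reference, the Nash-Sutcliffe Efficiency compares the squared model error with the variance of the observations: a value of 1 is a perfect match, and 0 means the model is no better than the observed mean. The sketch below is a plain-Python illustration of the standard formula, not the hydroeval implementation, and the function name is hypothetical.

```python
def nash_sutcliffe(simulated, observed):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(simulated) != len(observed):
        raise ValueError("series must have the same length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for s, o in zip(simulated, observed))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance
```

The argument order follows the call above: simulated values first, observations second.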

In [None]:
with open("performance.txt", "w") as f:
    f.write("The Nash Sutcliffe Efficiency is %s\n" % (my_nse))


# Encapsulation of All Computational Artifacts using Sciunit

In [None]:
import sys
!{sys.executable} -m pip install sciunit2==0.4.post63.dev250822854

## Convert a RHESSys end-to-end workflow notebook to Python file (*.py)

The current version of Sciunit cannot create a container directly from a Jupyter notebook, so we first convert this notebook to Python code (from *.ipynb to *.py). When converting, we select the parts of the notebook to keep, excluding the HydroShare resource download and the Sciunit steps themselves.

In [None]:
import sys
!{sys.executable} -m pip install jupyter_contrib_nbextensions

In [None]:
notebook_name = 'YD_01_An_Approach_for_Creating_Immutable_and_Interoperable_End_to_End_Hydrological_Modeling_Computational_Workflows'
!jupyter nbconvert {notebook_name}.ipynb --to script

 * Set the ranges of the RHESSys workflow to encapsulate. All computational artifacts are included except 1) the download of external data, which is left out so that the container is immutable, and 2) the steps that create the Sciunit container.
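
Selecting by fixed line numbers is sensitive to any change in the notebook. As a possible alternative, the converted script could be split at cell boundaries and filtered by content. The sketch below assumes that the nbconvert Python script export marks each code cell with a `# In[n]:` comment line; `keep_cells` and the skip markers are hypothetical and would need to be checked against the actual converted file.

```python
import re

# Assumed cell marker written by the nbconvert script export, e.g. "# In[ ]:" or "# In[12]:".
CELL_MARKER = re.compile(r"^# In\[[ \d]*\]:\s*$")


def keep_cells(script_text, skip_markers):
    """Split a converted script into cells and drop cells containing any skip marker."""
    cells, current = [], []
    for line in script_text.splitlines(keepends=True):
        # A marker line starts a new cell; flush what was collected so far.
        if CELL_MARKER.match(line) and current:
            cells.append("".join(current))
            current = []
        current.append(line)
    if current:
        cells.append("".join(current))
    kept = [c for c in cells if not any(m in c for m in skip_markers)]
    return "".join(kept)
```

For example, `keep_cells(text, ["get_hs_resource"])` would drop the cell that downloads the HydroShare resource. Markdown comments travel with the cell that precedes them, so markers should be specific enough not to match explanatory text.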

In [None]:
# Drop the HydroShare download and the Sciunit steps by copying only selected line ranges
# of the converted script. Ranges are zero-based and half-open; update them if the notebook changes.
python_file_names = ["import_library_and_set_directory.py", "set_parameters.py", "remaining_procedures.py"]
line_ranges = [(87, 144), (175, 209), (286, 523)]
original_python_file = notebook_name + ".py"
with open(original_python_file, 'r') as fin:
    script_lines = fin.readlines()
for fname, (start, stop) in zip(python_file_names, line_ranges):
    with open(fname, 'w') as fout:
        fout.writelines(script_lines[start:stop])

In [None]:
# Combine Three Python files into one Python file
modified_python_file = "Modified_" + original_python_file
filenames = ["import_library_and_set_directory.py", "set_parameters.py", "remaining_procedures.py"]
with open(modified_python_file, 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

## Create an empty Sciunit Container

In [None]:
package_name = "Sciunit_RHESSys_Coweeta_Sub18"
!/opt/conda/envs/pyrhessys/bin/sciunit create {package_name}

## Open the empty Sciunit Container

In [None]:
!/opt/conda/envs/pyrhessys/bin/sciunit open {package_name}

## Execute Sciunit to create the Sciunit Container

<font color='red'> * Depending on the condition of CyberGIS-Jupyter for water, execution time is different. In general, it takes 11 mins.</font> 

In [None]:
%%time
!/opt/conda/envs/pyrhessys/bin/sciunit exec /opt/conda/envs/pyrhessys/bin/ipython {modified_python_file}

## Show the Created Sciunit Container

In [None]:
!/opt/conda/envs/pyrhessys/bin/sciunit list

In [None]:
!/opt/conda/envs/pyrhessys/bin/sciunit show e1

# HydroShare Resources Creation 

- Share Sciunit Container and MyBinder Configurations in HydroShare to evaluate Reproducibility in a Different Computational Environment

## Case-Study 1: Create a HydroShare Resource to Evaluate Reproducibility using the Immutable Sciunit Container

### Create a zip file of the Sciunit Container


In [None]:
zip_name = "sciunit_container"
sciunit_container = os.path.join("/home/jovyan/sciunit", package_name)
sample_dir1 = os.path.join(os.getcwd(), zip_name)
sample_dir2 = os.path.join(os.getcwd(), zip_name, zip_name)
shutil.copytree(sciunit_container, sample_dir2)
shutil.make_archive(zip_name, 'zip', sample_dir1)

### Create MyBinder Configuration manually

In [None]:
!mkdir binder

In [None]:
%%writefile binder/requirements.txt
sciunit2==0.4.post63.dev250822854

### Authenticate with HydroShare

Before you start interacting with resources in HydroShare you will need to authenticate.

In [None]:
from hsclient import HydroShare

hs = HydroShare()
hs.sign_in()

### Create Case-Study 1 HS resource in HydroShare


In [None]:
# Create the new, empty resource
new_resource = hs.create()

# Set the Title for the resource
new_resource.metadata.title = '[Case-Study 1] An Immutable Sciunit Container encapsulating RHESSys end-to-end modeling workflows at Coweeta Subbasin18 to evaluate Long-Term Reproducibility'

# Set the Abstract text for the resource
new_resource.metadata.abstract = (
    'This resource was created to share the Sciunit container that encapsulated RHESSys end-to-end workflows'
)

# Create subject keywords for the resource using a list of strings
new_resource.metadata.subjects = ['hydrologic modeling', 'computational reproducibility', 'containerization', 'open hydrology']

# Set the spatial coverage to a BoxCoverage object
new_resource.metadata.spatial_coverage = BoxCoverage(name='Coweeta subbasin18, North Carolina',
                                                     northlimit=35.0513,
                                                     eastlimit=-83.4319,
                                                     southlimit=35.0454,
                                                     westlimit=-83.4363,
                                                     projection='WGS 84 EPSG:4326',
                                                     type='box',
                                                     units='Decimal degrees')

# Create a beginning and ending date for a time period
beginDate = datetime.strptime('2005-01-01T00:00:00Z', '%Y-%m-%dT%H:%M:%SZ')
endDate = datetime.strptime('2008-12-31T00:00:00Z', '%Y-%m-%dT%H:%M:%SZ')

# Set the temporal coverage of the resource to a PeriodCoverage object
new_resource.metadata.period_coverage = PeriodCoverage(start=beginDate,
                                                       end=endDate)

# Save the changes to the resource in HydroShare
new_resource.save()

# Create a binder folder
new_resource.folder_create('binder')

# Upload one or more files to a specific folder within a resource
new_resource.file_upload(zip_name+'.zip', "YD_02_An_Approach_for_Creating_Immutable_and_Interoperable_End_to_End_Hydrological_Modeling_Computational_Workflows.ipynb")
new_resource.file_upload("binder/requirements.txt", destination_path='binder')

## Case-Study 2: Create a HydroShare Resource to Create a New Computational Environment for Extendable Computational Research

* Currently, Sciunit can only generate a "requirements.txt", which lists the Python libraries installed with pip that were used during the Sciunit execution. Therefore, we manually create apt.txt to install GRASS GIS and other dependencies that come from apt. We also manually create install.R and postBuild to install R packages through conda. Finally, we use setup.py to install the Python libraries listed in the Sciunit-generated requirements file. Because some of those libraries can no longer be installed, setup.py installs them one at a time and skips any library that is unavailable or fails to install.
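
The install-and-skip idea can also be sketched without importing pip internals, by calling `pip` as a subprocess for each requirement and collecting the ones that fail. This is an illustrative alternative, not the `setup.py` used in this notebook; `install_each` and its `run` parameter are hypothetical names.

```python
import subprocess
import sys


def install_each(requirements_path, run=subprocess.call):
    """Try to pip-install every requirement line; return the ones that failed."""
    failed = []
    with open(requirements_path) as f:
        for line in f:
            requirement = line.strip()
            # Skip blank lines and comments.
            if not requirement or requirement.startswith("#"):
                continue
            code = run([sys.executable, "-m", "pip", "install", requirement])
            if code != 0:
                failed.append(requirement)
    return failed
```

Calling `install_each("e1-requirements.txt")` would attempt every entry in the Sciunit-generated file and return the list of libraries that could not be installed, which can then be written to a log.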

### Create a `requirements.txt` Configuration file automatically using Sciunit

In [None]:
!/opt/conda/envs/pyrhessys/bin/sciunit export e1

### Create `setup.py` manually

In [None]:
%%writefile setup.py
from setuptools import setup, find_packages
import os
from pip.__main__ import _main as main

error_log = open('error_log.txt', 'w')

def install(package):
    try:
        main(['install'] + [str(package)])
    except Exception as e:
        error_log.write(str(e))

if __name__ == '__main__':
    f = open('e1-requirements.txt', 'r')
    for line in f:
        install(line)
    f.close()
    error_log.close()
    
setup(name='pyrhessys',
      description='A python wrapper for RHESSys',
      url='https://github.com/uva-hydroinformatics/pyRHESSys',
      author='YoungDon Choi',
      author_email='choiyd1115@gmail.com',
      license='MIT',
      packages=find_packages(),
      install_requires=[
          ],
      include_package_data=True,
      test_suite='pyrhessys.tests')

### Create `apt.txt` manually

In [None]:
%%writefile apt.txt
grass
grass-dev
r-base
r-base-dev
libgdal-dev
gdal-bin
libproj-dev
wget
software-properties-common
ca-certificates

### Create `install.R` manually

In [None]:
%%writefile install.R
gdal
r-sp
r-XML
r-rgdal
r-remotes
r-rgrass7>=0.1-12
r-units
r-sf
r-stars
r-openssl
r-curl
r-httr
r-devtools

### Create `postBuild` manually

In [None]:
%%writefile postBuild
conda install --file install.R

### Create Case-Study 2 HS resource in HydroShare

In [None]:
# Create the new, empty resource
new_resource = hs.create()

# Set the Title for the resource
new_resource.metadata.title = '[Case-Study 2] MyBinder Configuration File and RHESSys notebook to Create a New Computational Environment and Evaluate Interoperable Reproducibility'

# Set the Abstract text for the resource
new_resource.metadata.abstract = (
    'This resource was created to share the Sciunit container that encapsulated RHESSys end-to-end workflows'
)

# Create subject keywords for the resource using a list of strings
new_resource.metadata.subjects = ['hydrologic modeling', 'computational reproducibility', 'containerization', 'open hydrology']

# Set the spatial coverage to a BoxCoverage object
new_resource.metadata.spatial_coverage = BoxCoverage(name='Coweeta subbasin18, North Carolina',
                                                     northlimit=35.0513,
                                                     eastlimit=-83.4319,
                                                     southlimit=35.0454,
                                                     westlimit=-83.4363,
                                                     projection='WGS 84 EPSG:4326',
                                                     type='box',
                                                     units='Decimal degrees')

# Create a beginning and ending date for a time period
beginDate = datetime.strptime('2005-01-01T00:00:00Z', '%Y-%m-%dT%H:%M:%SZ')
endDate = datetime.strptime('2008-12-31T00:00:00Z', '%Y-%m-%dT%H:%M:%SZ')

# Set the temporal coverage of the resource to a PeriodCoverage object
new_resource.metadata.period_coverage = PeriodCoverage(start=beginDate,
                                                       end=endDate)

# Save the changes to the resource in HydroShare
new_resource.save()

# Upload one or more files to your resource 
new_resource.file_upload("coweeta_sub18_workflow.sh", "YD_03_An_Approach_for_Creating_Immutable_and_Interoperable_End_to_End_Hydrological_Modeling_Computational_Workflows.ipynb")
new_resource.file_upload("setup.py", "apt.txt", "install.R", "postBuild", "e1-requirements.txt")

# Creation of MyBinder Computational Environment in HydroShare

* After creating the two HydroShare resources in the previous steps, go to HydroShare (https://www.hydroshare.org) and find the two new resources.

* After reviewing the HydroShare resources, we can create MyBinder computational environments in MyBinder (https://mybinder.org).

* Before creating a MyBinder computational environment, users have to change the HydroShare resource status to `Public`.

* Follow the steps shown in the figure below:

   1) Select HydroShare resource in the `Repository name or URL` selection options

   2) Copy and paste the HydroShare resource URL from the HydroShare resource

   3) Click the `launch` button to start creating the MyBinder computational environment

<div class="alert alert-block alert-danger">
<b> Attention </b> In Case Study 2, users can run the notebook unchanged through the RHESSys modeling workflow. However, to create a Sciunit container, users have to change the Sciunit executable location from `/opt/conda/envs/pyrhessys/bin/sciunit` to `/srv/conda/envs/notebook/bin/sciunit`, because that is where Sciunit is installed in MyBinder. </div>
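
One way to avoid editing the path by hand is to look the executable up at run time. The sketch below is a hypothetical helper that first searches `PATH` and then tries the two locations named above; it assumes Sciunit is installed as an executable file called `sciunit`.

```python
import os
import shutil

KNOWN_LOCATIONS = (
    "/opt/conda/envs/pyrhessys/bin/sciunit",  # CyberGIS-Jupyter for Water
    "/srv/conda/envs/notebook/bin/sciunit",   # MyBinder
)


def find_sciunit(known_locations=KNOWN_LOCATIONS):
    """Return a usable sciunit path: PATH lookup first, then the known locations."""
    found = shutil.which("sciunit")
    if found:
        return found
    for path in known_locations:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None
```

With `SCIUNIT = find_sciunit()`, the notebook cells could call `!{SCIUNIT} create {package_name}` on either platform.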

<img src="https://raw.githubusercontent.com/DavidChoi76/figures/main/mybinder.png" width="800">