# Modify Parameters and Run CFE Simulation in the Cloud

**Authors**

- Irene Garousi-Nejad: igarousi@cuahsi.org
- Tony Castronova: acastronova@cuahsi.org
- Scott Black: sblack@cuahsi.org

**Last Updated:** 05.28.2023

**Description**

This notebook demonstrates how to build and execute scientific workflows in the cloud using cyberinfrastructure developed as part of the "HydroShare Modernization" CIROH research project. The goal is to illustrate how the general-purpose cloud analysis workflows developed to support common data archival operations can also be leveraged for scientific computing. This notebook describes the process for using the outcomes of that project; however, these capabilities are still under active development and are not ready for widespread public use.


**Data Availability**

This notebook requires access to the complete hydrological simulation inputs and outputs.

**Computational Availability**

This notebook leverages advanced cyberinfrastructure that is currently under active development. It has been made available to attendees of the 2024 CIROH User and Developer Conference; however, it is not currently available to the general public. Access to this system may be terminated without notice.


**Software Requirements**

- git+https://github.com/CUAHSI/argo-workflow-python-client.git
- geopandas-0.14.4
- fiona-1.9.6
- numpy-1.26.4
- pandas-2.2.2
- pyproj-3.6.1
- shapely-2.0.4
- ipyleaflet-0.19.1
- sidecar-0.7.0
- fsspec-2024.5.0
- s3fs-2024.5.0
- pandas-2.2.1

In [None]:
!pip install -r requirements.txt

In [None]:
from datetime import datetime
import geopandas
from helpers import ArgoAPI, SideCarMap
import fiona
import fsspec
import ipyleaflet
import pandas as pd
from pathlib import Path
import utils

### Access data on MinIO and Preview Domain Parameters

Copy in the values printed by the last code cell of Demo 1.

In [None]:
bucket_name=""
data_path=""
MinIO_Path = ""
Outlet_catchment_id=""
Outlet_nexus_id=""

print(MinIO_Path)

Initialize a filesystem object using the `fsspec` library, which provides a unified API for working with various file systems.

In [None]:
endpoint_url = 'https://api.minio.cuahsi.io'
fs = fsspec.filesystem('s3', client_kwargs={'endpoint_url': endpoint_url}, anon=True)

Access and read various domain-related data files stored in an S3 bucket using the `fsspec` library to interact with the S3 storage, and `geopandas` and `pandas` libraries to read geographic data.

In [None]:
# path to the data stored in an S3 bucket (full s3 url)
base_path = f's3://{bucket_name}/{data_path}'

# Define path to domain data
catchments_path = base_path + 'domain/' + 'catchments.geojson'
attributes_path = base_path + 'domain/' + 'cfe_noahowp_attributes.csv'
flowpaths_path = base_path + 'domain/' + 'flowpaths.geojson'
nexus_path = base_path + 'domain/' + 'nexus.geojson'
gpkg_path = base_path + 'domain/' + f'{data_path.split("/")[1]}_upstream_subset.gpkg'

# Read data
catchments = geopandas.read_file(fs.open(catchments_path))
flowpaths = geopandas.read_file(fs.open(flowpaths_path))
nexus = geopandas.read_file(fs.open(nexus_path))
attributes = pd.read_csv(fs.open(attributes_path))  # cfe_noahowp_attributes.csv

with fs.open(gpkg_path) as f:
    gdf_flow = geopandas.read_file(f, layer='flowpaths')
    f.seek(0)  # Reset file pointer
    gdf_cat = geopandas.read_file(f, layer='divides')
    f.seek(0)  # Reset file pointer
    gdf_nexus = geopandas.read_file(f, layer='nexus')
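The path construction above concatenates strings directly, so it only works if `data_path` ends with a trailing slash and has at least two components (the GeoPackage name is taken from the second one). A minimal sketch of that logic follows; the helper `domain_paths` and the bucket and path values are hypothetical placeholders, not real storage locations.

```python
def domain_paths(bucket_name: str, data_path: str) -> dict:
    """Build S3 URLs for the domain files, mirroring the cell above."""
    if not data_path.endswith('/'):
        data_path += '/'  # plain concatenation needs the trailing slash
    base_path = f's3://{bucket_name}/{data_path}'
    subset_id = data_path.split('/')[1]  # second path component names the GeoPackage
    return {
        'catchments': base_path + 'domain/catchments.geojson',
        'attributes': base_path + 'domain/cfe_noahowp_attributes.csv',
        'gpkg': base_path + f'domain/{subset_id}_upstream_subset.gpkg',
    }

# hypothetical values, for illustration only
paths = domain_paths('example-bucket', 'example-user/wb-0000')
for name, path in paths.items():
    print(f'{name}: {path}')
```

Normalizing the trailing slash in one place avoids silently building URLs such as `.../wb-0000domain/catchments.geojson`.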

Inspect the column names of the catchments and attributes data.

In [None]:
catchments.columns

In [None]:
attributes.columns

Combine the two DataFrames (`catchments` and `attributes`) on their common column, `divide_id`.

In [None]:
merged = catchments.merge(attributes, on='divide_id')

Let's now plot the histogram of a parameter of interest across the domain. Then, we will highlight a catchment of interest and show the value of the parameter for the selected catchment on the histogram. You can choose any catchment from merged['divide_id']. For example, we can use the outlet catchment ID referenced in Demo 1.

In [None]:
param = 'refkdt'                      # reference hydraulic conductivity
cat = Outlet_catchment_id              # catchment ID

utils.plot_single_cat(merged, param, 'Infiltration Scaling Parameter', 'viridis', cat)

Print the parameter value for the catchment of interest

In [None]:
# look the value up by `param` so the cell still works if a different parameter is chosen
print(f'{param} for {cat}: {merged.loc[merged["divide_id"]==cat, param].values[0]}')

We can also find this value in the configuration file for the selected catchment.

In [None]:
config_path = base_path + 'config/' +  f'{cat}_config.ini'
config_dict = utils.load_config(config_path, endpoint_url, fs)
config_dict

### Modify Parameters

**refkdt** is a runoff/infiltration rate parameter; you can learn more about it in this article. It is manually calibrated over multiple simulations and significantly impacts surface infiltration and hence the partitioning of total runoff into surface and subsurface runoff. **Increased values of REFKDT lead to more infiltration and less surface runoff.**
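To see why the value matters, here is a simplified sketch of a Schaake-style runoff partitioning in which `refkdt` scales the infiltration capacity. It assumes the model is configured to use such a scheme and that the equations below resemble the model's implementation, which this notebook does not show. The function name, the reference conductivity of 2e-6 m/s, and all input values are illustrative assumptions.

```python
import math

REFERENCE_KSAT = 2.0e-6  # m/s, reference conductivity assumed in this sketch

def schaake_partition(water_input_m, soil_deficit_m, refkdt, satdk_m_s, timestep_h=1.0):
    """Split water input into (surface_runoff_m, infiltration_m)."""
    if water_input_m <= 0.0 or soil_deficit_m <= 0.0:
        # nothing to infiltrate, or no room left in the soil column
        return max(water_input_m, 0.0), 0.0
    kdt = refkdt * satdk_m_s / REFERENCE_KSAT      # refkdt scaled by soil conductivity
    timestep_d = timestep_h / 24.0
    capacity_m = soil_deficit_m * (1.0 - math.exp(-kdt * timestep_d))
    infiltration_m = water_input_m * capacity_m / (water_input_m + capacity_m)
    return water_input_m - infiltration_m, infiltration_m

# illustrative inputs: 10 mm of water input onto a 100 mm soil moisture deficit
for refkdt in (0.0, 1.0, 3.0, 5.0):
    runoff, infil = schaake_partition(0.01, 0.1, refkdt, 2.0e-6)
    print(f'refkdt={refkdt}: infiltration={infil:.5f} m, surface runoff={runoff:.5f} m')
```

In this sketch a value of 0 gives zero infiltration capacity, so all of the water input becomes surface runoff, and larger values shift the balance toward infiltration.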

In [None]:
# change the value
var_name = 'refkdt'
var_value = '0'
utils.change_config(fs, config_path, config_dict, var_name, var_value)

In [None]:
# view changes
config_dict = utils.load_config(config_path, endpoint_url, fs)
config_dict
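The helpers `utils.load_config` and `utils.change_config` are not shown in this notebook. As a rough sketch of what such a read-modify-write cycle might look like, the example below assumes the configuration file consists of plain `key=value` lines. The function names and the file contents are hypothetical and do not reproduce the actual helpers.

```python
def parse_config(text: str) -> dict:
    """Parse 'key=value' lines into a dict, skipping blanks and comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, value = line.split('=', 1)  # split on the first '=' only
        config[key.strip()] = value.strip()
    return config

def render_config(config: dict) -> str:
    """Serialize the dict back to 'key=value' lines."""
    return '\n'.join(f'{key}={value}' for key, value in config.items()) + '\n'

# made-up file contents, for illustration only
original = 'refkdt=3.0\nexample_key=example value\n'
config = parse_config(original)
config['refkdt'] = '0'  # values stay strings, as in the cell above
print(render_config(config))
```

Keeping every value as a string means untouched entries are written back exactly as they were read.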

### Submit jobs

In [None]:
# set argo token, obtained from https://workflows.argo.cuahsi.io/userinfo 
ARGO_TOKEN=''

In [None]:
# create an instance of ArgoAPI using our ARGO_TOKEN
argo = ArgoAPI(ARGO_TOKEN)

In [None]:
# display the metadata for the workflow
argo.describe('ngen-run')

To run this workflow, we need to provide values for each of the input parameters listed above. In this example, the `input bucket` and `input data path` refer to the data that has already been subsetted, prepared, and used in Demo 1, which is stored on MinIO S3. The `output bucket` is the same as the input bucket because we want to save the results in the same location. However, we will save the model outputs of the new simulation in a new `output path` to prevent overwriting the results from Demo 1. This approach allows us to compare the results and see the impact of our modifications to the parameter of interest.

In [None]:
parameters = {
    "input-data-bucket": bucket_name,
    "input-data-path": data_path,
    "catchment-file-path": "domain/catchments.geojson",
    "nexus-file-path": "domain/nexus.geojson",
    "realization-file-path": "config/realization.json",
    "output-bucket": bucket_name,
    "output-path": data_path+f'{cat}_{param}_0',
}

parameters
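Because several of these values come from placeholders that start out blank (`bucket_name`, `data_path`), it can help to check for empty entries before submitting. A minimal sketch of such a check is below; `missing_parameters` is a hypothetical helper, not part of `ArgoAPI`, and the example values are made up.

```python
def missing_parameters(parameters: dict) -> list:
    """Return the names of parameters whose value is empty or None."""
    return [name for name, value in parameters.items()
            if value is None or str(value).strip() == '']

# made-up values, for illustration only
example = {
    'input-data-bucket': '',  # left blank on purpose
    'input-data-path': 'example-user/wb-0000/',
    'output-bucket': 'example-bucket',
}
missing = missing_parameters(example)
if missing:
    print(f'Fill in before submitting: {missing}')
```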

In [None]:
job_name = argo.submit_workflow('ngen-run', parameters)
job_name

In [None]:
print('------------------------------------')
print('| View the workflow in the Argo UI |')
print('------------------------------------')
print(f'https://workflows.argo.cuahsi.io/workflows/workflows/{job_name}\n')

print('----------------------')
print('| Current Job Status |')
print('----------------------')
argo.workflow_status(job_name)
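The status cell above reports the job's state once, so it must be re-run until the job finishes. A polling loop is one way to wait instead. The sketch below is generic: the return value of `argo.workflow_status` is not documented here, so `get_phase` is a hypothetical callable that you would write to extract a phase string from it, and the terminal phase names are assumed to follow Argo Workflows' conventions.

```python
import time

TERMINAL_PHASES = {'Succeeded', 'Failed', 'Error'}  # assumed phase names

def wait_for_workflow(get_phase, poll_seconds=30.0, timeout_seconds=3600.0):
    """Call get_phase() until it returns a terminal phase or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f'workflow still in phase {phase!r} after {timeout_seconds} s')
        time.sleep(poll_seconds)

# stand-in for a real status call, for illustration only
fake_phases = iter(['Pending', 'Running', 'Succeeded'])
final_phase = wait_for_workflow(lambda: next(fake_phases), poll_seconds=0.01)
print(final_phase)
```

The timeout keeps the notebook from hanging indefinitely if the job never reaches a terminal phase.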

To preview the output data that our job created, we first need to construct the URL to our output data. Our output path uses our HydroShare username so that we can easily find it.

In [None]:
print('Browse the output files at:')
print(f'https://console.minio.cuahsi.io/browser/{bucket_name}/{data_path}{cat}_{param}_0')

### Preview Simulation Results

In [None]:
url = f's3://{bucket_name}/{data_path}'

forcing_path = f'{url}forcing'
base_results_path = f'{url}results'
mod_results_path = f'{url}{cat}_{param}_0/results'
print(f'forcing_path: {forcing_path}\nbase results_path: {base_results_path}\nmodified_results_path: {mod_results_path}')

In [None]:
############# load data from Demo 1 (base simulation)
# connect to the CUAHSI MinIO server that is hosting our data
s3 = fsspec.filesystem("s3",
                       anon=True,
                       client_kwargs={'endpoint_url':'https://api.minio.cuahsi.io'},
                       use_listings_cache=False,
                      )

forcing_csv_files = s3.glob(f'{forcing_path}/cat*.csv')
base_nex_csv_files = s3.glob(f'{base_results_path}/nex*.csv')
base_cat_csv_files = s3.glob(f'{base_results_path}/cat*.csv')
base_forcing_xr = utils.forcing_csv2xr(s3, forcing_csv_files)   # convert csv to xarray
base_nex_xr =  utils.nex_csv2xr(s3, base_nex_csv_files)         # convert csv to xarray 
base_cat_xr = utils.cat_csv2xr(s3, base_cat_csv_files)          # convert csv to xarray 

############# load data from Demo 2 (modify parameters)
mod_nex_csv_files = s3.glob(f'{mod_results_path}/nex*.csv')
mod_cat_csv_files = s3.glob(f'{mod_results_path}/cat*.csv')
mod_nex_xr =  utils.nex_csv2xr(s3, mod_nex_csv_files)           # convert csv to xarray
mod_cat_xr = utils.cat_csv2xr(s3, mod_cat_csv_files)            # convert csv to xarray

Create a plot of input precipitation and simulated streamflow for the base (Demo 1) and test (Demo 2) scenarios.

In [None]:
utils.plot_flow_comparison(Outlet_catchment_id, Outlet_nexus_id, 
                         base_forcing_xr, base_nex_xr, base_cat_xr, 
                         mod_nex_xr, mod_cat_xr)

In [None]:
# save xarray datasets as netcdf files to your local environment
base_forcing_xr.to_netcdf('./forcing.nc')
base_nex_xr.to_netcdf('./base_simulation_nexus_results.nc')
base_cat_xr.to_netcdf('./base_simulation_cat_results.nc')
mod_nex_xr.to_netcdf('./test_simulation_nexus_results.nc')
mod_cat_xr.to_netcdf('./test_simulation_cat_results.nc')


Upload files to the S3-compatible object storage service where your model inputs and outputs are stored. Currently, we have set `signature_version=UNSIGNED`, which means you do not require authentication and can use unsigned requests.

In [None]:
import os
import boto3
import glob
from botocore import UNSIGNED
from botocore.client import Config

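# use a separate name so the fsspec filesystem `s3` created earlier is not overwritten
s3_client = boto3.client('s3', endpoint_url="https://api.minio.cuahsi.io", config=Config(signature_version=UNSIGNED))

# upload every local NetCDF file to the post_process/ folder under data_path
for file in glob.glob('./*.nc'):
    s3_client.upload_file(file, bucket_name, f'{data_path}post_process/{os.path.basename(file)}')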

print(f'https://console.minio.cuahsi.io/browser/{bucket_name}/{data_path}')
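To check where each file will land before uploading, you can preview the mapping from local files to object keys. This is a small sketch that mirrors the key pattern used in the upload loop; `object_keys` is a hypothetical helper, and the directory contents and `data_path` value are made up.

```python
import glob
import os
import tempfile

def object_keys(local_dir: str, data_path: str) -> dict:
    """Map each local .nc file to the object key it would be uploaded under."""
    files = sorted(glob.glob(os.path.join(local_dir, '*.nc')))
    return {f: f'{data_path}post_process/{os.path.basename(f)}' for f in files}

# create a throwaway directory with one NetCDF-named file and one unrelated file
with tempfile.TemporaryDirectory() as tmp:
    for name in ('forcing.nc', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    keys = list(object_keys(tmp, 'example-user/wb-0000/').values())

print(keys)
```

Only files matching `*.nc` are included, so unrelated files in the working directory are not uploaded.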

### Share Results on HydroShare

Save the data of the modeling and analysis results on HydroShare. Use the HydroShare Python Client (hsclient), a library that allows users to interact with HydroShare, to create a new resource for your analysis datasets. For more information, see the [hsclient GitHub page](https://github.com/hydroshare/hsclient) and this [HydroShare resource](https://www.hydroshare.org/resource/7561aa12fd824ebb8edbee05af19b910/).

In [None]:
from hsclient import HydroShare

# sign in to the system using your HydroShare credentials
hs = HydroShare()
hs.sign_in()

# create a resource
res = hs.create()

# define metadata (including title, abstract, keywords)
res.metadata.title = 'Comparison of CFE Model Outputs to Assess the Impact of Modifying Soil Hydraulic Conductivity on Streamflow'
res.metadata.abstract = 'This resource provides links to the comparative analysis of two distinct CFE simulation scenarios. \
    The first scenario, referred to as the "base" scenario, utilizes default model parameters. \
    The second scenario, named the "test" scenario, uses a modified `refkdt` (reference hydraulic conductivity). \
    All relevant model inputs, configurations, outputs, and post-processing results for this comparative\
    analysis are stored in S3 storage.'
res.metadata.subjects = ['argo workflows', 'cfe model outputs', 'hydraulic conductivity', 'CIROH developers conference 2024']

# Call the save function to save the metadata edits to HydroShare
res.save()

# Creates a HydroShare reference object to reference content outside of the resource
res.reference_create("s3reference", MinIO_Path)


In [None]:
print(f'Your new resource is available at: {res.metadata.url}')

Add more metadata

In [None]:
from hsmodels.schemas.fields import PeriodCoverage

# Create a beginning and ending date for a time period
beginDate = pd.to_datetime(base_forcing_xr.time[0].values)
endDate = pd.to_datetime(base_forcing_xr.time[-1].values)

# Set the temporal coverage of the resource to a PeriodCoverage object
res.metadata.period_coverage = PeriodCoverage(start=beginDate, end=endDate)

# Save the changes to the resource in HydroShare
res.save()
print(f'Your new resource is available at: {res.metadata.url}')