# Ohio River Model

This notebook showcases the Clearwater Riverine water quality model, which uses hydrodynamic output from HEC-RAS 2D to perform advection-diffusion water quality calculations. The model lives in the `clearwater_riverine` package, which is still in development.

[Environmental Fluid Dynamics Code](https://www.epa.gov/ceam/environmental-fluid-dynamics-code-efdc) (EFDC) is a surface water modeling system developed by the EPA that can simulate water quality constituent transport in rivers, stratified estuaries, lakes, and coastal seas. Because EFDC is a widely used and trusted water quality model, we can compare results from an EFDC model to results from a comparable Clearwater Riverine model to demonstrate the new model's efficacy.

This demonstration uses a HEC-RAS model of the Ohio River that was built from an EFDC model of the Ohio River, in which E. coli transport was modeled as a conservative dye. The RAS model was created to mirror the EFDC model, including the 2D model mesh, the initial conditions, and the boundary conditions. However, there are some key differences between the two models (e.g., the cells aren't perfectly aligned, the modeling timestep is different, and the spatial extent is different). The RAS model output is then fed into the Clearwater Riverine water quality model, which simulates advection and diffusion of E. coli through the model domain.

This notebook is organized as follows:
1. Run the Clearwater Riverine water quality model using the Ohio River RAS model output. 
2. Load EFDC water quality results. 
3. Compare Clearwater Riverine water quality results to EFDC water quality results to demonstrate that the Clearwater Riverine model produces similar results. 

# Set-Up Workspace

## Package Imports

Be sure to install the `clearwater_riverine` environment from `environment.yml`; instructions are provided in the README of the [Clearwater-riverine repo](https://github.com/EnvironmentalSystems/ClearWater-riverine).

In [None]:
# import dependencies for running the notebook 
from importlib import reload
from pathlib import Path
import warnings

import numpy as np
import pandas as pd
import geopandas as gpd

#plotting 
import matplotlib.pyplot as plt
import holoviews as hv
from holoviews.operation.datashader import datashade, rasterize
import geoviews as gv
from shapely.geometry import Polygon
import shapely
from shapely.errors import ShapelyDeprecationWarning
warnings.filterwarnings("ignore", category=ShapelyDeprecationWarning) 
import panel as pn
from bokeh.resources import INLINE
hv.extension("bokeh")

In [None]:
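import sys
# add a local clone of the package source to the import path; change this to match your own checkout
sys.path.append('/Users/todd/GitHub/ecohydrology/ClearWater-riverine/src')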

In [None]:
import clearwater_riverine as cwr

## RAS File Paths

Define the file path to the RAS2D HDF output file of interest. In this example, the HEC-RAS model output for the Ohio River is saved to the following path:

In [36]:
fpath = 'data/ohio_river/OhioRiver_m.p22.hdf'

# Example Clearwater Riverine Workflow

## Set-Up Model Mesh

The code below creates a ClearwaterRiverine model. Initializing this model requires two arguments:
* `fpath`: the filepath to the RAS2D HDF output (defined in the code block above).
* `diffusion_coefficient`: assumed diffusion coefficient for the entire model domain (arbitrarily set to 0.1 here)
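
To make the role of the diffusion coefficient more concrete, here is a minimal 1D sketch of an explicit advection-diffusion update on a periodic row of cells. It is an illustration only and is not the numerical scheme used by `clearwater_riverine`; the function name, grid, and parameter values are all hypothetical.

```python
def advect_diffuse_step(conc, velocity, diffusion_coefficient, dx, dt):
    """Advance concentrations one explicit step on a periodic 1D grid.

    Uses first-order upwind advection (velocity >= 0 assumed) and
    central-difference diffusion, written in flux form so mass is conserved.
    """
    n = len(conc)
    # flux through the left face of each cell i (between cell i-1 and cell i)
    flux = []
    for i in range(n):
        upstream = conc[i - 1]  # index -1 wraps around: periodic boundary
        advective = velocity * upstream
        diffusive = -diffusion_coefficient * (conc[i] - upstream) / dx
        flux.append(advective + diffusive)
    # update each cell from the difference between its left and right face fluxes
    return [conc[i] + dt / dx * (flux[i] - flux[(i + 1) % n]) for i in range(n)]


# hypothetical setup: a pulse of 100 units in one cell of a 10-cell channel
conc = [0.0] * 10
conc[2] = 100.0
for _ in range(20):
    conc = advect_diffuse_step(conc, velocity=0.5, diffusion_coefficient=0.1, dx=1.0, dt=0.5)

print(round(sum(conc), 6))  # total mass is unchanged by transport
```

A larger diffusion coefficient spreads the pulse across more cells per step, while the velocity controls how quickly it moves downstream.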

In [37]:
%%time
ohio = cwr.ClearwaterRiverine(fpath, 0.1, verbose=True)

Populating Model Mesh...
Calculating Required Parameters...
CPU times: user 2.54 s, sys: 115 ms, total: 2.66 s
Wall time: 2.77 s


In [38]:
# display the ohio model domain xarray
ohio.mesh

## Define Initial Conditions

Initial conditions for the Clearwater Riverine simulation of the Ohio River were developed as follows: a 5 ft raster was created from the EFDC shapefile containing the E. coli initial conditions, and a zonal average was then calculated for each HEC-RAS cell, producing the per-cell values imported here.

To set up the initial conditions in a Clearwater Riverine model, use the `initial_conditions` method, which accepts a CSV file with `Cell_Index` and `Concentration` as the column headers.

In [39]:
%%time
ohio.initial_conditions('data/ohio_river/cwr_initial_conditions.csv')

CPU times: user 3.08 ms, sys: 2.46 ms, total: 5.54 ms
Wall time: 4.91 ms
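
As a rough illustration of the expected CSV layout, the sketch below writes a small initial-conditions file with the two required column headers using Python's standard `csv` module. The cell indices, concentration values, and file name are made up for the example; only the `Cell_Index` and `Concentration` headers come from the description above.

```python
import csv
import tempfile
from pathlib import Path

# hypothetical per-cell initial concentrations (cell index -> concentration)
initial_concentrations = {0: 120.0, 1: 95.5, 2: 0.0}

# hypothetical output location; a temporary folder keeps the example self-contained
out_path = Path(tempfile.mkdtemp()) / "example_initial_conditions.csv"

with out_path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Cell_Index", "Concentration"])  # headers the method expects
    for cell_index, concentration in sorted(initial_concentrations.items()):
        writer.writerow([cell_index, concentration])

# read the file back to confirm its layout
with out_path.open(newline="") as f:
    rows = list(csv.DictReader(f))
print(rows[0])
```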


## Define Boundary Conditions

Use the `boundary_conditions` method to pass a CSV containing boundary condition information, which should have the following columns:
* `RAS2D_TS_Name`: name of boundary condition in the HEC-RAS model
* `Datetime`: date and time of boundary condition
* `Concentration`: concentration (mass per volume)

The current set-up assumes that each timestep in the RAS model will have a concentration in this CSV file and no further interpolation/data cleaning is required. We will likely want to build out this functionality to accept any datetimes and properly interpolate the data to fit the RAS model, rather than requiring users to do all clean-up and interpolation on their own.
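
Until that functionality exists, one way to prepare the file is to interpolate sparse observations onto the model timesteps before writing the CSV. The sketch below does a simple linear interpolation with the standard library. The timestep list, observation values, and function name are hypothetical, and times outside the observed range are held at the nearest observation.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def interpolate_to_timesteps(obs_times, obs_values, model_times):
    """Linearly interpolate observations onto model timesteps.

    obs_times must be sorted; times outside the observed range take the
    nearest observed value.
    """
    result = []
    for t in model_times:
        j = bisect_left(obs_times, t)
        if j == 0:
            result.append(obs_values[0])
        elif j == len(obs_times):
            result.append(obs_values[-1])
        else:
            t0, t1 = obs_times[j - 1], obs_times[j]
            weight = (t - t0) / (t1 - t0)  # fraction of the way from t0 to t1
            result.append(obs_values[j - 1] + weight * (obs_values[j] - obs_values[j - 1]))
    return result


# hypothetical observations two hours apart, model timesteps every 30 minutes
start = datetime(2010, 5, 29)
obs_times = [start, start + timedelta(hours=2)]
obs_values = [100.0, 300.0]
model_times = [start + timedelta(minutes=30 * k) for k in range(5)]

interpolated = interpolate_to_timesteps(obs_times, obs_values, model_times)
for t, c in zip(model_times, interpolated):
    print(t.isoformat(), c)
```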

The boundary conditions were developed based on data in the EFDC model so that we can directly compare model output. 

In [40]:
%%time
ohio.boundary_conditions('data/ohio_river/cwr_boundary_conditions.csv')

CPU times: user 11.7 s, sys: 96.1 ms, total: 11.8 s
Wall time: 12 s


## Run Model

Run the RAS2D water quality model using `simulate_wq`, which accepts the following (optional) parameters:
* `input_mass_units`: User-defined mass units for the concentration timeseries in the initial / boundary conditions. Assumes mg if no value is specified.
* `input_volume_units`: User-defined volume units for the concentration timeseries in the initial / boundary conditions. Assumes L if no value is specified.
* `input_liter_conversion`: If the concentration inputs (initial / boundary conditions) are not in mass/L, supply the conversion factor to convert the volume unit to liters.
* `save`: Boolean indicating whether the output should be saved. The default is to not save the output.
* `output_file_path`: Path where the output is stored when `save` is true. This is the keyword used in the call below.

**The unit information should probably be moved to the boundary condition / initial condition setup**, and the approach may need to be rethought.

In this example, the input mass units are `cfu` (`colony forming unit`) since we are modelling E. coli, and the input volume units are `100 mL`. We therefore have to provide the `input_liter_conversion` parameter to correctly convert our input units to liters, which is the assumed unit in all calculations.
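
The arithmetic behind this conversion factor is small enough to write out. The sketch below assumes the factor is the number of liters in one input volume unit (100 mL = 0.1 L), so dividing by it puts a concentration on a per-liter basis. How `simulate_wq` applies the factor internally is not shown in this notebook, and the function name here is hypothetical.

```python
def to_per_liter(concentration, liters_per_input_volume):
    """Convert a concentration from mass per input volume unit to mass per liter."""
    return concentration / liters_per_input_volume


# 100 mL expressed in liters
LITERS_PER_100_ML = 0.1

# a made-up value of 235 cfu/100 mL expressed as cfu/L
cfu_per_liter = to_per_liter(235.0, LITERS_PER_100_ML)
print(cfu_per_liter)
```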

In [41]:
# create path to save output
path = Path.cwd() / 'data_temp'
try:
    path.mkdir(parents=True, exist_ok=False)
except FileExistsError:
    print("Folder is already there")
else:
    print("Folder was created")

Folder is already there


In [42]:
%%time
ohio.simulate_wq(input_mass_units= 'cfu',
                   input_volume_units = '100 mL',
                   input_liter_conversion = 0.1,
                   save=True,
                   output_file_path= 'data_temp/ohio-river.zarr',
)


Starting WQ Simulation...
 Assuming concentration input has units of cfu/100 mL...
     If this is not true, please re-run the wq simulation with input_mass_units, input_volume_units, and liter_conversion parameters filled in appropriately.
 25%
 50%
 75%
 100%
CPU times: user 9.9 s, sys: 433 ms, total: 10.3 s
Wall time: 10.3 s


## Plot Model Results

### Quick Plot 

If you want to quickly plot the model results, the `quick_plot` method takes the cell centroids and plots them in a scatter plot.


In [43]:
ohio.quick_plot()




The default color bar will go from the minimum to maximum concentration value. If you want to specify a maximum value for your colorbar, use the `clim_max` parameter:

In [44]:
# define maximum value for plotting
ohio.quick_plot(clim_max = 500)




### Detailed Polygon Plot

We can use the `plot()` method to plot a more detailed mesh. For this plotting function, you must specify the projection of the HEC-RAS model.

In [45]:
%%time
ohio.plot(crs='ESRI:102279')

CPU times: user 4.39 s, sys: 132 ms, total: 4.53 s
Wall time: 4.56 s



Some additional features of the `plot()` method:
* You can use the `clim_max` parameter to specify the maximum colorbar value.
* You can use the `time_index_range` parameter to specify a start time index and an end time index. This limits the time extent displayed in the plot (e.g., if you only want to show a plot of a single event).

Note that if you call the `plot()` method more than once, you do **not** need to re-specify the projection. Subsequent calls are also much faster than the initial call, because all re-projections have already occurred.
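
If you know an event's start and end times rather than its time indices, the indices can be looked up from the model's time axis. The sketch below uses a plain list of hypothetical hourly datetimes as a stand-in for the model's time coordinate. The helper name is not part of `clearwater_riverine`, and whether `plot()` treats the end index as inclusive is not stated here, so check the result against the plot.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def find_time_index_range(times, start, end):
    """Return (first, last) indices of the sorted `times` that fall within [start, end]."""
    first = bisect_left(times, start)
    last = bisect_right(times, end) - 1
    if first > last:
        raise ValueError("no timesteps fall inside the requested window")
    return first, last


# hypothetical hourly time axis standing in for the model's time coordinate
t0 = datetime(2010, 5, 29)
times = [t0 + timedelta(hours=h) for h in range(48)]

event = find_time_index_range(times, datetime(2010, 5, 29, 12), datetime(2010, 5, 30, 6))
print(event)
```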

In [46]:
%%time
ohio.plot(clim_max = 1000, time_index_range= (280,340))



CPU times: user 12.7 ms, sys: 2.99 ms, total: 15.7 ms
Wall time: 14.3 ms



#### Example Save to HTML

The code below shows how to save the plot as an interactive HTML file. These cells are saved as raw cells because saving the HTML takes a long time. However, users can convert the raw cells to code cells if they would like to save the file to HTML.

#### Example Save to Gif 

Limit the timeframe so that the plot can be converted [from a DynamicMap to a HoloMap](https://holoviews.org/user_guide/Live_Data.html#converting-from-dynamicmap-to-holomap) and then [exported to a gif](https://holoviews.org/user_guide/Exporting_and_Archiving.html). These cells are saved as raw cells because saving a gif takes a long time, but users can convert the raw cells to code cells if they would like to save a gif.

# Load EFDC Model

Now we need to load the EFDC model results of the Ohio River so that we have a point of comparison for the Clearwater Riverine water quality output. 

## EFDC File Paths

In [47]:
# Set your project directory to your local folder for your clone of this repository
project_path = Path.cwd()
project_path

PosixPath('/Users/todd/GitHub/ecohydrology/ClearWater-riverine/examples')

In [48]:
# Assign relative paths for model output and EFDC domain
data_folder = Path('data/')
data_path = project_path / data_folder
print(data_path)
efdc_output = pd.read_parquet(data_path / 'ohio_river/efdc/ohio_river-2010.parquet')

/Users/todd/GitHub/ecohydrology/ClearWater-riverine/examples/data


In [49]:
efdc_domain = gpd.read_file(data_path / 'ohio_river/efdc/efdc_ohio_river_model_shapefile/EFDC.shp')

## Process EFDC Data

Now we must align the date range of EFDC to match with HEC-RAS, since the models were run over different time extents. Furthermore, the model domain and the model results (concentrations and timesteps) are currently in separate files. In this section we also combine the output data with the EFDC model mesh into a single geopandas dataframe.

In [50]:
arr = ohio.mesh

In [51]:
# limit date range to match RAS
efdc_output = efdc_output[(efdc_output.datetime >= pd.to_datetime(arr.time.min().values)) & (efdc_output.datetime <= pd.to_datetime(arr.time.max().values))]

In [None]:
# combine output and geometry from EFDC
efdc_domain.rename(columns={'GRIDNO': 'grid_no'}, inplace = True)
efdc_domain = efdc_domain[['grid_no', 'geometry']]
efdc = efdc_domain.merge(efdc_output, on = 'grid_no', how = 'right')

In [None]:
dts = efdc['datetime'].unique()
# build one GeoDataFrame per timestep, keyed by time index
df_dict = {i: efdc[efdc.datetime == dt] for i, dt in enumerate(dts)}

## EFDC Plot

In [None]:
mval = 5000
def efdc_time_mesh(time, max_value=mval):
    """Plot EFDC dye concentrations for a single time index."""
    time_title = pd.to_datetime(str(dts[time])).strftime('%m/%d/%Y %H:%M ')
    # named mesh_map to avoid shadowing the built-in `map`
    mesh_map = gv.Polygons(
        df_dict[time].to_crs('EPSG:4326'),
        vdims=['DYE_mgL', 'grid_no'],
    ).opts(
        height=700,
        width=1000,
        color='DYE_mgL',
        colorbar=True,
        cmap='OrRd',
        clim=(0, max_value),
        line_width=0.1,
        title=time_title,
        tools=['hover'],
    )
    return mesh_map * gv.tile_sources.CartoLight()


In [None]:
efdc_meshes = gv.DynamicMap(efdc_time_mesh, kdims='Time').redim.values(Time=df_dict.keys())
efdc_meshes

## RAS Plot

In [None]:
# the projection was already specified in the earlier plot() call
ras_meshes = ohio.plot()
ras_meshes

# RAS / EFDC Model Comparison
## Make Domains Comparable

You can see from the plots above that the model domains are not aligned; the HEC-RAS model and corresponding Clearwater Riverine model have a much more limited spatial extent. The following code is used to limit the EFDC spatial domain to align with the Clearwater Riverine spatial domain. 

We start by dissolving the Clearwater Riverine boundary into a single polygon so that we can identify all EFDC polygons that fall within the Clearwater Riverine domain:

In [None]:
%%time
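# take the cells from a single timestep so each cell polygon appears once
dissolve_boundary = ohio.gdf[ohio.gdf.datetime == pd.to_datetime("2010-05-29 00:00:00")]
# give every cell the same key, then dissolve on it to merge all cells into one outline
dissolve_boundary = dissolve_boundary.assign(dissolve_param = 1)
ras_outline = dissolve_boundary.dissolve(by='dissolve_param')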

Then re-project the RAS outline to match the CRS of the EFDC dataset. This is required for future spatial joins.

In [None]:
ras_outline = ras_outline.to_crs(efdc.crs)

Perform a spatial join to limit the EFDC extent to the Clearwater Riverine extent.  

In [None]:
%%time
efdc_full_df_raw = gpd.sjoin(efdc, ras_outline, how='left')
efdc_full_df = efdc_full_df_raw[~efdc_full_df_raw.cell.isna()]

Plotting the updated EFDC domain confirms that we have successfully limited the spatial extent to the model domain of our Clearwater Riverine model. 

In [None]:
mval = 5000
def efdc_time_mesh(time, max_value=mval):
    """Plot EFDC dye concentrations within the RAS extent for a single time index."""
    time_title = pd.to_datetime(str(dts[time])).strftime('%m/%d/%Y %H:%M ')
    efdc_sub = efdc_full_df[efdc_full_df.datetime_left == dts[time]]
    # named mesh_map to avoid shadowing the built-in `map`
    mesh_map = gv.Polygons(
        efdc_sub.to_crs('EPSG:4326'),
        vdims=['DYE_mgL', 'grid_no'],
    ).opts(
        height=700,
        width=1000,
        color='DYE_mgL',
        colorbar=True,
        cmap='OrRd',
        clim=(0, max_value),
        line_width=0.1,
        title=time_title,
        tools=['hover'],
    )
    return mesh_map * gv.tile_sources.CartoLight()

In [None]:
efdc_meshes = gv.DynamicMap(efdc_time_mesh, kdims='Time').redim.values(Time=df_dict.keys())
efdc_meshes

## Side By Side Comparison Plots

Create side by side comparison plots. 

In [None]:
# re-project EFDC polygons to geographic coordinates for plotting over map tiles
efdc_full_df = efdc_full_df.to_crs('EPSG:4326')
ras_full_df = ohio.gdf

In [None]:
max_value = 1000
def plot_maps(datetime):
    """Plot Clearwater Riverine and EFDC concentrations side by side for one timestep."""
    efdc_sub_df = efdc_full_df[efdc_full_df.datetime_left == datetime]
    ras_sub_df = ras_full_df[ras_full_df.datetime == datetime]

    efdc_map = gv.Polygons(efdc_sub_df, vdims=['DYE_mgL']).opts(
        height=600,
        width=800,
        color='DYE_mgL',
        colorbar=True,
        cmap='OrRd',
        clim=(0, max_value),
        line_width=0.1,
        tools=['hover'],
        title="EFDC",
    )
    ras_map = gv.Polygons(ras_sub_df, vdims=['concentration']).opts(
        height=600,
        width=800,
        color='concentration',
        colorbar=True,
        cmap='OrRd',
        clim=(0, max_value),
        line_width=0.1,
        tools=['hover'],
        title="Clearwater Riverine",
    )
    return (ras_map * gv.tile_sources.CartoLight()) + (efdc_map * gv.tile_sources.CartoLight())

# create your dynamicmap
dmap = hv.DynamicMap(plot_maps, kdims=['datetime'])

# define the range of values that your dropdowns should have
dmap.redim.values(datetime=efdc_full_df.datetime_left.unique())

### Save to HTML

### Make a GIF

## Compare Locations
When we look at the side by side maps above, the model results from EFDC and Clearwater Riverine look similar! However, we also want timeseries plots for individual cells to provide a quantitative comparison of the results.

First, we must link EFDC cells to Clearwater Riverine cells to allow approximate cell-to-cell (location-to-location) comparisons. We do this by finding the centroid of each Clearwater Riverine cell and then using a spatial join to find the EFDC cell that contains it.

In [None]:
# centroids works best with projected data. 
ras_centroids = dissolve_boundary.to_crs('ESRI:102279').centroid
point_gdf = gpd.GeoDataFrame({"geometry": ras_centroids, 'ras_cell': dissolve_boundary.cell})
efdc_full_df.drop(columns=['index_right'], inplace=True)

# both datasets must be within same projection for spatial join
efdc_full_proj = efdc_full_df.to_crs('ESRI:102279')
efdc_full_df_cell_comparison = gpd.sjoin(point_gdf, efdc_full_proj, predicate='within')
full_df = efdc_full_df_cell_comparison.merge(ras_full_df, left_on=['ras_cell', 'datetime_left'], right_on = ['cell', 'datetime'], how='left')
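
Geometrically, the centroid-within-polygon match above comes down to two small operations: computing a polygon centroid and testing whether that point falls inside another polygon. The sketch below writes both out in plain Python for hypothetical square cells. It is only meant to show the idea; the notebook itself relies on geopandas for this, and points that land exactly on a cell boundary are not handled specially here.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = 0.0  # twice the signed area (shoelace formula)
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def point_in_polygon(point, vertices):
    """Ray-casting test: count polygon edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        crosses = (y0 > y) != (y1 > y)
        if crosses and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
            inside = not inside
    return inside


# hypothetical cells: one RAS cell and two neighbouring EFDC cells
ras_cell = [(1, 1), (3, 1), (3, 3), (1, 3)]
efdc_cells = {
    "A": [(0, 0), (2.5, 0), (2.5, 2.5), (0, 2.5)],
    "B": [(2.5, 0), (5, 0), (5, 2.5), (2.5, 2.5)],
}

centroid = polygon_centroid(ras_cell)
matches = [name for name, poly in efdc_cells.items() if point_in_polygon(centroid, poly)]
print(centroid, matches)
```

Because the cells of the two meshes are not aligned, a centroid-based match like this is approximate: a RAS cell can overlap several EFDC cells but is paired only with the one containing its centroid.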

The following plot will highlight the cell that is being displayed in the timeseries plot to the left. 

In [None]:
def plot_conc(cell):
    """Plot both concentration timeseries for one cell next to a map highlighting that cell."""
    sub_df = full_df[full_df.ras_cell == cell]
    ras_curve = hv.Curve(sub_df, ('datetime_left', 'datetime_left'), ('concentration_y', 'concentration_y'), label='Clearwater Riverine').opts(height=600, width=800, tools=['hover'])
    efdc_curve = hv.Curve(sub_df, ('datetime_left', 'datetime_left'), ('DYE_mgL', 'DYE_mgL'), label='EFDC').opts(height=600, width=800, tools=['hover'])
    # flag the selected cell (1) against all others (0) without modifying dissolve_boundary
    cell_df = dissolve_boundary.assign(color=(dissolve_boundary.cell == cell).astype(int))
    # named cell_map to avoid shadowing the built-in `map`
    cell_map = gv.Polygons(cell_df.to_crs('EPSG:4326'), vdims=['color']).opts(
        height=600,
        width=800,
        color='color',
        cmap='viridis',
        clim=(0, 1),
        line_width=0.1,
        tools=['hover'],
    )
    return ras_curve * efdc_curve + cell_map * gv.tile_sources.CartoLight()

# create your dynamicmap
dmap = hv.DynamicMap(plot_conc, kdims=['cell'])

# define the range of values that your dropdowns should have
dmap.redim.values(cell=full_df.cell_y.unique())

## Difference Plot

Plot the difference in concentration for each cell. 

In [None]:
# calculate the difference between RAS and EFDC 
full_df.drop(['geometry_x'], axis=1, inplace=True)
full_df.rename(columns={'geometry_y':'geometry'}, inplace=True)
full_df['difference'] = full_df['concentration_y'] - full_df['DYE_mgL'] # RAS - EFDC
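
Beyond mapping the per-cell difference, the comparison can be condensed into summary statistics. The sketch below computes mean bias and root-mean-square error for a pair of hypothetical concentration series, using the same sign convention as above (Clearwater Riverine minus EFDC). The values and function name are illustrative and are not results from either model.

```python
import math


def bias_and_rmse(model_values, reference_values):
    """Return (mean bias, RMSE) of model_values relative to reference_values."""
    if len(model_values) != len(reference_values):
        raise ValueError("series must have the same length")
    diffs = [m - r for m, r in zip(model_values, reference_values)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


# made-up concentrations for one cell over four timesteps (not model output)
riverine = [100.0, 210.0, 330.0, 390.0]
efdc_like = [110.0, 200.0, 300.0, 400.0]

bias, rmse = bias_and_rmse(riverine, efdc_like)
print(f"bias={bias:.2f}, rmse={rmse:.2f}")
```

A positive bias means the first series is higher on average, while RMSE also penalizes differences that cancel out in the mean.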


In [None]:
test = full_df[['datetime_left', 'difference', 'geometry']]
test = test.reset_index(drop=True)
gdf = gpd.GeoDataFrame(test)

In [None]:
def plot_diff(datetime):
    """Map the RAS - EFDC concentration difference for one timestep."""
    sub_df = gdf[gdf.datetime_left == datetime]
    diff_map = gv.Polygons(sub_df, vdims=['difference']).opts(
        height=600,
        width=800,
        color='difference',
        cmap='bwr',
        clim=(-1000, 1000),
        colorbar=True,
        line_width=0.1,
        tools=['hover'],
        title="RAS - EFDC",
    )
    return diff_map * gv.tile_sources.CartoLight()

# create your dynamicmap
dmap = hv.DynamicMap(plot_diff, kdims=['datetime'])

# define the range of values that your dropdowns should have
dmap.redim.values(datetime=efdc_full_df.datetime_left.unique())

### Make a GIF