[1pt] Tool to compare synthetic rating curve with benchmark rating curve (sierra test) (#332)

* `usgs_gage_crosswalk.py`: generates `usgs_elev_table.csv` in `run_by_unit.py` with elevation at USGS gages
* `rating_curve_comparison.py`: post-processing script to plot and calculate metrics between FIM and USGS rating curve data
* updates `aggregate_fim_outputs.py` call argument in `fim_run.sh` from 4 jobs to 6 jobs (optimizing API performance)
* reroutes median elevation data from `add_crosswalk.py` and `rem.py` to a new file (deprecating `hand_ref_elev_table.csv`)
* adds new files to `viz_whitelist` in `output_cleanup.py`
BrianAvant committed Apr 1, 2021
1 parent 4df6d1d commit 44b6333
Showing 8 changed files with 590 additions and 31 deletions.
42 changes: 28 additions & 14 deletions CHANGELOG.md
@@ -1,68 +1,82 @@
All notable changes to this project will be documented in this file.
We follow the [Semantic Versioning 2.0.0](http://semver.org/) format.

## v3.0.12.2 - 2021-04-01 - [PR #332](https://github.com/NOAA-OWP/cahaba/pull/332)

Created tool to compare synthetic rating curves with benchmark rating curves (sierra test). Resolves issues #293 and #308.

### Changes
- Updates `aggregate_fim_outputs.py` call argument in `fim_run.sh` from 4 jobs to 6 jobs (optimizing API performance).
- Reroutes median elevation data from `add_crosswalk.py` and `rem.py` to a new file (deprecating `hand_ref_elev_table.csv`).
- Adds new files to `viz_whitelist` in `output_cleanup.py`.

### Additions
- `usgs_gage_crosswalk.py`: generates `usgs_elev_table.csv` in `run_by_unit.py` with elevation and additional attributes at USGS gages.
- `rating_curve_comparison.py`: post-processing script to plot and calculate metrics between synthetic rating curves and USGS rating curve data.
<br/><br/>
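The comparison logic of `rating_curve_comparison.py` is not shown in this diff; a minimal sketch of the underlying idea — interpolating the USGS stage at each synthetic flow and summarizing the differences — might look like the following (all names hypothetical, not the script's actual API):

```python
def interp_stage(flow, curve):
    """Linearly interpolate stage at `flow` from a rating curve given
    as a list of (discharge, stage) pairs sorted by discharge."""
    for (q0, s0), (q1, s1) in zip(curve, curve[1:]):
        if q0 <= flow <= q1:
            t = (flow - q0) / (q1 - q0)
            return s0 + t * (s1 - s0)
    raise ValueError("flow outside the curve's range")


def mean_abs_stage_diff(synthetic, usgs):
    """Mean absolute stage difference, evaluated at the synthetic
    curve's flows that fall inside the USGS curve's range."""
    diffs = [abs(stage - interp_stage(q, usgs))
             for q, stage in synthetic
             if usgs[0][0] <= q <= usgs[-1][0]]
    return sum(diffs) / len(diffs)
```

The real script computes several metrics; this sketch only illustrates the interpolate-then-difference pattern common to all of them.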

## v3.0.12.1 - 2021-03-31 - [PR #336](https://github.com/NOAA-OWP/cahaba/pull/336)

Fix spatial option in `eval_plots.py` when creating plots and spatial outputs.

### Changes
- Removes file dependencies from the spatial option. It still requires the WBD layer, which should be specified in the `.env` file.
- Produces outputs in a format consistent with requirements needed for publishing.
- Preserves leading zeros in huc information for all outputs from `eval_plots.py`.
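Preserving leading zeros generally means treating HUC codes as zero-padded strings rather than integers; a minimal sketch (the helper name is hypothetical, not from `eval_plots.py`):

```python
def format_huc(huc, digits=8):
    """Render a HUC code as a zero-padded string; an integer
    round-trip (e.g. through a CSV read) drops the leading zero."""
    return str(huc).zfill(digits)
```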

### Additions
- Creates `fim_performance_points.shp`: this layer consists of all evaluated AHPS points (with metrics). Spatial data retrieved from WRDS on the fly.
- Creates `fim_performance_polys.shp`: this layer consists of all evaluated HUC8s (with metrics). Spatial data retrieved from WBD layer.
<br/><br/>

## v3.0.12.0 - 2021-03-26 - [PR #327](https://github.com/NOAA-OWP/cahaba/pull/327)

Add more detail/information to plotting capabilities.

### Changes
- Merge `plot_functions.py` into `eval_plots.py` and move `eval_plots.py` into the tools directory.
- Remove `plots` subdirectory.
### Additions
- Optional argument to create barplots of CSI for each individual site.
- Create a csv containing the data used to create the scatterplots.
<br/><br/>

## v3.0.11.0 - 2021-03-22 - [PR #319](https://github.com/NOAA-OWP/cahaba/pull/319)

Improvements to CatFIM service source data generation.

### Changes
- Renamed `generate_categorical_fim.py` to `generate_categorical_fim_mapping.py`.
- Updated the status outputs of the `nws_lid_sites` layer and saved it in the same directory as the merged `catfim_library` layer.
- Additional stability fixes (such as improved compatibility with WRDS updates).

### Additions
- Added `generate_categorical_fim.py` to wrap `generate_categorical_fim_flows.py` and `generate_categorical_fim_mapping.py`.
- Create new `nws_lid_sites` shapefile located in same directory as the `catfim_library` shapefile.

<br/><br/>

## v3.0.10.1 - 2021-03-24 - [PR #320](https://github.com/NOAA-OWP/cahaba/pull/320)

Patch to `synthesize_test_cases.py`.

### Changes
- Bug fix to `synthesize_test_cases.py` to allow comparison between `testing` version and `official` versions.

<br/><br/>

## v3.0.10.0 - 2021-03-12 - [PR #298](https://github.com/NOAA-OWP/cahaba/pull/298)

Preprocessing of flow files for Categorical FIM.

### Additions
- Generate Categorical FIM flow files for each category (action, minor, moderate, major).
- Generate point shapefile of Categorical FIM sites.
- Generate csv of attribute data in shapefile.
- Aggregate all shapefiles and csv files into one file in parent directory.
- Add flood of record category.

### Changes
- Stability fixes to `generate_categorical_fim.py`.

<br/><br/>

## v3.0.9.0 - 2021-03-12 - [PR #297](https://github.com/NOAA-OWP/cahaba/pull/297)
2 changes: 1 addition & 1 deletion fim_run.sh
@@ -152,5 +152,5 @@ fi
echo "$viz"
if [[ "$viz" -eq 1 ]]; then
# aggregate outputs
python3 /foss_fim/src/aggregate_fim_outputs.py -d $outputRunDataDir -j 4
time python3 /foss_fim/src/aggregate_fim_outputs.py -d $outputRunDataDir -j 6
fi
3 changes: 1 addition & 2 deletions src/add_crosswalk.py
@@ -220,9 +220,8 @@ def add_crosswalk(input_catchments_fileName,input_flows_fileName,input_srcbase_f
output_hydro_table = output_hydro_table.merge(input_huc.loc[:,[FIM_ID,'HUC8']],how='left',on=FIM_ID)

if output_flows.HydroID.dtype != 'str': output_flows.HydroID = output_flows.HydroID.astype(str)
output_hydro_table = output_hydro_table.merge(output_flows.loc[:,['HydroID','LakeID','Median_Thal_Elev_m']],how='left',on='HydroID')
output_hydro_table = output_hydro_table.merge(output_flows.loc[:,['HydroID','LakeID']],how='left',on='HydroID')
output_hydro_table['LakeID'] = output_hydro_table['LakeID'].astype(int)
output_hydro_table['Median_Thal_Elev_m'] = output_hydro_table['Median_Thal_Elev_m'].astype(float).round(2)
output_hydro_table = output_hydro_table.rename(columns={'HUC8':'HUC'})
if output_hydro_table.HUC.dtype != 'str': output_hydro_table.HUC = output_hydro_table.HUC.astype(str)

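The dtype-guard-then-merge pattern in the `add_crosswalk.py` hunk above is easy to get wrong (pandas merges silently produce NaNs when key dtypes differ). A minimal sketch with hypothetical data, reusing the column names from the diff:

```python
import pandas as pd

flows = pd.DataFrame({"HydroID": [1, 2], "LakeID": [-999, 4]})
hydro_table = pd.DataFrame({"HydroID": ["1", "1", "2"],
                            "stage": [0.0, 0.3, 0.0]})

# Merge keys must share a dtype; cast HydroID to str before merging,
# mirroring the `if output_flows.HydroID.dtype != 'str'` guard above
if flows.HydroID.dtype != "object":
    flows.HydroID = flows.HydroID.astype(str)

hydro_table = hydro_table.merge(flows[["HydroID", "LakeID"]],
                                how="left", on="HydroID")
hydro_table["LakeID"] = hydro_table["LakeID"].astype(int)
```

Without the cast, every row of the merged `LakeID` column would be NaN and the `astype(int)` call would raise.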
9 changes: 7 additions & 2 deletions src/output_cleanup.py
@@ -31,15 +31,20 @@ def output_cleanup(huc_number, output_folder_path, additional_whitelist, is_prod
'gw_catchments_reaches_filtered_addedAttributes.tif',
'hydroTable.csv',
'src.json',
'small_segments.csv'
'small_segments.csv',
'usgs_elev_table.csv',
'hand_ref_elev_table.csv'
]

# List of files that will be saved during a viz run
viz_whitelist = [
'rem_zeroed_masked.tif',
'gw_catchments_reaches_filtered_addedAttributes_crosswalked.gpkg',
'demDerived_reaches_split_filtered_addedAttributes_crosswalked.gpkg',
'gw_catchments_reaches_filtered_addedAttributes.tif',
'hydroTable.csv',
'src.json'
'src.json',
'small_segments.csv'
]

# If "production" run, only keep whitelisted files
23 changes: 12 additions & 11 deletions src/rem.py
@@ -11,7 +11,7 @@
from utils.shared_functions import getDriver


def rel_dem(dem_fileName, pixel_watersheds_fileName, rem_fileName, thalweg_raster, hydroid_fileName, hand_ref_elev_fileName, dem_reaches_filename):
def rel_dem(dem_fileName, pixel_watersheds_fileName, rem_fileName, thalweg_raster, hydroid_fileName, dem_reaches_filename):
"""
Calculates REM/HAND/Detrended DEM
@@ -25,8 +25,6 @@ def rel_dem(dem_fileName, pixel_watersheds_fileName, rem_fileName, thalweg_raste
File name of output relative elevation raster.
hydroid_fileName : str
File name of the hydroid raster (i.e. gw_catchments_reaches.tif)
hand_ref_elev_fileName
File name of the output csv containing list of hydroid values and HAND zero/reference elev
dem_reaches_filename
File name of the reaches layer to populate HAND elevation attribute values and overwrite as output
@@ -108,20 +106,25 @@ def make_catchment_min_dict(flat_dem, catchment_min_dict, flat_catchments, thalw
gw_catchments_pixels_masked_object.close()
thalweg_raster_object.close()

###############################################
# Merge and export dictionary to to_csv
catchment_min_dict_df = pd.DataFrame.from_dict(catchment_min_dict, orient='index') # convert dict to dataframe
catchment_min_dict_df.columns = ['Median_Thal_Elev_m']
catchment_hydroid_dict_df = pd.DataFrame.from_dict(catchment_hydroid_dict, orient='index') # convert dict to dataframe
catchment_hydroid_dict_df.columns = ['HydroID']
merge_df = catchment_hydroid_dict_df.merge(catchment_min_dict_df, left_index=True, right_index=True)
merge_df.index.name = 'pixelcatch_id'
merge_df.to_csv(hand_ref_elev_fileName,index=True) # export dataframe to csv file

# Merge the HAND reference elvation by HydroID dataframe with the demDerived_reaches layer (add new layer attribute)
merge_df = merge_df.groupby(['HydroID']).median() # median value of all Median_Thal_Elev_m for pixel catchments in each HydroID reach
# Merge the HAND reference elevation by HydroID dataframe with the demDerived_reaches layer (add new layer attribute)
min_by_hydroid = merge_df.groupby(['HydroID']).min() # min value of all med_thal_elev for pixel catchments in each HydroID reach
min_by_hydroid.columns = ['min_thal_elev']
med_by_hydroid = merge_df.groupby(['HydroID']).median() # median value of all med_thal_elev for pixel catchments in each HydroID reach
med_by_hydroid.columns = ['med_thal_elev']
max_by_hydroid = merge_df.groupby(['HydroID']).max() # max value of all med_thal_elev for pixel catchments in each HydroID reach
max_by_hydroid.columns = ['max_thal_elev']
input_reaches = gpd.read_file(dem_reaches_filename)
input_reaches = input_reaches.merge(merge_df, on='HydroID') # merge dataframes by HydroID variable
input_reaches = input_reaches.merge(min_by_hydroid, on='HydroID') # merge dataframes by HydroID variable
input_reaches = input_reaches.merge(med_by_hydroid, on='HydroID') # merge dataframes by HydroID variable
input_reaches = input_reaches.merge(max_by_hydroid, on='HydroID') # merge dataframes by HydroID variable
input_reaches.to_file(dem_reaches_filename,driver=getDriver(dem_reaches_filename),index=False)
# ------------------------------------------------------------------------------------------------------------------------ #

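The three separate groupby/merge passes in the hunk above (min, median, max of the pixel-catchment elevations per reach) can be expressed in one pass with named aggregation; a sketch with hypothetical data, not a drop-in replacement for `rem.py`:

```python
import pandas as pd

merge_df = pd.DataFrame({
    "HydroID": [1, 1, 2],
    "Median_Thal_Elev_m": [10.0, 12.0, 8.0],
})

# One groupby with named aggregation instead of three groupby calls
stats = (merge_df.groupby("HydroID")["Median_Thal_Elev_m"]
         .agg(min_thal_elev="min",
              med_thal_elev="median",
              max_thal_elev="max")
         .reset_index())
```

A single merge of `stats` onto the reaches layer would then replace the three `input_reaches.merge(...)` calls.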
@@ -171,7 +174,6 @@ def calculate_rem(flat_dem,catchmentMinDict,flat_catchments,ndv):
parser.add_argument('-t','--thalweg-raster',help='A binary raster representing the thalweg. 1 for thalweg, 0 for non-thalweg.',required=True)
parser.add_argument('-o','--rem',help='Output REM raster',required=True)
parser.add_argument('-i','--hydroid', help='HydroID raster to use within project path', required=True)
parser.add_argument('-r','--hand_ref_elev_table',help='Output table of HAND reference elev by catchment',required=True)
parser.add_argument('-s','--dem_reaches_in_out',help='DEM derived reach layer to join HAND reference elevation attribute',required=True)


@@ -184,7 +186,6 @@ def calculate_rem(flat_dem,catchmentMinDict,flat_catchments,ndv):
rem_fileName = args['rem']
thalweg_raster = args['thalweg_raster']
hydroid_fileName = args['hydroid']
hand_ref_elev_fileName = args['hand_ref_elev_table']
dem_reaches_filename = args['dem_reaches_in_out']

rel_dem(dem_fileName, pixel_watersheds_fileName, rem_fileName, thalweg_raster, hydroid_fileName, hand_ref_elev_fileName, dem_reaches_filename)
rel_dem(dem_fileName, pixel_watersheds_fileName, rem_fileName, thalweg_raster, hydroid_fileName, dem_reaches_filename)
10 changes: 9 additions & 1 deletion src/run_by_unit.sh
@@ -321,7 +321,7 @@ echo -e $startDiv"D8 REM $hucNumber"$stopDiv
date -u
Tstart
[ ! -f $outputHucDataDir/rem.tif ] && \
$srcDir/rem.py -d $dem_thalwegCond -w $outputHucDataDir/gw_catchments_pixels.tif -o $outputHucDataDir/rem.tif -t $demDerived_streamPixels -i $outputHucDataDir/gw_catchments_reaches.tif -r $outputHucDataDir/hand_ref_elev_table.csv -s $outputHucDataDir/demDerived_reaches_split.gpkg
$srcDir/rem.py -d $dem_thalwegCond -w $outputHucDataDir/gw_catchments_pixels.tif -o $outputHucDataDir/rem.tif -t $demDerived_streamPixels -i $outputHucDataDir/gw_catchments_reaches.tif -s $outputHucDataDir/demDerived_reaches_split.gpkg
Tcount

## DINF DISTANCE DOWN ##
@@ -432,6 +432,14 @@ Tstart
$srcDir/add_crosswalk.py -d $outputHucDataDir/gw_catchments_reaches_filtered_addedAttributes.gpkg -a $outputHucDataDir/demDerived_reaches_split_filtered.gpkg -s $outputHucDataDir/src_base.csv -l $outputHucDataDir/gw_catchments_reaches_filtered_addedAttributes_crosswalked.gpkg -f $outputHucDataDir/demDerived_reaches_split_filtered_addedAttributes_crosswalked.gpkg -r $outputHucDataDir/src_full_crosswalked.csv -j $outputHucDataDir/src.json -x $outputHucDataDir/crosswalk_table.csv -t $outputHucDataDir/hydroTable.csv -w $outputHucDataDir/wbd8_clp.gpkg -b $outputHucDataDir/nwm_subset_streams.gpkg -y $outputHucDataDir/nwm_catchments_proj_subset.tif -m $manning_n -z $input_NWM_Catchments -p $extent -k $outputHucDataDir/small_segments.csv
Tcount


## USGS CROSSWALK ##
echo -e $startDiv"USGS Crosswalk $hucNumber"$stopDiv
date -u
Tstart
$srcDir/usgs_gage_crosswalk.py -gages $inputDataDir/usgs_gages/usgs_gages.gpkg -dem $outputHucDataDir/dem_meters.tif -flows $outputHucDataDir/demDerived_reaches_split_filtered_addedAttributes_crosswalked.gpkg -cat $outputHucDataDir/gw_catchments_reaches_filtered_addedAttributes_crosswalked.gpkg -wbd $outputHucDataDir/wbd_buffered.gpkg -dem_adj $dem_thalwegCond -outtable $outputHucDataDir/usgs_elev_table.csv
Tcount

## CLEANUP OUTPUTS ##
echo -e $startDiv"Cleaning up outputs $hucNumber"$stopDiv
args=()
124 changes: 124 additions & 0 deletions src/usgs_gage_crosswalk.py
@@ -0,0 +1,124 @@
#!/usr/bin/env python3

import os
import geopandas as gpd
import pandas as pd
import numpy as np
import rasterio
import argparse
import pygeos
from shapely.wkb import dumps, loads
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

''' Get elevation at adjusted USGS gage locations
Parameters
----------
usgs_gages_filename : str
File name of USGS stations layer.
dem_filename : str
File name of original DEM.
input_flows_filename : str
File name of FIM streams layer.
input_catchment_filename : str
File name of FIM catchment layer.
wbd_buffer_filename : str
File name of buffered wbd.
dem_adj_filename : str
File name of thalweg adjusted DEM.
output_table_filename : str
File name of output table.
'''


def crosswalk_usgs_gage(usgs_gages_filename,dem_filename,input_flows_filename,input_catchment_filename,wbd_buffer_filename,dem_adj_filename,output_table_filename):

wbd_buffer = gpd.read_file(wbd_buffer_filename)
usgs_gages = gpd.read_file(usgs_gages_filename, mask=wbd_buffer)
dem_m = rasterio.open(dem_filename,'r')
input_flows = gpd.read_file(input_flows_filename)
input_catchment = gpd.read_file(input_catchment_filename)
dem_adj = rasterio.open(dem_adj_filename,'r')

if input_flows.HydroID.dtype != 'int': input_flows.HydroID = input_flows.HydroID.astype(int)

# Identify closest HydroID
closest_catchment = gpd.sjoin(usgs_gages, input_catchment, how='left', op='within').reset_index(drop=True)
closest_hydro_id = closest_catchment.filter(items=['site_no','HydroID','min_thal_elev','med_thal_elev','max_thal_elev', 'order_'])
closest_hydro_id = closest_hydro_id.dropna()

# Get USGS gages that are within catchment boundaries
usgs_gages = usgs_gages.loc[usgs_gages.site_no.isin(list(closest_hydro_id.site_no))]

columns = ['location_id','HydroID','dem_elevation','dem_adj_elevation','min_thal_elev', 'med_thal_elev','max_thal_elev','str_order']
gage_data = []

# Move USGS gage to stream
for index, gage in usgs_gages.iterrows():

# Get stream attributes
hydro_id = closest_hydro_id.loc[closest_hydro_id.site_no==gage.site_no].HydroID.item()
str_order = str(int(closest_hydro_id.loc[closest_hydro_id.site_no==gage.site_no].order_.item()))
min_thal_elev = round(closest_hydro_id.loc[closest_hydro_id.site_no==gage.site_no].min_thal_elev.item(),2)
med_thal_elev = round(closest_hydro_id.loc[closest_hydro_id.site_no==gage.site_no].med_thal_elev.item(),2)
max_thal_elev = round(closest_hydro_id.loc[closest_hydro_id.site_no==gage.site_no].max_thal_elev.item(),2)

# Convert headwater point geometries to WKB representation
wkb_gages = dumps(gage.geometry)

# Create pygeos headwater point geometries from WKB representation
gage_bin_geom = pygeos.io.from_wkb(wkb_gages)

# Closest segment to headwater
closest_stream = input_flows.loc[input_flows.HydroID==hydro_id]
wkb_closest_stream = dumps(closest_stream.geometry.item())
stream_bin_geom = pygeos.io.from_wkb(wkb_closest_stream)

# Linear reference headwater to closest stream segment
gage_distance_to_line = pygeos.linear.line_locate_point(stream_bin_geom, gage_bin_geom)
referenced_gage = pygeos.linear.line_interpolate_point(stream_bin_geom, gage_distance_to_line)

# Convert geometries to wkb representation
bin_referenced_gage = pygeos.io.to_wkb(referenced_gage)

# Convert to shapely geometries
shply_referenced_gage = loads(bin_referenced_gage)

# Sample rasters at adjusted gage
dem_m_elev = round(list(rasterio.sample.sample_gen(dem_m,shply_referenced_gage.coords))[0].item(),2)
dem_adj_elev = round(list(rasterio.sample.sample_gen(dem_adj,shply_referenced_gage.coords))[0].item(),2)

# Append dem_m_elev, dem_adj_elev, hydro_id, and gage number to table
site_elevations = [str(gage.site_no), str(hydro_id), dem_m_elev, dem_adj_elev, min_thal_elev, med_thal_elev, max_thal_elev,str(str_order)]
gage_data.append(site_elevations)


elev_table = pd.DataFrame(gage_data, columns=columns)

if not elev_table.empty:
elev_table.to_csv(output_table_filename,index=False)


if __name__ == '__main__':

parser = argparse.ArgumentParser(description='Crosswalk USGS sites to HydroID and get elevations')
parser.add_argument('-gages','--usgs-gages-filename', help='USGS gages', required=True)
parser.add_argument('-dem','--dem-filename',help='DEM',required=True)
parser.add_argument('-flows','--input-flows-filename', help='DEM derived streams', required=True)
parser.add_argument('-cat','--input-catchment-filename', help='DEM derived catchments', required=True)
parser.add_argument('-wbd','--wbd-buffer-filename', help='WBD buffer', required=True)
parser.add_argument('-dem_adj','--dem-adj-filename', help='Thalweg adjusted DEM', required=True)
parser.add_argument('-outtable','--output-table-filename', help='Table to append data', required=True)

args = vars(parser.parse_args())

usgs_gages_filename = args['usgs_gages_filename']
dem_filename = args['dem_filename']
input_flows_filename = args['input_flows_filename']
input_catchment_filename = args['input_catchment_filename']
wbd_buffer_filename = args['wbd_buffer_filename']
dem_adj_filename = args['dem_adj_filename']
output_table_filename = args['output_table_filename']

crosswalk_usgs_gage(usgs_gages_filename,dem_filename,input_flows_filename,input_catchment_filename,wbd_buffer_filename, dem_adj_filename,output_table_filename)
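The pygeos `line_locate_point`/`line_interpolate_point` pair in the loop above snaps each gage onto its stream segment. For a single straight segment, that operation is a plain point-to-segment projection; a stdlib sketch of the geometry (not the script's code, which handles full polylines):

```python
def snap_to_segment(p, a, b):
    """Project point p onto segment a-b and return the snapped point,
    clamped to the segment's endpoints."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment
        return a
    # Projection parameter along a-b, clamped to [0, 1]
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)
```

`line_locate_point` generalizes the clamped `t` to a distance along a multi-vertex line, and `line_interpolate_point` maps that distance back to coordinates.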
