# TEEHR Evaluation Example 3  
## Hourly NWM Retrospective 3.0, CAMELS Subset (648)

### 1. Get the data from S3
To save time, we prepared the individual datasets in advance and simply copy them to your 2i2c home directory. Run the cell below to copy the example-3 data.

In [33]:
!rm -rf ~/teehr/example-3/*
!aws s3 cp --recursive --no-sign-request s3://ciroh-rti-public-data/teehr-workshop-devcon-2024/workshop-data/example-3 ~/teehr/example-3

download: s3://ciroh-rti-public-data/teehr-workshop-devcon-2024/workshop-data/example-3/attributes/usgs_basin_attr_aridity.camels.parquet to ../../home/jovyan/teehr/example-3/attributes/usgs_basin_attr_aridity.camels.parquet
download: s3://ciroh-rti-public-data/teehr-workshop-devcon-2024/workshop-data/example-3/attributes/usgs_basin_attr_dom_land_cover.camels.parquet to ../../home/jovyan/teehr/example-3/attributes/usgs_basin_attr_dom_land_cover.camels.parquet
download: s3://ciroh-rti-public-data/teehr-workshop-devcon-2024/workshop-data/example-3/attributes/usgs_basin_attr_dom_land_cover_frac.camels.parquet to ../../home/jovyan/teehr/example-3/attributes/usgs_basin_attr_dom_land_cover_frac.camels.parquet
download: s3://ciroh-rti-public-data/teehr-workshop-devcon-2024/workshop-data/example-3/attributes/usgs_basin_attr_forest_frac.camels.parquet to ../../home/jovyan/teehr/example-3/attributes/usgs_basin_attr_forest_frac.camels.parquet
download: s3://ciroh-rti-public-data/teehr-workshop-de

In [34]:
!tree ~/teehr/example-3/

/home/jovyan/teehr/example-3/
├── attributes
│   ├── usgs_basin_attr_aridity.camels.parquet
│   ├── usgs_basin_attr_dom_land_cover.camels.parquet
│   ├── usgs_basin_attr_dom_land_cover_frac.camels.parquet
│   ├── usgs_basin_attr_drainage_area.all.parquet
│   ├── usgs_basin_attr_elev_mean.camels.parquet
│   ├── usgs_basin_attr_forest_frac.camels.parquet
│   ├── usgs_basin_attr_frac_snow.camels.parquet
│   ├── usgs_basin_attr_frac_urban.conus.parquet
│   ├── usgs_basin_attr_high_prec_freq.camels.parquet
│   ├── usgs_basin_attr_p_mean.camels.parquet
│   ├── usgs_basin_attr_p_seasonality.camels.parquet
│   ├── usgs_basin_attr_pet_mean.camels.parquet
│   ├── usgs_basin_attr_slope_mean.camels.parquet
│   ├── usgs_basin_attr_soil_porosity.camels.parquet
│   ├── usgs_point_attr_baseflow_index.camels.parquet
│   ├── usgs_point_attr_d

### Evaluate Model Output
In this notebook we will demonstrate how to use TEEHR to calculate metrics from a previously created joined TEEHR database containing hourly NWM 3.0 Retrospective simulations and USGS observations from 1981-2022, using a range of different options for grouping and filtering.  We will then create some common graphics based on the results (the same as in Example 2).


#### In this notebook we will perform the following steps:
<ol>
    <li> Review the contents of our joined parquet file </li>
    <li> Calculate metrics with different group_by options </li>
    <li> Calculate metrics with different filter options </li>
    <li> Example visualizations of TEEHR results</li> 
</ol>

#### First setup the TEEHR class and review the contents of the joined parquet file

In [35]:
from teehr.classes.duckdb_joined_parquet import DuckDBJoinedParquet
from pathlib import Path

# Define the paths to the joined parquet file and the geometry files
TEEHR_BASE = Path(Path.home(), 'teehr/example-3')
JOINED_PARQUET_FILEPATH = f"{TEEHR_BASE}/joined/configuration=nwm30_retro/variable_name=streamflow_hourly_inst/*.parquet"
GEOMETRY_FILEPATH = f"{TEEHR_BASE}/geometry/**/*.parquet"

# Initialize a teehr joined parquet class with our parquet file and geometry
joined_data = DuckDBJoinedParquet(
    joined_parquet_filepath = JOINED_PARQUET_FILEPATH,
    geometry_filepath = GEOMETRY_FILEPATH
)

### 1. Review the contents of the joined parquet files

In practice, you may want to review the fields of data in the parquet file to plan your evaluation strategy.  If the dataset is large, reading it into a dataframe may be cumbersome or even infeasible in some cases. TEEHR provides the `get_joined_timeseries_schema` method to quickly review the fields of the joined parquet file and the `get_unique_field_values` method to review the unique values contained in a specified field.  The latter is particularly helpful for building dashboards for evaluation (e.g., to populate a drop-down menu of possible filter or `group_by` values).

In [36]:
# Remind ourselves what fields were included
joined_data.get_joined_timeseries_schema()

Unnamed: 0,column_name,column_type,null,key,default,extra
0,reference_time,TIMESTAMP,YES,,,
1,value_time,TIMESTAMP,YES,,,
2,secondary_location_id,VARCHAR,YES,,,
3,secondary_value,FLOAT,YES,,,
4,configuration,VARCHAR,YES,,,
5,measurement_unit,VARCHAR,YES,,,
6,variable_name,VARCHAR,YES,,,
7,primary_value,FLOAT,YES,,,
8,primary_location_id,VARCHAR,YES,,,
9,aridity_none,VARCHAR,YES,,,


In [37]:
# Review what configuration datasets were included
joined_data.get_unique_field_values('configuration')

Unnamed: 0,unique_configuration_values
0,nwm30_retro


In [38]:
# ...number of locations
len(joined_data.get_unique_field_values('primary_location_id'))


647

### 2. Calculate metrics 

In [39]:
%%time

gdf_all = joined_data.get_metrics(
    group_by=["primary_location_id", "configuration"],
    order_by=["primary_location_id", "configuration"],
    include_metrics=[
        'kling_gupta_efficiency_mod2',
        'relative_bias',
        'pearson_correlation',                  
        'nash_sutcliffe_efficiency_normalized',  
        'mean_absolute_relative_error',
        'primary_count' 
    ],
    include_geometry=True,
)
# view the dataframe
gdf_all


CPU times: user 1min 11s, sys: 475 ms, total: 1min 11s
Wall time: 18.5 s


Unnamed: 0,primary_location_id,configuration,kling_gupta_efficiency_mod2,relative_bias,pearson_correlation,nash_sutcliffe_efficiency_normalized,mean_absolute_relative_error,primary_count,geometry
0,usgs-01013500,nwm30_retro,0.391628,-0.041872,0.802823,0.667056,0.499498,195876,POINT (-68.58278 47.23750)
1,usgs-01022500,nwm30_retro,0.680773,-0.177140,0.838066,0.754479,0.341755,223852,POINT (-67.93528 44.60806)
2,usgs-01030500,nwm30_retro,0.709208,-0.203271,0.829589,0.746575,0.402818,194840,POINT (-68.30583 45.50111)
3,usgs-01031500,nwm30_retro,0.768625,-0.041667,0.783091,0.678686,0.491308,195481,POINT (-69.31472 45.17500)
4,usgs-01047000,nwm30_retro,0.679907,0.070001,0.744362,0.608151,0.507747,195651,POINT (-69.95500 44.86917)
...,...,...,...,...,...,...,...,...,...
642,usgs-14309500,nwm30_retro,0.831301,-0.008467,0.834059,0.744712,0.464768,280477,POINT (-123.61091 42.80400)
643,usgs-14316700,nwm30_retro,0.797103,-0.014002,0.836359,0.724203,0.431458,284668,POINT (-122.72894 43.34984)
644,usgs-14325000,nwm30_retro,0.734790,0.009141,0.849341,0.706860,0.501058,261938,POINT (-124.07065 42.89150)
645,usgs-14362250,nwm30_retro,0.507605,-0.084728,0.509288,0.497379,0.959275,271371,POINT (-123.07532 42.15401)
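
To make the metric names above more concrete, the sketch below computes them in plain Python from a pair of observed (primary) and simulated (secondary) series. The formulas are commonly used definitions and are an assumption here, not copied from TEEHR: relative bias as sum(sim - obs) / sum(obs), mean absolute relative error as sum(|sim - obs|) / sum(obs), NNSE as 1 / (2 - NSE), and KGE'' as 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + beta^2) with alpha = std(sim) / std(obs) and beta = (mean(sim) - mean(obs)) / std(obs). The function names are hypothetical; check the TEEHR documentation for the exact definitions it implements.

```python
import math
from statistics import mean, pstdev


def pearson(obs, sim):
    """Pearson correlation coefficient between two equal-length series."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def relative_bias(obs, sim):
    """Total error relative to total observed volume (assumed definition)."""
    return sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def mean_absolute_relative_error(obs, sim):
    """Total absolute error relative to total observed volume (assumed definition)."""
    return sum(abs(s - o) for o, s in zip(obs, sim)) / sum(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    mo = mean(obs)
    sq_err = sum((s - o) ** 2 for o, s in zip(obs, sim))
    return 1 - sq_err / sum((o - mo) ** 2 for o in obs)


def nnse(obs, sim):
    """Normalized NSE, which maps (-inf, 1] onto (0, 1]."""
    return 1 / (2 - nse(obs, sim))


def kge_mod2(obs, sim):
    """KGE'' with a bias term scaled by the observed standard deviation (assumed form)."""
    r = pearson(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)
    beta = (mean(sim) - mean(obs)) / pstdev(obs)
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + beta ** 2)


# Hypothetical toy series: the simulation is exactly twice the observation.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 4.0, 6.0, 8.0]
print(pearson(obs, sim), relative_bias(obs, sim), kge_mod2(obs, sim))
```

In the toy series the correlation is perfect, yet the variability and bias terms pull KGE'' well below zero, which is why correlation alone is not a sufficient summary of performance.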


In [40]:
# use pandas magic to create a nice summary table of the metrics by model configuration across locations
gdf_all.groupby('configuration').describe(percentiles=[.5]).unstack(1).reset_index().rename(
    columns={'level_0':'metric','level_1':'summary'}).pivot(
    index=['metric','configuration'], values=0, columns='summary')

metric,configuration,50%,count,max,mean,min,std
kling_gupta_efficiency_mod2,nwm30_retro,0.675088,647.0,0.915274,0.574126,-2.84415,0.368454
mean_absolute_relative_error,nwm30_retro,0.535176,647.0,4.50555,0.615383,0.246365,0.353796
nash_sutcliffe_efficiency_normalized,nwm30_retro,0.667654,647.0,0.875057,0.637685,0.047997,0.13691
pearson_correlation,nwm30_retro,0.765011,647.0,0.940297,0.721231,0.056232,0.1464
primary_count,nwm30_retro,251039.0,647.0,335997.0,243504.554869,113410.0,44774.729843
relative_bias,nwm30_retro,-0.049238,647.0,3.881756,-0.020543,-0.879751,0.378886


In [41]:
%%time

'''
Calculate metrics separately for low flows and high flows based on the 
calculated field "obs_flow_category_q_mean" -> add the field to the group_by list.  
'''

gdf_flowcat = joined_data.get_metrics(
    group_by=["primary_location_id", "configuration", "obs_flow_category_q_mean"],
    order_by=["primary_location_id", "configuration"],
    include_metrics=[
        'kling_gupta_efficiency_mod2',
        'pearson_correlation',                  
        'mean_absolute_relative_error',
        'primary_count' 
    ],
)
display(gdf_flowcat)


Unnamed: 0,primary_location_id,configuration,obs_flow_category_q_mean,kling_gupta_efficiency_mod2,pearson_correlation,mean_absolute_relative_error,primary_count
0,usgs-01013500,nwm30_retro,high,0.148342,0.660219,0.379313,76712
1,usgs-01013500,nwm30_retro,low,-0.566669,0.677014,0.901567,119164
2,usgs-01022500,nwm30_retro,high,0.481649,0.701277,0.342938,82032
3,usgs-01022500,nwm30_retro,low,0.685788,0.747815,0.337945,141820
4,usgs-01030500,nwm30_retro,high,0.481961,0.707217,0.385183,75022
...,...,...,...,...,...,...,...
1289,usgs-14325000,nwm30_retro,high,0.618714,0.803206,0.455834,76474
1290,usgs-14362250,nwm30_retro,high,0.378903,0.426588,0.853640,52324
1291,usgs-14362250,nwm30_retro,low,-0.534671,0.397176,1.320719,219047
1292,usgs-14400000,nwm30_retro,low,0.502249,0.608719,0.519422,190160


CPU times: user 33.9 s, sys: 348 ms, total: 34.2 s
Wall time: 8.72 s
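
The effect of adding a flow category to the group_by list can be pictured as splitting each location's paired values into bins before the metric is computed. The sketch below is a minimal illustration in plain Python. It assumes that a value is labelled "high" when the observation is at or above that location's mean observed flow and "low" otherwise; how `obs_flow_category_q_mean` was actually derived in the prepared dataset is not shown in this notebook. The helper names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def categorize_by_mean(obs):
    """Label each observation relative to the mean observed flow (assumed rule)."""
    q_mean = mean(obs)
    return ["high" if o >= q_mean else "low" for o in obs]


def relative_bias(obs, sim):
    """Total error relative to total observed volume."""
    return sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def relative_bias_by_category(obs, sim):
    """Split the pairs by flow category, then compute the metric per group."""
    groups = defaultdict(lambda: ([], []))
    for cat, o, s in zip(categorize_by_mean(obs), obs, sim):
        groups[cat][0].append(o)
        groups[cat][1].append(s)
    return {cat: relative_bias(o, s) for cat, (o, s) in groups.items()}


# Hypothetical values: the mean observed flow is 4.0, so only the last pair is "high".
obs = [1.0, 1.0, 2.0, 12.0]
sim = [2.0, 2.0, 2.0, 9.0]

overall = relative_bias(obs, sim)
by_category = relative_bias_by_category(obs, sim)
print(overall, by_category)
```

Here the overall bias is small because an overestimate at low flows and an underestimate at high flows partly cancel, which is the kind of behavior that grouping by flow category is meant to expose.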


In [42]:
gdf_flowcat.groupby(['obs_flow_category_q_mean']).describe(percentiles=[.5]).unstack().reset_index().rename(
    columns={'level_0':'metric','level_1':'summary'}).pivot(
    index=['metric','obs_flow_category_q_mean'], values=0, columns='summary')

metric,obs_flow_category_q_mean,50%,count,max,mean,min,std
kling_gupta_efficiency_mod2,high,0.580969,647.0,0.907899,0.45929,-4.363031,0.442827
kling_gupta_efficiency_mod2,low,-0.1037,647.0,0.853502,-1.606504,-151.82713,8.014236
mean_absolute_relative_error,high,0.494366,647.0,3.175267,0.555114,0.225015,0.264109
mean_absolute_relative_error,low,0.654675,647.0,192.423554,1.285105,0.209537,7.658334
pearson_correlation,high,0.698067,647.0,0.913775,0.650666,-0.020598,0.159303
pearson_correlation,low,0.543568,647.0,0.876188,0.515055,0.017918,0.177805
primary_count,high,64873.0,647.0,156080.0,63031.854714,1989.0,24644.963086
primary_count,low,182760.0,647.0,272679.0,180472.700155,67836.0,40173.639835


In [43]:
%%time
'''
Now add the location characteristics you want included in the metrics table
(for output tables and visualization).

To include location-specific attributes in the metrics table, those attributes 
must be added to the group_by list.  If grouping across locations (e.g., all locations 
within an RFC region), you should only add attributes that are already aggregated by that 
same region (TEEHR does not check for this). An example of including location characteristic 
attributes is shown below.
'''
# list the attributes that are location characteristics that you want to include 
# in the metrics results tables
include_location_characteristics = [
    'aridity_none',
    'runoff_ratio_none',
    'baseflow_index_none',
    'stream_order_none',  
    'q_mean_cms',
    'slope_fdc_none',  
    'frac_urban_none',
    'frac_snow_none',
    'forest_frac_none',
    'ecoregion_L2_none',
    'river_forecast_center_none',
]
df_atts = joined_data.get_metrics(
    group_by=["primary_location_id", "configuration"] + include_location_characteristics,
    order_by=["primary_location_id", "configuration"],
    include_metrics=[
        'kling_gupta_efficiency_mod2',
        'pearson_correlation',                  
        'mean_absolute_relative_error',
        'relative_bias',
        'primary_count' 
    ],
    include_geometry=False,
)

# view the dataframe
display(df_atts)

# summarize just the median results across locations by attribute (river forecast center)
df_atts_summary = df_atts.groupby(['configuration','river_forecast_center_none'])\
    .describe(percentiles=[.5]).unstack().unstack().reset_index()\
    .rename(columns={'level_0':'metric','level_1':'summary'})
df_atts_summary[df_atts_summary['summary'].isin(['50%'])].pivot(
    index=['river_forecast_center_none','configuration'],values=0, columns=['metric','summary'])


Unnamed: 0,primary_location_id,configuration,aridity_none,runoff_ratio_none,baseflow_index_none,stream_order_none,q_mean_cms,slope_fdc_none,frac_urban_none,frac_snow_none,forest_frac_none,ecoregion_L2_none,river_forecast_center_none,kling_gupta_efficiency_mod2,pearson_correlation,mean_absolute_relative_error,relative_bias,primary_count
0,usgs-01013500,nwm30_retro,0.63055865946247,0.543437466590222,0.585225955779508,5,44.467109455834866,1.52821853538976,0.01,0.313440357191799,0.9063,8.1 MIXED WOOD PLAINS,NERFC,0.391628,0.802823,0.499498,-0.041872,195876
1,usgs-01022500,nwm30_retro,0.587356423405076,0.602268929482991,0.554478447930409,5,14.786380055715862,1.77627980351081,0.0092,0.245259009248271,0.9232,8.1 MIXED WOOD PLAINS,NERFC,0.680773,0.838066,0.341755,-0.177140,223852
2,usgs-01030500,nwm30_retro,0.624111385131731,0.555858982560286,0.508440712580478,5,77.36721025688733,1.87111040605632,0.0067,0.27701840295357,0.8782,8.1 MIXED WOOD PLAINS,NERFC,0.709208,0.829589,0.402818,-0.203271,194840
3,usgs-01031500,nwm30_retro,0.587950340389816,0.576289279160751,0.445090526012048,5,18.13589110971241,1.49401920331253,0.0123,0.291836473001958,0.9548,8.1 MIXED WOOD PLAINS,NERFC,0.768625,0.783091,0.491308,-0.041667,195481
4,usgs-01047000,nwm30_retro,0.628929335570973,0.65686843497944,0.473464930849723,4,23.09950986229729,1.41593871068099,0.012199999999999999,0.280118126940736,0.9906,8.1 MIXED WOOD PLAINS,NERFC,0.679907,0.744362,0.507747,0.070001,195651
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
642,usgs-14309500,nwm30_retro,0.627228059860258,0.493734021695294,0.459454969190781,3,6.40310450370619,1.68988956221131,0.0025,0.0612553691709827,1.0,6.2 WESTERN CORDILLERA,NWRFC,0.831301,0.834059,0.464768,-0.008467,280477
643,usgs-14316700,nwm30_retro,0.501305088892464,0.643997143591881,0.508616082222394,5,19.909239533259516,2.23102329485064,0.0,0.176336580742005,1.0,6.2 WESTERN CORDILLERA,NWRFC,0.797103,0.836359,0.431458,-0.014002,284668
644,usgs-14325000,nwm30_retro,0.38660991574857,0.64665721307564,0.480768555174975,4,20.630074080228493,2.24613455803779,0.0040999999999999995,0.0302033920558714,1.0,7.1 MARINE WEST COAST FOREST,NWRFC,0.734790,0.849341,0.501058,0.009141,261938
645,usgs-14362250,nwm30_retro,1.19539041561722,0.119357982098672,0.518407661642124,2,0.15909060816317996,1.18603999268406,0.0018,0.141500009350329,1.0,6.2 WESTERN CORDILLERA,NWRFC,0.507605,0.509288,0.959275,-0.084728,271371


CPU times: user 1min 39s, sys: 612 ms, total: 1min 39s
Wall time: 25.3 s


river_forecast_center_none,configuration,kling_gupta_efficiency_mod2 (50%),pearson_correlation (50%),mean_absolute_relative_error (50%),relative_bias (50%),primary_count (50%)
ABRFC,nwm30_retro,0.646766,0.743724,0.648113,-0.090962,252850.0
CBRFC,nwm30_retro,0.50479,0.702087,0.749092,0.15187,243805.5
CNRFC,nwm30_retro,0.600172,0.728902,0.623043,0.082032,279315.0
LMRFC,nwm30_retro,0.761364,0.817299,0.460484,-0.041267,269302.0
MARFC,nwm30_retro,0.667934,0.760434,0.447674,-0.053741,248907.0
MBRFC,nwm30_retro,0.55278,0.723873,0.692621,-0.267085,207622.0
NCRFC,nwm30_retro,0.632238,0.730732,0.579652,-0.164602,224216.5
NERFC,nwm30_retro,0.657678,0.755718,0.507559,0.04648,226378.0
NWRFC,nwm30_retro,0.796064,0.846393,0.371724,0.006148,280477.0
OHRFC,nwm30_retro,0.73176,0.770886,0.495271,-0.048039,248212.0
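
The caution about attributes in the group_by list can be illustrated by counting the groups that a given set of columns produces. The sketch below uses hypothetical rows and a hypothetical helper `count_groups`; it is not TEEHR code. Adding an attribute that is constant per location leaves the per-location grouping unchanged, whereas adding the same attribute to a regional grouping silently splits the region back into one group per distinct attribute value.

```python
# Hypothetical joined rows: (location id, RFC region, aridity of that location).
rows = [
    ("usgs-A", "NERFC", 0.63),
    ("usgs-A", "NERFC", 0.63),
    ("usgs-B", "NERFC", 0.59),
    ("usgs-B", "NERFC", 0.59),
]
LOCATION, RFC, ARIDITY = 0, 1, 2


def count_groups(rows, key_columns):
    """Number of distinct groups produced by grouping on the given column indexes."""
    return len({tuple(row[i] for i in key_columns) for row in rows})


per_location = count_groups(rows, [LOCATION])
per_location_with_attr = count_groups(rows, [LOCATION, ARIDITY])
per_rfc = count_groups(rows, [RFC])
per_rfc_with_attr = count_groups(rows, [RFC, ARIDITY])

print(per_location, per_location_with_attr)  # same number of groups
print(per_rfc, per_rfc_with_attr)            # the region is split by the attribute
```

A quick check of the number of result rows against the expected number of groups is one way to catch this kind of unintended split.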


### 3. Calculate metrics with different filters

In [44]:
%%time

import geoviews as gv
import holoviews as hv
import colorcet as cc
hv.extension('bokeh', logo=False)
gv.extension('bokeh', logo=False)
basemap = hv.element.tiles.CartoLight()

gdf_filters = joined_data.get_metrics(
    group_by=["primary_location_id", "configuration", "stream_order_none"],
    order_by=["primary_location_id", "configuration"],
    include_metrics=[
        'kling_gupta_efficiency_mod2',
        'relative_bias',
        'pearson_correlation',                  
        'nash_sutcliffe_efficiency_normalized',  
        'mean_absolute_relative_error',
        'primary_count' 
    ],
    filters = [
          {
              "column": "stream_order_none",
              "operator": "in",
              "value": ['1','2','3','4']
              #"value": ['5','6','7','8']
          },
         # {
         #     "column": "month",
         #     "operator": "in",
         #     "value": ['5','6','7','8','9']
         # },
         # {
         #     "column": "river_forecast_center_none",
         #     "operator": "=",
         #     "value": "SERFC"
         # },
    ],
    include_geometry=True,
)
#display(gdf_filters.head())

# make a quick map of locations - see how it changes as you make different filter selections
basemap * gv.Points(gdf_filters, vdims=['kling_gupta_efficiency_mod2','configuration']).select(
    configuration='nwm30_retro').opts(
    color='kling_gupta_efficiency_mod2', 
    height=400, width=600, size=7, 
    cmap=cc.rainbow[::-1], colorbar=True, clim=(0,1))


CPU times: user 1min 11s, sys: 552 ms, total: 1min 12s
Wall time: 18.4 s
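
Each entry in the `filters` list above names a column, an operator and a value. One way to think about such a list is as the conditions of a SQL `WHERE` clause. The sketch below shows that idea with a hypothetical helper `filters_to_where`; it is not TEEHR's implementation, and it assumes that multiple filters are combined with `AND`. String-built SQL like this is for illustration only and should not be used with untrusted input.

```python
def format_value(value):
    """Render a Python value as a SQL literal (strings quoted, lists parenthesized)."""
    if isinstance(value, (list, tuple)):
        return "(" + ", ".join(format_value(v) for v in value) + ")"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def filters_to_where(filters):
    """Join filter dictionaries into one WHERE expression (assumes AND semantics)."""
    clauses = [
        f"{flt['column']} {flt['operator'].upper()} {format_value(flt['value'])}"
        for flt in filters
    ]
    return " AND ".join(clauses)


example_filters = [
    {"column": "stream_order_none", "operator": "in", "value": ["1", "2", "3", "4"]},
    {"column": "river_forecast_center_none", "operator": "=", "value": "SERFC"},
]
where_clause = filters_to_where(example_filters)
print(where_clause)
```

Note that the stream order values are passed as strings in the cell above, which matches attribute fields stored as text in the joined table.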


### 4. More visualizations

In [45]:
# set up color and abbreviation settings to use across multiple plots

metric_abbrev=dict(
    kling_gupta_efficiency_mod2 = "KGE''",
    mean_absolute_relative_error = "MAE",
    pearson_correlation = "Corr",
    relative_bias  = "Rel.Bias",
    nash_sutcliffe_efficiency_normalized = "NNSE",
)
cmap_lin = cc.rainbow[::-1]
cmap_div = cc.CET_D1A[::-1]
metric_colors=dict(
    kling_gupta_efficiency_mod2          = {'cmap': cmap_lin, 'clim': (0,1)},  
    relative_bias                        = {'cmap': cmap_div, 'clim': (-1,1)},   
    pearson_correlation                  = {'cmap': cmap_lin, 'clim': (0,1)},     
    nash_sutcliffe_efficiency_normalized = {'cmap': cmap_lin, 'clim': (0,1)}, 
    mean_absolute_relative_error         = {'cmap': cmap_lin, 'clim': (0,2)},
)
metrics = list(metric_colors.keys())
configs = ['nwm30_retro']

#### 4a. Side by side metric maps
First we will create side-by-side maps of the first query results above (all locations and configurations, no filters), showing metric values at each location, where dots are colored by metric value and sized by sample size.  See how the comparison changes for each metric.

In [51]:
# map_metric = 'kling_gupta_efficiency_mod2'
# map_metric = 'pearson_correlation'                  
# map_metric = 'nash_sutcliffe_efficiency_normalized'
# map_metric = 'mean_absolute_relative_error' 
map_metric = 'relative_bias'

# factor to size dots based on sample size (largest dot = 15),
# computed from the same dataframe that is plotted
size_factor = 15 / gdf_all['primary_count'].max()

polys = gv.Points(
    gdf_all, 
    vdims = metrics + ['primary_location_id','configuration','primary_count'],
    label = 'metric value (color), sample size (size)',
).opts(
    height = 400,
    width = 600,
    line_color = 'gray',
    colorbar = True,
    size = hv.dim('primary_count') * size_factor,
    tools = ['hover'],
    xaxis = 'bare',
    yaxis = 'bare',
    show_legend = True
)
maps = []
config = configs[0]
for map_metric in ['kling_gupta_efficiency_mod2','relative_bias']:
    maps.append(basemap * polys.select(configuration=config).opts(
            title=f"{config} | {metric_abbrev[map_metric]}",
            color = map_metric,
            clim = metric_colors[map_metric]['clim'],
            cmap = metric_colors[map_metric]['cmap']
        )
    )
maps[0] + maps[1]

#### 4b. Dataframe table and bar chart side by side
Next we will summarize results across locations by creating a summary table with pandas (as we did above) and juxtapose it with a bar chart using holoviews and panel.

In [52]:
# Display dataframes and simple plots side by side using Panel
import panel as pn

gdf_summary = gdf_all.groupby('configuration').describe(percentiles=[.5]).unstack(1).reset_index().rename(
    columns={'level_0':'metric','level_1':'summary'}).pivot(
    index=['metric','configuration'], values=0, columns='summary')

gdf_bars = gdf_summary.drop('primary_count', axis=0)['50%'].reset_index().replace({'metric':metric_abbrev})
bars = hv.Bars(gdf_bars, kdims=['metric', 'configuration']).opts(
    xrotation=90, height=400, width=300, ylabel='median',xlabel='')

pn.Row(pn.pane.DataFrame(gdf_summary, width=800), bars)

#### 4c. Box-whisker plots of results by metric and model

Next we'll create box-whisker plots to see the distribution of metrics across locations for each metric and configuration.

In [53]:
# remove geometry so holoviews knows this is not a map.
df = gdf_all.drop('geometry', axis=1)

opts = dict(
    show_legend=False, 
    width=100, 
    cmap='Set1', 
    xrotation=45,
    labelled=[]
)
boxplots = []
for metric in metrics:
    boxplots.append(
        hv.BoxWhisker(df, 'configuration', metric, label=metric_abbrev[metric]).opts(
            **opts,
            box_fill_color=hv.dim('configuration')
        )
    )
hv.Layout(boxplots).cols(len(metrics))

#### 4d. Histograms by metric and model
Every good scientist loves a histogram.  The below example creates a layout of histograms by configuration and metric, which gives us a more complete understanding of the metric distributions.

In [54]:
import hvplot.pandas
histograms =[]
for metric in metrics:
    histograms.append(
        df[df['configuration']==config].hvplot.hist(
            y=metric, 
            ylim=(0,200),
            bin_range=metric_colors[metric]['clim'], 
            xlabel=metric_abbrev[metric],
        ).opts(height = 200, width=250, title = config)
    )
hv.Layout(histograms).cols(len(metrics))
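
One detail worth keeping in mind with a fixed `bin_range`: if the histogram follows the usual NumPy convention for a bin range (an assumption about the plotting backend, not something shown in this notebook), values outside the range are not counted at all. The plain-Python sketch below mimics that convention with a hypothetical `histogram` helper and made-up KGE'' values, and also reports how many values were dropped.

```python
def histogram(values, bin_range, bins):
    """Count values in equal-width bins over bin_range; out-of-range values are dropped.

    The last bin includes its right edge, as in the NumPy convention.
    """
    lo, hi = bin_range
    width = (hi - lo) / bins
    counts = [0] * bins
    dropped = 0
    for v in values:
        if v < lo or v > hi:
            dropped += 1
            continue
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts, dropped


# Hypothetical metric values, including one strongly negative outlier.
kge_values = [-2.8, 0.1, 0.35, 0.6, 0.65, 0.9, 1.0]
counts, dropped = histogram(kge_values, bin_range=(0, 1), bins=4)
print(counts, dropped)
```

With metrics such as KGE'' that have a long negative tail, the bars may therefore sum to fewer than the total number of locations.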

#### 4e. CDFs overlays by metric
Every good scientist loves a CDF even more.  The example below overlays empirical CDF curves by configuration for each metric, which gives us a more complete understanding of the metric distributions.  We include metrics here with (mostly) the same range (0,1) and 'good' value (1).  

With only one model configuration there is nothing to compare across curves, so the overlays are less informative here than they would be with multiple configurations.

In [55]:
import numpy as np

layout = []
for metric in [
    'kling_gupta_efficiency_mod2',
    'pearson_correlation',                  
    'nash_sutcliffe_efficiency_normalized',
]:
    xlim = metric_colors[metric]['clim']
    xlabel = metric_abbrev[metric]
    
    cdfs = hv.Curve([])
    for config in ['nwm30_retro']:
        data = df[df['configuration']==config].copy()  # copy so new columns are not added to a slice of df
        data[xlabel] = np.sort(data[metric])
        n = len(data[xlabel])
        data['y'] = 1. * np.arange(n) / (n - 1)    
        cdfs = cdfs * hv.Curve(data, xlabel, 'y', label=config)
        
    layout.append(
        cdfs.opts(
            width = 300,
            legend_position='top_left',
            xlim=xlim, 
            xlabel=xlabel,
            title=metric_abbrev[metric],
            shared_axes=False,
        )
    )
    
hv.Layout(layout).cols(5)
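
The CDF construction in the cell above boils down to two steps: sort the metric values, then assign each sorted value a cumulative fraction between 0 and 1. A minimal plain-Python sketch of the same convention (i / (n - 1), so the smallest value maps to 0 and the largest to 1) is shown below. The function name and the input values are hypothetical, and other conventions such as (i + 1) / n are equally common.

```python
def empirical_cdf(values):
    """Return sorted values and their cumulative fractions using i / (n - 1)."""
    x = sorted(values)
    n = len(x)
    if n < 2:
        raise ValueError("at least two values are needed for this convention")
    y = [i / (n - 1) for i in range(n)]
    return x, y


# Hypothetical metric values at three locations.
x, y = empirical_cdf([0.7, 0.2, 0.5])
print(list(zip(x, y)))
```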

#### 4f. Bar charts by attribute
In the third example query above, we demonstrated how to add attributes to the resulting dataframe for summary and visualization purposes.  In that example we generated a summary table by RFC region.  The example below uses those results to build bar charts of the median performance metric across locations within each RFC region.

In [56]:
df_bars = df_atts_summary.set_index('metric').drop('primary_count', axis=0).reset_index().set_index('summary').loc['50%']
df_bars = df_bars.replace({'metric': metric_abbrev}) \
    .rename(columns={'river_forecast_center_none':'rfc',0:'median'}) \
    .reset_index().drop('summary', axis=1)
df_bars.loc[df_bars['metric'] == 'MAE', 'median'] = 1 - df_bars.loc[df_bars['metric'] == 'MAE', 'median']
df_bars = df_bars.replace('MAE','1-MAE')

bars = hv.Bars(df_bars, kdims=['metric','configuration','rfc'], vdims=['median']).opts(
    xrotation=90, height=300, width=200, ylabel='median',xlabel='')

layout = []
for rfc in df_bars['rfc'].unique():
    layout.append(bars.select(rfc=rfc).opts(title=rfc))
hv.Layout(layout).cols(6)

#### 4g. Scatter plots by attribute

Scatter plots of location metric values and location characteristics can provide insight about the relationship between the two - i.e., does model performance have a clear relationship with any of the characteristics?

First review the attributes added to the third query above to see what the options are (or run a new query to add others).  
As examples, let's create scatter plots of KGE'' against each of the numeric attributes.

In [64]:
# As examples, let's create scatter plots of KGE with each of the numeric attributes

import pandas as pd
import numpy as np

import geoviews as gv
import holoviews as hv
import colorcet as cc
hv.extension('bokeh', logo=False)
gv.extension('bokeh', logo=False)
basemap = hv.element.tiles.CartoLight()

location_chars = [
    'aridity',
    'runoff_ratio',
    'baseflow_index',
    'stream_order',  
    'q_mean_cms',
    'slope_fdc',  
    'frac_urban',
    'frac_snow',
    'forest_frac'
]
df_atts.columns = df_atts.columns.str.replace('_none', '')
df_atts[location_chars] = df_atts[location_chars].apply(pd.to_numeric)
df_atts['config_num'] = np.where(df_atts['configuration']=='nwm30_retro',1,2)

metrics = [
    'kling_gupta_efficiency_mod2',
    'pearson_correlation',                  
    'mean_absolute_relative_error',
    'relative_bias',
]
from bokeh.models import FixedTicker

scatter_layout = []
for char in location_chars:
    scatter_layout.append(
        hv.Scatter(
            df_atts, 
            kdims=[char],
            vdims=['kling_gupta_efficiency_mod2', 'relative_bias', 'primary_location_id','config_num'],
            label="nwm3.0"
        ).opts(
            width = 400, height = 300,
            #color = 'relative_bias',
            #color = 'config_num',
            #cmap = ['#377EB8', '#E41A1C'],
            #colorbar = True,
            clim=(0.5,2.5),
            ylabel = "KGE''",
            tools=['hover'],
            ylim=(-1,1),
            size=4,
            alpha=0.8,
            show_legend = True,
            # colorbar_opts={
            #     'ticker': FixedTicker(ticks=[1,2]),
            #     'major_label_overrides': {
            #         1: 'nwm30_retro',  
            #     },
            #     'major_label_text_align': 'left',
            # },
        ))
hv.Layout(scatter_layout).opts(show_legends = True).cols(3)        
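
If a scatter plot suggests a relationship between a location characteristic and a metric, a rank correlation is one simple way to put a number on it. The sketch below computes a Spearman rank correlation in plain Python, using average ranks for ties. The helper names and the sample values are hypothetical; with real results you would pass in columns such as the aridity and KGE'' values from `df_atts`.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: performance decreases steadily as aridity increases.
aridity = [0.4, 0.6, 0.9, 1.2]
kge = [0.8, 0.7, 0.5, 0.1]
rho = spearman(aridity, kge)
print(rho)
```

A value near -1 or 1 indicates a strong monotonic relationship, while a value near 0 suggests the characteristic explains little of the variation in the metric.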