PA2 Notebook 2: Analysis
===

This is the second notebook for DRWI Pollution Assessment Stage 2 (PA2) analysis.

It reads dataframes prepared in Notebook 1 and calculates Pollution Assessment metrics necessary for the Stage 2 Assessment.

# Installation and Setup

Carefully follow our **[Installation Instructions](README.md#get-started)**, especially:
- Creating a virtual environment for this repository (step 3)

## Import Python Dependencies

In [80]:
from pathlib import Path
from importlib import reload

import numpy     as np
import pandas    as pd
import geopandas as gpd

In [81]:
# Confirm GeoPandas >= 0.11, for full GeoParquet support
print("Geopandas: ", gpd.__version__)

Geopandas:  0.14.2


In [82]:
# Confirm that this repo's src folder is on your Python path (conda-develop adds it if missing); edit the path to match your local clone
!conda-develop /Users/aaufdenkampe/Documents/Python/pollution-assessment/src


path exists, skipping /Users/aaufdenkampe/Documents/Python/pollution-assessment/src
completed operation for: /Users/aaufdenkampe/Documents/Python/pollution-assessment/src


In [83]:
# Custom functions for Pollution Assessment
import pollution_assessment as pa

In [84]:
pa.__version__

'0.1.0'

## Set Paths


In [85]:
# Set your project directory to your local folder for your clone of this repository
project_path = Path.cwd().parent
project_path

PosixPath('/Users/aaufdenkampe/Documents/Python/pollution-assessment')

In [86]:
# Assign a path for the geographies folder.
geography_path = project_path / 'geography/'

In [87]:
# Assign a path for the data OUTPUT folder.
data_output_path = project_path / 'stage2/data_output/'

## Open Files from Notebook 1

In [88]:
%%time
# read geometry data from GeoParquet files
reach_gdf = gpd.read_parquet(geography_path /'reach_gdf.parquet')
catch_gdf = gpd.read_parquet(geography_path /'catch_gdf.parquet')

huc12_outlets_drwi_gdf = gpd.read_parquet(geography_path /'huc12_outlets_drwi_gdf.parquet')
huc10_outlets_drwi_gdf = gpd.read_parquet(geography_path /'huc10_outlets_drwi_gdf.parquet')
huc08_outlets_drwi_gdf = gpd.read_parquet(geography_path /'huc08_outlets_drwi_gdf.parquet')


CPU times: user 799 ms, sys: 556 ms, total: 1.36 s
Wall time: 1.44 s


In [89]:
%%time
# Read WikiSRAT results data from Parquet files
reach_concs_df = pd.read_parquet(data_output_path /'reach_concs_df.parquet')
catch_loads_df = pd.read_parquet(data_output_path /'catch_loads_df.parquet')

CPU times: user 89.1 ms, sys: 52.1 ms, total: 141 ms
Wall time: 53.6 ms


## Open Files from FieldDoc
(These cells are commented out for now; they may be removed and deferred to Notebook 6.)

In [90]:
# # Practices
# rest_gdf = gpd.read_parquet(project_path / 'stage2/private/restoration_bmps_from_FieldDoc.parquet')
# prot_gdf = gpd.read_parquet(project_path / 'stage2/private/protection_bmps_from_FieldDoc.parquet')

# # Practices by COMID
# rest_comid_gdf = gpd.read_parquet(project_path / 'stage2/private/restoration_df.parquet')
# prot_comid_gdf = gpd.read_parquet(project_path / 'stage2/private/protection_df.parquet')

# Baseline Results
Create mappable GeoDataFrames (GDFs) of results for the "baseline" model runs (i.e., the 'No restoration or protection' run group).

In [91]:
# Dictionary mapping full pollutant names to short column-name prefixes
pa.calc.pollutants

{'TotalN': 'tn', 'TotalP': 'tp', 'Sediment': 'tss'}

In [92]:
# Create Stream Reach Concentrations GDF
reach_concs_gdf = pa.calc.join_results(
    'reach', reach_gdf, reach_concs_df, 
    pa.calc.run_groups[0], run_type='combined', ps=False
)
# Add columns with previously used short names
for pollutant, short_name in pa.calc.pollutants.items():
    reach_concs_gdf[f'{short_name}_conc'] = reach_concs_gdf[pollutant]

In [93]:
reach_concs_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 19496 entries, 1748535 to 932040370
Data columns (total 35 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   catchment_hectares  19496 non-null  float64 
 1   watershed_hectares  19496 non-null  float64 
 2   maflowv             19496 non-null  float64 
 3   geometry            19494 non-null  geometry
 4   cluster             17358 non-null  category
 5   sub_focusarea       186 non-null    Int64   
 6   nord                18870 non-null  Int64   
 7   nordstop            18844 non-null  Int64   
 8   huc12               19496 non-null  category
 9   streamorder         19496 non-null  int64   
 10  headwater           19496 non-null  int64   
 11  phase               4082 non-null   category
 12  fa_name             4082 non-null   category
 13  in_drb              19496 non-null  boolean 
 14  huc08               19496 non-null  category
 15  huc10               194

In [94]:
# Create Catchment Loads GDF
catch_loads_gdf = pa.calc.join_results(
    'catch', catch_gdf, catch_loads_df, 
    pa.calc.run_groups[0], run_type='combined', ps=False
)
# Add columns with previously used short names
for pollutant, short_name in pa.calc.pollutants.items():
    catch_loads_gdf[f'{short_name}_load'] = catch_loads_gdf[pollutant]

# Add columns with loading rates (kg/ha/y)
pa.calc.add_loadrate(catch_loads_gdf)

catch_loads_gdf.head()

comid,catchment_hectares,watershed_hectares,maflowv,geometry,cluster,sub_focusarea,nord,nordstop,huc12,streamorder,...,run_group,run_type,funding_sources,with_attenuation,tn_load,tp_load,tss_load,tn_loadrate,tp_loadrate,tss_loadrate
1748535,6496.7052,6501.69,43.699,"MULTIPOLYGON (((-8301340.781 5199034.787, -830...",drb,,74914,74914,20401020302,1,...,No restoration or protection,combined,,True,12680.544786,1189.608231,1101612.0,1.951842,0.183109,169.564734
1748537,1663.1712,1664.46,11.189,"MULTIPOLYGON (((-8304909.314 5200051.727, -830...",drb,,74913,74913,20401020302,1,...,No restoration or protection,combined,,True,3771.332143,363.366436,201333.9,2.267555,0.218478,121.054237
1748539,1639.4128,1640.7,11.223,"MULTIPOLYGON (((-8315191.630 5191704.467, -831...",drb,,74921,74921,20401020305,1,...,No restoration or protection,combined,,True,3133.430355,357.22799,251402.4,1.911313,0.2179,153.349047
1748541,3013.8348,12912.3,86.528,"MULTIPOLYGON (((-8309824.403 5193427.492, -830...",drb,,74911,74915,20401020302,2,...,No restoration or protection,combined,,True,6409.514442,668.969079,617714.4,2.126697,0.221966,204.959595
1748543,1151.099,5232.87,35.389,"MULTIPOLYGON (((-8312514.529 5185023.831, -831...",drb,,74920,74922,20401020305,2,...,No restoration or protection,combined,,True,2918.236825,317.461447,198954.6,2.535174,0.27579,172.838845


In [95]:
catch_loads_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 19496 entries, 1748535 to 932040370
Data columns (total 30 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   catchment_hectares  19496 non-null  float64 
 1   watershed_hectares  19496 non-null  float64 
 2   maflowv             19496 non-null  float64 
 3   geometry            19496 non-null  geometry
 4   cluster             17358 non-null  category
 5   sub_focusarea       186 non-null    Int64   
 6   nord                18870 non-null  Int64   
 7   nordstop            18844 non-null  Int64   
 8   huc12               19496 non-null  category
 9   streamorder         19496 non-null  int64   
 10  headwater           19496 non-null  int64   
 11  phase               4082 non-null   category
 12  fa_name             4082 non-null   category
 13  in_drb              19496 non-null  boolean 
 14  huc08               19496 non-null  category
 15  huc10               194

# XS: Excess Pollution Results

```
excess pollution = total pollution 
                   - threshold pollution target
```

In [96]:
# Display the dictionary of target values for each pollutant (loading-rate and concentration targets)
pa.calc.targets

{'tn': {'loadrate_target': 17.07, 'conc_target': 4.73},
 'tp': {'loadrate_target': 0.31, 'conc_target': 0.09},
 'tss': {'loadrate_target': 923.8, 'conc_target': 237.3}}
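The excess formula above can be checked by hand against these targets. Below is a minimal sketch, assuming `pa.calc.add_excess` simply subtracts each pollutant's target from the matching column; `add_excess_sketch` is a hypothetical helper written for illustration, not the package's implementation. The input values are copied from the first reach (COMID 1748535) in the `reach_concs_gdf.head(3)` output that follows.

```python
import pandas as pd

# Target values copied from the pa.calc.targets output above
targets = {
    'tn':  {'loadrate_target': 17.07, 'conc_target': 4.73},
    'tp':  {'loadrate_target': 0.31,  'conc_target': 0.09},
    'tss': {'loadrate_target': 923.8, 'conc_target': 237.3},
}


def add_excess_sketch(df: pd.DataFrame, kind: str = 'conc') -> pd.DataFrame:
    """Hypothetical helper: add '<p>_<kind>_xs' = '<p>_<kind>' - target.

    kind='conc' compares '<p>_conc' with 'conc_target';
    kind='loadrate' compares '<p>_loadrate' with 'loadrate_target'.
    """
    for p, t in targets.items():
        df[f'{p}_{kind}_xs'] = df[f'{p}_{kind}'] - t[f'{kind}_target']
    return df


# Baseline concentrations for the first reach (COMID 1748535), copied from the
# reach_concs_gdf.head(3) output in this notebook
reach = pd.DataFrame(
    {'tn_conc': [0.324727], 'tp_conc': [0.030464], 'tss_conc': [28.210388]},
    index=pd.Index([1748535], name='comid'),
)
add_excess_sketch(reach, kind='conc')
print(reach.round(6))
```

A negative excess means the reach or catchment is already below its target; for COMID 1748535 all three pollutants are below target. Passing `kind='loadrate'` applies the same arithmetic to catchment loading rates (kg/ha/y) against `loadrate_target`.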

In [97]:
pa.calc.add_excess('reach', reach_concs_gdf)
reach_concs_gdf.head(3)

comid,catchment_hectares,watershed_hectares,maflowv,geometry,cluster,sub_focusarea,nord,nordstop,huc12,streamorder,...,run_group,run_type,funding_sources,with_attenuation,tn_conc,tp_conc,tss_conc,tn_conc_xs,tp_conc_xs,tss_conc_xs
1748535,6496.7052,6501.69,43.699,MULTILINESTRING Z ((-8295323.930 5214456.622 0...,drb,,74914,74914,20401020302,1,...,No restoration or protection,combined,,True,0.324727,0.030464,28.210388,-4.405273,-0.059536,-209.089612
1748537,1663.1712,1664.46,11.189,MULTILINESTRING Z ((-8304623.226 5207684.737 0...,drb,,74913,74913,20401020302,1,...,No restoration or protection,combined,,True,0.377186,0.036342,20.136201,-4.352814,-0.053658,-217.163799
1748539,1639.4128,1640.7,11.223,MULTILINESTRING Z ((-8316446.558 5197994.113 0...,drb,,74921,74921,20401020305,1,...,No restoration or protection,combined,,True,0.312437,0.03562,25.067574,-4.417563,-0.05438,-212.232426


In [98]:
pa.calc.add_excess('catch', catch_loads_gdf)
catch_loads_gdf.head(3)

comid,catchment_hectares,watershed_hectares,maflowv,geometry,cluster,sub_focusarea,nord,nordstop,huc12,streamorder,...,with_attenuation,tn_load,tp_load,tss_load,tn_loadrate,tp_loadrate,tss_loadrate,tn_loadrate_xs,tp_loadrate_xs,tss_loadrate_xs
1748535,6496.7052,6501.69,43.699,"MULTIPOLYGON (((-8301340.781 5199034.787, -830...",drb,,74914,74914,20401020302,1,...,True,12680.544786,1189.608231,1101612.0,1.951842,0.183109,169.564734,-15.118158,-0.126891,-754.235266
1748537,1663.1712,1664.46,11.189,"MULTIPOLYGON (((-8304909.314 5200051.727, -830...",drb,,74913,74913,20401020302,1,...,True,3771.332143,363.366436,201333.9,2.267555,0.218478,121.054237,-14.802445,-0.091522,-802.745763
1748539,1639.4128,1640.7,11.223,"MULTIPOLYGON (((-8315191.630 5191704.467, -831...",drb,,74921,74921,20401020305,1,...,True,3133.430355,357.22799,251402.4,1.911313,0.2179,153.349047,-15.158687,-0.0921,-770.450953


# PS: Point Source Results

In [99]:
reach_concs_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 19496 entries, 1748535 to 932040370
Data columns (total 38 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   catchment_hectares  19496 non-null  float64 
 1   watershed_hectares  19496 non-null  float64 
 2   maflowv             19496 non-null  float64 
 3   geometry            19494 non-null  geometry
 4   cluster             17358 non-null  category
 5   sub_focusarea       186 non-null    Int64   
 6   nord                18870 non-null  Int64   
 7   nordstop            18844 non-null  Int64   
 8   huc12               19496 non-null  category
 9   streamorder         19496 non-null  int64   
 10  headwater           19496 non-null  int64   
 11  phase               4082 non-null   category
 12  fa_name             4082 non-null   category
 13  in_drb              19496 non-null  boolean 
 14  huc08               19496 non-null  category
 15  huc10               194

In [100]:
reload(pa.calc)

<module 'pollution_assessment.calc' from '/Users/aaufdenkampe/Documents/Python/pollution-assessment/src/pollution_assessment/calc.py'>

In [101]:
pa.calc.add_ps('reach', reach_concs_gdf, reach_concs_df, run_type='combined')
reach_concs_gdf

comid,catchment_hectares,watershed_hectares,maflowv,geometry,cluster,sub_focusarea,nord,nordstop,huc12,streamorder,...,with_attenuation,tn_conc,tp_conc,tss_conc,tn_conc_xs,tp_conc_xs,tss_conc_xs,tn_conc_ps,tp_conc_ps,tss_conc_ps
1748535,6496.7052,6501.69,43.699,MULTILINESTRING Z ((-8295323.930 5214456.622 0...,drb,,74914,74914,020401020302,1,...,True,0.324727,0.030464,28.210388,-4.405273,-0.059536,-209.089612,0.0,0.0,0.0
1748537,1663.1712,1664.46,11.189,MULTILINESTRING Z ((-8304623.226 5207684.737 0...,drb,,74913,74913,020401020302,1,...,True,0.377186,0.036342,20.136201,-4.352814,-0.053658,-217.163799,0.0,0.0,0.0
1748539,1639.4128,1640.70,11.223,MULTILINESTRING Z ((-8316446.558 5197994.113 0...,drb,,74921,74921,020401020305,1,...,True,0.312437,0.035620,25.067574,-4.417563,-0.054380,-212.232426,0.0,0.0,0.0
1748541,3013.8348,12912.30,86.528,MULTILINESTRING Z ((-8304282.841 5198049.613 0...,drb,,74911,74915,020401020302,2,...,True,0.331619,0.032349,28.740814,-4.398381,-0.057651,-208.559186,0.0,0.0,0.0
1748543,1151.0990,5232.87,35.389,MULTILINESTRING Z ((-8312991.936 5192442.779 0...,drb,,74920,74922,020401020305,2,...,True,0.302415,0.031977,21.590414,-4.427585,-0.058023,-215.709586,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
932040366,2124.7248,2720941.47,17802.923,MULTILINESTRING Z ((-8400739.070 4831969.993 0...,drb,,65070,76964,020402060103,7,...,True,,,,,,,,,
932040367,788.7859,2717821.26,17788.281,MULTILINESTRING Z ((-8399585.343 4833380.786 0...,drb,,65079,76964,020402060103,7,...,True,,,,,,,,,
932040368,265.0275,2716120.08,17780.448,MULTILINESTRING Z ((-8398343.469 4834781.918 0...,drb,,65080,76960,020402060103,7,...,True,,,,,,,,,
932040369,1106.5294,2889095.67,18624.999,MULTILINESTRING Z ((-8406760.425 4820639.687 0...,drb,,64232,76965,020402040000,7,...,True,,,,,,,,,


In [102]:
reach_concs_gdf.run_type.value_counts()

run_type
combined    11529
single       7967
Name: count, dtype: int64

In [103]:
pa.calc.add_ps('catch', catch_loads_gdf, catch_loads_df, run_type='combined')
catch_loads_gdf.head(3)

comid,catchment_hectares,watershed_hectares,maflowv,geometry,cluster,sub_focusarea,nord,nordstop,huc12,streamorder,...,tn_loadrate,tp_loadrate,tss_loadrate,tn_loadrate_xs,tp_loadrate_xs,tss_loadrate_xs,tn_loadrate_ps,tp_loadrate_ps,tss_loadrate_ps,tss_loadrate_xsnps
1748535,6496.7052,6501.69,43.699,"MULTIPOLYGON (((-8301340.781 5199034.787, -830...",drb,,74914,74914,20401020302,1,...,1.951842,0.183109,169.564734,-15.118158,-0.126891,-754.235266,0.0,0.0,,-754.235266
1748537,1663.1712,1664.46,11.189,"MULTIPOLYGON (((-8304909.314 5200051.727, -830...",drb,,74913,74913,20401020302,1,...,2.267555,0.218478,121.054237,-14.802445,-0.091522,-802.745763,0.0,0.0,,-802.745763
1748539,1639.4128,1640.7,11.223,"MULTIPOLYGON (((-8315191.630 5191704.467, -831...",drb,,74921,74921,20401020305,1,...,1.911313,0.2179,153.349047,-15.158687,-0.0921,-770.450953,0.0,0.0,,-770.450953


# XSNPS: Excess Non-Point Source Results

```
excess nonpoint source pollution = excess pollution 
                                   - point source pollution
```
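Because excess pollution is total minus target, subtracting the point-source contribution leaves the non-point-source share of the excess. A minimal sketch of that arithmetic follows, assuming `pa.calc.add_xsnps` works column by column this way; `add_xsnps_sketch` is a hypothetical helper. It treats a missing (NaN) point-source value as zero, an assumption consistent with the catchment output below, where `tss_loadrate_ps` is NaN yet `tss_loadrate_xsnps` equals `tss_loadrate_xs`. The example values are the TN loading rates for COMID 932040160 reported in this notebook's confirmation cells.

```python
import pandas as pd


def add_xsnps_sketch(df: pd.DataFrame, kind: str = 'conc',
                     pollutants=('tn', 'tp', 'tss')) -> pd.DataFrame:
    """Hypothetical helper: '<p>_<kind>_xsnps' = '<p>_<kind>_xs' - '<p>_<kind>_ps'."""
    for p in pollutants:
        excess = df[f'{p}_{kind}_xs']
        # Assumption: a missing point-source value means no point-source contribution
        point_source = df[f'{p}_{kind}_ps'].fillna(0.0)
        df[f'{p}_{kind}_xsnps'] = excess - point_source
    return df


# TN loading rates (kg/ha/y) for COMID 932040160, copied from this notebook's output
catch = pd.DataFrame(
    {'tn_loadrate_xs': [32.700205], 'tn_loadrate_ps': [40.932734]},
    index=pd.Index([932040160], name='comid'),
)
add_xsnps_sketch(catch, kind='loadrate', pollutants=('tn',))
print(catch)
```

The result, about -8.23 kg/ha/y, matches `tn_loadrate_xsnps` for that COMID. Since total = non-point + point source, XSNPS reduces to non-point pollution minus the target, so a negative value means non-point sources alone are below the target even though the total exceeds it.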

In [104]:
pa.calc.add_xsnps('reach', reach_concs_gdf, reach_concs_df, run_type='combined')
reach_concs_gdf.head(3)

comid,catchment_hectares,watershed_hectares,maflowv,geometry,cluster,sub_focusarea,nord,nordstop,huc12,streamorder,...,tss_conc,tn_conc_xs,tp_conc_xs,tss_conc_xs,tn_conc_ps,tp_conc_ps,tss_conc_ps,tn_conc_xsnps,tp_conc_xsnps,tss_conc_xsnps
1748535,6496.7052,6501.69,43.699,MULTILINESTRING Z ((-8295323.930 5214456.622 0...,drb,,74914,74914,20401020302,1,...,28.210388,-4.405273,-0.059536,-209.089612,0.0,0.0,0.0,-4.405273,-0.059536,-209.089612
1748537,1663.1712,1664.46,11.189,MULTILINESTRING Z ((-8304623.226 5207684.737 0...,drb,,74913,74913,20401020302,1,...,20.136201,-4.352814,-0.053658,-217.163799,0.0,0.0,0.0,-4.352814,-0.053658,-217.163799
1748539,1639.4128,1640.7,11.223,MULTILINESTRING Z ((-8316446.558 5197994.113 0...,drb,,74921,74921,20401020305,1,...,25.067574,-4.417563,-0.05438,-212.232426,0.0,0.0,0.0,-4.417563,-0.05438,-212.232426


In [105]:
pa.calc.add_xsnps('catch', catch_loads_gdf, catch_loads_df, run_type='combined')
catch_loads_gdf.head(3)

comid,catchment_hectares,watershed_hectares,maflowv,geometry,cluster,sub_focusarea,nord,nordstop,huc12,streamorder,...,tss_loadrate,tn_loadrate_xs,tp_loadrate_xs,tss_loadrate_xs,tn_loadrate_ps,tp_loadrate_ps,tss_loadrate_ps,tss_loadrate_xsnps,tn_loadrate_xsnps,tp_loadrate_xsnps
1748535,6496.7052,6501.69,43.699,"MULTIPOLYGON (((-8301340.781 5199034.787, -830...",drb,,74914,74914,20401020302,1,...,169.564734,-15.118158,-0.126891,-754.235266,0.0,0.0,,-754.235266,-15.118158,-0.126891
1748537,1663.1712,1664.46,11.189,"MULTIPOLYGON (((-8304909.314 5200051.727, -830...",drb,,74913,74913,20401020302,1,...,121.054237,-14.802445,-0.091522,-802.745763,0.0,0.0,,-802.745763,-14.802445,-0.091522
1748539,1639.4128,1640.7,11.223,"MULTIPOLYGON (((-8315191.630 5191704.467, -831...",drb,,74921,74921,20401020305,1,...,153.349047,-15.158687,-0.0921,-770.450953,0.0,0.0,,-770.450953,-15.158687,-0.0921


## Confirm Results vs. MMW Sub-basin

Confirm values make sense for COMIDs with:
- no point sources: `4648450`
- large point sources: `932040160`

In [106]:
reach_concs_gdf[['tp_conc',
                'tp_conc_ps',
                'tp_conc_xs',
                'tp_conc_xsnps',
               ]].loc[[4648450, 932040160]]

comid,tp_conc,tp_conc_ps,tp_conc_xs,tp_conc_xsnps
4648450,0.324502,0.0,0.234502,0.234502
932040160,0.284033,0.086628,0.194033,0.107405


In [107]:
catch_loads_gdf[['tp_loadrate',
                'tp_loadrate_xs',
                'tp_loadrate_xsnps',
               ]].loc[[4648450, 932040160]]

comid,tp_loadrate,tp_loadrate_xs,tp_loadrate_xsnps
4648450,1.554258,1.244258,1.244258
932040160,7.054176,6.744176,0.187592


In [108]:
catch_loads_gdf[['tn_loadrate',
                'tn_loadrate_ps',
                'tn_loadrate_xs',
                'tn_loadrate_xsnps',
               ]].loc[[4648450, 932040160]]

comid,tn_loadrate,tn_loadrate_ps,tn_loadrate_xs,tn_loadrate_xsnps
4648450,25.356569,0.0,8.286569,8.286569
932040160,49.770205,40.932734,32.700205,-8.232529


In [109]:
# Confirm the TN point-source loading rate by recalculating it with ps=True
df = pa.calc.calc_loadrate(
    catch_loads_gdf, catch_loads_df, 'TotalN', 
    pa.calc.run_groups[0], run_type='combined', ps=True
)
df.loc[[4648450, 932040160]]

comid
4648450       0.000000
932040160    40.932734
dtype: float64

# REM: Remaining XSNPS after Restoration

In [110]:
for group_key in [1,2,3]:
    pa.calc.add_remaining('reach', reach_concs_gdf, reach_concs_df, group_key, run_type='combined')
reach_concs_gdf.loc[[4648450, 932040160]]

comid,catchment_hectares,watershed_hectares,maflowv,geometry,cluster,sub_focusarea,nord,nordstop,huc12,streamorder,...,tss_conc_xsnps,tn_conc_rem1,tp_conc_rem1,tss_conc_rem1,tn_conc_rem2,tp_conc_rem2,tss_conc_rem2,tn_conc_rem3,tp_conc_rem3,tss_conc_rem3
4648450,263.4373,263.61,1.412,MULTILINESTRING Z ((-8449613.219 4882948.059 0...,Brandywine and Christina,,64639,64639,20402050202,1,...,-86.572683,0.249977,0.156533,-98.222925,0.249977,0.156533,-98.222925,0.248132,0.155055,-98.45204
932040160,497.8113,14124.96,74.903,MULTILINESTRING Z ((-8440707.897 4862261.408 0...,Brandywine and Christina,,64609,64658,20402050202,3,...,-96.81671,-0.607268,0.082555,-150.177255,-0.607268,0.082555,-150.177255,-0.613917,0.076873,-150.473741


In [111]:
for group_key in [1,2,3]:
    pa.calc.add_remaining('catch', catch_loads_gdf, catch_loads_df, group_key, run_type='combined')
catch_loads_gdf.loc[[4648450, 932040160]]

comid,catchment_hectares,watershed_hectares,maflowv,geometry,cluster,sub_focusarea,nord,nordstop,huc12,streamorder,...,tp_loadrate_xsnps,tn_loadrate_rem1,tp_loadrate_rem1,tss_loadrate_rem1,tn_loadrate_rem2,tp_loadrate_rem2,tss_loadrate_rem2,tn_loadrate_rem3,tp_loadrate_rem3,tss_loadrate_rem3
4648450,263.4373,263.61,1.412,"MULTIPOLYGON (((-8449229.677 4880762.406, -844...",Brandywine and Christina,,64639,64639,20402050202,1,...,1.244258,6.782443,0.870812,-257.666738,6.782443,0.870812,-257.666738,6.773609,0.863734,-258.764124
932040160,497.8113,14124.96,74.903,"MULTIPOLYGON (((-8442711.700 4860467.546, -844...",Brandywine and Christina,,64609,64658,20402050202,3,...,0.187592,-8.232529,0.187592,162.418203,-8.232529,0.187592,162.418203,-8.242159,0.177068,162.269713


# AVOID: Prevented (Avoided) Loads due to Protection

In [112]:
pa.calc.run_groups[4]

'Direct WPF Protection'

In [113]:
pa.calc.add_avoided('reach', reach_concs_gdf, reach_concs_df, 4, run_type='combined')
reach_concs_gdf.loc[pa.calc.comid_test_dict.keys(),
    ['tp_conc', 'tp_conc_xsnps','tp_conc_rem3', 'tp_conc_avoid']]

comid,tp_conc,tp_conc_xsnps,tp_conc_rem3,tp_conc_avoid
4648450,0.324502,0.234502,0.155055,0.0
4648684,0.614707,0.152823,0.1501,0.0
932040160,0.284033,0.107405,0.076873,0.0
2583195,0.064066,-0.025934,-0.025937,0.054172
932040230,0.006167,-0.083833,-0.083833,0.015771
2619256,0.009448,-0.080902,-0.080953,0.0


In [114]:
pa.calc.add_avoided('catch', catch_loads_gdf, catch_loads_df, 4, run_type='combined')
catch_loads_gdf.loc[pa.calc.comid_test_dict.keys(),
    ['tp_loadrate', 'tp_loadrate_xsnps','tp_loadrate_rem3', 'tp_loadrate_avoid']]

comid,tp_loadrate,tp_loadrate_xsnps,tp_loadrate_rem3,tp_loadrate_avoid
4648450,1.554258,1.244258,0.863734,0.0
4648684,5.05825,0.877691,0.868742,0.0
932040160,7.054176,0.187592,0.177068,0.0
2583195,0.359205,0.049205,0.04919,0.303731
932040230,0.082214,-0.227786,-0.227786,0.216275
2619256,0.158917,-0.151083,-0.151098,0.0


In [115]:
catch_loads_gdf.loc[(catch_loads_gdf.tp_loadrate_avoid > 0)]

comid,catchment_hectares,watershed_hectares,maflowv,geometry,cluster,sub_focusarea,nord,nordstop,huc12,streamorder,...,tss_loadrate_rem1,tn_loadrate_rem2,tp_loadrate_rem2,tss_loadrate_rem2,tn_loadrate_rem3,tp_loadrate_rem3,tss_loadrate_rem3,tn_loadrate_avoid,tp_loadrate_avoid,tss_loadrate_avoid
2583191,277.9706,356.76,2.257,"MULTIPOLYGON (((-8321373.063 5036691.907, -832...",New Jersey Highlands,,70084,70085,020401050101,1,...,-645.705971,-14.543512,-0.070038,-645.705971,-14.543590,-0.070051,-645.712933,0.001428,0.012961,0.539511
2583195,244.9663,245.16,1.537,"MULTIPOLYGON (((-8322884.353 5034593.476, -832...",New Jersey Highlands,,70082,70082,020401050101,1,...,-492.602728,-13.993447,0.049205,-492.602728,-13.993536,0.049190,-492.610898,0.033450,0.303731,12.640769
2583199,215.1096,1018.44,7.164,"MULTIPOLYGON (((-8325931.539 5035114.436, -832...",New Jersey Highlands,,70074,70080,020401050101,2,...,-444.071676,-15.198850,-0.088779,-444.071676,-15.199058,-0.088814,-444.090349,0.015249,0.137018,5.734300
2583463,163.6690,163.80,1.131,"MULTIPOLYGON (((-8348195.342 5014278.655, -834...",New Jersey Highlands,,70175,70175,020401050104,1,...,-733.990121,-16.267772,-0.227914,-733.990121,-16.267783,-0.227920,-733.990469,0.002733,0.024870,1.135799
2583501,343.6150,1712.34,13.092,"MULTIPOLYGON (((-8349076.358 5012570.234, -834...",New Jersey Highlands,,70168,70175,020401050104,2,...,-762.747681,-16.007439,-0.221786,-762.747681,-16.007477,-0.221813,-762.748481,0.004765,0.043350,1.979877
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
932040255,76.5288,2988.72,23.167,"MULTIPOLYGON (((-8358153.033 5048788.699, -835...",Poconos and Kittatinny,4,70847,70860,020401040602,2,...,-852.928627,-16.710960,-0.278467,-852.928627,-16.710960,-0.278467,-852.928627,0.022960,0.195906,9.870424
932040257,3.9570,3413.34,30.136,"MULTIPOLYGON (((-8399647.420 5049584.922, -839...",Upper Lehigh,,76203,76227,020401060201,3,...,-733.829159,-10.827665,-0.111938,-733.829159,-10.827665,-0.111938,-733.829159,0.005598,0.042961,2.053524
932040258,68.8874,459.27,3.954,"MULTIPOLYGON (((-8400132.510 5050177.661, -840...",Upper Lehigh,,76230,76232,020401060201,1,...,-609.432109,-12.958088,-0.091048,-609.432109,-12.958105,-0.091050,-609.433596,0.001465,0.012993,0.573357
932040267,7.6439,2230.92,17.444,"MULTIPOLYGON (((-8357210.728 5049678.166, -835...",Poconos and Kittatinny,,70848,70855,020401040602,2,...,-505.833607,-16.303832,-0.166822,-505.833607,-16.303832,-0.166822,-505.833607,0.000553,0.004506,0.220272


# Confirm Results

In [116]:
pa.calc.comid_test_dict

{4648450: 'no point sources',
 4648684: 'Upper E Branch Brandywine',
 932040160: 'large point sources',
 2583195: 'protection projects',
 932040230: 'restoration and protection projects',
 2619256: "where run_type='combined' gives a value when 'single' does not"}

In [117]:
catch_loads_gdf[['tp_loadrate',
                'tp_loadrate_xs',
                'tp_loadrate_xsnps',
                'tp_loadrate_rem1',
                'tp_loadrate_rem2',
                'tp_loadrate_rem3',
                'tp_loadrate_avoid',
               ]].loc[pa.calc.comid_test_dict.keys()]

comid,tp_loadrate,tp_loadrate_xs,tp_loadrate_xsnps,tp_loadrate_rem1,tp_loadrate_rem2,tp_loadrate_rem3,tp_loadrate_avoid
4648450,1.554258,1.244258,1.244258,0.870812,0.870812,0.863734,0.0
4648684,5.05825,4.74825,0.877691,0.875928,0.875928,0.868742,0.0
932040160,7.054176,6.744176,0.187592,0.187592,0.187592,0.177068,0.0
2583195,0.359205,0.049205,0.049205,0.049205,0.049205,0.04919,0.303731
932040230,0.082214,-0.227786,-0.227786,-0.227786,-0.227786,-0.227786,0.216275
2619256,0.158917,-0.151083,-0.151083,-0.151083,-0.151083,-0.151098,0.0


In [118]:
catch_loads_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 19496 entries, 1748535 to 932040370
Data columns (total 51 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   catchment_hectares  19496 non-null  float64 
 1   watershed_hectares  19496 non-null  float64 
 2   maflowv             19496 non-null  float64 
 3   geometry            19496 non-null  geometry
 4   cluster             17358 non-null  category
 5   sub_focusarea       186 non-null    Int64   
 6   nord                18870 non-null  Int64   
 7   nordstop            18844 non-null  Int64   
 8   huc12               19496 non-null  category
 9   streamorder         19496 non-null  int64   
 10  headwater           19496 non-null  int64   
 11  phase               4082 non-null   category
 12  fa_name             4082 non-null   category
 13  in_drb              19496 non-null  boolean 
 14  huc08               19496 non-null  category
 15  huc10               194

# Aggregate Results by Geography

## Method 1: Non-attenuated catchment loads, similar to PA1
PA1 Task 3d, as executed in `stage1/WikiSRAT_AnalysisViz_Clean.ipynb`

Back-calculate loads (kg/y) from the point-source, excess non-point-source, remaining, and avoided loading rates (kg/ha/y) by multiplying each rate by catchment area (ha).

In [119]:
# Load (kg/y) = loading rate (kg/ha/y) * catchment area (ha)
for suffix in ['ps', 'xsnps', 'rem1', 'rem2', 'rem3', 'avoid']:
    for pollutant in ['tn', 'tp', 'tss']:
        catch_loads_gdf[f'{pollutant}_load_{suffix}'] = (
            catch_loads_gdf[f'{pollutant}_loadrate_{suffix}']
            * catch_loads_gdf.catchment_hectares
        )
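As a quick unit check on this back-calculation, the baseline `tn_load` and `tn_loadrate` shown earlier for catchment COMID 1748535 should satisfy load = loading rate times area. The sketch below reproduces that with plain pandas; the values are copied from the `catch_loads_gdf.head()` output, and the tolerance only absorbs the 6-decimal rounding of the displayed rate.

```python
import pandas as pd

# Baseline TN values for catchment COMID 1748535, copied from the
# catch_loads_gdf.head() output earlier in this notebook
catch = pd.DataFrame(
    {'catchment_hectares': [6496.7052],   # ha
     'tn_load': [12680.544786],           # kg/y
     'tn_loadrate': [1.951842]},          # kg/ha/y
    index=pd.Index([1748535], name='comid'),
)

# load (kg/y) = loading rate (kg/ha/y) * catchment area (ha)
catch['tn_load_check'] = catch['tn_loadrate'] * catch['catchment_hectares']

# The displayed rate is rounded to 6 decimals, so expect a difference well under 0.01 kg/y
diff = (catch['tn_load_check'] - catch['tn_load']).abs()
print(catch[['tn_load', 'tn_load_check']])
print('max abs difference (kg/y):', diff.max())
```

Converting back to loads matters for the next step: loads (kg/y) add up across catchments of different sizes, whereas loading rates do not.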

In [120]:
catch_loads_gdf.funding_sources.value_counts()

funding_sources
Delaware River Restoration Fund                                                                                         0
Delaware River Restoration Fund, Delaware River Operational Fund, Delaware Watershed Conservation Fund                  0
Delaware River Restoration Fund, Delaware River Operational Fund, Delaware Watershed Conservation Fund, PADEP, NJDEP    0
Delaware River Watershed Protection Fund - Forestland Capital Grants                                                    0
Name: count, dtype: int64

In [121]:
reload(pa.calc)
pa.calc.run_group_sources

{0: [],
 1: 'Delaware River Restoration Fund',
 2: ['Delaware River Restoration Fund',
  'Delaware River Operational Fund',
  'Delaware Watershed Conservation Fund'],
 3: ['Delaware River Restoration Fund',
  'Delaware River Operational Fund',
  'Delaware Watershed Conservation Fund',
  'PADEP',
  'NJDEP'],
 4: 'Delaware River Watershed Protection Fund - Forestland Capital Grants'}

In [122]:
catch_loads_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 19496 entries, 1748535 to 932040370
Data columns (total 69 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   catchment_hectares  19496 non-null  float64 
 1   watershed_hectares  19496 non-null  float64 
 2   maflowv             19496 non-null  float64 
 3   geometry            19496 non-null  geometry
 4   cluster             17358 non-null  category
 5   sub_focusarea       186 non-null    Int64   
 6   nord                18870 non-null  Int64   
 7   nordstop            18844 non-null  Int64   
 8   huc12               19496 non-null  category
 9   streamorder         19496 non-null  int64   
 10  headwater           19496 non-null  int64   
 11  phase               4082 non-null   category
 12  fa_name             4082 non-null   category
 13  in_drb              19496 non-null  boolean 
 14  huc08               19496 non-null  category
 15  huc10               194

### Sum all DRWI

In [123]:
# Create list of columns to aggregate
columns_to_aggregate = [
    'catchment_hectares',        # catchment area
    'tn_load','tp_load','tss_load', # baseline loads
    'tn_load_ps','tp_load_ps',    # point source loads
    'tn_load_avoid','tp_load_avoid','tss_load_avoid', # avoided loads from land protection
    'tn_load_xsnps','tp_load_xsnps','tss_load_xsnps', # excess nonpoint source loads
    'tn_load_rem1','tp_load_rem1','tss_load_rem1', # remaining loads after restoration
    'tn_load_rem2','tp_load_rem2','tss_load_rem2', # remaining loads after restoration
    'tn_load_rem3','tp_load_rem3','tss_load_rem3', # remaining loads after restoration
    ]

In [124]:
# Sum each selected column over all catchments (returns a Series indexed by column name)
drwi_load_df = catch_loads_gdf.loc[:,columns_to_aggregate].sum()

In [125]:
drwi_load_df

catchment_hectares    3.786557e+06
tn_load               5.619562e+07
tp_load               4.100625e+06
tss_load              2.012355e+09
tn_load_ps            2.647747e+07
tp_load_ps            2.214209e+06
tn_load_avoid         4.533677e+02
tp_load_avoid         4.103550e+03
tss_load_avoid        1.825090e+05
tn_load_xsnps        -3.491837e+07
tp_load_xsnps         7.125833e+05
tss_load_xsnps       -1.485666e+09
tn_load_rem1         -3.497877e+07
tp_load_rem1          6.921854e+05
tss_load_rem1        -1.498726e+09
tn_load_rem2         -3.497984e+07
tp_load_rem2          6.918087e+05
tss_load_rem2        -1.498890e+09
tn_load_rem3         -3.500059e+07
tp_load_rem3          6.718452e+05
tss_load_rem3        -1.500867e+09
dtype: float64

In [126]:
# Total TP load reduction (kg/y): excess nonpoint source load minus remaining load (rem3)
tp_load_red3_total = drwi_load_df.tp_load_xsnps - drwi_load_df.tp_load_rem3
tp_load_red3_total

40738.0801989981

Save `drwi_load_df` to CSV at end, to import into Excel for PA1-style tally.
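A minimal sketch of that export step, assuming the summed Series is written with one row per metric. The helper name `save_tally_csv` and the `metric`/`value` headers are hypothetical; in the notebook, `out_dir` would presumably be `data_output_path`.

```python
from pathlib import Path
import tempfile

import pandas as pd


def save_tally_csv(series: pd.Series, out_dir: Path, name: str) -> Path:
    """Write a summed-loads Series to CSV, one row per metric (hypothetical helper)."""
    out_path = Path(out_dir) / f"{name}.csv"
    # to_frame gives the values a column header that Excel will display
    series.to_frame(name="value").to_csv(out_path, index_label="metric")
    return out_path


# Toy usage with rounded values; a temporary directory stands in for data_output_path
loads = pd.Series({"tp_load_xsnps": 712583.3, "tp_load_rem3": 671845.2})
with tempfile.TemporaryDirectory() as tmp:
    path = save_tally_csv(loads, Path(tmp), "drwi_load_df")
    print(path.read_text())
```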

#### tp_load_xsnps is much higher than in Stage 1
7.125833e+05 vs 2.566874e+05 in Stage 1

Even though tp_load is similar:  
4.100625e+06 vs 3.928625e+06 in Stage 1

DECIDED: This difference is due to errors in the Stage 1 point source loads, discovered by Barry in March 2022, in which TN & TP loads were swapped for 34 point sources. See the [2022-11-02 Special Meeting](https://docs.google.com/document/d/1R-uHVoTdI_-orxaqAZ6S8LOPUffnHVjkIzS8v1YsDSQ/edit#heading=h.ihwawmpbkc4h) notes.

In [127]:
# tp_load: Stage 2 vs. Stage 1 (difference, and ratio PA2/PA1)
pa1 = 3.928625e+06
pa2 = drwi_load_df.tp_load
print(f'PA2 load:   {pa2}')
print(f'difference: {pa2 - pa1}')
print(f'fraction:   {pa2/pa1}')

PA2 load:   4100625.0712229405
difference: 172000.07122294046
fraction:   1.0437812392943944


In [128]:
# tp_load_ps: Stage 2 vs. Stage 1 (difference, and ratio PA2/PA1)
pa1 = 2.498105e+06
pa2 = drwi_load_df.tp_load_ps
print(f'PA2 load:   {pa2}')
print(f'difference: {pa2 - pa1}')
print(f'fraction:   {pa2/pa1}')

PA2 load:   2214209.1763966037
difference: -283895.8236033963
fraction:   0.8863555280489025


In [129]:
# tp_load_xsnps: Stage 2 vs. Stage 1 (difference, and ratio PA2/PA1)
pa1 = 2.566874e+05
pa2 = drwi_load_df.tp_load_xsnps
print(f'PA2 load:   {pa2}')
print(f'difference: {pa2 - pa1}')
print(f'fraction:   {pa2/pa1}')

PA2 load:   712583.306759336
difference: 455895.906759336
fraction:   2.7760743486409387
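The three cells above repeat the same Stage 1 vs. Stage 2 comparison. A small helper could factor that out; the sketch below uses a hypothetical `compare_stages` function, the Stage 1 constants hard-coded above, and the Stage 2 values as printed (in the notebook these would come from `drwi_load_df[name]`).

```python
def compare_stages(pa1: float, pa2: float) -> dict:
    """Difference and ratio of a Stage 2 (PA2) value relative to Stage 1 (PA1)."""
    return {
        "pa2": pa2,
        "difference": pa2 - pa1,
        "fraction": pa2 / pa1,  # ratio PA2 / PA1, as printed in the cells above
    }


# Stage 1 values as hard-coded above; Stage 2 values as printed above
stage1 = {"tp_load": 3.928625e06, "tp_load_ps": 2.498105e06, "tp_load_xsnps": 2.566874e05}
stage2 = {
    "tp_load": 4100625.0712229405,
    "tp_load_ps": 2214209.1763966037,
    "tp_load_xsnps": 712583.306759336,
}

for name in stage1:
    result = compare_stages(stage1[name], stage2[name])
    print(f"{name:14s} difference: {result['difference']:14.1f}  fraction: {result['fraction']:.4f}")
```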


#### DRWI Loads not in Clusters
Copied functions from Task 3d in `stage1/WikiSRAT_AnalysisViz_Clean.ipynb`

In [130]:
catch_loads_gdf

comid,catchment_hectares,watershed_hectares,maflowv,geometry,cluster,sub_focusarea,nord,nordstop,huc12,streamorder,...,tss_load_rem1,tn_load_rem2,tp_load_rem2,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3,tn_load_avoid,tp_load_avoid,tss_load_avoid
1748535,6496.7052,6501.69,43.699,"MULTIPOLYGON (((-8301340.781 5199034.787, -830...",drb,,74914,74914,020401020302,1,...,-4.900044e+06,-98218.212978,-824.370381,-4.900044e+06,-98218.212978,-824.370381,-4.900044e+06,0.0,0.0,0.0
1748537,1663.1712,1664.46,11.189,"MULTIPOLYGON (((-8304909.314 5200051.727, -830...",drb,,74913,74913,020401020302,1,...,-1.335104e+06,-24619.000241,-152.216636,-1.335104e+06,-24619.000241,-152.216636,-1.335104e+06,0.0,0.0,0.0
1748539,1639.4128,1640.70,11.223,"MULTIPOLYGON (((-8315191.630 5191704.467, -831...",drb,,74921,74921,020401020305,1,...,-1.263087e+06,-24851.346141,-150.989978,-1.263087e+06,-24851.346141,-150.989978,-1.263087e+06,0.0,0.0,0.0
1748541,3013.8348,12912.30,86.528,"MULTIPOLYGON (((-8309824.403 5193427.492, -830...",drb,,74911,74915,020401020302,2,...,-2.166466e+06,-45036.645594,-265.319709,-2.166466e+06,-45036.645594,-265.319709,-2.166466e+06,0.0,0.0,0.0
1748543,1151.0990,5232.87,35.389,"MULTIPOLYGON (((-8312514.529 5185023.831, -831...",drb,,74920,74922,020401020305,2,...,-8.644306e+05,-16731.023105,-39.379243,-8.644306e+05,-16731.023105,-39.379243,-8.644306e+05,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
932040366,2124.7248,2720941.47,17802.923,"MULTIPOLYGON (((-8403944.327 4826463.781, -840...",drb,,65070,76964,020402060103,7,...,-5.384968e+05,-24032.877292,-21.133361,-5.384968e+05,-24032.877292,-21.133361,-5.384968e+05,0.0,0.0,0.0
932040367,788.7859,2717821.26,17788.281,"MULTIPOLYGON (((-8400739.269 4832000.931, -840...",drb,,65079,76964,020402060103,7,...,-3.078111e+05,-7711.474156,64.448279,-3.078111e+05,-7712.536274,64.274288,-3.078925e+05,0.0,0.0,0.0
932040368,265.0275,2716120.08,17780.448,"MULTIPOLYGON (((-8399608.027 4833463.133, -839...",drb,,65080,76960,020402060103,7,...,-1.696431e+05,-4335.150158,-63.151696,-1.696431e+05,-4335.150158,-63.151696,-1.696431e+05,0.0,0.0,0.0
932040369,1106.5294,2889095.67,18624.999,"MULTIPOLYGON (((-8409371.984 4816335.622, -840...",drb,,64232,76965,020402040000,7,...,1.100270e+06,-17654.277907,-120.436420,1.100270e+06,-17654.277907,-120.436420,1.100270e+06,0.0,0.0,0.0


In [131]:
catch_loads_gdf['cluster'].value_counts(dropna=False)

cluster
drb                               8536
Kirkwood - Cohansey Aquifer       3224
NaN                               2138
Poconos and Kittatinny            2069
Upper Lehigh                       962
New Jersey Highlands               795
Brandywine and Christina           767
Middle Schuylkill                  717
Schuylkill Highlands               187
Upstream Suburban Philadelphia     101
Name: count, dtype: int64

In [132]:
# Develop mask
mask = catch_loads_gdf['cluster'].isnull()
mask.value_counts()

cluster
False    17358
True      2138
Name: count, dtype: int64

In [133]:
# Sum loads for DRWI catchments outside any Cluster (cluster is null), via mask
mask = catch_loads_gdf['cluster'].isnull()

# Preselect columns to keep
# columns_to_aggregate holds only numeric columns, so non-summable dtypes (object, category, geometry) are excluded
drwi_load_noClus_df = catch_loads_gdf[mask].loc[:,
    columns_to_aggregate
].sum()

drwi_load_noClus_df

catchment_hectares    1.403987e+05
tn_load               9.917051e+05
tp_load               8.318639e+04
tss_load              2.464431e+08
tn_load_ps            0.000000e+00
tp_load_ps            0.000000e+00
tn_load_avoid         0.000000e+00
tp_load_avoid         0.000000e+00
tss_load_avoid        0.000000e+00
tn_load_xsnps        -1.404901e+06
tp_load_xsnps         3.966280e+04
tss_load_xsnps        1.167427e+08
tn_load_rem1         -1.404901e+06
tp_load_rem1          3.966280e+04
tss_load_rem1         1.167427e+08
tn_load_rem2         -1.404901e+06
tp_load_rem2          3.966280e+04
tss_load_rem2         1.167427e+08
tn_load_rem3         -1.404989e+06
tp_load_rem3          3.964853e+04
tss_load_rem3         1.167349e+08
dtype: float64

Save `drwi_load_noClus_df` to CSV at end, to import into Excel for PA1-style tally.

#### DRWI loads in DRB
Copied functions from `stage1/WikiSRAT_AnalysisViz_Clean.ipynb`

In [134]:
# Develop mask, for where 'in_drb'=True
mask = catch_loads_gdf['in_drb']
mask.value_counts()

in_drb
True     16033
False     3463
Name: count, dtype: Int64

In [135]:
# Sum loads for catchments within the DRB (in_drb is True), via mask
mask = catch_loads_gdf['in_drb']

# Preselect columns to keep
# Exclude non-summable dtypes (object, category, geometry)
drwi_load_drb_df = catch_loads_gdf[mask].loc[:,
    columns_to_aggregate
].sum()

Save `drwi_load_drb_df` to CSV at end, to import into Excel for PA1-style tally.

### Sum by Cluster
Copied functions from `stage1/WikiSRAT_AnalysisViz_Clean.ipynb`

In [136]:
catch_loads_gdf.cluster.value_counts()

cluster
drb                               8536
Kirkwood - Cohansey Aquifer       3224
Poconos and Kittatinny            2069
Upper Lehigh                       962
New Jersey Highlands               795
Brandywine and Christina           767
Middle Schuylkill                  717
Schuylkill Highlands               187
Upstream Suburban Philadelphia     101
Name: count, dtype: int64

In [137]:
# Sum loads by Cluster categories
# Preselect columns to keep
# columns_to_aggregate holds only numeric columns; 'cluster' is added as the groupby key
columns = columns_to_aggregate.copy()
columns.append('cluster')

cluster_load_df = catch_loads_gdf.loc[:,
    columns
].groupby('cluster', observed=True).sum()
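Because `cluster` is a categorical column, `observed=True` matters here: only categories that actually occur in the rows get a group, whereas `observed=False` returns every category (empty ones sum to 0). Rows whose key is NaN are dropped by `groupby`'s default `dropna=True`, which is why catchments without a cluster do not appear in `cluster_load_df`. A toy illustration with made-up values:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cluster": pd.Categorical(["A", "A", "B", None], categories=["A", "B", "C"]),
        "tp_load": [1.0, 2.0, 5.0, 10.0],
    }
)

# Only categories present in the data; the NaN row (10.0) is dropped
observed = df.groupby("cluster", observed=True)["tp_load"].sum()
# Every category, including the unused "C", which sums to 0.0
all_cats = df.groupby("cluster", observed=False)["tp_load"].sum()

print(observed)
print(all_cats)
```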

In [138]:
cluster_load_df

cluster,catchment_hectares,tn_load,tp_load,tss_load,tn_load_ps,tp_load_ps,tn_load_avoid,tp_load_avoid,tss_load_avoid,tn_load_xsnps,...,tss_load_xsnps,tn_load_rem1,tp_load_rem1,tss_load_rem1,tn_load_rem2,tp_load_rem2,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3
Brandywine and Christina,145739.1,2497755.0,140709.5,95088940.0,373787.1,33920.4,0.0,0.0,0.0,-363797.2,...,-39544800.0,-379591.6,56047.189315,-45696600.0,-379591.6,56047.189315,-45696600.0,-382159.5,53681.210694,-45984570.0
Kirkwood - Cohansey Aquifer,550179.6,4002328.0,223432.3,206832300.0,1060019.0,91737.99,44.645635,388.699106,15874.813732,-6449257.0,...,-301423600.0,-6450571.0,-39218.6616,-301703500.0,-6450846.0,-39369.016058,-301780300.0,-6451083.0,-39473.40709,-301799600.0
Middle Schuylkill,202958.6,5101492.0,425934.5,154060300.0,1616390.0,142474.7,0.0,0.0,0.0,20599.42,...,-33432820.0,-935.462,212986.902837,-35883320.0,-935.462,212986.902837,-35883320.0,-4630.41,209083.898699,-36376310.0
New Jersey Highlands,178647.1,2097074.0,139428.8,83155020.0,462256.3,41890.87,57.730731,523.070087,23225.834845,-1414690.0,...,-81879220.0,-1433945.0,36158.604398,-85568360.0,-1434278.0,35979.668746,-85626130.0,-1434549.0,35737.351813,-85639910.0
Poconos and Kittatinny,342462.1,1078941.0,68330.83,62414820.0,89188.41,8572.551,212.720556,1916.083347,89641.703433,-4856076.0,...,-253951700.0,-4856076.0,-46404.987224,-253951700.0,-4856076.0,-46404.987224,-253951700.0,-4856547.0,-46857.900647,-254004600.0
Schuylkill Highlands,44855.11,770268.6,52589.6,22177910.0,171732.2,19183.91,13.964929,130.14349,5784.379578,-167140.4,...,-19259250.0,-167734.5,19283.956033,-19311600.0,-167734.5,19283.956033,-19311600.0,-168297.9,18729.361899,-19344640.0
Upper Lehigh,198029.8,742984.9,76817.22,53667830.0,90498.07,15750.53,103.852805,956.163128,40291.405096,-2727881.0,...,-129272100.0,-2727881.0,-322.538404,-129272100.0,-2727881.0,-322.538404,-129272100.0,-2728996.0,-1355.563146,-129486400.0
Upstream Suburban Philadelphia,37411.09,533446.7,26574.06,31462100.0,104574.7,9204.599,0.0,0.0,0.0,-209735.4,...,-3098272.0,-210896.1,5425.619514,-3377611.0,-211101.4,5403.914215,-3395173.0,-211314.6,5228.677446,-3405166.0
drb,1945876.0,38379630.0,2863622.0,1057053000.0,22509020.0,1851474.0,20.453044,189.391138,7690.856153,-17345500.0,...,-740546800.0,-17346240.0,408566.504349,-740703700.0,-17346490.0,408540.761905,-740715900.0,-17358030.0,397423.069751,-741560300.0


Save `cluster_load_df` to CSV at end, to import into Excel for PA1-style tally.
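Since every catchment either has a `cluster` or is counted in `drwi_load_noClus_df`, the cluster sums plus the no-cluster sums should reproduce the DRWI totals. For catchment area, for example, the cluster rows above add up to about 3.646e6 ha, and 3.646e6 + 1.404e5 ≈ 3.787e6 ha, matching `drwi_load_df`. A minimal sketch of that check, using a hypothetical helper `check_partition` on toy numbers; in the notebook it might be called as `check_partition(drwi_load_df, [cluster_load_df.sum(), drwi_load_noClus_df])`.

```python
import numpy as np
import pandas as pd


def check_partition(total: pd.Series, parts: list, rtol: float = 1e-6) -> bool:
    """Return True if the element-wise sum of `parts` matches `total` (hypothetical helper)."""
    combined = sum(parts[1:], parts[0])        # add Series, aligned on their labels
    combined = combined.reindex(total.index)   # missing labels become NaN, failing the check
    return bool(np.allclose(combined, total, rtol=rtol))


# Toy example: per-group sums plus the ungrouped remainder
total = pd.Series({"catchment_hectares": 10.0, "tp_load": 6.0})
by_group = pd.DataFrame(
    {"catchment_hectares": [3.0, 5.0], "tp_load": [2.0, 3.0]}, index=["A", "B"]
)
remainder = pd.Series({"catchment_hectares": 2.0, "tp_load": 1.0})

print(check_partition(total, [by_group.sum(), remainder]))  # True
```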

### Sum by Focus Area within Clusters

In [139]:
catch_loads_gdf.phase.value_counts()

phase
Phase 1    2708
Phase 2    1374
Name: count, dtype: int64

In [140]:
# Create merged name for Focus Area by Phase
catch_loads_gdf['fa_name_phase'] = (
    catch_loads_gdf.phase.dropna().astype('str') 
    + ' ' 
    + catch_loads_gdf.fa_name.dropna().astype('str')
    )
# Change type to category
catch_loads_gdf['fa_name_phase'] = catch_loads_gdf['fa_name_phase'].astype('category')
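The concatenation above relies on pandas index alignment: each `dropna()` shrinks its Series, `+` aligns the two on the `comid` index, and assigning the result back to the frame reindexes it to all catchments. Any catchment missing either `phase` or `fa_name` therefore ends up NaN in `fa_name_phase`. A toy illustration with made-up rows:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "phase": pd.Categorical(["Phase 1", None, "Phase 2"]),
        "fa_name": pd.Categorical(["Bear Creek", None, "Sixpenny"]),
    },
    index=pd.Index([10, 11, 12], name="comid"),
)

# Each dropna() leaves comids 10 and 12; '+' aligns them on the comid index
df["fa_name_phase"] = (
    df.phase.dropna().astype("str") + " " + df.fa_name.dropna().astype("str")
)
# The assignment reindexed to all rows, so comid 11 is NaN
df["fa_name_phase"] = df["fa_name_phase"].astype("category")

print(df)
```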

In [141]:
catch_loads_gdf.fa_name_phase.value_counts()

fa_name_phase
Phase 1 Core Pine Barrens                918
Phase 1 Cohansey-Maurice                 399
Phase 1 Salem River                      279
Phase 1 Bush Kill/Hornbecks Creek        254
Phase 1 Upper Delaware River Corridor    193
                                        ... 
Phase 2 Lower Maiden Cr Trib 2             1
Phase 2 Upper French Creek                 1
Phase 2 Pine Creek/Pickering Creek         1
Phase 2 Lower Maiden Cr Trib 3             1
Phase 2 Sixpenny                           1
Name: count, Length: 97, dtype: int64

In [142]:
# Sum loads by Focus Area categories
# Preselect columns to keep
# columns_to_aggregate holds only numeric columns; 'fa_name_phase' is added as the groupby key
columns = columns_to_aggregate.copy()
columns.append('fa_name_phase')

focusarea_load_df = catch_loads_gdf.loc[:,
    columns
].groupby('fa_name_phase', observed=True).sum()

In [143]:
focusarea_load_df

fa_name_phase,catchment_hectares,tn_load,tp_load,tss_load,tn_load_ps,tp_load_ps,tn_load_avoid,tp_load_avoid,tss_load_avoid,tn_load_xsnps,...,tss_load_xsnps,tn_load_rem1,tp_load_rem1,tss_load_rem1,tn_load_rem2,tp_load_rem2,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3
Phase 1 Bear Creek,7167.2606,2.266742e+04,1578.904790,1.456950e+06,0.000000,0.000000,8.193425,72.607321,3299.212249,-9.967772e+04,...,-5.164166e+06,-9.967772e+04,-642.945996,-5.164166e+06,-9.967772e+04,-642.945996,-5.164166e+06,-9.967806e+04,-643.190468,-5.164204e+06
Phase 1 Bush Kill/Hornbecks Creek,44484.9440,1.005508e+05,6145.360358,6.595345e+06,2875.878244,393.816742,114.556461,1024.495784,47687.578266,-6.616831e+05,...,-3.449985e+07,-6.616831e+05,-8038.789024,-3.449985e+07,-6.616831e+05,-8038.789024,-3.449985e+07,-6.617389e+05,-8099.243218,-3.450051e+07
Phase 1 Cohansey-Maurice,79845.4069,1.191381e+06,61579.567751,3.037723e+07,484802.411012,40900.217900,0.000000,0.000000,0.000000,-6.563830e+05,...,-4.338396e+07,-6.568176e+05,-4117.122271,-4.343690e+07,-6.568176e+05,-4117.122271,-4.343690e+07,-6.568485e+05,-4123.911715,-4.343949e+07
Phase 1 Core Pine Barrens,131694.8683,2.366751e+05,13593.768324,2.261975e+07,4983.039094,424.775994,24.210280,212.835929,8666.089475,-2.016339e+06,...,-9.903997e+07,-2.016339e+06,-27656.416843,-9.903997e+07,-2.016339e+06,-27656.416843,-9.903997e+07,-2.016343e+06,-27657.062935,-9.904032e+07
Phase 1 French Creek Headwaters,4599.8253,6.484916e+04,3183.661904,2.002316e+06,1.949682,38.024761,2.653647,24.395604,1068.801991,-1.367181e+04,...,-2.247003e+06,-1.371735e+04,1696.531838,-2.247779e+06,-1.371735e+04,1696.531838,-2.247779e+06,-1.375623e+04,1660.581638,-2.251043e+06
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Phase 2 Upper Musconetcong,9168.7891,4.098103e+04,2655.157768,5.268609e+06,0.000000,0.000000,11.153485,100.465577,4247.635380,-1.155302e+05,...,-3.201519e+06,-1.155302e+05,-187.166853,-3.201519e+06,-1.155302e+05,-187.166853,-3.201519e+06,-1.155322e+05,-187.507906,-3.201699e+06
Phase 2 Upper Neversink,8649.9809,2.387317e+04,965.901530,7.313115e+05,556.300809,67.261300,0.000000,0.000000,0.000000,-1.243383e+05,...,-7.259541e+06,-1.243383e+05,-1782.853849,-7.259541e+06,-1.243383e+05,-1782.853849,-7.259541e+06,-1.243383e+05,-1782.853849,-7.259541e+06
Phase 2 Upper Salem River,8402.0393,1.940543e+05,8885.140270,6.054299e+06,9664.697506,93.801041,0.000000,0.000000,0.000000,4.096679e+04,...,-1.707505e+06,4.064160e+04,6015.903243,-1.810821e+06,4.044203e+04,5907.572429,-1.871147e+06,4.040004e+04,5876.212400,-1.874539e+06
Phase 2 Welkinweir/Beaver Run,165.6616,2.084011e+03,103.094116,4.993718e+04,0.000000,0.000000,0.252033,2.285113,97.890760,-7.438320e+02,...,-1.031010e+05,-9.204020e+02,-17.141768,-1.289074e+05,-9.204020e+02,-17.141768,-1.289074e+05,-9.238046e+02,-20.678997,-1.290530e+05


In [144]:
# Add back the categorical labels (cluster, phase, fa_name), which were not part of the aggregation
left = focusarea_load_df.copy()
right = catch_loads_gdf.loc[:,['fa_name_phase','cluster', 'phase','fa_name']].dropna().drop_duplicates()
focusarea_load_df = pd.merge(left,right, on='fa_name_phase').set_index('fa_name_phase')
focusarea_load_df

fa_name_phase,catchment_hectares,tn_load,tp_load,tss_load,tn_load_ps,tp_load_ps,tn_load_avoid,tp_load_avoid,tss_load_avoid,tn_load_xsnps,...,tss_load_rem1,tn_load_rem2,tp_load_rem2,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3,cluster,phase,fa_name
Phase 1 Bear Creek,7167.2606,2.266742e+04,1578.904790,1.456950e+06,0.000000,0.000000,8.193425,72.607321,3299.212249,-9.967772e+04,...,-5.164166e+06,-9.967772e+04,-642.945996,-5.164166e+06,-9.967806e+04,-643.190468,-5.164204e+06,Upper Lehigh,Phase 1,Bear Creek
Phase 1 Bush Kill/Hornbecks Creek,44484.9440,1.005508e+05,6145.360358,6.595345e+06,2875.878244,393.816742,114.556461,1024.495784,47687.578266,-6.616831e+05,...,-3.449985e+07,-6.616831e+05,-8038.789024,-3.449985e+07,-6.617389e+05,-8099.243218,-3.450051e+07,Poconos and Kittatinny,Phase 1,Bush Kill/Hornbecks Creek
Phase 1 Cohansey-Maurice,79845.4069,1.191381e+06,61579.567751,3.037723e+07,484802.411012,40900.217900,0.000000,0.000000,0.000000,-6.563830e+05,...,-4.343690e+07,-6.568176e+05,-4117.122271,-4.343690e+07,-6.568485e+05,-4123.911715,-4.343949e+07,Kirkwood - Cohansey Aquifer,Phase 1,Cohansey-Maurice
Phase 1 Core Pine Barrens,131694.8683,2.366751e+05,13593.768324,2.261975e+07,4983.039094,424.775994,24.210280,212.835929,8666.089475,-2.016339e+06,...,-9.903997e+07,-2.016339e+06,-27656.416843,-9.903997e+07,-2.016343e+06,-27657.062935,-9.904032e+07,Kirkwood - Cohansey Aquifer,Phase 1,Core Pine Barrens
Phase 1 French Creek Headwaters,4599.8253,6.484916e+04,3183.661904,2.002316e+06,1.949682,38.024761,2.653647,24.395604,1068.801991,-1.367181e+04,...,-2.247779e+06,-1.371735e+04,1696.531838,-2.247779e+06,-1.375623e+04,1660.581638,-2.251043e+06,Schuylkill Highlands,Phase 1,French Creek Headwaters
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Phase 2 Upper Musconetcong,9168.7891,4.098103e+04,2655.157768,5.268609e+06,0.000000,0.000000,11.153485,100.465577,4247.635380,-1.155302e+05,...,-3.201519e+06,-1.155302e+05,-187.166853,-3.201519e+06,-1.155322e+05,-187.507906,-3.201699e+06,New Jersey Highlands,Phase 2,Upper Musconetcong
Phase 2 Upper Neversink,8649.9809,2.387317e+04,965.901530,7.313115e+05,556.300809,67.261300,0.000000,0.000000,0.000000,-1.243383e+05,...,-7.259541e+06,-1.243383e+05,-1782.853849,-7.259541e+06,-1.243383e+05,-1782.853849,-7.259541e+06,Poconos and Kittatinny,Phase 2,Upper Neversink
Phase 2 Upper Salem River,8402.0393,1.940543e+05,8885.140270,6.054299e+06,9664.697506,93.801041,0.000000,0.000000,0.000000,4.096679e+04,...,-1.810821e+06,4.044203e+04,5907.572429,-1.871147e+06,4.040004e+04,5876.212400,-1.874539e+06,Kirkwood - Cohansey Aquifer,Phase 2,Upper Salem River
Phase 2 Welkinweir/Beaver Run,165.6616,2.084011e+03,103.094116,4.993718e+04,0.000000,0.000000,0.252033,2.285113,97.890760,-7.438320e+02,...,-1.289074e+05,-9.204020e+02,-17.141768,-1.289074e+05,-9.238046e+02,-20.678997,-1.290530e+05,Schuylkill Highlands,Phase 2,Welkinweir/Beaver Run


Save `focusarea_load_df` to CSV at end, to import into Excel for PA1-style tally.
Sort by `cluster`.
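The merge in `In [144]` passes `on='fa_name_phase'`, which pandas accepts as an index level name on the left frame and a column on the right one; the key comes back as a column, hence the `set_index` afterwards. A sketch of the merge plus the sort requested in the note, on toy data. Sorting by `cluster` and then by focus area is an assumption about the intended order.

```python
import pandas as pd

# Toy aggregated loads, indexed by focus-area name (made-up values)
left = pd.DataFrame(
    {"tp_load": [10.0, 4.0]},
    index=pd.Index(["Phase 1 Bear Creek", "Phase 2 Sixpenny"], name="fa_name_phase"),
)
# Toy label table, one row per focus area
right = pd.DataFrame(
    {
        "fa_name_phase": ["Phase 2 Sixpenny", "Phase 1 Bear Creek"],
        "cluster": ["Schuylkill Highlands", "Upper Lehigh"],
    }
)

# 'on' names an index level of `left` and a column of `right`
merged = pd.merge(left, right, on="fa_name_phase").set_index("fa_name_phase")

# Sort by cluster, then by the focus-area index level, before export
merged = merged.sort_values(["cluster", "fa_name_phase"])
print(merged)
```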

#### Cluster loads NOT IN Focus Area (noFA)
To be added to the cluster summary by focus area, below.

In [145]:
# Develop mask
mask = catch_loads_gdf['fa_name_phase'].isnull()
mask.value_counts()

fa_name_phase
True     15414
False     4082
Name: count, dtype: int64

In [146]:
# Sum loads by Cluster categories, excluding Focus Areas via mask
# Preselect columns to keep
columns = columns_to_aggregate.copy()
columns.append('cluster')

cluster_load_noFA_df = catch_loads_gdf[mask].loc[:,
    columns
].groupby('cluster', observed=True).sum()

cluster_load_noFA_df

cluster,catchment_hectares,tn_load,tp_load,tss_load,tn_load_ps,tp_load_ps,tn_load_avoid,tp_load_avoid,tss_load_avoid,tn_load_xsnps,...,tss_load_xsnps,tn_load_rem1,tp_load_rem1,tss_load_rem1,tn_load_rem2,tp_load_rem2,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3
Brandywine and Christina,126864.4,2065244.0,116004.3,81187480.0,365112.1,33237.67,0.0,0.0,0.0,-465443.3,...,-36009860.0,-467375.5,42848.751133,-36576530.0,-467375.5,42848.751133,-36576530.0,-469347.5,40891.180756,-36692500.0
Kirkwood - Cohansey Aquifer,229360.6,1179912.0,60483.08,90502970.0,140224.1,12421.37,2.290827,18.416261,766.343703,-2875498.0,...,-121380400.0,-2875503.0,-23042.363105,-121382500.0,-2875503.0,-23042.363105,-121382500.0,-2875611.0,-23078.519488,-121391100.0
Middle Schuylkill,172185.0,4414272.0,360876.0,127872600.0,1612201.0,140447.5,0.0,0.0,0.0,-137127.6,...,-31191900.0,-146921.7,164114.772523,-32696820.0,-146921.7,164114.772523,-32696820.0,-150134.7,160767.600042,-33120400.0
New Jersey Highlands,97953.76,1226267.0,71898.77,40705680.0,205260.4,14402.25,4.066981,37.277411,1719.866612,-651064.1,...,-49784000.0,-651876.5,26735.931422,-49949560.0,-651876.5,26735.931422,-49949560.0,-652104.8,26519.006873,-49960430.0
Poconos and Kittatinny,182634.1,617804.7,43387.72,37929140.0,73677.74,6885.628,6.699034,60.185624,2824.615177,-2573437.0,...,-130788200.0,-2573437.0,-20114.46664,-130788200.0,-2573437.0,-20114.46664,-130788200.0,-2573738.0,-20387.949535,-130832000.0
Schuylkill Highlands,20561.35,446796.8,33377.02,11746810.0,171693.2,19105.75,1.613102,14.897821,632.039528,-75878.55,...,-7247762.0,-75878.82,7897.104723,-7247762.0,-75878.82,7897.104723,-7247762.0,-76152.42,7629.441522,-7261826.0
Upper Lehigh,142704.1,611301.7,66262.89,44318560.0,79992.09,14781.27,4.353058,38.211604,1736.093012,-1904649.0,...,-87511480.0,-1904649.0,7243.353268,-87511480.0,-1904649.0,7243.353268,-87511480.0,-1905704.0,6270.116662,-87719300.0
Upstream Suburban Philadelphia,26845.84,275625.5,16346.26,22138110.0,48368.97,4363.898,0.0,0.0,0.0,-231002.0,...,-2662077.0,-231341.5,3605.218366,-2698496.0,-231546.8,3583.513066,-2716058.0,-231704.2,3454.48238,-2723706.0
drb,1944356.0,38356590.0,2862494.0,1056250000.0,22508920.0,1851465.0,20.453044,189.391138,7690.856153,-17342480.0,...,-739945600.0,-17343220.0,407922.417455,-740098800.0,-17343470.0,407896.675012,-740111000.0,-17355010.0,396780.776753,-740955300.0


Save `cluster_load_noFA_df` to CSV at end, to import into Excel for PA1-style tally.

### Sum by HUC8 in DRB

In [147]:
# initialize GDF
huc08_load_gdf = huc08_outlets_drwi_gdf.copy()

In [148]:
catch_loads_gdf[catch_loads_gdf.in_drb].huc08.unique()

['02040102', '02040105', '02040101', '02040104', '02040103', ..., '02040204', '02040203', '02040207', '02040303', '02040206']
Length: 14
Categories (17, object): ['02040101', '02040102', '02040103', '02040104', ..., '02040301', '02040302', '02040303', '02040304']

In [149]:
# Sum loads by HUC08 once, then copy each summed column into the HUC08 outlets GDF
# (assignment aligns on the huc08 index)
huc08_sums = catch_loads_gdf.groupby('huc08', observed=True)[columns_to_aggregate].sum()

for column in columns_to_aggregate:
    huc08_load_gdf[column] = huc08_sums[column]
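Assigning grouped sums into `huc08_load_gdf` aligns on the `huc08` index: outlets with no matching catchments receive NaN, and groups with no matching outlet row are silently dropped. A toy sketch of that behavior; the HUC codes `02040301` and `02049999` are used here only as placeholders for "outlet without catchments" and "catchments without outlet".

```python
import pandas as pd

# Toy outlet table, one row per HUC08 (index is the HUC code)
outlets = pd.DataFrame(
    {"huc08_name": ["Upper Delaware", "Lehigh", "toy outlet"]},
    index=pd.Index(["02040101", "02040106", "02040301"], name="huc08"),
)

# Toy catchments: '02040301' has none, '02049999' has no outlet row
catch = pd.DataFrame(
    {
        "huc08": ["02040101", "02040101", "02040106", "02049999"],
        "tp_load": [1.0, 2.0, 4.0, 8.0],
    }
)

sums = catch.groupby("huc08", observed=True)[["tp_load"]].sum()
for column in ["tp_load"]:
    # Series assignment reindexes to outlets.index: missing HUCs -> NaN, extra HUCs dropped
    outlets[column] = sums[column]

print(outlets)
```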

In [150]:
huc08_load_gdf

huc08,huc08_name,geometry,comid,nord,in_drb,catchment_hectares,tn_load,tp_load,tss_load,tn_load_ps,...,tss_load_xsnps,tn_load_rem1,tp_load_rem1,tss_load_rem1,tn_load_rem2,tp_load_rem2,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3
2040101,Upper Delaware,"POLYGON ((-8304228.499 5229843.998, -8304203.8...",2619256,73297,True,308103.3782,931586.7,75654.781768,45126140.0,60286.11,...,-239499800.0,-4388051.0,-21985.266821,-239503100.0,-4388051.0,-21985.266821,-239503100.0,-4388481.0,-22314.174804,-239538200.0
2040102,East Branch Delaware,"POLYGON ((-8294284.604 5213730.686, -8294297.2...",1752159,74523,True,217471.8713,485223.7,36614.472818,25146940.0,4977.769,...,-175753600.0,-3231999.0,-31946.03817,-175753600.0,-3231999.0,-31946.03817,-175753600.0,-3231999.0,-31946.03817,-175753600.0
2040103,Lackawaxen,"POLYGON ((-8395173.834 5131609.270, -8395131.1...",2741462,72447,True,154757.2607,457724.9,51232.942978,34093490.0,72685.99,...,-108871300.0,-2256668.0,-5112.122398,-108871300.0,-2256668.0,-5112.122398,-108871300.0,-2257697.0,-5994.714921,-108956600.0
2040104,Middle Delaware-Mongaup-Brodhead,"POLYGON ((-8290255.809 5165720.405, -8290181.2...",4154510,70222,True,395876.6918,1257237.0,91180.327491,75432830.0,123149.3,...,-290278100.0,-5623528.0,-53685.713435,-290278100.0,-5623528.0,-53685.713435,-290278100.0,-5624014.0,-54165.002565,-290328400.0
2040105,Middle Delaware-Musconetcong,"POLYGON ((-8318518.543 5039392.409, -8318515.1...",4481949,68818,True,351714.7924,3743783.0,311996.724363,190724200.0,825039.7,...,-134189900.0,-3104407.0,101766.232517,-137885200.0,-3104741.0,101587.296866,-137943000.0,-3107316.0,98979.561685,-138079700.0
2040106,Lehigh,"POLYGON ((-8394794.527 5054626.737, -8394728.2...",4188251,74985,True,352414.9584,3789318.0,412013.41698,166717800.0,1410544.0,...,-158843200.0,-3637378.0,92734.203555,-158955200.0,-3637629.0,92708.461112,-158967400.0,-3640619.0,89812.70705,-159431800.0
2040201,Crosswicks-Neshaminy,"POLYGON ((-8361112.101 4923606.956, -8361020.3...",4485575,68274,True,140175.697,2344610.0,265751.052554,125500200.0,1285694.0,...,-3994127.0,-1333883.0,34853.344856,-3994127.0,-1333883.0,34853.344856,-3994127.0,-1335516.0,33280.244087,-4046398.0
2040202,Lower Delaware,"POLYGON ((-8354536.834 4895108.912, -8354450.1...",24903452,65081,True,298967.9128,17861610.0,972973.869331,326825500.0,14356300.0,...,50638920.0,-1598773.0,52827.276643,50487990.0,-1598978.0,52805.571343,50470430.0,-1600150.0,51885.148703,50422910.0
2040203,Schuylkill,"POLYGON ((-8453246.777 4995405.859, -8453181.3...",4784841,65459,True,494849.1119,9961103.0,927493.354787,344011400.0,3298815.0,...,-113130200.0,-1807541.0,328861.829684,-115793100.0,-1807541.0,328861.829684,-115793100.0,-1815061.0,321086.594324,-116581900.0
2040204,Delaware Bay,"POLYGON ((-8404100.919 4824476.609, -8404179.3...",24903800,63468,True,16366.2201,3369572.0,233305.918268,10195130.0,3353257.0,...,-4923989.0,-263056.2,-2060.502037,-4923989.0,-263056.2,-2060.502037,-4923989.0,-263056.2,-2060.502037,-4923989.0


### Sum by HUC10 in DRB

In [151]:
# initialize GDF
huc10_load_gdf = huc10_outlets_drwi_gdf.copy()

In [152]:
catch_loads_gdf[catch_loads_gdf.in_drb].huc10.unique()

['0204010203', '0204010204', '0204010205', '0204010202', '0204010201', ..., '0204020605', '0204020601', '0204020602', '0204020603', '0204020606']
Length: 87
Categories (96, object): ['0204010101', '0204010102', '0204010103', '0204010104', ..., '0204030202', '0204030203', '0204030204', '0204030301']

In [153]:
# Sum loads by HUC10 once, then copy each summed column into the HUC10 outlets GDF
# (assignment aligns on the huc10 index)
huc10_sums = catch_loads_gdf.groupby('huc10', observed=True)[columns_to_aggregate].sum()

for column in columns_to_aggregate:
    huc10_load_gdf[column] = huc10_sums[column]

In [154]:
huc10_load_gdf

huc10,huc10_name,geometry,comid,nord,in_drb,huc08,catchment_hectares,tn_load,tp_load,tss_load,...,tss_load_xsnps,tn_load_rem1,tp_load_rem1,tss_load_rem1,tn_load_rem2,tp_load_rem2,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3
0204010101,Upper West Branch Delaware River,"POLYGON ((-8304262.020 5228828.467, -8304276.2...",2612826,74277,True,02040101,51001.7886,156151.719089,11983.746298,7.735084e+06,...,-3.938037e+07,-714448.812313,-3826.808168,-3.938037e+07,-714448.812313,-3826.808168,-3.938037e+07,-714448.812313,-3826.808168,-3.938037e+07
0204010102,Middle West Branch Delaware River,"POLYGON ((-8346041.487 5210211.202, -8345991.9...",2614138,74141,True,02040101,66948.2854,270226.964656,17310.924691,1.069852e+07,...,-5.114830e+07,-916646.766860,-3836.498489,-5.114830e+07,-916646.766860,-3836.498489,-5.114830e+07,-916646.766860,-3836.498489,-5.114830e+07
0204010103,Lower West Branch Delaware River,"POLYGON ((-8386125.641 5192313.974, -8386205.0...",2617290,73934,True,02040101,54778.4430,183366.645183,14383.009424,6.922483e+06,...,-4.368184e+07,-753930.185988,-2906.386557,-4.368513e+07,-753930.185988,-2906.386557,-4.368513e+07,-753969.809995,-2940.377009,-4.368862e+07
0204010104,Upper Delaware River,"POLYGON ((-8358825.391 5150856.311, -8358625.2...",2616816,73702,True,02040101,42273.9588,105670.945618,9799.261444,5.376243e+06,...,-3.367644e+07,-624383.095711,-4282.088725,-3.367644e+07,-624383.095711,-4282.088725,-3.367644e+07,-624470.543807,-4349.455579,-3.368376e+07
0204010105,Middle Delaware River,"POLYGON ((-8329217.537 5136231.813, -8329266.5...",2617486,73565,True,02040101,46119.2838,105524.568909,11408.344809,7.077806e+06,...,-3.552719e+07,-683701.364164,-3057.167081,-3.552719e+07,-683701.364164,-3057.167081,-3.552719e+07,-683755.650302,-3097.437295,-3.553119e+07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0204030107,Manahawkin Bay-Little Egg Harbor,"POLYGON ((-8254906.642 4818531.173, -8255286.4...",9452077,106911,False,02040301,30605.6698,174203.251506,26460.485861,6.469750e+07,...,3.642399e+07,-348235.531980,16972.728223,3.642399e+07,-348235.531980,16972.728223,3.642399e+07,-348259.592054,16968.746333,3.642200e+07
0204030201,Upper Great Egg Harbor River,"POLYGON ((-8340052.545 4836703.894, -8340050.2...",9433771,114555,False,02040302,45171.0579,282489.449394,9238.039955,1.095726e+07,...,-3.077176e+07,-488580.604060,-4765.009652,-3.077177e+07,-488580.604060,-4765.009652,-3.077177e+07,-488595.310670,-4767.911999,-3.077304e+07
0204030202,Lower Great Egg Harbor River,"POLYGON ((-8304455.903 4790038.822, -8304567.7...",9436873,114472,False,02040302,44804.0249,125495.313819,6382.086305,1.359751e+07,...,-2.779245e+07,-639309.391224,-7507.161414,-2.779245e+07,-639309.391224,-7507.161414,-2.779245e+07,-639313.458051,-7507.820969,-2.779283e+07
0204030203,Tuckahoe River,"POLYGON ((-8310209.755 4764796.667, -8309971.4...",9436881,120357,False,02040302,26424.2354,55052.604883,2520.856872,3.434352e+06,...,-2.097636e+07,-396009.093395,-5670.656102,-2.097636e+07,-396009.093395,-5670.656102,-2.097636e+07,-396009.724154,-5670.750783,-2.097641e+07


### Sum by HUC12

In [155]:
# Initialize the HUC12 load GeoDataFrame from the DRWI HUC12 outlets
huc12_load_gdf = huc12_outlets_drwi_gdf.copy()

In [156]:
# Sum loads by HUC12.
# Each groupby result is indexed by HUC12 code, so the assignment
# aligns rows on the huc12_load_gdf index.
for column in columns_to_aggregate:
    huc12_load_gdf[column] = (
        catch_loads_gdf.groupby('huc12', observed=True)[column].sum()
    )
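HUC codes are hierarchical: each 12-digit HUC12 code begins with the 10-digit code of its parent HUC10. This means the HUC12 sums can be rolled up and compared against the HUC10 sums as a consistency check on the load columns. Below is a minimal sketch on hypothetical toy tables; the names and values are illustrative, not PA2 results. With the real data, a mismatch would suggest that some catchment is assigned inconsistently between its `huc12` and `huc10` columns.

```python
import pandas as pd

# Hypothetical HUC12 load table indexed by 12-digit HUC code
# (values are illustrative, not PA2 results).
huc12_loads = pd.DataFrame(
    {"tn_load": [60.0, 40.0, 50.0], "tp_load": [6.0, 4.0, 5.0]},
    index=pd.Index(["020401010101", "020401010102", "020401010201"], name="huc12"),
)

# Hypothetical HUC10 load table computed independently from catchments.
huc10_loads = pd.DataFrame(
    {"tn_load": [100.0, 50.0], "tp_load": [10.0, 5.0]},
    index=pd.Index(["0204010101", "0204010102"], name="huc10"),
)

# The first 10 characters of each HUC12 code give its parent HUC10.
parent_huc10 = huc12_loads.index.str[:10]
rolled_up = huc12_loads.groupby(parent_huc10).sum()
rolled_up.index.name = "huc10"

# Compare with a floating-point tolerance rather than exact equality.
pd.testing.assert_frame_equal(
    rolled_up.sort_index(),
    huc10_loads.sort_index(),
    check_exact=False,
    rtol=1e-9,
)
print("HUC12 sums match HUC10 sums")
```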

In [157]:
huc12_load_gdf

huc12,huc12_name,geometry,centroid_xy,comid,nord,to_huc12,outlet_comid,from_huc12s,inlet_comids,outlet_comids,...,tss_load_xsnps,tn_load_rem1,tp_load_rem1,tss_load_rem1,tn_load_rem2,tp_load_rem2,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3
020401010101,Town Brook-Headwaters West Brach Delaware River,"POLYGON ((-8303725.462 5224646.990, -8303761.0...","[-74.62155936289159, 42.387091234041016]",2612792,74293,020401010102,2612792,,,[2612792],...,-6.417165e+06,-121656.024282,-452.700667,-6.417165e+06,-121656.024282,-452.700667,-6.417165e+06,-121656.024282,-452.700667,-6.417165e+06
020401010102,Betty Brook-Headwaters West Brach Delaware River,"POLYGON ((-8315136.657 5225191.846, -8315097.2...","[-74.71393635968639, 42.38194565669812]",2612800,74290,020401010103,2612800,[020401010101],[2612792],"[2612800, 2612922]",...,-5.189872e+06,-91865.779450,-456.580400,-5.189872e+06,-91865.779450,-456.580400,-5.189872e+06,-91865.779450,-456.580400,-5.189872e+06
020401010103,Rose Brook-Headwaters West Brach Delaware River,"POLYGON ((-8323990.577 5217953.339, -8323948.6...","[-74.71097819143394, 42.330665690562654]",2612808,74288,020401010104,2612808,[020401010102],[2612800],[2612808],...,-5.406878e+06,-73532.128766,-444.403453,-5.406878e+06,-73532.128766,-444.403453,-5.406878e+06,-73532.128766,-444.403453,-5.406878e+06
020401010104,Elk Creek-Headwaters West Brach Delaware River,"POLYGON ((-8326727.279 5222215.417, -8326605.6...","[-74.82334627464569, 42.34506256688788]",2612820,74282,020401010106,2612820,[020401010103],[2612808],[2612820],...,-5.612233e+06,-90071.591553,-499.505703,-5.612233e+06,-90071.591553,-499.505703,-5.612233e+06,-90071.591553,-499.505703,-5.612233e+06
020401010105,Upper Little Delaware River,"POLYGON ((-8319654.283 5208307.086, -8319607.8...","[-74.78436638151948, 42.27096486797448]",2612842,74311,020401010106,2612842,,,[2612842],...,-1.036604e+07,-204830.986463,-1494.308562,-1.036604e+07,-204830.986463,-1494.308562,-1.036604e+07,-204830.986463,-1494.308562,-1.036604e+07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
020403020403,Absecon Bay,"POLYGON ((-8277929.484 4780388.338, -8278050.1...","[-74.44604887400864, 39.417944290515535]",9436627,125390,020403020408,9436627,[020403020401],[9436775],[],...,2.453020e+07,85555.289603,8859.258408,2.453020e+07,85555.289603,8859.258408,2.453020e+07,85553.812895,8859.031070,2.453008e+07
020403020404,Cape May Harbor-Cape May Inlet,"POLYGON ((-8335529.098 4723951.934, -8335439.6...","[-74.8841890564985, 38.973283432181894]",9437503,120596,020403020500,9437503,,,"[9437503, 9438907, 9438927]",...,3.640515e+06,-81971.422358,303.483304,3.640515e+06,-81971.422358,303.483304,3.640515e+06,-81976.591279,302.687506,3.640092e+06
020403020405,Great Channel-Hereford Inlet,"POLYGON ((-8320042.885 4732976.676, -8320161.1...","[-74.81118396051109, 39.05056352723319]",9438919,123313,020403020500,9438919,,,"[9438919, 9438933, 9438959, 9436483]",...,1.662392e+07,-70510.004096,2280.880034,1.662392e+07,-70510.004096,2280.880034,1.662392e+07,-70514.287783,2280.224293,1.662358e+07
020403020406,Townsend Channel-Townsends Inlet,"POLYGON ((-8313546.615 4745787.504, -8313616.0...","[-74.74367003975492, 39.137811452292325]",9436931,124744,020403020500,9436931,,,"[9436931, 9436927, 9436939]",...,8.800608e+06,-49379.408359,1378.964196,8.800608e+06,-49379.408359,1378.964196,8.800608e+06,-49380.474521,1378.794659,8.800529e+06


# Save Calculated PA2 Results

In [158]:
reach_concs_gdf.to_parquet(
    data_output_path /'reach_concs_gdf.parquet',
    compression='brotli'
)

In [159]:
%%time
# Save PA2 combined and calculated results.
# NOTE: the 'brotli' compression engine writes more slowly than 'gzip',
# but reduces storage by ~35% with similar read speeds.

# Results by COMID
reach_concs_gdf.to_parquet(data_output_path / 'reach_concs_gdf.parquet', compression='brotli')
catch_loads_gdf.to_parquet(data_output_path / 'catch_loads_gdf.parquet', compression='brotli')

# Aggregations by DRWI geographies, for comparison to Pollution Assessment Stage 1 (PA1).
# Saved as CSV files for easy import into Excel, for final analysis similar to PA1.
drwi_load_df.to_csv(data_output_path / 'drwi_load_all.csv')
drwi_load_noClus_df.to_csv(data_output_path / 'drwi_load_noClus.csv')
drwi_load_drb_df.to_csv(data_output_path / 'drwi_load_drb.csv')
cluster_load_df.to_csv(data_output_path / 'cluster_loads.csv')
focusarea_load_df.sort_values('cluster').to_csv(data_output_path / 'focusarea_loads_byCluster.csv')
cluster_load_noFA_df.to_csv(data_output_path / 'cluster_load_noFA.csv')

# Aggregations by HUC, using Method 1 (Sum of Local Loads), similar to PA1
huc12_load_gdf.to_parquet(data_output_path / 'huc12_load_gdf.parquet', compression='brotli')
huc10_load_gdf.to_parquet(data_output_path / 'huc10_load_gdf.parquet', compression='brotli')
huc08_load_gdf.to_parquet(data_output_path / 'huc08_load_gdf.parquet', compression='brotli')

CPU times: user 9.53 s, sys: 230 ms, total: 9.76 s
Wall time: 9.86 s
