PA2 Notebook 2b: Aggregate by Geography
===

This is notebook 2b for DRWI Pollution Assessment Stage 2 (PA2) analysis.

It reads Pollution Assessment results calculated for every NHDplus catchment (COMID) in Notebook 2, and aggregates results over various geographies using:
- Aggregation Method 2: Attenuated reach loads accumulated through the stream network

# Installation and Setup

Carefully follow our **[Installation Instructions](README.md#get-started)**, especially:
- Creating a virtual environment for this repository (step 3)

## Import Python Dependencies

In [1]:
from pathlib import Path
from importlib import reload

import numpy     as np
import pandas    as pd
import geopandas as gpd

import hvplot.pandas
import holoviews as hv
import geoviews as gv

In [2]:
# Custom functions for Pollution Assessment
import pollution_assessment as pa

## Set Paths


In [3]:
# Set your project directory to your local folder for your clone of this repository
project_path = Path.cwd().parent
project_path

PosixPath('/Users/aaufdenkampe/Documents/Python/pollution-assessment')

In [4]:
# Assign a path for the geographies folder.
geography_path = project_path / 'geography/'

In [5]:
# Assign a path for the data OUTPUT folder.
data_output_path = project_path / 'stage2/data_output/'

# Import Data

## Open Files for Geographies

In [6]:
%%time
# read geometry data from GeoParquet files
# huc12_outlets_drwi_gdf = gpd.read_parquet(geography_path /'huc12_outlets_drwi_gdf.parquet')
huc10_outlets_drwi_gdf = gpd.read_parquet(geography_path /'huc10_outlets_drwi_gdf.parquet')
huc08_outlets_drwi_gdf = gpd.read_parquet(geography_path /'huc08_outlets_drwi_gdf.parquet')

# new inlet data
huc12_in_outlets_drwi_gdf = gpd.read_parquet(geography_path /'huc12_in_outlets_drwi_gdf.parquet')

CPU times: user 169 ms, sys: 38.1 ms, total: 208 ms
Wall time: 201 ms


In [7]:
# This works at first, but fails later because of some change to `from_huc12s`
huc12_in_outlets_drwi_gdf.to_parquet(
    data_output_path /'test.parquet',
    engine='pyarrow',
    compression='brotli',
)

In [8]:
huc12_in_outlets_drwi_gdf.from_huc12s

huc12
020401010101              None
020401010102    [020401010101]
020401010103    [020401010102]
020401010104    [020401010103]
020401010105              None
                     ...      
020403020403    [020403020401]
020403020404              None
020403020405              None
020403020406              None
020403020407              None
Name: from_huc12s, Length: 481, dtype: object

In [9]:
huc12_in_outlets_drwi_gdf.info()
huc12_in_outlets_drwi_gdf.head(5)

<class 'geopandas.geodataframe.GeoDataFrame'>
CategoricalIndex: 481 entries, 020401010101 to 020403020407
Data columns (total 13 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   huc12_name     481 non-null    object  
 1   geometry       481 non-null    geometry
 2   centroid_xy    481 non-null    object  
 3   comid          481 non-null    Int64   
 4   nord           481 non-null    Int64   
 5   to_huc12       481 non-null    category
 6   outlet_comid   481 non-null    Int64   
 7   from_huc12s    231 non-null    object  
 8   inlet_comids   231 non-null    object  
 9   outlet_comids  481 non-null    object  
 10  huc10          481 non-null    category
 11  huc08          481 non-null    category
 12  in_drb         481 non-null    boolean 
dtypes: Int64(3), boolean(1), category(3), geometry(1), object(5)
memory usage: 72.9+ KB


huc12,huc12_name,geometry,centroid_xy,comid,nord,to_huc12,outlet_comid,from_huc12s,inlet_comids,outlet_comids,huc10,huc08,in_drb
20401010101,Town Brook-Headwaters West Brach Delaware River,"POLYGON ((-8303725.462 5224646.990, -8303761.0...","[-74.62155936289159, 42.387091234041016]",2612792,74293,20401010102,2612792,,,[2612792],204010101,2040101,True
20401010102,Betty Brook-Headwaters West Brach Delaware River,"POLYGON ((-8315136.657 5225191.846, -8315097.2...","[-74.71393635968639, 42.38194565669812]",2612800,74290,20401010103,2612800,[020401010101],[2612792],"[2612800, 2612922]",204010101,2040101,True
20401010103,Rose Brook-Headwaters West Brach Delaware River,"POLYGON ((-8323990.577 5217953.339, -8323948.6...","[-74.71097819143394, 42.330665690562654]",2612808,74288,20401010104,2612808,[020401010102],[2612800],[2612808],204010101,2040101,True
20401010104,Elk Creek-Headwaters West Brach Delaware River,"POLYGON ((-8326727.279 5222215.417, -8326605.6...","[-74.82334627464569, 42.34506256688788]",2612820,74282,20401010106,2612820,[020401010103],[2612808],[2612820],204010101,2040101,True
20401010105,Upper Little Delaware River,"POLYGON ((-8319654.283 5208307.086, -8319607.8...","[-74.78436638151948, 42.27096486797448]",2612842,74311,20401010106,2612842,,,[2612842],204010101,2040101,True


## Open Files from Notebook 2

In [10]:
# Results by COMID
reach_concs_gdf = gpd.read_parquet(data_output_path /'reach_concs_gdf.parquet')
# catch_loads_gdf = gpd.read_parquet(data_output_path /'catch_loads_gdf.parquet')

# Aggregation by HUC, using Method 1 (Sum of Local Loads) similar to PA1
huc12_load_gdf = gpd.read_parquet(data_output_path /'huc12_load_gdf.parquet')
huc10_load_gdf = gpd.read_parquet(data_output_path /'huc10_load_gdf.parquet')
huc08_load_gdf = gpd.read_parquet(data_output_path /'huc08_load_gdf.parquet')

# Explore HUC12 Outlets; Confirm & Correct Problems

## Create Functions to explore & correct outlets

In [11]:
# Create convenience functions
def huc12_outlet_info(huc12s: list) -> pd.DataFrame:
    """Return the inlet and outlet metadata for the given HUC12s."""
    columns = ['huc12_name', 'comid','outlet_comid', 'to_huc12', 'from_huc12s', 'inlet_comids', 'outlet_comids']
    return huc12_in_outlets_drwi_gdf.loc[huc12s, columns]

def huc12_explore_comids(huc12: str) -> pd.DataFrame:
    """Return flow, TP concentration, and TP load for every COMID in a HUC12."""
    columns = ['huc12','nord', 'nordstop','maflowv','tp_conc']
    # Copy so that the new columns are added to an independent frame
    explore_outlets = reach_concs_gdf.loc[reach_concs_gdf.huc12==huc12, columns].copy()
    # Load (kg/y) from concentration (mg/L) and mean annual flow (CFS)
    explore_outlets['tp_load'] = (explore_outlets.tp_conc * 28.3168 / 1000000) * explore_outlets.maflowv * 31557600
    explore_outlets['nord_diff'] = explore_outlets.nordstop - explore_outlets.nord
    return explore_outlets

def huc12_reassign_outlet(huc12: str, new_outlet_comid: int) -> None:
    """Presently reassigns both `comid` and `outlet_comid`, but the latter
    should eventually be turned into a list.
    """
    old_outlet_comid = huc12_in_outlets_drwi_gdf.comid[huc12]
    huc12_in_outlets_drwi_gdf.loc[huc12, 'comid'] = new_outlet_comid
    huc12_in_outlets_drwi_gdf.loc[huc12, 'outlet_comid'] = new_outlet_comid
    print(f'`outlet_comid` changed from {old_outlet_comid} to {huc12_in_outlets_drwi_gdf.comid[huc12]}')
    return None

# # This function below breaks the ability to write this field
# def huc12_reassign_from_huc12s(huc12: str, new_from_huc12s: list) -> None:
#     """Reassigns `from_huc12s` list."""
#     new_from_huc12s_array = np.array(pd.array(new_from_huc12s))
#     old_from_huc12s = huc12_in_outlets_drwi_gdf.from_huc12s[huc12]
#     huc12_in_outlets_drwi_gdf.at[problem_huc12, 'from_huc12s'] = new_from_huc12s_array
#     print(f'`outlet_comid` changed from {old_from_huc12s} to {huc12_in_outlets_drwi_gdf.from_huc12s[huc12]}')
#     return None

## Correct Outlet Errors

NOTE: These corrections were initially made manually, as in most of the work below. On Nov 14, 2023 we added an `outlet_comids` field, created in `geography/huc12_reach_analysis.ipynb`, to recognize that a HUC12 can have multiple outlets.

In [12]:
# Value counts for HUC12s with multiple outlet COMIDs
huc12_in_outlets_drwi_gdf.outlet_comids.apply(len).value_counts()

outlet_comids
1     364
2      81
3      17
4       6
5       3
7       2
9       2
6       2
24      1
17      1
13      1
0       1
Name: count, dtype: int64

#### Factory Creek-Delaware River

In [13]:
# HUC12 with potential issues
problem_huc12 = '020401010403'
huc12_outlet_info([problem_huc12])

huc12,huc12_name,comid,outlet_comid,to_huc12,from_huc12s,inlet_comids,outlet_comids
20401010403,Factory Creek-Delaware River,2617370,2617370,20401010406,"[020401010307, 020401010402, 020401020507]","[2617290, 2616778, 1752159]","[2617370, 2617366]"


In [14]:
# The outlet should have the lowest `nord` AND the highest `maflowv`
# Which is usually also the comid with the largest difference between the nord and nordstop
huc12_explore_comids(problem_huc12).sort_values(by=['nord']).head()

comid,huc12,nord,nordstop,maflowv,tp_conc,tp_load,nord_diff
2617370,20401010403,73807,73898,112.551,0.033094,3328.527102,91
2617366,20401010403,73899,74971,2157.252,0.015936,30720.44237,1072
2617246,20401010403,73901,73908,8.064,0.031441,226.562841,7
2617382,20401010403,73902,73908,6.772,0.034519,208.891393,6
2616728,20401010403,73903,73908,5.964,0.036863,196.460234,5
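
The selection criteria named in the comment above can be compared side by side on the five rows shown. This is a minimal sketch using values copied from that table; the `pick_outlet` helper is hypothetical and is not part of `pollution_assessment`.

```python
import pandas as pd

# Values copied from the table above for HUC12 020401010403.
reaches = pd.DataFrame(
    {
        "nord": [73807, 73899, 73901, 73902, 73903],
        "nordstop": [73898, 74971, 73908, 73908, 73908],
        "maflowv": [112.551, 2157.252, 8.064, 6.772, 5.964],
    },
    index=pd.Index([2617370, 2617366, 2617246, 2617382, 2616728], name="comid"),
)


def pick_outlet(df: pd.DataFrame) -> dict:
    """Hypothetical helper: report the candidate outlet under each criterion."""
    nord_span = df["nordstop"] - df["nord"]
    return {
        "lowest_nord": int(df["nord"].idxmin()),
        "highest_flow": int(df["maflowv"].idxmax()),
        "largest_nord_span": int(nord_span.idxmax()),
    }


candidates = pick_outlet(reaches)
print(candidates)
```

For this HUC12 the criteria disagree: the lowest `nord` points to 2617370, while the highest flow and the largest `nord` span both point to 2617366, which is the reach the notebook assigns as the outlet.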


In [15]:
huc12_reassign_outlet(problem_huc12, 2617366)

`outlet_comid` changed from 2617370 to 2617366


In [16]:
# Confirm that problem ID has been changed. Should be 2617366
problem_huc12 = '020401010403'
columns = ['huc12_name', 'comid','outlet_comid', 'outlet_comids']
huc12_in_outlets_drwi_gdf.loc[problem_huc12, columns]

huc12_name       Factory Creek-Delaware River
comid                                 2617366
outlet_comid                          2617366
outlet_comids              [2617370, 2617366]
Name: 020401010403, dtype: object

#### Mingo Creek-Schuylkill River

In [17]:
problem_huc12 = '020402031006' # Mingo Creek-Schuylkill River
huc12_outlet_info([problem_huc12])

huc12,huc12_name,comid,outlet_comid,to_huc12,from_huc12s,inlet_comids,outlet_comids
20402031006,Mingo Creek-Schuylkill River,4782489,4782489,20402031007,"[020402030702, 020402030809, 020402031002, 020...","[4781691, 4782543, 4782471, 4781739, 4782565, ...","[4782489, 4782491]"


In [18]:
# The outlet should have the lowest `nord` AND the highest `maflowv`
# Which is usually also the comid with the largest difference between the nord and nordstop
huc12_explore_comids(problem_huc12).sort_values(by=['nord']).head()

comid,huc12,nord,nordstop,maflowv,tp_conc,tp_load,nord_diff
4782489,20402031006,65580,65593,31.715,0.101564,2878.423913,13
4782491,20402031006,65594,67354,2948.422,0.072372,190682.332561,1760
4782493,20402031006,65595,65599,7.168,0.103776,664.725424,4
4781863,20402031006,65596,65599,7.157,0.103978,664.996838,3
4781881,20402031006,65597,65598,1.073,0.070914,67.995141,1


In [19]:
huc12_reassign_outlet(problem_huc12, 4782491)

`outlet_comid` changed from 4782489 to 4782491


#### Baxter Brook-East Branch Delaware River
NEED TO ADDRESS ELSEWHERE

In [20]:
problem_huc12 = '020401020503'  # Baxter Brook-East Branch Delaware River, where 020401020204 "Lower Beaver Kill" really joins after the outlet
# remove it from `from_huc12s`
huc12_outlet_info([problem_huc12,'020401020204', '020401020502'])

huc12,huc12_name,comid,outlet_comid,to_huc12,from_huc12s,inlet_comids,outlet_comids
20401020503,Baxter Brook-East Branch Delaware River,1752109,1752109,20401020505,"[020401020204, 020401020502]","[1752843, 1749245]",[1752109]
20401020204,Lower Beaver Kill,1752843,1752843,20401020503,"[020401020202, 020401020203]","[1750423, 1752035]",[1752843]
20401020502,Trout Brook-East Branch Delaware River,1749245,1749245,20401020503,"[020401020405, 020401020501]","[1748773, 1748603]","[1749245, 1749243]"


In [21]:
# # This function below breaks the ability to write this field
# def huc12_reassign_from_huc12s(huc12: str, new_from_huc12s: list) -> None:
#     """Reassigns `from_huc12s` list."""
#     new_from_huc12s_array = np.array(pd.array(new_from_huc12s))
#     print(type(new_from_huc12s_array))
#     old_from_huc12s = huc12_in_outlets_drwi_gdf.from_huc12s[huc12]
#     print(type(old_from_huc12s))
#     # huc12_in_outlets_drwi_gdf.at[problem_huc12, 'from_huc12s'] = new_from_huc12s_array
#     print(f'`outlet_comid` changed from {old_from_huc12s} to {huc12_in_outlets_drwi_gdf.from_huc12s[huc12]}')
#     return None
# # DECIDING TO NOT IMPLEMENT

In [22]:
huc12_in_outlets_drwi_gdf.at[problem_huc12, 'from_huc12s']

array(['020401020204', '020401020502'], dtype=object)

In [23]:
huc12_in_outlets_drwi_gdf.at[problem_huc12, 'from_huc12s']

array(['020401020204', '020401020502'], dtype=object)

In [24]:
# New `from_huc12s` list, dropping 020401020204
# huc12_reassign_from_huc12s(problem_huc12, ['020401020502'])

In [25]:
huc12_in_outlets_drwi_gdf.at[problem_huc12, 'from_huc12s']

array(['020401020204', '020401020502'], dtype=object)

#### Beers Brook-Middle West Branch Delaware River

In [26]:
problem_huc12 = '020401010205' # Beers Brook-Middle West Branch Delaware River
huc12_outlet_info([problem_huc12])

huc12,huc12_name,comid,outlet_comid,to_huc12,from_huc12s,inlet_comids,outlet_comids
20401010205,Beers Brook-Middle West Branch Delaware River,2614074,2614074,20401010207,[020401010204],[2613528],"[2614074, 2614072]"


In [27]:
huc12_reassign_outlet(problem_huc12, 2614072)

`outlet_comid` changed from 2614074 to 2614072


#### Lower Pepacton Reservoir
This HUC12 has too much flow coming from 020401020403 Upper Pepacton Reservoir.

NEED TO ADDRESS ELSEWHERE

In [28]:
problem_huc12 = '020401020405'  # Lower Pepacton Reservoir, has too much flow coming from 020401020403 Upper Pepacton Reservoir
# remove it from `from_huc12s`
huc12_outlet_info([problem_huc12])

huc12,huc12_name,comid,outlet_comid,to_huc12,from_huc12s,inlet_comids,outlet_comids
20401020405,Lower Pepacton Reservoir,1748773,1748773,20401020502,"[020401020403, 020401020404]","[1748759, 1748767]",[1748773]


In [29]:
# Outlets flows and loads
comids = huc12_outlet_info([problem_huc12]).outlet_comids.at[problem_huc12]
explore_outlets = reach_concs_gdf.loc[comids,['huc12', 'nord', 'nordstop', 'maflowv','tp_conc']]
explore_outlets['tp_load'] = (explore_outlets.tp_conc * 28.3168 / 1000000) * explore_outlets.maflowv * 31557600
explore_outlets

comid,huc12,nord,nordstop,maflowv,tp_conc,tp_load
1748773,20401020405,74873,74940,176.197,0.086662,13645.132081


In [30]:
# Inlet flows and loads
comids = huc12_outlet_info([problem_huc12]).inlet_comids.at[problem_huc12]
explore_outlets = reach_concs_gdf.loc[comids,['huc12', 'nord', 'nordstop', 'maflowv','tp_conc']]
explore_outlets['tp_load'] = (explore_outlets.tp_conc * 28.3168 / 1000000) * explore_outlets.maflowv * 31557600
explore_outlets

comid,huc12,nord,nordstop,maflowv,tp_conc,tp_load
1748759,20401020403,74881,74923,530.907,0.023328,11067.578031
1748767,20401020404,74933,74940,63.414,0.031436,1781.391309


In [31]:
# New `from_huc12s` list, dropping 020401020403
# huc12_reassign_from_huc12s(problem_huc12, [ '020401020404'])

#### Drawyer Creek-Appoquinimink River
An example of multiple outlets to Delaware River Mainstem

In [32]:
problem_huc12 = '020402050802'	# Drawyer Creek-Appoquinimink River	
huc12_outlet_info([problem_huc12])

huc12,huc12_name,comid,outlet_comid,to_huc12,from_huc12s,inlet_comids,outlet_comids
20402050802,Drawyer Creek-Appoquinimink River,24902810,24902810,20402040000,,,"[24902810, 24902842, 4653798, 4653372]"


In [33]:
huc12_explore_comids(problem_huc12).sort_values(by=['nord']).head(15)

comid,huc12,nord,nordstop,maflowv,tp_conc,tp_load,nord_diff
24902810,20402050802,77094,77103,2.274,0.014579,29.62454,9
4653296,20402050802,77095,77103,0.508,0.036223,16.443439,8
4653298,20402050802,77096,77102,0.426,0.052472,19.975091,6
4653294,20402050802,77097,77099,0.244,0.082167,17.91578,2
4653302,20402050802,77098,77098,0.003,0.014725,0.039475,0
4653304,20402050802,77099,77099,0.229,0.100167,20.497784,0
4653292,20402050802,77100,77102,0.168,0.047151,7.078682,2
4653290,20402050802,77101,77101,0.004,0.014488,0.051788,0
4653288,20402050802,77102,77102,0.031,0.014198,0.393307,0
4653306,20402050802,77103,77103,0.069,0.014391,0.887315,0


In [34]:
# Use new multi-outlet `outlet_comids` field
huc12_outlet_info([problem_huc12]).outlet_comids.at[problem_huc12]

array([24902810, 24902842,  4653798,  4653372])

In [35]:
# This demonstrates that outlets are not always unique nor the largest
comids = huc12_outlet_info([problem_huc12]).outlet_comids.at[problem_huc12]
explore_outlets = reach_concs_gdf.loc[comids,['huc12', 'nord', 'nordstop', 'maflowv','tp_conc']]
explore_outlets['tp_load'] = (explore_outlets.tp_conc * 28.3168 / 1000000) * explore_outlets.maflowv * 31557600
explore_outlets

comid,huc12,nord,nordstop,maflowv,tp_conc,tp_load
24902810,20402050802,77094,77103,2.274,0.014579,29.62454
24902842,20402050802,77105,77205,51.71,0.004505,208.159745
4653798,20402050802,77206,77220,0.753,,
4653372,20402050802,77224,77226,0.222,,


In [36]:
# Confirm that `sum()` skips the NaN loads
explore_outlets.tp_load.sum()

237.7842845314774
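
The result above relies on the default `skipna=True` behavior of `Series.sum`. A minimal sketch of that behavior, using the rounded loads from the table above; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# tp_load values from the table above; two outlets have no concentration.
tp_load = pd.Series(
    [29.62454, 208.159745, np.nan, np.nan],
    index=pd.Index([24902810, 24902842, 4653798, 4653372], name="comid"),
)

total_skipping_nan = tp_load.sum()  # default skipna=True ignores NaN
total_strict = tp_load.sum(skipna=False)  # NaN propagates to the total
all_missing = tp_load.iloc[2:].sum()  # an all-NaN selection sums to 0.0
all_missing_flagged = tp_load.iloc[2:].sum(min_count=1)  # NaN instead of 0.0
```

Note that an all-NaN selection sums to zero by default, so a HUC12 whose outlets all lack concentrations would report a zero load unless `min_count=1` is passed.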

In [37]:
# This works at first, but fails later because of some change to `from_huc12s`
huc12_in_outlets_drwi_gdf.to_parquet(
    data_output_path /'test.parquet',
    engine='pyarrow',
    # compression='brotli',
)

# Aggregate Attenuated Results by Geography

# Method 2: Aggregate Attenuated reach loads accumulated through the stream network

Back-calculate cumulative attenuated stream reach loads (kg/y) from excess & remaining average annual concentrations (mg/L) and mean annual flow (CFS), using:

```
tp_load_atten = (tp_conc * 28.3168 / 1000000) * (maflowv * 31557600)
```

Where:  
- 28.3168 liters in a cubic foot
- 1000000 mg in a kg
- 31557600 = 365.25 * 24 * 60 * 60 seconds per year
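
As a worked check of the equation and constants above, here is a minimal sketch that applies them to the rounded values displayed earlier for COMID 2617370 (`tp_conc` 0.033094 mg/L, `maflowv` 112.551 CFS). The `conc_to_load` function name is hypothetical.

```python
LITERS_PER_CUBIC_FOOT = 28.3168
MG_PER_KG = 1_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600


def conc_to_load(conc_mg_per_l: float, flow_cfs: float) -> float:
    """Annual load (kg/y) from concentration (mg/L) and mean annual flow (CFS)."""
    kg_per_cubic_foot = conc_mg_per_l * LITERS_PER_CUBIC_FOOT / MG_PER_KG
    cubic_feet_per_year = flow_cfs * SECONDS_PER_YEAR
    return kg_per_cubic_foot * cubic_feet_per_year


load = conc_to_load(0.033094, 112.551)
print(f"{load:.1f} kg/y")
```

The result is roughly 3,328 kg/y, which agrees with the `tp_load` shown for that reach to within the rounding of the displayed concentration.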

## HUC12 Attenuated Outlet Loads
This method requires aggregating over hydrologically defined geographies, such as HUCs. We've decided to do this analysis by HUC12.

### Get Concentrations at HUC12 outlets

In [38]:
# Old way, considering only one outlet
# Select results at HUC12 outlets, and change index to HUC12

# reach_concs_huc12_gdf = reach_concs_gdf.loc[
#     huc12_in_outlets_drwi_gdf.comid.dropna()
# ]
# reach_concs_huc12_gdf.reset_index(inplace=True)
# reach_concs_huc12_gdf.set_index('huc12', inplace=True)

# Remove COMID geometries

# reach_concs_huc12_gdf.drop('geometry', axis='columns', inplace=True)

In [39]:
# New way, considering multiple outlets
# First, get single list of COMIDs that are HUC12 outlets
huc12_outlets_combined = np.concatenate(huc12_in_outlets_drwi_gdf.outlet_comids.values)
huc12_outlets_combined[:20]


array([2612792, 2612800, 2612922, 2612808, 2612820, 2612842, 2612826,
       2612944, 2613460, 2613464, 2613462, 2613498, 2613528, 2614074,
       2614072, 2614122, 2614138, 2613804, 2613828, 2613826])

In [40]:
# Confirm no duplicates
huc12_outlets_combined.size == np.unique(huc12_outlets_combined).size

True
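
An alternative to `np.concatenate` is `Series.explode`, which keeps the HUC12 label attached to each outlet COMID and therefore also gives a COMID-to-HUC12 lookup. This is a minimal sketch on the first three HUC12s shown earlier, using plain lists rather than arrays; the `outlets` and `comid_to_huc12` names are illustrative.

```python
import numpy as np
import pandas as pd

outlet_comids = pd.Series(
    [[2612792], [2612800, 2612922], [2612808]],
    index=pd.Index(["020401010101", "020401010102", "020401010103"], name="huc12"),
    name="outlet_comids",
)

# One row per (huc12, outlet COMID) pair; the index repeats for multi-outlet HUC12s.
outlets = outlet_comids.explode().astype("int64")

# Same flat list of COMIDs as the concatenate approach.
flat = np.concatenate(outlet_comids.tolist())

# Reverse lookup from outlet COMID to its HUC12.
comid_to_huc12 = pd.Series(outlets.index, index=outlets.to_numpy(), name="huc12")
```

`outlets.is_unique` then gives the same no-duplicates check as comparing sizes with `np.unique`.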

In [41]:
# Second, select reach concentrations at outlets
reach_concs_huc12_outlets_gdf = reach_concs_gdf.loc[
    huc12_outlets_combined
].copy(deep=True)

### Back-Calculate Attenuated Loads at Outlets

Using the Method 2 equation given above.

In [42]:
# Back-calculate loads (kg/y) from average annual concentrations (mg/L)
# and mean annual flow (CFS)
for suffix in ['','_ps', '_xsnps', '_rem1', '_rem2', '_rem3', '_avoid']:
    for pollutant in ['tn', 'tp', 'tss']:
        reach_concs_huc12_outlets_gdf[f'{pollutant}_load{suffix}'] = (
            (reach_concs_huc12_outlets_gdf[f'{pollutant}_conc{suffix}'] * 28.3168 / 1000000)
            * reach_concs_huc12_outlets_gdf.maflowv * 31557600
        )

In [43]:
# Calc and add load reductions
for pollutant in ['tn', 'tp', 'tss']:
    reach_concs_huc12_outlets_gdf[f'{pollutant}_load_red3'] = (
        reach_concs_huc12_outlets_gdf[f'{pollutant}_load_xsnps'] 
        - reach_concs_huc12_outlets_gdf[f'{pollutant}_load_rem3']
    )

In [44]:
reach_concs_huc12_outlets_gdf.info()
reach_concs_huc12_outlets_gdf.columns

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 714 entries, 2612792 to 26814135
Data columns (total 72 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   watershed_hectares  714 non-null    float64 
 1   maflowv             714 non-null    float64 
 2   geometry            714 non-null    geometry
 3   cluster             596 non-null    category
 4   sub_focusarea       16 non-null     Int64   
 5   nord                714 non-null    Int64   
 6   nordstop            714 non-null    Int64   
 7   huc12               714 non-null    category
 8   streamorder         714 non-null    int64   
 9   headwater           714 non-null    int64   
 10  phase               103 non-null    category
 11  fa_name             103 non-null    category
 12  in_drb              714 non-null    boolean 
 13  huc08               714 non-null    category
 14  huc10               714 non-null    category
 15  into_dr             714 no

Index(['watershed_hectares', 'maflowv', 'geometry', 'cluster', 'sub_focusarea',
       'nord', 'nordstop', 'huc12', 'streamorder', 'headwater', 'phase',
       'fa_name', 'in_drb', 'huc08', 'huc10', 'into_dr', 'Source', 'Sediment',
       'TotalN', 'TotalP', 'run_group', 'run_type', 'funding_sources',
       'with_attenuation', 'tn_conc', 'tp_conc', 'tss_conc', 'tn_conc_xs',
       'tp_conc_xs', 'tss_conc_xs', 'tn_conc_ps', 'tp_conc_ps', 'tss_conc_ps',
       'tn_conc_xsnps', 'tp_conc_xsnps', 'tss_conc_xsnps', 'tn_conc_rem1',
       'tp_conc_rem1', 'tss_conc_rem1', 'tn_conc_rem2', 'tp_conc_rem2',
       'tss_conc_rem2', 'tn_conc_rem3', 'tp_conc_rem3', 'tss_conc_rem3',
       'tn_conc_avoid', 'tp_conc_avoid', 'tss_conc_avoid', 'tn_load',
       'tp_load', 'tss_load', 'tn_load_ps', 'tp_load_ps', 'tss_load_ps',
       'tn_load_xsnps', 'tp_load_xsnps', 'tss_load_xsnps', 'tn_load_rem1',
       'tp_load_rem1', 'tss_load_rem1', 'tn_load_rem2', 'tp_load_rem2',
       'tss_load_rem2', 'tn_load_

### Sum HUC12 Outlet loads

In [45]:
# Sum over all outlet loads
# First list columns that shouldn't be summed
columns_to_drop = ['watershed_hectares', 'geometry', 
    'cluster', 'sub_focusarea', 'nord', 'nordstop', 'streamorder', 'headwater', 'phase', 'fa_name',
    'in_drb', 'huc08', 'huc10', 'into_dr', 'Source', 'Sediment', 'TotalN',
    'TotalP', 'run_group', 'run_type', 'funding_sources',
    'with_attenuation', 'tn_conc', 'tp_conc', 'tss_conc', 'tn_conc_xs',
    'tp_conc_xs', 'tss_conc_xs', 'tn_conc_ps', 'tp_conc_ps', 'tss_conc_ps',
    'tn_conc_xsnps', 'tp_conc_xsnps', 'tss_conc_xsnps', 'tn_conc_rem1',
    'tp_conc_rem1', 'tss_conc_rem1', 'tn_conc_rem2', 'tp_conc_rem2',
    'tss_conc_rem2', 'tn_conc_rem3', 'tp_conc_rem3', 'tss_conc_rem3',
    'tn_conc_avoid', 'tp_conc_avoid', 'tss_conc_avoid',
]
# Convert from GDF to DF by dropping geometry (and other non-summable columns)
temp_df = reach_concs_huc12_outlets_gdf.drop(columns_to_drop, axis=1)

In [46]:
temp_sum_df = temp_df.groupby(by='huc12', observed=False).sum()
temp_sum_df.info()

<class 'pandas.core.frame.DataFrame'>
CategoricalIndex: 484 entries, 020401010101 to 020403030101
Data columns (total 25 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   maflowv         484 non-null    float64
 1   tn_load         484 non-null    float64
 2   tp_load         484 non-null    float64
 3   tss_load        484 non-null    float64
 4   tn_load_ps      484 non-null    float64
 5   tp_load_ps      484 non-null    float64
 6   tss_load_ps     484 non-null    float64
 7   tn_load_xsnps   484 non-null    float64
 8   tp_load_xsnps   484 non-null    float64
 9   tss_load_xsnps  484 non-null    float64
 10  tn_load_rem1    484 non-null    float64
 11  tp_load_rem1    484 non-null    float64
 12  tss_load_rem1   484 non-null    float64
 13  tn_load_rem2    484 non-null    float64
 14  tp_load_rem2    484 non-null    float64
 15  tss_load_rem2   484 non-null    float64
 16  tn_load_rem3    484 non-null    float64
 17  tp_load_r
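
Because `huc12` is categorical, `observed=False` makes every category appear in the grouped result, and a category with no outlet rows sums to 0 instead of being omitted. A minimal sketch on toy data (the labels and values are illustrative, not taken from the results):

```python
import pandas as pd

loads = pd.DataFrame(
    {
        "huc12": pd.Categorical(
            ["020401010101", "020401010101", "020401010102"],
            categories=["020401010101", "020401010102", "020401010103"],
        ),
        "tp_load": [1.0, 2.0, 5.0],
    }
)

# Every category is a group, including the one with no rows.
all_categories = loads.groupby("huc12", observed=False).sum()

# Only categories that actually occur in the data are groups.
observed_only = loads.groupby("huc12", observed=True).sum()
```

With `observed=False`, the unused category `020401010103` is reported with a load of 0.0, so a zero in the summed table can mean "no outlet rows" as well as "no load".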

### Create HUC12 Outlet Concs GDF

In [47]:
# Create dataframe for HUC12 loads (blank for now)
# by joining HUC12 metadata with summed outlet loads
huc12_outlet_loads_gdf = huc12_in_outlets_drwi_gdf.copy(deep=True).join(temp_sum_df)

In [48]:
# Index resort and reset as Category
huc12_outlet_loads_gdf.sort_index(inplace=True)
huc12_outlet_loads_gdf.index = huc12_outlet_loads_gdf.index.astype(
    pd.CategoricalDtype(ordered=True), copy=True
)

In [49]:
huc12_outlet_loads_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
CategoricalIndex: 481 entries, 020401010101 to 020403020407
Data columns (total 38 columns):
 #   Column          Non-Null Count  Dtype   
---  ------          --------------  -----   
 0   huc12_name      481 non-null    object  
 1   geometry        481 non-null    geometry
 2   centroid_xy     481 non-null    object  
 3   comid           481 non-null    Int64   
 4   nord            481 non-null    Int64   
 5   to_huc12        481 non-null    category
 6   outlet_comid    481 non-null    Int64   
 7   from_huc12s     231 non-null    object  
 8   inlet_comids    231 non-null    object  
 9   outlet_comids   481 non-null    object  
 10  huc10           481 non-null    category
 11  huc08           481 non-null    category
 12  in_drb          481 non-null    boolean 
 13  maflowv         481 non-null    float64 
 14  tn_load         481 non-null    float64 
 15  tp_load         481 non-null    float64 
 16  tss_load        481 non-null

### Calculate net loads

#### Explore how to build function

In [50]:
# Create small test dataframe
test_gdf = huc12_outlet_loads_gdf[['from_huc12s', 'tp_load_rem3']].head(10)
test_gdf

huc12,from_huc12s,tp_load_rem3
20401010101,,-2425.331527
20401010102,[020401010101],-4120.60599
20401010103,[020401010102],-6135.8445
20401010104,[020401010103],-8242.231184
20401010105,,-5253.900561
20401010106,"[020401010104, 020401010105]",-11043.478466
20401010201,[020401010106],-19688.257984
20401010202,[020401010201],-25373.353613
20401010203,,-2276.59186
20401010204,"[020401010202, 020401010203]",-32829.938508


In [51]:
# Note that unused categories cause a `KeyError` in the lambda function
test_gdf.index

CategoricalIndex(['020401010101', '020401010102', '020401010103',
                  '020401010104', '020401010105', '020401010106',
                  '020401010201', '020401010202', '020401010203',
                  '020401010204'],
                 categories=['020401010101', '020401010102', '020401010103', '020401010104', ..., '020403020404', '020403020405', '020403020406', '020403020407'], ordered=True, dtype='category', name='huc12')

In [52]:
# Remove unused categories to avoid the `KeyError`
test_gdf.index = test_gdf.index.remove_unused_categories()
# test_gdf.huc12 = test_gdf.huc12.cat.remove_unused_categories() 
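
Slicing a categorical index keeps the full category list, which is why the 10-row test frame still carries all 481 HUC12 categories. A minimal sketch of that behavior on a toy index; it does not reproduce the `KeyError` itself, only the leftover categories and their removal.

```python
import pandas as pd

full_index = pd.CategoricalIndex(["a", "b", "c", "d"], name="huc12")
subset = pd.Series([1.0, 2.0], index=full_index[:2])

# The slice has two labels but still remembers all four categories.
n_before = len(subset.index.categories)

# Drop the categories that no longer occur in the index.
subset.index = subset.index.remove_unused_categories()
n_after = len(subset.index.categories)
```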

In [53]:
# Create test functions
def get_inlet_loads_test(huc12: str) -> list:
    """Fetches the loads of the HUC12s that flow into a HUC12, if any."""
    from_huc12s = test_gdf.from_huc12s[huc12]
    if isinstance(from_huc12s, np.ndarray):
        ds = test_gdf.tp_load_rem3[from_huc12s]
    else:
        ds = []  # headwater HUC12: no inflowing loads
    return ds

def calc_net_load_test(huc12: str) -> float:
    """Calculates the net load of a HUC12, by subtracting inflow loads from outflow load."""
    net = (test_gdf.tp_load_rem3[huc12]
        - sum(get_inlet_loads_test(huc12))
    )
    return net

In [54]:
# Apply function
test_gdf['tp_load_rem3_net'] = test_gdf.index.to_series().apply(lambda huc12: calc_net_load_test(huc12))
test_gdf

huc12,from_huc12s,tp_load_rem3,tp_load_rem3_net
20401010101,,-2425.331527,-2425.331527
20401010102,[020401010101],-4120.60599,-1695.274463
20401010103,[020401010102],-6135.8445,-2015.23851
20401010104,[020401010103],-8242.231184,-2106.386684
20401010105,,-5253.900561,-5253.900561
20401010106,"[020401010104, 020401010105]",-11043.478466,2452.653278
20401010201,[020401010106],-19688.257984,-8644.779517
20401010202,[020401010201],-25373.353613,-5685.095629
20401010203,,-2276.59186,-2276.59186
20401010204,"[020401010202, 020401010203]",-32829.938508,-5179.993036
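
The net-load arithmetic in the table above can be checked by hand for 020401010106, which receives flow from two upstream HUC12s. This is a minimal sketch with values copied from the table; only the one `from_huc12s` entry needed for the check is filled in, and the `net_load` name is illustrative.

```python
import pandas as pd

outlet_load = pd.Series(
    {
        "020401010104": -8242.231184,
        "020401010105": -5253.900561,
        "020401010106": -11043.478466,
    },
    name="tp_load_rem3",
)

# Upstream HUC12s; only the entry needed for this illustration is filled in.
from_huc12s = {"020401010106": ["020401010104", "020401010105"]}


def net_load(huc12: str) -> float:
    """Outlet load minus the summed outlet loads of the inflowing HUC12s."""
    inlets = from_huc12s.get(huc12, [])
    inflow = outlet_load.loc[inlets].sum() if inlets else 0.0
    return outlet_load.loc[huc12] - inflow


net_106 = net_load("020401010106")
net_105 = net_load("020401010105")
```

`net_106` comes to about 2452.65, matching the `tp_load_rem3_net` value in the table, and a HUC12 with no listed inlets, such as 020401010105, keeps its outlet load unchanged.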


#### Create function to find row with huc in `from_huc12s`

In [55]:
# Issue where a huc in `from_huc12s` isn't in the index
any(huc12_outlet_loads_gdf.index.isin(['020403010404'])) # Value from early error below

False

In [56]:
huc12 = '020401010106' # two inlets
# huc12 = '020403010405' # problem row, where item in `from_huc12s` isn't in the index
huc12_outlet_loads_gdf.from_huc12s[huc12]

array(['020401010104', '020401010105'], dtype=object)

In [57]:
huc12 = '020401010106' # two inlets
from_huc12 = '020401010104' # in from_huc12 for '020401010106' two inlets

x = huc12_outlet_loads_gdf.from_huc12s[huc12]
np.isin(x, from_huc12)

array([ True, False])

In [58]:
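def is_fromhuc_in_fromhucs(from_huc12s: np.ndarray, from_huc12: str) -> bool:
    """Checks whether `from_huc12` is one of the upstream HUC12s in `from_huc12s`."""
    if isinstance(from_huc12s, np.ndarray):
        inarray = any(np.isin(from_huc12s,from_huc12))
    else:
        inarray = False  # no upstream HUC12s (None)
    return inarray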

In [59]:
# Test function on a single good value
huc12 = '020401010106' # two inlets
from_huc12 = '020401010104' # in from_huc12 for '020401010106' two inlets

is_fromhuc_in_fromhucs(huc12_outlet_loads_gdf.from_huc12s[huc12], from_huc12 )

True

In [60]:
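def are_fromhucs_in_index(from_huc12s: np.ndarray) -> np.ndarray[bool]:
    """Flags which upstream HUC12s are in the index of `huc12_outlet_loads_gdf`."""
    if isinstance(from_huc12s, np.ndarray):
        inarray = np.isin(from_huc12s, huc12_outlet_loads_gdf.index)
    else:
        inarray = False  # no upstream HUC12s (None)
    return inarray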

In [61]:
# Test function on a single good value
huc12 = '020401010106' # two inlets
are_fromhucs_in_index(huc12_outlet_loads_gdf.from_huc12s[huc12])

array([ True,  True])

In [62]:
# Apply function to a dataframe to find the problem row
from_huc12 = '020403010404' # Not in index
fromwhere = huc12_outlet_loads_gdf.from_huc12s.apply(lambda x: is_fromhuc_in_fromhucs(x,from_huc12))
fromwhere[fromwhere == True]

# this stopped working after I applied my `huc12_reassign_from_huc12s()` to a record

huc12
020403010405    True
Name: from_huc12s, dtype: bool

In [63]:
# So this function fails on this row
# get_inlet_loads('tp','_rem3',from_huc12)

In [64]:
# Test function again on a bad value
huc12 = '020403010405' # problem row
are_fromhucs_in_index(huc12_outlet_loads_gdf.from_huc12s[huc12])

array([False])

In [65]:
# This is the problematic row
huc12_outlet_loads_gdf.loc['020403010405'].head(13)

huc12_name                               Forked River-Barnegat Bay
geometry         POLYGON ((-8246979.924288888 4846943.997902947...
centroid_xy                 [-74.1967227753088, 39.83119001643217]
comid                                                     26812379
nord                                                        121561
to_huc12                                              020403010407
outlet_comid                                              26812379
from_huc12s                                         [020403010404]
inlet_comids                                             [9443475]
outlet_comids    [26812379, 9454971, 9444453, 9444107, 9445131,...
huc10                                                   0204030104
huc08                                                     02040301
in_drb                                                       False
Name: 020403010405, dtype: object

In [66]:
# Try dropping a single row, to see if that solves the problem
# huc12_outlet_loads_gdf.drop('020403010405', inplace=True)

But that just caused a new error:  
`KeyError: "['020403010405'] not in index"`

In [67]:
# Test an approach that masks out inlet HUC12s missing from the index,
# rather than searching for each problem row
huc12 = '020401010106' # two inlets
df = huc12_outlet_loads_gdf
from_huc12s_array = df.from_huc12s[huc12]
from_huc12s_mask = are_fromhucs_in_index(from_huc12s_array)
ds = df['tp_load_rem3'][from_huc12s_array[from_huc12s_mask]]
ds

huc12
020401010104   -8242.231184
020401010105   -5253.900561
Name: tp_load_rem3, dtype: float64

In [68]:
from_huc12s_mask = are_fromhucs_in_index(from_huc12s_array)
from_huc12s_mask

array([ True,  True])

In [69]:
test_mask = [True, False]
test_mask

[True, False]

In [70]:
from_huc12s_array[test_mask]

array(['020401010104'], dtype=object)

#### Create functions for net load

In [82]:
def get_inlet_loads(var: str, huc12: str) -> pd.Series:
    """Fetches the outlet loads of `var` for the HUC12s that flow into a HUC12, if any."""
    df = huc12_outlet_loads_gdf
    from_huc12s_array = df.from_huc12s[huc12]
    if isinstance(from_huc12s_array, np.ndarray):
        # Skip inlet HUC12s that are missing from the index
        from_huc12s_mask = are_fromhucs_in_index(from_huc12s_array)
        ds = df[var][from_huc12s_array[from_huc12s_mask]]
    else:
        # No inlets: return an empty Series so that `sum()` gives 0
        ds = pd.Series(dtype='float64', name=var)
    return ds

def calc_net_load(var: str, huc12: str) -> float:
    """Calculates the net load of a HUC12, by subtracting inflow loads from outflow load."""
    df = huc12_outlet_loads_gdf
    # var = f'{pollutant}_load{var_suffix}'
    net = (df[var][huc12]
        - sum(get_inlet_loads(var, huc12))
    )
    return net
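
The net-load idea can be sketched on toy data. This is only an illustration, assuming made-up HUC12 codes (`A` to `D`, plus a missing `Z`) and made-up loads that are not from the PA2 results: the net load of a HUC12 is its outlet load minus the outlet loads of the HUC12s that flow into it, skipping any inlet that has no record.

```python
# Hypothetical outlet loads (kg/y) keyed by made-up HUC12 codes.
outlet_loads = {'A': 10.0, 'B': 5.0, 'C': 40.0, 'D': 7.0}
# Hypothetical routing: 'C' receives 'A' and 'B'; 'D' lists 'Z', which has no record.
inlets = {'C': ['A', 'B'], 'D': ['Z']}

def toy_net_load(huc12: str) -> float:
    """Outlet load minus the summed outlet loads of the inlets that have a record."""
    upstream = [h for h in inlets.get(huc12, []) if h in outlet_loads]
    return outlet_loads[huc12] - sum(outlet_loads[h] for h in upstream)

net_loads = {huc12: toy_net_load(huc12) for huc12 in outlet_loads}
print(net_loads)
```

Headwater codes (`A`, `B`) keep their outlet load, and `D` is unchanged because its only listed inlet is missing, which mirrors how the problem row is handled above.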

In [83]:
# Confirm functions work

# huc12 = '020401010101' # no inlets
# huc12 = '020401010106' # two inlets
huc12 = '020403010405' # problem row

pollutants = ['tn', 'tp', 'tss']
suffixes = ['', '_xsnps', '_rem1', '_rem2', '_rem3']

var = f'{pollutants[1]}_load{suffixes[4]}'
print(var)

get_inlet_loads(var, huc12)

tp_load_rem3


Series([], Name: tp_load_rem3, dtype: float64)

In [84]:
# Confirm functions work
calc_net_load(var, huc12)

-2201.5973715144096

In [132]:
problem_huc12 = '020401020405'  # Lower Pepacton Reservoir, has too much flow coming from 020401020403 Upper Pepacton Reservoir
var = 'maflowv'
calc_net_load(var, problem_huc12)

-418.124

In [135]:
_.dtype

dtype('float64')

#### Apply functions to add net attenuated loads

In [85]:
# Remove unused categories to avoid the `KeyError`
huc12_outlet_loads_gdf.index = huc12_outlet_loads_gdf.index.remove_unused_categories()
# huc12_outlet_loads_gdf.huc12 = huc12_outlet_loads_gdf.huc12.cat.remove_unused_categories()

In [99]:
df = huc12_outlet_loads_gdf
# Calc net pollution loads
for suffix in ['', '_xsnps', '_rem1', '_rem2', '_rem3']:
    for pollutant in ['tn', 'tp', 'tss']:
        var = f'{pollutant}_load{suffix}'
        df[f'{var}_net'] = df.index.to_series().apply(
            lambda huc12: calc_net_load(var, huc12)
        )

In [112]:
# Calc net flows
var = 'maflowv'
df[f'{var}_net'] = df.index.to_series().apply(
    lambda huc12: calc_net_load(var, huc12)
)

In [113]:
huc12_outlet_loads_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
CategoricalIndex: 481 entries, 020401010101 to 020403020407
Data columns (total 54 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   huc12_name          481 non-null    object  
 1   geometry            481 non-null    geometry
 2   centroid_xy         481 non-null    object  
 3   comid               481 non-null    Int64   
 4   nord                481 non-null    Int64   
 5   to_huc12            481 non-null    category
 6   outlet_comid        481 non-null    Int64   
 7   from_huc12s         231 non-null    object  
 8   inlet_comids        231 non-null    object  
 9   outlet_comids       481 non-null    object  
 10  huc10               481 non-null    category
 11  huc08               481 non-null    category
 12  in_drb              481 non-null    boolean 
 13  maflowv             481 non-null    float64 
 14  tn_load             481 non-null    float64 
 15  tp_loa

In [131]:
# This is a category for some reason!
huc12_outlet_loads_gdf.maflowv_net

huc12
020401010101    56.661
020401010102    40.396
020401010103    44.661
020401010104    46.591
020401010105    98.559
                 ...  
020403020403   -29.176
020403020404    21.541
020403020405     7.133
020403020406     7.770
020403020407     7.770
Name: maflowv_net, Length: 481, dtype: category
Categories (481, float64): [56.661 < 40.396 < 44.661 < 46.591 ... 21.541 < 7.133 < 7.770 < 7.770]

In [137]:
# Convert from Category to float
huc12_outlet_loads_gdf.maflowv_net = huc12_outlet_loads_gdf.maflowv_net.astype('float')
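
One possible explanation, offered as an assumption rather than a confirmed diagnosis: `Series.apply` on categorical values maps the categories, so the result can come back as a categorical. A minimal sketch of a way to sidestep that, using a toy `CategoricalIndex` and a hypothetical `toy_net` function, is to cast the values to plain objects before calling `apply`:

```python
import pandas as pd

# Toy categorical index standing in for the HUC12 index (labels are made up)
idx = pd.CategoricalIndex(['h1', 'h2', 'h3'], name='huc12')
toy_flows = {'h1': 1.5, 'h2': -0.5, 'h3': 2.0}  # hypothetical values

def toy_net(huc12: str) -> float:
    """Hypothetical stand-in for a per-HUC12 calculation."""
    return toy_flows[huc12]

# Cast the categorical values to object dtype before applying the function
net = idx.to_series().astype(object).apply(toy_net)
print(net.dtype)
```

With this pattern the later `astype('float')` conversion should not be needed, although keeping it does no harm.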

#### Zero-out HUC12s with negative net flows
Set net loads to NaN where net flow is negative, which occurs where there are significant water withdrawals.

In [138]:
# Count HUC12s by sign of `maflowv_net`; False = zero or negative net flow, due to water withdrawals
huc12_outlet_loads_gdf.maflowv_net.gt(0).value_counts()


maflowv_net
True     434
False     47
Name: count, dtype: int64

In [140]:
net_neg_flow_mask = huc12_outlet_loads_gdf.maflowv_net.gt(0)
net_neg_flow_mask

huc12
020401010101     True
020401010102     True
020401010103     True
020401010104     True
020401010105     True
                ...  
020403020403    False
020403020404     True
020403020405     True
020403020406     True
020403020407     True
Name: maflowv_net, Length: 481, dtype: bool

In [None]:
# Replace net loads with NaN where net flow is not positive.
# Note: `net_neg_flow_mask` is True where `maflowv_net` > 0, so invert it here.
for suffix in ['', '_xsnps', '_rem1', '_rem2', '_rem3']:
    for pollutant in ['tn', 'tp', 'tss']:
        huc12_outlet_loads_gdf.loc[~net_neg_flow_mask, f'{pollutant}_load{suffix}_net'] = np.nan
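
As a rough sketch of the masking step, assuming a toy frame with made-up values and only two pollutants: the net-load column names can be built once and all masked in a single `.loc` assignment, using a mask that is True where net flow is zero or negative.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values); 'h2' and 'h3' have non-positive net flow
toy = pd.DataFrame(
    {
        'maflowv_net': [5.0, -2.0, 0.0],
        'tn_load_net': [100.0, 50.0, 20.0],
        'tp_load_net': [10.0, 5.0, 2.0],
    },
    index=pd.Index(['h1', 'h2', 'h3'], name='huc12'),
)

# Build the list of net-load columns; extend the suffix list as needed
net_cols = [
    f'{pollutant}_load{suffix}_net'
    for suffix in ['']
    for pollutant in ['tn', 'tp']
]

# True where net flow is zero or negative
non_positive_flow = toy['maflowv_net'].le(0)
toy.loc[non_positive_flow, net_cols] = np.nan
print(toy)
```

Only the net-load columns are blanked; the flow column itself is left intact so the reason for masking stays visible.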

#### Zero-out Mainstem HUC12 net loads
Net loads for mainstem HUC12s are unreliable because their major inflows have NaN concentrations.

##### First try manual method

In [101]:
problem_huc12s_mainstem = [
    '020401040505', # Shingle Kill-Delaware River
    '020401040504', # Twin Lakes Creek-Delaware River, mainstem
    '020402020507', # Woodbury Creek-Delaware River
    '020402040000', # Delaware Bay-Deep
    '020402060604', # Back Creek, on Delaware Bay
    '020402060605', # Dividing Creek-Oranoaken Creek, on Delaware Bay
]

In [102]:
huc12_outlet_loads_gdf.loc[problem_huc12s_mainstem, :]

huc12,huc12_name,geometry,centroid_xy,comid,nord,to_huc12,outlet_comid,from_huc12s,inlet_comids,outlet_comids,...,tss_load_xsnps_net,tn_load_rem1_net,tp_load_rem1_net,tss_load_rem1_net,tn_load_rem2_net,tp_load_rem2_net,tss_load_rem2_net,tn_load_rem3_net,tp_load_rem3_net,tss_load_rem3_net
20401040505,Shingle Kill-Delaware River,"POLYGON ((-8314264.699 5087170.589, -8314225.2...","[-74.74256786199639, 41.42891982519458]",4158256,71290,20401040704,4158256,"[020401040107, 020401040308, 020401040504]","[4151446, 4151522, 4151440]","[4158256, 4151524]",...,84124160.0,1694917.0,29435.667967,84124160.0,1694917.0,29435.667967,84124160.0,1694917.0,29435.668098,84124160.0
20401040504,Twin Lakes Creek-Delaware River,"POLYGON ((-8340465.281 5088469.213, -8340454.0...","[-74.90365126862747, 41.437395848232526]",4151440,72128,20401040505,4151440,"[020401010606, 020401030603, 020401040403, 020...","[2619256, 2741462, 4151396, 4149986, 4150028, ...","[4151440, 4151454]",...,693723200.0,13222110.0,247855.51883,693725000.0,13222110.0,247855.51883,693725000.0,13222530.0,248000.381002,693748200.0
20402020507,Woodbury Creek-Delaware River,"POLYGON ((-8363743.703 4849391.649, -8363922.2...","[-75.20626278968832, 39.849239337999315]",4496226,65319,20402020607,4496226,"[020402020405, 020402020503, 020402020505, 020...","[4495990, 4499300, 4496236, 4499318, 4784841]","[4496226, 4499320]",...,506439300.0,7335154.0,180884.451516,507272700.0,7335154.0,180884.693403,507272800.0,7342066.0,182465.658865,507563000.0
20402040000,Delaware Bay-Deep,"POLYGON ((-8397336.237 4835435.521, -8397411.6...","[-75.2172326134155, 39.10411522785594]",24903800,63468,20403030601,24903800,"[020402020608, 020402050505, 020402050601, 020...","[24903452, 24902758, 932040355, 24902756, 9320...",[24903800],...,251729500.0,3385168.0,86912.679837,254423300.0,3385168.0,86912.679837,254423300.0,3387635.0,87433.482928,254563400.0
20402060604,Back Creek,"POLYGON ((-8376861.027 4775147.368, -8376816.5...","[-75.24124884395003, 39.31331191893464]",24946864,78160,20402040000,24946864,[020402060303],[9486246],"[24946864, 24903786, 9484910, 9486342, 9486400...",...,19998430.0,122914.0,6849.066904,20001940.0,122914.0,6849.066904,20001940.0,122917.3,6849.071193,20001960.0
20402060605,Dividing Creek-Oranoaken Creek,"POLYGON ((-8353977.618 4767436.951, -8354002.9...","[-75.11145659337949, 39.21451081461224]",9486540,119714,20402040000,9486540,[020402060507],[9486502],"[9486540, 9486530, 9486486, 9485166, 9485200, ...",...,102275900.0,1498872.0,39240.811296,102276800.0,1498872.0,39240.811296,102276800.0,1498906.0,39240.813792,102276900.0


In [103]:
# replace `problem_huc12_mainstem` net loads with NaN
for problem_huc12_mainstem in problem_huc12s_mainstem:
    for suffix in ['', '_xsnps', '_rem1', '_rem2', '_rem3']:
        for pollutant in ['tn', 'tp', 'tss']:
            huc12_outlet_loads_gdf.at[problem_huc12_mainstem,f'{pollutant}_load{suffix}_net'] = np.nan

##### Mask by name

In [104]:
mainstem_mask = huc12_outlet_loads_gdf.huc12_name.str.contains(
    '-Delaware River|Delaware Bay-', regex=True
 )
mainstem_mask.value_counts()

huc12_name
False    438
True      43
Name: count, dtype: int64

In [105]:
huc12_outlet_loads_gdf[mainstem_mask].huc12_name

huc12
020401010403          Factory Creek-Delaware River
020401010406              Pea Brook-Delaware River
020401010501          Hankins Creek-Delaware River
020401010506        Beaverdam Creek-Delaware River
020401010604              Peggy Run-Delaware River
020401010606       Westcolang Creek-Delaware River
020401030603       Lackawaxen River-Delaware River
020401040504       Twin Lakes Creek-Delaware River
020401040505           Shingle Kill-Delaware River
020401040704          Shimers Brook-Delaware River
020401040705        Hornbecks Creek-Delaware River
020401041005       Vancampens Brook-Delaware River
020401050601        Allegheny Creek-Delaware River
020401050602          Martins Creek-Delaware River
020401050603         Buckhorn Creek-Delaware River
020401050604            Cooks Creek-Delaware River
020401050605        Lopatcong Creek-Delaware River
020401050901       Hakihokake Creek-Delaware River
020401050902    Nishisakawick Creek-Delaware River
020401050903          Tin

In [106]:
# replace `mainstem_mask` net loads with NaN
for suffix in ['', '_xsnps', '_rem1', '_rem2', '_rem3']:
    for pollutant in ['tn', 'tp', 'tss']:
        huc12_outlet_loads_gdf.loc[mainstem_mask,f'{pollutant}_load{suffix}_net'] = np.nan
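
To see what the name pattern does and does not catch, here is a small sketch using the standard-library `re.search`, which the pandas documentation describes as the basis for `str.contains(..., regex=True)`. The four names are taken from elsewhere in this notebook; treating them as a test set is purely illustrative.

```python
import re

pattern = '-Delaware River|Delaware Bay-'

names = [
    'Factory Creek-Delaware River',
    'Delaware Bay-Deep',
    'Back Creek',
    'Baxter Brook-East Branch Delaware River',
]

# True where the HUC12 name matches either alternative in the pattern
matches = [bool(re.search(pattern, name)) for name in names]
print(dict(zip(names, matches)))
```

In this sketch `Back Creek` is not matched even though it appears in the manual list above, so a name-based mask and a manual list may need to be used together. The East Branch name is correctly excluded because the hyphen is not directly followed by `Delaware River`.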

In [107]:
huc12_outlet_loads_gdf[mainstem_mask]

huc12,huc12_name,geometry,centroid_xy,comid,nord,to_huc12,outlet_comid,from_huc12s,inlet_comids,outlet_comids,...,tss_load_xsnps_net,tn_load_rem1_net,tp_load_rem1_net,tss_load_rem1_net,tn_load_rem2_net,tp_load_rem2_net,tss_load_rem2_net,tn_load_rem3_net,tp_load_rem3_net,tss_load_rem3_net
20401010403,Factory Creek-Delaware River,"POLYGON ((-8378221.427 5151841.655, -8378154.7...","[-75.27209008525746, 41.887597916294425]",2617366,73807,20401010406,2617366,"[020401010307, 020401010402, 020401020507]","[2617290, 2616778, 1752159]","[2617370, 2617366]",...,,,,,,,,,,
20401010406,Pea Brook-Delaware River,"POLYGON ((-8363935.141 5149815.582, -8363868.4...","[-75.16809024110287, 41.88143428771512]",2617384,73743,20401010501,2617384,"[020401010403, 020401010404, 020401010405]","[2617370, 2616756, 2616816]",[2617384],...,,,,,,,,,,
20401010501,Hankins Creek-Delaware River,"POLYGON ((-8347923.900 5144333.514, -8347799.6...","[-75.08286158289661, 41.810742013902946]",2617442,73661,20401010506,2617442,[020401010406],[2617384],[2617442],...,,,,,,,,,,
20401010506,Beaverdam Creek-Delaware River,"POLYGON ((-8355618.629 5125857.910, -8355587.8...","[-75.08496108669196, 41.72747404678263]",2617486,73565,20401010604,2617486,"[020401010501, 020401010505]","[2617442, 2617252]",[2617486],...,,,,,,,,,,
20401010604,Peggy Run-Delaware River,"POLYGON ((-8346677.744 5118043.931, -8346688.3...","[-75.05089551810381, 41.63528061697637]",2617554,73448,20401010606,2617554,"[020401010506, 020401010602, 020401010603]","[2617486, 2617038, 2617552]",[2617554],...,,,,,,,,,,
20401010606,Westcolang Creek-Delaware River,"POLYGON ((-8344238.314 5094753.275, -8344207.4...","[-75.00260610937411, 41.52355405282967]",2619256,73297,20401040504,2619256,"[020401010604, 020401010605]","[2617554, 2617202]",[2619256],...,,,,,,,,,,
20401030603,Lackawaxen River-Delaware River,"POLYGON ((-8366042.954 5094946.324, -8365951.2...","[-75.10024782330726, 41.48152054028455]",2741462,72447,20401040504,2741462,"[020401030505, 020401030601, 020401030602]","[2743218, 2742540, 2742580]",[2741462],...,,,,,,,,,,
20401040504,Twin Lakes Creek-Delaware River,"POLYGON ((-8340465.281 5088469.213, -8340454.0...","[-74.90365126862747, 41.437395848232526]",4151440,72128,20401040505,4151440,"[020401010606, 020401030603, 020401040403, 020...","[2619256, 2741462, 4151396, 4149986, 4150028, ...","[4151440, 4151454]",...,,,,,,,,,,
20401040505,Shingle Kill-Delaware River,"POLYGON ((-8314264.699 5087170.589, -8314225.2...","[-74.74256786199639, 41.42891982519458]",4158256,71290,20401040704,4158256,"[020401040107, 020401040308, 020401040504]","[4151446, 4151522, 4151440]","[4158256, 4151524]",...,,,,,,,,,,
20401040704,Shimers Brook-Delaware River,"POLYGON ((-8314992.489 5065044.482, -8314985.1...","[-74.79829255854104, 41.3112891629927]",4151706,71115,20401040705,4151706,"[020401040505, 020401040701, 020401040702, 020...","[4158256, 4150730, 4150820, 4150956]",[4151706],...,,,,,,,,,,


### Map HUC12 Attenuated Loads

In [108]:
var = 'tp_load_rem3'

def huc12_outlet_loads_plot(var: str):
    """Maps `var` for HUC12s where `in_drb` is True, keeping only positive values."""
    # Mask by HUC12s only in_drb
    gdf = huc12_outlet_loads_gdf[huc12_outlet_loads_gdf.in_drb]
    # Keep positive values only (drops negative, zero, and NaN)
    gdf = gdf[gdf[var]>0]
    gdf_plot = gdf[[var, gdf.geometry.name]]

    # Prep for plotting
    gdf_plot = pa.dynamic_plot.prep_gdf(gdf_plot)

    huc_plot = gv.Polygons(
        gdf_plot, 
        vdims=[gdf.index.name, var], 
    ).opts(
        height = 600,
        width = 600,
        color = var,
        colorbar = True,
        cmap = 'cet_CET_L18',
        # cnorm = 'log',
        # clim = (kwargs['vmin'], kwargs['vmax']),
        # line_width = kwargs['line_width'],
        title = var,
        tools = ['hover']
    )
    return huc_plot

In [109]:
# Skip for shorter run time
var = 'tp_load_rem3'
huc12_outlet_loads_plot(var) * gv.tile_sources.CartoLight()

In [110]:
var = 'tp_load_rem3_net'
huc12_outlet_loads_plot(var) * gv.tile_sources.CartoLight()

#### Check HUC12s for wrong outlets
Using functions developed in the Explore HUC12 Outlets section, above.
If an error is found, go back to the top to apply the fix and rerun the notebook.

In [1]:
# HUC12 with potential issues
problem_huc12s = [
    # '020402031007'
    # '020401010403'  # Factory Creek-Delaware River
    # '020402031006' # Mingo Creek-Schuylkill River
    # '020401020503'  # Baxter Brook-East Branch Delaware River, where "020401020204 Lower Beaver Kill" really joins after outlet
    # '020401010205' # Beers Brook-Middle West Branch Delaware River
    '020401020405'  # Lower Pepacton Reservoir, has too much flow coming from 020401020403 Upper Pepacton Reservoir
    # '020401040505' # Shingle Kill-Delaware River, zero-out net loads b/c mainstem so has NaN concentrations in major inflows
    # '020401040504' # Twin Lakes Creek-Delaware River, mainstem
    # '020402040000', # Delaware Bay-Deep
    # '020402020507', # Woodbury Creek-Delaware River
    # '020402060605', # Dividing Creek-Oranoaken Creek
    # '020401020202', # Middle Beaver Kill
]
huc12_outlet_info(problem_huc12s)

NameError: name 'huc12_outlet_info' is not defined

In [87]:
# The outlet should have the lowest `nord`, the highest `maflowv`, and the largest difference between `nord` and `nordstop`
explore_huc12 = problem_huc12s[0]
print(huc12_outlet_info([explore_huc12]).huc12_name.values)
huc12_explore_comids(explore_huc12).sort_values(by=['nord'])

['Middle Beaver Kill']


comid,huc12,nord,nordstop,maflowv,tp_conc,tp_load,nord_diff
1750423,20401020202,74746,74810,237.667,0.011584,2460.170219,64
1750631,20401020202,74747,74810,233.281,0.009856,2054.572111,63
1750627,20401020202,74748,74750,0.157,0.029214,4.098637,2
1750391,20401020202,74749,74750,0.151,0.030221,4.077867,1
1750687,20401020202,74750,74750,0.003,0.011262,0.03019,0
1750629,20401020202,74751,74810,232.51,0.010116,2101.760549,59
1750579,20401020202,74752,74810,232.481,0.010118,2102.026432,58
1750389,20401020202,74753,74808,208.657,0.008699,1622.051948,55
1750623,20401020202,74754,74808,205.437,0.008853,1625.304043,54
1750409,20401020202,74756,74757,2.468,0.026542,58.536833,1
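
The three outlet criteria can be checked programmatically. The sketch below is illustrative only: it copies the first three rows of the table above into plain tuples and tests whether all three criteria point to the same COMID.

```python
# First three rows of the table above: (comid, nord, nordstop, maflowv)
reaches = [
    (1750423, 74746, 74810, 237.667),
    (1750631, 74747, 74810, 233.281),
    (1750627, 74748, 74750, 0.157),
]

lowest_nord = min(reaches, key=lambda r: r[1])[0]
highest_flow = max(reaches, key=lambda r: r[3])[0]
largest_nord_diff = max(reaches, key=lambda r: r[2] - r[1])[0]

# When the outlet is unambiguous, all three criteria agree on one COMID
candidates = {lowest_nord, highest_flow, largest_nord_diff}
print(candidates)
```

If the set holds more than one COMID, the criteria disagree and the HUC12 is worth inspecting by hand.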


In [88]:
huc12s = [problem_huc12s[0], '020401020104', '020401020201']
columns = ['huc12_name', 'comid', 'outlet_comid','nord', 'maflowv', 'tp_load', 'tp_load_ps', 'tp_load_xsnps', 'tp_load_net', 'tp_load_xsnps_net', 'tp_load_rem3_net']
huc12_outlet_loads_gdf.loc[huc12s, columns]

huc12,huc12_name,comid,outlet_comid,nord,maflowv,tp_load,tp_load_ps,tp_load_xsnps,tp_load_net,tp_load_xsnps_net,tp_load_rem3_net
20401020202,Middle Beaver Kill,1750423,1750423,74746,237.667,2460.170219,0.0,-16654.179787,-4042.237109,14805.209343,14805.209343
20401020104,Lower Willowemoc Creek,1750427,1750427,74638,325.843,5307.781606,1120.828479,-22018.944828,2422.608154,-5374.293124,-5374.293124
20401020201,Upper Beaver Kill,1748733,1748733,74790,132.236,1194.625722,0.0,-9440.444302,1194.625722,-9440.444302,-9440.444302


#### Histograms

In [89]:
# Count HUC12s with reductions > 1, out of 481 total (428 w/ conc values)
df = huc12_outlet_loads_gdf.tp_load_red3[huc12_outlet_loads_gdf.tp_load_red3 > 1]
df.count()

253

In [90]:
# histogram of log transformed data
df_log = np.log(df).dropna()

In [91]:
df_log.hvplot.hist(title='log_tp_load_atten_red3')

In [92]:
huc12_outlet_loads_gdf.tp_load_rem3[huc12_outlet_loads_gdf.tp_load_rem3 > 0].hvplot.hist()

### HUC12 explorer map

In [93]:
huc12_in_outlets_drwi_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
CategoricalIndex: 481 entries, 020401010101 to 020403020407
Data columns (total 13 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   huc12_name     481 non-null    object  
 1   geometry       481 non-null    geometry
 2   centroid_xy    481 non-null    object  
 3   comid          481 non-null    Int64   
 4   nord           481 non-null    Int64   
 5   to_huc12       481 non-null    category
 6   outlet_comid   481 non-null    Int64   
 7   from_huc12s    231 non-null    object  
 8   inlet_comids   231 non-null    object  
 9   outlet_comids  481 non-null    object  
 10  huc10          481 non-null    category
 11  huc08          481 non-null    category
 12  in_drb         481 non-null    boolean 
dtypes: Int64(3), boolean(1), category(3), geometry(1), object(5)
memory usage: 83.1+ KB


In [94]:
huc_plot = huc12_in_outlets_drwi_gdf.hvplot(
    geo=True, 
    # frame_height=1500, 
    # hover_cols=['huc12','huc12_name'],
).opts(
    # color=None, alpha = 0.5
)

In [95]:
# This plot fails
# huc_plot

In [96]:
gdf_plot = reach_concs_gdf[['geometry', 'tp_conc']]
gdf_plot.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 19496 entries, 1748535 to 932040370
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   geometry  19494 non-null  geometry
 1   tp_conc   16812 non-null  float64 
dtypes: float64(1), geometry(1)
memory usage: 973.0 KB


In [97]:
# Stream reaches are line geometries, so plot with `gv.Path` rather than `gv.Polygons`
streams_plot = gv.Path(
    gdf_plot.dropna(subset=['geometry']),  # 2 reaches have no geometry
    vdims=['tp_conc'],
).opts(
    height = 750,
    width = 600,
    color = 'tp_conc',
    colorbar = True,
    cmap = 'cet_CET_L18',
    # cnorm = 'log',
    # clim = (kwargs['vmin'], kwargs['vmax']),
    # line_width = kwargs['line_width'],
    title = 'tp_conc',
    tools = ['hover']
)

In [98]:
streams_plot



In [99]:
# Compare to all comids
reach_concs_gdf['tp_conc_red3'] = reach_concs_gdf['tp_conc_xs'] - reach_concs_gdf['tp_conc_rem3']

In [100]:
# log transform
reach_concs_gdf['tp_conc_red3_log'] = np.log(
    reach_concs_gdf.tp_conc_red3[reach_concs_gdf.tp_conc_red3 >0]
).dropna()

In [101]:
reach_concs_gdf['tp_conc_red3_log'].min()

-20.166074584986646

In [102]:
# Stacked plots, using Holoviz Layout (1 column)
(reach_concs_gdf['tp_conc_red3_log'].hvplot.hist() 
 + df_log.hvplot.hist()
).cols(1)

## HUC10 Attenuated Outlet Loads
Implemented similarly to the HUC12 Attenuated Outlet Loads section.


In [103]:
# Create dataframe for HUC10 loads (blank for now)
huc10_outlet_loads_gdf = huc10_outlets_drwi_gdf.copy(deep=True)

In [104]:
# Select results at HUC10 outlets, and change index to HUC10
reach_concs_huc10_gdf = reach_concs_gdf.loc[
    huc10_outlets_drwi_gdf.comid.dropna()
].set_index('huc10')
# Remove COMID geometries
reach_concs_huc10_gdf.drop('geometry', axis='columns', inplace=True)

In [105]:
# Add HUC10 metadata from outlet COMIDs in PA2 Reach Results
vars_int = ['nordstop', 'streamorder',  ]
# Recast to Pandas nullable integer type to avoid auto-recast to float
for var in vars_int:
    huc10_outlet_loads_gdf[f'{var}'] = reach_concs_huc10_gdf[f'{var}'].astype(pd.Int64Dtype())

vars_other = ['maflowv',]
for var in vars_other:
    huc10_outlet_loads_gdf[f'{var}'] = reach_concs_huc10_gdf[f'{var}']

In [106]:
# Add PA2 Reach Results from outlets to HUC10 GDF
for suffix in ['','_ps', '_xsnps', '_rem1', '_rem2', '_rem3', '_avoid']:
    for pollutant in ['tn', 'tp', 'tss']:
        huc10_outlet_loads_gdf[f'{pollutant}_conc{suffix}'] = (
            reach_concs_huc10_gdf[f'{pollutant}_conc{suffix}']
        )

In [107]:
# Back-calculate loads (kg/y) from average annual concentrations (mg/L)
# and mean annual flow (CFS)
for suffix in ['','_ps', '_xsnps', '_rem1', '_rem2', '_rem3', '_avoid']:
    for pollutant in ['tn', 'tp', 'tss']:
        huc10_outlet_loads_gdf[f'{pollutant}_load{suffix}'] = (
            (huc10_outlet_loads_gdf[f'{pollutant}_conc{suffix}'] * 28.3168 / 1000000)
            * huc10_outlet_loads_gdf.maflowv * 31557600
        )
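
The back-calculation can be spelled out as a unit conversion. This is a sketch with hypothetical constant and function names; the factors are the ones used in the cell above, and 31557600 is taken here to be the number of seconds in a 365.25-day year.

```python
LITERS_PER_CUBIC_FOOT = 28.3168
SECONDS_PER_YEAR = 31_557_600   # 365.25 days * 86400 s/day
MG_PER_KG = 1_000_000

def load_kg_per_year(conc_mg_per_l: float, flow_cfs: float) -> float:
    """Convert a concentration (mg/L) and a flow (ft^3/s) to a load (kg/y)."""
    kg_per_cubic_foot = conc_mg_per_l * LITERS_PER_CUBIC_FOOT / MG_PER_KG
    return kg_per_cubic_foot * flow_cfs * SECONDS_PER_YEAR

# `tp_conc` and `maflowv` as displayed for COMID 1750423 earlier in this notebook
tp_load = load_kg_per_year(0.011584, 237.667)
print(tp_load)
```

The result should be roughly 2460 kg/y, in line with the `tp_load` shown for that COMID; small differences are expected because the displayed concentration is rounded.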

In [108]:
# Calc and add load reductions
for pollutant in ['tn', 'tp', 'tss']:
    huc10_outlet_loads_gdf[f'{pollutant}_load_red3'] = (
        huc10_outlet_loads_gdf[f'{pollutant}_load_xsnps'] 
        - huc10_outlet_loads_gdf[f'{pollutant}_load_rem3']
    )

In [109]:
huc10_outlet_loads_gdf

huc10,huc10_name,geometry,comid,nord,in_drb,huc08,nordstop,streamorder,maflowv,tn_conc,...,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3,tn_load_avoid,tp_load_avoid,tss_load_avoid,tn_load_red3,tp_load_red3,tss_load_red3
0204010101,Upper West Branch Delaware River,"POLYGON ((-8304262.020 5228828.467, -8304276.2...",2612826,74277,True,02040101,74310,3,249.949,0.583135,...,-4.744983e+07,-9.262312e+05,-11043.478466,-4.744983e+07,0.0,0.0,0.0,0.000000,0.000000,0.000000
0204010102,Middle West Branch Delaware River,"POLYGON ((-8346041.487 5210211.202, -8345991.9...",2614138,74141,True,02040101,74452,4,486.135,0.922010,...,-9.341380e+07,-1.695335e+06,-30768.534458,-9.341380e+07,0.0,0.0,0.0,0.000000,0.000000,0.000000
0204010103,Lower West Branch Delaware River,"POLYGON ((-8386125.641 5192313.974, -8386205.0...",2617290,73934,True,02040101,74522,4,881.917,0.718986,...,-1.747651e+08,-3.203011e+06,-58154.495120,-1.747683e+08,0.0,0.0,0.0,66.555827,44.433876,6384.339133
0204010104,Upper Delaware River,"POLYGON ((-8358825.391 5150856.311, -8358625.2...",2616816,73702,True,02040101,73742,3,46.467,0.358229,...,-8.926194e+06,-1.816176e+05,-2235.265279,-8.928247e+06,0.0,0.0,0.0,27.596969,20.387760,2052.995258
0204010105,Middle Delaware River,"POLYGON ((-8329217.537 5136231.813, -8329266.5...",2617486,73565,True,02040101,74971,5,2731.678,0.498726,...,-5.454072e+08,-1.038475e+07,-186696.586115,-5.454187e+08,0.0,0.0,0.0,203.808124,95.154961,13853.236510
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0204030107,Manahawkin Bay-Little Egg Harbor,"POLYGON ((-8254906.642 4818531.173, -8255286.4...",9452077,106911,False,02040301,106912,1,0.353,1.127201,...,2.090618e+05,-1.136486e+03,38.856468,2.090615e+05,0.0,0.0,0.0,0.003672,0.000500,0.301111
0204030201,Upper Great Egg Harbor River,"POLYGON ((-8340052.545 4836703.894, -8340050.2...",9433771,114555,False,02040302,114695,4,231.886,1.116529,...,-4.487649e+07,-7.487808e+05,-17048.353837,-4.487686e+07,0.0,0.0,0.0,12.887737,0.255064,374.377277
0204030202,Lower Great Egg Harbor River,"POLYGON ((-8304455.903 4790038.822, -8304567.7...",9436873,114472,False,02040302,114738,4,441.120,0.846969,...,-8.626086e+07,-1.530665e+06,-34124.249901,-8.626111e+07,0.0,0.0,0.0,15.307113,0.144021,254.983230
0204030203,Tuckahoe River,"POLYGON ((-8310209.755 4764796.667, -8309971.4...",9436881,120357,False,02040302,120417,3,130.765,0.449912,...,-2.543278e+07,-5.001415e+05,-9539.899639,-2.543282e+07,0.0,0.0,0.0,0.608292,0.040044,37.745598


## HUC08 Attenuated Outlet Loads
Implemented similarly to the HUC12 Attenuated Outlet Loads section.


In [110]:
# Create dataframe for HUC08 loads (blank for now)
huc08_outlet_loads_gdf = huc08_outlets_drwi_gdf.copy(deep=True)

In [111]:
# Select results at HUC08 outlets, and change index to HUC08
reach_concs_huc08_gdf = reach_concs_gdf.loc[
    huc08_outlets_drwi_gdf.comid.dropna()
].set_index('huc08')
# Remove COMID geometries
reach_concs_huc08_gdf.drop('geometry', axis='columns', inplace=True)

In [112]:
# Add HUC08 metadata from outlet COMIDs in PA2 Reach Results
vars_int = ['nordstop', 'streamorder',  ]
# Recast to Pandas nullable integer type to avoid auto-recast to float
for var in vars_int:
    huc08_outlet_loads_gdf[f'{var}'] = reach_concs_huc08_gdf[f'{var}'].astype(pd.Int64Dtype())

vars_other = ['maflowv',]
for var in vars_other:
    huc08_outlet_loads_gdf[f'{var}'] = reach_concs_huc08_gdf[f'{var}']

In [113]:
# Add PA2 Reach Results from outlets to HUC08 GDF
for suffix in ['','_ps', '_xsnps', '_rem1', '_rem2', '_rem3', '_avoid']:
    for pollutant in ['tn', 'tp', 'tss']:
        huc08_outlet_loads_gdf[f'{pollutant}_conc{suffix}'] = (
            reach_concs_huc08_gdf[f'{pollutant}_conc{suffix}']
        )

In [114]:
# Back-calculate loads (kg/y) from average annual concentrations (mg/L)
# and mean annual flow (CFS)
for suffix in ['','_ps', '_xsnps', '_rem1', '_rem2', '_rem3', '_avoid']:
    for pollutant in ['tn', 'tp', 'tss']:
        huc08_outlet_loads_gdf[f'{pollutant}_load{suffix}'] = (
            (huc08_outlet_loads_gdf[f'{pollutant}_conc{suffix}'] * 28.3168 / 1000000)
            * huc08_outlet_loads_gdf.maflowv * 31557600
        )

In [115]:
# Calc and add load reductions
for pollutant in ['tn', 'tp', 'tss']:
    huc08_outlet_loads_gdf[f'{pollutant}_load_red3'] = (
        huc08_outlet_loads_gdf[f'{pollutant}_load_xsnps'] 
        - huc08_outlet_loads_gdf[f'{pollutant}_load_rem3']
    )

In [116]:
huc08_outlet_loads_gdf

huc08,huc08_name,geometry,comid,nord,in_drb,nordstop,streamorder,maflowv,tn_conc,tp_conc,...,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3,tn_load_avoid,tp_load_avoid,tss_load_avoid,tn_load_red3,tp_load_red3,tss_load_red3
2040101,Upper Delaware,"POLYGON ((-8304228.499 5229843.998, -8304203.8...",2619256,73297,True,74971,5,3191.877,0.45493,0.009448,...,-644850400.0,-12252230.0,-230903.117232,-644873500.0,0.0,0.0,0.0,439.560446,147.21948,24931.31
2040102,East Branch Delaware,"POLYGON ((-8294284.604 5213730.686, -8294297.2...",1752159,74523,True,74971,5,1225.195,0.429241,0.019714,...,-240937500.0,-4713588.0,-77819.703985,-240937500.0,0.0,0.0,0.0,0.0,0.0,0.0
2040103,Lackawaxen,"POLYGON ((-8395173.834 5131609.270, -8395131.1...",2741462,72447,True,73296,6,1044.957,,,...,,,,,,,,,,
2040104,Middle Delaware-Mongaup-Brodhead,"POLYGON ((-8290255.809 5165720.405, -8290181.2...",4154510,70222,True,70223,1,7.742,6.334462,0.46655,...,-1556818.0,-31429.66,-543.142864,-1556818.0,0.0,0.0,0.0,0.0002,0.0001,0.02206792
2040105,Middle Delaware-Musconetcong,"POLYGON ((-8318518.543 5039392.409, -8318515.1...",4481949,68818,True,76391,6,12406.545,,,...,,,,,,,,,,
2040106,Lehigh,"POLYGON ((-8394794.527 5054626.737, -8394728.2...",4188251,74985,True,76373,5,2938.482,1.362875,0.102702,...,-500763300.0,-10165560.0,-108146.514798,-501119600.0,99.170001,394.921521,24991.882748,3369.211276,1973.185392,468593.6
2040201,Crosswicks-Neshaminy,"POLYGON ((-8361112.101 4923606.956, -8361020.3...",4485575,68274,True,68524,4,374.52,1.564866,0.053124,...,-44971360.0,-1218256.0,-17355.284571,-44993010.0,0.0,0.0,0.0,1147.573041,361.418231,21651.59
2040202,Lower Delaware,"POLYGON ((-8354536.834 4895108.912, -8354450.1...",24903452,65081,True,76894,7,17729.414,,,...,,,,,,,,,,
2040203,Schuylkill,"POLYGON ((-8453246.777 4995405.859, -8453181.3...",4784841,65459,True,67354,6,3055.678,3.298179,0.084313,...,-494393300.0,-6977055.0,-175070.367989,-494680400.0,9.170833,21.038511,1998.609704,27173.57922,2258.274003,1120262.0
2040204,Delaware Bay,"POLYGON ((-8404100.919 4824476.609, -8404179.3...",24903800,63468,True,78185,7,19418.612,,,...,,,,,,,,,,


# Method 2b: HUC08 Attenuated Multi-Outfall Loads
Sum the loads from all outfalls to the Delaware River mainstem, because HUC08s with an upstream inlet have NaN concentrations at their outlets.

### Create reach_loads_gdf
We need loads here because, with multiple outfall COMIDs per HUC, loads can be summed but concentrations cannot.

In [117]:
# Select COMIDs that directly drain into Delaware River mainstem.
reach_loads_gdf = reach_concs_gdf[reach_concs_gdf.into_dr==True].copy(deep=True)

# Back-calculate loads (kg/y) from average annual concentrations (mg/L)
# and mean annual flow (CFS): mg/L * 28.3168 L/ft^3 / 1e6 mg/kg * ft^3/s * 31557600 s/y
for suffix in ['','_ps', '_xsnps', '_rem1', '_rem2', '_rem3', '_avoid']:
    for pollutant in ['tn', 'tp', 'tss']:
        reach_loads_gdf[f'{pollutant}_load{suffix}'] = (
            (reach_loads_gdf[f'{pollutant}_conc{suffix}'] * 28.3168 / 1000000)
            * reach_loads_gdf.maflowv * 31557600
        )
reach_loads_gdf

Unnamed: 0_level_0,watershed_hectares,maflowv,geometry,cluster,sub_focusarea,nord,nordstop,huc12,streamorder,headwater,...,tss_load_rem1,tn_load_rem2,tp_load_rem2,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3,tn_load_avoid,tp_load_avoid,tss_load_avoid
comid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1752159,217643.13,1225.195,MULTILINESTRING Z ((-8378146.566 5155379.090 0...,drb,,74523,74971,020401020507,5,0,...,-2.409375e+08,-4.713588e+06,-77819.703985,-2.409375e+08,-4.713588e+06,-77819.703985,-2.409375e+08,0.0,0.0,0.0
2588117,4559.31,31.593,MULTILINESTRING Z ((-8359670.670 4949497.189 0...,drb,,69290,69298,020401050901,2,0,...,-4.836876e+06,-9.797112e+04,230.415576,-4.836876e+06,-9.797806e+04,224.275551,-4.837832e+06,0.0,0.0,0.0
2588141,2865.69,20.147,MULTILINESTRING Z ((-8352789.129 4946216.758 0...,drb,,69267,69276,020401050902,2,0,...,-2.350382e+06,-6.039266e+04,1570.982231,-2.350382e+06,-6.040293e+04,1561.828694,-2.352055e+06,0.0,0.0,0.0
2588143,917.19,6.346,MULTILINESTRING Z ((-8349185.027 4944419.030 0...,drb,,69264,69264,020401050902,1,1,...,-8.768907e+05,-1.979346e+04,348.809648,-8.768907e+05,-1.979688e+04,345.668144,-8.774562e+05,0.0,0.0,0.0
2588283,260.28,1.845,MULTILINESTRING Z ((-8362061.968 5005702.842 0...,New Jersey Highlands,,70217,70218,020401050601,1,1,...,-3.485983e+05,-7.410518e+03,-123.260981,-3.485983e+05,-7.410527e+03,-123.268268,-3.485986e+05,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
932040294,6402.78,1.709,MULTILINESTRING Z ((-8354148.963 4926458.472 0...,drb,,69079,69080,020401050908,2,0,...,,,,,,,,,,
932040355,993.87,5.386,MULTILINESTRING Z ((-8402382.033 4833138.791 0...,drb,,65071,65078,020402050601,2,0,...,-3.852778e+05,-1.677725e+04,-75.559019,-3.852778e+05,-1.677725e+04,-75.559019,-3.852778e+05,0.0,0.0,0.0
932040356,911.79,4.812,MULTILINESTRING Z ((-8401265.600 4834709.285 0...,drb,,76961,76964,020402050601,2,0,...,-2.820900e+05,-1.311073e+04,-53.050168,-2.820900e+05,-1.311073e+04,-53.050168,-2.820900e+05,0.0,0.0,0.0
932040360,849.24,4.137,MULTILINESTRING Z ((-8407942.154 4821337.768 0...,drb,,64233,64235,020402050701,1,0,...,-3.813264e+05,-1.230997e+04,-33.813041,-3.813264e+05,-1.230997e+04,-33.813041,-3.813264e+05,0.0,0.0,0.0
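
The back-calculation above is a unit conversion, and it can be sketched as a standalone helper. The function name `conc_to_load` and the constant names are illustrative and are not part of the `pollution_assessment` package; the numeric factors are the ones used in the cell above. The second half shows, with made-up numbers, why loads add across outfalls while concentrations have to be flow-weighted.

```python
# Illustrative helper (hypothetical name, not from the pollution_assessment package).
LITERS_PER_CUBIC_FOOT = 28.3168   # L per ft^3, as used in the notebook cell
MG_PER_KG = 1_000_000             # mg per kg
SECONDS_PER_YEAR = 31_557_600     # 365.25 days * 86,400 s


def conc_to_load(conc_mg_per_l, flow_cfs):
    """Convert mean concentration (mg/L) and mean annual flow (ft^3/s) to load (kg/y)."""
    kg_per_cubic_foot = conc_mg_per_l * LITERS_PER_CUBIC_FOOT / MG_PER_KG
    return kg_per_cubic_foot * flow_cfs * SECONDS_PER_YEAR


# Two made-up outfalls: loads can simply be added.
load_a = conc_to_load(0.10, 100.0)
load_b = conc_to_load(0.20, 300.0)
total_load = load_a + load_b

# The concentration of the combined flow is a flow-weighted mean, not a sum.
mixed_conc = (0.10 * 100.0 + 0.20 * 300.0) / (100.0 + 300.0)
```

Converting `mixed_conc` at the combined flow gives back `total_load`, which is the property the multi-outfall sum relies on.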


In [118]:
# Test for any COMID
outfall_comids = [4495870]
vars = ['huc12','huc10', 'huc08','in_drb', 'into_dr', 'maflowv', 'tp_conc']
reach_loads_gdf.loc[outfall_comids, vars]

Unnamed: 0_level_0,huc12,huc10,huc08,in_drb,into_dr,maflowv,tp_conc
comid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
4495870,20402020405,204020204,2040202,True,True,67.56,0.054616


### Find outfall COMIDs with missing data
Sometimes the apparent outfall reach has no valid data because of an anomaly in the reach, such as branching.

The solution is to find the nearest upstream COMID with valid data and to make the corresponding edit in Notebook PA2_1.

In [119]:
# Find COMIDS with missing data!
reach_loads_gdf[reach_loads_gdf.tp_load.isna()][['huc12','huc10', 'huc08', 'maflowv', 'tp_load' ]].sort_values('huc12')

Unnamed: 0_level_0,huc12,huc10,huc08,maflowv,tp_load
comid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2588403,20401050603,204010506,2040105,0.025,
2588407,20401050603,204010506,2040105,0.23,
932040294,20401050908,204010509,2040105,1.709,
2589785,20401050911,204010509,2040105,0.647,
4481949,20401050911,204010509,2040105,12406.545,
4499298,20402020405,204020204,2040202,14.182,
4496356,20402020606,204020206,2040202,64.304,
4499332,20402020607,204020206,2040202,64.756,


In [120]:
# Find the nearest upstream COMID with valid data, by iterating through the list above by HUC12
# Make Edits in Notebook PA2_1
huc12 = '020402060602'
vars = ['in_drb', 'into_dr', 'maflowv', 'tp_conc']
print(f'HUC12: {huc12} ', huc12_outlet_loads_gdf.loc[huc12].huc12_name)
print(huc10_outlet_loads_gdf.loc[list(reach_loads_gdf[reach_loads_gdf.huc12==huc12]['huc10'].unique())].huc10_name)
print(huc08_outlet_loads_gdf.loc[list(reach_loads_gdf[reach_loads_gdf.huc12==huc12]['huc08'].unique())].huc08_name)

reach_concs_gdf[reach_concs_gdf.huc12==huc12].loc[:, vars].sort_values('maflowv').tail(30)


HUC12: 020402060602  Mad Horse Creek-Delaware Bay
huc10
0204020606    Stow Creek-Delaware Bay
Name: huc10_name, dtype: object
huc08
02040206    Cohansey-Maurice
Name: huc08_name, dtype: object


Unnamed: 0_level_0,in_drb,into_dr,maflowv,tp_conc
comid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
9484536,True,False,5.634,
9485782,True,False,7.854,0.085916
9485442,True,False,10.091,0.125952
9485848,True,False,10.258,0.120636
9484564,True,False,10.596,
9485914,True,False,10.597,
9485896,True,False,11.405,0.12732
9485462,True,False,11.436,0.115973
9485942,True,False,11.893,
9484540,True,False,12.346,0.101423
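
The manual search above can be sketched as a small helper. It is only a heuristic, under the assumption that the highest-flow reach with a valid concentration in the same HUC12 is a reasonable stand-in for the nearest upstream COMID. The true upstream neighbor depends on network topology, which this sketch does not use. The function name `candidate_outfalls` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def candidate_outfalls(reaches, huc12, conc_col='tp_conc', n=5):
    """Return up to n reaches in a HUC12 that have valid data, largest flow first."""
    in_huc = reaches[reaches['huc12'] == huc12]
    valid = in_huc[in_huc[conc_col].notna()]
    return valid.sort_values('maflowv', ascending=False).head(n)


# Toy data (made-up values), indexed by COMID like reach_concs_gdf
toy = pd.DataFrame(
    {
        'huc12': ['A', 'A', 'A', 'B'],
        'maflowv': [12.0, 11.0, 5.0, 3.0],
        'tp_conc': [np.nan, 0.12, 0.09, 0.05],
    },
    index=pd.Index([101, 102, 103, 104], name='comid'),
)

# The largest-flow reach (101) has no data, so the next one is proposed.
best = candidate_outfalls(toy, 'A').index[0]
```

Any candidate found this way would still need to be checked against the stream network before editing Notebook PA2_1.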


### HUC08 Attenuated Multi-Outfall Loads
Sum the outfall loads within each HUC08 to get one multi-outfall load per HUC08.

In [121]:
# Select which columns to drop before summing
reach_loads_gdf.columns

Index(['watershed_hectares', 'maflowv', 'geometry', 'cluster', 'sub_focusarea',
       'nord', 'nordstop', 'huc12', 'streamorder', 'headwater', 'phase',
       'fa_name', 'in_drb', 'huc08', 'huc10', 'into_dr', 'Source', 'Sediment',
       'TotalN', 'TotalP', 'run_group', 'run_type', 'funding_sources',
       'with_attenuation', 'tn_conc', 'tp_conc', 'tss_conc', 'tn_conc_xs',
       'tp_conc_xs', 'tss_conc_xs', 'tn_conc_ps', 'tp_conc_ps', 'tss_conc_ps',
       'tn_conc_xsnps', 'tp_conc_xsnps', 'tss_conc_xsnps', 'tn_conc_rem1',
       'tp_conc_rem1', 'tss_conc_rem1', 'tn_conc_rem2', 'tp_conc_rem2',
       'tss_conc_rem2', 'tn_conc_rem3', 'tp_conc_rem3', 'tss_conc_rem3',
       'tn_conc_avoid', 'tp_conc_avoid', 'tss_conc_avoid', 'tp_conc_red3',
       'tp_conc_red3_log', 'tn_load', 'tp_load', 'tss_load', 'tn_load_ps',
       'tp_load_ps', 'tss_load_ps', 'tn_load_xsnps', 'tp_load_xsnps',
       'tss_load_xsnps', 'tn_load_rem1', 'tp_load_rem1', 'tss_load_rem1',
       'tn_load_rem2', 'tp_lo

In [122]:
columns_to_drop = [
    'watershed_hectares', 'geometry', 'cluster', 'sub_focusarea',
    'nord', 'nordstop', 'huc12', 'streamorder', 'headwater', 'phase',
    'fa_name', 'in_drb', 'huc10', 'Source', 'Sediment',
    'TotalN', 'TotalP', 'run_group', 'run_type', 'funding_sources',
    'with_attenuation', 'tn_conc', 'tp_conc', 'tss_conc', 'tn_conc_xs',
    'tp_conc_xs', 'tss_conc_xs', 'tn_conc_ps', 'tp_conc_ps', 'tss_conc_ps',
    'tn_conc_xsnps', 'tp_conc_xsnps', 'tss_conc_xsnps', 'tn_conc_rem1',
    'tp_conc_rem1', 'tss_conc_rem1', 'tn_conc_rem2', 'tp_conc_rem2',
    'tss_conc_rem2', 'tn_conc_rem3', 'tp_conc_rem3', 'tss_conc_rem3',
    'tn_conc_avoid', 'tp_conc_avoid', 'tss_conc_avoid', 'tp_conc_red3',
    'tp_conc_red3_log',
]

In [123]:
# Select results at HUC08 outfalls to the Delaware River mainstem, and sum by HUC08
reach_loads_huc08multi_df = (
    reach_loads_gdf[reach_loads_gdf.into_dr==True]
    .drop(columns_to_drop, axis=1)
    .groupby('huc08')
    .sum()
    .copy()
)
reach_loads_huc08multi_df


Unnamed: 0_level_0,maflowv,into_dr,tn_load,tp_load,tss_load,tn_load_ps,tp_load_ps,tss_load_ps,tn_load_xsnps,tp_load_xsnps,...,tss_load_rem1,tn_load_rem2,tp_load_rem2,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3,tn_load_avoid,tp_load_avoid,tss_load_avoid
huc08,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2040101,3191.877,1,1297595.0,26949.530451,32000810.0,58037.91,998.968515,0.0,-12251790.0,-230755.897752,...,-644850400.0,-12251820.0,-230760.860841,-644850400.0,-12252230.0,-230903.117232,-644873500.0,0.0,0.0,0.0
2040102,1225.195,1,469953.2,21583.927049,18869630.0,4915.952,867.418367,0.0,-4713588.0,-77819.703985,...,-240937500.0,-4713588.0,-77819.703985,-240937500.0,-4713588.0,-77819.703985,-240937500.0,0.0,0.0,0.0
2040103,1043.742,1,416769.3,9662.29662,12854480.0,67963.34,2152.598457,0.0,-4062858.0,-76433.171079,...,-208474900.0,-4062858.0,-76433.171079,-208474900.0,-4063812.0,-76611.421222,-208511200.0,0.0,0.0,0.0
2040104,7.742,1,43823.9,3227.74821,84901.57,42529.86,3148.241226,0.0,-31429.66,-543.142764,...,-1556818.0,-31429.66,-543.142764,-1556818.0,-31429.66,-543.142864,-1556818.0,0.0,0.0,0.0
2040105,14712.605,67,3585546.0,274518.891959,172989900.0,807082.0,94919.582068,0.0,-6957700.0,-5655.396935,...,-318540900.0,-6976631.0,-9956.090084,-318595400.0,-6979164.0,-12178.263962,-318729100.0,76.341045,462.9688,24440.405843
2040106,2938.482,1,3578716.0,269681.071261,122465000.0,1320600.0,139527.214164,0.0,-10162190.0,-106173.329407,...,-500752200.0,-10162860.0,-106363.233365,-500763300.0,-10165560.0,-108146.514798,-501119600.0,99.170001,394.921521,24991.882748
2040201,791.82,11,1476583.0,54453.656589,67605260.0,592095.0,27441.202151,0.0,-2462358.0,-36669.607531,...,-100303100.0,-2462358.0,-36669.607531,-100303100.0,-2463740.0,-37210.248925,-100328900.0,0.0,0.0,0.0
2040202,4708.624,29,14146030.0,680056.407402,350822600.0,5512151.0,538801.113861,0.0,-10662970.0,-225915.199039,...,-618242500.0,-10684150.0,-226761.632306,-618253500.0,-10691960.0,-228935.382545,-618571600.0,19.111125,29.363579,3215.955896
2040203,3055.678,1,9005960.0,230224.981212,154407700.0,3040174.0,157284.409501,0.0,-6949882.0,-172812.093986,...,-494393300.0,-6970200.0,-173494.027976,-494393300.0,-6977055.0,-175070.367989,-494680400.0,9.170833,21.038511,1998.609704
2040204,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
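
When reading the summed table, note that pandas `sum` skips NaN by default, so a group with no valid loads sums to 0.0 rather than NaN. A zero can therefore mean either a true zero load or no valid data. The sketch below uses made-up numbers to show the default behavior next to the `min_count=1` option, which keeps such groups as NaN. Whether that option is appropriate for this analysis is an assumption to verify, not something the notebook establishes.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'huc08': ['X', 'X', 'Y'],
    'tp_load': [10.0, np.nan, np.nan],
})

# Default: NaN values are skipped, and an all-NaN group sums to 0.0
default_sum = toy.groupby('huc08')['tp_load'].sum()

# min_count=1: a group needs at least one valid value, otherwise the result is NaN
strict_sum = toy.groupby('huc08')['tp_load'].sum(min_count=1)
```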


In [124]:
# Create dataframe for HUC08 loads (blank for now)
huc08_multioutfall_loads_gdf = huc08_outlets_drwi_gdf.copy(deep=True)
huc08_multioutfall_loads_gdf.drop(['comid', 'nord'], axis='columns', inplace=True)
huc08_multioutfall_loads_gdf

Unnamed: 0_level_0,huc08_name,geometry,in_drb
huc08,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2040101,Upper Delaware,"POLYGON ((-8304228.499 5229843.998, -8304203.8...",True
2040102,East Branch Delaware,"POLYGON ((-8294284.604 5213730.686, -8294297.2...",True
2040103,Lackawaxen,"POLYGON ((-8395173.834 5131609.270, -8395131.1...",True
2040104,Middle Delaware-Mongaup-Brodhead,"POLYGON ((-8290255.809 5165720.405, -8290181.2...",True
2040105,Middle Delaware-Musconetcong,"POLYGON ((-8318518.543 5039392.409, -8318515.1...",True
2040106,Lehigh,"POLYGON ((-8394794.527 5054626.737, -8394728.2...",True
2040201,Crosswicks-Neshaminy,"POLYGON ((-8361112.101 4923606.956, -8361020.3...",True
2040202,Lower Delaware,"POLYGON ((-8354536.834 4895108.912, -8354450.1...",True
2040203,Schuylkill,"POLYGON ((-8453246.777 4995405.859, -8453181.3...",True
2040204,Delaware Bay,"POLYGON ((-8404100.919 4824476.609, -8404179.3...",True


In [125]:
# Add maflowv
huc08_multioutfall_loads_gdf['maflowv'] = reach_loads_huc08multi_df['maflowv']

In [126]:
# Add PA2 Reach Results from outlets to HUC08 GDF
for suffix in ['','_ps', '_xsnps', '_rem1', '_rem2', '_rem3', '_avoid']:
    for pollutant in ['tn', 'tp', 'tss']:
        huc08_multioutfall_loads_gdf[f'{pollutant}_load{suffix}'] = (
            reach_loads_huc08multi_df[f'{pollutant}_load{suffix}']
        )
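
The column assignments above rely on pandas index alignment: values are matched on the `huc08` index label rather than on row order, and labels missing from the right-hand side become NaN. A minimal sketch with made-up labels and values:

```python
import pandas as pd

target = pd.DataFrame(
    {'huc08_name': ['Alpha', 'Beta', 'Gamma']},
    index=pd.Index(['01', '02', '03'], name='huc08'),
)

# The source rows are in a different order and have no entry for '03'
source = pd.DataFrame(
    {'tp_load': [20.0, 10.0]},
    index=pd.Index(['02', '01'], name='huc08'),
)

# Assignment aligns on the index labels, not on position
target['tp_load'] = source['tp_load']
```

Alignment only works when both indexes use the same labels and dtype, so it is worth confirming that the `huc08` values match on both sides before assigning.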

In [127]:
huc08_multioutfall_loads_gdf

Unnamed: 0_level_0,huc08_name,geometry,in_drb,maflowv,tn_load,tp_load,tss_load,tn_load_ps,tp_load_ps,tss_load_ps,...,tss_load_rem1,tn_load_rem2,tp_load_rem2,tss_load_rem2,tn_load_rem3,tp_load_rem3,tss_load_rem3,tn_load_avoid,tp_load_avoid,tss_load_avoid
huc08,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2040101,Upper Delaware,"POLYGON ((-8304228.499 5229843.998, -8304203.8...",True,3191.877,1297595.0,26949.530451,32000810.0,58037.91,998.968515,0.0,...,-644850400.0,-12251820.0,-230760.860841,-644850400.0,-12252230.0,-230903.117232,-644873500.0,0.0,0.0,0.0
2040102,East Branch Delaware,"POLYGON ((-8294284.604 5213730.686, -8294297.2...",True,1225.195,469953.2,21583.927049,18869630.0,4915.952,867.418367,0.0,...,-240937500.0,-4713588.0,-77819.703985,-240937500.0,-4713588.0,-77819.703985,-240937500.0,0.0,0.0,0.0
2040103,Lackawaxen,"POLYGON ((-8395173.834 5131609.270, -8395131.1...",True,1043.742,416769.3,9662.29662,12854480.0,67963.34,2152.598457,0.0,...,-208474900.0,-4062858.0,-76433.171079,-208474900.0,-4063812.0,-76611.421222,-208511200.0,0.0,0.0,0.0
2040104,Middle Delaware-Mongaup-Brodhead,"POLYGON ((-8290255.809 5165720.405, -8290181.2...",True,7.742,43823.9,3227.74821,84901.57,42529.86,3148.241226,0.0,...,-1556818.0,-31429.66,-543.142764,-1556818.0,-31429.66,-543.142864,-1556818.0,0.0,0.0,0.0
2040105,Middle Delaware-Musconetcong,"POLYGON ((-8318518.543 5039392.409, -8318515.1...",True,14712.605,3585546.0,274518.891959,172989900.0,807082.0,94919.582068,0.0,...,-318540900.0,-6976631.0,-9956.090084,-318595400.0,-6979164.0,-12178.263962,-318729100.0,76.341045,462.9688,24440.405843
2040106,Lehigh,"POLYGON ((-8394794.527 5054626.737, -8394728.2...",True,2938.482,3578716.0,269681.071261,122465000.0,1320600.0,139527.214164,0.0,...,-500752200.0,-10162860.0,-106363.233365,-500763300.0,-10165560.0,-108146.514798,-501119600.0,99.170001,394.921521,24991.882748
2040201,Crosswicks-Neshaminy,"POLYGON ((-8361112.101 4923606.956, -8361020.3...",True,791.82,1476583.0,54453.656589,67605260.0,592095.0,27441.202151,0.0,...,-100303100.0,-2462358.0,-36669.607531,-100303100.0,-2463740.0,-37210.248925,-100328900.0,0.0,0.0,0.0
2040202,Lower Delaware,"POLYGON ((-8354536.834 4895108.912, -8354450.1...",True,4708.624,14146030.0,680056.407402,350822600.0,5512151.0,538801.113861,0.0,...,-618242500.0,-10684150.0,-226761.632306,-618253500.0,-10691960.0,-228935.382545,-618571600.0,19.111125,29.363579,3215.955896
2040203,Schuylkill,"POLYGON ((-8453246.777 4995405.859, -8453181.3...",True,3055.678,9005960.0,230224.981212,154407700.0,3040174.0,157284.409501,0.0,...,-494393300.0,-6970200.0,-173494.027976,-494393300.0,-6977055.0,-175070.367989,-494680400.0,9.170833,21.038511,1998.609704
2040204,Delaware Bay,"POLYGON ((-8404100.919 4824476.609, -8404179.3...",True,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# Compare Loads from Methods 1 & 2

### Compare HUC12 loads

In [128]:
vars1 = ['huc12_name', 'in_drb', 'huc08', 'huc10', 'tp_load', 'tp_load_rem3']
vars2 = ['huc12_name', 'in_drb', 'huc08', 'huc10', 'tp_load', 'tp_load_rem3', 'tp_load_rem3_net' ]
# Concat two datasets for comparison, by creating a second index of keys
df = pd.concat(
    [huc12_load_gdf[vars1], huc12_outlet_loads_gdf[vars2]], 
    keys=['local', 'attenuated']
)
df.index.set_names('load_type', level=0, inplace=True)
df = df[df.in_drb]
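
Passing `keys` to `pd.concat` adds an outer index level that labels each input frame, which is what lets the two load types share one table. Columns present in only one input are filled with NaN for the other. A minimal sketch with made-up values:

```python
import pandas as pd

idx = pd.Index(['0001', '0002'], name='huc12')
local = pd.DataFrame({'tp_load': [1.0, 2.0]}, index=idx)
attenuated = pd.DataFrame(
    {'tp_load': [1.5, 3.5], 'tp_load_net': [0.5, 1.0]},
    index=idx,
)

# The keys become the outer level of a two-level MultiIndex
combined = pd.concat([local, attenuated], keys=['local', 'attenuated'])
combined.index = combined.index.set_names('load_type', level=0)

# All rows for one load type
local_rows = combined.xs('local', level='load_type')
# One HUC12 across both load types
one_huc = combined.xs('0002', level='huc12')
```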

In [129]:
df

Unnamed: 0_level_0,Unnamed: 1_level_0,huc12_name,in_drb,huc08,huc10,tp_load,tp_load_rem3,tp_load_rem3_net
load_type,huc12,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
local,020401010101,Town Brook-Headwaters West Brach Delaware River,True,02040101,0204010101,2165.076849,-452.700667,
local,020401010102,Betty Brook-Headwaters West Brach Delaware River,True,02040101,0204010101,1556.048844,-456.580400,
local,020401010103,Rose Brook-Headwaters West Brach Delaware River,True,02040101,0204010101,1611.633293,-444.403453,
local,020401010104,Elk Creek-Headwaters West Brach Delaware River,True,02040101,0204010101,1688.653291,-499.505703,
local,020401010105,Upper Little Delaware River,True,02040101,0204010101,2693.289548,-1494.308562,
...,...,...,...,...,...,...,...,...
attenuated,020402070507,Grecos Canal-Delaware Bay,True,02040207,0204020705,809.735318,247.484686,7328.570535
attenuated,020402070601,Round Pole Branch-Broadkill River,True,02040207,0204020706,10473.469242,6991.478911,6991.478911
attenuated,020402070602,Primehook Creek,True,02040207,0204020706,707.101204,-2206.291606,-2206.291606
attenuated,020402070603,Beaverdam Creek-Broadkill River,True,02040207,0204020706,3528.167276,-5877.524614,-10662.711918


In [130]:
# Select HUC10: Lackawaxen River, HUC-10 Watershed ID 0204010306
df_barplot = df[df.huc10=='0204010306' ]
df_barplot

Unnamed: 0_level_0,Unnamed: 1_level_0,huc12_name,in_drb,huc08,huc10,tp_load,tp_load_rem3,tp_load_rem3_net
load_type,huc12,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
local,20401030601,Carley Brook-Lackawaxen River,True,2040103,204010306,6069.376512,774.090533,
local,20401030602,Blooming Grove Creek,True,2040103,204010306,698.908504,-1708.732167,
local,20401030603,Lackawaxen River-Delaware River,True,2040103,204010306,2755.48166,-2024.497667,
attenuated,20401030601,Carley Brook-Lackawaxen River,True,2040103,204010306,21273.643749,-25962.50775,-12785.989901
attenuated,20401030602,Blooming Grove Creek,True,2040103,204010306,592.871413,-3628.373281,-3628.373281
attenuated,20401030603,Lackawaxen River-Delaware River,True,2040103,204010306,0.0,0.0,


In [131]:
# Select HUC10: West Branch Brandywine Creek
df_barplot = df[df.huc10=='0204020502' ]
df_barplot

Unnamed: 0_level_0,Unnamed: 1_level_0,huc12_name,in_drb,huc08,huc10,tp_load,tp_load_rem3,tp_load_rem3_net
load_type,huc12,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
local,20402050201,Sucker Run,True,2040205,204020502,2244.968746,441.494502,
local,20402050202,Upper West Branch Brandywine Creek,True,2040205,204020502,20578.714741,8960.638067,
local,20402050203,Doe Run,True,2040205,204020502,6534.718257,4509.846981,
local,20402050204,Buck Run,True,2040205,204020502,7479.787471,4967.263744,
local,20402050205,Lower West Branch Brandywine Creek,True,2040205,204020502,4922.542563,2609.577476,
attenuated,20402050201,Sucker Run,True,2040205,204020502,2244.079011,161.255024,161.255024
attenuated,20402050202,Upper West Branch Brandywine Creek,True,2040205,204020502,19976.120131,4322.580415,4161.325391
attenuated,20402050203,Doe Run,True,2040205,204020502,6187.910506,3074.060749,3074.060749
attenuated,20402050204,Buck Run,True,2040205,204020502,13479.847406,6402.016427,3327.955678
attenuated,20402050205,Lower West Branch Brandywine Creek,True,2040205,204020502,31691.276031,7182.49578,317.911596


In [132]:
df_barplot.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 10 entries, ('local', '020402050201') to ('attenuated', '020402050205')
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   huc12_name        10 non-null     object 
 1   in_drb            10 non-null     boolean
 2   huc08             10 non-null     object 
 3   huc10             10 non-null     object 
 4   tp_load           10 non-null     float64
 5   tp_load_rem3      10 non-null     float64
 6   tp_load_rem3_net  5 non-null      float64
dtypes: boolean(1), float64(3), object(3)
memory usage: 20.7+ KB


In [133]:
# Create Bar plot for comparison
# https://holoviews.org/reference/elements/bokeh/Bars.html
df_barplot = df[df.huc10=='0204020502' ]
var = 'tp_load'
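barplot = hv.Bars(
    df_barplot, kdims=['huc12', 'load_type'], vdims=[var], 
    # hover_cols=['huc12_name']
)
# `hover_cols` is an hvplot keyword, not an hv.Bars argument, which is likely why it fails here.
# In HoloViews, extra hover fields are normally listed as additional vdims (e.g. 'huc12_name').
barplot.opts(tools=['hover'], multi_level=False, xrotation=45, width=800, height=400)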
barplot

In [134]:
# Can't get multiple x values (kdims) to work with hover_cols, maybe similar to 
# https://github.com/holoviz/hvplot/issues/919
df_barplot.hvplot.bar(
    x='huc12', y=var, 
    # hover_cols=['huc12_name'], 
    stacked=False,
)

In [135]:
# This produces "ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 5 has 1 dimension(s)"
# df_barplot.hvplot.bar(stacked=False)

In [136]:
# Try filtering DF first
df_barplot = df[df.huc10=='0204020502' ][['tp_load', 'huc12_name']]
df_barplot

Unnamed: 0_level_0,Unnamed: 1_level_0,tp_load,huc12_name
load_type,huc12,Unnamed: 2_level_1,Unnamed: 3_level_1
local,20402050201,2244.968746,Sucker Run
local,20402050202,20578.714741,Upper West Branch Brandywine Creek
local,20402050203,6534.718257,Doe Run
local,20402050204,7479.787471,Buck Run
local,20402050205,4922.542563,Lower West Branch Brandywine Creek
attenuated,20402050201,2244.079011,Sucker Run
attenuated,20402050202,19976.120131,Upper West Branch Brandywine Creek
attenuated,20402050203,6187.910506,Doe Run
attenuated,20402050204,13479.847406,Buck Run
attenuated,20402050205,31691.276031,Lower West Branch Brandywine Creek


In [137]:
# Same error as above, unless `stacked=True`
barplot = df_barplot.hvplot.bar(stacked=False)

In [138]:
print(barplot)

:Bars   [load_type,huc12]   (tp_load)


In [139]:
df_barplot.index

MultiIndex([(     'local', '020402050201'),
            (     'local', '020402050202'),
            (     'local', '020402050203'),
            (     'local', '020402050204'),
            (     'local', '020402050205'),
            ('attenuated', '020402050201'),
            ('attenuated', '020402050202'),
            ('attenuated', '020402050203'),
            ('attenuated', '020402050204'),
            ('attenuated', '020402050205')],
           names=['load_type', 'huc12'])
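
One possible workaround for the plotting errors above, not verified against hvplot here, is to reshape the MultiIndexed frame to wide form so that each load type becomes its own column. Most plotting APIs then treat each column as one bar series. The sketch uses made-up values and only pandas:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['local', 'attenuated'], ['0001', '0002']],
    names=['load_type', 'huc12'],
)
long_df = pd.DataFrame({'tp_load': [1.0, 2.0, 1.5, 3.5]}, index=index)

# One row per HUC12, one column per load type
wide_df = long_df['tp_load'].unstack('load_type')

# Attenuated minus local load, per HUC12
wide_df['difference'] = wide_df['attenuated'] - wide_df['local']
```

The wide form also makes the per-HUC12 difference between the two methods a single column operation.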

### Compare HUC08 loads

In [140]:
vars = ['huc08_name', 'in_drb', 
        'tp_load', 'tp_load_ps','tp_load_xsnps',
        'tp_load_rem1', 'tp_load_rem2', 'tp_load_rem3', 
        'tp_load_avoid',]
# Concat two datasets for comparison, by creating a second index of keys
df = pd.concat(
    [huc08_load_gdf[vars], 
        # huc08_outlet_loads_gdf[vars], 
        huc08_multioutfall_loads_gdf[vars],
    ], 
    keys=['local', 
        # 'attenuated-outlet', 
        'attenuated-multi'
    ]
)
df.index.set_names('load_type', level=0, inplace=True)
df = df[df.in_drb]

In [141]:
df.loc[:, ['huc08_name', 'in_drb', 'tp_load', 'tp_load_rem3']]

Unnamed: 0_level_0,Unnamed: 1_level_0,huc08_name,in_drb,tp_load,tp_load_rem3
load_type,huc08,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
local,2040101,Upper Delaware,True,75654.781768,-22314.174804
local,2040102,East Branch Delaware,True,36614.472818,-31946.03817
local,2040103,Lackawaxen,True,51232.942978,-5994.714921
local,2040104,Middle Delaware-Mongaup-Brodhead,True,91180.327491,-54165.002565
local,2040105,Middle Delaware-Musconetcong,True,311996.724363,99661.663833
local,2040106,Lehigh,True,412013.41698,89816.761168
local,2040201,Crosswicks-Neshaminy,True,265751.052554,33275.222268
local,2040202,Lower Delaware,True,972973.869331,51966.955117
local,2040203,Schuylkill,True,927493.354787,322449.785109
local,2040204,Delaware Bay,True,233305.918268,-2060.502037


In [142]:
# Select subset, if any
df_barplot = df

In [143]:
# Create Bar plot for comparison
# https://holoviews.org/reference/elements/bokeh/Bars.html
var = 'tp_load_rem3'
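barplot = hv.Bars(
    df_barplot, kdims=['huc08', 'load_type'], vdims=[var], 
    # hover_cols=['huc08_name']
)
# `hover_cols` is an hvplot keyword, not an hv.Bars argument, which is likely why it fails here.
# In HoloViews, extra hover fields are normally listed as additional vdims (e.g. 'huc08_name').
barplot.opts(tools=['hover'], multi_level=False, xrotation=45, width=800, height=400)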
barplot


# Save Calculated PA2 Results

In [144]:
%%time
# # Save PA2 aggregated results from Method 2: Attenuated reach loads accumulated through the stream network.
# # NOTE: the 'brotli' compression codec writes more slowly than 'gzip',
# # but decreases storage by ~35% while having similar read speeds.

# # huc12_outlet_loads_gdf.to_parquet(data_output_path /'huc12_outlet_loads_gdf.parquet',compression='brotli')
# huc10_outlet_loads_gdf.to_parquet(data_output_path /'huc10_outlet_loads_gdf.parquet',compression='brotli')
# huc08_outlet_loads_gdf.to_parquet(data_output_path /'huc08_outlet_loads_gdf.parquet',compression='brotli')

# huc08_multioutfall_loads_gdf.to_parquet(data_output_path /'huc08_multioutfall_loads_gdf.parquet',compression='brotli')


CPU times: user 2 µs, sys: 1e+03 ns, total: 3 µs
Wall time: 5.96 µs


In [154]:
huc12_outlet_loads_gdf.to_parquet(
    data_output_path /'huc12_outlet_loads_gdf.parquet',
    engine='pyarrow',
    compression='brotli',
)