Summary Statistics
===

This notebook retrieves summary statistics on restoration and protection. Summary stats include area, practice type, practice count, and cumulative load reduction.

Protection is broken down into DRWI protection and non-DRWI protection. Restoration summary tables include direct (Restoration Fund), indirect (Operational and Watershed Conservation Funds), PA DEP BMPs, and PA and NJ county-level restoration.

# Setup

## Load packages

In [1]:
# Import packages
from pathlib import Path
import pandas as pd
import numpy as np
import geopandas as gpd
import json
from shapely.validation import make_valid
import imp
import warnings


## Paths

In [2]:
# Find your current working directory, which should be the folder containing this notebook.
Path.cwd()

WindowsPath('C:/Users/clulay/OneDrive - LimnoTech/Documents/GitHub/pollution-assessment/stage2')

In [3]:
# Set your project directory to your local folder for your clone of this repository
project_path = Path.cwd().parent
project_path

WindowsPath('C:/Users/clulay/OneDrive - LimnoTech/Documents/GitHub/pollution-assessment')

# Load data

In [8]:
fd_protec_gdf = gpd.read_parquet(project_path / Path('stage2/private/protection_bmps_from_FieldDoc.parquet'))
fd_rest_gdf = gpd.read_parquet(project_path / Path('stage2/private/restoration_bmps_from_FieldDoc.parquet'))
wcpa_protec_gdf = gpd.read_parquet(project_path / Path('stage2/Protected_Lands/WCPA_exclude_DRWI.parquet'))

# PA & NJ Ag & Dev
PA_NJ_dev = pd.read_csv(project_path / Path('stage2/private/PA_NJ_DevLoadReduction.csv'))
PA_NJ_ag = pd.read_csv(project_path / Path('stage2/private/PA_NJ_AgLoadReduction.csv'))

In [9]:
#fd_protec_gdf.set_index('practice_id', inplace=True)
#fd_rest_gdf.set_index('practice_id', inplace=True)

In [10]:
fd_dtypes = {
    'practice_id': 'category',
    'practice_name': 'category',
    'program_name': 'category',
    'organization': 'category',
    'description': 'category',
    'practice_type': 'category',
    'created_at': 'category',
    'modified_at': 'category'
}

fd_protec_gdf = fd_protec_gdf.astype(fd_dtypes)
fd_rest_gdf = fd_rest_gdf.astype(fd_dtypes)

# Set CRS

Areas must be computed in an equal-area CRS. The current FieldDoc exports have no CRS assigned, whereas previous FieldDoc exports used EPSG:4326. That CRS is therefore assigned to the FieldDoc GeoDataFrames first, and all GeoDataFrames are then reprojected to the equal-area CRS ESRI:102003.

In [11]:
gdf_list = [fd_protec_gdf, fd_rest_gdf]

for item in gdf_list:
    item.set_crs(epsg=4326, allow_override=True, inplace=True)

In [12]:
gdf_list = [fd_protec_gdf, fd_rest_gdf, wcpa_protec_gdf]

for item in gdf_list:
    item.to_crs(crs='ESRI:102003', inplace=True)
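
The reprojection above matters because EPSG:4326 coordinates are in degrees, and a "square degree" covers less ground the farther it is from the equator. The following is a minimal standard-library sketch of that effect, assuming a spherical Earth; the function name `cell_area_km2` and the 40°N example latitude are illustrative choices, not part of the notebook.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)


def cell_area_km2(lat_south_deg: float, size_deg: float = 1.0) -> float:
    """Ground area of a size_deg x size_deg latitude/longitude cell on a sphere."""
    phi_south = math.radians(lat_south_deg)
    phi_north = math.radians(lat_south_deg + size_deg)
    delta_lon = math.radians(size_deg)
    # Area of a spherical cell: R^2 * delta_lon * (sin(phi_north) - sin(phi_south))
    area_m2 = EARTH_RADIUS_M ** 2 * delta_lon * (math.sin(phi_north) - math.sin(phi_south))
    return area_m2 / 1e6


# Two cells that are both exactly 1 x 1 degree in EPSG:4326 coordinates
area_equator = cell_area_km2(0.0)
area_40n = cell_area_km2(40.0)
ratio = area_40n / area_equator

print(f"1x1 degree cell at the equator: {area_equator:,.0f} km^2")
print(f"1x1 degree cell at 40N:         {area_40n:,.0f} km^2")
print(f"ratio: {ratio:.2f}")
```

Under these assumptions the cell at 40°N covers only about three quarters of the ground area of the equatorial cell, which is why areas are taken after reprojecting to an equal-area CRS rather than from the degree coordinates.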

# Summary stats

## FieldDoc protection

In [48]:
len(fd_protec_gdf)

75

In [49]:
fd_protec_summary = summary_stats(fd_protec_gdf)
fd_protec_summary

| practice_type | practice_count | area_ac |
|---|---|---|
| Conservation easement | 37 | 21717.96 |
| Fee acquisition | 38 | 4675.72 |
| TOTAL | 75 | 26393.68 |


## WCPA protection

In [50]:
len(wcpa_protec_gdf)

22137

In [51]:
wcpa_protec_summary = summary_stats(wcpa_protec_gdf)
wcpa_protec_summary

|  | practice_count | area_ac |
|---|---|---|
| Agricultural Easement | 5381 | 454105.48 |
| Conservation Easement | 3498 | 242667.25 |
| Natural Resource Area - Federal | 118 | 55583.2 |
| Natural Resource Area - Local | 4183 | 113308.77 |
| Natural Resource Area - Private | 1606 | 111332.82 |
| Natural Resource Area - State | 1703 | 772577.74 |
| Park or Recreation Area - Federal | 18 | 96156.22 |
| Park or Recreation Area - Local | 4903 | 174631.07 |
| Park or Recreation Area - Private | 173 | 6181.22 |
| Park or Recreation Area - State | 554 | 119183.87 |


## FieldDoc restoration

In [14]:
def summary_stats(gdf: gpd.GeoDataFrame, rest: bool = False) -> pd.DataFrame:
    """Summarize practice count and area (acres) by practice type.

    With rest=True, also sum the tn, tp, and tss load reductions.
    """
    # Work on a copy so the caller's frame (often a filtered slice) is not
    # modified; this also avoids pandas' SettingWithCopyWarning.
    gdf = gdf.copy()
    if gdf.crs.to_string() != 'ESRI:102003':
        gdf = gdf.to_crs(crs='ESRI:102003')

    # ESRI:102003 uses meters, so divide square meters by 4046.86 to get acres
    gdf['area_ac'] = gdf.geometry.area / 4046.86

    # WCPA layers are identified by OBJECTID and grouped by RECLASS2;
    # FieldDoc exports are identified by practice_id and grouped by practice_type
    if 'OBJECTID' in gdf.columns:
        group_col, id_col = 'RECLASS2', 'OBJECTID'
    else:
        group_col, id_col = 'practice_type', 'practice_id'

    count = gdf.groupby(group_col)[id_col].count()
    # Rows without geometry have no area and are skipped by sum()
    area = gdf.groupby(group_col)['area_ac'].sum().round(2)

    frame = {'practice_count': count, 'area_ac': area}

    if rest:
        # Load reductions are always grouped by practice type
        for pollutant in ['tn', 'tp', 'tss']:
            frame[f'{pollutant}_load_reduced'] = gdf.groupby('practice_type')[pollutant].sum()

    summary_df = pd.DataFrame(frame)

    # Build a TOTAL row from the column sums and append it
    totals_dict = {'practice_type': 'TOTAL'}
    totals_dict.update({col: summary_df[col].sum() for col in summary_df.columns})
    totals = pd.DataFrame([totals_dict]).set_index('practice_type')

    summary_df = pd.concat([summary_df, totals])

    # Drop groups with no practices (e.g. unused categories of a categorical column)
    summary_df = summary_df[summary_df['practice_count'] > 0]

    return summary_df
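
The function above converts square meters to acres by dividing by 4046.86. As a quick check on that rounded factor, the sketch below compares it with the exact definition of the international acre (4046.8564224 m²). The helper name `sq_m_to_acres` and the 1 km² test area are hypothetical and only serve the illustration.

```python
SQ_M_PER_ACRE_EXACT = 4046.8564224  # international acre, exact by definition
SQ_M_PER_ACRE_NOTEBOOK = 4046.86    # rounded factor used in summary_stats


def sq_m_to_acres(area_m2: float, factor: float = SQ_M_PER_ACRE_EXACT) -> float:
    """Convert an area in square meters to acres using the given factor."""
    return area_m2 / factor


area_m2 = 1_000_000.0  # hypothetical 1 km^2 parcel

acres_exact = sq_m_to_acres(area_m2)
acres_notebook = sq_m_to_acres(area_m2, SQ_M_PER_ACRE_NOTEBOOK)
relative_diff = abs(acres_exact - acres_notebook) / acres_exact

print(f"exact factor:    {acres_exact:.4f} ac")
print(f"notebook factor: {acres_notebook:.4f} ac")
print(f"relative difference: {relative_diff:.2e}")
```

The relative difference is below one part per million, so the rounded factor has no visible effect on totals reported to two decimal places at the scale of these tables.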

In [15]:
# Fill missing practice types via object dtype, since 'Not Specified' is not yet a category
practice_type = fd_rest_gdf['practice_type'].astype('object').fillna('Not Specified')
fd_rest_gdf['practice_type'] = practice_type.astype('category')
fd_rest_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1168 entries, 0 to 1167
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   practice_name      1168 non-null   category
 1   practice_id        1168 non-null   category
 2   program_name       1168 non-null   category
 3   program_id         1168 non-null   int64   
 4   organization       1168 non-null   category
 5   description        738 non-null    category
 6   practice_type      1168 non-null   category
 7   created_at         1168 non-null   category
 8   modified_at        1168 non-null   category
 9   tn                 1168 non-null   float64 
 10  tp                 1168 non-null   float64 
 11  tss                1168 non-null   float64 
 12  geometry           1047 non-null   geometry
 13  drainage_geometry  904 non-null    geometry
dtypes: category(8), float64(3), geometry(2), int64(1)
memory usage: 239.6 KB
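
The cell above passes through `object` dtype because a categorical column only accepts fill values that are already among its categories. An alternative that stays categorical throughout is to register the new label first with `Series.cat.add_categories`. This is a pandas-only sketch on toy data (the three values are made up), not the notebook's own approach.

```python
import pandas as pd

# Toy categorical column with one missing practice type
practice_type = pd.Series(['Cover Crop', None, 'Forest Buffer'], dtype='category')

# Register the placeholder as a category, then fill the missing values with it
filled = (
    practice_type
    .cat.add_categories(['Not Specified'])
    .fillna('Not Specified')
)

print(filled.tolist())
print(filled.dtype)
```

Either route gives a categorical column with no missing values; the `add_categories` form simply avoids the temporary conversion to `object`.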


In [16]:
fd_rest_gdf['program_name'].unique()

['Delaware River Restoration Fund', 'Delaware River Operational Fund', 'Delaware Watershed Conservation Fund']
Categories (3, object): ['Delaware River Operational Fund', 'Delaware River Restoration Fund', 'Delaware Watershed Conservation Fund']

In [17]:
direct_fd_rest = fd_rest_gdf[fd_rest_gdf['program_name'] == 'Delaware River Restoration Fund']

indirect_list = ['Delaware River Operational Fund', 'Delaware Watershed Conservation Fund']
indirect_fd_rest = fd_rest_gdf[fd_rest_gdf['program_name'].isin(indirect_list)]
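
Filtering by `program_name` does not shrink the categories of `practice_type`, so grouping a subset can produce rows for practice types that never occur in it. The pandas-only sketch below uses toy data (all values hypothetical) to show that behavior, which is why `summary_stats` keeps only rows with `practice_count > 0`.

```python
import pandas as pd

df = pd.DataFrame({
    'practice_id': [1, 2, 3],
    'practice_type': pd.Categorical(
        ['Cover Crop', 'Cover Crop', 'Forest Buffer'],
        categories=['Cover Crop', 'Forest Buffer', 'Wet Pond'],
    ),
})

# observed=False keeps every category, including ones absent from the data
counts = df.groupby('practice_type', observed=False)['practice_id'].count()

# Same filter as in summary_stats: drop practice types with no practices
observed_only = counts[counts > 0]

print(counts)
print(observed_only)
```

Passing `observed=True` to `groupby` would drop the empty categories at the grouping step instead; the sketch keeps `observed=False` to mirror the filter-afterwards approach used in the notebook.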

In [18]:
len(indirect_fd_rest)

144

In [20]:
indirect_rest_summary = summary_stats(indirect_fd_rest, rest=True)
indirect_rest_summary

| practice_type | practice_count | area_ac | tn_load_reduced | tp_load_reduced | tss_load_reduced |
|---|---|---|---|---|---|
| Animal Waste Management System | 1 | 0.0 | 0.0 | 0.0 | 0.0 |
| Barnyard Runoff Control | 5 | 0.84 | 0.0 | 0.0 | 0.0 |
| Bioretention/rain gardens - A/B soils, no underdrain | 1 | 0.07 | 0.0 | 0.0 | 0.0 |
| Bioretention/raingarden - C/D soils no underdrain | 1 | 0.01 | 0.0 | 0.0 | 0.0 |
| Conservation Cover | 4 | 19.38 | 4.53 | 2.26 | 1139.29 |
| Conservation easement | 25 | 3427.71 | 0.0 | 0.0 | 0.0 |
| Cover Crop | 24 | 4932.02 | 0.0 | 0.0 | 0.0 |
| Dry Extended Detention Ponds | 1 | 0.59 | 0.0 | 0.0 | 0.0 |
| Fee acquisition | 19 | 5455.92 | 0.0 | 0.0 | 0.0 |
| Forest Buffer | 7 | 63.43 | 988.74 | 101.14 | 50925.83 |


In [21]:
direct_rest_summary = summary_stats(direct_fd_rest, rest=True)
direct_rest_summary

| practice_type | practice_count | area_ac | tn_load_reduced | tp_load_reduced | tss_load_reduced |
|---|---|---|---|---|---|
| Access Control | 2 | 207.10 | 0.00 | 0.00 | 0.00 |
| Access Road | 9 | 0.00 | 0.00 | 0.00 | 0.00 |
| Animal Mortality Facility | 2 | 0.00 | 0.00 | 0.00 | 0.00 |
| Aquatic Organism Passage | 1 | 1.09 | 0.00 | 0.00 | 0.00 |
| Barnyard Runoff Controls | 11 | 61.07 | 52.55 | 16.19 | 11715.06 |
| ... | ... | ... | ... | ... | ... |
| Wet Pond | 1 | 0.20 | 52.74 | 26.49 | 35413.76 |
| Wetland Creation - Floodplain | 1 | 0.33 | 0.17 | 0.08 | 30.13 |
| Wetland Restoration | 3 | 10.08 | 36.44 | 21.44 | 12978.44 |
| Wetland Restoration - Floodplain | 1 | 0.44 | 0.17 | 0.04 | 16.15 |
