# MEP Preprocessing

In this notebook I manipulate the watershed boundary layers used for the Massachusetts Estuaries Project (MEP) within the [MEP study area](https://www.mass.gov/guides/the-massachusetts-estuaries-project-and-reports).

1. Regroup subwatershed layers that were split by travel time.
2. Calculate the 5th percentile of elevation in each subwatershed (Lid_Sub_ZS).
3. Classify subwatersheds by elevation percentile (ele5pct_poly).
4. Intersect elevation classes with subwatersheds (subs_le5pct).
5. Intersect elevation-classified subwatersheds with tax parcel data (subs_le5_tax).

Tax parcel data can then be used to generate land cover classifications within the uplands and terminal zones (seepage faces) of each subwatershed.

NOTE: whenever you set up a new ArcGIS Pro project with Python for batch processing, make sure to uncheck `options > geoprocessing > 'add output datasets to open map'`. This saves RAM and prevents crashes when looping through many files.
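
The same behavior can also be set from Python instead of the options dialog. The sketch below assumes the `arcpy.env.addOutputsToMap` environment setting is available in the installed ArcGIS Pro version; the helper name `disable_add_outputs_to_map` is hypothetical, and the `try`/`except` only keeps the snippet runnable outside ArcGIS Pro.

```python
def disable_add_outputs_to_map(env):
    """Turn off automatic addition of geoprocessing outputs to the open map.

    `env` is any object exposing an `addOutputsToMap` attribute
    (assumed here to be `arcpy.env`).
    """
    env.addOutputsToMap = False
    return env


try:
    import arcpy  # only available inside an ArcGIS Pro Python environment

    disable_add_outputs_to_map(arcpy.env)
except ImportError:
    # Outside ArcGIS Pro there is nothing to configure.
    pass
```

Setting this once in the setup notebook would apply it to every batch loop that follows in the same session.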

## Next Steps

Consider computing more lidar-based metrics, such as the topographic wetness index or the topographic openness index.

Consider adding slope. 

Consider summarizing land use for older years. 

## Data 

Publication:
Carlson, C.S., Masterson, J.P., Walter, D.A., and Barbaro, J.R., 2017, Development of simulated groundwater-contributing areas to selected streams, ponds, coastal water bodies, and production wells in the Plymouth-Carver region and Cape Cod, Massachusetts: U.S. Geological Survey Data Series 1074, 17 p. https://doi.org/10.3133/ds1074

Dataset: 
Carlson, C.S., Masterson, J.P., Walter, D.A., and Barbaro, J.R., 2017, Simulated groundwater-contributing areas to selected streams, ponds, coastal water bodies, and production wells, Plymouth-Carver region and Cape Cod, Massachusetts: U.S. Geological Survey data release, https://doi.org/10.5066/F7V69H2Z.


In [44]:
# this codeblock sets up the environment from jupyter notebooks
setup_notebook = "C:/Users/Adrian.Wiegman/Documents/GitHub/Wiegman_USDA_ARS/MEP/_Setup.ipynb"
# magic command to run the setup notebook
%run $setup_notebook

***
loading python modules...

  `module_list` contains names of all loaded modules

...module loading complete

***
setting up arcpy environment...

 input file directory (`idr`): C:\Workspace\Geodata\Massachusetts\
 working directory (`wdr`): C:\Workspace\Geodata\MEP\
 default geodatabase path: C:\Workspace\Geodata\MEP\Default.gdb
 temp dir (`tdr`): C:\Workspace\Geodata\MEP\temp
 output dir (`odr`): C:\Workspace\Geodata\MEP\outputs
 output coordinate system: NAD_1983_UTM_Zone_19N

... env setup complete

***
loading functions...

type `fn_`+TAB to autocomplete

 the object `def_list` contains user defined function names:
   fn_get_info
   fn_hello
   fn_recursive_glob_search
   fn_regex_search_replace
   fn_regex_search_0
   fn_arcpy_table_to_excel
   fn_arcgis_table_to_df
   fn_arcgis_table_to_np_to_pd_df

 use ??{insert fn name} to inspect
 for example running `??fn_get_info` returns:
Signature: fn_get_info(name='fn_get_info'

In [2]:
# test function
# raw strings keep the regex escapes (\w) from being read as string escapes
fn_regex_search_0('Mystic Lake GT10 E', r'\w+10')
fn_regex_search_replace('MysticLakeGT10E', r'\wT10', '')
#fn_regex_search_replace('Mystic Lake  E','  ',' ')

'MysticLakeE'

In [3]:
# make a working copy 
copyfile = r"C:\Workspace\Geodata\MEP\outputs\MEP_Subwatersheds.shp"
original = r"C:\Workspace\Geodata\Massachusetts\MEP\CC_MV_Subwatersheds\Subwatersheds.shp"
arcpy.management.Copy(original, copyfile, "ShapeFile", None)

In [193]:
# dissolve the MEP subwatersheds data
outfile = "MEP_Subwatersheds_Dissolve"  # written to the default geodatabase
arcpy.management.Dissolve(copyfile, outfile, "FID", None, "MULTI_PART", "DISSOLVE_LINES")

Make a new field (`Travel_Tim`) that stores the subwatershed travel-time class.

In [194]:
# add a Travel_Tim field holding the travel-time code (e.g. "GT10") parsed from SUBWATER_D
fn_string = """def fn_regex_search_0 (string,pattern,noneVal="NA"):
    '''
    returns the first match of a regular expression pattern search on a string
    '''
    import re
    x = re.search(pattern,string)
    if x is None: 
        x= [noneVal]    
    return(x[0])
    """
arcpy.management.CalculateField(copyfile,
                                "Travel_Tim",
                                r"fn_regex_search_0(!SUBWATER_D!, r'\wT10', 'NA')",
                                "PYTHON3",
                                fn_string, "TEXT", "NO_ENFORCE_DOMAINS")

Make a new subwatershed name field (`SUBW_NAME`) that excludes travel time.

In [195]:
# make a new subwatershed name field that excludes travel time
fn_string = """def fn_regex_search_replace(string,pattern,replacement):
    '''
    returns the string with every match of the pattern substituted by the replacement
    '''
    import re
    x = re.sub(pattern,replacement,string)
    return(x)"""
newField = "SUBW_NAME"
arcpy.management.CalculateField(copyfile,
                                newField,
                                r"""fn_regex_search_replace(!SUBWATER_N!, r"\wT10.*", "")""",
                                "PYTHON3",
                                fn_string,
                                "TEXT",
                                "NO_ENFORCE_DOMAINS")

In [219]:
# dissolve subwatersheds by subwatershed name.
arcpy.management.Dissolve(copyfile,
                          "MEP_SUBW_NAME", 
                          "SUBW_NAME", None, "MULTI_PART", "DISSOLVE_LINES")
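
The regrouping logic of the last two cells can be sketched without arcpy. The example below only mimics the attribute side of the dissolve (no geometry): it strips the travel-time suffix with the same `\wT10.*` pattern and sums areas by the resulting base name. The record names other than `MysticLakeGT10E`, the area values, and the helper `regroup_by_base_name` are hypothetical.

```python
import re
from collections import defaultdict


def fn_regex_search_replace(string, pattern, replacement):
    """Return `string` with every match of `pattern` replaced."""
    return re.sub(pattern, replacement, string)


# Hypothetical (SUBWATER_N, area) records standing in for the attribute table.
records = [
    ("MysticLakeGT10E", 12.0),
    ("MysticLakeLT10", 8.0),
    ("TashmooPondMain", 5.5),
]


def regroup_by_base_name(rows, pattern=r"\wT10.*"):
    """Sum areas of subwatersheds that share a name once travel time is removed."""
    totals = defaultdict(float)
    for name, area in rows:
        base = fn_regex_search_replace(name, pattern, "")
        totals[base] += area
    return dict(totals)


regrouped = regroup_by_base_name(records)
print(regrouped)
```

Names without a travel-time suffix pass through unchanged, so they keep their own group.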

In [None]:
# extract the statewide lidar dataset with mask of subwatersheds
raster = r"C:\Workspace\Geodata\Massachusetts\LiDAR_DEM\LiDAR_DEM.gdb\LiDAR_DEM_INT_16bit"
mask = "MEP_Subwatersheds_Dissolve"
outname = "lidar_extr"
lidar_extr = arcpy.sa.ExtractByMask(raster,mask)

In [204]:
lidar_extr.save(outname)

Use zonal statistics to calculate the 5th percentile of elevation within each subwatershed.

In [209]:
raster = "lidar_extr"
poly = "outputs/MEP_Subwatersheds"
zonefield = "SUBW_NAME"
pct = 5  # 5th percentile
outname = "Lid_Sub_ZS"
Lid_Sub_ZS = arcpy.ia.ZonalStatistics(poly, 
                                      zonefield, 
                                      raster, 
                                      "PERCENTILE", 
                                      "DATA", 
                                      "CURRENT_SLICE", 
                                      pct, 
                                      "AUTO_DETECT")
Lid_Sub_ZS.save(outname)

In [212]:
a = "lidar_extr"
b = "Lid_Sub_ZS"
print(a,b)

lidar_extr Lid_Sub_ZS


In [214]:
# 1 where elevation <= the zone's 5th percentile, 0 elsewhere
lidar_le5pct = arcpy.ia.LessThanEqual(a, b)
lidar_le5pct.save("lidar_le5pct")

In [216]:
# convert raster of lidar_le5pct to polygon
outfile = "ele5pct_poly"
poly = arcpy.conversion.RasterToPolygon("lidar_le5pct", outfile, "SIMPLIFY", "VALUE", "SINGLE_OUTER_PART", None)

In [217]:
myfun = """def fn(x):
    y = "GT5%"
    if x == 1: y = "LE5%"
    return(y)"""
# add a text field (ele5pct) that labels each gridcode value as "LE5%" or "GT5%"
arcpy.management.CalculateField(outfile, 
                                "ele5pct", 
                                "fn(!gridcode!)", 
                                "PYTHON3", 
                                myfun, "TEXT", "NO_ENFORCE_DOMAINS")
#arcpy.management.AlterField(outfile, 'gridcode', 'ElevLE5pct', 'Elev <= 5% percentile')
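
The preceding three steps (zonal percentile, less-than-or-equal comparison, and labeling) can be illustrated on plain Python lists. This is a minimal sketch under stated assumptions: it uses a nearest-rank percentile, which may differ from the interpolation ArcGIS selects with `AUTO_DETECT`, and the zone names, elevations, and function names are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def classify_cells(zone_elevations, pct=5):
    """Label each cell relative to the percentile threshold of its own zone."""
    low, high = f"LE{pct}%", f"GT{pct}%"
    labels = {}
    for zone, elevations in zone_elevations.items():
        threshold = percentile_nearest_rank(elevations, pct)
        labels[zone] = [low if e <= threshold else high for e in elevations]
    return labels


# Hypothetical integer elevations for two zones (20 and 40 cells).
zones = {"A": list(range(1, 21)), "B": list(range(10, 50))}
labels = classify_cells(zones)
print({zone: lab.count("LE5%") for zone, lab in labels.items()})
# → {'A': 1, 'B': 2}
```

Because the threshold is computed per zone, a cell's class depends on the elevation distribution of its own subwatershed rather than on a single study-wide cutoff.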

In [218]:
# dissolve new polygon layer by elevation class 
arcpy.management.Dissolve("ele5pct_poly",
                          "ele5pct_poly_diss", 
                          "ele5pct", None, "MULTI_PART", "DISSOLVE_LINES")

In [223]:
# compute the identity (intersection) of elevation poly and watershed poly
infeat = "ele5pct_poly_diss"
identfeat = "outputs/MEP_Subwatersheds"
outname = "subs_le5pct"
arcpy.analysis.Identity(infeat, identfeat, 
                        outname, "ALL", None, "NO_RELATIONSHIPS")

In [224]:
# compute the identity (intersection) of elevation poly and watershed poly
# without travel times
infeat = "ele5pct_poly_diss"
identfeat = "MEP_SUBW_NAME"
outname = "subs_travtim_le5pct"
arcpy.analysis.Identity(infeat, identfeat, 
                        outname, "ALL", None, "NO_RELATIONSHIPS")

In [225]:
# compute the identity (intersection) of the elevation-classified
# subwatersheds (subs_le5pct) and the tax parcel layer
infeat = "subs_le5pct"
identfeat = "MEP_TaxPar"
outname = "subs_le5_tax"
arcpy.analysis.Identity(infeat, identfeat, 
                        outname, "ALL", None, "NO_RELATIONSHIPS")

In [None]:
# compute the identity (intersection) of the elevation-classified
# subwatersheds (subs_travtim_le5pct) and the tax parcel layer
infeat = "subs_travtim_le5pct"
identfeat = "MEP_TaxPar"
outname = "subs_tt_le5_tax"
arcpy.analysis.Identity(infeat, identfeat, 
                        outname, "ALL", None, "NO_RELATIONSHIPS")

In [None]:
# CAUTION: DO NOT RUN THIS CELL!!!!!!!!!!!!!!
# intersect sheds with soil
arcpy.analysis.Identity(r"C:\Workspace\Geodata\MEP\Default.gdb\subs_le5pct", 
                        r"C:\Workspace\Geodata\Massachusetts\Soils_MassGIS.gdb\SOILS_MUPOLYGON_TOP20", 
                        r"C:\Workspace\Geodata\MEP\Default.gdb\subs_le5pct_soil20", "ALL", None, "NO_RELATIONSHIPS")

In [4]:
# CAUTION: DO NOT RUN THIS CELL!!!!!!!!!!!!!!
# clip the statewide 2016 land cover/land use layer to the subwatersheds
infeat = r"C:\Workspace\Geodata\Massachusetts\lclu_gdb\MA_LCLU2016.gdb\LANDCOVER_LANDUSE_POLY"
clipfeat = "MEP_SUBW_NAME"
outname = "lclu16_clip"
arcpy.analysis.Clip(infeat, clipfeat, outname, None)

KeyboardInterrupt: 

In [None]:
# compute the identity (intersection) of the output with the LCLU layer.
infeat = "subs_le5pct"
identfeat = "lclu16_clip"
outname = 'subs_le5_lclu16'
arcpy.analysis.Identity(infeat, identfeat, outname, "ALL", None, "NO_RELATIONSHIPS")

In [None]:
# 2023-03-30 RESUME HERE! 
# export feature table data 
print(def_list)

# the functions below provide options to export feature tables 
??fn_arcpy_table_to_excel
#fn_arcpy_table_to_excel(inFeaturePath,outTablePath=odr,outTableName="SUBS_TaxParAssess.xlsx")

??fn_arcgis_table_to_df #this one works better

??fn_arcgis_table_to_np_to_pd_df # this one doesn't work very well

In [121]:
# use the same variable name (infeat) that the save calls below rely on
infeat = 'subs_le5_tax'
field_names = [f.name for f in arcpy.ListFields(infeat)]
print(field_names)

# convert feature table to pandas data frame
df = fn_arcgis_table_to_df(in_fc=infeat)

# remove unwanted columns
selected_fields = ['OBJECTID','SUBW_NAME','LOC_ID','ele5pct','Travel_Tim',"EMBAY_NAME","SUBWATER_N","Shape_Length","Shape_Area"]
df_select = df.filter(selected_fields, axis=1)  # keep only the selected columns (axis=1 filters columns; axis=0 would filter rows)

# save pickle
df.to_pickle(os.path.join(odr,'df_'+infeat+'.pkl'))
df_select.to_pickle(os.path.join(odr,'df_'+infeat+'_select.pkl'))

# save csv
df.to_csv(os.path.join(odr,'df_'+infeat+'.csv'))
df_select.to_csv(os.path.join(odr,'df_'+infeat+'_select.csv'))

['OBJECTID', 'Shape', 'FID_subs_le5pct', 'FID_ele5pct_poly_diss', 'ele5pct', 'FID_MEP_Subwatersheds', 'OBJECTID_1', 'OBJECTID_12', 'SUBWATER_I', 'SUBWATER_N', 'SUBWATER_D', 'EMBAY_ID', 'EMBAY_NAME', 'EMBAY_DISP', 'X_Centroid', 'Y_Centroid', 'Acreage', 'GeoString', 'Shape_Leng', 'Travel_Tim', 'SUBW_NAME', 'FID_MEP_TaxPar', 'MAP_PAR_ID', 'LOC_ID', 'POLY_TYPE', 'MAP_NO', 'SOURCE', 'PLAN_ID', 'LAST_EDIT', 'BND_CHK', 'NO_MATCH', 'TOWN_ID', 'MERGE_SRC', 'Shape_Length', 'Shape_Area']


In [39]:
infeat = 'subs_le5pct_soil20'
field_names = [f.name for f in arcpy.ListFields(infeat)]
print(field_names)

# convert feature table to pandas data frame
df = fn_arcgis_table_to_df(in_fc=infeat)

# remove unwanted columns
selected_fields = ['OBJECTID','SUBW_NAME','LOC_ID','ele5pct','Travel_Tim',"EMBAY_NAME","SUBWATER_N","Shape_Length","Shape_Area", # watershed attributes
                  "COMPNAME","SLOPE","SLOPE_1","FRMLNDCLS",'HYDROLGRP','HYDRCRATNG','DRAINCLASS','DEP2WATTBL'] #soil attributes
df_select = df.filter(selected_fields, axis=1)  # keep only the selected columns (axis=1 filters columns; axis=0 would filter rows)

# save pickle
df.to_pickle(os.path.join(odr,'df_'+infeat+'.pkl'))
df_select.to_pickle(os.path.join(odr,'df_'+infeat+'_select.pkl'))

# save csv
df.to_csv(os.path.join(odr,'df_'+infeat+'.csv'))
df_select.to_csv(os.path.join(odr,'df_'+infeat+'_select.csv'))

['OBJECTID', 'Shape', 'FID_subs_le5pct', 'FID_ele5pct_poly_diss', 'ele5pct', 'FID_MEP_Subwatersheds', 'OBJECTID_1', 'OBJECTID_12', 'SUBWATER_I', 'SUBWATER_N', 'SUBWATER_D', 'EMBAY_ID', 'EMBAY_NAME', 'EMBAY_DISP', 'X_Centroid', 'Y_Centroid', 'Acreage', 'GeoString', 'Shape_Leng', 'Travel_Tim', 'SUBW_NAME', 'FID_SOILS_MUPOLYGON_TOP20', 'AREASYMBOL', 'SPATIALVER', 'MUSYM', 'MUKEY', 'SS_AREA', 'MUSYM_AREA', 'SLOPE', 'AREANAME', 'MUNAME', 'COMPNAME', 'MUKIND', 'FRMLNDCLS', 'HYDRCRATNG', 'DRAINCLASS', 'MINSURFTEXT', 'TFACTOR', 'AWS100', 'AWS25', 'DEP2WATTBL', 'DWELLWB', 'HYDROLGRP', 'NIRRLCC', 'ROADS', 'SEPTANKAF', 'SLOPE_1', 'FLOODING', 'PONDING', 'CORCONCRET', 'TAXCLNAME', 'CM2RESLYR', 'RESKIND', 'PARMATNM', 'UNIFSOILCL', 'AASHTO', 'KFACTRF', 'KFACTWS', 'PHWATER', 'CLAY', 'KSAT', 'OM', 'SAND', 'NLEACHING', 'Shape_Length', 'Shape_Area']


In [8]:
infeat = 'subs_ele5_lclu16'
field_names = [f.name for f in arcpy.ListFields(infeat)]
print(field_names)

# convert feature table to pandas data frame
df = fn_arcgis_table_to_df(in_fc=infeat)

# remove unwanted columns
selected_fields = ['OBJECTID','SUBW_NAME','LOC_ID','ele5pct','Travel_Tim',"EMBAY_NAME","SUBWATER_N","Shape_Length","Shape_Area", # watershed attributes
                  "USE_CODE","USEGENCODE","COVERCODE","COVERNAME",'USEGENNAME'] #land use attributes
df_select = df.filter(selected_fields, axis=1)  # keep only the selected columns (axis=1 filters columns; axis=0 would filter rows)

# save pickle
df.to_pickle(os.path.join(odr,'df_'+infeat+'.pkl'))
df_select.to_pickle(os.path.join(odr,'df_'+infeat+'_select.pkl'))

# save csv
df.to_csv(os.path.join(odr,'df_'+infeat+'.csv'))
df_select.to_csv(os.path.join(odr,'df_'+infeat+'_select.csv'))

['OBJECTID', 'Shape', 'FID_subs_le5pct', 'FID_ele5pct_poly_diss', 'ele5pct', 'FID_MEP_Subwatersheds', 'OBJECTID_1', 'OBJECTID_12', 'SUBWATER_I', 'SUBWATER_N', 'SUBWATER_D', 'EMBAY_ID', 'EMBAY_NAME', 'EMBAY_DISP', 'X_Centroid', 'Y_Centroid', 'Acreage', 'GeoString', 'Shape_Leng', 'Travel_Tim', 'SUBW_NAME', 'FID_LANDCOVER_LANDUSE_POLY', 'COVERNAME', 'COVERCODE', 'USEGENNAME', 'USEGENCODE', 'USE_CODE', 'POLY_TYPE', 'FY', 'TOWN_ID', 'TILENAME', 'Shape_Length', 'Shape_Area']


In [196]:
infeat = 'subs_le5pct'
field_names = [f.name for f in arcpy.ListFields(infeat)]
print(field_names)

# convert feature table to pandas data frame
df = fn_arcgis_table_to_df(in_fc=infeat)

# remove unwanted columns
selected_fields = ['OBJECTID','SUBW_NAME','ele5pct','Travel_Tim',"EMBAY_NAME","SUBWATER_N","Shape_Length","Shape_Area"]
df_select = df.filter(selected_fields, axis=1)  # keep only the selected columns (axis=1 filters columns; axis=0 would filter rows)

# save pickle
df.to_pickle(os.path.join(odr,'df_'+infeat+'.pkl'))
df_select.to_pickle(os.path.join(odr,'df_'+infeat+'_select.pkl'))

# save csv
df.to_csv(os.path.join(odr,'df_'+infeat+'.csv'))
df_select.to_csv(os.path.join(odr,'df_'+infeat+'_select.csv'))

['OBJECTID', 'Shape', 'FID_ele5pct_poly_diss', 'ele5pct', 'FID_MEP_Subwatersheds', 'OBJECTID_1', 'OBJECTID_12', 'SUBWATER_I', 'SUBWATER_N', 'SUBWATER_D', 'EMBAY_ID', 'EMBAY_NAME', 'EMBAY_DISP', 'X_Centroid', 'Y_Centroid', 'Acreage', 'GeoString', 'Shape_Leng', 'Travel_Tim', 'SUBW_NAME', 'Shape_Length', 'Shape_Area']
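
The four export cells above repeat the same filter-and-save steps with only the layer name and field list changing. A hypothetical helper, `export_feature_table`, could wrap them as sketched below; it takes an existing data frame (for example the output of `fn_arcgis_table_to_df`) so it does not depend on arcpy. The demo data frame and its values are placeholders.

```python
import os
import tempfile

import pandas as pd


def export_feature_table(df, name, out_dir, selected_fields):
    """Save the full and column-filtered tables as pickle and CSV files."""
    # DataFrame.filter skips requested labels that are absent from the table,
    # so one field list can be reused across layers with different columns.
    df_select = df.filter(items=selected_fields, axis=1)
    for frame, suffix in ((df, ""), (df_select, "_select")):
        base = os.path.join(out_dir, "df_" + name + suffix)
        frame.to_pickle(base + ".pkl")
        frame.to_csv(base + ".csv")
    return df_select


# Placeholder table standing in for a converted feature table.
demo = pd.DataFrame(
    {"SUBW_NAME": ["TashmooPondMain"], "ele5pct": ["GT5%"], "Acreage": [1.0]}
)
with tempfile.TemporaryDirectory() as tmp:
    selected = export_feature_table(
        demo, "subs_le5pct", tmp, ["SUBW_NAME", "ele5pct", "LOC_ID"]
    )
    written = sorted(os.listdir(tmp))
print(list(selected.columns), written)
```

Passing the layer name as an argument avoids reusing a stale variable from an earlier cell when building the output file names.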


In [9]:
# Join table with tax parcel assessor data
df_select.head()

| OBJECTID | SUBW_NAME | ele5pct | Travel_Tim | EMBAY_NAME | SUBWATER_N | Shape_Length | Shape_Area | USE_CODE | USEGENCODE | COVERCODE | COVERNAME | USEGENNAME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NashaquitsaPond | GT5% |  | MenemshaSquibnocketPond | NashaquitsaPond | 4.705809 | -0.001061 |  | 0 | 0 |  |  |
| 2 | TashmooPondMain | GT5% |  | LakeTashmoo | TashmooPondMain | 4.579976 | 0.438198 |  | 0 | 20 | Bare Land | Unknown |
| 3 | TashmooPondMain | GT5% |  | LakeTashmoo | TashmooPondMain | 22.081875 | 11.68219 |  | 0 | 20 | Bare Land | Unknown |
| 4 | TashmooPondMain | GT5% |  | LakeTashmoo | TashmooPondMain | 38.322392 | 13.874547 |  | 0 | 20 | Bare Land | Unknown |
| 5 | TashmooPondMain | GT5% |  | LakeTashmoo | TashmooPondMain | 2.206339 | 0.198506 |  | 0 | 2 | Impervious | Unknown |


Make a new feature class of subwatershed IDs excluding travel time.

Make a new subwatershed layer that combines subwatersheds that were split by travel time.

In [None]:
# Make a point feature layer from monitoring coordinates
# Set the local variables
in_table = r"C:\Users\Adrian.Wiegman\OneDrive - USDA\Research\Nload\MEP\MEP_Summary4_AW.xlsx\Coords$"
#in_table = r"C:\Users\Adrian.Wiegman\OneDrive - USDA\Research\Nload\MEP\MEP_Monitoring_Site_Coords.csv"
out_feature_class = "MEP_Monitoring_Site_Coords"
x_coords = "Lon"
y_coords = "Lat"

# Make the XY event layer...
arcpy.management.XYTableToPoint(in_table=in_table, 
                                out_feature_class=out_feature_class,
                                x_field=x_coords, 
                                y_field=y_coords)

# Print the total rows
print(arcpy.management.GetCount(out_feature_class))
#arcpy.management.AddJoin(out_feature_class, "OBJECTID", r"C:\Users\Adrian.Wiegman\OneDrive - USDA\Research\Nload\MEP\MEP_Monitoring_Site_Coords.csv", "OID", "KEEP_ALL", "NO_INDEX_JOIN_FIELDS")

In [None]:
# Appendix

In [None]:
## Unused code snippets