# Preprocess Monitoring Points

Adrian Wiegman | arhwiegman.github.io | adrian.wiegman@usda.gov

2023-03-30

---

This notebook extracts subwatershed IDs for each monitoring point within the [MEP study area](https://www.mass.gov/guides/the-massachusetts-estuaries-project-and-reports).

The input data is an Excel table of point coordinates and watershed IDs.

The output is a list of subwatershed IDs with their associated point IDs.

Processing Steps


- The fields in the Assess data will be used to generate landcover classes for N loading
- Then the TaxPar LOC_IDs will be used to index the landcover classes for each discrete
- The merged TaxPar layer will be intersected with watershed boundaries

In [38]:
# this codeblock sets up the environment from jupyter notebooks
setup_notebook = "C:/Users/Adrian.Wiegman/Documents/GitHub/Wiegman_USDA_ARS/MEP/_Setup.ipynb"
# magic command to run the setup notebook
%run $setup_notebook
fn_hello()

***
loading python modules...

  `module_list` contains names of all loaded modules

...module loading complete

***
setting up arcpy environment...

 input file directory (`idr`): C:\Workspace\Geodata\Massachusetts\
 working directory (`wdr`): C:\Workspace\Geodata\MEP\
 temp dir (`tdr`): C:\Workspace\Geodata\MEP\temp
 output dir (`odr`): C:\Workspace\Geodata\MEP\outputs
 default geodatabase path: C:\Workspace\Geodata\MEP\Default.gdb
 output coordinate system: NAD_1983_UTM_Zone_19N

... env setup complete

***
loading functions...

type `fn_`+TAB to autocomplete

 the object `def_list` contains user defined function names:
   fn_get_info
   fn_hello
   fn_recursive_glob_search
   fn_regex_search_replace
   fn_regex_search_0
   fn_arcpy_table_to_excel
   fn_arcgis_table_to_df
   fn_arcgis_table_to_np_to_pd_df

 use ??{insert fn name} to inspect
 for example running `??fn_get_info` returns:
Signature: fn_get_info(name='fn_get_info'

In [39]:
inpath = os.path.join(odr,'MEP_Monitoring_Sites_MeasOnly_TableToExcel.xlsx')
df_in = pd.read_excel(inpath)
#df_in = fn_arcgis_table_to_df('MEP_Monitoring_Sites_MeasOnly').reset_index()
df_in.rename(columns={"OBJECTID":"FID"},inplace=True)
df_in.info()
df_in.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 98 entries, 0 to 97
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   FID         98 non-null     int64  
 1   Lat         98 non-null     float64
 2   Lon         98 non-null     float64
 3   Region      98 non-null     object 
 4   MEP         98 non-null     object 
 5   Site_Name   98 non-null     object 
 6   Region_MEP  98 non-null     object 
 7   SUBW_NAME   68 non-null     object 
 8   Notes_AW    11 non-null     object 
dtypes: float64(2), int64(1), object(6)
memory usage: 7.0+ KB


,FID,Lat,Lon,Region,MEP,Site_Name,Region_MEP,SUBW_NAME,Notes_AW
0,1,41.681859,-70.918844,Buzzards Bay,Acushnet,Acushnet River,Buzzards Bay > Acushnet > Acushnet River,,
1,2,41.681859,-70.918844,Buzzards Bay,Acushnet,Acushnet River,Buzzards Bay > Acushnet > Acushnet River,,
2,3,41.681859,-70.918844,Buzzards Bay,Acushnet,Acushnet River,Buzzards Bay > Acushnet > Acushnet River,,
3,4,41.681859,-70.918844,Buzzards Bay,Acushnet,Acushnet River,Buzzards Bay > Acushnet > Acushnet River,,
4,5,41.553741,-71.126612,Buzzards Bay,Westport,Adamsville Brook,Buzzards Bay > Westport > Adamsville Brook,,


In [50]:
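#df['SUBW_NAME'].replace(r'^\s*$', np.nan, regex=True)
df = df_in.drop('Notes_AW',axis=1).dropna() # working copy without notes; drops rows with any missing value
df['SUBW_NAME'] = df['SUBW_NAME'].str.strip() # remove leading/trailing whitespace (result must be assigned back)
df.reset_index(inplace=True) # old row labels are kept in a new 'index' column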
df.info()
print(df.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 68 entries, 0 to 67
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   index       68 non-null     int64  
 1   FID         68 non-null     int64  
 2   Lat         68 non-null     float64
 3   Lon         68 non-null     float64
 4   Region      68 non-null     object 
 5   MEP         68 non-null     object 
 6   Site_Name   68 non-null     object 
 7   Region_MEP  68 non-null     object 
 8   SUBW_NAME   68 non-null     object 
dtypes: float64(2), int64(2), object(5)
memory usage: 4.9+ KB
   index  FID        Lat        Lon    Region         MEP        Site_Name  \
0     20   21  41.714883 -70.384253  Cape Cod  Barnstable      Alder Brook   
1     21   22  41.726801 -70.602171  Cape Cod    Phinneys       Back River   
2     22   23  41.578791 -70.563246  Cape Cod         GBB     Backus Brook   
3     23   24  41.711745 -70.379438  Cape Cod  Barnstable  Boat Cove Cree

In [49]:
df.SUBW_NAME

0                                        AlderCreek
1                              InnerBackRiverStream
2     MillPondFalmouth;BackusBrook;AshumetPond(11%)
3                  BoatCoveCreek;MillPondBarnstable
4                      BournesBrook;AshumetPond(4%)
                          ...                      
63                                 UpperTashmooPond
64                                     MillBrookWTI
65                                  PeasePointBrook
66                                    TiasquamRiver
67                                       BlackBrook
Name: SUBW_NAME, Length: 68, dtype: object
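
The `SUBW_NAME` values above pack several subwatershed names into one semicolon-separated string, and some names carry a percentage in parentheses. The helpers `fn_regex_search_0` and `fn_regex_search_replace` are defined in `_Setup.ipynb`, whose source is not shown here, so the sketch below uses hypothetical stand-ins (`regex_search_0`, `regex_search_replace`). It assumes the search helper returns the string `'NA'` when nothing matches. The function `parse_subw_name` is likewise an illustrative name, not part of the notebook.

```python
import re

NA = "NA"  # assumed sentinel returned when a pattern does not match


def regex_search_0(text, pattern):
    """Hypothetical stand-in for fn_regex_search_0: first match, or 'NA'."""
    m = re.search(pattern, text)
    return m.group(0) if m else NA


def regex_search_replace(text, pattern, repl):
    """Hypothetical stand-in for fn_regex_search_replace: replace every match."""
    return re.sub(pattern, repl, text)


def parse_subw_name(subw_name):
    """Split one SUBW_NAME string into (name, proportion) pairs."""
    pairs = []
    for part in subw_name.split(";"):
        # drop the parenthesized percentage to get the bare name
        name = regex_search_replace(part, r"\(.*\)", "").strip()
        if not name:
            continue  # skip empty pieces, e.g. after a trailing semicolon
        digits = regex_search_0(part, r"\d+")
        # no percentage given means the whole subwatershed contributes
        proportion = 1.0 if digits == NA else int(digits) / 100
        pairs.append((name, proportion))
    return pairs


print(parse_subw_name("MillPondFalmouth;BackusBrook;AshumetPond(11%)"))
# [('MillPondFalmouth', 1.0), ('BackusBrook', 1.0), ('AshumetPond', 0.11)]
```

Returning pairs from a small function keeps the parsing rule testable on single strings before it is applied to every row.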

In [59]:
SUBW_NAME = [] # name of subwatershed in MEP data
PROPORTION = [] # proportion of flow draining to monitoring site at corresponding FID
FID = [] # FID of monitoring site
for i in range(len(df.FID)):
    print(i)
    #print(i,"|",df.Region_MEP[i])
    S = df.SUBW_NAME[i].split(";") # split column SUBW_NAME on semicolon
    #print(S)
    for s in S:
        match = fn_regex_search_0(s,r"\(.*\)") # find parenthesized text, e.g. "(11%)" (not used below)
        replaced = fn_regex_search_replace(s,r"\(.*\)","") # replace parenthesized text with blank

        digit = fn_regex_search_0(s,r"\d+") # extract numeric digits
        if digit == 'NA': # compare by value; `is` tests object identity
            p = 1
        else:
            p = int(digit)/100 # calculate proportion of flow from digits
        print("  ",s,"|",replaced,"|",p)

        if fn_regex_search_0(replaced,r"\w+") == 'NA':
            print("      empty... skipping to next")
            continue # skip entries that have no name (a bare `next` does nothing here)

        # append values to lists
        SUBW_NAME.append(replaced)
        PROPORTION.append(p)
        FID.append(df.FID[i])

0
   AlderCreek | AlderCreek | 1
1
   InnerBackRiverStream | InnerBackRiverStream | 1
2
   MillPondFalmouth | MillPondFalmouth | 1
   BackusBrook | BackusBrook | 1
   AshumetPond(11%) | AshumetPond | 0.11
3
   BoatCoveCreek | BoatCoveCreek | 1
   MillPondBarnstable | MillPondBarnstable | 1
4
   BournesBrook | BournesBrook | 1
   AshumetPond(4%) | AshumetPond | 0.04
5
   BridgeCreek | BridgeCreek | 1
6
   FilendsPond | FilendsPond | 1
7
   GaugeBayRd | GaugeBayRd | 1
   CedarLakeFalmouth(54%) | CedarLakeFalmouth | 0.54
   EdmundsPond(50%) | EdmundsPond | 0.5
8
   RockHarborStream | RockHarborStream | 1
   CedarPond | CedarPond | 1
9
   BogWetlandYarmouth | BogWetlandYarmouth | 1
   JabezNedsPond | JabezNedsPond | 1
   HorsePond | HorsePond | 1
   WellHorsePond | WellHorsePond | 1
10
   ChaseGardenCreekFresh | ChaseGardenCreekFresh | 1
11
   ChildsRiver | ChildsRiver | 1
   JohnsPond(13%) | JohnsPond | 0.13
   AshumetPond(4%) | AshumetPond | 0.04
   GrassyPondFalmouth(46%) | GrassyPondFa
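
As an alternative to an explicit loop, the same long-format table (one row per monitoring point and subwatershed) could be built with vectorized pandas string methods. This is a minimal sketch on a toy frame, not the notebook's method. The three rows are copied from the `SUBW_NAME` output shown earlier, and the names `toy` and `df_long` are illustrative. It also reads the percentage only from inside the parentheses, which is a slightly stricter rule than searching the whole string for digits.

```python
import pandas as pd

# Toy rows taken from the SUBW_NAME output above (FIDs 21, 23 and 25)
toy = pd.DataFrame({
    "FID": [21, 23, 25],
    "SUBW_NAME": [
        "AlderCreek",
        "MillPondFalmouth;BackusBrook;AshumetPond(11%)",
        "BournesBrook;AshumetPond(4%)",
    ],
})

# One row per subwatershed: split on ';' and explode the resulting lists
df_long = toy.assign(SUBW_NAME=toy["SUBW_NAME"].str.split(";")).explode(
    "SUBW_NAME", ignore_index=True
)

# Percentage inside parentheses becomes a proportion; no percentage means 1.0
pct = pd.to_numeric(df_long["SUBW_NAME"].str.extract(r"\((\d+)%\)", expand=False))
df_long["PROPORTION"] = (pct / 100).fillna(1.0)

# Strip the "(NN%)" suffix, then drop pieces that are left empty
df_long["SUBW_NAME"] = (
    df_long["SUBW_NAME"].str.replace(r"\(.*\)", "", regex=True).str.strip()
)
df_long = df_long[df_long["SUBW_NAME"] != ""].reset_index(drop=True)

print(df_long)
```

The loop is easier to trace row by row, while the vectorized form avoids managing three parallel lists. Either is reasonable for a table of this size.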

In [60]:
# create output dataframe
print(len(FID), len(SUBW_NAME), len(PROPORTION)) # the three lists should have equal length
df_subs = pd.DataFrame({'FID':FID,
                        'SUBW_NAME':SUBW_NAME,
                        'PROPORTION':PROPORTION})
print(df_subs)
print(df[['FID','Lat','Lon','Region_MEP']])
# attach coordinates and region labels to each subwatershed row
df_out = df_subs.merge(df[['FID','Lat','Lon','Region_MEP']],on='FID')
print(df_out)
df_out.to_csv(os.path.join(odr,"df_monitoring_point_subs.csv"))

136 136 136
     FID             SUBW_NAME  PROPORTION
0     21            AlderCreek        1.00
1     22  InnerBackRiverStream        1.00
2     23      MillPondFalmouth        1.00
3     23           BackusBrook        1.00
4     23           AshumetPond        0.11
..   ...                   ...         ...
131   87      UpperTashmooPond        1.00
132   94          MillBrookWTI        1.00
133   96       PeasePointBrook        1.00
134   97         TiasquamRiver        1.00
135   98            BlackBrook        1.00

[136 rows x 3 columns]
    FID        Lat        Lon                               Region_MEP
0    21  41.714883 -70.384253      Cape Cod > Barnstable > Alder Brook
1    22  41.726801 -70.602171         Cape Cod > Phinneys > Back River
2    23  41.578791 -70.563246            Cape Cod > GBB > Backus Brook
3    24  41.711745 -70.379438  Cape Cod > Barnstable > Boat Cove Creek
4    25  41.576468 -70.551280           Cape Cod > GBB > Bournes Brook
..  ...        ...    