# Batch HydroLink 
## Developed By: Daniel Wieferich (USGS)
## 20171005

### Note: This code is a work in progress.  Any suggestions and comments are welcome.
#### Also, an issue was detected in the NHD High Resolution services: they return no data for some points that should return a result.  This is currently being investigated with the NHD team.

### This code performs a hydrolink batch process on a text file (.csv) or shapefile (.shp) using user-defined latitude and longitude fields.  For each point, the code uses web services to return the reachcode and measure of the closest position on the High Resolution National Hydrography Dataset (NHD) and on the Medium Resolution NHDPlusV2.1.  The code also uses snap distance and stream name to help quantify a level of certainty.  Levels of certainty currently exist only for the NHDPlusV2.1, because the current High Resolution web services do not return the needed information.
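
As a minimal sketch of the snap-distance part of the certainty score, the function below bins a distance in kilometers into 1, 0.5, or 0. The 50 m and 200 m cutoffs are the ones used in `mrCertainty` in this notebook; the function name `distance_certainty` and the treatment of negative values as "no data" are assumptions made for illustration.

```python
def distance_certainty(snap_km):
    """Map a snap distance in kilometers to a certainty score of 0, 0.5, or 1."""
    # Missing values and the -999 sentinel carry no information.
    if snap_km is None or snap_km < 0:
        return 0
    if snap_km <= 0.050:   # snapped within 50 m of the original point
        return 1
    if snap_km <= 0.200:   # snapped between 50 m and 200 m away
        return 0.5
    return 0               # snapped more than 200 m away (NaN also lands here)


for km in (0.0, 0.049, 0.120, 0.350, -999):
    print(km, distance_certainty(km))
```

Checking the bins in increasing order keeps each cutoff in one place, which makes it harder to write a range that can never be true.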

### Next steps: 
1. Work with the NHD High Resolution team to see if additional information can be returned from services to help with assigning a level of certainty. (in progress)
2. Add the distance to the closest confluence as a measure of certainty.  In past manual linkage I've noticed that points within 50 m of a confluence are more likely to link to the wrong reach.
3. The current code assumes lat/lon are in NAD83.  The code should be altered to accept various CRS inputs.
4. Improve upon error checking and notifications.
5. Allow for join of hydro-linked data to original dataset?
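
One possible starting point for next step 2 is sketched below. It assumes a list of confluence coordinates is already available, which the current web service calls do not provide, and it uses a spherical (haversine) distance as an approximation. The names `haversine_m` and `near_confluence` and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near_confluence(lat, lon, confluences, threshold_m=50.0):
    """Return True if any (lat, lon) confluence lies within threshold_m of the point."""
    return any(
        haversine_m(lat, lon, c_lat, c_lon) <= threshold_m
        for c_lat, c_lon in confluences
    )


# Hypothetical confluence locations, for illustration only
confluences = [(33.0003, -87.0), (33.5, -87.5)]
print(near_confluence(33.0, -87.0, confluences))    # point roughly 33 m from the first confluence
print(near_confluence(33.001, -87.0, confluences))  # point roughly 78 m from the first confluence
```

Points flagged by a check like this could be marked as needing visual confirmation rather than being scored automatically.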

In [4]:
#The importFile function deals with importing csv or shps into a dataframe.
#It returns None (after printing a message) if the file cannot be read or a field is missing.
def importFile (inputFileName, latFieldName, lonFieldName, streamFieldName, inputId):
    if inputFileName.endswith('.csv'):
        print ('\n' + 'reading csv file' +'\n')
        try:
            df = pd.read_csv(inputFileName)
        except (OSError, ValueError):
            #OSError covers a missing or unreadable file, ValueError covers parsing and decoding errors
            print ('file did not properly import, verify file name and rerun')
            #It would be nice here to reask for inputFileName and then restart at try statement
            return None
    elif inputFileName.endswith('.shp'):
        print('\n' + 'reading shapefile' + '\n')
        try:
            df = gp.GeoDataFrame.from_file(inputFileName)
        except Exception:
            #The exception type depends on the geopandas I/O backend, so catch broadly here
            print ('file did not properly import, verify file name and rerun')
            #It would be nice here to reask for inputFileName and then restart at try statement
            return None
    else:
        print('File type not currently accepted. Please try .csv or .shp')
        return None
        
    if latFieldName in df and lonFieldName in df and streamFieldName in df and inputId in df:
        df = df[[inputId, latFieldName, lonFieldName, streamFieldName]].copy()
        df = df.rename(columns={inputId: 'id', latFieldName: 'lat', lonFieldName: 'lon', streamFieldName: 'stream'})
        return df
    else: 
        print ('verify field names and rerun')
        return None
        
#Using initial lat,lon (in nad83) to link to High Resolution NHD
def hydrolinkHr(lat,lon,inputId):
    baseUrlHR = 'https://edits.nationalmap.gov/arcgis/rest/services/HEM/NHDHigh/MapServer/'
    getHrReach = baseUrlHR + 'exts/HEM_EditEvents_SOE/HEMGetReachcodeFromXY'
    getHrX = baseUrlHR + 'exts/HEM_EditEvents_SOE/HEMPointEvents'
    
    #Define variables, set initially as null values using similar denotation as the HydroLink Tool
    reachcodeHr = 'NO REACHCODE'
    measHigh = -999
    smDateHigh = ' '
    permId = 'NO PERM ID'
    xyHigh = '-999'
    
    #Structure original lat/lon into format needed for HEM SOE
    xy = '{"x":' + str(lon) + ',"y":' + str(lat) + ', "spatialReference": {"wkid":4269}}'
    
    if lat and (float(lat) > 24 and float(lat)<50) and lon and (float(lon) < -66 and float(lon)> -125):
        payload = {
            "point": xy ,
            "selectionLayerName": "NHDFLOWLINE",
            "selectionType": "TOPDOWNSTREAM",
            "searchToleranceMeters": 10000,
            "outWKID": 4269,
            "f": "json"}
        
        try:
            #Connects to web service (inside the try so a failed request does not stop the batch)
            r = requests.post(getHrReach,params=payload,verify=False).json()

            if r['resultStatus'] == 'success' and r['features']:
                reachcodeHr = r['features'][0]['attributes']['REACHCODE']
                payload2 = {
                    "point": xy ,
                    "reachcode": reachcodeHr,
                    "outWKID": 4269,
                    "f": "json"}

                r2 = requests.post(getHrX,params=payload2,verify=False).json()
                measHigh = r2['features'][0]['attributes']['MEASURE']
                smDateHigh = r2['features'][0]['attributes']['REACHSMDATE']
                permId = r2['features'][0]['attributes']['PERMANENT_IDENTIFIER']
                xyHigh = r2['features'][0]['geometry']
        except Exception:
            print ('Failed to process HR for: ' + str(inputId))

    #Values not set by the service keep the null values defined above
    return (reachcodeHr,measHigh,smDateHigh,permId,xyHigh)

        
def hydrolinkMr(lat,lon,inputId):
    
    #Define variables, set initially as null values using similar denotation as the HydroLink Tool
    snapDist = -999
    xyMed = ''
    comidMed = 'NO COMID'
    reachMed = 'NO REACHCODE'
    measMed = -999
    gnisMed = ''
    
    #Additional Info about medium resolution service : https://www.epa.gov/waterdata/point-indexing-service
    xyMed = 'POINT(' + str(lon) + ' ' + str(lat) + ')'
    
    payloadMr = {
        'optNHDPlusDataset': '2.1',
        'pGeometry': xyMed,
        'pGeometryMod': 'SRID=8265',
        'pOutputPathFlag': 'TRUE',
        'pPointIndexingMaxDist': '2', #Kilometers
        'pPointIndexingMethod':'DISTANCE',
        'pResolution': '3',
        'pPointIndexingFcodeDeny' : [56600],
        'pReturnFlowlineGeomFlag':'FALSE',
        'f':'json'}
    
    baseUrlMr = 'https://ofmpub.epa.gov/waters10/PointIndexing.Service'
    
    try:
        #Connects to web service (inside the try so a failed request does not stop the batch)
        rMed = requests.post(baseUrlMr,params=payloadMr, verify=False).json()
        snapDist = rMed['output']['total_distance']
        xyMed = rMed['output']['end_point']['coordinates']
        comidMed = rMed['output']['ary_flowlines'][0]['comid']
        reachMed = rMed['output']['ary_flowlines'][0]['reachcode']
        measMed = rMed['output']['ary_flowlines'][0]['fmeasure']
        gnisMed = rMed['output']['ary_flowlines'][0]['gnis_name']
    except Exception:
        #No flowline was returned or the request failed; keep the null values defined above
        pass

    return (snapDist,xyMed,comidMed,reachMed,measMed,gnisMed)
    

def cleanStreamName(stream):
    
    if stream:
        #remove case for case sensitive operations
        stream = stream.lower()
        
        #replace common abbreviations, this needs improvement but be careful not to replace strings we don't want to
        #this code currently assumes GNIS_NAME never contains abbreviations... something to verify
        #If you have a better way to do this let me know!!!!
        #the leading space is kept in each replacement so the expanded word does not merge with the word before it
        stream = stream.replace(' st ', ' stream ')
        stream = stream.replace(' st.', ' stream')
        stream = stream.replace(' rv ', ' river ')
        stream = stream.replace(' rv.', ' river')
        stream = stream.replace(' trib.', ' tributary')
        stream = stream.replace(' trib)', ' tributary)')
        stream = stream.replace(' trib ', ' tributary ')
        stream = stream.replace(' ck ', ' creek ')
        stream = stream.replace(' ck.', ' creek')
        stream = stream.replace(' br ', ' branch ')
        stream = stream.replace(' br.', ' branch')
        
        return stream
    else:
        return stream
                       
def mrCertainty(streamClean, gnisMed):
    import difflib
    
    
    #Stream Name Match
    if streamClean and gnisMed:
        gnisMed = gnisMed.lower()
        #print ('\n' + gnisMed)
        #print (streamClean)

        if gnisMed == streamClean:
            gnisCertMr = 1
            #print (gnisCert)
        #do not want a tributary to be fuzzy matched to the name of its main stem, so names containing tributary or branch are not fuzzy matched
        elif 'tributary' in streamClean or 'branch' in streamClean:
            gnisCertMr = 0
            #print (gnisCert)
        else:
            matchRatio = difflib.SequenceMatcher(lambda x: x == " ",streamClean, gnisMed).ratio()
            
            # From Python Documentation (https://docs.python.org/3/library/difflib.html):
            #"As a rule of thumb, a ratio() value over 0.6 means the sequences are close matches:"
            # At some point we should validate this rule of thumb but figured this is a good starting place
            if matchRatio >= 0.6:
                gnisCertMr = 1
                #print (gnisCert)
            #this is likely a match but less certain
            elif 0.6 > matchRatio >= 0.5:
                gnisCertMr = 0.5
                #print (gnisCert)
            else:
                gnisCertMr = 0
                #print (gnisCert)
                
    #else no stream name is supplied for one or both datasets, therefore stream name does not help improve certainty
    else:
        gnisCertMr = 0
    #--------------------------------------------------------------------   
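    #Distance between original lat/lon and reach/measure linkage
    #Note: snapDist is returned in Kilometers
    #Note: snapDist is not a parameter, it is read from the module-level variable set in the main loop
    if snapDist or snapDist==0:
        #print (snapDist)
        if snapDist == -999:
            distCertMr = 0
            #print (distCertMr)
        elif snapDist <=0.050:
            distCertMr = 1
            #print (distCertMr)
        #between 50 m and 200 m (the original bounds were reversed, so this branch could never run)
        elif 0.050 < snapDist <= 0.200: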
            distCertMr = 0.5
            #print (distCertMr)
        else:
            distCertMr = 0
            #print (distCertMr)
    else:
        distCertMr = 0
        
        
    #ADD CODE TO CHECK DISTANCE TO CLOSEST CONFLUENCE, if less than ?50m? flag as needs visual confirmation
           
        
    return gnisCertMr, distCertMr
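
The comments in `cleanStreamName` ask for a better way to expand abbreviations. One option, sketched below under stated assumptions, is a regular expression with word boundaries, so that an abbreviation is also expanded at the start or end of a name and is never matched inside a longer word. The names `ABBREVIATIONS`, `expand_abbreviations`, and `name_certainty` are hypothetical; the scoring function repeats the thresholds used in `mrCertainty`. As in the original code, "st" is always read as "stream", so names such as "St. Johns" would still be expanded incorrectly.

```python
import difflib
import re

# Hypothetical abbreviation table mirroring the replacements in cleanStreamName
ABBREVIATIONS = {
    "st": "stream",
    "rv": "river",
    "trib": "tributary",
    "ck": "creek",
    "br": "branch",
}

# \b keeps "st" from matching inside words such as "stone"; \.? drops a trailing period
_ABBREV_RE = re.compile(r"\b(" + "|".join(ABBREVIATIONS) + r")\b\.?")


def expand_abbreviations(name):
    """Lowercase a stream name and expand whole-word abbreviations."""
    return _ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], name.lower())


def name_certainty(stream_clean, gnis_name):
    """Score a cleaned stream name against a GNIS name as 0, 0.5, or 1."""
    if not stream_clean or not gnis_name:
        return 0
    gnis_name = gnis_name.lower()
    if stream_clean == gnis_name:
        return 1
    # Tributary and branch names are not fuzzy matched against a main stem name
    if "tributary" in stream_clean or "branch" in stream_clean:
        return 0
    ratio = difflib.SequenceMatcher(lambda ch: ch == " ", stream_clean, gnis_name).ratio()
    if ratio >= 0.6:
        return 1
    if ratio >= 0.5:
        return 0.5
    return 0


print(expand_abbreviations("Unnamed Trib. to Stone Br"))
print(name_certainty(expand_abbreviations("Cahaba Rv"), "Cahaba River"))
```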
        

In [6]:
import requests
import json
import geopandas as gp
import pandas as pd
import warnings

warnings.filterwarnings("ignore")

#User inputs variables needed for code to run
#inputFileName = input("Enter file name, including extension (only accepts .csv and .shp): ")
#latFieldName = input("Enter field name for latitude, note this is case sensitive: ")
#lonFieldName = input("Enter field name for longitude, note this is case sensitive: ")
#streamFieldName = input("Enter field name for stream name, note this is case sensitive: ")
#inputId = input("Enter field name for identifier, note this is case sensitive: ")

#Alternative to user input, info can be hard coded here
#inputFileName = 'testData/test2.shp'
#latFieldName = 'DamLatitud'
#lonFieldName = 'DamLongitu'
#streamFieldName = 'DamRiverNa'
#inputId = 'DamName'

#Alternative to user input, info can be hard coded here
inputFileName = 'testAR.csv'
latFieldName = 'lat'
lonFieldName = 'lon'
streamFieldName = 'riv'
inputId = 'uniqueId'


outData = []

df = importFile(inputFileName, latFieldName, lonFieldName, streamFieldName, inputId)

for row in df.itertuples():
        
        #Define variables based on row,field values
        lat = row.lat
        lon = row.lon
        inputId = row.id
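        #a missing stream name becomes an empty string rather than the text 'nan'
        stream = str(row.stream) if pd.notnull(row.stream) else ''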
        
        print ('working on : ' + str(inputId))
                
        reachcodeHr,measHigh,smDateHigh,permId,xyHigh = hydrolinkHr(lat,lon,inputId)
        snapDist,xyMed,comidMed,reachMed,measMed,gnisMed = hydrolinkMr(lat,lon,inputId)
        streamClean = cleanStreamName(stream)
        gnisCertMr, distCertMr = mrCertainty(streamClean, gnisMed)
        
        #record data in outData, this will be used to create dataframe
        outData.append({"id":inputId,"ReachHigh":reachcodeHr,"ReachMed":reachMed,"MeasHigh":measHigh,"MeasMed":measMed,"SmdateHigh":smDateHigh, "Comid":comidMed, "permIdHr": permId, "xyHigh":xyHigh, "xyMed":xyMed,"gnisCertMr":gnisCertMr, "distCertMr":distCertMr, "gnisNameM": gnisMed })

#Create Dataframe with hydro-link data        
outDf = pd.DataFrame(outData)


reading csv file

working on : AK-001
working on : AK-002
working on : AK-003
working on : AK-004
working on : AK-005
working on : AK-006
working on : AK-007
working on : AK-008
working on : AK-009
working on : AL-001
working on : AL-002
working on : AL-003
working on : AL-004
working on : AL-005
working on : AL-006
working on : AR-001
working on : AR-002
working on : AR-003
working on : AR-004
working on : AZ-002
working on : AZ-003
working on : AZ-004
working on : CA-001
working on : CA-002
working on : CA-003
working on : CA-004
working on : CA-005
working on : CA-006
working on : CA-007
working on : CA-008
working on : CA-009
working on : CA-010
working on : CA-011
working on : CA-012
working on : CA-013
working on : CA-014
working on : CA-015
working on : CA-016
working on : CA-017
working on : CA-018
working on : CA-019
working on : CA-020
working on : CA-021
working on : CA-022
working on : CA-023
working on : CA-024
working on : CA-025
working on : CA-026
working on : CA-027
...

working on : ME-029
working on : ME-030
working on : ME-031
working on : ME-032
working on : ME-033
working on : ME-034
working on : MI-001
working on : MI-002
working on : MI-003
working on : MI-004
working on : MI-005
working on : MI-006
working on : MI-007
working on : MI-008
working on : MI-009
working on : MI-010
working on : MI-011
working on : MI-012
working on : MI-013
working on : MI-014
working on : MI-015
working on : MI-016
working on : MI-017
working on : MI-018
working on : MI-019
working on : MI-020
working on : MI-021
working on : MI-022
working on : MI-023
working on : MI-024
working on : MI-025
working on : MI-026
working on : MI-027
working on : MI-028
working on : MI-029
working on : MI-030
working on : MI-032
working on : MI-033
Failed to process HR for: MI-033
working on : MI-034
working on : MI-035
working on : MI-036
working on : MI-038
working on : MI-039
working on : MI-040
working on : MI-041
working on : MI-042
working on : MI-043
working on : MI-044
...

working on : OR-034
working on : OR-035
working on : OR-036
working on : OR-037
working on : OR-038
working on : OR-039
working on : OR-040
working on : OR-041
working on : OR-042
working on : OR-043
Failed to process HR for: OR-043
working on : OR-044
working on : OR-045
working on : OR-046
working on : OR-047
working on : OR-048
working on : OR-049
working on : OR-050
working on : OR-051
working on : OR-052
working on : PA-001
working on : PA-002
working on : PA-003
working on : PA-004
working on : PA-005
working on : PA-006
working on : PA-007
working on : PA-008
working on : PA-009
working on : PA-010
working on : PA-011
working on : PA-012
working on : PA-013
working on : PA-014
working on : PA-015
working on : PA-016
working on : PA-017
working on : PA-018
working on : PA-019
working on : PA-020
working on : PA-021
working on : PA-022
working on : PA-023
working on : PA-024
working on : PA-025
working on : PA-026
Failed to process HR for: PA-026
working on : PA-027
...

working on : VA-004
working on : VA-005
Failed to process HR for: VA-005
working on : VA-006
working on : VA-007
working on : VA-008
working on : VA-009
working on : VA-010
working on : VA-011
working on : VA-012
working on : VA-013
working on : VA-014
working on : VA-015
working on : VA-016
working on : VA-017
working on : VA-018
working on : VA-019
working on : VA-020
working on : VA-021
working on : VA-022
working on : VA-023
Failed to process HR for: VA-023
working on : VA-024
working on : VA-025
working on : VA-026
working on : VA-027
working on : VA-028
working on : VA-029
working on : VA-030
working on : VA-031
working on : VA-032
working on : VA-033
working on : VA-034
working on : VT-001
working on : VT-002
working on : VT-003
working on : VT-004
working on : VT-005
Failed to process HR for: VT-005
working on : VT-006
working on : VT-007
working on : VT-008
working on : VT-009
working on : VT-010
working on : VT-011
working on : VT-012
working on : VT-013
working on : VT-014
...

In [7]:
outDf

Unnamed: 0,Comid,MeasHigh,MeasMed,ReachHigh,ReachMed,SmdateHigh,distCertMr,gnisCertMr,gnisNameM,id,permIdHr,xyHigh,xyMed
0,NO COMID,-999.00000,-999.00000,NO REACHCODE,NO REACHCODE,,0,0.0,,AK-001,NO PERM ID,-999,POINT(-152.395 57.80333)
1,NO COMID,-999.00000,-999.00000,NO REACHCODE,NO REACHCODE,,0,0.0,,AK-002,NO PERM ID,-999,POINT(-152.395 57.80167)
2,NO COMID,-999.00000,-999.00000,NO REACHCODE,NO REACHCODE,,0,0.0,,AK-003,NO PERM ID,-999,POINT(-149.4267 62.61833)
3,NO COMID,-999.00000,-999.00000,NO REACHCODE,NO REACHCODE,,0,0.0,,AK-004,NO PERM ID,-999,POINT(-149.38 62.62667)
4,NO COMID,-999.00000,-999.00000,NO REACHCODE,NO REACHCODE,,0,0.0,,AK-005,NO PERM ID,-999,POINT(nan nan)
5,NO COMID,-999.00000,-999.00000,NO REACHCODE,NO REACHCODE,,0,0.0,,AK-006,NO PERM ID,-999,POINT(-134.5 58.365)
6,NO COMID,-999.00000,-999.00000,NO REACHCODE,NO REACHCODE,,0,0.0,,AK-007,NO PERM ID,-999,POINT(-133.93099 56.98154)
7,NO COMID,-999.00000,-999.00000,NO REACHCODE,NO REACHCODE,,0,0.0,,AK-008,NO PERM ID,-999,POINT(-146.404303 65.292798)
8,NO COMID,-999.00000,-999.00000,NO REACHCODE,NO REACHCODE,,0,0.0,,AK-009,NO PERM ID,-999,POINT(nan nan)
9,21658128,63.09372,54.22023,03150202003380,03150202003380,1330646400000,1,1.0,Cahaba River,AL-001,{AF1DF2AD-843B-48B8-AC66-AE5B737BA274},"{'x': -87.029659830529, 'y': 33.16560553685197...","[-87.0294274247695, 33.1656451163403]"


In [8]:
outFile = input("Name of output file. No extension:  ")
outFile = outFile + ".csv"
outDf.to_csv(outFile)

Name of output file. No extension:  arRivBasin
