# NHDPlusV1 Flowlines into Data Distillery GC2

This notebook is a work in progress. It tests the use of Python to extract data from ScienceBase, add registration
information, and export the data into Data Distillery GC2.

The general workflow is:

1. Identify the needed data in ScienceBase
2. Request the data from ScienceBase by NHDPlusV1 region
3. Add the regional flowline data to a common geodataframe
4. Add fields to document the registration date and code
5. Export the data to a shapefile
6. Zip the data and use the GC2 UI to upload it

In [1]:
#Import needed packages
import requests, json, os
import urllib.request as ur
import time
import geopandas as gpd
import datetime

In [2]:

#GC2 had trouble handling a single shapefile of all regions, so the regions are split into three sets and
#the code is run once per set. Each run produces a shapefile that was uploaded into GC2 with encoding ISO_8859_5.
#For the second and third shapefiles, the upload was set to append to the existing GC2 table nhdplusv1_flowline.

#regList = ['Reg1','Reg2','Reg3','Reg4','Reg5','Reg6','Reg7','Reg8']
#regList = ['Reg9','Reg10U','Reg10L','Reg11','Reg12']
regList = ['Reg13','Reg14','Reg15','Reg16','Reg17','Reg18']


#Query to retrieve SB Items for NHDPlusV1
q="""https://www.sciencebase.gov/catalog/items?filter=tags={%22scheme%22:%22BIS%22,%22name%22:%22NHDPlusV1%22}&fields=files,id,tags&max=40&format=json"""

#Request query of SB Items
nhdItems = requests.get(q).json()


dfAll = None

#Loop through list of SB items returned in query
for item in nhdItems['items']:
    # For each item use tags to identify which region is being processed
    for tag in item['tags']:
        if 'Reg' in tag['name']:
            region = tag['name']
            #Only process the regions in the current set (regList); see the note on GC2 upload limits above.
            if region in regList:
                time.sleep(2)
                #Look at files and find files for NHDFLOWLINE
                for file in item['files']:
                    fileName = str.lower(file['name'])
                    #Look for files that have nhdflowline. in them, download these 4 files which make up a shapefile
                    if 'nhdflowline.' in fileName:
                        print ('Retrieving region ' + region + ', file:' + fileName)
                        fileUrl = file['url']
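                        try:
                            ur.urlretrieve(fileUrl, fileName)
                        except Exception as e:
                            #Report which region failed and why instead of silently swallowing every error
                            print (region + " broke: " + str(e))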
                #If the dataframe doesn't exist add region to dataframe (dfAll).  If it does exist append the region to existing dataframe.
                if dfAll is None:
                    dfAll = gpd.read_file('nhdflowline.shp', crs=4269)
                else:
                    dfReg = gpd.read_file('nhdflowline.shp', crs=4269)
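                    #Note: DataFrame.append was removed in pandas 2.0; on newer versions use pandas.concat([dfAll, dfReg])
                    dfAll = dfAll.append(dfReg)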
                    
                #Remove temporary files before moving on to next region
                os.remove('nhdflowline.shp')
                os.remove('nhdflowline.prj')
                os.remove('nhdflowline.dbf')
                os.remove('nhdflowline.shx')



Retrieving region Reg17, file:nhdflowline.dbf
Retrieving region Reg17, file:nhdflowline.prj
Retrieving region Reg17, file:nhdflowline.shp
Retrieving region Reg17, file:nhdflowline.shx
Retrieving region Reg16, file:nhdflowline.dbf
Retrieving region Reg16, file:nhdflowline.prj
Retrieving region Reg16, file:nhdflowline.shp
Retrieving region Reg16, file:nhdflowline.shx
Retrieving region Reg18, file:nhdflowline.dbf
Retrieving region Reg18, file:nhdflowline.prj
Retrieving region Reg18, file:nhdflowline.shp
Retrieving region Reg18, file:nhdflowline.shx
Retrieving region Reg13, file:nhdflowline.dbf
Retrieving region Reg13, file:nhdflowline.prj
Retrieving region Reg13, file:nhdflowline.shp
Retrieving region Reg13, file:nhdflowline.shx
Retrieving region Reg14, file:nhdflowline.dbf
Retrieving region Reg14, file:nhdflowline.prj
Retrieving region Reg14, file:nhdflowline.shp
Retrieving region Reg14, file:nhdflowline.shx
Retrieving region Reg15, file:nhdflowline.dbf
Retrieving region Reg15, file:nhdf
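
The query string and the tag/file filtering in the cell above can also be written as two small helper functions, which makes them easier to check without touching the network. This is only a sketch: `build_query`, `flowline_files`, and `sample_item` are hypothetical names that are not part of the notebook, the sample item is made up to mirror the `tags` and `files` fields the loop reads, and the helper takes the first tag containing `Reg` rather than looping over every match. `urlencode` percent-encodes the whole filter value, which should decode to the same filter as the hand-written URL.

```python
import json
from urllib.parse import urlencode

SB_ITEMS_URL = "https://www.sciencebase.gov/catalog/items"


def build_query(tag_name, scheme="BIS", max_items=40):
    """Return the catalog query URL for items carrying the given tag."""
    # Compact separators keep the JSON free of spaces, like the hand-written URL.
    tag_filter = json.dumps({"scheme": scheme, "name": tag_name}, separators=(",", ":"))
    params = {
        "filter": "tags=" + tag_filter,
        "fields": "files,id,tags",
        "max": max_items,
        "format": "json",
    }
    return SB_ITEMS_URL + "?" + urlencode(params)


def flowline_files(item, wanted_regions):
    """Return (region, [(name, url), ...]) for one item, or None if its region is not wanted."""
    # Simplification: use the first tag whose name contains 'Reg'.
    region = next((t["name"] for t in item.get("tags", []) if "Reg" in t["name"]), None)
    if region not in wanted_regions:
        return None
    files = [
        (f["name"].lower(), f["url"])
        for f in item.get("files", [])
        if "nhdflowline." in f["name"].lower()
    ]
    return region, files


# Hypothetical item, shaped like the 'items' entries the notebook iterates over.
sample_item = {
    "tags": [{"name": "NHDPlusV1"}, {"name": "Reg13"}],
    "files": [
        {"name": "NHDFlowline.shp", "url": "https://example.org/NHDFlowline.shp"},
        {"name": "metadata.xml", "url": "https://example.org/metadata.xml"},
    ],
}

query_url = build_query("NHDPlusV1")
selection = flowline_files(sample_item, {"Reg13", "Reg14"})
print(query_url)
print(selection)
```

Keeping the selection logic separate from the download step means a region list can be tested against a saved query response before any files are retrieved.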

In [3]:
dfAll.shape

(819243, 14)

In [4]:
#Drop a few unneeded fields
dfAll.drop(['RESOLUTION', 'SHAPE_LENG', 'WBAREACOMI'], axis=1, inplace=True)

#Insert a few registration fields to denote date and code used to create data
dfAll.insert(loc = 0, column='regCode', value= "https://github.com/dwief-usgs/BCB_Ipython_Notebooks/blob/master/NHDPlusV1_SB_Into_GC2.ipynb")

regDate = datetime.datetime.utcnow().isoformat()
dfAll.insert(loc = 0, column='regDate', value=regDate)
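
`datetime.datetime.utcnow()` is deprecated as of Python 3.12. A timezone-aware alternative is sketched below; `reg_date` is a hypothetical name used only for this example. Note that the resulting string carries an explicit UTC offset, so it is not byte-for-byte the same format as the `regDate` values produced by the cell above.

```python
from datetime import datetime, timezone

# Timezone-aware alternative to datetime.datetime.utcnow().isoformat();
# the resulting string ends with an explicit "+00:00" UTC offset.
reg_date = datetime.now(timezone.utc).isoformat()
print(reg_date)
```

If downstream consumers of the GC2 table expect the offset-free format, the original call can be kept on Python versions where it is still available.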

In [5]:
dfAll.head()

| | regDate | regCode | COMID | ENABLED | FCODE | FDATE | FLOWDIR | FTYPE | GNIS_ID | GNIS_NAME | LENGTHKM | REACHCODE | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-12-27T18:06:59.262094 | https://github.com/dwief-usgs/BCB_Ipython_Note... | 9301535 | T | 46006 | 2004-08-01 | Uninitialized | StreamRiver | | | 0.999 | 17010206126001 | LINESTRING Z (-114.0585468667315 48.9999911239... |
| 1 | 2017-12-27T18:06:59.262094 | https://github.com/dwief-usgs/BCB_Ipython_Note... | 22877591 | T | 55800 | 2004-08-01 | With Digitized | ArtificialPath | 799122.0 | Kootenai River | 0.247 | 17010101000002 | LINESTRING Z (-115.9766453304209 48.5586159912... |
| 2 | 2017-12-27T18:06:59.262094 | https://github.com/dwief-usgs/BCB_Ipython_Note... | 22877593 | T | 55800 | 2004-08-01 | With Digitized | ArtificialPath | 391351.0 | Star Creek | 0.055 | 17010101000003 | LINESTRING Z (-115.97724033042 48.558313991294... |
| 3 | 2017-12-27T18:06:59.262094 | https://github.com/dwief-usgs/BCB_Ipython_Note... | 22877595 | T | 46006 | 2004-08-01 | With Digitized | StreamRiver | 391351.0 | Star Creek | 1.999 | 17010101000003 | LINESTRING Z (-116.0010061303831 48.5576273912... |
| 4 | 2017-12-27T18:06:59.262094 | https://github.com/dwief-usgs/BCB_Ipython_Note... | 22877597 | T | 46006 | 2004-08-01 | With Digitized | StreamRiver | 391351.0 | Star Creek | 0.947 | 17010101000004 | LINESTRING Z (-116.012473397032 48.55594812463... |


In [6]:
#Export as a shapefile that will be zipped and uploaded into GC2
dfAll.to_file('nhdplusv1_flowline.shp', driver='ESRI Shapefile', crs_wkt='4269')
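
The zip step from the workflow is done by hand in this notebook, but it could be scripted. The sketch below uses a hypothetical helper, `zip_shapefile`, that gathers every file sharing the shapefile's base name into one archive. It assumes the upload expects the files at the root of the zip, which I have not confirmed for GC2. The demonstration uses empty placeholder files in a temporary directory rather than the real export.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_shapefile(shp_path, zip_path=None):
    """Zip a shapefile together with every sidecar file that shares its base name."""
    shp_path = Path(shp_path)
    zip_path = Path(zip_path) if zip_path else shp_path.with_suffix(".zip")
    # Sidecars (.dbf, .shx, .prj, .cpg, ...) share the stem and differ only by extension.
    # The list is built before the archive is created, and any old .zip is skipped.
    parts = sorted(
        p for p in shp_path.parent.glob(shp_path.stem + ".*")
        if p.is_file() and p.suffix.lower() != ".zip"
    )
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            # Store files at the archive root, without directory components.
            archive.write(part, arcname=part.name)
    return zip_path


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".shx", ".dbf", ".prj"):
        (Path(tmp) / ("nhdplusv1_flowline" + ext)).touch()
    archive_path = zip_shapefile(Path(tmp) / "nhdplusv1_flowline.shp")
    with zipfile.ZipFile(archive_path) as archive:
        archived_names = sorted(archive.namelist())

print(archived_names)
```

For the real export, the call would be `zip_shapefile('nhdplusv1_flowline.shp')`, run from the directory the shapefile was written to.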