# Title: GC2 Testing: Summarizing Fish Habitat Condition Scores to Analytical Units

# Abstract: This notebook explores the storage and data retrieval capabilities of the GC2 platform. In this example, data are retrieved from a GC2 test database. The notebook summarizes the National Fish Habitat Partnership's 2015 Fish Habitat Condition Index (HCI) information from NHDPlusV1 flowlines into user-selected ecological or jurisdictional spatial units (analytical units) from the Spatial Features Registry. Flowlines are assigned to analytical units by a spatial join of flowline midpoints with each analytical unit. A length-weighted average of the HCI scores in each analytical unit is then calculated.

# Contact Information: Daniel Wieferich (dwieferich@usgs.gov)                                                                                                                                                                                                                       

# Date: 20170228

# Import libraries

In [3]:
import requests
import geopandas as gpd
import geojson
import pandas as pd
import numpy as np
import time

# API call to the GC2 test database. Returns analytical units (as 'dfau') and NFHP data linked to the NHDPlusV1 flowline midpoints (as 'df'). See the comments below for more detail.

# A note on how to validate GeoJSON:
#Validates the GeoJSON; the test returned 'yes' (valid)
#validation = geojson.is_valid(data)
#validation['valid']
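
# The commented-out check above relies on the geojson package. As a rough illustration of the kind of structure such a check looks at, the sketch below tests the basic shape of a FeatureCollection using only the standard library. The function name looks_like_feature_collection and the sample payload are hypothetical, and this is far less thorough than a real validator.

```python
import json

def looks_like_feature_collection(obj):
    """Return True if obj has the basic shape of a GeoJSON FeatureCollection."""
    if not isinstance(obj, dict) or obj.get("type") != "FeatureCollection":
        return False
    features = obj.get("features")
    if not isinstance(features, list):
        return False
    for feature in features:
        if not isinstance(feature, dict) or feature.get("type") != "Feature":
            return False
        # Every Feature needs 'geometry' and 'properties' members (either may be null)
        if "geometry" not in feature or "properties" not in feature:
            return False
    return True

# Hypothetical payload with the 'features' structure this notebook reads
sample_text = (
    '{"type": "FeatureCollection", "features": ['
    '{"type": "Feature", '
    '"geometry": {"type": "Point", "coordinates": [-113.4, 46.0]}, '
    '"properties": {"au_uri": "example"}}]}'
)
sample = json.loads(sample_text)
is_ok = looks_like_feature_collection(sample)
is_bad = looks_like_feature_collection({"type": "Feature"})
```

# A check like this could run before GeoDataFrame.from_features so that a malformed response fails early with a clear message.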

In [4]:
#Retrieves data for the analytical units and renders the GeoJSON as a Python object
url_au = "https://gc2.mapcentia.com/api/v1/sql/dwief?q=select featureuri as au_uri, the_geom from tstnad where featureuri = 'a6bba27a-ea49-11e6-9dda-6f223c9e40e9' or featureuri = 'a6bd9e54-ea49-11e6-9dde-f3016d8ddba6'"
r_au = requests.get(url=url_au)
data_au = geojson.loads(r_au.text)
dfau = gpd.GeoDataFrame.from_features(data_au['features'], 4269)

#Retrieves data for NFHP and NHD and renders the GeoJSON as a Python object
#This query joins spatial information from the NHDPlusV1 flowline midpoints to NFHP Habitat Condition Index (HCI) scores. HCI values are multiplied by the length of each flowline (field = hcilength) so that the result can be used directly as the numerator of the length-weighted average calculation.
url = "https://gc2.mapcentia.com/api/v1/sql/dwief?q=select a.featureuri,b.cumu_hci*c.lengthkm as hcilength, c.lengthkm, a.the_geom from sfr_point a left join nhdplusv1_flowmid d on a.featureuri=d.featureuri left join nfhp2015hci b on d.comid=b.comid left join nhdplusv1_flowline c on d.comid=c.comid where a.ftyped='https://www.sciencebase.gov/vocab/term/58a49caee4b0f974afcf03b3'"
r = requests.get(url=url)
data = geojson.loads(r.text)
df = gpd.GeoDataFrame.from_features(data['features'], 4269)
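
# The request URLs above embed raw SQL, including spaces and quotes, in the query string. One possible way to build them more safely is sketched below with the standard library. The helper names build_sql_url and analytical_unit_sql are hypothetical, and the sketch uses an IN list in place of the chained OR conditions, which is assumed to select the same rows.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the cell above
BASE_URL = "https://gc2.mapcentia.com/api/v1/sql/dwief"

def build_sql_url(sql, base_url=BASE_URL):
    """Return base_url with sql percent-encoded into the 'q' query parameter."""
    return base_url + "?" + urlencode({"q": sql})

def analytical_unit_sql(feature_uris):
    """Build the analytical-unit query for a list of featureuri values."""
    # NOTE: values are interpolated directly into the SQL text,
    # so only pass trusted identifiers here
    quoted = ", ".join("'%s'" % uri for uri in feature_uris)
    return ("select featureuri as au_uri, the_geom from tstnad "
            "where featureuri in (%s)" % quoted)

uris = ["a6bba27a-ea49-11e6-9dda-6f223c9e40e9",
        "a6bd9e54-ea49-11e6-9dde-f3016d8ddba6"]
sql_au = analytical_unit_sql(uris)
url_au_encoded = build_sql_url(sql_au)

# Decoding the query string should give back the original SQL text
round_trip = parse_qs(urlsplit(url_au_encoded).query)["q"][0]
```

# requests can perform the same encoding when the SQL is passed through the params argument of requests.get, which avoids assembling the query string by hand.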

# A spatial join assigns the NHDPlusV1 midpoints, which carry the NFHP habitat condition scores, to the selected analytical units (this example uses two EPA Omernik Level 3 Ecoregions). Processing time is reported at completion.

In [5]:
start_time = time.time()

#Spatial join of the two geodataframes (newer geopandas releases rename the 'op' argument to 'predicate')
pointInPolys = gpd.sjoin(df, dfau, how='left', op='within')

#Print time process takes
print("--- %s minutes to run spatial join ---" % ((time.time() - start_time)/60))

--- 0.000200001398722 minutes to run spatial join ---
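
# To make the 'within' join above more concrete, here is a minimal pure-Python sketch of a left point-in-polygon join using ray casting. It only illustrates the idea: geopandas relies on spatial indexing and a geometry engine rather than this loop, and the toy squares, point names, and helper functions are all hypothetical. Unlike sjoin, the sketch keeps only the first matching polygon for each point.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside ring, a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray cast to the right of the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def left_join_within(points, polygons):
    """Pair each point id with the first containing polygon id, or None if there is none."""
    joined = []
    for point_id, (x, y) in points.items():
        match = None
        for poly_id, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                match = poly_id
                break
        joined.append((point_id, match))
    return joined

# Hypothetical toy data: two square "analytical units" and three midpoints
polygons = {"unit_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
            "unit_b": [(3, 0), (5, 0), (5, 2), (3, 2)]}
points = {"p1": (1, 1), "p2": (4, 1), "p3": (9, 9)}
joined = left_join_within(points, polygons)
```

# As with how='left', a point that falls outside every polygon is kept and receives no analytical unit (None here).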


In [6]:
pointInPolys.head(10)

Unnamed: 0,featureuri,geometry,hcilength,lengthkm,index_right,au_uri
0,bb488d0a-ed63-11e6-be02-73ec1c432711,(POINT (-12629508.034164 5774913.4014151)),11.8312,2.572,1,a6bd9e54-ea49-11e6-9dde-f3016d8ddba6
1,bb488d0f-ed63-11e6-be07-8b101555701e,(POINT (-12628639.815941 5774219.1682246)),4.8714,1.059,1,a6bd9e54-ea49-11e6-9dde-f3016d8ddba6
2,bb4aee74-ed63-11e6-be0d-e389826550ba,(POINT (-12625241.892791 5773433.4739399)),9.8118,2.133,1,a6bd9e54-ea49-11e6-9dde-f3016d8ddba6
3,a95be0d7-ed66-11e6-b7c6-27e3c1d64dca,(POINT (-12462004.805999 5428281.5660944)),2.3135,0.661,0,a6bba27a-ea49-11e6-9dda-6f223c9e40e9
4,a95be0d8-ed66-11e6-b7c7-5f7265177b6d,(POINT (-12462043.960292 5429713.4660295)),3.248,0.928,0,a6bba27a-ea49-11e6-9dda-6f223c9e40e9
5,a95e422e-ed66-11e6-b7c8-e3f6a3ab424a,(POINT (-12462647.652125 5431213.0630325)),4.795,1.37,0,a6bba27a-ea49-11e6-9dda-6f223c9e40e9


# Groups the information by analytical unit, summing the fields hcilength and lengthkm. Next, the summed hcilength is divided by the total length of flowlines in the analytical unit to give the final length-weighted average HCI score per analytical unit (field name = lw_hci).
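
# Written as a formula, the calculation described above is as follows. The symbols are introduced here for illustration only: h_i is the HCI score (cumu_hci) of flowline i, l_i is its length in km (lengthkm), and u is an analytical unit. The product h_i l_i corresponds to the hcilength field computed in the query.

```latex
\mathrm{lw\_hci}_u = \frac{\sum_{i \in u} h_i \, l_i}{\sum_{i \in u} l_i}
```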

In [7]:
#reference =>  http://pandas.pydata.org/pandas-docs/stable/groupby.html
#Convert the numeric fields to float before summing
pointInPolys.hcilength = pointInPolys.hcilength.astype(float)
pointInPolys.lengthkm = pointInPolys.lengthkm.astype(float)
df = pd.DataFrame(pointInPolys, columns=['au_uri','hcilength','lengthkm'])
grouped = df.groupby('au_uri')
#Sum hcilength and lengthkm within each analytical unit
df_agg = grouped.sum()
#Move au_uri from the index back into a regular column
df_agg.reset_index(inplace=True)
df_agg.head()
#list(df_agg.columns.values)

Unnamed: 0,au_uri,hcilength,lengthkm
0,a6bba27a-ea49-11e6-9dda-6f223c9e40e9,10.3565,2.959
1,a6bd9e54-ea49-11e6-9dde-f3016d8ddba6,26.5144,5.764


In [8]:
df_agg['lw_hci'] = df_agg.hcilength / df_agg.lengthkm
df_agg.head()

Unnamed: 0,au_uri,hcilength,lengthkm,lw_hci
0,a6bba27a-ea49-11e6-9dda-6f223c9e40e9,10.3565,2.959,3.5
1,a6bd9e54-ea49-11e6-9dde-f3016d8ddba6,26.5144,5.764,4.6
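
# As a cross-check on the table above, the same length-weighted averages can be recomputed without pandas. This is a sketch only: the rows are copied from the spatial-join output shown earlier, the function name length_weighted_hci is hypothetical, and skipping rows with missing values is an assumed handling that the notebook itself does not address.

```python
from collections import defaultdict

# (au_uri, hcilength, lengthkm) values copied from the spatial-join output
rows = [
    ("a6bd9e54-ea49-11e6-9dde-f3016d8ddba6", 11.8312, 2.572),
    ("a6bd9e54-ea49-11e6-9dde-f3016d8ddba6", 4.8714, 1.059),
    ("a6bd9e54-ea49-11e6-9dde-f3016d8ddba6", 9.8118, 2.133),
    ("a6bba27a-ea49-11e6-9dda-6f223c9e40e9", 2.3135, 0.661),
    ("a6bba27a-ea49-11e6-9dda-6f223c9e40e9", 3.248, 0.928),
    ("a6bba27a-ea49-11e6-9dda-6f223c9e40e9", 4.795, 1.37),
]

def length_weighted_hci(rows):
    """Return {au_uri: sum(hcilength) / sum(lengthkm)}, skipping incomplete rows."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for au_uri, hcilength, lengthkm in rows:
        # Assumed handling: ignore rows lacking a unit, a score, or a length
        if au_uri is None or hcilength is None or lengthkm is None:
            continue
        totals[au_uri][0] += hcilength
        totals[au_uri][1] += lengthkm
    # Units with zero total length are left out to avoid dividing by zero
    return {au: h / l for au, (h, l) in totals.items() if l > 0}

lw_hci = length_weighted_hci(rows)
for au_uri, score in lw_hci.items():
    print(au_uri, round(score, 4))
```

# Up to floating-point rounding, the results agree with the lw_hci values of 3.5 and 4.6 in the table.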
