# Gage Assignment to Zip Code
_Calvin Whealton_

This notebook finds the stream gages nearest to each zip code's nominal location. Coordinates in the input files are decimal latitude and longitude; the code assigns them EPSG:4269 (NAD83) rather than WGS84 (EPSG:4326). They are then reprojected to the US National Atlas Equal Area projection (EPSG:2163), a projected coordinate reference system (CRS) whose units are meters, and the distance calculations are performed there.

In [None]:
import numpy as np
import os
import pandas as pd
import geopandas as gpd
from geopy.distance import geodesic
from shapely import wkt
from shapely.geometry import Point

## Zip Code Processing to point lat-long coordinates

Reading in the zip code (ZCTA) shapefile as a table. It includes the latitude and longitude of each zip code's internal point (`INTPTLAT10`, `INTPTLON10`). These will be used in the distance calculation.
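
The internal-point fields in TIGER/Line files are stored as signed, zero-padded text, which is why they need converting before they can be used as coordinates. A minimal sketch of that conversion in plain Python; the two sample strings are illustrative values in that format, not rows taken from the shapefile:

```python
# Illustrative values in the signed, zero-padded text format
# (assumed for this sketch, not read from the shapefile)
intptlat = "+18.1805555"
intptlon = "-066.7499615"

# float() accepts the leading sign and the zero padding directly
y = float(intptlat)
x = float(intptlon)

print(x, y)
```

The `astype(float)` calls in the next cells apply the same conversion to whole columns.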

In [None]:
os.chdir('/Users/calvinwhealton/Documents/GitHub/floods_housing_zipcode/data/geo_data/tl_2019_us_zcta510')
zip_data = gpd.read_file('tl_2019_us_zcta510.shp')

In [None]:
zip_data.head()

In [None]:
# converting from string (as read in) to float for use as coordinates later
zip_data['x'] = zip_data['INTPTLON10'].astype(float)
zip_data['y'] = zip_data['INTPTLAT10'].astype(float)

In [None]:
# dropping the geometry and other columns that are not needed from the shapefile
zip_data.drop(['geometry','CLASSFP10','MTFCC10','FUNCSTAT10'],axis=1,inplace=True)

In [None]:
# using x and y values to set geometry
zip_data_latlong = gpd.GeoDataFrame(
                    zip_data, geometry=gpd.points_from_xy(zip_data.x, zip_data.y))

In [None]:
zip_data_latlong.head()

In [None]:
# setting the coordinate reference system (crs) to NAD83 (EPSG:4269)
# set_crs replaces the deprecated {'init': ...} dictionary syntax
zip_data_latlong = zip_data_latlong.set_crs('EPSG:4269')

In [None]:
zip_data_latlong.crs

## Stream Gages

Reading in a dataset for the gages. This is supplementary information with more columns than are needed. The gages were pre-selected to have at least 20 years of peak flood data, which is used in assigning return periods to events.

In [None]:
os.chdir('/Users/calvinwhealton/Documents/GitHub/floods_housing_zipcode/data/gage_data')
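# reading site_no as text so that any leading zeros in the site numbers are preserved
gage_locs = pd.read_csv('usgs_supp.txt',sep='\t',comment='#',dtype={'site_no':str})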
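
The read above treats lines starting with `#` as comments and tabs as separators. The same parsing can be sketched in plain Python on made-up content; the site numbers and coordinates here are hypothetical, and keeping `site_no` as text is one way to make sure leading zeros survive:

```python
import csv
import io

# Hypothetical file contents: a '#' comment line, then a tab-separated table
raw = (
    "# example comment line\n"
    "site_no\tdec_lat_va\tdec_long_va\n"
    "01234567\t40.0\t-75.5\n"
    "07654321\t\t\n"
)

# Skip comment lines, then parse the remainder as tab-separated text
lines = [ln for ln in io.StringIO(raw) if not ln.startswith("#")]
rows = list(csv.DictReader(lines, delimiter="\t"))

# Keep only rows with both coordinates; site_no stays a string
gages = [
    (r["site_no"], float(r["dec_lat_va"]), float(r["dec_long_va"]))
    for r in rows
    if r["dec_lat_va"] and r["dec_long_va"]
]
print(gages)
```

The second made-up row has no coordinates and is dropped, mirroring the `dropna` step that follows.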

In [None]:
gage_locs.head()

In [None]:
# dropping locations without coordinates
gage_locs.dropna(subset=['dec_lat_va','dec_long_va'],inplace=True)

In [None]:
# converting to a geodataframe
gage_locs_gpd = gpd.GeoDataFrame(gage_locs,geometry=gpd.points_from_xy(gage_locs.dec_long_va, gage_locs.dec_lat_va))

In [None]:
# setting coordinate reference system to NAD83 (EPSG:4269)
gage_locs_gpd = gage_locs_gpd.set_crs('EPSG:4269')

In [None]:
gage_locs_gpd.head()

## Reprojecting dataframes for distance calcs
Decimal latitude and longitude are generally not suitable for distance calculations because the ground length of a degree of longitude varies with latitude. The dataframes are reprojected to a coordinate system more appropriate for distance calculation.
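
To illustrate why degrees are a poor distance unit, the sketch below measures one degree of longitude at two latitudes using a great-circle (haversine) formula. It assumes a spherical Earth with a mean radius of 6,371 km, and `haversine_m` is a helper written for this example, not part of the notebook's workflow:

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation (assumption)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat-long points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# One degree of longitude at two latitudes:
# the same separation in degrees, but a different distance in meters
d_south = haversine_m(30.0, -90.0, 30.0, -89.0)
d_north = haversine_m(45.0, -90.0, 45.0, -89.0)
print(round(d_south / 1000), "km vs", round(d_north / 1000), "km")
```

A Euclidean distance computed directly on the degree values would treat these two separations as equal, which is the error the reprojection avoids.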

In [None]:
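# keeping only the site number and geometry from the gage geodataframe
# (gage_locs itself is a plain dataframe and may not carry the geometry column)
gages_for_dist_calc = gage_locs_gpd[['site_no','geometry']].copy()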

In [None]:
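# setting coordinate reference system to NAD83 (EPSG:4269) before reprojecting
gages_for_dist_calc = gages_for_dist_calc.set_crs('EPSG:4269', allow_override=True)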

In [None]:
gages_for_dist_calc = gages_for_dist_calc.to_crs('EPSG:2163')

In [None]:
gages_for_dist_calc['x'] = gages_for_dist_calc['geometry'].x
gages_for_dist_calc['y'] = gages_for_dist_calc['geometry'].y

In [None]:
zip_locs_dist = zip_data_latlong.to_crs('EPSG:2163')

In [None]:
gages_for_dist_calc.head()

In [None]:
def closest_n_gage_to_zip(point,gages,n):
    '''
    function to find the n closest gages to the point
    point and gages are assumed to be projected into a consistent CRS
    CRS should be appropriate for distance calculation
    point = location of interest (zip code internal point, as a shapely point)
    gages = dataframe of gages with columns 'site_no', 'x', and 'y'
    n = number of nearest gages to return
    returns a dataframe of the n closest gages with their Euclidean distance
    
    this function is slow, but it rarely needs to be computed
    
    '''
    
    # setting up dataframe that will be returned
    dist_gage = pd.DataFrame(columns=['site_no','dist'])
    
    # copying the site numbers from the gage dataframe
    dist_gage['site_no'] = gages['site_no']
    
    # calculating Euclidean distance in the units of the projected CRS
    dist_gage['dist'] = np.hypot(np.array(point.x-gages['x']), np.array(point.y-gages['y']))
                 
    # sorting the results from nearest to farthest
    dist_gage.sort_values(by=['dist'],inplace=True)
    
    return dist_gage.iloc[0:n]
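
The same nearest-n logic can be sketched without pandas to check it on a toy case. This is a minimal illustration, not the notebook's function: `closest_n`, the tuple and dict inputs, and the toy coordinates are all hypothetical, and `heapq.nsmallest` is swapped in for the full sort:

```python
import heapq
import math

def closest_n(point, gages, n):
    """Return the n (site_no, distance) pairs nearest to point.

    point is an (x, y) tuple and gages maps site_no -> (x, y),
    all in the same projected CRS (hypothetical toy inputs).
    """
    distances = (
        (site, math.hypot(point[0] - gx, point[1] - gy))
        for site, (gx, gy) in gages.items()
    )
    # nsmallest keeps only the n best candidates instead of sorting everything
    return heapq.nsmallest(n, distances, key=lambda pair: pair[1])

toy_gages = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (10.0, 0.0)}
nearest = closest_n((0.0, 1.0), toy_gages, 2)
print(nearest)
```

Gage "A" is 1 unit away and "B" is the next closest, so the two returned pairs come back in that order.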

In [None]:
# reprojecting
zip_locs_dist = zip_data_latlong.to_crs('EPSG:2163')

In [None]:
# making columns for gages and distances to gage
# initialized as missing so unfilled slots are not mistaken for gage 0 at distance 0
for j in range(10):
    zip_locs_dist['gage'+str(j)] = None
    zip_locs_dist['dist'+str(j)] = np.nan

In [None]:
# loop to calculate all the distances
# this loop is slow, but it rarely has to be computed

for ind in zip_locs_dist.index:
    
    # zip code location evaluated in the loop
    zip_loc = zip_locs_dist.loc[ind,'geometry']
    
    # finding closest gages
    closest_gages = closest_n_gage_to_zip(zip_loc,gages_for_dist_calc,10)
    
    # storing the gage and distance values in the initialized columns
    for j in range(closest_gages.shape[0]):
        zip_locs_dist.loc[ind,'dist'+str(j)] = closest_gages['dist'].values[j]
        zip_locs_dist.loc[ind,'gage'+str(j)] = closest_gages['site_no'].values[j]
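
The loop above writes each zip code's nearest gages into numbered columns (`gage0`, `dist0`, and so on). A plain-Python sketch of that layout for a single row; `to_wide_row` and the sample site numbers and distances are hypothetical:

```python
def to_wide_row(nearest, n):
    """Flatten [(site_no, dist), ...] into gage0..gage{n-1} and dist0..dist{n-1}.

    Slots beyond len(nearest) stay None, so a short result is not
    mistaken for a gage at distance 0.
    """
    row = {}
    for j in range(n):
        site, dist = nearest[j] if j < len(nearest) else (None, None)
        row["gage" + str(j)] = site
        row["dist" + str(j)] = dist
    return row

# Two made-up gages placed into three slots; the third slot stays empty
row = to_wide_row([("01234567", 1250.0), ("07654321", 4300.5)], 3)
print(row)
```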


In [None]:
zip_locs_dist.head()

In [None]:
os.chdir('/Users/calvinwhealton/Documents/GitHub/floods_housing_zipcode/data')
zip_locs_dist.to_csv('zip_gage_dist_2020-08-10.csv')