# Example notebook for creating GIS features for well exposure model

Features to create:
* Type of soil using polygon layer for a given well location
* Slope from a source to well


****
Notes on feature creation:

- Type of soil using polygon layer: Point-within-polygon calculation

- Slope from a source to well
    * Point-within-polygon lookup to get the elevation at the well point and at the source point from the elevation polygon map
    * Subtract the two elevations
    * Open question: with multiple sources, use the slope to the nearest source or the average slope across all sources?
****
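
The point-within-polygon lookup in the notes above is done later in this notebook with `gpd.sjoin`. As a rough illustration of what that lookup does, here is a minimal ray-casting sketch in plain Python. The function names and the two square "soil units" are hypothetical and made up for the example; real soil polygons are far more complex, and geopandas uses its own geometry engine rather than this code.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]   # (x, y), e.g. (lon, lat)
Ring = Sequence[Point]        # polygon vertices in order; last edge wraps to the first


def point_in_polygon(pt: Point, ring: Ring) -> bool:
    """Ray casting: count edge crossings of a ray running to the right of pt."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lookup_attribute(pt: Point, units: List[Tuple[str, Ring]]) -> Optional[str]:
    """Return the label of the first polygon containing pt, or None if there is none."""
    for label, ring in units:
        if point_in_polygon(pt, ring):
            return label
    return None


# Hypothetical map units (made-up squares, not real soil data)
soil_units = [
    ("230A", [(0, 0), (2, 0), (2, 2), (0, 2)]),
    ("505C", [(2, 0), (4, 0), (4, 2), (2, 2)]),
]

musym = lookup_attribute((1.0, 1.0), soil_units)    # point inside the first square
outside = lookup_attribute((5.0, 5.0), soil_units)  # point inside neither square
```

A point that falls in no polygon returns `None` here. In an inner spatial join, such a point is dropped instead, so row counts can shrink after the join.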

Features to Explore:
* Slope aspect

Features Created:
* Slope phase
* Soil series (primary --> tertiary)
* Soil texture (primary --> tertiary)
* Map Unit groups (See Legend.pdf last page)

Features already made:
* Soil classification - MUSYM

* Bedrock depth - BROCKDEPMI

* Water table depth (annual minimum) - WTDEPANNMI

* Flooding/ponding frequency
    * dominant - FLODFREQDC
    * maximum - FLODFREQMA
    * ponding frequency - PONDFREQPR

* Drainage class
    * dominant - DRCLASSDCD
    * wettest - DRCLASSWET

* Hydrologic group - HYDGRPDCD

* Slope - SLOPE

* Slope gradient
    * dominant - SLOPEGRADD
    * weighted average - SLOPEGRADW

In [2]:
import pandas as pd
import geopandas as gpd
import numpy as np
import model_utils

In [48]:
created_features = ['map_unit_groups', 
                    'slope_phase', 
                    'primary_series', 'secondary_series', 'tertiary_series',
                    'primary_texture', 'secondary_texture', 'tertiary_texture',
                    'extra_desc',
                    'elevation',
                    'slope_aspect']

existing_features = ["MUSYM",
                     "MUNAME",
                     "BROCKDEPMI",
                     "WTDEPANNMI",
                     "FLODFREQDC", "FLODFREQMA", "PONDFREQPR", 
                     "DRCLASSDCD", "DRCLASSWET",
                     "HYDGRPDCD",
                     "SLOPE",
                     "SLOPEGRADD", "SLOPEGRADW"]

## Private Well Sampling Lab Reports

In [49]:
private_well_gdf = gpd.read_file('../../data/modeling_data/well_exposure/base_samples/private_well_gdf.geojson')


# get all unique private wells with their coordinates
private_well_locations = private_well_gdf[['geometry']]
private_well_locations = private_well_locations.drop_duplicates().reset_index(drop = True)

orig_private_well_locations = private_well_locations.copy()

In [50]:
private_well_gdf.rename(columns = {'date_sampled_well_well' : 'date_sampled_well'}, inplace = True)

In [51]:
private_well_gdf.shape

(456, 21)

In [52]:
private_well_locations.shape

(221, 1)

#### Attach geographic features dependent upon coordinate location 

Elevation

In [53]:
elevation = []
for location in private_well_locations['geometry']:
    elevation.append(model_utils.get_elevation(location.y, location.x))
    
private_well_locations['elevation'] = elevation

In [54]:
private_well_locations.shape

(221, 2)

Soil

In [3]:
soil_df = gpd.read_file("zip://../../data/features/soil_features.zip")

In [56]:
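# `predicate` replaces the deprecated `op` argument (geopandas >= 0.10).
# sjoin is an inner join by default, so wells outside every soil polygon are dropped.
private_well_locations = gpd.sjoin(private_well_locations, soil_df, predicate='within')
private_well_locations.drop(columns = ['index_right'], inplace = True)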

In [57]:
private_well_locations.shape

(220, 23)

Slope aspect

In [58]:
private_well_locations['slope_aspect'] = model_utils.get_slope_aspect(raster_path = '../../data/features/slope_aspect/aspect_compressed_5000_4700.tif',
                                                                      locations_df = private_well_locations)
private_well_locations['slope_aspect'] = private_well_locations['slope_aspect'].replace({-9999 : 0})
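
Replacing the `-9999` no-data value with `0` makes missing cells indistinguishable from cells with an aspect of 0 degrees. Aspect is also circular, so 359 and 1 are close in direction but far apart as raw numbers. The sketch below shows one possible alternative encoding, assuming the raster stores aspect in degrees clockwise from north (the notebook does not state the convention). `encode_aspect` is a hypothetical helper, not part of `model_utils`.

```python
import math
from typing import Tuple

NODATA = -9999  # no-data value that the notebook replaces with 0


def encode_aspect(aspect_deg: float) -> Tuple[float, float, bool]:
    """Return (sin, cos, is_missing) for an aspect given in degrees.

    Assumes degrees clockwise from north. Missing values become NaN
    plus an explicit flag instead of being mapped to 0 degrees.
    """
    if aspect_deg == NODATA:
        return (math.nan, math.nan, True)
    rad = math.radians(aspect_deg % 360.0)
    return (math.sin(rad), math.cos(rad), False)


north = encode_aspect(0.0)        # a real 0-degree aspect
missing = encode_aspect(NODATA)   # no data, flagged rather than set to 0
southeast = encode_aspect(135.0)  # aspect value seen in the sample output
```

Whether the extra columns help depends on the downstream model; tree-based models may cope with the raw angle, while linear models usually benefit from the sine/cosine pair.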

In [59]:
# Suffix every feature column with "_well"; names missing from the frame are ignored
private_well_locations.rename(columns = {feature : f'{feature}_well' for feature in created_features + existing_features},
                              inplace = True)

In [60]:
private_well_locations.shape

(220, 24)

In [61]:
private_well_locations.head(2)

Unnamed: 0,geometry,elevation_well,MUSYM_well,BROCKDEPMI_well,WTDEPANNMI_well,FLODFREQDC_well,FLODFREQMA_well,PONDFREQPR_well,DRCLASSDCD_well,DRCLASSWET_well,...,map_unit_groups_well,slope_phase_well,primary_series_well,secondary_series_well,tertiary_series_well,primary_texture_well,secondary_texture_well,tertiary_texture_well,extra_desc_well,slope_aspect_well
0,POINT (-72.47534 42.64166),76,230A,,,,,0-14%,Well drained,Well drained,...,Excessively drained to somewhat poorly drained...,0 to 3 percent slopes,unadilla,,,silt loam,,,,0.0
1,POINT (-72.47399 42.64249),79,230A,,,,,0-14%,Well drained,Well drained,...,Excessively drained to somewhat poorly drained...,0 to 3 percent slopes,unadilla,,,silt loam,,,,135.0


###### Create final private well geodataframe

In [62]:
private_well_gdf = private_well_gdf.merge(private_well_locations, on = ['geometry'])

In [63]:
private_well_gdf.shape

(423, 44)

In [64]:
private_well_gdf.columns

Index(['RTN', 'date_sampled_well', 'sample_id', 'lab', 'Matrix', 'lat', 'lon',
       'NEtFOSAA_well', 'PFBS_well', 'PFDA_well', 'PFDoA_well', 'PFHpA_well',
       'PFHxA_well', 'PFHxS_well', 'PFNA_well', 'PFOA_well', 'PFOS_well',
       'PFTA_well', 'PFTrDA_well', 'PFUnA_well', 'geometry', 'elevation_well',
       'MUSYM_well', 'BROCKDEPMI_well', 'WTDEPANNMI_well', 'FLODFREQDC_well',
       'FLODFREQMA_well', 'PONDFREQPR_well', 'DRCLASSDCD_well',
       'DRCLASSWET_well', 'HYDGRPDCD_well', 'SLOPE_well', 'SLOPEGRADD_well',
       'SLOPEGRADW_well', 'map_unit_groups_well', 'slope_phase_well',
       'primary_series_well', 'secondary_series_well', 'tertiary_series_well',
       'primary_texture_well', 'secondary_texture_well',
       'tertiary_texture_well', 'extra_desc_well', 'slope_aspect_well'],
      dtype='object')

## Disposal Site Source Sampling Reports

In [65]:
# turn into geodataframe
disposal_source_gdf = gpd.read_file('../../data/modeling_data/well_exposure/base_samples/diposal_source_gdf.geojson')

# get all unique disposal sources with their coordinates
disposal_source_locations = disposal_source_gdf[['geometry']]
disposal_source_locations = disposal_source_locations.drop_duplicates().reset_index(drop = True)

In [66]:
disposal_source_gdf.columns

Index(['level_0_DS', 'RTN', 'index_DS', 'report', 'lab', 'sample_id', 'Matrix',
       'date_sampled_ds', 'lat', 'lon', 'NEtFOSAA_DS', 'PFBS_DS', 'PFDA_DS',
       'PFDoA_DS', 'PFHpA_DS', 'PFHxA_DS', 'PFHxS_DS', 'PFNA_DS', 'PFOA_DS',
       'PFOS_DS', 'PFTA_DS', 'PFTrDA_DS', 'PFUnA_DS', 'geometry'],
      dtype='object')

In [67]:
disposal_source_gdf.shape

(18, 24)

In [68]:
disposal_source_locations.shape

(18, 1)

#### Attach geographic features dependent upon coordinate location 

Elevation

In [69]:
elevation = []
for location in disposal_source_locations['geometry']:
    elevation.append(model_utils.get_elevation(location.y, location.x))
    
disposal_source_locations['elevation'] = elevation

Soil

In [70]:
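# `predicate` replaces the deprecated `op` argument (geopandas >= 0.10).
# sjoin is an inner join by default, so sources outside every soil polygon are dropped.
disposal_source_locations = gpd.sjoin(disposal_source_locations, soil_df, predicate='within')
disposal_source_locations.drop(columns = ['index_right'], inplace = True)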

Slope aspect
* Raster cells are about 50 m × 50 m

In [71]:
disposal_source_locations['slope_aspect'] = model_utils.get_slope_aspect(raster_path = '../../data/features/slope_aspect/aspect_compressed_5000_4700.tif',
                                                                      locations_df = disposal_source_locations)
disposal_source_locations['slope_aspect'] = disposal_source_locations['slope_aspect'].replace({-9999 : 0})

In [72]:
# Suffix every feature column with "_ds"; names missing from the frame are ignored
disposal_source_locations.rename(columns = {feature : f'{feature}_ds' for feature in created_features + existing_features},
                                 inplace = True)

In [73]:
disposal_source_locations.head(2)

Unnamed: 0,geometry,elevation_ds,MUSYM_ds,BROCKDEPMI_ds,WTDEPANNMI_ds,FLODFREQDC_ds,FLODFREQMA_ds,PONDFREQPR_ds,DRCLASSDCD_ds,DRCLASSWET_ds,...,map_unit_groups_ds,slope_phase_ds,primary_series_ds,secondary_series_ds,tertiary_series_ds,primary_texture_ds,secondary_texture_ds,tertiary_texture_ds,extra_desc_ds,slope_aspect_ds
0,POINT (-73.23162 42.52141),342,505C,,,,,0-14%,Well drained,Well drained,...,Excessively drained to somewhat poorly drained...,8 to 15 percent slopes,nellis,,,loam,,,,0.0
1,POINT (-71.49556 42.38547),69,256A,,69.0,,,0-14%,Moderately well drained,Moderately well drained,...,Excessively drained to somewhat poorly drained...,0 to 3 percent slopes,deerfield,,,loamy sand,,,,0.0


In [74]:
disposal_source_gdf = disposal_source_gdf.merge(disposal_source_locations, on = ['geometry'])

In [75]:
disposal_source_gdf.shape

(17, 47)

## Merge datasets
* Merge based on RTN
* Possible extension: also link private wells to disposal sites that are close by distance, not only those sharing an RTN
    * Look at the maximum distance among the current connections

In [76]:
PFAS_disposal_site_info = pd.read_parquet('../../data/disposal_sites/PFAS_Sites_2021-11-07_geocoded.parquet')

In [77]:
PFAS_disposal_site_info = PFAS_disposal_site_info[['RTN', 'Notif_Date']]

In [78]:
private_well_gdf['RTN'].unique()

array(['1-0021289', '2-0020439', '2-0020923', '2-0021045', '2-0021075',
       '3-0036649', '3-0036774', '4-0027571'], dtype=object)

In [79]:
disposal_source_gdf['RTN'].unique()

array(['1-0021230', '2-0020439', '2-0021045', '2-0021072', '2-0021075',
       '2-0021116', '2-0021349', '2-0021383', '2-0021446', '2-0021455',
       '2-0021541', '3-0036118', '3-0036649', '3-0036899', '3-0036926',
       '4-0027571', '2-0021682'], dtype=object)

In [80]:
final_df = private_well_gdf.merge(disposal_source_gdf, on = 'RTN')

In [81]:
final_df.shape

(219, 90)

##### After attaching, create additional features
* Slope from disposal site to well
* Days since release
* Distance from disposal site to well
* Direction (bearing) from disposal site to well (0 to 360)

In [82]:
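# Elevation difference, disposal site minus well (positive = disposal site is higher).
# This is a proxy for slope; it is not divided by horizontal distance.
final_df['elevation_ds_to_well'] = final_df['elevation_ds'] - final_df['elevation_well']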
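
The column above is an elevation difference rather than a slope. If an actual slope is wanted, the difference can be divided by the horizontal distance between the two points. The sketch below assumes both elevations and the distance are in meters; the notebook does not state the units returned by `model_utils.get_elevation`, so that is an assumption. The function names are hypothetical.

```python
import math


def slope_percent(elevation_ds_m: float, elevation_well_m: float,
                  horizontal_distance_m: float) -> float:
    """Percent slope from disposal site to well (positive = well is downhill)."""
    if horizontal_distance_m <= 0:
        return math.nan  # slope is undefined for coincident points
    drop = elevation_ds_m - elevation_well_m
    return 100.0 * drop / horizontal_distance_m


def slope_degrees(elevation_ds_m: float, elevation_well_m: float,
                  horizontal_distance_m: float) -> float:
    """Same slope expressed as an angle in degrees."""
    if horizontal_distance_m <= 0:
        return math.nan
    drop = elevation_ds_m - elevation_well_m
    return math.degrees(math.atan2(drop, horizontal_distance_m))


# A 5 m drop over 500 m of horizontal distance
pct = slope_percent(100.0, 95.0, 500.0)
deg = slope_degrees(100.0, 95.0, 500.0)
```

This needs a horizontal distance in the same length unit as the elevations, so a distance in decimal degrees would have to be converted first.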

In [83]:
final_df = final_df.merge(PFAS_disposal_site_info, on = 'RTN')

In [84]:
final_df['date_sampled_ds'] = pd.to_datetime(final_df['date_sampled_ds'])
final_df['date_sampled_well'] = pd.to_datetime(final_df['date_sampled_well'])
final_df['Notif_Date'] = pd.to_datetime(final_df['Notif_Date'])
# Days between the release notification and the well sample.
# .dt.days is used instead of .astype('timedelta64[D]'), which pandas 2.x no longer supports.
final_df['days_since_release'] = (final_df['date_sampled_well'] - final_df['Notif_Date']).dt.days

In [85]:
# final_df['days_since_release'] = np.where(final_df['date_sampled_ds'] == '01/01/2001', None, final_df['days_since_release'])
final_df['days_since_release'] = np.where(final_df['date_sampled_well'] == '01/01/2001', None, final_df['days_since_release'])

In [86]:
# Distance and bearing from each disposal site to its paired well.
# After the merge, geometry_x is the well point and geometry_y is the disposal site point.
distance_lst = []
bearing_lst = []
for _, row in final_df.iterrows():

    well_x = float(row['geometry_x'].x)
    well_y = float(row['geometry_x'].y)

    ds_x = float(row['geometry_y'].x)
    ds_y = float(row['geometry_y'].y)

    # Squared Euclidean distance in decimal degrees (no square root, so not a length in meters)
    dist = np.square(ds_x - well_x) + np.square(ds_y - well_y)

    bearing = model_utils.get_bearing(ds_y, ds_x, well_y, well_x)

    bearing_lst.append(bearing)
    distance_lst.append(dist)

In [87]:
final_df['distance_ds_to_well'] = distance_lst
final_df['bearing_ds_to_well'] = bearing_lst
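
The `dist` value computed in the loop above is a sum of squared differences in decimal degrees. It ranks nearby pairs reasonably well, but it is not a length, and a degree of longitude covers less ground than a degree of latitude away from the equator. If a distance in meters is preferred, a great-circle formula is one option. The sketch below uses the haversine formula and the standard initial-bearing formula. Both function names are hypothetical, and since the source of `model_utils.get_bearing` is not shown here, this bearing function is not necessarily what that helper does.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial bearing from point 1 to point 2, in degrees clockwise from north (0 to 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return math.degrees(math.atan2(y, x)) % 360.0


# Disposal site and well coordinates as they appear in the head() outputs of this notebook
ds_lat, ds_lon = 42.38547, -71.49556
well_lat, well_lon = 42.386142, -71.484495

distance_m = haversine_m(ds_lat, ds_lon, well_lat, well_lon)
bearing = initial_bearing_deg(ds_lat, ds_lon, well_lat, well_lon)
```

Both functions take latitude first, matching the argument order of the `get_bearing(ds_y, ds_x, well_y, well_x)` call in the loop.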

In [88]:
final_df.head(2)

Unnamed: 0,RTN,date_sampled_well,sample_id_x,lab_x,Matrix_x,lat_x,lon_x,NEtFOSAA_well,PFBS_well,PFDA_well,...,primary_texture_ds,secondary_texture_ds,tertiary_texture_ds,extra_desc_ds,slope_aspect_ds,elevation_ds_to_well,Notif_Date,days_since_release,distance_ds_to_well,bearing_ds_to_well
0,2-0020439,2019-07-30,107-WP,SGS,DW - Drinking Water,42.386142,-71.484495,0.915,0.721537,0.95,...,loamy sand,,,,0.0,5,2018-01-24,552.0,0.000123,85.295672
1,2-0020439,2019-07-30,115-WP,SGS,DW - Drinking Water,42.385724,-71.484219,0.915,0.874766,0.95,...,loamy sand,,,,0.0,5,2018-01-24,552.0,0.000129,88.258785


In [89]:
final_df.shape

(219, 95)

In [90]:
final_df = final_df.drop(columns = ['geometry_x', 'geometry_y'])

final_df = final_df.reset_index(drop = True)

final_df = pd.DataFrame(final_df)

final_df.to_csv('../../data/modeling_data/well_exposure/well_exposure_modeling_dataset.csv')

In [91]:
private_well_gdf.drop(columns = ['geometry']).to_parquet('../../data/modeling_data/well_exposure/well_exposure_private_well.parquet')
private_well_gdf.to_file('../../data/modeling_data/well_exposure/well_exposure_private_well.geojson', driver='GeoJSON')


This metadata specification does not yet make stability promises.  We do not yet recommend using this in a production setting unless you are able to rewrite your Parquet/Feather files.

  private_well_gdf.drop(columns = ['geometry']).to_parquet('../../data/modeling_data/well_exposure/well_exposure_private_well.parquet')


In [92]:
disposal_source_gdf.drop(columns = ['geometry']).to_parquet('../../data/modeling_data/well_exposure/well_exposure_disposal_source.parquet')
disposal_source_gdf.to_file('../../data/modeling_data/well_exposure/well_exposure_disposal_source.geojson', driver='GeoJSON')


This metadata specification does not yet make stability promises.  We do not yet recommend using this in a production setting unless you are able to rewrite your Parquet/Feather files.

  disposal_source_gdf.drop(columns = ['geometry']).to_parquet('../../data/modeling_data/well_exposure/well_exposure_disposal_source.parquet')


***
***