## Assigning attribute values to Oregon tax lot parcels

### Summary
In this notebook we add descriptive attributes to Oregon tax lot polygons for use in Landmapper. These attributes are displayed on the first page of the Landmapper map package. 
* **ID** - fieldname: *id*, source: create, type: double
* **Acres** - fieldname: *acres*, source: create, type: double
* **Elevation range** - fieldnames: *min_ft*, *max_ft* (*MIN*, *MAX* in the tax lot file), source: DEM, type: double
* **Legal Description** - fieldname: *legalDesc*, source: BLM PLSS, type: text
* **County** - fieldname: *county*, source: parcels, type: text
* **Forest Fire District** - fieldname: *odf_fpd*, source: ODF forest protection districts, type: text
* **Structure Fire District** - fieldname: *agency*, source: OSFM structural fire districts, type: text
* **Land use** - fieldname: *landuse* (*orzdesc* in this notebook's output), source: Oregon zoning, type: text
* **Watershed Name** - fieldname: *name*, source: USGS WBD, type: text
* **Watershed (HUC)** - fieldname: *huc12*, source: USGS WBD, type: text
* **Coordinates** - fieldnames: *lat*, *lon*
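The *acres* field is listed as created, but it is not computed in this notebook. A minimal sketch, assuming the tax lots are first reprojected to an equal-area CRS in meters (EPSG:5070, CONUS Albers, chosen here only for illustration); the toy polygons are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box

SQ_M_PER_ACRE = 4046.8564224  # one international acre in square meters

# Hypothetical tax lots already in EPSG:5070 (units: meters);
# real tax lots would first be reprojected with lots.to_crs(5070)
lots = gpd.GeoDataFrame(
    {"id": [1, 2]},
    geometry=[box(0, 0, 100, 40.468564224), box(0, 0, 201.168, 201.168)],
    crs="EPSG:5070",
)

# polygon area in square meters divided by square meters per acre
lots["acres"] = lots.geometry.area / SQ_M_PER_ACRE
```

The second lot is 660 ft on a side, so it should come out at 10 acres. Measuring in a geographic CRS (degrees) would give meaningless areas, which is why the reprojection step matters.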

**Sources**
* Parcels - https://ormap.net/ (use the 'Fetch Parcel Data from ORMAP' notebook to download tax lots)
* Zoning - https://www.oregon.gov/lcd/about/pages/maps-data-tools.aspx
* Legal Description - https://gis.blm.gov/orarcgis/rest/services/Land_Status/BLM_OR_PLSS/MapServer
* Watersheds - https://hydro.nationalmap.gov/arcgis/rest/services/wbd/MapServer
* Forest Fire District - https://www.oregon.gov/odf/aboutodf/pages/mapsdata.aspx
* Structure Fire District - https://osfm-geo.hub.arcgis.com/datasets/structural-fire-districts/explore

In [1]:
%load_ext autotime
import os

import pandas as pd
import geopandas as gpd
import numpy as np
import dask_geopandas
import dask.dataframe
from dask.distributed import Client, LocalCluster
from rasterstats import zonal_stats
import rasterio 

In [2]:
# PROJECT PATHS
# also stored on knowsys at Landmapper_2020/Data
TAXLOTS = "../data/merge_taxlots_110823_ele.shp"
ZONING = "../data/OR_source/Oregon_Zoning_2017/Oregon_Zoning_2017.shp"
WATERSHED = "../data/OR_source/WBD_OR.gdb"
COUNTY_CODES = "../data/OR_source/ORCountyCodes.csv"
PLSS = "../data/OR_source/CadNSDI_PLSS_web.gdb"
PLSS_LAYER = "PLSSIntersected"
FOREST_FIRE = "../data/OR_source/Boundaries_Odf_Districts_Forest_Protection.gdb"
FOREST_LAYER = "Boundaries_Odf_Districts_Forest_Protection"
STRUCTURE_FIRE = "../data/OR_source/BoundariesStructuralFireProtectionDistricts100K/Boundaries_Structural_Fire_Protection_Districts_100K.gdb"
STRUCTURE_LAYER = "Boundaries_Structural_Fire_Protection_Districts_100K"

time: 1.07 ms


### Load and preprocess tax lots

Zonal statistics on the DEM data, to assign MIN and MAX elevation values to each tax lot, still need to be implemented in this notebook. This step is currently done outside this process, so the MIN/MAX values are already in the TAXLOTS file.

In [132]:
# read in parcels 
OR = gpd.read_file(TAXLOTS)
# grab crs
crs = OR.crs

time: 6min 35s


In [133]:
OR.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1801839 entries, 0 to 1801838
Data columns (total 14 columns):
 #   Column      Dtype   
---  ------      -----   
 0   id          int64   
 1   OBJECTID    int64   
 2   MapNumber   object  
 3   ORMapNum    object  
 4   Taxlot      object  
 5   MapTaxlot   object  
 6   ORTaxlot    object  
 7   County      int64   
 8   RefLink     object  
 9   Shape_Leng  float64 
 10  Shape_Area  float64 
 11  MIN         float64 
 12  MAX         float64 
 13  geometry    geometry
dtypes: float64(4), geometry(1), int64(3), object(6)
memory usage: 192.5+ MB
time: 94.3 ms


In [135]:
#drop unneeded fields
OR.drop(['Shape_Leng', 'Shape_Area', 'RefLink', 'MapNumber', 'ORMapNum'], axis=1, inplace=True)

time: 449 ms


In [137]:
# read in county code lookup table (path defined at top)
codes = pd.read_csv(COUNTY_CODES)
# join on the numeric County code
OR_county = pd.merge(OR, codes, on="County")
OR_county.drop('County', axis=1, inplace=True)
OR_county.rename(columns={'County_Name': 'county'}, inplace=True)

time: 1.49 s
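`pd.merge` defaults to an inner join, so a tax lot whose `County` code is missing from the lookup table would be silently dropped (the row count stays at 1,801,839 here, so none were). A sketch of a more defensive version using pandas' `how`, `validate`, and `indicator` options; the codes and county labels are hypothetical.

```python
import pandas as pd

# Hypothetical tax lots carrying a numeric county code
lots = pd.DataFrame({"id": [1, 2, 3], "County": [1, 2, 99]})
# Hypothetical lookup table (code 99 is deliberately missing)
codes = pd.DataFrame({"County": [1, 2], "County_Name": ["County A", "County B"]})

# validate="many_to_one" raises MergeError if a code appears twice in the lookup;
# how="left" with indicator=True keeps and flags lots whose code is not found
merged = pd.merge(lots, codes, on="County", how="left", validate="many_to_one", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
```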


### Join with attributes

In [84]:
def special_join(df, join_df):
    """
    Returns spatial join of attribute polygons onto tax lots, keeping
    one attribute polygon (the largest overlap) per tax lot
    
    Parameters
    ----------
    df : geodataframe
        attribute polygons (watersheds, PLSS sections, fire districts, zoning)
    join_df : geodataframe
        tax lot features with an 'id' column
        
    Returns
    -------
    out_df : geodataframe
        intersection of the two inputs, one row per tax lot 'id'
    """
    # reproject the attribute layer to the tax lot CRS so both inputs match
    out_df = df.to_crs(join_df.crs)
    out_df = gpd.overlay(join_df, out_df, how='intersection')
    #there might be multiple per taxlot, so choose the largest
    # (areas are only compared within one tax lot, so CRS units are sufficient)
    out_df['area'] = out_df.geometry.area
    #sort by area
    out_df.sort_values(by='area', inplace=True)
    #drop duplicates, keep largest/last
    out_df.drop_duplicates(subset='id', keep='last', inplace=True)
    out_df.drop(columns=['area'], inplace=True)
    return out_df

time: 704 µs
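The largest-overlap rule in `special_join` can be checked on toy data. This sketch repeats its steps (overlay, area, sort, keep last per `id`) inline rather than calling the function; the tax lots, districts, and the EPSG:2992 CRS are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical tax lots and districts in a shared projected CRS
lots = gpd.GeoDataFrame(
    {"id": [1, 2]},
    geometry=[box(0, 0, 10, 10), box(20, 0, 30, 10)],
    crs="EPSG:2992",
)
districts = gpd.GeoDataFrame(
    {"district": ["A", "B"]},
    geometry=[box(0, 0, 7, 10), box(7, 0, 40, 10)],
    crs="EPSG:2992",
)

# intersection pieces: lot 1 splits into A (area 70) and B (area 30); lot 2 lies in B
pieces = gpd.overlay(lots, districts, how="intersection")
pieces["area"] = pieces.geometry.area

# sort ascending by area and keep the last (largest) piece for each tax lot id
best = pieces.sort_values("area").drop_duplicates(subset="id", keep="last")
assignment = {int(i): d for i, d in zip(best["id"], best["district"])}
```

Lot 1 straddles the boundary and is assigned to A because 70 of its 100 square units fall there.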


In [85]:
join = OR[['id', 'geometry']]

time: 82.2 ms


Watersheds are assigned at the subwatershed level, including the name and HUC12 code.
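Hydrologic unit codes are hierarchical: each WBD level adds two digits, so the HUC10 watershed and HUC8 subbasin of a subwatershed are prefixes of its HUC12. That is also why HUC12 is kept as text (`object` in the output), which preserves leading zeros. A short sketch with hypothetical codes and hypothetical derived columns:

```python
import pandas as pd

# Hypothetical HUC12 codes stored as text
huc = pd.DataFrame({"huc12": ["170900120101", "170900120102", "171003030405"]})

# coarser hydrologic units are prefixes of the 12-digit code
huc["huc10"] = huc["huc12"].str[:10]  # watershed
huc["huc8"] = huc["huc12"].str[:8]    # subbasin
```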

In [86]:
# read in Watershed (HUC) polygons
gdf = gpd.read_file(WATERSHED).to_crs(crs)
water = gdf[['Name', 'HUC12', 'geometry']]

time: 7.48 s


In [87]:
# spatial join 
OR_huc = special_join(water, join)
huc_out = pd.DataFrame(OR_huc[['id', 'Name', 'HUC12']])
huc_out.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 1801839 entries, 418917 to 1771308
Data columns (total 3 columns):
 #   Column  Dtype 
---  ------  ----- 
 0   id      int64 
 1   Name    object
 2   HUC12   object
dtypes: int64(1), object(2)
memory usage: 55.0+ MB
time: 7min 45s


The legal description is pulled from PLSS data: the section, plus the township and range label.

In [109]:
# read in PLSS dataset
plss = gpd.read_file(PLSS, driver='FileGDB', layer=PLSS_LAYER)
plss = plss[['TWNSHPLAB', 'FRSTDIVNO', 'geometry']]
plss = plss.rename(columns={'TWNSHPLAB': 'township', 'FRSTDIVNO': 'section'})

time: 21min 50s


In [110]:
# build the legal description label, e.g. "S12 (<township label>)"
plss['LegalDesc'] = (plss.apply(lambda x: "S{} ({})".format(x.section, x.township), axis=1))
plss = plss[['LegalDesc', 'geometry']]

time: 25 s
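The row-wise `apply` above calls a Python lambda once per PLSS polygon. The same `S{section} ({township})` label can be built with vectorized string concatenation, which may be faster on 2 million rows (not benchmarked here). The township label strings below are illustrative only.

```python
import pandas as pd

# Hypothetical PLSS rows; the township label strings are illustrative only
plss = pd.DataFrame({"township": ["T1S R2E", "T10N R5W"], "section": [12, 3]})

# vectorized string concatenation builds the same "S{section} ({township})" label
plss["LegalDesc"] = "S" + plss["section"].astype(str) + " (" + plss["township"] + ")"
```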


In [111]:
plss.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 2015077 entries, 0 to 2015076
Data columns (total 2 columns):
 #   Column     Dtype   
---  ------     -----   
 0   LegalDesc  object  
 1   geometry   geometry
dtypes: geometry(1), object(1)
memory usage: 30.7+ MB
time: 5.89 ms


In [112]:
# spatial join 
OR_plss = special_join(plss, join)
plss_out = pd.DataFrame(OR_plss[['id', 'LegalDesc']])
plss_out.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 1800362 entries, 121220 to 2139398
Data columns (total 2 columns):
 #   Column     Dtype 
---  ------     ----- 
 0   id         int64 
 1   LegalDesc  object
dtypes: int64(1), object(1)
memory usage: 41.2+ MB
time: 6min 22s


Forest Fire Districts

In [113]:
# read in forest fire district data
fire = gpd.read_file(FOREST_FIRE, driver="FileGDB", layer = FOREST_LAYER)
fire = fire[['ODF_FPD', 'geometry']]

time: 213 ms


In [114]:
#spatial join with tax lots
OR_fire = special_join(fire, join)
fire_out = pd.DataFrame(OR_fire[['id', 'ODF_FPD']])
fire_out.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 708014 entries, 311416 to 499303
Data columns (total 2 columns):
 #   Column   Non-Null Count   Dtype 
---  ------   --------------   ----- 
 0   id       708014 non-null  int64 
 1   ODF_FPD  708014 non-null  object
dtypes: int64(1), object(1)
memory usage: 16.2+ MB
time: 29min 4s


Structural Fire Districts

In [118]:
# read in structure fire district data
struct = gpd.read_file(STRUCTURE_FIRE, driver="FileGDB", layer = STRUCTURE_LAYER)
struct = struct[['Agency', 'geometry']]

time: 1.43 s


In [119]:
#spatial join with tax lots
OR_struct = special_join(struct, join)
struct_out = pd.DataFrame(OR_struct[['id', 'Agency']])
struct_out.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 1698982 entries, 480006 to 1329845
Data columns (total 2 columns):
 #   Column  Dtype 
---  ------  ----- 
 0   id      int64 
 1   Agency  object
dtypes: int64(1), object(1)
memory usage: 38.9+ MB
time: 30min 14s


Oregon Zoning (land use)

In [122]:
# read in Zoning data
zone = gpd.read_file(ZONING)
zone = zone[['orZDesc', 'geometry']]

time: 43.5 s


In [123]:
#spatial join with tax lots
OR_zone = special_join(zone, join)
zone_out = pd.DataFrame(OR_zone[['id', 'orZDesc']])
zone_out.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 1793947 entries, 644587 to 1836512
Data columns (total 2 columns):
 #   Column   Dtype 
---  ------   ----- 
 0   id       int64 
 1   orZDesc  object
dtypes: int64(1), object(1)
memory usage: 41.1+ MB
time: 17h 54min 13s
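The zoning join took almost 18 hours. One option that may be worth trying (untested on this dataset) is to skip building the full overlay output: find candidate pairs with `gpd.sjoin`, then compute only the intersection areas. Only `id` and the attribute column are kept after each join, so the intersection geometry is not needed. `largest_overlap_join` is a hypothetical helper, not part of geopandas; it should give the same one-value-per-lot result as `special_join`.

```python
import geopandas as gpd
from shapely.geometry import box


def largest_overlap_join(lots, attrs, value_col):
    """Hypothetical helper: give each lot the value of the attrs polygon it overlaps most."""
    attrs = attrs.to_crs(lots.crs)
    # candidate (lot, attribute polygon) pairs from the spatial index plus an intersects test
    pairs = gpd.sjoin(
        lots[["id", "geometry"]],
        attrs[[value_col, "geometry"]],
        how="inner",
        predicate="intersects",
    )
    # fetch both geometries for each pair and line them up by position
    left = lots.geometry.loc[pairs.index].reset_index(drop=True)
    right = attrs.geometry.loc[pairs["index_right"]].reset_index(drop=True)
    pairs = pairs.reset_index(drop=True)
    # keep only the overlap area, not the overlap geometry
    pairs["area"] = left.intersection(right).area
    # drop pairs that merely touch (overlay would not return them as polygons)
    pairs = pairs[pairs["area"] > 0]
    best = pairs.sort_values("area").drop_duplicates(subset="id", keep="last")
    return best[["id", value_col]].reset_index(drop=True)


# Usage on hypothetical toy data
lots = gpd.GeoDataFrame(
    {"id": [1, 2]}, geometry=[box(0, 0, 10, 10), box(20, 0, 30, 10)], crs="EPSG:2992"
)
zones = gpd.GeoDataFrame(
    {"zone": ["A", "B"]}, geometry=[box(0, 0, 7, 10), box(7, 0, 40, 10)], crs="EPSG:2992"
)
result = largest_overlap_join(lots, zones, "zone")
```

Whether this is actually faster depends on how complex the zoning polygons are, so it would need to be timed on a subset first.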


Combine and export 

In [138]:
# merge dataframes
export = OR_county.merge(huc_out, on='id', how='left')
export = export.merge(plss_out, on='id', how='left')
export = export.merge(fire_out, on='id', how='left')
export = export.merge(struct_out, on='id', how='left')
export = export.merge(zone_out, on='id', how='left')
export.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 1801839 entries, 0 to 1801838
Data columns (total 15 columns):
 #   Column     Dtype   
---  ------     -----   
 0   id         int64   
 1   OBJECTID   int64   
 2   Taxlot     object  
 3   MapTaxlot  object  
 4   ORTaxlot   object  
 5   MIN        float64 
 6   MAX        float64 
 7   geometry   geometry
 8   county     object  
 9   Name       object  
 10  HUC12      object  
 11  LegalDesc  object  
 12  ODF_FPD    object  
 13  Agency     object  
 14  orZDesc    object  
dtypes: float64(2), geometry(1), int64(2), object(10)
memory usage: 220.0+ MB
time: 8.1 s
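The five left merges can also be written as a fold over the join outputs. Adding `validate="one_to_one"` would catch a join that left duplicate ids. A per-column non-null share then shows coverage; for example, the forest fire join matched only 708,014 of the 1,801,839 tax lots. The tables below are hypothetical.

```python
from functools import reduce

import pandas as pd

# Hypothetical base table and per-attribute join outputs (one row per id, some ids missing)
base = pd.DataFrame({"id": [1, 2, 3]})
huc_out = pd.DataFrame({"id": [1, 2, 3], "HUC12": ["170900120101", "170900120101", "171003030405"]})
fire_out = pd.DataFrame({"id": [2], "ODF_FPD": ["District X"]})

# chain the left merges; validate="one_to_one" raises if either side has duplicate ids
export = reduce(
    lambda left, right: left.merge(right, on="id", how="left", validate="one_to_one"),
    [huc_out, fire_out],
    base,
)

# share of tax lots that received a value for each attribute
coverage = export.drop(columns="id").notna().mean()
```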


In [145]:
# work on an explicit copy so insert/rename do not trigger SettingWithCopyWarning
export_sub = export[['id', 'ODF_FPD', 'Agency', 'orZDesc', 'HUC12', 'Name', 'LegalDesc', 'MIN', 'MAX', 'OBJECTID','county', 'geometry']].copy()
export_sub.insert(9, 'source', 'ORMAP')
export_sub.insert(11, 'map_id', '')
export_sub.rename(columns={'ODF_FPD': 'odf_fpd', 'Agency': 'agency', 'orZDesc':'orzdesc', 'HUC12':'huc12', 'Name':'name', 'OBJECTID':'map_taxlot', 'LegalDesc':'legalDesc'}, inplace=True)
export_sub.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 1801839 entries, 0 to 1801838
Data columns (total 14 columns):
 #   Column      Dtype   
---  ------      -----   
 0   id          int64   
 1   odf_fpd     object  
 2   agency      object  
 3   orzdesc     object  
 4   huc12       object  
 5   name        object  
 6   legalDesc   object  
 7   MIN         float64 
 8   MAX         float64 
 9   source      object  
 10  map_taxlot  int64   
 11  map_id      object  
 12  county      object  
 13  geometry    geometry
dtypes: float64(2), geometry(1), int64(2), object(9)
memory usage: 206.2+ MB
time: 221 ms


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  export_sub.rename(columns={'ODF_FPD': 'odf_fpd', 'Agency': 'agency', 'orZDesc':'orzdesc', 'HUC12':'huc12', 'Name':'name', 'OBJECTID':'map_taxlot', 'LegalDesc':'legalDesc'}, inplace=True)


In [142]:
EXPORT = '../data/OR_Attributes.csv'
export_sub.to_csv(EXPORT, encoding='utf-8', index=False)

time: 1min 42s


In [146]:
EXPORT = '../data/OR_Attributes.shp'
export_sub.to_file(EXPORT)

time: 8min 10s
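Shapefiles store attributes in a dBASE table whose field names are limited to 10 characters, and longer names are truncated on write. All of the `export_sub` columns fit (`map_taxlot` is exactly 10). A small pre-export check, written as a hypothetical helper:

```python
# dBASE (.dbf) field names in a shapefile are limited to 10 characters
MAX_DBF_NAME = 10


def long_field_names(columns, limit=MAX_DBF_NAME):
    """Return column names (other than geometry) that exceed the shapefile limit."""
    return [c for c in columns if c != "geometry" and len(c) > limit]


cols = ["id", "odf_fpd", "agency", "orzdesc", "huc12", "name", "legalDesc",
        "MIN", "MAX", "source", "map_taxlot", "map_id", "county", "geometry"]
too_long = long_field_names(cols)
```

In the notebook this could be called as `long_field_names(export_sub.columns)` before `to_file`.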


In [3]:
# reload the exported attribute shapefile
OR = gpd.read_file('../data/OR_Attributes.shp')

time: 6min 20s


In [None]:
# #set up a local cluster with 32 worker processes (5 threads each)
# client = Client(
#     LocalCluster(
#         n_workers = 32,
#         processes=True,
#         threads_per_worker=5
#     )
# )

# #create dask dataframe
# OR_dask = dask_geopandas.from_geopandas(OR_county, npartitions=160)
# OR_dask.info()
# # note: predicate='within' keeps only tax lots entirely inside one watershed
# test_join = dask_geopandas.sjoin(OR_dask, water, predicate='within')
# r = test_join.compute()