This notebook is one step in rebuilding the parcel-to-TM2 MAZ lookup. Its inputs are the previous lookup and several GIS layers created by spatial join and manual fixes; its output is a parcel-level lookup that carries both the old and the new MAZ designation.
This Asana task describes the full lookup creation process in more detail: https://app.asana.com/0/0/1200719394714319/f

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
import os
import fiona

In [2]:
# initial lookup table
old_lookup_file = 'C:\\Users\\{}\\Box\\Modeling and Surveys\\Urban Modeling\\Bay Area UrbanSim\\PBA50\\Current PBA50 Large General Input Data\\2020_08_17_parcel_to_maz22.csv'.format(os.getenv('USERNAME'))
old_lookup = pd.read_csv(old_lookup_file, usecols = ['PARCEL_ID', 'maz'])
print('Read {} rows of parcel_id/maz lookup table'.format(old_lookup.shape[0]))
old_lookup.rename(columns={'maz': 'maz_old'}, inplace=True)
print(old_lookup.dtypes)
display(old_lookup.head())

Read 1956208 rows of parcel_id/maz lookup table
PARCEL_ID    int64
maz_old      int64
dtype: object


   PARCEL_ID  maz_old
0     229116   310596
1     244166   331415
2     202378   310099
3    2004420   710778
4     340332   318182


In [3]:
# new tables created during the new spatial join and fix process
p10_maz_gdb = r'M:\Data\GIS layers\p10_TM2_maz\p10_maz\p10_maz.gdb'
print(fiona.listlayers(p10_maz_gdb))
lookup_spatial_join_raw = gpd.read_file(p10_maz_gdb, layer='p10_pba50_2020_07_16_MAZ_tbl')
lookup_spatial_join_fix_raw = gpd.read_file(p10_maz_gdb, layer='p10_no_MAZ_from_spatial_join_fix_tbl')

['p10_parcels_mazs_TM2_v2_2', 'p10_parcels_mazs_TM2_v2_2_Di', 'mazs_TM2_v2_2_Project', 'p10_pba50_2020_07_16', 'p10_pba50_2020_07_16_MAZ', 'p10_no_MAZ_from_spatial_join', 'p10_no_MAZ_from_spatial_join_tbl', 'p10_no_MAZ_from_spatial_join_fix', 'p10_no_MAZ_from_spatial_join_fix_tbl', 'p10_pba50_2020_07_16_MAZ_tbl']


In [4]:
# keep only the join fields; .copy() makes this an independent DataFrame so the
# assignments below do not raise SettingWithCopyWarning
lookup_spatial_join = lookup_spatial_join_raw[['PARCEL_ID', 'maz']].copy()
lookup_spatial_join.rename(columns = {'maz': 'maz_new'}, inplace=True)
# record where each MAZ came from; parcels the spatial join missed keep a blank source
lookup_spatial_join['maz_source'] = ''
lookup_spatial_join.loc[lookup_spatial_join.maz_new.notnull(), 'maz_source'] = 'spatial join'
# round and cast PARCEL_ID to int so it matches the int64 key in the old lookup
lookup_spatial_join['PARCEL_ID'] = lookup_spatial_join['PARCEL_ID'].round().astype('int64')


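Selecting columns from a DataFrame and then modifying the result in place is the pattern that can trigger pandas' SettingWithCopyWarning. The following is a minimal sketch of the same preparation steps on an explicit copy. The `raw` table and its values are made up for illustration and stand in for the geodatabase layer.

```python
import pandas as pd

# made-up stand-in for the table read from the geodatabase
raw = pd.DataFrame({
    'PARCEL_ID': [1.0, 2.0, 3.0],
    'maz': [10.0, None, 30.0],
    'other': ['a', 'b', 'c'],
})

# .copy() gives an independent DataFrame, so later writes cannot be
# mistaken for writes into a slice of `raw`
subset = raw[['PARCEL_ID', 'maz']].copy()
subset = subset.rename(columns={'maz': 'maz_new'})

# label the rows that received a MAZ; the rest keep an empty source
subset['maz_source'] = ''
subset.loc[subset['maz_new'].notnull(), 'maz_source'] = 'spatial join'

# cast the key to int64 so it can be merged with an integer key
subset['PARCEL_ID'] = subset['PARCEL_ID'].round().astype('int64')

print(subset)
```

The original `raw` table is left untouched, which also makes it safe to rerun the cell.
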
In [5]:
# .copy() avoids SettingWithCopyWarning on the in-place edits below
lookup_spatial_join_fix = lookup_spatial_join_fix_raw[['PARCEL_ID', 'maz', 'maz_source']].copy()
lookup_spatial_join_fix.rename(columns = {'maz': 'maz_fix', 'maz_source': 'maz_source_fix'}, inplace=True)
lookup_spatial_join_fix['PARCEL_ID'] = lookup_spatial_join_fix['PARCEL_ID'].round().astype('int64')

# 6 parcels were fixed manually by checking the map and the old lookup; their
# 'maz_source' is null in the layer, so label them 'oldLookup'
print(lookup_spatial_join_fix.loc[lookup_spatial_join_fix.maz_source_fix.isnull()].shape[0])
lookup_spatial_join_fix.loc[lookup_spatial_join_fix.maz_source_fix.isnull(), 'maz_source_fix'] = 'oldLookup'
print(lookup_spatial_join_fix.maz_source_fix.value_counts())

6
oldLookup                 3087
oldLookup999999_zeroHH    1088
Name: maz_source_fix, dtype: int64


In [6]:
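# combine the spatial-join result with the fix table
new_lookup = lookup_spatial_join.merge(lookup_spatial_join_fix, on='PARCEL_ID', how='outer')

# where the spatial join found no MAZ, take the MAZ and its source from the fix table
fix_idx = new_lookup.maz_new.isnull()
new_lookup.loc[fix_idx, 'maz_source'] = new_lookup.loc[fix_idx, 'maz_source_fix']
new_lookup.loc[fix_idx, 'maz_new'] = new_lookup.loc[fix_idx, 'maz_fix']

new_lookup = new_lookup[['PARCEL_ID', 'maz_new', 'maz_source']]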

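The merge-then-fill step can be tried on toy data. This is a sketch only: the three parcels and their MAZ values are invented, and the column names simply mirror the ones used in this notebook.

```python
import pandas as pd

# invented spatial-join result: parcel 2 received no MAZ
joined = pd.DataFrame({
    'PARCEL_ID': [1, 2, 3],
    'maz_new': [10.0, None, 30.0],
    'maz_source': ['spatial join', '', 'spatial join'],
})

# invented fix table covering only the parcel the join missed
fixes = pd.DataFrame({
    'PARCEL_ID': [2],
    'maz_fix': [20.0],
    'maz_source_fix': ['oldLookup'],
})

merged = joined.merge(fixes, on='PARCEL_ID', how='outer')

# fill only the rows where the spatial join left maz_new empty
needs_fix = merged['maz_new'].isnull()
merged.loc[needs_fix, 'maz_new'] = merged.loc[needs_fix, 'maz_fix']
merged.loc[needs_fix, 'maz_source'] = merged.loc[needs_fix, 'maz_source_fix']

result = merged[['PARCEL_ID', 'maz_new', 'maz_source']]
print(result)
```

Rows that already had a MAZ from the spatial join are not touched, so the fix table can never overwrite a spatial-join result.
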
In [7]:
# merge old and new lookup
lookup = old_lookup.merge(new_lookup, on='PARCEL_ID', how='outer')

# PARCEL_ID 2054503 (a synthetic parcel) is not in the new lookup, likely due to "repair geometry";
# use its maz value from the old lookup
print(lookup.loc[lookup.maz_new.isnull()])
lookup.loc[lookup.PARCEL_ID == 2054503, 'maz_new'] = lookup['maz_old']
lookup.loc[lookup.PARCEL_ID == 2054503, 'maz_source'] = 'oldLookup'

# add a field flagging whether the old and new MAZ agree
lookup['compare'] = 'same'
lookup.loc[lookup.maz_old != lookup.maz_new, 'compare'] = 'diff'

      PARCEL_ID  maz_old  maz_new maz_source
1833    2054503   999999      NaN        NaN

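The order of the two steps in this cell matters: a missing `maz_new` is NaN, and NaN compares as unequal to every value, so an unfilled row would be counted as 'diff'. A small sketch on made-up values (only the column names come from this notebook):

```python
import numpy as np
import pandas as pd

# made-up data: parcel 3 has no new MAZ yet
lookup = pd.DataFrame({
    'PARCEL_ID': [1, 2, 3],
    'maz_old': [10, 20, 999999],
    'maz_new': [10.0, 25.0, np.nan],
})

# comparing before the fill: NaN != anything is True, so parcel 3 reads as 'diff'
before = np.where(lookup['maz_old'] != lookup['maz_new'], 'diff', 'same').tolist()

# fill the gap from the old lookup, then compare again
missing = lookup['maz_new'].isnull()
lookup.loc[missing, 'maz_new'] = lookup.loc[missing, 'maz_old']
after = np.where(lookup['maz_old'] != lookup['maz_new'], 'diff', 'same').tolist()

print(before)  # ['same', 'diff', 'diff']
print(after)   # ['same', 'diff', 'same']
```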

In [8]:
# some stats
display(lookup['maz_source'].value_counts())
display(lookup['compare'].value_counts())

spatial join              1952032
oldLookup                    3088
oldLookup999999_zeroHH       1088
Name: maz_source, dtype: int64

same    1914353
diff      41855
Name: compare, dtype: int64
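
Both sets of counts above total 1,956,208 rows, the same number of rows read from the old lookup. A few more checks could be run before export. The helper below is a hypothetical sketch, not part of the original process: `check_lookup` is an invented name and the sample frames are made up.

```python
import pandas as pd

def check_lookup(df):
    """Return a list of problems found in a lookup table (hypothetical helper)."""
    problems = []
    if df['PARCEL_ID'].duplicated().any():
        problems.append('duplicate PARCEL_ID')
    if df['maz_new'].isnull().any():
        problems.append('missing maz_new')
    if df['maz_source'].isnull().any():
        problems.append('missing maz_source')
    return problems

# made-up examples: one clean table, one with every kind of problem
clean = pd.DataFrame({
    'PARCEL_ID': [1, 2],
    'maz_new': [10.0, 20.0],
    'maz_source': ['spatial join', 'oldLookup'],
})
broken = pd.DataFrame({
    'PARCEL_ID': [1, 1],
    'maz_new': [10.0, None],
    'maz_source': ['spatial join', None],
})

print(check_lookup(clean))
print(check_lookup(broken))
```

In the notebook this would be called as `check_lookup(lookup)`, with an empty list meaning none of the three problems were found.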

In [9]:
lookup.to_csv('M:\\Data\\GIS layers\\p10_TM2_maz\\p10_maz\\p10_maz_lookup_compare_20210804.csv', index=False)