# Preprocess Tax Parcels

Adrian Wiegman | arhwiegman.github.io | adrian.wiegman@usda.gov

2023-03-30

---

In this notebook I manipulate a batch of tax parcel geodatabases from the towns within the [MEP study area](https://www.mass.gov/guides/the-massachusetts-estuaries-project-and-reports).

## Data

Data were obtained from MassGIS on March 27, 2023, at the following link:
https://www.mass.gov/info-details/massgis-data-property-tax-parcels

## Processing Steps
1. The files are unzipped
2. TaxPar layers are merged into one geodatabase for all towns 
3. The Assess tables are converted to xlsx format
4. The Assess xlsx files are concatenated into one table
    - within the `outputs` folder
5. Concatenate use code `LUC_LUT` tables from all towns
6. Filter the use codes for use descriptions that contain "cran"
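
Steps 5 and 6 are not shown in the cells of this notebook. As a minimal sketch of the filter in step 6, the example below uses made-up rows and assumes the concatenated `LUC_LUT` table has `TOWN_ID`, `USE_CODE` and `USE_DESC` fields; the field names, values and the helper `filter_use_desc` are assumptions for illustration and should be checked against the actual tables.

```python
# Hypothetical rows standing in for the concatenated LUC_LUT tables.
# Field names and values are assumptions, not data from MassGIS.
luc_lut = [
    {"TOWN_ID": 3, "USE_CODE": "101", "USE_DESC": "Single Family Residential"},
    {"TOWN_ID": 3, "USE_CODE": "710", "USE_DESC": "Cranberry Bog"},
    {"TOWN_ID": 20, "USE_CODE": "7100", "USE_DESC": "CRANBERRY BOG (61A)"},
    {"TOWN_ID": 20, "USE_CODE": "390", "USE_DESC": None},
]


def filter_use_desc(rows, pattern="cran"):
    """Return rows whose USE_DESC contains `pattern`, ignoring case and missing values."""
    needle = pattern.lower()
    return [r for r in rows if needle in (r.get("USE_DESC") or "").lower()]


cran_rows = filter_use_desc(luc_lut)
# unique use codes whose description mentions "cran"
cran_codes = sorted({r["USE_CODE"] for r in cran_rows})
print(cran_codes)
```

Matching case-insensitively and treating a missing description as an empty string keeps the filter from failing on towns that capitalize descriptions differently or leave them blank.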

The common features in the `TaxPar` and `Assess` tables can be indexed using `LOC_ID`
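
To illustrate how `LOC_ID` can link the two tables, here is a minimal sketch with made-up records. It assumes several `Assess` records may share one parcel's `LOC_ID` and that some `Assess` records have none; the `SHAPE_AREA` field and all values are hypothetical.

```python
from collections import defaultdict

# Made-up records for illustration; the real tables have many more fields.
taxpar = [
    {"LOC_ID": "F_1_1", "SHAPE_AREA": 1200.0},
    {"LOC_ID": "F_2_2", "SHAPE_AREA": 800.0},
]
assess = [
    {"LOC_ID": "F_1_1", "USE_CODE": "101"},
    {"LOC_ID": "F_1_1", "USE_CODE": "102"},  # second record on the same parcel
    {"LOC_ID": None, "USE_CODE": "130"},     # record with no LOC_ID
]

# Group Assess records by LOC_ID, skipping records that have no LOC_ID
assess_by_loc = defaultdict(list)
for rec in assess:
    if rec["LOC_ID"] is not None:
        assess_by_loc[rec["LOC_ID"]].append(rec)

# Attach the matching Assess use codes to each parcel (empty list if none match)
joined = [
    {**parcel, "USE_CODES": [a["USE_CODE"] for a in assess_by_loc.get(parcel["LOC_ID"], [])]}
    for parcel in taxpar
]
print(joined)
```

Grouping first and looking up second keeps parcels with no assessment record in the result, which makes unmatched parcels easy to spot.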

The outputs will be used as follows:
1. the fields in the Assess data will be used to generate landcover classes for N loading
2. then the TaxPar LOC_ID's will be used to index the landcover classes for each discrete 
3. the merged TaxPar layer will be intersected with watershed boundaries

In [9]:
# this codeblock sets up the arcpy environment from jupyter notebooks
setup_notebook = "C:/Users/Adrian.Wiegman/Documents/GitHub/Wiegman_USDA_ARS/MEP/_Setup.ipynb"
# magic command to run the setup notebook
%run $setup_notebook
fn_hello()

loading python modules...
  `module_list` contains names of all loaded modules
setting up arcpy environment...
input file directory (idr): C:\Workspace\Geodata\Massachusetts\
working directory (wdr): C:\Workspace\Geodata\MEP\
temp dir (tdr): C:\Workspace\Geodata\MEP\temp
default geodatabase path: C:\Workspace\Geodata\MEP\Default.gdb
environment setup complete
functions loaded
hello world


In [76]:
# set the directory that holds the zipped tax parcel geodatabases
TaxDir = os.path.join(idr,"TaxParcels")

In [156]:
# find all zipped tax parcel geodatabases and unzip them into TaxDir
# completed on 2023-03-30 
import zipfile

TaxDirNames = fn_recursive_glob_search(TaxDir,'.zip')
print(TaxDirNames)
for path_to_zip_file in TaxDirNames:
    with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
        zip_ref.extractall(TaxDir)

['C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M003_parcels_gdb.zip', 'C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M020_parcels_gdb.zip', 'C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M036_parcels_gdb.zip', 'C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M041_parcels_gdb.zip', 'C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M042_parcels_gdb.zip', 'C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M052_parcels_gdb.zip', 'C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M055_parcels_gdb.zip', 'C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M062_parcels_gdb.zip', 'C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M072_parcels_gdb.zip', 'C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M075_parcels_gdb.zip', 'C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M082_parcels_gdb.zip', 'C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M086_parcels_gdb.zip', 'C:\\Workspace\\Geodata\\Massachusetts\\TaxParcels\\M089_parcels_gdb.zip', 'C:\\Workspace\\Geodata\
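
`fn_recursive_glob_search` is defined in `_Setup.ipynb` and is not shown in this notebook. The sketch below is one way such a helper might look, assuming it returns every path under a directory whose name ends with the given extension. The name `recursive_glob_search` and its behavior are assumptions, not the author's implementation.

```python
import tempfile
from pathlib import Path


def recursive_glob_search(start_dir, ext):
    """Return sorted paths (files or folders) under start_dir whose names end with ext."""
    return sorted(str(p) for p in Path(start_dir).rglob("*" + ext))


# Small demonstration on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "M003_parcels_gdb.zip").touch()
    (root / "sub").mkdir()
    (root / "sub" / "M020_parcels_gdb.zip").touch()
    (root / "notes.txt").touch()
    found = [Path(p).name for p in recursive_glob_search(root, ".zip")]

print(found)
```

Because a file geodatabase is a folder whose name ends in `.gdb`, matching on names rather than only on regular files lets the same helper locate geodatabases as well as zip archives.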

In [157]:
# get list of gdb file paths
gdbpaths = fn_recursive_glob_search(TaxDir,'.gdb')
print(gdbpaths[0])

# get the town id for each file
townIDs = [fn_regex_search_0(i,r"M\d{3}") for i in gdbpaths]
#print(townIDs)

def fn_make_filepaths(lyr,townIDs,gdbpaths):
    # prefix the layer name with each town id, e.g. "\M003TaxPar"
    lyrpaths = ["\\"+i+lyr for i in townIDs]

    # join each geodatabase path with its layer name
    filepaths = [gdb+lyrpath for gdb,lyrpath in zip(gdbpaths,lyrpaths)]
    print(filepaths[0])
    return filepaths
filepaths = fn_make_filepaths("TaxPar",townIDs,gdbpaths)

C:\Workspace\Geodata\Massachusetts\TaxParcels\M003_parcels_CY22_FY22_sde.gdb
C:\Workspace\Geodata\Massachusetts\TaxParcels\M003_parcels_CY22_FY22_sde.gdb\M003TaxPar
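
`fn_regex_search_0` also comes from `_Setup.ipynb`. The sketch below assumes it returns the first regex match in a string, and shows how the town ID and layer path are derived for a single geodatabase. The names `regex_search_0` and `make_layer_path` are hypothetical.

```python
import re


def regex_search_0(text, pattern):
    """Return the first match of pattern in text, or None if there is no match."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


def make_layer_path(gdbpath, lyr):
    """Append a backslash, the town ID found in gdbpath, and the layer name."""
    town_id = regex_search_0(gdbpath, r"M\d{3}")
    if town_id is None:
        raise ValueError("no town ID found in " + gdbpath)
    return gdbpath + "\\" + town_id + lyr


gdb = r"C:\Workspace\Geodata\Massachusetts\TaxParcels\M003_parcels_CY22_FY22_sde.gdb"
print(make_layer_path(gdb, "TaxPar"))
```

Raising an error when no town ID is found is a design choice of this sketch; it surfaces an unexpected folder name early instead of passing a malformed path to `arcpy`.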


In [66]:
# merge the TaxPar layers from all extracted gdb files into one dataset
outname="MEP_TaxPar"
outpath=tdr # temp dir; not passed to Merge below, so the output is written to the current workspace
arcpy.management.Merge(filepaths, 
                       outname,
                       "", "ADD_SOURCE_INFO")

In [88]:
arcpy.management.Copy("MEP_TaxPar", "MEP_TaxPar_Copy")

In [158]:
filepaths = fn_make_filepaths("Assess",townIDs,gdbpaths)
#fp = filepaths[0:1]
#fps = filepaths[1:3]

C:\Workspace\Geodata\Massachusetts\TaxParcels\M003_parcels_CY22_FY22_sde.gdb\M003Assess


In [180]:
# get a list of xlsx files
excel_files = fn_recursive_glob_search(TaxDir,'.xlsx')
# read all the excel files as dataframes and store them in a list
dfs = [pd.read_excel(xlf) for xlf in excel_files]

In [181]:
# concatenate all the excel files to one dataframe
df = pd.concat(dfs)

In [182]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 390881 entries, 0 to 18035
Data columns (total 38 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   OBJECTID    390881 non-null  int64  
 1   PROP_ID     390881 non-null  object 
 2   LOC_ID      390240 non-null  object 
 3   BLDG_VAL    390881 non-null  int64  
 4   LAND_VAL    390881 non-null  int64  
 5   OTHER_VAL   390881 non-null  int64  
 6   TOTAL_VAL   390881 non-null  int64  
 7   FY          390881 non-null  int64  
 8   LOT_SIZE    390880 non-null  float64
 9   LS_DATE     381481 non-null  float64
 10  LS_PRICE    390237 non-null  float64
 11  USE_CODE    390881 non-null  object 
 12  SITE_ADDR   390787 non-null  object 
 13  ADDR_NUM    379686 non-null  object 
 14  FULL_STR    390786 non-null  object 
 15  LOCATION    16922 non-null   object 
 16  CITY        390880 non-null  object 
 17  ZIP         163433 non-null  object 
 18  OWNER1      390662 non-null  object 
 19  OWN
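
The `df.info()` summary reports 390881 entries with index labels running `0 to 18035`, which suggests each town's row labels were carried through `pd.concat` and therefore repeat across towns. The sketch below uses made-up frames to show two standard `pd.concat` options, `ignore_index` and `keys`, that give unique row labels. The town IDs and values are illustrative only.

```python
import pandas as pd

# Made-up stand-ins for two per-town Assess tables
dfs = [
    pd.DataFrame({"LOC_ID": ["F_1_1", "F_2_2"], "TOTAL_VAL": [100, 200]}),
    pd.DataFrame({"LOC_ID": ["F_3_3", "F_4_4"], "TOTAL_VAL": [300, 400]}),
]

# Default behavior keeps each frame's own labels, so 0 and 1 appear twice
plain = pd.concat(dfs)

# Option 1: renumber the rows 0..n-1
renumbered = pd.concat(dfs, ignore_index=True)

# Option 2: keep the source table as an outer index level (hypothetical town IDs)
keyed = pd.concat(dfs, keys=["M003", "M020"], names=["TOWN", "ROW"])

print(plain.index.is_unique, renumbered.index.is_unique, keyed.index.is_unique)
```

Either option avoids ambiguous label-based lookups such as `df.loc[0]` returning one row per town; which one fits depends on whether the source town needs to stay attached to each row.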

In [184]:
# write df to pickle; to unpickle, use pd.read_pickle("path_to_.pkl")
# reading a pickle is much faster than reading a csv, so do this for files over 10MB
df.to_pickle(os.path.join(wdr,"outputs",'MEP_TaxParAssess.pkl'))

In [185]:
# write to csv
df.to_csv(os.path.join(wdr,"outputs",'MEP_TaxParAssess.csv'))

In [113]:
outname="MEP_TaxPar_Copy"
intables = outtables
out = arcpy.management.Merge(intables[0:2],
                       outname,
                       "", "ADD_SOURCE_INFO")
#mx.addLayer(out)

## Appendix

lyrFile = arcpy.mp.LayerFile(outname)
mx.addLayer(outname,'TOP')