# Ouray Tax Assessor - Data Analysis (pre-modeling)

Author: **Bryce A Young** (git bryceayoung) | 
Created: **2024-02-19** | 
Modified: **2025-02-19**

In this notebook, we analyze the relationships and distributions of building features from the Ouray County tax assessor. The dataframe used for this analysis was created in `data_prep/assessor_data_cleaning.ipynb`. We join the tax assessor data to the building centroids, completing the dataset (sans risk scores).

#### Data 
- (tabular) data/tax_assessor/assessor_clean_v3.csv
- (geom) data/_temp/centr_hiz_parcel.gpkg

#### Workflow 
- Unstructured, freeform exploration of data distributions and relationships (it might be messy)

## Step 0: Setup Environment
---

In [1]:
import os
### Directory ###
# Repository
os.chdir(r'D:/_PROJECTS/P001_OurayParcel/ouray')
# Root workspace
ws = r'D:/_PROJECTS/P001_OurayParcel'

### Data paths ###
# Folder where all the data inputs and outputs will live
data = os.path.join(ws, 'data')
# Folder with tax assessor data
tax = os.path.join(data, 'tax_assessor')
# Scratch folder (contains footprint centroids with hiz info)
scratch = os.path.join(data, '_temp')
# Any final outputs go here
out = os.path.join(data, '_out')
# Figures to export
figs = os.path.join(out, 'figures')

# Ensure correct working directory
os.getcwd()

'D:\\_PROJECTS\\P001_OurayParcel\\ouray'

In [2]:
import pandas as pd
# Read in assessor database
df = pd.read_csv(os.path.join(tax, 'assessor_clean_v3.csv'))
# Set option to display max cols
pd.set_option('display.max_columns', None)
# Preview
df.head()

Unnamed: 0,Account Number,Actual Year Built\n(AYB),Air Conditioning\n(AIRC),Architecture Style\n(ARCH),Area Acres\n(Area_ACRES),Area SQFT\n(Area_SQFT),Condition\n(COND),DGR - Detached Garage\n(SubArea_DGR),Effective Year Built\n(EYB),Exterior Percent\n(EXW_PERCENT),Exterior Wall\n(EXW),Floor\n(FLR),Foundation\n(FOUND),Frame\n(FRME),Heating Fuel\n(HTFL),Interior Wall\n(INT),Neighborhood\n(NBHD),OPP - Open Porch\n(SubArea_OPP),PTO - Patio\n(SubArea_PTO),Parcel Number,RMS\n(RMS),Roof Cover\n(RCVR),Roof Structure\n(RSTR),Type,WBL - Wood Balcony\n(SubArea_WBL)
0,M000034,1976.0,0 - N/A,0 - PRE-HUD,0.0,1064.0,5 - BELOW AVERAGE,0.0,1995.0,100.0,22 - ALUM,1 - WDJST PLYW,0.0,2 - FRME 2X4,3 - GAS,0 - N/A,1000200 - 4J MH PARK,176.0,0.0,MOBILEM00034,0.0,3 - METAL,4 - ARCH,Mobile Home,0.0
1,M000037,1992.0,0 - N/A,1 - SINGLE,0.0,840.0,3 - GOOD,0.0,2005.0,100.0,4 - FRAME,1 - WDJST PLYW,0.0,3 - FRME 2X6,3 - GAS,1 - DRYWALL,1000200 - 4J MH PARK,160.0,0.0,MOBILEM00037,0.0,3 - METAL,3 - GABLE,Mobile Home,0.0
2,M000040,1985.0,0 - N/A,1 - SINGLE,0.0,980.0,4 - AVERAGE,0.0,1985.0,100.0,4 - FRAME,1 - WDJST PLYW,0.0,2 - FRME 2X4,3 - GAS,2 - PLSTR/LTH,1000100 - SWISS VILLAGE MH PARK,104.0,0.0,MOBILEM00040,0.776148,3 - METAL,3 - GABLE,Mobile Home,0.0
3,M000042,2002.0,0 - N/A,1 - SINGLE,0.0,1216.0,4 - AVERAGE,0.0,2005.0,100.0,3 - DESTRESSED METAL,1 - WDJST PLYW,0.0,2 - FRME 2X4,3 - GAS,1 - DRYWALL,1000100 - SWISS VILLAGE MH PARK,212.0,0.0,MOBILEM00042,0.0,3 - METAL,3 - GABLE,Mobile Home,0.0
4,M000043,1975.0,0 - N/A,0 - PRE-HUD,0.0,1064.0,4 - AVERAGE,0.0,1995.0,100.0,3 - DESTRESSED METAL,1 - WDJST PLYW,0.0,2 - FRME 2X4,3 - GAS,0 - N/A,1000200 - 4J MH PARK,24.0,0.0,MOBILEM00043,0.0,3 - METAL,3 - GABLE,Mobile Home,0.0


In [3]:
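# NOTE: without parentheses this displays the bound method (its repr prints the DataFrame);
# call df.info() to get the dtype and non-null summary instead
df.info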

<bound method DataFrame.info of      Account Number  Actual Year Built\n(AYB) Air Conditioning\n(AIRC)  \
0           M000034                    1976.0                  0 - N/A   
1           M000037                    1992.0                  0 - N/A   
2           M000040                    1985.0                  0 - N/A   
3           M000042                    2002.0                  0 - N/A   
4           M000043                    1975.0                  0 - N/A   
...             ...                       ...                      ...   
3471        R006937                    2023.0                  0 - N/A   
3472        R006938                    2023.0                  0 - N/A   
3473        R006947                    2023.0                  0 - N/A   
3474        R006948                    2023.0                  0 - N/A   
3475        R006955                    1964.0                  0 - N/A   

     Architecture Style\n(ARCH)  Area Acres\n(Area_ACRES)  \
0                 

In [4]:
import geopandas as gpd
gdf = gpd.read_file(os.path.join(scratch, 'centr_hiz_parcel.gpkg'))
gdf.head()

Unnamed: 0,County,wui_class,min_ssd,mean_cc0_2m,mean_cc2_4m,mean_cc4_8m,mean_cc8_40m,intersections,PARCELNB,ACCOUNT,ACRES_calc,hiz_ext_out,geometry
0,Ouray County,5,221.938317,,,,,0,403535300013,R002619,40.3299,True,POINT (233065.909 4241052.6)
1,Ouray County,5,11.9362,0.079294,0.119998,0.066202,0.003467,2,403726200085,R002417,35.8913,False,POINT (242785.257 4243534.402)
2,Ouray County,1,213.672097,0.03228,0.038839,0.006272,0.0,0,403915300016,R004080,40.1202,False,POINT (251212.196 4245423.292)
3,Ouray County,5,183.675975,0.063696,0.097281,0.033223,0.082861,0,403736200037,R002506,39.7779,False,POINT (244097.801 4241432.894)
4,Ouray County,5,101.386718,0.037428,0.219677,0.253634,0.000842,0,403925218002,R004877,16.5952,True,POINT (254078.896 4242776.879)


In [5]:
# Extract the first character from each ACCOUNT value
gdf['prefix'] = gdf['ACCOUNT'].astype(str).str[0]  # Ensure it's a string and grab the first character

# Count occurrences of each unique prefix
prefix_counts = gdf['prefix'].value_counts()

# Display results
print(prefix_counts)

prefix
R    4435
N      45
C      11
F      10
M       9
U       8
S       3
T       3
H       2
1       1
B       1
A       1
K       1
D       1
V       1
E       1
Name: count, dtype: int64


In [6]:
# Extract the first character from each Account Number value
df['prefix'] = df['Account Number'].astype(str).str[0]  # Ensure it's a string and grab the first character

# Count occurrences of each unique prefix
prefix_counts = df['prefix'].value_counts()

# Display results
print(prefix_counts)

prefix
R    3337
M     139
Name: count, dtype: int64


Okay, so we can see that there is a mismatch between the number of homes with tax assessor information (3,476 records in `df`) and the number of homes in the building footprints database (4,533 centroids in `gdf`). Namely, quite a few homes are missing from the assessor database: the footprints contain 4,435 `R` accounts, while the assessor data contains only 3,337. Additionally, there are 9 mobile homes (`M` prefix) in the building footprints dataframe and 139 in the tax assessor dataframe.
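One way to make the mismatch concrete is to put the prefix counts side by side and then list the footprint accounts that have no assessor record. The sketch below is a minimal example, not part of the original workflow: the function names `compare_prefix_counts` and `unmatched_accounts` are hypothetical, the toy tables are invented, and it assumes that `ACCOUNT` in the centroids and `Account Number` in the assessor table use the same identifier format.

```python
import pandas as pd

def compare_prefix_counts(footprints, assessor):
    """Count account-number prefixes in both tables, side by side."""
    fp = footprints['ACCOUNT'].astype(str).str[0].value_counts()
    tx = assessor['Account Number'].astype(str).str[0].value_counts()
    table = pd.concat([fp.rename('footprints'), tx.rename('assessor')], axis=1)
    # A prefix missing from one table shows up as NaN; treat it as zero
    return table.fillna(0).astype(int)

def unmatched_accounts(footprints, assessor):
    """Return footprint accounts that have no matching assessor record."""
    keys = assessor[['Account Number']].drop_duplicates()
    merged = footprints.merge(
        keys, how='left',
        left_on='ACCOUNT', right_on='Account Number',
        indicator=True,  # adds a '_merge' column: 'both' or 'left_only'
    )
    return merged.loc[merged['_merge'] == 'left_only', 'ACCOUNT']

# Toy stand-ins for gdf and df (invented values, for illustration only)
toy_fp = pd.DataFrame({'ACCOUNT': ['R000001', 'R000002', 'M000003', 'N000004']})
toy_tx = pd.DataFrame({'Account Number': ['R000001', 'M000003', 'M000005']})

toy_table = compare_prefix_counts(toy_fp, toy_tx)
toy_missing = unmatched_accounts(toy_fp, toy_tx)
print(toy_table)
print(toy_missing.tolist())
```

On the real data, the equivalent calls would be `compare_prefix_counts(gdf, df)` and `unmatched_accounts(gdf, df)`. The second result is the list of centroids to investigate when revisiting the cleaning notebook.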

When I cleaned the assessor database, I believe I removed records where multiple homes shared a parcel number. It looks like I may also have removed commercial structures. If I go back and correct this, then I should have a better dataset to work with.
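To check how much the multiple-homes-per-parcel rule might account for, one option is to pull every row whose parcel number appears more than once. This is a rough sketch under assumptions: the helper name `rows_sharing_key` is hypothetical, the toy data is invented, and the column names (`PARCELNB` in the centroids, `Parcel Number` in the assessor export) are taken from the previews above.

```python
import pandas as pd

def rows_sharing_key(table, key):
    """Return all rows whose value in `key` occurs more than once."""
    # keep=False marks every member of a duplicate group, not just the repeats
    return table[table[key].duplicated(keep=False)]

# Toy stand-in for the building centroids (invented values)
toy = pd.DataFrame({
    'PARCELNB': ['P1', 'P1', 'P2', 'P3', 'P3', 'P3'],
    'ACCOUNT': ['R1', 'R2', 'R3', 'R4', 'R5', 'R6'],
})

shared = rows_sharing_key(toy, 'PARCELNB')
# Number of buildings on each parcel that holds more than one
per_parcel = shared.groupby('PARCELNB').size()
print(per_parcel)
```

Running `rows_sharing_key(gdf, 'PARCELNB')` on the centroids, or the same helper with `'Parcel Number'` on the uncleaned assessor export, would show how many records the cleaning step could have dropped for this reason.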