# Clustering 2 - Ouray County Parcel Risk
**Author:** Bryce A Young  
**Created:** 2025-04-08 | 
**Modified:** 2025-04-08  

#### Overview
This notebook is largely a revision of the analysis performed in `clustering.ipynb`. 

In this notebook, I do four things:
1. Import data on the 4000+ homes in Ouray County for which I have both LiDAR and tax assessor data, then select and clean the features using classification and one-hot encoding.
2. Use t-SNE to reduce the dataset to two dimensions, tuning hyperparameters until clusters appear while the relationships in the data remain visible.
3. Use DBSCAN to cluster home types based on structure and defensible space features that relate to both *susceptibility* and *heat and ember outputs*.
4. Analyze the clusters for the salient features of the structure and defensible space (DSpace) archetypes.
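
Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration on synthetic data, assuming scikit-learn's `TSNE` and `DBSCAN`; the feature matrix and the `perplexity`, `eps`, and `min_samples` values are placeholders, not the settings used in this notebook.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the parcel feature matrix (NOT the Ouray data):
# two loose groups of 30 "homes" with 5 numeric features each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(30, 5)),
    rng.normal(loc=6.0, scale=1.0, size=(30, 5)),
])

# Scale features so no single column dominates the distance calculations.
X_scaled = StandardScaler().fit_transform(X)

# Step 2: reduce to two dimensions. The perplexity value is a placeholder.
embedding = TSNE(
    n_components=2, perplexity=15, init="pca", random_state=0
).fit_transform(X_scaled)

# Step 3: density-based clustering on the embedding. eps and min_samples
# are placeholders; DBSCAN labels noise points as -1.
labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(embedding)

print(embedding.shape, np.unique(labels))
```

Because `eps` is measured in the units of the t-SNE embedding, it generally needs to be re-tuned whenever the t-SNE hyperparameters change.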

*NOTE: Clustering effectively groups homes into archetypes. We can then go back, assign archetypes to homes, and assess how many homes of each archetype burned in historic fires such as Palisades, Eaton, Lahaina, Marshall, and Camp.*

## Environment Setup
---

In [None]:
# setup environment
import os
### Directory ###
# Repository
os.chdir(r'D:/_PROJECTS/P001_OurayParcel/ouray')
# Root workspace
ws = r'D:/_PROJECTS/P001_OurayParcel'

### Data paths ###
# Folder where all the data inputs and outputs will live
data = os.path.join(ws, 'data')
# Scratch folder for intermediate files
scratch = os.path.join(data, '_temp')
# Any final data outputs go here
out = os.path.join(data, '_out')
# Figures to export
figs = os.path.join(ws, r'output/figures')

# Confirm the working directory
os.getcwd()

'D:\\_PROJECTS\\P001_OurayParcel\\ouray'

## Import data
---

In [4]:
import geopandas as gpd
import pandas as pd
pd.set_option('display.max_columns', None) # Display all columns when previewing dfs

gdf = gpd.read_file(os.path.join(out, 'centr_full_nonan_WKID26913.gpkg'))
# df = gdf.drop(columns='geometry')

gdf.head()

Unnamed: 0,County,wui_class,min_ssd,mean_cc0_2m,mean_cc2_4m,mean_cc4_8m,mean_cc8_40m,intersections,PARCELNB,ACCOUNT,ACRES_calc,hiz_ext_out,Actual Year Built\n(AYB),Air Conditioning\n(AIRC),Architecture Style\n(ARCH),Area Acres\n(Area_ACRES),Area SQFT\n(Area_SQFT),Condition\n(COND),DGR - Detached Garage\n(SubArea_DGR),Effective Year Built\n(EYB),Exterior Percent\n(EXW_PERCENT),Exterior Wall\n(EXW),Floor\n(FLR),Frame\n(FRME),Heating Fuel\n(HTFL),Interior Wall\n(INT),Neighborhood\n(NBHD),OPP - Open Porch\n(SubArea_OPP),PTO - Patio\n(SubArea_PTO),RMS\n(RMS),Roof Cover\n(RCVR),Roof Structure\n(RSTR),Type,WBL - Wood Balcony\n(SubArea_WBL),geometry
0,Ouray County,5,11.9362,0.079294,0.119998,0.066202,0.003467,2,403726200085,R002417,35.8913,False,1999.0,0 - N/A,1 - RANCH,0.0,2006.0,3 - GOOD,0.0,2005.0,100.0,9 - MASONITE,1 - WDJST PLYW,9 - FRME 2X4,7 - PROPANE,1 - DRYWALL,7000006 - Outlying6,0.0,577.5,0.776148,9 - PRO PANEL,3 - SHED MED,Residence,0.0,POINT (242785.257 4243534.402)
1,Ouray County,1,213.672097,0.03228,0.038839,0.006272,0.0,0,403915300016,R004080,40.1202,False,2009.0,0 - N/A,1 - RANCH,0.0,1725.0,2 - VERY GOOD,0.0,2010.0,90.0,5 - STUCCO (F),4 - CONC/TILE,9 - FRME 2X4,3 - GAS,1 - DRYWALL,7000006 - Outlying6,560.0,0.0,0.776148,3 - DISTRESSED METAL,6 - GABEL MED,Residence,0.0,POINT (251212.196 4245423.292)
2,Ouray County,5,183.675975,0.063696,0.097281,0.033223,0.082861,0,403736200037,R002506,39.7779,False,1996.0,0 - N/A,7 - MTN CABIN,0.0,692.6,3 - GOOD,0.0,2005.0,100.0,2 - WOOD,5 - CONCRETE,0 - N/A,2 - WOOD,1 - DRYWALL,7000006 - Outlying6,0.0,0.0,0.0,5 - ASPH SHNGL,3 - SHED MED,Residence,0.0,POINT (244097.801 4241432.894)
3,Ouray County,5,101.386718,0.037428,0.219677,0.253634,0.000842,0,403925218002,R004877,16.5952,True,2004.0,0 - N/A,5 - MODULAR,0.0,894.0,3 - GOOD,0.0,2010.0,100.0,9 - MASONITE,1 - WDJST PLYW,9 - FRME 2X4,7 - PROPANE,1 - DRYWALL,7000006 - Outlying6,179.0,0.0,0.776148,9 - PRO PANEL,6 - GABEL MED,Residence,0.0,POINT (254078.896 4242776.879)
4,Ouray County,3,20.213243,0.056646,0.058065,0.112239,0.181261,2,404129300015,R004133,398.976,False,1958.0,1 - NONE,1 - RANCH,0.0,1262.0,3 - GOOD,0.0,1995.0,100.0,9 - MASONITE,1 - WDJST PLYW,9 - FRME 2X4,7 - PROPANE,3 - PANELING,7000002 - Outlying2,0.0,0.0,0.0,9 - PRO PANEL,6 - GABEL MED,Residence,0.0,POINT (257784.354 4241777.874)


In [12]:
# Create index col for future reference
gdf['index'] = gdf.index
# df['index'] = df.index

## Feature Selection
---
`wui_class` was dropped because of its correlation with LiDAR metrics.
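
One way to inspect that kind of correlation is a rank correlation matrix. The sketch below uses only the five preview rows printed above, so it illustrates the call and is not evidence for the claim. It assumes the `wui_class` codes are ordered, which is why Spearman is used; the name `preview` is hypothetical.

```python
import pandas as pd

# The five preview rows from gdf.head() above; far too few rows to support
# any conclusion, so this only illustrates the call.
preview = pd.DataFrame({
    "wui_class": [5, 1, 5, 5, 3],
    "mean_cc0_2m": [0.079294, 0.03228, 0.063696, 0.037428, 0.056646],
    "mean_cc2_4m": [0.119998, 0.038839, 0.097281, 0.219677, 0.058065],
})

# Spearman rank correlation (assumes wui_class codes are ordinal).
corr = preview.corr(method="spearman")
rho = corr.loc["wui_class", "mean_cc0_2m"]
print(corr.round(2))
```

On the full dataset, the same call would be run on `gdf` restricted to `wui_class` and the LiDAR columns.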

In [None]:
# Drop unwanted cols
cols2drop = [
    'wui_class',
    'County',
    'Actual Year Built\n(AYB)',
    'Air Conditioning\n(AIRC)',
    'Architecture Style\n(ARCH)',
    'Effective Year Built\n(EYB)',
    'Exterior Percent\n(EXW_PERCENT)',
    'Floor\n(FLR)',
    'Frame\n(FRME)',
    'Interior Wall\n(INT)',
    'Neighborhood\n(NBHD)',
    'RMS\n(RMS)'
]

gdf.drop(columns=cols2drop, inplace=True)
gdf.columns

Index(['min_ssd', 'mean_cc0_2m', 'mean_cc2_4m', 'mean_cc4_8m', 'mean_cc8_40m',
       'intersections', 'PARCELNB', 'ACCOUNT', 'ACRES_calc', 'hiz_ext_out',
       'Area Acres\n(Area_ACRES)', 'Area SQFT\n(Area_SQFT)',
       'Condition\n(COND)', 'DGR - Detached Garage\n(SubArea_DGR)',
       'Exterior Wall\n(EXW)', 'Heating Fuel\n(HTFL)',
       'OPP - Open Porch\n(SubArea_OPP)', 'PTO - Patio\n(SubArea_PTO)',
       'Roof Cover\n(RCVR)', 'Roof Structure\n(RSTR)', 'Type',
       'WBL - Wood Balcony\n(SubArea_WBL)', 'geometry', 'index'],
      dtype='object')

In [14]:
print(gdf['Condition\n(COND)'].value_counts())

Condition\n(COND)
3 - GOOD             3384
2 - VERY GOOD         403
4 - AVERAGE            93
1 - NEW                89
6 - POOR               26
5 - BELOW AVERAGE       7
Name: count, dtype: int64


## Feature Transformations
---

In [None]:
# Create boolean col for presence of porch, patio, wood balcony
gdf['porch_balcony'] = (
    (gdf['OPP - Open Porch\n(SubArea_OPP)'] > 0) |
    (gdf['PTO - Patio\n(SubArea_PTO)'] > 0) |
    (gdf['WBL - Wood Balcony\n(SubArea_WBL)'] > 0)
)

# Create boolean col for presence of detached garage
gdf['detached_garage'] = (
    gdf['DGR - Detached Garage\n(SubArea_DGR)'] > 0
)

# Create boolean col for average-or-worse home condition
# (note: despite the name `cond_below_avg`, '4 - AVERAGE' is included)
gdf['cond_below_avg'] = gdf['Condition\n(COND)'].isin(['4 - AVERAGE', '5 - BELOW AVERAGE', '6 - POOR'])



# Drop cols for which new cols were made above
gdf.drop(columns=[
    'OPP - Open Porch\n(SubArea_OPP)',
    'PTO - Patio\n(SubArea_PTO)',
    'WBL - Wood Balcony\n(SubArea_WBL)',
    'Condition\n(COND)',
    'DGR - Detached Garage\n(SubArea_DGR)',
], inplace=True)  # without inplace=True (or reassignment) the drop is not kept
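
The remaining categorical columns (for example exterior wall and roof cover) still hold text labels. The overview mentions one-hot encoding; a minimal sketch of that step, assuming `pd.get_dummies` is used, might look like the block below. The short column names `exw` and `rcvr` are hypothetical stand-ins, and the category labels are copied from the first three preview rows above.

```python
import pandas as pd

# Toy frame with hypothetical short column names; the category labels are
# copied from the preview rows shown earlier in this notebook.
toy = pd.DataFrame({
    "exw": ["9 - MASONITE", "5 - STUCCO (F)", "2 - WOOD"],
    "rcvr": ["9 - PRO PANEL", "3 - DISTRESSED METAL", "5 - ASPH SHNGL"],
    "porch_balcony": [True, True, False],
})

# One indicator column per category; unlisted columns pass through unchanged.
encoded = pd.get_dummies(toy, columns=["exw", "rcvr"], dtype=int)
print(encoded.columns.tolist())
```

Each new column is named `<original column>_<category label>`, so rare categories produce sparse indicator columns that may be worth grouping before clustering.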