# HARRIS COUNTY HOUSING STABILITY PIPELINE

## BUSINESS QUESTION

HOW STABLE IS THE HOUSING MARKET IN HARRIS COUNTY FOR HOUSEHOLDS BETWEEN THE 25TH AND 50TH PERCENTILE OF INCOME FROM 2017 TO 2025?

### SUBQUESTIONS

- How many houses are there for this demographic?
- How many houses remain in the same demographic category in 2025, and how long has each of those houses continuously stayed in that category?
- How many houses have changed categories in the last 8 years, and how many times?

## METRICS

<table>
  <tr>
    <th> Metric Name </th>
    <th> Definition </th>
    <th> Is Guardrail </th>
  </tr>
  <tr>
    <td> raw_rows (raw) </td>
    <td> Rows before transformation </td>
    <td> Yes </td>
  </tr>
  <tr>
    <td> clean_rows (after cleaning) </td>
    <td> Rows after transformation </td>
    <td> Yes </td>
  </tr>
  <tr>
    <td> transformation_run_time </td>
    <td> Duration of transformation </td>
    <td> Yes </td>
  </tr>
  <tr>
    <td> empty_name_list </td>
    <td> Rows without a registered owner </td>
    <td> No </td>
  </tr>
  <tr>
    <td> empty_street_rows </td>
    <td> Rows without street specified </td>
    <td> No </td>
  </tr>
  <tr>
    <td> empty_market_value_rows </td>
    <td> Rows without market value specified </td>
    <td> Yes </td>
  </tr>
  <tr>
    <td> invalid_zips </td>
    <td> Rows with invalid zip codes </td>
    <td> No </td>
  </tr>
  <tr>
    <td> unknown_or_out_of_county_zip_codes </td>
    <td> Invalid or out-of-county zip codes in property table </td>
    <td> No </td>
  </tr>
  <tr>
    <td> zip_codes_without_properties_in_designated_state_class </td>
    <td> Zip codes with no properties in the designated state class </td>
    <td> No </td>
  </tr>
  <tr>
    <td> accounts_not_in_owners </td>
    <td> Accounts in property table but not in owners table </td>
    <td> Yes </td>
  </tr>
  <tr>
    <td> accounts_not_in_property </td>
    <td> Accounts in owners table but not in property table </td>
    <td> Yes </td>
  </tr>
</table>

## FLOW DIAGRAM

![this_image](/Workspace/h_c_notebooks/Screenshot 2026-02-03 121455.png)

## SCHEMAS

### BRONZE TABLES

OWNERS TABLE

<table>
  <tr>
    <th>Column Name</th>
    <th>Column Type</th>
    <th>Column comment</th>
  </tr>
  <tr>
    <th>acct</th>
    <th>STRING</th>
    <th>Unique identifier of property</th>
  </tr>
  <tr>
    <th>ln_num</th>
    <th>STRING</th>
    <th>Line number of the owner entry within the account</th>
  </tr>
  <tr>
    <th>name</th>
    <th>STRING</th>
    <th>Name of owner</th>
  </tr>
  <tr>
    <th>aka</th>
    <th>STRING</th>
    <th>Alias of owner</th>
  </tr>
  <tr>
    <th>pct_own</th>
    <th>STRING</th>
    <th>Percentage of ownership held by the owner identified by (name, ln_num) for a given account</th>
  </tr>
</table>

ZIP CODE TABLE

<table>
  <tr>
    <th>Column Name</th>
    <th>Column Type</th>
    <th>Column comment</th>
  </tr>
  <tr>
    <th>ZIP_Code</th>
    <th>STRING</th>
    <th>Zip code</th>
  </tr>
  <tr>
    <th>ClassificationClass</th>
    <th>STRING</th>
    <th>Classification label (standard, P.O. box)</th>
  </tr>
  <tr>
    <th>City</th>
    <th>STRING</th>
    <th>City corresponding to zip code</th>
  </tr>
  <tr>
    <th>random1</th>
    <th>STRING</th>
    <th>Population of delivery points in ZIP</th>
  </tr>
  <tr>
    <th>random2</th>
    <th>STRING</th>
    <th>Percentage of delivery points represented by that ZIP</th>
  </tr>
</table>

PROPERTY TABLE

<table>
  <tr>
    <th>Column Name</th>
    <th>Column Type</th>
    <th>Column comment</th>
  </tr>
  <tr>
    <th>acct</th>
    <th>STRING</th>
    <th>Unique Identifier of Property</th>
  </tr>
  <tr>
    <th>yr</th>
    <th>STRING</th>
    <th>Year of Entry</th>
  </tr>
  <tr>
    <th>mailto</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>mail_addr_1</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>mail_addr_2</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>mail_city</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>mail_state</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>mail_zip</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>mail_country</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>undeliverable</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>str_pfx</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>str_num</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>str_num_sfx</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>str</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>str_sfx</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>str_sfx_dir</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>str_unit</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>site_addr_1</th>
    <th>STRING</th>
    <th>Address of Property</th>
  </tr>
  <tr>
    <th>site_addr_2</th>
    <th>STRING</th>
    <th>City of Property</th>
  </tr>
  <tr>
    <th>site_addr_3</th>
    <th>STRING</th>
    <th>Zip Code of Property</th>
  </tr>
  <tr>
    <th>state_class</th>
    <th>STRING</th>
    <th>Residential Structure Type</th>
  </tr>
  <tr>
    <th>school_dist</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>map_facet</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>key_map</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>Neighborhood_Code</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>Neighborhood_Grp</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>Market_Area_1</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>Market_Area_1_Dscr</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>Market_Area_2</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>Market_Area_2_Dscr</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>econ_area</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>econ_bld_class</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>center_code</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>yr_impr</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>yr_annexed</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>splt_dt</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>dsc_cd</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>nxt_bld</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>bld_ar</th>
    <th>STRING</th>
    <th>Building Area of Property</th>
  </tr>
  <tr>
    <th>land_ar</th>
    <th>STRING</th>
    <th>Land Area of Property</th>
  </tr>
  <tr>
    <th>acreage</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>Cap_acct</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>shared_cad</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>land_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>bld_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>x_features_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>ag_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>assessed_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>tot_appr_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>tot_mkt_val</th>
    <th>STRING</th>
    <th>Market Value of Property</th>
  </tr>
  <tr>
    <th>prior_land_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>prior_bld_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>prior_x_features_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>prior_ag_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>prior_tot_appr_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>prior_tot_mkt_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>new_construction_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>tot_rcn_val</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>value_status</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>noticed</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>notice_dt</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>protested</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>certified_dat</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>rev_dt</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>rev_by</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>new_own_dt</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>lgl_1</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>lgl_2</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>lgl_3</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>lgl_4</th>
    <th>STRING</th>
    <th></th>
  </tr>
  <tr>
    <th>jurs</th>
    <th>STRING</th>
    <th></th>
  </tr>
</table>

### SILVER TABLES


Owners table

<table>
  <tr>
    <th>Column Name</th>
    <th>Column Type</th>
    <th>Column comment</th>
  </tr>
  <tr>
    <th>dim_account_number</th>
    <th>STRING</th>
    <th>Unique identifier of property</th>
  </tr>
  <tr>
    <th>dim_name_list</th>
    <th>ARRAY[STRING] </th>
    <th>A list of all the owners registered to property (no aliases)</th>
  </tr>
</table>


### QUALITY CHECKS

Data integrity tests:
- All specified columns in schema must be included
- Must have at least one row with no nulls in columns: "acct" and "name" (from bronze owners table)

UC tests:
- There must be at least 1,000,000 rows with a non-empty "dim_name_list"
- There are no duplicates in column "dim_account_number"
- There are no nulls in column "dim_account_number"

Metrics
- raw_rows - incoming rows
- clean_rows - outgoing rows
- empty_name_list - number of empty lists in column "dim_name_list"
- transformation_run_time - time to complete transformations


Comments: All strings are in lower case and trimmed
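
A minimal sketch of how the silver owners table and its metrics could be produced from bronze rows. It is written in plain Python rather than against the actual pipeline engine. The function name `build_silver_owners`, the choice to drop rows with an empty `acct`, and the de-duplication of names are assumptions made for illustration, not confirmed pipeline behavior.

```python
import time
from collections import defaultdict


def build_silver_owners(bronze_rows):
    """Collapse bronze owner rows into one silver row per account.

    bronze_rows: iterable of dicts with the bronze keys "acct" and "name".
    Returns (silver_rows, metrics).
    """
    started = time.perf_counter()
    names_by_account = defaultdict(list)
    raw_rows = 0
    for row in bronze_rows:
        raw_rows += 1
        # All strings are lower-cased and trimmed, as stated in the comments.
        acct = (row.get("acct") or "").strip().lower()
        if not acct:
            continue  # assumption: rows without an account are dropped
        name = (row.get("name") or "").strip().lower()
        owners = names_by_account[acct]  # registers the account even without a name
        if name and name not in owners:  # assumption: repeated names are kept once
            owners.append(name)

    silver_rows = [
        {"dim_account_number": acct, "dim_name_list": owners}
        for acct, owners in names_by_account.items()
    ]
    metrics = {
        "raw_rows": raw_rows,
        "clean_rows": len(silver_rows),
        "empty_name_list": sum(1 for r in silver_rows if not r["dim_name_list"]),
        "transformation_run_time": time.perf_counter() - started,
    }
    return silver_rows, metrics
```

Grouping by account guarantees one row per "dim_account_number", which is what the duplicate and null UC tests above check for.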

Zip code table

<table>
  <tr>
    <th>Column Name</th>
    <th>Column Type</th>
    <th>Column comment</th>
  </tr>
  <tr>
    <th>dim_zip_code</th>
    <th>STRING</th>
    <th>Zip code</th>
  </tr>
  <tr>
    <th>dim_classification_class</th>
    <th>STRING</th>
    <th>Classification label (standard, P.O. box)</th>
  </tr>
  <tr>
    <th>dim_city</th>
    <th>STRING</th>
    <th>City corresponding to zip code</th>
  </tr>
</table>

### QUALITY CHECKS

Data integrity tests:
- All specified columns in schema must be included
- Must have at least one row with no nulls in columns: "ZIP_Code" and "City" (from bronze zip code table)

UC tests:
- No nulls in column "dim_zip_code"
- Every zip code is exactly 5 characters long
- No nulls in column "dim_city"


Metrics
- raw_rows - incoming rows
- clean_rows - outgoing rows
- transformation_run_time - time to complete transformations


Comments: All strings are in lower case and trimmed
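
The UC tests for the silver zip code table can be sketched as a small validation function. This is an illustrative plain-Python version; the function name `check_silver_zip_codes` and the failure-message format are hypothetical.

```python
def check_silver_zip_codes(rows):
    """Return a list of UC test failures; an empty list means all tests passed.

    rows: list of dicts with the silver keys "dim_zip_code" and "dim_city".
    """
    failures = []
    for index, row in enumerate(rows):
        zip_code = row.get("dim_zip_code")
        city = row.get("dim_city")
        if zip_code is None:
            failures.append(f"row {index}: dim_zip_code is null")
        elif len(zip_code) != 5:
            failures.append(f"row {index}: dim_zip_code {zip_code!r} is not 5 characters long")
        if city is None:
            failures.append(f"row {index}: dim_city is null")
    return failures
```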

Property table

<table>
  <tr>
    <th>Column Name</th>
    <th>Column Type</th>
    <th>Column comment</th>
  </tr>
  <tr>
    <th>dim_account_number</th>
    <th>STRING</th>
    <th>Unique Identifier of Property</th>
  </tr>
  <tr>
    <th>dim_street</th>
    <th>STRING</th>
    <th>Address of Property</th>
  </tr>
  <tr>
    <th>dim_city</th>
    <th>STRING</th>
    <th>City of Property</th>
  </tr>
  <tr>
    <th>dim_zip_code</th>
    <th>STRING</th>
    <th>Zip Code of Property</th>
  </tr>
  <tr>
    <th>dim_state_class</th>
    <th>STRING</th>
    <th>Residential Structure Type</th>
  </tr>
  <tr>
    <th>m_building_area</th>
    <th>BIGINT</th>
    <th>Building Area of Property</th>
  </tr>
  <tr>
    <th>m_land_area</th>
    <th>BIGINT</th>
    <th>Land Area of Property</th>
  </tr>
  <tr>
    <th>m_total_market_value</th>
    <th>BIGINT</th>
    <th>Market Value of Property</th>
  </tr>
  <tr>
    <th>dim_year_date</th>
    <th>DATE</th>
    <th>Year of Entry</th>
  </tr>
</table>

### QUALITY CHECKS

Data integrity tests:
- All specified columns in schema must be included

UC tests:
- At least 1,000,000 rows that satisfy all of the following:
    - "dim_account_number" is not 'unknown'
    - "dim_zip_code" is not '00000'
    - "dim_state_class" is not 'unknown'
    - "dim_city" is not 'unknown'
    - "m_total_market_value" is not 0
- No duplicates in "dim_account_number"
- No nulls in "dim_account_number"
- All state classes fall within the specified set of property types (in this case, "single family homes" and "mobile homes")
- No values in column "m_total_market_value" outside the range 0 to 100,000,000 (the upper bound is arbitrary; it makes sense for single-family housing but not for multi-family housing)

Metrics
- raw_rows - incoming rows
- clean_rows - outgoing rows
- empty_street_rows - rows with "unknown" street
- empty_market_value_rows - rows with 0 market value
- invalid_zips - rows with '00000' as zip code
- transformation_run_time - time to complete transformations


Comments: All strings are in lower case and trimmed
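
The property metrics and two of the UC tests can be sketched in plain Python as follows. The function names are hypothetical; the sentinel values ('unknown', '00000', 0) and the 100,000,000 upper bound come from the checks listed above.

```python
MAX_MARKET_VALUE = 100_000_000  # arbitrary upper bound quoted in the UC tests


def property_metrics(rows):
    """Count the sentinel values that the cleaning step writes for missing data."""
    return {
        "clean_rows": len(rows),
        "empty_street_rows": sum(r["dim_street"] == "unknown" for r in rows),
        "empty_market_value_rows": sum(r["m_total_market_value"] == 0 for r in rows),
        "invalid_zips": sum(r["dim_zip_code"] == "00000" for r in rows),
    }


def count_complete_rows(rows):
    """Rows that satisfy every condition of the 'at least 1,000,000 rows' UC test."""
    return sum(
        r["dim_account_number"] != "unknown"
        and r["dim_zip_code"] != "00000"
        and r["dim_state_class"] != "unknown"
        and r["dim_city"] != "unknown"
        and r["m_total_market_value"] != 0
        for r in rows
    )


def accounts_out_of_value_range(rows):
    """Accounts whose market value falls outside 0 to MAX_MARKET_VALUE."""
    return [
        r["dim_account_number"]
        for r in rows
        if not 0 <= r["m_total_market_value"] <= MAX_MARKET_VALUE
    ]
```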

Joined table

<table>
  <tr>
    <th>Column Name</th>
    <th>Column Type</th>
    <th>Column Comment</th>
  </tr>
  <tr>
    <td>dim_account_number</td>
    <td>STRING</td>
    <td>Unique Identifier of Property</td>
  </tr>
  <tr>
    <td>dim_zip_code</td>
    <td>STRING</td>
    <td>Zip Code of Property</td>
  </tr>
  <tr>
    <td>dim_city</td>
    <td>STRING</td>
    <td>City of Property</td>
  </tr>
  <tr>
    <td>dim_street</td>
    <td>STRING</td>
    <td>Address of Property</td>
  </tr>
  <tr>
    <td>m_building_area</td>
    <td>FLOAT</td>
    <td>Building Area of Property</td>
  </tr>
  <tr>
    <td>m_land_area</td>
    <td>FLOAT</td>
    <td>Land Area of Property</td>
  </tr>
  <tr>
    <td>m_total_market_value</td>
    <td>FLOAT</td>
    <td>Market Value of Property</td>
  </tr>
  <tr>
    <td>dim_name_list</td>
    <td>ARRAY[STRING]</td>
    <td>A list of all the owners registered to property (no aliases)</td>
  </tr>
  <tr>
    <td>dim_state_class</td>
    <td>STRING</td>
    <td>Residential Structure Type</td>
  </tr>
  <tr>
    <td>dim_year_date</td>
    <td>DATE</td>
    <td>Year of Entry</td>
  </tr>
</table>

### QUALITY CHECKS

Metrics (Referential Integrity test)
- unknown_or_out_of_county_zip_codes - zip codes in property table that are outside Harris County or are invalid
- zip_codes_without_properties_in_designated_state_class - zip codes that are "empty" (do not have a property with the specified state class)
- accounts_not_in_owners - accounts in property table but not in owners table
- accounts_not_in_property - accounts in owners table but not in property table
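
The four referential integrity metrics above are set differences between the joined inputs. The sketch below illustrates this in plain Python. The function name and parameters are hypothetical, and reporting each metric as a count (rather than a list of offending values) is an assumption.

```python
def referential_integrity_metrics(property_accounts, owner_accounts,
                                  property_zips, county_zips):
    """Compare accounts and zip codes across the silver tables.

    property_accounts / owner_accounts: account numbers in each table.
    property_zips: zip codes present in the property table.
    county_zips: zip codes present in the zip code table.
    """
    property_accounts = set(property_accounts)
    owner_accounts = set(owner_accounts)
    property_zips = set(property_zips)
    county_zips = set(county_zips)
    return {
        # Property zip codes that the zip code table does not know about.
        "unknown_or_out_of_county_zip_codes": len(property_zips - county_zips),
        # County zip codes with no selected property in them.
        "zip_codes_without_properties_in_designated_state_class": len(county_zips - property_zips),
        "accounts_not_in_owners": len(property_accounts - owner_accounts),
        "accounts_not_in_property": len(owner_accounts - property_accounts),
    }
```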

### GOLD TABLES

SCD table

<table>
  <tr>
    <th>Column Name</th>
    <th>Column Type</th>
    <th>Column Comment</th>
  </tr>
  <tr>
    <td>dim_account_number</td>
    <td>STRING</td>
    <td>Unique Identifier of Property</td>
  </tr>
  <tr>
    <td>quartile</td>
    <td>STRING</td>
    <td>Income quartile to which the property owner belongs, based on the criteria explained below</td>
  </tr>
  <tr>
    <td>start_date</td>
    <td>DATE</td>
    <td>Start date when the property entered this quartile</td>
  </tr>
  <tr>
    <td>end_date</td>
    <td>DATE</td>
    <td>End date when the property left this quartile</td>
  </tr>
</table>

The quartile categorization is an arbitrary classification of the property based on the owner's income and the property's value. Quartile limits are derived from the 2025 U.S. household income quartiles and scaled by a factor of 4.
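
As a rough illustration of the scaling rule, the sketch below multiplies income quartile boundaries by 4 to get property value limits and assigns a quartile label to a market value. The income figures are placeholders chosen for the example, not the real 2025 U.S. values. The label names reuse the column names of the gold tables, and treating a value exactly on a limit as part of the higher quartile is an assumption.

```python
import bisect

SCALE_FACTOR = 4  # from the text: income quartile limits are scaled by 4

# Placeholder income boundaries (25th, 50th, 75th percentile).
# These are NOT the real 2025 U.S. household income figures.
EXAMPLE_INCOME_BOUNDARIES = (30_000, 60_000, 110_000)

QUARTILE_LABELS = ("q_0_25", "q_25_50", "q_50_75", "q_75_100")


def value_limits(income_boundaries, scale=SCALE_FACTOR):
    """Turn income boundaries into property value limits."""
    return [boundary * scale for boundary in income_boundaries]


def assign_quartile(market_value, income_boundaries=EXAMPLE_INCOME_BOUNDARIES):
    """Return the quartile label for a property's total market value."""
    limits = value_limits(income_boundaries)
    # bisect_right counts how many limits are <= market_value,
    # which is the index of the matching label.
    return QUARTILE_LABELS[bisect.bisect_right(limits, market_value)]
```

With the placeholder boundaries, a property falls in the 25-50th percentile band when its market value is at least 120,000 and below 240,000.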

Properties in quartile table

<table>
  <tr>
    <th>Column Name</th>
    <th>Column Type</th>
    <th>Column Comment</th>
  </tr>
  <tr>
    <td>quartile</td>
    <td>STRING</td>
    <td>Income quartile to which the property belongs. Each row summarizes how many properties belong to this quartile</td>
  </tr>
  <tr>
    <td>amount_of_properties_in_quartile</td>
    <td>INT</td>
    <td>Quantity of houses in a quartile</td>
  </tr>
</table>

Stability categorization table

<table>
  <tr>
    <th>Column Name</th>
    <th>Column Type</th>
    <th>Column Comment</th>
  </tr>
  <tr>
    <td>stability_categorization</td>
    <td>STRING</td>
    <td>Group of properties classified by how frequently they change quartile. Each row summarizes how many properties in that category fall into each percentile range.</td>
  </tr>
  <tr>
    <td>q_0_25</td>
    <td>INT</td>
    <td>Number of properties in this stability category whose values fall in the 0-25th percentile</td>
  </tr>
  <tr>
    <td>q_25_50</td>
    <td>INT</td>
    <td>Number of properties in this stability category whose values fall in the 25-50th percentile</td>
  </tr>
  <tr>
    <td>q_50_75</td>
    <td>INT</td>
    <td>Number of properties in this stability category whose values fall in the 50-75th percentile</td>
  </tr>
  <tr>
    <td>q_75_100</td>
    <td>INT</td>
    <td>Number of properties in this stability category whose values fall in the 75-100th percentile</td>
  </tr>
</table>


The stability categories are:
- "very very stable" - no quartile change in the last 8 years, counting back from the end of 2025 (2920 days)
- "very stable" - no quartile change in the last 4 years (1460 days)
- "stable" - no quartile change in the last 2 years (730 days)
- "variable price" - a quartile change within the last 2 years (729 days or fewer)
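
The day thresholds above can be sketched as a small classification function. It assumes that stability is measured from the `start_date` of the property's current SCD row to December 31, 2025; the function name and the reference date constant are hypothetical.

```python
from datetime import date

REFERENCE_DATE = date(2025, 12, 31)  # assumption: "end of 2025"


def stability_category(current_quartile_start, reference_date=REFERENCE_DATE):
    """Classify a property by how long it has stayed in its current quartile."""
    days_in_quartile = (reference_date - current_quartile_start).days
    if days_in_quartile >= 2920:  # 8 years
        return "very very stable"
    if days_in_quartile >= 1460:  # 4 years
        return "very stable"
    if days_in_quartile >= 730:  # 2 years
        return "stable"
    return "variable price"  # 729 days or fewer
```

Checking the thresholds from longest to shortest means each property lands in exactly one category.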

Amount of quartile changes table 

<table>
  <tr>
    <th>Column Name</th>
    <th>Column Type</th>
    <th>Column Comment</th>
  </tr>
  <tr>
    <td>quartile_changes</td>
    <td>INT</td>
    <td>Group of properties classified by quantity of quartile changes from 2017 to 2025. Each row summarizes how many properties with that number of changes fall into each percentile range</td>
  </tr>
  <tr>
    <td>q_0_25</td>
    <td>INT</td>
    <td>Number of properties with this number of quartile changes whose values fall in the 0-25th percentile</td>
  </tr>
  <tr>
    <td>q_25_50</td>
    <td>INT</td>
    <td>Number of properties with this number of quartile changes whose values fall in the 25-50th percentile</td>
  </tr>
  <tr>
    <td>q_50_75</td>
    <td>INT</td>
    <td>Number of properties with this number of quartile changes whose values fall in the 50-75th percentile</td>
  </tr>
  <tr>
    <td>q_75_100</td>
    <td>INT</td>
    <td>Number of properties with this number of quartile changes whose values fall in the 75-100th percentile</td>
  </tr>

</table>
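
One way to derive this table from the SCD table is sketched below in plain Python. It assumes that each SCD row represents one uninterrupted stay in a quartile, so an account with n rows has changed quartile n - 1 times, and that the quartile columns refer to the account's most recent quartile. Both assumptions and the function name are illustrative only.

```python
from collections import Counter, defaultdict


def quartile_change_table(scd_rows):
    """Pivot SCD rows into {number_of_changes: {quartile: property_count}}.

    scd_rows: dicts with "dim_account_number", "quartile" and "start_date".
    """
    rows_per_account = Counter()
    latest_start = {}
    current_quartile = {}
    for row in scd_rows:
        acct = row["dim_account_number"]
        rows_per_account[acct] += 1
        # Track the quartile of the row with the latest start date.
        if acct not in latest_start or row["start_date"] > latest_start[acct]:
            latest_start[acct] = row["start_date"]
            current_quartile[acct] = row["quartile"]

    table = defaultdict(Counter)
    for acct, row_count in rows_per_account.items():
        table[row_count - 1][current_quartile[acct]] += 1
    return {changes: dict(counts) for changes, counts in table.items()}
```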