# Transform

In [18]:
import geopandas as gpd

In [19]:
import warnings
warnings.simplefilter("ignore")

## Identify buildings in Houston's city limits

For this analysis, we aim to separate buildings inside the Houston city limits from those in the surrounding suburbs and unincorporated areas of Harris County.

According to officials at the Harris County Appraisal District, this is accomplished by joining the "tax district" table to the building list and identifying those buildings taxed by the city government.

First read in the buildings

In [20]:
res = gpd.pd.read_csv("input/building_res.csv", dtype={"ACCOUNT": str})

Then read in the tax districts

In [21]:
jur = gpd.pd.read_csv("input/jur_value.csv", dtype={"ACCOUNT": str, "TAX_DISTRICT": str})
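
Both reads pass `dtype` so the identifier columns stay as text. The sketch below illustrates why that matters, using a made-up one-row CSV in place of the real input file: without the `dtype` hint, pandas would parse the district code as a number and drop its leading zero. The account number here is hypothetical.

```python
import io

import pandas as pd

# Made-up sample row; the account number is hypothetical.
csv_text = "ACCOUNT,TAX_DISTRICT\n0010010000001,061\n"

# Default parsing infers integers, so leading zeros are lost.
as_default = pd.read_csv(io.StringIO(csv_text))

# Forcing both columns to str keeps the codes exactly as written.
as_text = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"ACCOUNT": str, "TAX_DISTRICT": str},
)

default_code = as_default.TAX_DISTRICT.iloc[0]
text_code = as_text.TAX_DISTRICT.iloc[0]
print(default_code, text_code)
```

With the codes kept as strings, a later comparison against `'061'` can match.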

Trim down the tax district table to the crucial column

In [22]:
districts = jur[[
    'ACCOUNT',
    'TAX_DISTRICT'
]]

Filter the table down to properties listed in the Houston city tax district, which county officials said is encoded as "061".

In [23]:
in_houston = districts[districts.TAX_DISTRICT == '061']

Mark every building whose account appears in that filtered list

In [24]:
res['IN_HOUSTON'] = res.ACCOUNT.isin(in_houston.ACCOUNT)
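
As a rough illustration of the flagging step, here is a minimal sketch on toy data. The account numbers and the second district code are invented. It assumes an account can appear under more than one tax district, in which case a membership test adds one flag per building without duplicating rows.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration.
buildings = pd.DataFrame({"ACCOUNT": ["0001", "0002", "0003"]})
districts = pd.DataFrame({
    "ACCOUNT": ["0001", "0001", "0002", "0003"],
    "TAX_DISTRICT": ["061", "040", "040", "061"],
})

# Keep only the rows for the city district, then test membership.
in_city = districts[districts.TAX_DISTRICT == "061"]
buildings["IN_HOUSTON"] = buildings.ACCOUNT.isin(in_city.ACCOUNT)

flags = buildings.IN_HOUSTON.tolist()
print(buildings)
```

The building table keeps its original three rows, and each row carries a single boolean flag.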

Trim down to the columns we want to keep

In [25]:
trimmed_buildings = res[[
    'ACCOUNT',
    'USE_CODE',
    'CLASS_STRUCTURE',
    'BUILDING_NUMBER',
    'DATE_ERECTED',
    'IN_HOUSTON'
]].copy()

Add a decade column for later analysis

In [26]:
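def get_decade(year):
    """Return the decade a year falls in, e.g. 1987 -> 1980."""
    # Swap the final digit of the year for a zero
    s = str(year)
    s = s[:-1]
    s += "0"
    return int(s)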

In [30]:
trimmed_buildings['DECADE'] = trimmed_buildings.DATE_ERECTED.apply(get_decade)
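
An alternative, not used in this notebook, is to compute the decade with integer arithmetic on the whole column at once. The sketch below assumes the years are whole numbers and uses made-up sample values.

```python
import pandas as pd

# Made-up construction years for illustration.
years = pd.Series([1958, 1990, 2017])

# Floor-divide by ten, then multiply back to zero out the last digit.
decades = (years // 10) * 10

print(decades.tolist())
```

This approach avoids the round trip through strings, though it would need extra handling if the column contained missing values.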

Write out the buildings file with this extra data included

In [31]:
trimmed_buildings.to_csv("./output/buildings.csv", index=False)

## Transform the parcels map to the same projection as the flood zones

Read it in

In [32]:
parcels = gpd.read_file("./input/Parcels/Parcels.shp")

Trim down to the columns we want to keep

In [33]:
trimmed_parcels = parcels[[
    'HCAD_NUM',
    'CONDO_FLAG',
    'CurrOwner',
    'LocAddr',
    'city',
    'zip',
    'geometry'
]].rename(columns={
    "CurrOwner": "OWNER",
    "HCAD_NUM": "ACCOUNT",
    "LocAddr": "ADDRESS",
    "city": "CITY",
    "zip": "ZIPCODE"
})

Reproject it and write it back out

In [34]:
trimmed_parcels.to_crs(epsg=4269).to_file("./output/parcels.shp")

## Filter down to official FEMA flood zones

According to FEMA officials, the 100-year and 500-year flood zones are the ones best described as flood-prone areas under the agency's standards. Here we will read in the complete set of flood zones, separate out those two types and write them to separate files.

In [35]:
flood_zones = gpd.read_file("./input/48201C_20171002/S_FLD_HAZ_AR.shp")

Filter down as instructed by FEMA staff in a phone interview.

In [36]:
one_hundred_year_zones = flood_zones[flood_zones.FLD_ZONE.isin(['AE', 'VE', 'A', 'AO'])]

In [37]:
five_hundred_year_zones = flood_zones[
    (flood_zones.FLD_ZONE == 'X') &
    (flood_zones.ZONE_SUBTY == '0.2 PCT ANNUAL CHANCE FLOOD HAZARD')
]
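
To show how the two filters divide the records, here is a minimal sketch that applies the same conditions to a plain pandas DataFrame standing in for the shapefile. The rows are illustrative, not taken from the FEMA data.

```python
import pandas as pd

# Illustrative rows only; a plain DataFrame stands in for the GeoDataFrame.
zones = pd.DataFrame({
    "FLD_ZONE": ["AE", "X", "X", "A", "VE"],
    "ZONE_SUBTY": [
        "FLOODWAY",
        "0.2 PCT ANNUAL CHANCE FLOOD HAZARD",
        "AREA OF MINIMAL FLOOD HAZARD",
        "",
        "",
    ],
})

# 100-year zones are picked by zone code alone.
one_hundred = zones[zones.FLD_ZONE.isin(["AE", "VE", "A", "AO"])]

# 500-year zones need both the X code and the matching subtype.
five_hundred = zones[
    (zones.FLD_ZONE == "X") &
    (zones.ZONE_SUBTY == "0.2 PCT ANNUAL CHANCE FLOOD HAZARD")
]

# Whatever matches neither filter is left out of both output files.
leftover = zones.drop(one_hundred.index).drop(five_hundred.index)
print(len(one_hundred), len(five_hundred), len(leftover))
```

Zone X rows with any other subtype fall into neither group, which is why the second filter checks two columns.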

Write out the results

In [38]:
one_hundred_year_zones.to_file("./output/one-hundred-year-flood-zones.shp")

In [39]:
five_hundred_year_zones.to_file("./output/five-hundred-year-flood-zones.shp")