# TSV Maker for housing characteristics using PUMA data

**Contributors** : Tobi Adekanye

**Date Created** : January 14, 2020

Housing characteristics tsvs are presently created using EPW weather files. However, EPW regions are large areas, each made up of multiple cities and counties. We use Public Use Microdata Areas (PUMA) to create smaller, more spatially relevant regions to represent some of the housing characteristics. This notebook restructures the PUMA data to create the relevant housing characteristics tsvs.

There are two files that need to be downloaded for the tsv maker to work:

1. **PUMA data (csv)** : this file contains the different housing characteristics and household weights for the PUMA areas
2. **Data dictionary (xls)** : the PUMA data codes the different housing characteristics as integer data. This file contains several sheets which map the integer data to the respective names. It also contains a ResStock mapping for cases where the census mapping does not match the one presently used in-house.
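The two-step lookup described for the data dictionary (integer code to census name, then census name to ResStock name where the two differ) can be sketched as follows. This is only an illustration: the codes, labels, and the `decode` helper are hypothetical and are not taken from the actual data dictionary file.

```python
# Hypothetical code -> census label mapping, as one dictionary sheet might provide.
census_labels = {
    2: "Utility gas",
    4: "Electricity",
    5: "Fuel oil, kerosene, etc.",
}

# Hypothetical census label -> ResStock name mapping, only for labels that differ.
resstock_labels = {
    "Utility gas": "Natural Gas",
    "Fuel oil, kerosene, etc.": "Fuel Oil",
}


def decode(code):
    """Translate an integer code to a ResStock name, falling back to the census label."""
    census = census_labels[code]
    return resstock_labels.get(census, census)


decoded = [decode(code) for code in (2, 4, 5)]
print(decoded)
```

Labels without a ResStock override pass through unchanged, which mirrors the idea that the ResStock mapping is only needed where the census naming does not match.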

# Inputs

Some inputs for this notebook.
- `write_tf(bool)`: Flag to write the created tsv files
- `project(string)`: the name of the project file where the tsv will be copied into. This input does not matter if `write_tf=False`.
- `dep_lst(list(str))`: list of the dependencies needed for the tsv file
- `option_col(string)`: the name of the field which the dependencies will be distributed against

**Note**: `dep_lst` and `option_col` need to be specified exactly as the fields are named in the PUMA data csv file, otherwise the tsv maker will not run correctly.
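To make the roles of `dep_lst` and `option_col` concrete, here is a minimal sketch of one way a distribution of an option column against its dependencies could be computed with pandas, weighting each household by `HHWT`. It is an assumption about the general approach, not the actual code in `tsv_maker_puma_v1`; the `weighted_distribution` helper and the sample records are hypothetical.

```python
import pandas as pd

# Hypothetical household records; real values come from the PUMA data csv file.
households = pd.DataFrame({
    "PUMA": [100, 100, 100, 200, 200],
    "UNITSSTR": [3, 3, 4, 3, 4],
    "HHWT": [60.0, 20.0, 20.0, 10.0, 30.0],
})


def weighted_distribution(df, dep_lst, option_col, weight_col="HHWT"):
    """Return, for each dependency combination, the weighted share of each option."""
    totals = df.pivot_table(
        index=dep_lst,
        columns=option_col,
        values=weight_col,
        aggfunc="sum",
        fill_value=0,
    )
    # Divide each row by its total weight so the shares in a row sum to 1.
    return totals.div(totals.sum(axis=1), axis=0)


dist = weighted_distribution(households, ["PUMA"], "UNITSSTR")
print(dist)
```

Each row of `dist` is one combination of the dependencies and each column is one value of the option column, which is the general shape of a housing characteristics tsv.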

# PUMA fields list

The columns contained in the PUMA data csv file are as follows:

- **YEAR**: Census Year
- **MULTYEAR**: Actual year of survey, multiyear ACS
- **SAMPLE**: IPUMS sample identifier
- **SERIAL**: Household serial number
- **CBSERIAL**: Original Census Bureau household serial number
- **HHWT**: Household weight
- **STATEICP**: State (ICPSR code)
- **STATEFIP**: State (FIPS code)
- **COUNTYICP**: County (ICPSR code)
- **COUNTYFIP**: County (FIPS code)
- **PUMA**: Public Use Microdata Area
- **GQ**: Group quarters status
- **OWNERSHP**: Ownership of dwelling (tenure) (general version)
- **OWNERSHPD**: Ownership of dwelling (tenure) (detailed version)
- **COSTELEC**: Annual electricity cost
- **COSTGAS**: Annual gas cost
- **COSTWATER**: Annual water cost
- **COSTFUEL**: Annual home heating fuel
- **VACANCY**: Vacancy status
- **ROOMS**: Number of rooms
- **BUILTYR2**: Age of structure (decade)
- **UNITSSTR**: Units in structure
- **BEDROOMS**: Number of bedrooms
- **FUELHEAT**: Home heating fuel
- **PERNUM**: Person number in sample unit
- **PERWT**: Person weight


As stated above, `dep_lst` and `option_col` need to be specified as one or more of the field names in the PUMA data csv file. The table below provides a mapping of some of the PUMA data csv columns to the present housing characteristic columns available.

| PUMA field     | Present housing characteristic |
|----------------|--------------------------------|
| UNITSSTR       | Geometry Building Type FPL     |
| BEDROOMS       | Bedrooms                       |
| BUILTYR2       | Vintage                        |
| BUILTYR2_FPL\* | Vintage FPL                    |
| FUELHEAT       | Heating Fuel                   |

\* *This column does not exist presently in the PUMA data csv file. However, `dep_lst` or `option_col` can be specified as "BUILTYR2_FPL".*
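Because the tsv maker depends on exact field names, it can help to check `dep_lst` and `option_col` before starting a long run. The sketch below uses the field names listed above plus the derived "BUILTYR2_FPL" name; `check_fields` is a hypothetical helper and is not part of `tsv_maker_puma_v1`.

```python
# Field names listed above for the PUMA data csv file.
PUMA_FIELDS = {
    "YEAR", "MULTYEAR", "SAMPLE", "SERIAL", "CBSERIAL", "HHWT",
    "STATEICP", "STATEFIP", "COUNTYICP", "COUNTYFIP", "PUMA", "GQ",
    "OWNERSHP", "OWNERSHPD", "COSTELEC", "COSTGAS", "COSTWATER",
    "COSTFUEL", "VACANCY", "ROOMS", "BUILTYR2", "UNITSSTR", "BEDROOMS",
    "FUELHEAT", "PERNUM", "PERWT",
}

# Name that can be specified even though the column is absent from the csv file.
DERIVED_FIELDS = {"BUILTYR2_FPL"}


def check_fields(dep_lst, option_col):
    """Return the names that are not recognised fields (empty list if all are valid)."""
    valid = PUMA_FIELDS | DERIVED_FIELDS
    return [name for name in [*dep_lst, option_col] if name not in valid]


print(check_fields(["PUMA", "UNITSSTR"], "BUILTYR2_FPL"))
print(check_fields(["puma"], "Vintage"))
```

The check is case-sensitive on purpose, since the names must match the csv headers exactly; housing characteristic names such as "Vintage" are reported as invalid because the PUMA field name ("BUILTYR2") is expected instead.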

# Import modules

In [24]:
import os
import sys
import pandas as pd
import numpy as np

In [25]:
# Try to get the tsv_maker if it exists, then reload
try:
    del sys.modules['tsv_maker_puma_v1'] 
except KeyError:
    pass
from tsv_maker_puma_v1 import TSVMaker # Class methods to create the spatial tsvs

# Write reformatted PUMA tsv to project folder

The table below provides the inputs needed to create some of the common housing characteristics tsvs:

| TSV File                   | dep_lst                              | option_col     |
|----------------------------|--------------------------------------|----------------|
| Geometry Building Type FPL | ["PUMA"]                             | "UNITSSTR"     |
| Vintage FPL                | ["PUMA","UNITSSTR"]                  | "BUILTYR2_FPL" |
| Vintage                    | ["PUMA", "BUILTYR2_FPL"]             | "BUILTYR2"     |
| Heating Fuel               | ["PUMA", "UNITSSTR", "BUILTYR2_FPL"] | "FUELHEAT"     |



In [26]:
# Write a single tsv file to the project folder
write_tf = True
project = "puma_tsvs"  # select project folder

# For Geometry Building Type FPL, for example, run the following:
dep_lst = ['PUMA']
option_col = "UNITSSTR"

# Initialize PUMA tsv_maker object
puma_tsv = TSVMaker(project, dep_lst, option_col)

#puma_tsv.create_tsv_with_dependencies()
# Display
#display(puma_tsv.pivot_df.head(10))

# Write new housing characteristics to project folder
if write_tf:
    puma_tsv.write_tsv_to_projects()

Initializing PUMA TSVMaker
---------------------
Downloading data from s3
PUMA Files...
Loading PUMA files
All done! file(s) written into tsv paths!


In [23]:
## Alternatively, write multiple files to the folder (this takes a long while)

# To create the Geometry Building Type FPL, Vintage FPL, and Heating Fuel tsvs, for example, run the following:
dep_opt_lst = [
    (["PUMA"], "UNITSSTR"),
    (["PUMA", "UNITSSTR"], "BUILTYR2_FPL"),
    (["PUMA", "UNITSSTR", "BUILTYR2_FPL"], "FUELHEAT"),
]

# Each entry pairs a dependency list with its option column
for dep_lst, option_col in dep_opt_lst:
    puma_tsv = TSVMaker(project, dep_lst, option_col)

    if write_tf:
        puma_tsv.write_tsv_to_projects()

Initializing PUMA TSVMaker
---------------------
Downloading data from s3
PUMA Files...
Loading PUMA files
All done! file(s) written into tsv paths!
Initializing PUMA TSVMaker
---------------------
Downloading data from s3
PUMA Files...
Loading PUMA files
All done! file(s) written into tsv paths!
Initializing PUMA TSVMaker
---------------------
Downloading data from s3
PUMA Files...
Loading PUMA files
All done! file(s) written into tsv paths!
