# Find the comparables: fixtures.txt

The file `fixtures.txt` contains important property features such as the number of bedrooms, full baths, and half baths. It comes in long (melted) format, with one row per account, building, and fixture type, so we use the `pivot_table` method of the DataFrame to reshape it into a wide table with one row per account number.
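To make the long-versus-wide distinction concrete, here is a toy round trip on hypothetical data (the account numbers and counts are made up). `melt` produces the same one-row-per-(account, fixture type) layout that `fixtures.txt` ships in, and `pivot_table` turns the `type_dscr` values back into columns with one row per account.

```python
import pandas as pd

# Hypothetical wide table: one row per account, one column per fixture type
wide = pd.DataFrame({
    "acct": [1, 2],
    "Room: Bedroom": [3, 4],
    "Room: Full Bath": [2, 1],
})

# melt() yields the long layout: one row per (acct, type_dscr) pair
long_df = wide.melt(id_vars="acct", var_name="type_dscr", value_name="units")

# pivot_table() reverses it: type_dscr values become columns again
back = long_df.pivot_table(index="acct", columns="type_dscr", values="units")

print(long_df.shape, back.shape)  # (4, 3) (2, 2)
```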

In [1]:
```python
%load_ext autoreload
%autoreload 2
```

In [2]:
```python
from pathlib import Path
import pickle

import pandas as pd

from src.definitions import ROOT_DIR
from src.data.utils import Table, save_pickle
```

In [3]:
```python
fixtures_fn = ROOT_DIR / 'data/external/2016/Real_building_land/fixtures.txt'
assert fixtures_fn.exists()
```

## Load accounts of interest
Let's load only the rows whose account numbers meet the free-standing single-family home criteria we identified while processing the `building_res.txt` file.
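`Table.get_skiprows` is a project helper whose body isn't shown here. One plausible way to get the same effect with plain pandas is a two-pass read: first load only the `acct` column, then pass the file line numbers of unwanted accounts to `read_csv(skiprows=...)`. The sample data, the tab separator, the account set, and the `rows_to_skip` helper below are all hypothetical, not the project's implementation.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for fixtures.txt
raw = (
    "acct\tbld_num\ttype\ttype_dscr\tunits\n"
    "1001\t1\tRMB\tRoom:  Bedroom\t3\n"
    "1002\t1\tRMB\tRoom:  Bedroom\t4\n"
    "1003\t1\tRMB\tRoom:  Bedroom\t2\n"
)

# Hypothetical accounts that passed the building_res.txt filter
accts_of_interest = {1001, 1003}


def rows_to_skip(text, accts, sep="\t"):
    """Return 0-indexed file line numbers whose account is not in accts."""
    # First pass: read only the account column to keep memory low
    acct_col = pd.read_csv(io.StringIO(text), sep=sep, usecols=["acct"])["acct"]
    # Data row i sits on file line i + 1 because line 0 is the header
    return [i + 1 for i, a in enumerate(acct_col) if a not in accts]


skip = rows_to_skip(raw, accts_of_interest)
# Second pass: read every column, skipping the unwanted lines
df = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=skip)
print(df["acct"].tolist())  # [1001, 1003]
```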

In [4]:
```python
fixtures = Table(fixtures_fn, '2016')
```

In [5]:
```python
skiprows = fixtures.get_skiprows()
```

In [6]:
```python
fixtures_df = fixtures.get_df(skiprows=skiprows)
```

In [7]:
```python
fixtures_df.head()
```

|   | acct | bld_num | type | type_dscr | units |
|---|------|---------|------|-----------|-------|
| 0 | 1116690000018 | 1 | RMT | Room: Total | 5.0 |
| 1 | 1116690000018 | 1 | STY | Story Height Index | 1.0 |
| 2 | 1116690000019 | 1 | RMT | Room: Total | 5.0 |
| 3 | 1116690000019 | 1 | STY | Story Height Index | 2.0 |
| 4 | 1116690000020 | 1 | RMT | Room: Total | 5.0 |


In [8]:
```python
fixtures_df['type_dscr'].value_counts()
```

```
Room:  Bedroom                  960725
Story Height Index              960644
Room:  Full Bath                960563
Room:  Total                    960473
Fixtures:  Total                960339
Fixtures:  Addl                 463957
Room:  Half Bath                393908
Room:  Rec                      372823
Fireplace: Metal Prefab         335608
Fireplace: Masonry Firebrick    203721
Masonry Trim                     40565
Fireplace: Direct Vent           27568
Fireplace:  Adl Open              3536
Elevator Stops                    2471
Atrium                            1238
Lower Level Rec                    188
Fireplace:  Open (1)                68
                                    20
Wall Height                          6
Interior Finish Percent              5
Bank:  Drive-Thru                    5
Elev:  Elect / Pass                  4
Pool:  Indoor Value                  4
A/C:  Central                        3
Elev:  Elect / Frght                 2
Fireplace:  Open (3)
```

# Select columns and build pivot table
From the value counts of the fixture type descriptions above, the first 10 types are clearly prevalent: each appears in more than 200,000 rows, while the 11th (Masonry Trim) drops to about 40,000. Let's focus on these 10 in our evaluation.
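The cutoff at 10 can also be checked mechanically. A minimal heuristic, which is an assumption rather than the author's method, is to cut just before the steepest relative drop between consecutive counts. Applied to the top 12 counts copied from the output above, the steepest drop is from Fireplace: Masonry Firebrick (203,721) to Masonry Trim (40,565), which selects the same 10 types. The helper name `largest_drop_cutoff` is hypothetical.

```python
import pandas as pd

# Counts copied from the value_counts output above (top 12 types only)
counts = pd.Series(
    [960725, 960644, 960563, 960473, 960339, 463957,
     393908, 372823, 335608, 203721, 40565, 27568],
    index=["Room:  Bedroom", "Story Height Index", "Room:  Full Bath",
           "Room:  Total", "Fixtures:  Total", "Fixtures:  Addl",
           "Room:  Half Bath", "Room:  Rec", "Fireplace: Metal Prefab",
           "Fireplace: Masonry Firebrick", "Masonry Trim",
           "Fireplace: Direct Vent"],
)


def largest_drop_cutoff(vc):
    """Number of leading categories to keep, cutting at the steepest relative drop."""
    # Ratio of each count to the one before it; a small ratio means a steep drop
    ratios = vc.values[1:] / vc.values[:-1]
    # Keep every category up to and including the one before the steepest drop
    return int(ratios.argmin()) + 1


n = largest_drop_cutoff(counts)
print(n)  # 10
print(list(counts.head(n).index))
```

The heuristic only looks at relative drops, so it would need a second look if a long tail of similar small counts followed the big categories.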

In [9]:
```python
cols = fixtures_df['type_dscr'].value_counts().head(10).index
```

In [10]:
```python
cond0 = fixtures_df['type_dscr'].isin(cols)
fixtures_df = fixtures_df.loc[cond0, :]
```

In [11]:
```python
fixtures_pivot = fixtures_df.pivot_table(index='acct', columns='type_dscr', values='units', fill_value=0)
```
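`pivot_table` aggregates duplicate (`acct`, `type_dscr`) pairs, and its default `aggfunc` is the mean. If an account still had several buildings (`bld_num`) after filtering, their counts would be averaged rather than added, while `fill_value=0` fills in fixture types an account never reported. The toy frame below is hypothetical and only illustrates the difference; whether `"sum"` is more appropriate depends on how multi-building accounts should be treated.

```python
import pandas as pd

# Hypothetical long-format rows: account 1 reports bedrooms for two buildings
long_df = pd.DataFrame({
    "acct": [1, 1, 1, 2],
    "bld_num": [1, 2, 1, 1],
    "type_dscr": ["Room: Bedroom", "Room: Bedroom", "Room: Full Bath", "Room: Bedroom"],
    "units": [3.0, 1.0, 2.0, 4.0],
})

# Default aggfunc is the mean: account 1's two bedroom rows collapse to 2
mean_wide = long_df.pivot_table(index="acct", columns="type_dscr",
                                values="units", fill_value=0)

# Summing instead totals bedrooms across buildings: 4 for account 1
sum_wide = long_df.pivot_table(index="acct", columns="type_dscr",
                               values="units", aggfunc="sum", fill_value=0)

# Account 2 has no full-bath row, so fill_value puts 0 there in both tables
print(mean_wide)
print(sum_wide)
```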

In [12]:
```python
fixtures_pivot.head()
```

| acct | Fireplace: Masonry Firebrick | Fireplace: Metal Prefab | Fixtures: Addl | Fixtures: Total | Room: Bedroom | Room: Full Bath | Room: Half Bath | Room: Rec | Room: Total | Story Height Index |
|------|---|---|---|---|---|---|---|---|---|---|
| 21440000001 | 0.0 | 0.0 | 2.0 | 12.0 | 3.0 | 2.0 | 1.0 | 1.0 | 8.0 | 2.0 |
| 21470000008 | 0.0 | 0.0 | 0.0 | 5.0 | 2.0 | 1.0 | 0.0 | 0.0 | 4.0 | 1.0 |
| 21480000002 | 0.0 | 0.0 | 0.0 | 5.0 | 3.0 | 1.0 | 0.0 | 0.0 | 6.0 | 1.0 |
| 21650000007 | 0.0 | 0.0 | 3.0 | 16.0 | 3.0 | 3.0 | 1.0 | 0.0 | 6.0 | 2.0 |
| 21650000011 | 0.0 | 1.0 | 0.0 | 8.0 | 3.0 | 2.0 | 0.0 | 0.0 | 5.0 | 1.0 |


Move `acct` out of the index and into a regular column to make the upcoming merge easier.

In [13]:
```python
fixtures_pivot.reset_index(inplace=True)
```
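With `acct` back as a regular column, the pivot can be joined to other per-account tables by key. The sketch below uses hypothetical data (`other` and its `living_area` column are made up). Note that `reset_index` leaves the columns axis still named `type_dscr`, which can be cleared, and that `validate="one_to_one"` makes pandas raise a `MergeError` if either side has duplicate accounts.

```python
import pandas as pd

# Hypothetical pivot shaped like fixtures_pivot: acct index, named columns axis
pivot = pd.DataFrame(
    {"Room: Bedroom": [3.0, 2.0]},
    index=pd.Index([10, 20], name="acct"),
)
pivot.columns.name = "type_dscr"

# Move acct into a column, then drop the leftover columns-axis name
flat = pivot.reset_index()
flat.columns.name = None

# Hypothetical table from another processing step, keyed by acct
other = pd.DataFrame({"acct": [10, 20], "living_area": [1800, 1200]})

# validate raises MergeError if acct is duplicated on either side
merged = other.merge(flat, on="acct", how="left", validate="one_to_one")
print(merged)
```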

# Export `fixtures_pivot`

In [14]:
```python
save_fn = ROOT_DIR / 'data/raw/2016/fixtures_comps.pickle'
save_pickle(fixtures_pivot, save_fn)
```
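`save_pickle` comes from the project's `src.data.utils`, and its body isn't shown here. A hypothetical stand-in using only the standard library might create the target folder and dump the object, after which `pd.read_pickle` can load the file back. The function name `save_pickle_sketch` and the temporary directory are illustrative only.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd


def save_pickle_sketch(obj, path):
    """Hypothetical stand-in for src.data.utils.save_pickle."""
    path = Path(path)
    # Create data/raw/2016/ (or any missing parent folders) before writing
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


df = pd.DataFrame({"acct": [10, 20], "Room: Bedroom": [3.0, 2.0]})

# Round trip inside a throwaway directory so nothing real is overwritten
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data/raw/2016/fixtures_comps.pickle"
    save_pickle_sketch(df, target)
    restored = pd.read_pickle(target)

print(restored.equals(df))  # True
```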