# Find the comparables: fixtures.txt

The file `fixtures.txt` contains important property features such as the number of bedrooms, full baths, and half baths. It comes as a melted (long-format) table, so we use the DataFrame `pivot_table` method to reshape it into a table with one observation (account number) per row.
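
To make the reshaping concrete, here is a minimal sketch on a made-up melted table that borrows the column names of `fixtures.txt`. The account numbers and values are invented for illustration only.

```python
import pandas as pd

# Toy melted table: one row per (account, fixture type) pair.
melted = pd.DataFrame({
    'acct': [1, 1, 2, 2, 2],
    'type_dscr': ['Room: Bedroom', 'Room: Full Bath',
                  'Room: Bedroom', 'Room: Full Bath', 'Room: Half Bath'],
    'units': [3.0, 2.0, 4.0, 2.0, 1.0],
})

# One row per account, one column per fixture type.
# Missing combinations (account 1 has no half bath) are filled with 0.
wide = melted.pivot_table(index='acct',
                          columns='type_dscr',
                          values='units',
                          fill_value=0,
                          aggfunc='max')
print(wide)
```

Each distinct `type_dscr` value becomes a column, so the wide table has as many columns as there are fixture types in the input.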

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from pathlib import Path
import pickle

import pandas as pd

from src.definitions import ROOT_DIR
from src.data.utils import Table, save_pickle

In [3]:
fixtures_fn = ROOT_DIR / 'data/external/2016/Real_building_land/fixtures.txt'
assert fixtures_fn.exists()

## Load accounts of interest
Let's load only the account numbers that meet the free-standing single-family home criteria that we found while processing the `building_res.txt` file.

In [4]:
fixtures = Table(fixtures_fn, '2016')

In [5]:
skiprows = fixtures.get_skiprows()

In [6]:
fixtures_df = fixtures.get_df(skiprows=skiprows)

In [7]:
fixtures_df.head()

Unnamed: 0,acct,bld_num,type,type_dscr,units
0,1142540050016,1,FXT,Fixtures: Total,8.0
1,1142540050019,1,FPM,Fireplace: Metal Prefab,1.0
2,1142540050019,1,FXT,Fixtures: Total,8.0
3,1142540050027,1,FPM,Fireplace: Metal Prefab,1.0
4,1142540050027,1,FXT,Fixtures: Total,8.0


In [8]:
fixtures_df['type_dscr'].value_counts()

Room:  Bedroom                  960729
Story Height Index              960648
Room:  Full Bath                960567
Room:  Total                    960477
Fixtures:  Total                960343
Fixtures:  Addl                 463957
Room:  Half Bath                393910
Room:  Rec                      372823
Fireplace: Metal Prefab         335609
Fireplace: Masonry Firebrick    203722
Masonry Trim                     40565
Fireplace: Direct Vent           27568
Fireplace:  Adl Open              3536
Elevator Stops                    2471
Atrium                            1238
Lower Level Rec                    188
Fireplace:  Open (1)                68
                                    20
Wall Height                          6
Bank:  Drive-Thru                    5
Interior Finish Percent              5
Elev:  Elect / Pass                  4
Pool:  Indoor Value                  4
A/C:  Central                        3
OH Door:  Motor RS                   2
Elev:  Elect / Frght     

# Select columns and build pivot table
The value counts above show that the first 10 fixture types are far more common than the rest: each appears more than 200,000 times, while the 11th type appears only about 40,000 times. Let's focus on these 10 in our evaluation.
We also assume that each property reports at most one value per fixture type. If more than one value is reported, we take the maximum.
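
One way to check the one-value-per-type assumption is to count rows per (account, type) pair. The sketch below uses invented toy data in which account 1 reports bedrooms twice; the second building is only an assumed reason for the repeat.

```python
import pandas as pd

# Toy data: account 1 reports 'Room: Bedroom' twice (assumed: two buildings).
toy = pd.DataFrame({
    'acct': [1, 1, 2],
    'bld_num': [1, 2, 1],
    'type_dscr': ['Room: Bedroom', 'Room: Bedroom', 'Room: Bedroom'],
    'units': [3.0, 1.0, 4.0],
})

# Count rows per (account, type) pair; any count above 1 breaks the assumption.
pair_counts = toy.groupby(['acct', 'type_dscr']).size()
repeated = pair_counts[pair_counts > 1]
print(repeated)

# Taking the maximum keeps the largest reported value for each pair.
resolved = toy.groupby(['acct', 'type_dscr'])['units'].max()
print(resolved)
```

Running the same count on `fixtures_df` would show how many pairs the `max` aggregation actually affects.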

In [9]:
cols = fixtures_df['type_dscr'].value_counts().head(10).index

In [10]:
cond0 = fixtures_df['type_dscr'].isin(cols)
fixtures_df = fixtures_df.loc[cond0, :]

In [11]:
fixtures_pivot = fixtures_df.pivot_table(index='acct',
                                         columns='type_dscr',
                                         values='units',
                                         fill_value=0,
                                         aggfunc='max')

In [12]:
fixtures_pivot.head()

type_dscr,Fireplace: Masonry Firebrick,Fireplace: Metal Prefab,Fixtures: Addl,Fixtures: Total,Room: Bedroom,Room: Full Bath,Room: Half Bath,Room: Rec,Room: Total,Story Height Index
acct,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
21440000001,0,0,2,12,3,2,1,1,8,2.0
21470000008,0,0,0,5,2,1,0,0,4,1.0
21480000002,0,0,0,5,3,1,0,0,6,1.0
21650000007,0,0,3,16,3,3,1,0,6,2.0
21650000011,0,1,0,8,3,2,0,0,5,1.0


Reset the index so that `acct` becomes a regular column, which makes the merge ahead easier.

In [13]:
fixtures_pivot.reset_index(inplace=True)
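
As a rough illustration of why the reset helps, the sketch below merges a toy pivot with a second table on `acct`. Both tables and the `year_built` column are hypothetical stand-ins, not the project's data.

```python
import pandas as pd

# Toy stand-in for fixtures_pivot before the index reset.
pivot = pd.DataFrame({'room_bedroom': [3, 2]},
                     index=pd.Index([101, 102], name='acct'))

# Hypothetical second table keyed by account number.
other = pd.DataFrame({'acct': [101, 102], 'year_built': [1995, 2004]})

# After reset_index, 'acct' is an ordinary column usable as a merge key.
pivot = pivot.reset_index()
merged = pivot.merge(other, on='acct', how='left', validate='one_to_one')
print(merged)
```

The `validate='one_to_one'` argument makes pandas raise an error if either table has repeated account numbers.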

# Describe and clean the columns

Now we must describe each column by answering:

* Meaning
* Descriptive statistics or value counts
* Data type

HCAD does not provide a document explaining all the variables, but most are easy to guess from their names.

## Fix column names
We would like the column names to be all lower case, with spaces and other non-alphanumeric characters replaced by underscores.

In [14]:
from src.data.utils import fix_column_names

In [15]:
fixtures_pivot.columns

Index(['acct', 'Fireplace: Masonry Firebrick', 'Fireplace: Metal Prefab',
       'Fixtures:  Addl', 'Fixtures:  Total', 'Room:  Bedroom',
       'Room:  Full Bath', 'Room:  Half Bath', 'Room:  Rec', 'Room:  Total',
       'Story Height Index'],
      dtype='object', name='type_dscr')

In [16]:
fixtures_pivot = fix_column_names(fixtures_pivot)

In [17]:
fixtures_pivot.columns

Index(['acct', 'fireplace_masonry_firebrick', 'fireplace_metal_prefab',
       'fixtures_addl', 'fixtures_total', 'room_bedroom', 'room_full_bath',
       'room_half_bath', 'room_rec', 'room_total', 'story_height_index'],
      dtype='object')
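
The helper `fix_column_names` lives in `src.data.utils` and is not shown in this notebook. A minimal sketch that reproduces the renaming seen above might look like the following. The function name `normalize_column_names` and its logic are assumptions, not the project's actual code.

```python
import re

import pandas as pd


def normalize_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for fix_column_names (not the project's code)."""
    def clean(name: str) -> str:
        # Lower-case, collapse each run of non-alphanumerics to one underscore,
        # and drop underscores left at either end.
        return re.sub(r'[^0-9a-z]+', '_', str(name).lower()).strip('_')

    out = df.copy()
    # Assigning a plain list also drops the leftover 'type_dscr' columns name.
    out.columns = [clean(col) for col in out.columns]
    return out


demo = pd.DataFrame(columns=['acct', 'Fixtures:  Addl', 'Room:  Half Bath',
                             'Story Height Index'])
renamed = normalize_column_names(demo)
print(list(renamed.columns))
```

Collapsing runs of punctuation and spaces into a single underscore is what turns `'Fixtures:  Addl'` into `'fixtures_addl'` rather than a name with several underscores in a row.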

### Find duplicated rows

In [18]:
cond0 = fixtures_pivot.duplicated()
fixtures_pivot.loc[cond0, :]

Unnamed: 0,acct,fireplace_masonry_firebrick,fireplace_metal_prefab,fixtures_addl,fixtures_total,room_bedroom,room_full_bath,room_half_bath,room_rec,room_total,story_height_index


## Fixtures
The following property fixtures are represented by their total count:

1. fireplace_masonry_firebrick: Number of masonry firebrick fireplaces in the property.
2. fireplace_metal_prefab: Number of prefabricated metal fireplaces in the property.
3. fixtures_addl: Number of additional fixtures in the property.
4. fixtures_total: Total number of fixtures in the property.
5. room_bedroom: Number of bedrooms in the property.
6. room_full_bath: Number of full baths in the property.
7. room_half_bath: Number of half baths in the property.
8. room_rec: Number of recreational rooms in the property.
9. room_total: Total number of rooms in the property.
10. story_height_index: Number of stories in the property.

Confusingly, not all the values are integers (for example, `story_height_index` includes 1.5 and 2.5), so I decided to leave them as floats rather than cast them as categories.
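
The `fix_fixtures` helper used in the cells below is imported from `src.data.utils` and is not shown. Judging only from its printed output, it appears to report negative values, cast the column to `float32`, and print the head, normalized value counts, and null count. The sketch below is a guess at that behavior under those assumptions; the name `summarize_fixture` is hypothetical, and any cleaning the real helper does beyond what it prints is unknown.

```python
import pandas as pd


def summarize_fixture(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical sketch of a fix_fixtures-style helper (assumed behavior)."""
    out = df.copy()
    print(f'Values less than 0: {(out[col] < 0).sum()}')

    # Keep the column numeric; float32 uses half the memory of float64.
    out[col] = out[col].astype('float32')
    print(f'The new data type is: {out[col].dtype}')
    print(out[col].head())

    # Share of rows per value, most common first.
    print(out[col].value_counts(normalize=True).head(20))
    print(f'The number of null values is: {out[col].isnull().sum()}')
    return out


demo = pd.DataFrame({'story_height_index': [1, 2, 1.5, 1]})
demo = summarize_fixture(demo, 'story_height_index')
```

Keeping the column as a float preserves half-story values such as 1.5, which an integer or categorical cast would either reject or obscure.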

### fireplace_masonry_firebrick

In [19]:
from src.data.utils import fix_fixtures

In [20]:
fixtures_pivot = fix_fixtures(fixtures_pivot, 'fireplace_masonry_firebrick')

Values less than 0: 0


The new data type is: float32
Head:
0    0.0
1    0.0
2    0.0
3    0.0
4    0.0
Name: fireplace_masonry_firebrick, dtype: float32


fireplace_masonry_firebrick normalized value counts: First 20
0.0    0.787326
1.0    0.204959
2.0    0.006546
3.0    0.000953
4.0    0.000169
5.0    0.000032
6.0    0.000013
7.0    0.000002
Name: fireplace_masonry_firebrick, dtype: float64


The number of null values is: 0




### fireplace_metal_prefab

In [21]:
fixtures_pivot = fix_fixtures(fixtures_pivot, 'fireplace_metal_prefab')

Values less than 0: 0


The new data type is: float32
Head:
0    0.0
1    0.0
2    0.0
3    0.0
4    1.0
Name: fireplace_metal_prefab, dtype: float32


fireplace_metal_prefab normalized value counts: First 20
0.0    0.650146
1.0    0.340515
2.0    0.007925
3.0    0.001127
4.0    0.000221
5.0    0.000049
6.0    0.000011
7.0    0.000004
8.0    0.000001
Name: fireplace_metal_prefab, dtype: float64


The number of null values is: 0




### fixtures_addl

In [22]:
fixtures_pivot = fix_fixtures(fixtures_pivot, 'fixtures_addl')

Values less than 0: 0


The new data type is: float32
Head:
0    2.0
1    0.0
2    0.0
3    3.0
4    0.0
Name: fixtures_addl, dtype: float32


fixtures_addl normalized value counts: First 20
0.0     0.515790
2.0     0.174383
3.0     0.124497
1.0     0.090731
4.0     0.061421
5.0     0.018742
6.0     0.008203
7.0     0.002792
8.0     0.001763
10.0    0.000567
9.0     0.000554
11.0    0.000163
12.0    0.000117
13.0    0.000065
15.0    0.000056
14.0    0.000052
16.0    0.000038
17.0    0.000016
18.0    0.000016
20.0    0.000014
Name: fixtures_addl, dtype: float64


The number of null values is: 0




### fixtures_total

In [23]:
fixtures_pivot = fix_fixtures(fixtures_pivot, 'fixtures_total')

Values less than 0: 0


The new data type is: float32
Head:
0    12.0
1     5.0
2     5.0
3    16.0
4     8.0
Name: fixtures_total, dtype: float32


fixtures_total normalized value counts: First 20
8.0     0.257450
5.0     0.138424
10.0    0.130209
12.0    0.081768
11.0    0.069250
13.0    0.066917
9.0     0.049375
7.0     0.048893
14.0    0.031181
16.0    0.030800
17.0    0.027129
15.0    0.024251
18.0    0.013494
19.0    0.007609
20.0    0.005854
21.0    0.003991
22.0    0.002839
23.0    0.001964
24.0    0.001608
6.0     0.001538
Name: fixtures_total, dtype: float64


The number of null values is: 0




### room_bedroom

In [24]:
fixtures_pivot = fix_fixtures(fixtures_pivot, 'room_bedroom')

Values less than 0: 0


The new data type is: float32
Head:
0    3.0
1    2.0
2    3.0
3    3.0
4    3.0
Name: room_bedroom, dtype: float32


room_bedroom normalized value counts: First 20
3.0     0.540799
4.0     0.314901
2.0     0.092008
5.0     0.044068
6.0     0.003792
1.0     0.003705
7.0     0.000457
8.0     0.000135
0.0     0.000070
9.0     0.000042
10.0    0.000022
14.0    0.000001
Name: room_bedroom, dtype: float64


The number of null values is: 0




### room_full_bath

In [25]:
fixtures_pivot = fix_fixtures(fixtures_pivot, 'room_full_bath')

Values less than 0: 0


The new data type is: float32
Head:
0    2.0
1    1.0
2    1.0
3    3.0
4    2.0
Name: room_full_bath, dtype: float32


room_full_bath normalized value counts: First 20
2.0     0.654748
1.0     0.189543
3.0     0.127473
4.0     0.021527
5.0     0.004872
6.0     0.001246
7.0     0.000339
0.0     0.000147
8.0     0.000072
9.0     0.000026
10.0    0.000005
11.0    0.000001
Name: room_full_bath, dtype: float64


The number of null values is: 0




### room_half_bath

In [26]:
fixtures_pivot = fix_fixtures(fixtures_pivot, 'room_half_bath')

Values less than 0: 0


The new data type is: float32
Head:
0    1.0
1    0.0
2    0.0
3    1.0
4    0.0
Name: room_half_bath, dtype: float32


room_half_bath normalized value counts: First 20
0.0    0.589179
1.0    0.398573
2.0    0.011202
3.0    0.000903
4.0    0.000109
5.0    0.000031
6.0    0.000002
7.0    0.000001
Name: room_half_bath, dtype: float64


The number of null values is: 0




### room_rec

In [27]:
fixtures_pivot = fix_fixtures(fixtures_pivot, 'room_rec')

Values less than 0: 0


The new data type is: float32
Head:
0    1.0
1    0.0
2    0.0
3    0.0
4    0.0
Name: room_rec, dtype: float32


room_rec normalized value counts: First 20
0.0     0.611029
1.0     0.329514
2.0     0.046319
3.0     0.011530
4.0     0.001218
5.0     0.000230
6.0     0.000123
7.0     0.000019
8.0     0.000009
9.0     0.000004
10.0    0.000002
12.0    0.000001
13.0    0.000001
11.0    0.000001
Name: room_rec, dtype: float64


The number of null values is: 0




### room_total

In [28]:
fixtures_pivot = fix_fixtures(fixtures_pivot, 'room_total')

Values less than 0: 0


The new data type is: float32
Head:
0    8.0
1    4.0
2    6.0
3    6.0
4    5.0
Name: room_total, dtype: float32


room_total normalized value counts: First 20
6.0     0.305940
7.0     0.211937
5.0     0.161923
8.0     0.157356
9.0     0.073413
10.0    0.038381
4.0     0.028815
11.0    0.011950
12.0    0.004756
3.0     0.002273
13.0    0.001284
14.0    0.000675
2.0     0.000388
15.0    0.000284
0.0     0.000240
16.0    0.000143
1.0     0.000084
17.0    0.000051
18.0    0.000039
20.0    0.000033
Name: room_total, dtype: float64


The number of null values is: 0




### story_height_index

In [29]:
fixtures_pivot = fix_fixtures(fixtures_pivot, 'story_height_index')

Values less than 0: 0


The new data type is: float32
Head:
0    2.0
1    1.0
2    1.0
3    2.0
4    1.0
Name: story_height_index, dtype: float32


story_height_index normalized value counts: First 20
1.0    0.644237
2.0    0.329187
3.0    0.017760
1.5    0.006352
4.0    0.002195
2.5    0.000200
5.0    0.000042
0.0    0.000021
3.5    0.000006
Name: story_height_index, dtype: float64


The number of null values is: 0




# Export fixtures_pivot

In [30]:
save_fn = ROOT_DIR / 'data/raw/2016/fixtures_comps.pickle'
save_pickle(fixtures_pivot, save_fn)