# Find the comparables: real_acct.txt

The file `real_acct.txt` contains important property information such as total appraised value (the target in this exercise), neighborhood, school district, economic group, land value, and more. Let's load this file and keep a subset of the important columns to continue our study.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from pathlib import Path
import pickle

import pandas as pd

from src.definitions import ROOT_DIR
from src.data.utils import Table, save_pickle

In [None]:
real_acct_fn = ROOT_DIR / 'data/external/2016/Real_acct_owner/real_acct.txt'
assert real_acct_fn.exists()

In [None]:
real_acct = Table(real_acct_fn, '2016')

In [None]:
real_acct_df = real_acct.get_df()

# Load accounts of interest
Let's remove the account numbers that don't meet the free-standing single-family home criteria. The accounts that do meet them were identified while processing the `building_res.txt` file.

In [None]:
one_bld_in_acct_fn = ROOT_DIR / 'data/raw/2016/one_bld_in_acct.pickle'

In [None]:
with open(one_bld_in_acct_fn, 'rb') as f:
    one_bld_in_acct = pickle.load(f)

In [None]:
cond0 = real_acct_df['acct'].isin(one_bld_in_acct)
real_acct_df = real_acct_df.loc[cond0, :]
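
As a minimal sketch of the filter above, here is the same `isin` pattern on toy data. The account numbers, values, and the names `toy_df` and `keep_accts` are made up for illustration, and the real `one_bld_in_acct` object may be a different kind of collection than the set assumed here.

```python
import pandas as pd

# Hypothetical toy data; the real table is loaded from real_acct.txt
toy_df = pd.DataFrame({
    'acct': ['001', '002', '003', '004'],
    'tot_appr_val': [100000, 250000, 180000, 90000],
})

# Hypothetical accounts of interest (assumed here to be a set of strings)
keep_accts = {'002', '004'}

# isin returns a boolean Series: True where acct is in keep_accts
mask = toy_df['acct'].isin(keep_accts)

# Keep only the matching rows; .copy() avoids chained-assignment warnings later
filtered_df = toy_df.loc[mask, :].copy()
print(filtered_df)
```

Note that `isin` compares values exactly, so the account numbers in both objects need the same dtype and padding for the match to work.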

In [None]:
real_acct_df.head()

In [None]:
real_acct_df.columns

# Select columns
The columns above include many valuation fields and property groupings that might come in handy when predicting the appraised value. Now let's take a slice with some of the important columns.

In [None]:
cols = [
    'acct',
    'site_addr_3', # Zip
    'school_dist',
    'Neighborhood_Code',
    'Market_Area_1_Dscr',
    'Market_Area_2_Dscr',
    'center_code',
    'bld_ar',
    'land_ar',
    'acreage',
    'land_val',
    'tot_appr_val', # Target
    'prior_land_val',
    'prior_tot_appr_val',
    'new_own_dt',  # New owner date
]

In [None]:
real_acct_df = real_acct_df.loc[:, cols]
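
Selecting with `.loc[:, cols]` raises a `KeyError` if any listed column is absent. A small sketch of how one might check for missing names first, using a toy frame and a hypothetical `wanted` list rather than the real data:

```python
import pandas as pd

# Hypothetical toy frame with only some of the columns we might ask for
toy_df = pd.DataFrame({'acct': ['001'], 'land_val': [1], 'bld_ar': [1200]})

# Hypothetical list of requested columns
wanted = ['acct', 'land_val', 'tot_appr_val']

# Split the request into columns that exist and columns that do not
missing = [c for c in wanted if c not in toy_df.columns]
present = [c for c in wanted if c in toy_df.columns]

# Slice using only the columns that are actually there
subset = toy_df.loc[:, present]
print('missing:', missing)
```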

In [None]:
real_acct_df.head()

Double-check that each account number appears in only one row.

In [None]:
assert real_acct_df['acct'].is_unique
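
If the assertion above ever fails, it helps to see which accounts are repeated. A minimal sketch on made-up data (the frame and its values are hypothetical), using `duplicated(keep=False)` to flag every row of a repeated account:

```python
import pandas as pd

# Hypothetical toy data with one repeated account number
toy_df = pd.DataFrame({
    'acct': ['001', '002', '002', '003'],
    'land_val': [10, 20, 30, 40],
})

# keep=False marks all occurrences of a duplicated value, not just the later ones
dup_mask = toy_df['acct'].duplicated(keep=False)
dup_rows = toy_df.loc[dup_mask, :]
print(dup_rows)
```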

# Export real_acct

In [None]:
save_fn = ROOT_DIR / 'data/raw/2016/real_acct_comps.pickle'
save_pickle(real_acct_df, save_fn)
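
To confirm that an export can be read back unchanged, a round-trip check is one option. The sketch below uses pandas' own `to_pickle` and `read_pickle` on a toy frame in a temporary directory. It is not the project's `save_pickle` helper, whose behavior is not shown here.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical toy frame standing in for real_acct_df
toy_df = pd.DataFrame({'acct': ['001', '002'], 'tot_appr_val': [100000, 250000]})

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_fn = Path(tmp_dir) / 'toy_comps.pickle'
    toy_df.to_pickle(tmp_fn)           # write the frame to disk
    restored_df = pd.read_pickle(tmp_fn)  # read it back

# equals compares shape, values, and dtypes
round_trip_ok = restored_df.equals(toy_df)
print(round_trip_ok)
```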