# Counting households using Tax-Calculator

This notebook uses the `h_seq` field (the CPS household sequence number) to aggregate Tax-Calculator tax units into households.

*Data: CPS  |  Tax year: 2017  |  Author: Max Ghenis  |  Date run: 2018-04-08*

## Setup

### Imports

In [1]:
import taxcalc as tc
import pandas as pd
import numpy as np

In [2]:
tc.__version__

'0.18.0'

### Settings

In [3]:
# Show one decimal place in tables.
pd.set_option('display.precision', 1)

## Data

In [4]:
recs = tc.Records.cps_constructor()
calc = tc.Calculator(records=recs, policy=tc.Policy(), verbose=False)
calc.advance_to_year(2017)
calc.calc_all()

In [5]:
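# One row per tax unit: weight, household ID, and expanded income.
tu = calc.dataframe(['s006', 'h_seq', 'expanded_income'])
# Constant column; its per-household sum is the number of tax units.
tu['filers'] = 1
# Weighted expanded income, in billions of dollars.
tu['expanded_income_b'] = tu.expanded_income * tu.s006 / 1e9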

### Create household-level dataset

In [6]:
hh = tu.groupby(['h_seq']).sum()
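To see what the grouped sum does to each column, here is a minimal sketch on made-up data. The `toy_tu` values are hypothetical and are not drawn from the CPS file; only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical toy data: three tax units in two households.
toy_tu = pd.DataFrame({
    'h_seq': [1, 1, 2],
    's006': [100.0, 300.0, 250.0],
    'expanded_income': [40e3, 20e3, 60e3],
})
toy_tu['filers'] = 1

# Summing by household adds weights, incomes, and the unit counter.
toy_hh = toy_tu.groupby('h_seq').sum()
print(toy_hh)
```

After the sum, `filers` holds the number of tax units in each household and `s006` holds the sum of their weights, which is not itself a household weight.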

## Check aggregates

### Verify expanded income total

In [7]:
tu.expanded_income_b.sum()

13222.433843295543

In [8]:
hh.expanded_income_b.sum()

13222.433843295543
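The two totals agree because grouping only rearranges the order in which the same weighted amounts are added. A small sketch on hypothetical data (the `toy_tu` values are invented) shows the same check done programmatically; `math.isclose` is used instead of `==` on the assumption that summation order can introduce tiny floating-point differences.

```python
import math
import pandas as pd

# Hypothetical toy data: three tax units in two households.
toy_tu = pd.DataFrame({
    'h_seq': [1, 1, 2],
    's006': [100.0, 300.0, 250.0],
    'expanded_income': [40e3, 20e3, 60e3],
})
# Weighted income in billions, as in the notebook.
toy_tu['expanded_income_b'] = toy_tu.expanded_income * toy_tu.s006 / 1e9
toy_hh = toy_tu.groupby('h_seq').sum()

tu_total = toy_tu.expanded_income_b.sum()
hh_total = toy_hh.expanded_income_b.sum()
# Compare with a tolerance rather than exact equality.
totals_match = math.isclose(tu_total, hh_total)
```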

### Count households

Two approaches to constructing household weights follow.

#### Normalize each household's weight by number of filers

This approach is unverified. For each household, assign the mean weight of its tax units (called filers here):

$weight = \frac{\sum_{filers}weight}{n_{filers}}$

In [9]:
hh['hh_s006'] = hh.s006 / hh.filers
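As a rough illustration of what this normalization does, the sketch below applies it to a hypothetical two-household table (the `toy_hh` values are invented). It shows that the normalized weights sum to less than the raw tax-unit weights whenever a household contains more than one tax unit.

```python
import pandas as pd

# Hypothetical household-level table: summed tax-unit weights and unit counts.
toy_hh = pd.DataFrame(
    {'s006': [400.0, 250.0], 'filers': [2, 1]},
    index=pd.Index([1, 2], name='h_seq'),
)
# Mean tax-unit weight within each household.
toy_hh['hh_s006'] = toy_hh.s006 / toy_hh.filers

raw_total = toy_hh.s006.sum()            # every tax unit counted once
normalized_total = toy_hh.hh_s006.sum()  # one mean weight per household
```

Here the first household's two units (summed weight 400) collapse to a single weight of 200, so the total falls from 650 to 450.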

#### Match to Census household count

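The FRED series TTLHH (https://fred.stlouisfed.org/series/TTLHH), sourced from the Census Bureau, estimates 126,224,000 total households in 2017.

Multiply this total by each household's *share* of the summed `s006` weight, so that the resulting weights add up to the Census total by construction.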

In [10]:
hh['s006_share'] = hh.s006 / hh.s006.sum()
TOTAL_HHS = 126224e3  # 2017 household total from FRED series TTLHH
hh['hh_s006_census'] = hh.s006_share * TOTAL_HHS
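Because the shares sum to one, this scaling reproduces the target total regardless of how the weights are distributed. The sketch below checks that on a hypothetical two-household table; `toy_hh` and `TOY_TOTAL_HHS` are illustrative names, not part of the notebook.

```python
import pandas as pd

TOY_TOTAL_HHS = 126224e3  # 2017 household total quoted above

# Hypothetical household-level table of summed tax-unit weights.
toy_hh = pd.DataFrame(
    {'s006': [400.0, 250.0]},
    index=pd.Index([1, 2], name='h_seq'),
)
# Each household's share of total weight, then scaled to the target count.
toy_hh['s006_share'] = toy_hh.s006 / toy_hh.s006.sum()
toy_hh['hh_s006_census'] = toy_hh.s006_share * TOY_TOTAL_HHS

scaled_total = toy_hh.hh_s006_census.sum()
```

The scaled total matches the target up to floating-point rounding, so a sum of these weights is not an independent check on the household count.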

#### Compare

Total households under each approach, in millions.

In [11]:
hh[['hh_s006', 'hh_s006_census']].sum() / 1e6

hh_s006            52.9
hh_s006_census    126.2
dtype: float64