# Manipulate `Records` after advancing year

Sample run adding 10% to wages (`e00200`) in tax year 2018.

## Setup

### Imports

In [1]:
import taxcalc as tc
import pandas as pd
import numpy as np
import os

In [2]:
tc.__version__

'0.20.1'

## Generate data

### Load data

Generate a set of normal CPS records for 2018 using `Calculator`, then extract the dataframe.

In [3]:
base_calc = tc.Calculator(records=tc.Records.cps_constructor(), 
                          policy=tc.Policy(), verbose=False)
base_calc.advance_to_year(2018)
base_calc.calc_all()  # Populates calculated variables such as aftertax_income.

In [4]:
cps_raw_cols = list(pd.read_csv(
    os.path.join(tc.Records.CUR_PATH, 'cps.csv.gz')).columns)

In [5]:
base_calc.data_year

2014

In [6]:
df = base_calc.dataframe(cps_raw_cols + ['aftertax_income'])

### Manipulate

In [7]:
df['e00200_orig'] = df.e00200

In [8]:
df['e00200'] = df.e00200 * 1.1
df['e00200p'] = df.e00200p * 1.1
df['e00200s'] = df.e00200s * 1.1
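
Scaling all three wage fields by the same factor keeps the filing-unit total `e00200` equal to the sum of `e00200p` and `e00200s`, which is worth asserting before the data goes back into `Records`. A minimal sketch on a made-up three-row dataframe; the values and the `scale_wages` helper are illustrative assumptions, not part of Tax-Calculator:

```python
import numpy as np
import pandas as pd

WAGE_COLS = ['e00200', 'e00200p', 'e00200s']


def scale_wages(data, factor):
    """Return a copy of data with every wage column multiplied by factor."""
    out = data.copy()
    out[WAGE_COLS] = out[WAGE_COLS] * factor
    return out


# Hypothetical records: head wages, spouse wages, and their total.
toy = pd.DataFrame({
    'e00200p': [30000.0, 0.0, 12000.0],
    'e00200s': [10000.0, 0.0, 0.0],
})
toy['e00200'] = toy.e00200p + toy.e00200s

scaled = scale_wages(toy, 1.1)

# The total should still equal the sum of its components after scaling.
wages_consistent = np.allclose(scaled.e00200,
                               scaled.e00200p + scaled.e00200s)
print(wages_consistent)
```

Scaling only `e00200` and leaving the two components untouched would break this identity, which is why the cell above multiplies all three columns.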

### Clean data

Recompute `e00900` as the sum of its `e00900p` and `e00900s` components, pending https://github.com/open-source-economics/Tax-Calculator/issues/2024.

In [9]:
df['e00900'] = df.e00900p + df.e00900s

## Calculate

Run Tax-Calculator on the modified data.

The `Records` call adapts the [`cps_constructor` code](https://github.com/open-source-economics/Tax-Calculator/blob/master/taxcalc/records.py#L187).

In [10]:
recs = tc.Records(data=df, 
                  start_year=2018,
                  weights=tc.Records.CPS_WEIGHTS_FILENAME,
                  adjust_ratios=tc.Records.CPS_RATIOS_FILENAME,
                  benefits=tc.Records.CPS_BENEFITS_FILENAME)

In [11]:
calc = tc.Calculator(records=recs, policy=tc.Policy())
calc.calc_all()

You loaded data for 2018.
Your data include the following unused variables that will be ignored:
  aftertax_income
  e00200_orig
Tax-Calculator startup automatically extrapolated your data to 2018.


## Verify

Input fields in the new `Calculator` should match `df`.

In [12]:
calc_df = calc.dataframe(cps_raw_cols + ['aftertax_income'])

In [13]:
df.e00200.equals(calc_df.e00200)

True

### Compare to baseline

After-tax income should rise or stay the same for every record; it should never fall.

In [14]:
calc_df['e00200_precalc'] = df.e00200
calc_df['aftertax_income_precalc'] = df.aftertax_income
calc_df['e00200_orig'] = df.e00200_orig

In [15]:
diff = df[cps_raw_cols].iloc[2] - calc_df[cps_raw_cols].iloc[2]
full_diff = df.iloc[2] - calc_df.iloc[2]

In [16]:
calc_df.FLPDYR[0]

2018.0

In [17]:
df.FLPDYR[0]

2014.0

`FLPDYR` is 2014 in `df` but 2018 in `calc_df`, so exclude it before comparing the remaining input columns.

In [18]:
cps_raw_cols_no_flpdyr = cps_raw_cols.copy()
cps_raw_cols_no_flpdyr.remove('FLPDYR')

In [19]:
df[cps_raw_cols_no_flpdyr].equals(calc_df[cps_raw_cols_no_flpdyr])

True
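
`DataFrame.equals` only reports whether two frames match as a whole, so a mismatch such as the `FLPDYR` one has to be located by hand. A small helper that names the offending columns can shorten that search. This is a sketch on toy data; `differing_columns` is a hypothetical helper, not a pandas or Tax-Calculator function:

```python
import pandas as pd


def differing_columns(left, right, cols):
    """Return the names in cols whose values differ between two dataframes."""
    return [c for c in cols if not left[c].equals(right[c])]


# Toy stand-ins for df and calc_df: only the year column differs.
before = pd.DataFrame({'FLPDYR': [2014.0, 2014.0], 'e00200': [100.0, 0.0]})
after = pd.DataFrame({'FLPDYR': [2018.0, 2018.0], 'e00200': [100.0, 0.0]})

mismatched = differing_columns(before, after, ['FLPDYR', 'e00200'])
print(mismatched)
```

In the notebook the equivalent call would be `differing_columns(df, calc_df, cps_raw_cols)`.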

No record's `aftertax_income` falls by more than $0.01: the filter below returns zero rows.

In [20]:
calc_df[(calc_df.aftertax_income - calc_df.aftertax_income_precalc) 
        < -0.01].shape

(0, 71)

In [21]:
calc_df[['e00200_orig', 'e00200', 'aftertax_income_precalc', 
         'aftertax_income']].head()

|   | e00200_orig | e00200 | aftertax_income_precalc | aftertax_income |
|---|---|---|---|---|
| 0 | 48543.437088 | 53397.780797 | 42278.741719 | 46179.206889 |
| 1 | 16884.522625 | 18572.974888 | 20460.856644 | 22020.142309 |
| 2 | 0.0 | 0.0 | 93093.497901 | 93093.497901 |
| 3 | 23216.073763 | 25537.681139 | 25068.333532 | 26609.184348 |
| 4 | 0.0 | 0.0 | 41630.21429 | 41630.21429 |


#### Aggregate change

As expected, the increase in `aftertax_income` is smaller than the increase in `e00200`. Both figures below are unweighted sums over the sample records, divided by 1e9.

In [22]:
(calc_df.e00200.sum() - calc_df.e00200_orig.sum()) / 1e9

3.9379521997057267

In [23]:
(calc_df.aftertax_income.sum() - calc_df.aftertax_income_precalc.sum()) / 1e9

2.6958270729730454
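
Dividing the two increases gives the share of the added wages that shows up in `aftertax_income`. The sketch below reuses the outputs of `In [22]` and `In [23]`; the variable names are illustrative, and the result is a rough sample-level average over unweighted sums, not a statutory or population-weighted rate:

```python
# Outputs copied from In [22] and In [23] (unweighted sums divided by 1e9).
wage_increase = 3.9379521997057267
aftertax_increase = 2.6958270729730454

# Share of the added wages that reaches after-tax income (roughly 0.68).
share_kept = aftertax_increase / wage_increase

# The remainder is what Tax-Calculator nets out of the added wages.
share_netted_out = 1 - share_kept

print(round(share_kept, 4), round(share_netted_out, 4))
```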

In [24]:
calc_df[['e00200_orig', 'e00200', 'aftertax_income_precalc', 
         'aftertax_income']].sum()

e00200_orig                3.937952e+10
e00200                     4.331747e+10
aftertax_income_precalc    5.956881e+10
aftertax_income            6.226464e+10
dtype: float64
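
The sums in this section add up sample records without applying the `s006` sampling weights, so they are not population totals. A weighted total multiplies each record by its weight before summing. The sketch below uses made-up values, and `weighted_sum` is a hypothetical helper written for illustration:

```python
import pandas as pd


def weighted_sum(data, column, weight='s006'):
    """Return the sum of data[column] with each record scaled by its weight."""
    return (data[column] * data[weight]).sum()


# Hypothetical records with made-up wages and weights.
toy = pd.DataFrame({
    'e00200': [50000.0, 20000.0, 0.0],
    's006': [1000.0, 2500.0, 400.0],
})

unweighted = toy.e00200.sum()            # 50000 + 20000 + 0
weighted = weighted_sum(toy, 'e00200')   # 50000*1000 + 20000*2500 + 0*400

print(unweighted, weighted)
```

Applying the same idea to the notebook would require adding `'s006'` to the list of columns requested from `calc.dataframe`.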