# Weighting in taxcalc_helpers

## Setup

In [1]:
import numpy as np
import pandas as pd

import taxcalc as tc
import microdf as mdf

In [2]:
tc.__version__

'3.0.0'

## Load data

Use `calc_df` to build a `DataFrame` containing `nu18` and `XTOT`. Listing `XTOT` as a metric variable also produces its weighted counterpart, `XTOT_m`.

In [3]:
df = mdf.calc_df(group_vars=["nu18"], metric_vars=["XTOT"])
df.columns

Index(['s006', 'other_ben', 'snap_ben', 'aftertax_income', 'mcaid_ben',
       'mcare_ben', 'ssi_ben', 'e02300', 'nu18', 'expanded_income',
       'housing_ben', 'vet_ben', 'wic_ben', 'e02400', 'tanf_ben', 'XTOT',
       'market_income', 'bens', 'tax', 's006_m', 'XTOT_m'],
      dtype='object')

From this we can calculate the number of tax units (`s006_m`) and people (`XTOT_m`), both in millions, by the tax unit's number of children.

In [4]:
df.groupby("nu18")[["s006_m", "XTOT_m"]].sum()

| nu18 | s006_m | XTOT_m |
| --- | --- | --- |
| 0.0 | 152.988772 | 209.816367 |
| 1.0 | 22.688253 | 54.11585 |
| 2.0 | 18.859945 | 68.880292 |
| 3.0 | 7.438481 | 34.795527 |
| 4.0 | 2.371111 | 13.539261 |
| 5.0 | 0.744276 | 5.015182 |
| 6.0 | 0.216158 | 1.688063 |
| 7.0 | 0.090332 | 0.790239 |
| 8.0 | 0.026501 | 0.258552 |
| 9.0 | 0.012238 | 0.13432 |


What if we also want to calculate the total number of *children* by the tax unit's number of children?

For this we can use `add_weighted_metrics`, the function called within `calc_df`.

In [5]:
mdf.add_weighted_metrics(df, ["nu18"])

Now we can do the same thing as before, with the new `nu18_m` column.

In [6]:
df.groupby("nu18")[["nu18_m"]].sum()

| nu18 | nu18_m |
| --- | --- |
| 0.0 | 0.0 |
| 1.0 | 22.688253 |
| 2.0 | 37.719889 |
| 3.0 | 22.315444 |
| 4.0 | 9.484444 |
| 5.0 | 3.721381 |
| 6.0 | 1.296949 |
| 7.0 | 0.632325 |
| 8.0 | 0.212008 |
| 9.0 | 0.110139 |
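The outputs above are consistent with each `_m` column being the metric multiplied by the `s006` weight and expressed in millions (for example, `nu18_m` equals `s006_m` when `nu18` is 1). The following is a minimal sketch of that idea on made-up data. The function `add_weighted_metrics_sketch` and its parameters are hypothetical stand-ins for illustration, not microdf's actual implementation.

```python
import pandas as pd


def add_weighted_metrics_sketch(df, metric_vars, w="s006", divisor=1e6, suffix="_m"):
    """Hypothetical stand-in: add weight-scaled copies of each metric, in place."""
    # Scale the weight itself (e.g. tax units in millions).
    df[w + suffix] = df[w] / divisor
    # Each weighted metric is the raw metric times the scaled weight.
    for metric in metric_vars:
        df[metric + suffix] = df[metric] * df[w + suffix]


# Made-up records: three tax units with different weights.
toy = pd.DataFrame(
    {
        "nu18": [0, 1, 2],
        "XTOT": [1, 3, 4],
        "s006": [2e6, 1e6, 5e5],
    }
)
add_weighted_metrics_sketch(toy, ["XTOT", "nu18"])

# Summing the weighted columns within groups gives weighted totals in millions.
by_children = toy.groupby("nu18")[["s006_m", "XTOT_m", "nu18_m"]].sum()
print(by_children)
```

Because the weighting is baked into the `_m` columns, an ordinary `groupby(...).sum()` is enough to produce weighted totals.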


We can also calculate weighted sums without adding the weighted metric.

In [7]:
total_children = mdf.weighted_sum(df, "nu18", "s006")
# Convert to millions and round to the nearest whole number for display.
f"Total children: {round(total_children / 1e6)}M."

'Total children: 98M.'
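A weighted sum is conceptually the sum of each value times its weight. The sketch below shows that calculation in plain pandas on made-up data; it is an illustration of the concept and assumes nothing about how `weighted_sum` is implemented internally.

```python
import pandas as pd

# Made-up records: number of children and the sampling weight of each tax unit.
toy = pd.DataFrame({"nu18": [0, 1, 2], "s006": [2e6, 1e6, 5e5]})

# Weighted sum: each record's value counts once per unit of weight.
total_children = (toy["nu18"] * toy["s006"]).sum()

# Format in millions with one decimal place.
label = f"Total children: {total_children / 1e6:.1f}M."
print(label)
```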

We can also calculate the weighted mean and median.

In [8]:
mdf.weighted_mean(df, "nu18", "s006")

0.4782626894263673
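The weighted mean is the weighted sum divided by the sum of the weights. As a rough cross-check on made-up data, `np.average` with its `weights` argument gives the same result as the manual formula. This is only a sketch of the concept, not a description of `weighted_mean`'s internals.

```python
import numpy as np
import pandas as pd

# Made-up records: number of children and the sampling weight of each tax unit.
toy = pd.DataFrame({"nu18": [0, 1, 2], "s006": [2e6, 1e6, 5e5]})

# Weighted mean via NumPy.
weighted_mean = np.average(toy["nu18"], weights=toy["s006"])

# The same quantity written out: sum(x * w) / sum(w).
manual_mean = (toy["nu18"] * toy["s006"]).sum() / toy["s006"].sum()

# The unweighted mean treats every record equally and can differ substantially.
unweighted_mean = toy["nu18"].mean()
print(weighted_mean, manual_mean, unweighted_mean)
```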

In [9]:
mdf.weighted_median(df, "nu18", "s006")

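TypeError: weighted_quantile() missing 1 required positional argument: 'quantiles'

This call fails in the environment used for this notebook: the `TypeError` indicates that `weighted_median` calls `weighted_quantile` without its required `quantiles` argument.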

We can also look at more quantiles.

*Note that `weighted_quantile` has a different interface: it takes the quantiles to compute as an additional argument after the weight column.*

In [None]:
decile_bounds = np.arange(0, 1.1, 0.1)
deciles = mdf.weighted_quantile(df, "nu18", "s006", decile_bounds)
pd.DataFrame(deciles, index=decile_bounds)
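To make the idea of a weighted quantile concrete, here is a minimal NumPy sketch on made-up data. It uses one simple definition (the smallest value whose cumulative weight share reaches the requested quantile, with no interpolation), which is not necessarily the method microdf uses. The function name `weighted_quantile_sketch` is hypothetical.

```python
import numpy as np


def weighted_quantile_sketch(values, weights, quantiles):
    """Hypothetical helper: inverse of the weighted empirical CDF, no interpolation."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Sort values and carry the weights along.
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    # Share of total weight at or below each sorted value.
    cum_share = np.cumsum(weights) / weights.sum()
    # First position whose cumulative share is at least each quantile.
    idx = np.searchsorted(cum_share, quantiles, side="left")
    # Guard against round-off pushing the top quantile past the last index.
    idx = np.clip(idx, 0, len(values) - 1)
    return values[idx]


# Made-up data: most of the weight sits on tax units with no children.
children = [0, 1, 2, 3]
weights = [6, 2, 1, 1]
result = weighted_quantile_sketch(children, weights, [0.25, 0.5, 0.75, 1.0])
print(result)
```

Under this definition the weighted median is the 0.5 quantile, so a single quantile function can serve both purposes.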