In [1]:
import pandas as pd
import numpy as np
from ipfn import ipfn # you will likely need to run 'pip install ipfn' in your terminal

Create the raw data. This example uses a two-dimensional table of states and age groups.

In [2]:
df = pd.DataFrame({'age_1':[6,9,3], 
                   'age_2':[8,4,1], 
                   'state':['MN','WI','MA']})

df.head()

Unnamed: 0,age_1,age_2,state
0,6,8,MN
1,9,4,WI
2,3,1,MA


We need to melt this table from wide to long format in order to work with this algorithm. The data are the same; only the table layout changes.

In [3]:
df_melt = df.melt(id_vars='state', value_vars=['age_1', 'age_2'])
# ipfn reads the cell values from a column named 'total' by default (its weight_col argument)
df_melt.rename(columns={'value': 'total', 'variable': 'age'}, inplace=True)
df_melt

Unnamed: 0,state,age,total
0,MN,age_1,6
1,WI,age_1,9
2,MA,age_1,3
3,MN,age_2,8
4,WI,age_2,4
5,MA,age_2,1
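
As a quick sanity check that nothing was lost in the melt, the long table can be pivoted back to wide and compared with the original. This is a minimal sketch that rebuilds the example data so it runs on its own; `wide_again`, `original`, and `same` are names introduced here for illustration only.

```python
import pandas as pd

# Rebuild the example data so this snippet is self-contained
df = pd.DataFrame({'age_1': [6, 9, 3],
                   'age_2': [8, 4, 1],
                   'state': ['MN', 'WI', 'MA']})
df_melt = df.melt(id_vars='state', value_vars=['age_1', 'age_2'])
df_melt = df_melt.rename(columns={'value': 'total', 'variable': 'age'})

# Pivot long -> wide: one row per state, one column per age group
wide_again = df_melt.pivot(index='state', columns='age', values='total')

# Compare cell by cell against the original wide table
original = df.set_index('state')[['age_1', 'age_2']]
aligned = wide_again.loc[original.index, original.columns]
same = (aligned.to_numpy() == original.to_numpy()).all()
print(same)
```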


Next, we need to collect the current marginal totals and save them as variables. This step is equivalent to summing across each row and down each column of the original table (before melting).

In [4]:
state_mt = df_melt.groupby('state')['total'].sum() 
age_mt = df_melt.groupby('age')['total'].sum()
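
To see that equivalence concretely, here is a small sketch that computes the same marginal totals directly from the wide table. It rebuilds the example data so it can run on its own; `wide`, `state_totals`, and `age_totals` are illustrative names, not part of the notebook above.

```python
import pandas as pd

# Rebuild the original wide table, indexed by state
df = pd.DataFrame({'age_1': [6, 9, 3],
                   'age_2': [8, 4, 1],
                   'state': ['MN', 'WI', 'MA']})
wide = df.set_index('state')

# Sum across each row: one total per state (MN 14, WI 13, MA 4)
state_totals = wide.sum(axis=1)

# Sum down each column: one total per age group (age_1 18, age_2 13)
age_totals = wide.sum(axis=0)

print(state_totals)
print(age_totals)
```

These should match what the two `groupby` calls return on the melted table.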

Now we need to create new target marginal totals. These are the values that our original table will be adjusted to match. It is important to note that the target marginal totals must add up to the same grand total in every dimension [i.e. (36+42+27) = (62+43) = 105].

In [5]:
# state target marginal totals
state_mt.loc['MA'] = 36
state_mt.loc['MN'] = 42
state_mt.loc['WI'] = 27

# age target marginal totals
age_mt.loc['age_1'] = 62
age_mt.loc['age_2'] = 43
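
Because mismatched targets are an easy mistake to make, it can help to check the grand totals before fitting. A minimal sketch in plain Python, using hypothetical dictionaries `state_targets` and `age_targets` that hold the same numbers as above:

```python
import math

# Target marginal totals from this example (hypothetical variable names)
state_targets = {'MA': 36, 'MN': 42, 'WI': 27}
age_targets = {'age_1': 62, 'age_2': 43}

state_sum = sum(state_targets.values())  # 36 + 42 + 27 = 105
age_sum = sum(age_targets.values())      # 62 + 43 = 105

# Both dimensions must describe the same grand total
if not math.isclose(state_sum, age_sum):
    raise ValueError(f"Targets disagree: states sum to {state_sum}, ages to {age_sum}")
```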

Pass the data, the target totals, and the dimensions into the IPF algorithm.

In [6]:
aggregates = [state_mt, age_mt] # targets 
dimensions = [['state'], ['age']]

IPF = ipfn.ipfn(df_melt, aggregates, dimensions)
adjusted_df = IPF.iteration()

Display your adjusted values!

In [7]:
adjusted_df

Unnamed: 0,age,state,total
0,age_1,MN,17.245493
1,age_1,WI,18.261914
2,age_1,MA,26.492593
3,age_2,MN,24.754892
4,age_2,WI,8.737968
5,age_2,MA,9.50714
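
For intuition about what the fitting does, here is a rough NumPy sketch of two-dimensional iterative proportional fitting. It is not the `ipfn` library's implementation; `ipf_2d` and its `tol` and `max_iter` parameters are hypothetical names chosen for illustration, and the sketch assumes all row and column sums stay strictly positive.

```python
import numpy as np

def ipf_2d(seed, row_targets, col_targets, tol=1e-8, max_iter=500):
    """Alternately rescale rows and columns until both margins match."""
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Scale each row so its sum equals the row target
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Scale each column so its sum equals the column target
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Columns are exact after the last step, so only rows need checking
        if np.allclose(table.sum(axis=1), row_targets, rtol=0, atol=tol):
            break
    return table

# Rows are MN, WI, MA; columns are age_1, age_2
seed = [[6, 8], [9, 4], [3, 1]]
fitted = ipf_2d(seed, row_targets=[42, 27, 36], col_targets=[62, 43])

print(fitted.round(4))
print(fitted.sum(axis=1))  # state totals of the fitted table
print(fitted.sum(axis=0))  # age totals of the fitted table
```

The fitted cells should agree closely with the `total` column above, and the row and column sums should reproduce the state and age targets.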
