# Family Budget Calculator

Goal: Measure the monthly income a family needs to maintain a reasonable standard of living.
Family size options: 1-2 adults and 0-4 children (see the assumptions about families in the [source](https://www.epi.org/publication/family-budget-calculator-documentation/))

Budget Components:
- Housing
- Food
- Transportation
- Child care
- Health care
- Taxes
- Other necessities



In [1]:
import numpy as np
import pandas as pd
from datetime import date

today = date.today()

In [2]:
!java --version

java 15.0.2 2021-01-19
Java(TM) SE Runtime Environment (build 15.0.2+7-27)
Java HotSpot(TM) 64-Bit Server VM (build 15.0.2+7-27, mixed mode, sharing)


## Food
Data sources:
- National average food costs [Official USDA Food Plans: Cost of Food Reports](https://www.fns.usda.gov/cnpp/usda-food-plans-cost-food-reports-monthly-reports)  
Note: "USDA suggests making the following adjustments to account for differences in returns to scale:
 - One-person family: add 20 percent
 - Two-person family: add 10 percent
 - Three-person family: add 5 percent
 - Five-person family: subtract 5 percent
 - Six-person family: subtract 5 percent"
  
- County-level multipliers [Feeding America, Map the Meal Gap](https://map.feedingamerica.org/)
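
To make the two data sources concrete, here is a minimal sketch of how a household-size adjustment and a county multiplier could be combined with national per-person costs. The function name `localized_food_cost`, the per-person costs, the multiplier value 1.08, and the choice to apply the county multiplier after the size adjustment are all illustrative assumptions, not figures or methods taken from USDA or Feeding America.

```python
# Hypothetical sketch: names and figures below are illustrative only.
# Size adjustments follow the USDA note quoted above; a four-person
# family is the reference case and gets no adjustment.
SIZE_ADJUSTMENT = {1: 1.20, 2: 1.10, 3: 1.05, 4: 1.00, 5: 0.95, 6: 0.95}

def localized_food_cost(member_costs, county_multiplier=1.0):
    """Sum per-person national costs, then scale for household size and county."""
    size = len(member_costs)
    if size not in SIZE_ADJUSTMENT:
        raise ValueError(f"no adjustment listed for a household of {size}")
    return sum(member_costs) * SIZE_ADJUSTMENT[size] * county_multiplier

# Made-up monthly costs for two adults and one child, and a made-up multiplier.
example = localized_food_cost([250.0, 220.0, 140.0], county_multiplier=1.08)
print(round(example, 2))
```

Keeping the multiplier as a separate argument means the national figures can be computed once and reused for any county.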

### USDA Food Plans: Cost of Food

In [3]:
groups = ['child', 'male', 'female', 'family of 2', 'family of 4']
types = ['thrifty', 'low cost', 'moderate', 'liberal']

# Relative areas (percent of the page) of the weekly tables in the PDF.
# If an error occurs, these areas most likely no longer match the PDF layout.
areas = [ 
    # top, left, bottom, right
    [25, 24, 31, 59], # child
    [34, 24, 42, 59], # male (adult)
    [43, 24, 50, 59], # female (adult)
    [55, 24, 60, 59], # family (2)
    [65, 24, 70, 59], # family (4)
]

# Add the monthly tables: same rows as the weekly ones, shifted right by `width`
width = 35
n = len(areas)
for area in areas[:n]:
    areas.append([area[0], area[1] + width, area[2], area[3] + width])

# Add the areas that hold the row labels (keys) of each subtable
areas += [
    [25, 8, 31, 24], # child
    [34, 8, 42, 24], # male (adult)
    [43, 8, 50, 24], # female (adult)
    [54, 8, 59, 24], # family (2)
    [65, 8, 70, 24], # family (4)
]

assert len(areas) == 3*n
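
The way the monthly boxes are derived from the weekly ones can be checked in isolation. The sketch below uses only the first two rows from the cell above and a hypothetical helper, `shift_right`, to show that each monthly box is its weekly counterpart moved `width` percent to the right, with top and bottom unchanged.

```python
# Sketch with two of the rows above; boxes are [top, left, bottom, right]
# in percent of the page, as expected when relative_area=True.
weekly = [
    [25, 24, 31, 59],  # child
    [34, 24, 42, 59],  # male (adult)
]
width = 35  # horizontal offset between the weekly and monthly column blocks

def shift_right(box, dx):
    """Return a copy of `box` moved `dx` percent to the right."""
    top, left, bottom, right = box
    return [top, left + dx, bottom, right + dx]

monthly = [shift_right(box, width) for box in weekly]
layout = weekly + monthly
n = len(weekly)

print(layout[:n])  # weekly boxes
print(layout[n:])  # monthly boxes
```

Building the monthly list with a comprehension avoids appending to a list while a slice of it is being iterated, which makes the intent easier to read.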

In [4]:
import requests
import tabula
import pandas as pd
from dateutil.relativedelta import relativedelta

lastDate = today
increment = relativedelta(months = 1)

print('Scanning for latest data...')

for i in range(10):
    print('>', str.ljust(lastDate.strftime('%B %Y'), 16), end = '')
    
    file_end = lastDate.strftime('%b%Y')  # use the scanned month's own year, not today's
    url = f'https://fns-prod.azureedge.net/sites/default/files/media/file/CostofFood{file_end}.pdf'
    req = requests.get(url, stream=True)
    
    print('status', req.status_code)
    
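    if req.status_code == 200:
        break
    lastDate -= increment
else:
    # the loop finished without finding a published report
    raise RuntimeError('no Cost of Food report found in the last 10 months')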

print(f'Read "{url}"...', end='')

raw = tabula.read_pdf(url,  # read the report located by the scan above
                         pages = 1,
                         area = areas,
                         relative_area = True,
                         pandas_options = {
                             'header' : None,
                             'dtype' : float,
                         })
print('done')

Scanning for latest data...
> March 2021      status 404
> February 2021   status 404
> January 2021    status 200
Read "https://fns-prod.azureedge.net/sites/default/files/media/file/CostofFoodJan2021.pdf"...done
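
The backward scan can be separated from the network call, which makes the month and year arithmetic easy to verify. The sketch below is one possible way to do that: `candidate_urls` is a hypothetical helper, it uses only the standard library instead of `relativedelta`, and it assumes that reports from earlier years follow the same file naming pattern as the January 2021 report.

```python
from datetime import date, timedelta

# URL pattern taken from the cell above; '{}' is the month/year suffix.
BASE = 'https://fns-prod.azureedge.net/sites/default/files/media/file/CostofFood{}.pdf'

def candidate_urls(start, months=10):
    """Yield (month, url) pairs, newest first, stepping back one month at a time."""
    current = start.replace(day=1)
    for _ in range(months):
        # '%b%Y' takes both month and year from the same date,
        # so the suffix stays correct across a year boundary.
        yield current, BASE.format(current.strftime('%b%Y'))
        # last day of the previous month, then back to its first day
        current = (current - timedelta(days=1)).replace(day=1)

for month, url in candidate_urls(date(2021, 2, 15), months=3):
    print(month.strftime('%B %Y'), url)
```

Each candidate can then be passed to `requests.get` in turn until one returns status 200.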


In [5]:
import re

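# Attach row labels and plan names to each extracted table
def parse(dfs: list, keys: list) -> list:
    return [
        df.set_index(key[0].values)     # row labels come from the key table's first column
          .set_axis(types, axis = 1)    # columns are the four food plans
          .applymap(lambda x : float(re.sub(r'^[.\D]', '', x)))  # strip one leading non-digit character
        for key, df in zip(keys, dfs)]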
        
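# Key the parsed tables by timeframe and group
# (raw[:n] = weekly tables, raw[n:-n] = monthly tables, raw[-n:] = row labels)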
foodCosts = {
    'week' : dict(zip(groups, parse(raw[:n], raw[-n:]))),
    'month' : dict(zip(groups, parse(raw[n:-n], raw[-n:]))),
}

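def getRowFromAge(df: pd.DataFrame, age: int) -> pd.Series:
    """Return the row whose index label holds an age or age range covering `age`."""
    for idx in df.index:
        ages = re.findall(r'\d+', idx)
        ages = list(map(int, ages))
        
        assert 1 <= len(ages) <= 2
        
        if ages[0] <= age <= ages[-1]: # age falls within the label's range
            return df.loc[idx]
    raise ValueError(f'no age group found for age {age}')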
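
The age lookup relies on pulling one or two integers out of each row label. The sketch below shows that idea on plain strings, without pandas. It is a variation rather than the notebook's function: the helpers `age_bounds` and `find_label` are hypothetical, the label strings are an assumed and abbreviated list, and a trailing `+` is treated as an open-ended range, which `getRowFromAge` does not do.

```python
import re

def age_bounds(label):
    """Return (low, high) parsed from a label such as '2-3 years' or '71+ years'."""
    ages = [int(a) for a in re.findall(r'\d+', label)]
    if not ages or len(ages) > 2:
        raise ValueError(f'cannot parse an age range from {label!r}')
    low = ages[0]
    if len(ages) == 2:
        high = ages[1]
    elif '+' in label:
        high = float('inf')  # open-ended range (assumption of this sketch)
    else:
        high = low           # single age
    return low, high

def find_label(labels, age):
    """Return the first label whose range covers `age`."""
    for label in labels:
        low, high = age_bounds(label)
        if low <= age <= high:
            return label
    raise KeyError(f'no age range covers age {age}')

# Assumed, abbreviated label list for illustration only.
labels = ['1 year', '2-3 years', '4-5 years', '51-70 years', '71+ years']
print(find_label(labels, 5))
print(find_label(labels, 80))
```

Raising an error when nothing matches surfaces a bad age at the lookup itself instead of later in the cost calculation.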

Scaling factors suggested by the USDA, relative to household size:

In [6]:
adjustment = {
    1 : 1.2,
    2 : 1.1,
    3 : 1.05,
    4 : 1,
    5 : 0.95,
    6 : 0.95,
}

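# Note: the maximum supported age is 71.
def getFoodCost(people: list, timeframe: str = 'week', plan = 'low cost') -> pd.Series:
    """Total food cost for `people`, a list of [group, age] pairs.

    `plan` is currently unused; the returned Series holds all four plans.
    """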

    assert timeframe in ('week', 'month', 'year')
    
    df = foodCosts['week' if timeframe == 'week' else 'month']
    total = 0
    for group, age in people:
        total += getRowFromAge(df[group], age)
    
    if timeframe == 'year':
        total *= 12
    
    total.name = f'Food Costs ({timeframe}ly)'
    
    return total * adjustment[len(people)]

Test the food calculator with a sample 3-person family:

In [7]:
getFoodCost([['male',30],['female',30],['child', 5]], timeframe = 'year')

thrifty      6051.78
low cost     7731.36
moderate     9589.86
liberal     11895.66
Name: Food Costs (yearly), dtype: float64
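
As a consistency check on the output above, and assuming the result was produced exactly as `getFoodCost` is written (monthly costs summed, multiplied by 12, then by the three-person adjustment of 1.05), dividing the yearly figures by 12 × 1.05 recovers the unadjusted monthly sums for the three family members:

$$
\frac{7731.36}{12 \times 1.05} = \frac{7731.36}{12.6} = 613.60
\qquad
\frac{6051.78}{12 \times 1.05} = \frac{6051.78}{12.6} = 480.30
$$

So under the low-cost plan the three national per-person monthly costs add up to 613.60, and under the thrifty plan to 480.30, before the household-size adjustment is applied.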