# Family Budget Calculator

Goal: Estimate the monthly income a family needs to maintain a reasonable standard of living.  
Family size options: 1-2 adults and 0-4 children (see the assumptions about families in the [source](https://www.epi.org/publication/family-budget-calculator-documentation/)).

Budget Components:
- Housing
- Food
- Transportation
- Child care
- Health care
- Taxes
- Other necessities



In [94]:
import numpy as np
import pandas as pd
from datetime import date

today = date.today()

In [2]:
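# tabula-py (used below to read the USDA PDF) wraps tabula-java, so a Java runtime must be available
!java --version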

java 15.0.2 2021-01-19
Java(TM) SE Runtime Environment (build 15.0.2+7-27)
Java HotSpot(TM) 64-Bit Server VM (build 15.0.2+7-27, mixed mode, sharing)


## Food
Data sources:
- National average food costs [Official USDA Food Plans: Cost of Food Reports](https://www.fns.usda.gov/cnpp/usda-food-plans-cost-food-reports-monthly-reports)  
Note: "USDA suggests making the following adjustments to account for differences in returns to scale:
 - One-person family: add 20 percent
 - Two-person family: add 10 percent
 - Three-person family: add 5 percent
 - Five-person family: subtract 5 percent
 - Six-person family: subtract 5 percent"
  
- County-level multipliers [Feeding America, Map the Meal Gap](https://map.feedingamerica.org/)
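
The household-size adjustments quoted above can be sketched as a small lookup. This is a minimal illustration, not part of the original notebook: the name `adjust_food_cost` is hypothetical, the sample costs are made up rather than USDA figures, and the four-person entry assumes that size is the unadjusted reference, which the quoted note implies only by omission.

```python
# Household-size adjustments from the quoted USDA note (returns to scale).
# Assumption: a four-person family is the reference size and gets no adjustment.
SIZE_ADJUSTMENT = {1: 0.20, 2: 0.10, 3: 0.05, 4: 0.0, 5: -0.05, 6: -0.05}


def adjust_food_cost(member_costs):
    """Sum per-person food costs and apply the household-size adjustment."""
    size = len(member_costs)
    if size not in SIZE_ADJUSTMENT:
        raise ValueError(f'no adjustment listed for a {size}-person family')
    return round(sum(member_costs) * (1 + SIZE_ADJUSTMENT[size]), 2)


# Hypothetical monthly costs for two adults and one child (not USDA figures).
print(adjust_food_cost([250.0, 220.0, 150.0]))
```

Keeping the adjustment in a dictionary keyed by family size makes it easy to reject sizes the note does not cover instead of silently applying no adjustment.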

In [115]:
households = ['Individuals', 'Families']

key = 'Age-gender groups'
types = ['Thrifty', 'Low-cost', 'Moderate', 'Liberal']
cols = [key, *types]

In [122]:
import requests
import tabula
import pandas as pd
from dateutil.relativedelta import relativedelta

lastDate = today
increment = relativedelta(months = 1)

print('Scanning for latest data...')

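for i in range(10):
    print('>', lastDate.strftime('%B %Y').ljust(16), end = '')

    # Take the month and the year from the same date, so the scan still
    # works when it steps back from January into the previous December.
    file_end = lastDate.strftime('%b%Y')
    url = f'https://fns-prod.azureedge.net/sites/default/files/media/file/CostofFood{file_end}.pdf'
    req = requests.get(url, stream=True)  # stream=True: read the status without downloading the PDF body
    req.close()

    print('status', req.status_code)

    if req.status_code == 200:
        break
    else:
        lastDate -= increment
else:
    # The loop ran out without finding a report: stop instead of parsing a missing file
    raise RuntimeError('No Cost of Food report found in the last 10 months')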

print(f'Read "{url}"...', end='')
tables = tabula.read_pdf(url,
                        pages = 1,
                        multiple_tables = False,
                        guess = True,
                        stream = True,
                        pandas_options = {'header':1},
                       )
print('done')
df = tables[0]

Scanning for latest data...
> March 2021      status 404
> February 2021   status 404
> January 2021    status 200
Read "https://fns-prod.azureedge.net/sites/default/files/media/file/CostofFoodJan2021.pdf"...done
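
The month-by-month scan above depends on the file name carrying both the month abbreviation and the year of the same date. As a minimal sketch with no network access, the candidate URLs can be generated on their own. The helper names `months_back` and `candidate_urls` are hypothetical, and it is an assumption that reports from earlier years follow the same naming pattern as the January 2021 file found above.

```python
from datetime import date

BASE_URL = 'https://fns-prod.azureedge.net/sites/default/files/media/file/CostofFood'


def months_back(start, offset):
    """Return the first day of the month `offset` months before `start`."""
    index = start.year * 12 + (start.month - 1) - offset
    return date(index // 12, index % 12 + 1, 1)


def candidate_urls(start, months=10):
    """Yield (month, url) pairs, newest first, stepping back one month at a time."""
    for offset in range(months):
        month = months_back(start, offset)
        # Month and year come from the same date, so December of the
        # previous year is reached correctly after January.
        yield month, f'{BASE_URL}{month.strftime("%b%Y")}.pdf'


for month, url in candidate_urls(date(2021, 3, 15), months=4):
    print(month.strftime('%B %Y').ljust(16), url)
```

Separating URL generation from the HTTP request keeps the date arithmetic easy to check by hand before any request is sent.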


In [120]:
import re

assert key in df.columns[0]

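# Match every character that is not a digit, a dot or a dollar sign.
# Parentheses inside [...] are literal characters, not grouping, so they are left out.
pattern = re.compile(r'[^.$\d]')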

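# Rows with a missing value are treated as header rows: only the group label in the first column is filled
na = df[df.isna().any(axis = 1)].iloc[:,0]
na = na.to_numpy()

weekly = dict()   # demographic group -> DataFrame of weekly costs
monthly = dict()  # demographic group -> DataFrame of monthly costs
inKey = ''        # demographic label being assembled (may span several rows)
newKey = ''       # demographic label of the group currently being filled
keys = df.iloc[:,0].dropna()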

for i, row in zip(keys.index, keys.values):
    
    flag = row in na
    ignore = any([household in row for household in households])
    
    if not ignore:
        
        if flag: # get demographic
            inKey += ' ' + row.encode('ascii','ignore').decode() #strip non-ascii chars
        
        else:
            if inKey != '': # create df from demographic
                # trim spaces, the trailing colon, and what appears to be a footnote marker ("4")
                newKey = inKey.strip(' :4')
                weekly[newKey] = pd.DataFrame(columns = types)
                monthly[newKey] = pd.DataFrame(columns = types)
                inKey = ''
            
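            # i is an index label taken from keys.index, so select the row with .loc
            string = df.loc[i].iloc[1:].to_string(header=False, index=False)
            string = string.strip(' $')
            # strip whitespace and other stray characters, then split the amounts on '$'
            values = re.sub(pattern, '', string).split('$')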
            
            assert (len(values) == 2*len(types))
            
            newWeekly = pd.DataFrame(index = [row], columns = types, data = [values[:len(types)]])
            newMonthly = pd.DataFrame(index = [row], columns = types, data = [values[len(types):]])
            
            weekly[newKey] = pd.concat([weekly[newKey], newWeekly])
            monthly[newKey] = pd.concat([monthly[newKey], newMonthly])
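
The row parsing above keeps the amounts as strings. As a minimal sketch of the same idea, assuming each data row holds four weekly amounts followed by four monthly amounts as the slicing above does, the values can be pulled out with a single regular expression and converted to floats. The function name `split_costs` and the sample row are hypothetical, and the numbers are not USDA figures.

```python
import re

TYPES = ['Thrifty', 'Low-cost', 'Moderate', 'Liberal']

# A dollar sign, optional spaces, then digits with optional thousands
# separators and an optional decimal part, e.g. "$42.10" or "$1,042.10".
AMOUNT = re.compile(r'\$\s*([\d,]+(?:\.\d+)?)')


def split_costs(text, types=TYPES):
    """Split one extracted table row into weekly and monthly cost dicts."""
    amounts = [float(m.replace(',', '')) for m in AMOUNT.findall(text)]
    if len(amounts) != 2 * len(types):
        raise ValueError(f'expected {2 * len(types)} amounts, found {len(amounts)}')
    weekly = dict(zip(types, amounts[:len(types)]))
    monthly = dict(zip(types, amounts[len(types):]))
    return weekly, monthly


# Hypothetical row text, shaped like the output of to_string() above.
row = '$36.10 $46.20 $56.30 $68.40 $156.50 $200.10 $244.00 $296.50'
weekly_costs, monthly_costs = split_costs(row)
print(weekly_costs)
print(monthly_costs)
```

Matching the amounts directly, rather than deleting unwanted characters and splitting on `$`, avoids empty strings in the result and raises a clear error when a row has an unexpected number of values.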