# Motivation
The goal of this project is to gain insight into the SNAP benefits program while showcasing my skills in data wrangling, statistical calculation, data visualization, and data analysis.

# Initial Questions
To guide the project, I will answer the following questions:
1. How has the national SNAP participation changed over time?
2. How has SNAP participation changed in each state over time?
3. How has the national average SNAP benefit changed over time?
4. How has the average SNAP benefit in each state changed over time?
5. Which states provide the largest SNAP benefit?
6. Which states have the highest SNAP participation?
7. How did each state's SNAP participation change during 2008-2013?
8. How did each state's average SNAP benefit change during 2008-2013? 

# Data Sources
1. SNAP Benefits: **ZIP File: FY 1969 - FY 2021** [https://www.fns.usda.gov/pd/supplemental-nutrition-assistance-program-snap](https://www.fns.usda.gov/pd/supplemental-nutrition-assistance-program-snap) 
2. State Population Data: [Annual Estimates of the Population for the U.S. and States, and for Puerto Rico](https://fred.stlouisfed.org/release?rid=118)
3. US Population Data Monthly: [Population](https://fred.stlouisfed.org/series/POPTHM)
4. US Food Prices by Region [BLS](https://data.bls.gov/cgi-bin/srgate)
    * Series Names and IDs
        * Food at home in Northeast urban, all urban consumers, not seasonally adjusted, CUUR0100SAF11
        * Food at home in Midwest urban, all urban consumers, not seasonally adjusted, CUUR0200SAF11
        * Food at home in South urban, all urban consumers, not seasonally adjusted, CUUR0300SAF11
        * Food at home in West urban, all urban consumers, not seasonally adjusted, CUUR0400SAF11
5. US National Food Prices [Consumer Price Index for All Urban Consumers: Food in U.S. City Average](https://fred.stlouisfed.org/series/CPIUFDNS)
6. Food Regions [https://www.bls.gov/cpi/regional-resources.htm](https://www.bls.gov/cpi/regional-resources.htm)

# Goal of Statistic Calculation
The goal of this statistic calculation notebook is to compute the measures needed to answer the questions above.

In [43]:
import pandas as pd
import numpy as np
import xlwings as xw

### Statistic Calculation for Question 1
To answer this question I need two columns of data: the monthly national SNAP participation count and the monthly national SNAP participation rate (i.e., participation divided by population).  
  
I begin by importing data from the Data by Measure excel file.

In [44]:
book = xw.Book(r"archive\Clean Data\Data by Measure.xlsx")
sheet = book.sheets['US Data']

In [45]:
US_data = sheet.range('A1').options(pd.DataFrame,expand = 'table').value

The below DataFrame contains all of the information I need to answer question 1.

In [46]:
nat_par_rates = pd.DataFrame(
    {
        'Individual Participation': US_data['Individual Participation'],
        'Individual Participation Rate': US_data['Individual Participation'] / US_data['Population'],
    }
)
nat_par_rates

dates,Individual Participation,Individual Participation Rate
1968-07-01,2472921.0,0.012315
1968-08-01,2613458.0,0.013002
1968-09-01,2607571.0,0.012961
1968-10-01,2657938.0,0.013199
1968-11-01,2665809.0,0.013227
...,...,...
2019-05-01,37381135.0,0.113874
2019-06-01,37532817.0,0.114289
2019-07-01,37602856.0,0.114447
2019-08-01,37777171.0,0.114919
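
As a minimal sketch of the rate calculation above, the following uses made-up numbers in place of the 'US Data' sheet. The names `toy_us` and `toy_rates` are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Toy stand-in for the 'US Data' sheet; every value here is invented.
toy_us = pd.DataFrame(
    {
        "Individual Participation": [2_000_000.0, 2_500_000.0, 3_000_000.0],
        "Population": [200_000_000.0, 200_000_000.0, 200_000_000.0],
    },
    index=pd.to_datetime(["2000-01-01", "2000-02-01", "2000-03-01"]),
)
toy_us.index.name = "dates"

# Participation rate = participants divided by population in the same month.
toy_rates = pd.DataFrame(
    {
        "Individual Participation": toy_us["Individual Participation"],
        "Individual Participation Rate": toy_us["Individual Participation"]
        / toy_us["Population"],
    }
)
print(toy_rates)
```

Building the frame from a dictionary of Series keeps the date index and the column names attached to the data, so no transpose is needed.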


### Statistic Calculation for Question 2
To answer this question I need the monthly individual participation rate for each state. Rates are needed rather than raw participation counts because state populations differ widely. For context, I also include the US monthly individual participation rate. I was only able to collect annual, not monthly, state population data, so each month's participation is divided by the population estimate for that year.

In [47]:
pop_sheet = book.sheets['State and National Populations']
par_sheet = book.sheets['Individual Participation']
pop_df = pop_sheet.range('A1').options(pd.DataFrame,expand = 'table').value
par_df = par_sheet.range('A1').options(pd.DataFrame,expand = 'table').value
pop_df.index = pd.to_datetime(pop_df.index)
par_df.index = pd.to_datetime(par_df.index)
par_df_time_index = par_df.index  # keep the monthly dates to restore after the division
# Index both frames by calendar year so monthly participation lines up with annual population.
pop_df.insert(0,'year',pop_df.index.year)
par_df.insert(0,'year',par_df.index.year)
pop_df.set_index('year',inplace = True)
par_df.set_index('year',inplace = True)
# Drop the final population row before dividing.
pop_df = pop_df.iloc[:-1,:]
# Each month's participation is divided by the population estimate for its year.
state_par_rates = par_df/pop_df
# Restore the monthly dates as the index.
state_par_rates.index = par_df_time_index

The below DataFrame contains all of the information necessary to answer question 2.

In [48]:
state_par_rates

dates,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,Florida,Georgia,...,Tennessee,Texas,United States,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
1988-10-01,0.106272,0.044861,0.070156,0.094595,0.059584,0.061937,0.033254,0.044419,0.052532,0.074614,...,0.101253,0.092565,0.075274,0.053542,0.057727,0.053997,0.066035,0.137301,0.061539,0.053206
1988-11-01,0.105912,0.045806,0.070397,0.096128,0.060356,0.061781,0.033394,0.044486,0.053280,0.074800,...,0.102235,0.093276,0.075225,0.054160,0.060362,0.053582,0.066195,0.137422,0.060581,0.054266
1988-12-01,0.109011,0.046734,0.071632,0.097862,0.061410,0.065620,0.033453,0.046448,0.053697,0.075785,...,0.103456,0.095164,0.076347,0.054195,0.062176,0.055564,0.067671,0.142243,0.060976,0.057265
1989-01-01,0.108960,0.046846,0.071446,0.097628,0.060168,0.065863,0.033989,0.044840,0.052092,0.075604,...,0.103635,0.095205,0.076053,0.057138,0.060508,0.055179,0.067771,0.145950,0.060956,0.060424
1989-02-01,0.107387,0.048434,0.070947,0.100096,0.060545,0.066107,0.034412,0.044832,0.051427,0.075661,...,0.104810,0.096929,0.076016,0.056884,0.063497,0.054682,0.068615,0.146032,0.060398,0.061439
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2019-05-01,0.146087,0.118369,0.106030,0.114040,0.092204,0.077467,0.102202,0.130592,0.129322,0.130849,...,0.130467,0.113564,0.113784,0.053344,0.109378,0.081849,0.107104,0.168373,0.104907,0.044455
2019-06-01,0.146204,0.119011,0.107492,0.114295,0.095741,0.077327,0.101956,0.128762,0.129773,0.130397,...,0.129542,0.115205,0.114246,0.052926,0.108335,0.081857,0.106901,0.169527,0.104848,0.044539
2019-07-01,0.146399,0.118054,0.108439,0.114803,0.099270,0.077212,0.102100,0.127462,0.129339,0.128507,...,0.130214,0.116057,0.114459,0.052557,0.107582,0.081645,0.106324,0.168834,0.104699,0.044003
2019-08-01,0.146965,0.117482,0.109409,0.115721,0.101115,0.076515,0.102272,0.128231,0.128898,0.129116,...,0.129426,0.116492,0.114990,0.052686,0.107676,0.081789,0.106388,0.170248,0.104742,0.043696
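
The alignment of monthly participation with annual population can also be sketched on toy data. This is an illustrative alternative, not the notebook's method: it looks up each month's year in the annual table instead of re-indexing both frames by year. The state name and all values are invented.

```python
import pandas as pd

# Made-up monthly participation and annual population for one hypothetical state.
toy_par = pd.DataFrame(
    {"State A": [100.0, 110.0, 120.0, 130.0]},
    index=pd.to_datetime(["2000-11-01", "2000-12-01", "2001-01-01", "2001-02-01"]),
)
toy_pop = pd.DataFrame({"State A": [1000.0, 2000.0]}, index=[2000, 2001])

# Repeat each year's population once per month of that year,
# then give the result the monthly dates so the division aligns row by row.
monthly_pop = toy_pop.reindex(toy_par.index.year)
monthly_pop.index = toy_par.index

toy_state_rates = toy_par / monthly_pop
print(toy_state_rates)
```

Because the annual estimate is held constant within a year, the rate can jump between December and January even when participation barely moves.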


### Statistic Calculation for Question 3
To answer this question, I divide the benefit per person by the cost of food, which controls for food price inflation. The resulting values have no meaningful units on their own, because Food Prices is an index representing the cost of a basket of food. The insights come from comparing values within the series over time.

In [49]:
national_benefit_per_person_inflation_adjusted = (
    US_data['Benefit Per Person'].astype(float) / US_data['Food Prices']
).to_frame('Benefit Per Person Controlled For Inflation')
national_benefit_per_person_inflation_adjusted

dates,Benefit Per Person Controlled For Inflation
1968-07-01,0.181936
1968-08-01,0.187086
1968-09-01,0.179031
1968-10-01,0.174320
1968-11-01,0.173965
...,...
2019-05-01,0.502830
2019-06-01,0.502710
2019-07-01,0.502744
2019-08-01,0.502201
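
Since only comparisons within the series are meaningful, one way to read it is relative to a base period. The sketch below uses invented benefit and price index values and expresses each adjusted value as a multiple of the first one. The rebasing step is an illustration and is not part of the notebook.

```python
import pandas as pd

# Invented nominal benefits and food price index values.
toy = pd.DataFrame(
    {
        "Benefit Per Person": [100.0, 110.0, 132.0],
        "Food Prices": [200.0, 200.0, 240.0],
    },
    index=pd.to_datetime(["2000-01-01", "2001-01-01", "2002-01-01"]),
)

# Dividing by the index removes food price inflation; the units are arbitrary.
adjusted = toy["Benefit Per Person"] / toy["Food Prices"]

# Rebasing to the first period turns the values into interpretable ratios.
relative = adjusted / adjusted.iloc[0]
print(relative)
```

In this toy series the nominal benefit rises 32% from the first to the last period, but the food-price-adjusted benefit rises only 10%, because food prices rose 20% over the same span.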


### Statistic Calculation for Question 4
To answer this question I will follow a process similar to the prior question. I need to adjust each state's benefit per person by the regional cost of food in order to provide a meaningful answer.

In [50]:
regions = book.sheets['Food Prices by Region']
region_prices = regions.range('A1').options(pd.DataFrame,expand = 'table').value
state_benefit_sheet = book.sheets['Benefit Per Person']
state_benefit = state_benefit_sheet.range('A1').options(pd.DataFrame,expand = 'table').value
state_benefit = state_benefit.loc[:,state_benefit.columns != 'United States']

In [51]:
region_conversion = {
    "Alabama":'South',"Alaska":'West',"Arizona":'West',"Arkansas":'South',
    "California":'West',"Colorado":'West',"Connecticut":'Northeast',"Delaware":'South',
    "Florida":'South',"Georgia":'South',"Hawaii":'West',"Idaho":'West',
    "Illinois":'Midwest',"Indiana":'Midwest',"Iowa":'Midwest',"Kansas":'Midwest',
    "Kentucky":'South',"Louisiana":'South',"Maine":'Northeast',"Maryland":'South',
    "Massachusetts":'Northeast',"Michigan":'Midwest',"Minnesota":'Midwest',"Mississippi":'South',
    "Missouri":'Midwest',"Montana":'West',"Nebraska":'Midwest',"Nevada":'West',
    "New Hampshire":'Northeast',"New Jersey":'Northeast',"New Mexico":'West',"New York":'Northeast',
    "North Carolina":'South',"North Dakota":'Midwest',"Ohio":'Midwest',"Oklahoma":'South',
    "Oregon":'West',"Pennsylvania":'Northeast',"Rhode Island":'Northeast',"South Carolina":'South',
    "South Dakota":'Midwest',"Tennessee":'South',"Texas":'South',"Utah":'West',
    "Vermont":'Northeast',"Virginia":'South',"Washington":'West',"West Virginia":'South',
    "Wisconsin":'Midwest',"Wyoming":'West',
}

In [52]:
state_benefit_inflation_adjusted = pd.DataFrame()
for col in state_benefit.columns:
    benefit_series = state_benefit.loc[:,col]
    region_price_series = region_prices.loc[:,region_conversion[col]]
    sbpi = benefit_series/region_price_series
    state_benefit_inflation_adjusted.loc[:,col] = sbpi

In [53]:
state_benefit_inflation_adjusted

dates,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,Florida,Georgia,...,South Dakota,Tennessee,Texas,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
1988-10-01,0.453970,0.627880,0.474007,0.398285,0.296042,0.453212,0.325785,0.433128,0.489817,0.443243,...,0.444269,0.443872,0.481010,0.463358,0.355595,0.439905,0.429939,0.465692,0.381731,0.452340
1988-11-01,0.447398,0.648056,0.473426,0.407302,0.298189,0.452393,0.328615,0.434614,0.484221,0.442275,...,0.436735,0.440660,0.481525,0.457442,0.359746,0.437418,0.421431,0.461669,0.380552,0.453740
1988-12-01,0.452724,0.660924,0.476421,0.410653,0.298139,0.439816,0.326554,0.463546,0.484553,0.444396,...,0.444113,0.442281,0.482592,0.467244,0.356221,0.439819,0.417502,0.462525,0.378691,0.451699
1989-01-01,0.440135,0.642966,0.457381,0.391886,0.290708,0.430486,0.311841,0.459579,0.472937,0.434518,...,0.433204,0.435216,0.465767,0.438088,0.346053,0.432215,0.405757,0.456764,0.370084,0.441654
1989-02-01,0.432637,0.652892,0.449705,0.403465,0.288714,0.425949,0.316367,0.454508,0.472058,0.429079,...,0.423105,0.433600,0.463543,0.441331,0.347646,0.430282,0.409183,0.451614,0.368438,0.436454
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2019-05-01,0.491189,0.668719,0.477079,0.450596,0.526654,0.472167,0.525247,0.473888,0.491037,0.513175,...,0.551146,0.501243,0.484596,0.452036,0.481924,0.492253,0.481379,0.448200,0.455650,0.456846
2019-06-01,0.495032,0.675601,0.481212,0.459534,0.519407,0.474040,0.525094,0.477822,0.494545,0.511876,...,0.550313,0.503550,0.491325,0.453091,0.481500,0.496498,0.480391,0.449594,0.457162,0.457105
2019-07-01,0.493506,0.683189,0.483430,0.455275,0.508650,0.472597,0.527203,0.472837,0.495327,0.507430,...,0.552364,0.503203,0.490975,0.453981,0.479035,0.492140,0.480347,0.452911,0.493313,0.449310
2019-08-01,0.496077,0.682219,0.483976,0.458173,0.505104,0.474337,0.525932,0.476715,0.495884,0.513448,...,0.550383,0.510645,0.486658,0.459016,0.481412,0.495466,0.479988,0.453560,0.473580,0.454018
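
The regional adjustment can also be written without a loop. The sketch below, on invented data with only two states, builds a price table with one column per state and divides the two frames in one step. The `toy_` names are hypothetical, and this is an alternative formulation rather than the code used above.

```python
import pandas as pd

dates = pd.to_datetime(["2000-01-01", "2000-02-01"])

# Invented benefits per state and food price index values per region.
toy_benefit = pd.DataFrame(
    {"Alabama": [100.0, 120.0], "Alaska": [150.0, 176.0]}, index=dates
)
toy_region_prices = pd.DataFrame(
    {"South": [200.0, 240.0], "West": [300.0, 320.0]}, index=dates
)
toy_region_conversion = {"Alabama": "South", "Alaska": "West"}

# Select each state's regional price column, then relabel the columns by state
# so the two frames align column by column.
state_prices = toy_region_prices[
    [toy_region_conversion[state] for state in toy_benefit.columns]
].set_axis(toy_benefit.columns, axis=1)

toy_adjusted = toy_benefit / state_prices
print(toy_adjusted)
```

A state missing from the mapping raises a `KeyError` when the price table is built, which surfaces the gap before any division happens.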


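### Statistic Calculation for Questions 5 and 6
Questions 5 and 6 can be answered using the state-level DataFrames already computed: ``state_benefit_inflation_adjusted`` (Question 4) for benefit size and ``state_par_rates`` (Question 2) for participation.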
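
As a rough sketch of how such a ranking might be read off a state-by-month DataFrame, the example below ranks invented states by their latest value and by their average over the period. Which of the two is the better measure is a judgment call that the notebook does not settle here.

```python
import pandas as pd

# Invented participation rates for three hypothetical states.
toy_state_rates = pd.DataFrame(
    {
        "State A": [0.10, 0.12],
        "State B": [0.20, 0.18],
        "State C": [0.05, 0.07],
    },
    index=pd.to_datetime(["2019-07-01", "2019-08-01"]),
)

# Rank by the most recent month.
latest = toy_state_rates.iloc[-1]
top_two = latest.nlargest(2)

# Rank by the average over the whole period.
avg_rank = toy_state_rates.mean().sort_values(ascending=False)

print(top_two)
print(avg_rank)
```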

### Statistic Calculation for Question 7
To calculate the information needed for this question, I will use the ``state_par_rates`` DataFrame and take the percentage change between the first and last monthly observations in 2008-2013.

In [54]:
years_of_interest = [2008,2009,2010,2011,2012,2013]

In [55]:
fin_cris_chg_par = state_par_rates.loc[state_par_rates.index.year.isin(years_of_interest),:].iloc[[0,-1],:].pct_change().iloc[-1,:]

In [56]:
fin_cris_chg_par = fin_cris_chg_par.loc[fin_cris_chg_par.index != 'United States']
fin_cris_chg_par = pd.DataFrame(list(fin_cris_chg_par),index = fin_cris_chg_par.index,columns = ['Change in Participation Rate'])

Change the index to state abbreviations for graphing purposes.

In [57]:
us_state_abbrev = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'American Samoa': 'AS',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'District of Columbia': 'DC',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Guam': 'GU',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Northern Mariana Islands':'MP',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virgin Islands': 'VI',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY'
}

In [58]:
abbrev_index = [us_state_abbrev[i] for i in fin_cris_chg_par.index]
fin_cris_chg_par.index = abbrev_index
fin_cris_chg_par.index.name = 'State'

In [59]:
fin_cris_chg_par

State,Change in Participation Rate
AL,0.571535
AK,0.452108
AZ,0.655128
AR,0.272347
CA,0.902731
CO,0.87844
CT,0.931755
DE,0.957574
FL,1.391667
GA,1.004737
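
The first-to-last change used above can be sketched on toy data. The example filters an invented series to 2008-2013 and computes `last / first - 1`, which is the value that `pct_change()` returns when applied to just those two rows.

```python
import pandas as pd

# Invented participation rates, including months outside the window of interest.
toy_rates = pd.DataFrame(
    {
        "State A": [0.08, 0.10, 0.13, 0.15, 0.16],
        "State B": [0.20, 0.20, 0.22, 0.25, 0.26],
    },
    index=pd.to_datetime(
        ["2007-12-01", "2008-01-01", "2010-06-01", "2013-12-01", "2014-01-01"]
    ),
)

# Keep only the months that fall in 2008 through 2013.
window = toy_rates.loc[toy_rates.index.year.isin(list(range(2008, 2014)))]

# Relative change from the first to the last month in the window.
change = window.iloc[-1] / window.iloc[0] - 1
print(change)
```

A value of 0.5 means the rate grew by 50% over the window. Intermediate months do not affect the result, so a temporary peak inside the window is not captured.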


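### Statistic Calculation for Question 8
To answer this question, I apply the same first-to-last percentage change calculation to the ``state_benefit_inflation_adjusted`` DataFrame.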

In [60]:
fin_cris_chg_ben = state_benefit_inflation_adjusted.loc[state_benefit_inflation_adjusted.index.year.isin(years_of_interest),:].iloc[[0,-1],:].pct_change().iloc[-1,:].to_frame()
fin_cris_chg_ben.columns = ['Change in Benefit Per Person']
abbrev_index = [us_state_abbrev[i] for i in fin_cris_chg_ben.index]
fin_cris_chg_ben.index = abbrev_index
fin_cris_chg_ben.index.name = 'State'
fin_cris_chg_ben

State,Change in Benefit Per Person
AL,0.107554
AK,0.075597
AZ,0.041352
AR,0.0359
CA,0.153922
CO,0.070931
CT,0.105794
DE,0.083103
FL,0.131198
GA,-0.059378


### Write Statistics to file

In [61]:
statistics = [nat_par_rates,state_par_rates,national_benefit_per_person_inflation_adjusted,
              state_benefit_inflation_adjusted,fin_cris_chg_par,fin_cris_chg_ben]
sheet_names = ['Nat Par Pop Adj','Sta Par Pop Adj','Nat Ben Per Infl','Sta Ben Per Infl',
               'Fin Crisis Par Chg','Fin Crisis Ben Chg']
sheet_titles = np.array([['Sheet Descriptions'],['National Participation Rate Population Adjusted'],['State Participation Rate Population Adjusted'],
                ['National Benefit Per Person Food Inflation Adjusted'],['State Benefit Per Person Food Inflation Adjusted'],
               ['Financial Crisis State Change in Participation Rate Population Adjusted'],
               ['Financial Crisis State Change in Benefit Per Person Food Inflation Adjusted']])

In [62]:
book = xw.Book(r"archive\Clean Data\Statistics.xlsx")
description_sheet = book.sheets[0]
description_sheet.name = 'Data Description'

In [63]:
description_sheet.range('A1').value = sheet_titles

In [64]:
for i,stat in enumerate(statistics):
    sheet = book.sheets(sheet_names[i])
    sheet.range('A1').options(pd.DataFrame,expand = 'table').value = stat
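
Before writing, it can help to confirm that the sheet names are usable. Excel limits sheet names to 31 characters and disallows the characters `\ / ? * [ ] :`. The helper below is a hypothetical pre-write check and not part of the notebook. It also verifies that there is one unique name per DataFrame.

```python
# Sheet names copied from the notebook cell above.
sheet_names = ['Nat Par Pop Adj', 'Sta Par Pop Adj', 'Nat Ben Per Infl',
               'Sta Ben Per Infl', 'Fin Crisis Par Chg', 'Fin Crisis Ben Chg']

EXCEL_MAX_SHEET_NAME = 31
FORBIDDEN_CHARS = set('\\/?*[]:')


def check_sheet_names(names, n_frames):
    """Return a list of problems found; an empty list means the names look fine."""
    problems = []
    if len(names) != n_frames:
        problems.append(f"{len(names)} names for {n_frames} DataFrames")
    if len(set(names)) != len(names):
        problems.append("duplicate sheet names")
    for name in names:
        if len(name) > EXCEL_MAX_SHEET_NAME:
            problems.append(f"too long: {name!r}")
        if FORBIDDEN_CHARS & set(name):
            problems.append(f"forbidden character in: {name!r}")
    return problems


# Six DataFrames are written in the loop above.
problems = check_sheet_names(sheet_names, 6)
print(problems)
```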