# Consumer Expenditure Trends 2006-2020 
## Ben Warzel
### 5/8/2025

*Note: generative AI was used in this project to assist in data analysis*

### Introduction
I chose this particular set of consumer expenditure data because it covers the majority of my childhood and adolescence, and I was curious to see the impacts of events like the 2008 housing crisis on the average American's spending habits. The growing prevalence of credit cards into the 2010s is another factor that might affect how people spend.

According to the Bureau of Labor Statistics, by 2005 consumers were spending 97% of their income. During the Great Recession, that number dropped to 92% due to higher mortgage debt and lower home equity. People had to try to save more of their money, which we can see thanks to this graph of St. Louis household savings over the years. The BLS also has data on employment statistics for the same time period, which we will examine later.

The main categories I examined here are Housing, Healthcare, Food, and Transportation. I also attempt to answer questions about non-essential spending: for example, how the consumption of alcohol and tobacco changes over time, or how spending changes in the section labeled "Entertainment", which includes things like going to the movies or concerts. We would expect that as the housing market struggled, people began spending less on non-essential items.

### Methods
The original datasets were obtained from the BLS website, where I was able to download documents containing detailed line-by-line breakdowns of spending by year. I selected only the documents covering 2006-2012 and 2013-2020. These sheets were loosely formatted, with a larger section ("Food") containing a number of rows beneath it for more specific items like "Meat", which could in turn contain subsets like "Pork". For my purposes I was not interested in the particulars of the spending, just the bigger picture, so my first step was to clean the datasets down to only the most important information.

To do this, I first manually edited the sheets to remove non-monetary data. This included things like "Average number in consumer unit" (meaning household size) and the percent distribution of various demographic characteristics of the reference individual. This information may be interesting, but it does not belong in a dataset of dollar amounts that we want to manipulate in Python. After that, I took my version of the dataset and began working with Claude AI to further clean and organize the data. I instructed it to look for the item at the beginning of each row break, as each main section is separated this way, and to keep only that first item. This worked really well, and I was left with the categories:

1. Average Annual Expenditures
2. Food
3. Alcoholic beverages
4. Housing
5. Apparel and services
6. Transportation
7. Healthcare
8. Entertainment
9. Personal care products and services
10. Reading
11. Education
12. Tobacco products and smoking supplies
13. Miscellaneous
14. Cash contributions
15. Personal insurance and pensions
16. Sources of income and personal taxes:
17. Money income before taxes a/
18. Personal taxes (missing values not imputed) a/
19. Income after taxes a/
20. Addenda:
21. Net change in total assets and liabilities
22. Other financial information:
23. Income before taxes
24. Personal taxes (contains some imputed values)
25. Income after taxes

A few of these items have no data associated with them, but as far as I could tell this did not affect the computation of the results, so I decided I was okay with keeping some blank rows.
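The "keep only the first item after each row break" rule can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the code that was actually run: the function name `top_level_items`, the `(item, value)` row layout, and the sample rows are all assumptions, and the numbers are placeholders rather than BLS figures.

```python
def top_level_items(rows):
    """Return the first labelled row after each blank separator row.

    `rows` is a list of (item, value) tuples (a hypothetical layout);
    a row whose item is blank marks a row break between sections.
    """
    kept = []
    at_section_start = True  # the very first row also opens a section
    for item, value in rows:
        if not item.strip():
            # Blank row: the next labelled row is a section heading.
            at_section_start = True
            continue
        if at_section_start:
            kept.append((item.strip(), value))
            at_section_start = False
    return kept


# Placeholder rows, NOT actual BLS figures.
sample_rows = [
    ("Food", "100"),
    ("Meat", "40"),
    ("Pork", "10"),
    ("", ""),
    ("Housing", "300"),
    ("Shelter", "200"),
]

print(top_level_items(sample_rows))
```

Section headers that carry no value of their own (such as "Addenda:") would survive this filter with an empty value, which matches the blank rows noted above.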

After doing this for both sets of data and double-checking that all the categories lined up, I used Python to merge the two cleaned CSV files. Next, to make the dataframe easier to work with in Python, I used the command '.long' to transform it from a "wide" format into a "long" one, meaning each item for each year is given its own row, as opposed to one row per item with a separate column for every year. Somehow this also removed the rows with no value, which was a wonderful bonus. Then I began exploring the data and looking for trends, using Claude to help me examine individual items or specific areas.
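To make the wide-versus-long distinction concrete, here is a dependency-free sketch of the merge and reshape steps. It is an assumption about the general shape of the process, not the exact code used in this project: the helper names `merge_wide` and `to_long` are hypothetical, and the values are placeholders. In pandas, one way to get the same result would be `DataFrame.merge` on the item column followed by `DataFrame.melt` and `dropna`.

```python
def merge_wide(first, second):
    """Join two wide tables, each shaped item -> {year: value}, on the item label."""
    merged = {}
    for table in (first, second):
        for item, by_year in table.items():
            # Copy into a fresh dict so the input tables are left untouched.
            merged.setdefault(item, {}).update(by_year)
    return merged


def to_long(wide):
    """Produce one (item, year, value) row per item-year pair, skipping missing values."""
    rows = []
    for item, by_year in wide.items():
        for year, value in sorted(by_year.items()):
            if value is None:
                continue  # rows with no value are dropped explicitly here
            rows.append((item, year, value))
    return rows


# Placeholder values, NOT actual BLS figures.
early = {"Food": {2006: 1.0, 2007: 2.0}, "Addenda:": {2006: None, 2007: None}}
late = {"Food": {2013: 3.0}, "Addenda:": {2013: None}}

long_rows = to_long(merge_wide(early, late))
for row in long_rows:
    print(row)
```

In the long format, filtering to a single item or a single year becomes a simple row filter, which is what makes it convenient for plotting trends.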



### Main Analysis
To begin with, there is some interesting information not contained in the datasets I created. Luckily, this kind of information does not require in-depth processing to understand. Across the whole period, the widest male-female ratio was 46:54 in 2006. In the following years it stayed at 47:53 into and through the 2010s, with a few years at 48:52 seemingly at random. The distribution by race is much less balanced: in almost all of the years on record here, the White population represents around 87% of the reference group, with Black or Latino individuals making up only about 13%.

Another extremely relevant statistic recorded in these documents is the percentage of homeowners surveyed who had a mortgage. The housing crisis of 2007-2008 was due to the "bursting" of the "housing bubble": housing prices rose steeply and quickly, many more people began missing payments, and this in turn hurt the housing market and caused prices to plummet. Because of this, there was supposedly less desire or need for mortgages. Indeed, we can see this reflected in a chart of the percentages of homeowners and renters:
<div align="center">
  <img
    src="../assets/img/homeowners_trends.png"
    width="60%"
    style="border: 2px solid"
    alt="Homeowner Trends"
  />
  <figcaption style="font-style: italic">
    Percent of homeowners with a mortgage, without a mortgage, or renting, by year
  </figcaption>
</div>
Not only does the share of homeowners paying a mortgage go down, but renters actually overtake homeowners with mortgages in 2015. Another really interesting bit of information is that the percentage of people who had completed college rose steadily from 59% in 2006 to 69% in 2020.

As for the actual expenditures, I made a few broad charts showing the items with the largest budget share and the percent increase or decrease each year.
<div align="center">
  <img
    src="../assets/img/categories-budgetshare.png"
    width="60%"
    style="border: 2px solid"
    alt="Top categories"
  />
  <figcaption style="font-style: italic">
    The categories with the largest budget share.
  </figcaption>
</div>

This shows very clearly that, although the housing market took a hit after 2008, housing costs quickly began to climb again, and now we are spending more than ever before on housing. The other four categories (Transportation, Apparel, Insurance, and Healthcare) also show a gradual rise over the years, aside from minor dips in 2009-2010 and 2019-2020 in the Transportation and Apparel categories.
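For reference, a budget share is usually taken to be a category's spending divided by total spending for the same year. The sketch below assumes that definition and assumes the "Average Annual Expenditures" row serves as the total; the function name `budget_share` is hypothetical and the dollar amounts are placeholders, not BLS figures.

```python
def budget_share(spending, total_key="Average Annual Expenditures"):
    """Return each category's share of total spending, in percent.

    `spending` maps item label -> dollars for a single year. Using the
    `total_key` row as the denominator is an assumption of this sketch.
    """
    total = spending[total_key]
    return {
        item: 100 * amount / total
        for item, amount in spending.items()
        if item != total_key
    }


# Placeholder amounts, NOT actual BLS figures.
sample_year = {
    "Average Annual Expenditures": 1000.0,
    "Housing": 350.0,
    "Food": 150.0,
}

shares = budget_share(sample_year)
print(shares)
```

Note that a rising dollar amount and a rising budget share are different things: a category can cost more in dollars while taking up a smaller share if total spending grows faster.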

<div align="center">
  <img
    src="../assets/img/percent-change.png"
    width="60%"
    style="border: 2px solid"
    alt="% Change in budget share"
  />
  <figcaption style="font-style: italic">
    The change in the percent of total budget share for top 5 categories.
  </figcaption>
</div>

This chart shows the year-over-year percent change for the main categories. With this chart it is easier to see that people began spending about 18% less on alcohol following the Great Recession, while they actually spent about 18% *more* on tobacco.
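A year-over-year percent change compares each year's value with the previous year's. The following is a minimal sketch of that calculation, assuming a list of `(year, value)` pairs already sorted by year; the function name `percent_change` is hypothetical and the numbers are placeholders rather than values from the dataset. With pandas, `Series.pct_change` performs the same comparison (returning fractions rather than percentages).

```python
def percent_change(series):
    """Return (year, percent change versus the previous year) pairs.

    `series` is a list of (year, value) tuples sorted by year; the first
    year has no predecessor, so it does not appear in the result.
    """
    changes = []
    for (_, previous), (year, current) in zip(series, series[1:]):
        changes.append((year, 100 * (current - previous) / previous))
    return changes


# Placeholder values, NOT actual BLS figures.
sample_series = [(2008, 200.0), (2009, 180.0), (2010, 198.0)]

changes = percent_change(sample_series)
for year, change in changes:
    print(year, round(change, 1))
```

When the series being compared is itself a budget share, it helps to state whether a change is relative (a share going from 20% to 22% is a 10% increase) or in percentage points (the same move is 2 points).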