# Overview, motivation, and research questions #

As an experienced home cook, I frequently find myself thinking about the nutritional content of the food that I cook, and how I can make informed decisions to nourish my body. As such, I'm interested in exploring the factors that influence household food purchasing, food access, and overall dietary health. I believe that this will be an interesting and useful topic to cover, given the alarming obesity epidemic in the US and its relevance for US residents in general. From a human-centered perspective, this will shed light on food inequality across the United States and the difficulties that underserved communities face in accessing healthy and affordable food.

To further learn about this issue, I hope to study the following research questions:
* What is the relationship between food insecurity and dietary quality in the United States?
* How does this relationship vary by demographic and socioeconomic characteristics?
* What factors affect food choices and purchasing behavior?

# Data selected for analysis #

#### USDA National Household Food Acquisition and Purchase Survey (FoodAPS) ####

I have selected the FoodAPS dataset for my analysis, as it is relevant to my research questions regarding food access and purchasing habits. It is a comprehensive survey of over 4,800 US households that contains information on food acquisition, dietary quality, food insecurity, health outcomes, food spending, demographics, and socioeconomic status. This makes it suitable for this project, as I can compare different survey categories and results to understand the impacts of food access and any important correlations that can be seen across these categories. However, I still need to respect the confidentiality of respondents, by avoiding listing individual and possibly identifiable data. I'll also take care when working with demographic data to emphasize the limitations of survey results (correlation vs. causation) and avoid reinforcing biases regarding food security in underserved communities.

The dataset can be found at the following link: https://www.ers.usda.gov/data-products/foodaps-national-household-food-acquisition-and-purchase-survey/foodaps-national-household-food-acquisition-and-purchase-survey/#Public-Use%20Data%20Files%20and%20Codebooks. Here, the data can be downloaded as a CSV along with any relevant documentation.

The data is publicly available on the USDA ERS website, and is released under the Creative Commons Attribution 4.0 International License.

# Background #

The relationship between food insecurity and dietary quality in the United States has been covered extensively, though I was unable to find any research that specifically used the FoodAPS dataset, which is why I chose it for this project. Past research such as a study by Cook and Frank found that food insecurity was associated with unhealthier food choices--processed and high-fat foods instead of fruits and vegetables. Moreover, this effect was more common among traditionally underserved populations (low-income households, minorities, etc.). A similar study conducted by Seligman et al. found that food insecurity was associated with diseases such as hypertension and diabetes, as a result of high-sugar and high-sodium foods being more accessible than healthy options. I believe that the FoodAPS dataset is very relevant to this phenomenon, and I hope to compare my results to these studies to see whether they support the earlier findings and potentially reveal new patterns related to dietary health.



Sources: 

Cook, J. T., & Frank, D. A. (2008). Food security, poverty, and human development in the United States. Annals of the New York Academy of Sciences, 1136, 193–209. https://doi.org/10.1196/annals.1425.001

Seligman, H. K., Davis, T. C., Schillinger, D., & Wolf, M. S. (2010). Food insecurity is associated with hypoglycemia and poor diabetes self-management in a low-income sample with diabetes. Journal of Health Care for the Poor and Underserved, 21(4), 1227–1233. https://doi.org/10.1353/hpu.2010.0921

# Methodology #

To investigate my research questions regarding food inequality, I plan to use the following descriptive statistics:
* distribution
* mean/median
* range/standard deviation/variance

I will use these to summarize the following broad areas:
* demographics (household size, race, income)
* food expenditures
* dietary quality
* health status
* food access
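
As a rough illustration of the descriptive statistics listed above, here is a minimal sketch using Python's standard `statistics` module. The values and the names `weekly_food_spending` and `summarize` are hypothetical and are not taken from FoodAPS. The sketch is also unweighted, whereas a survey analysis would likely need to account for survey weights.

```python
import statistics

# Hypothetical per-household values for illustration only; NOT real FoodAPS records.
weekly_food_spending = [82.5, 140.0, 95.25, 210.0, 60.0, 118.75]

def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "variance": statistics.variance(values),  # sample variance
    }

summary = summarize(weekly_food_spending)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The same helper could be applied to each summary area (for example, household size or a dietary quality score) once the relevant columns are identified in the codebook.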

I will also use regression analysis to explore how food insecurity relates to dietary quality across these variables, and to check the variables for correlations. I will fit these models using least squares regression, which is commonly used with economic data.
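
To make the regression step more concrete, here is a minimal sketch of an ordinary least squares fit using NumPy's `np.linalg.lstsq`. The toy data and the variable names (`food_insecure`, `income_thousands`, `diet_score`) are assumptions for illustration and do not correspond to actual FoodAPS columns or values.

```python
import numpy as np

# Hypothetical toy data, one entry per household; NOT real FoodAPS values.
food_insecure = np.array([0, 0, 1, 1, 0, 1, 0, 1], dtype=float)   # 1 = food insecure
income_thousands = np.array([62, 48, 21, 30, 75, 18, 55, 27], dtype=float)
diet_score = np.array([58, 52, 41, 45, 63, 38, 55, 43], dtype=float)

# Design matrix: intercept column plus the two predictors.
X = np.column_stack([np.ones_like(food_insecure), food_insecure, income_thousands])

# Least squares solution minimizing ||X @ coef - diet_score||^2.
coef, _, rank, _ = np.linalg.lstsq(X, diet_score, rcond=None)

# Goodness of fit: share of variance in diet_score explained by the model.
fitted = X @ coef
ss_res = float(np.sum((diet_score - fitted) ** 2))
ss_tot = float(np.sum((diet_score - diet_score.mean()) ** 2))
r_squared = 1 - ss_res / ss_tot

for label, value in zip(["intercept", "food_insecure", "income_thousands"], coef):
    print(f"{label}: {value:.3f}")
print(f"R^2: {r_squared:.3f}")
```

This sketch only produces coefficients and R^2. Standard errors and p-values for a regression table would need additional computation, or a statistics library such as statsmodels.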


To present my findings, I plan on mostly using scatterplots, as they are effective at visualizing relationships between multiple variables, and I will use color to show how dietary quality varies across different demographics. If appropriate, I may also include a few line graphs or bar charts. Finally, I'll present my regression results in a table as a way to communicate the significance of each variable in my analysis.

Overall, my rationale for choosing these methods was to ensure that my project is balanced in multiple regards--analysis, presentation, and comprehension. Using both descriptive statistics (breadth) and regression analysis (depth) will give me flexibility in how I process the data, as the FoodAPS dataset is quite large and may suit one method over the other. Using a variety of visualizations will help me communicate my results quickly and accurately, while the regression table will lend my research technical credibility by backing the analysis with quantitative results.

# Unknowns and dependencies #

My main unknown at this point in time is what *exactly* I want to focus on with my research. As the FoodAPS dataset is large and versatile, I've given myself a few options to follow but I hope to further explore the dataset before locking in my hypothesis/research questions. I also haven't scanned the data for bias (duplicated/incomplete/misleading/unrepresentative data) yet, which I will do before A5.

I will not need any datasets beyond FoodAPS, nor will I need to contact FoodAPS researchers, so no external dependencies should affect my timeline. Overall, I'm confident that I will complete this project in the time allotted, barring any extraordinary circumstances.

## Feedback ##


After peer reviewing my proposal with a classmate, I narrowed my research question candidates to the three listed, added a discussion of bias to the unknowns, and emphasized the relevance of this data in my introduction.