# Final Report: Child Food Insecurity Predictions

## Introduction

Food insecurity is a major problem in the United States. More than 1 in 7 families experienced food insecurity at some point in 2012 (Hunger in America). Children facing hunger are more likely to struggle in school, experience developmental impairments, and have more social and behavioral problems (Feeding America). Some studies have also shown that child food insecurity can paradoxically lead to child obesity (Olson). However, many resources are available to help combat food insecurity, including food banks, meal programs, and grocery programs. For children specifically, school breakfast, lunch, and snack programs are in place across the country. In this report, I aim to identify which factors contribute most to child food insecurity.

## What is food insecurity?
The United States Department of Agriculture (USDA) defines food insecurity as follows:
> Food-insecure households were unable, at times during the year, to provide adequate food for one or more household members because the household lacked money and other resources for food.

The data on food insecurity was collected through a survey:
> The food security survey asks one adult respondent in each household a series of questions about experiences and behaviors that indicate food insecurity. The food security status of the household was assessed based on the number of food-insecure conditions reported (such as being unable to afford balanced meals, cutting the size of meals because of too little money for food, or being hungry because of too little money for food).
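
The quoted description says a household's status depends on how many food-insecure conditions it reports. A minimal sketch of that counting rule, assuming USDA's published cutoffs (3 or more affirmative answers for food insecurity; 6 or more of 10 items for households without children, or 8 or more of 18 items for households with children, for very low food security). The function name is hypothetical:

```python
# Minimal sketch of the USDA counting rule described above. The cutoffs are
# an assumption based on USDA's published thresholds: 3+ affirmative answers
# means food insecure; "very low" food security starts at 6 of 10 items
# (households without children) or 8 of 18 items (households with children).

def food_security_status(affirmative: int, has_children: bool) -> str:
    """Classify a household from its count of affirmative survey answers."""
    max_items = 18 if has_children else 10
    if not 0 <= affirmative <= max_items:
        raise ValueError(f"expected 0 to {max_items} affirmative answers")
    very_low_cutoff = 8 if has_children else 6
    if affirmative >= very_low_cutoff:
        return "very low food security"
    if affirmative >= 3:
        return "low food security"
    return "food secure"


print(food_security_status(4, has_children=True))  # low food security
```

Both "low" and "very low" food security count as food insecure in the USDA definition quoted above.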

## Data Exploration

To begin exploring the data, let's look at the distribution of child food insecurity in the United States by county. The map below shows the child food insecurity rate for each county.

![map](choropleth_full_usa.png)

Below is the distribution of child vs. adult food insecurity. Both distributions are roughly normal, with child food insecurity generally higher than adult food insecurity. The average child food insecurity rate is 21%, compared with 14% for adults.

![insec_distribution](insec.png)

I then turned to income and poverty rates, which I expected to be the biggest predictors of food insecurity. Below are histograms of median household income and poverty rate. The average median household income in this dataset is about $48,600, and the average poverty rate is about 16%.

The graph on the right plots these two measurements against each other, and there is a clear correlation between median household income and poverty rate.

![income](income.png)
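
A correlation coefficient puts a number on the relationship visible in the scatter plot. A minimal sketch of Pearson's r in plain Python; the helper name is hypothetical and the five county values are made up for illustration, not taken from the dataset:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations; the 1/n factors cancel in the ratio below.
    cross_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross_sum / (spread_x * spread_y)


# Made-up values for five counties, for illustration only.
median_income = [35000, 42000, 48000, 55000, 70000]
poverty_rate = [0.24, 0.19, 0.16, 0.13, 0.08]
r = pearson_r(median_income, poverty_rate)  # strongly negative for these values
```

Values of r near -1 or +1 indicate a strong linear relationship; the same function applies to any of the scatter plots in this section.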

Next, I looked at factors I suspected would affect food insecurity and child food insecurity. Below are plots of median household income, poverty rate, and percent white population against adult food insecurity. Although the project mainly looks at child food insecurity, I was curious whether the relationships would differ for adults.

I found a negative correlation between median household income and adult food insecurity, and a positive correlation between poverty rate and adult food insecurity.

The plot on the right shows the percentage of the population that is white versus the food insecurity rate. There does seem to be some correlation, but the most noticeable feature of this graph is that the spread of the data widens as the percentage of white population decreases. This widening spread did not appear when plotting child food insecurity.

![adult](adult.png)
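
The widening spread can be checked numerically by binning counties by percent white population and comparing the standard deviation of food insecurity within each bin. A minimal sketch; the helper name, bin edges, and values are hypothetical:

```python
from statistics import pstdev


def spread_by_bin(xs, ys, edges):
    """Standard deviation of ys within each bin of xs.

    edges is a sorted list of bin boundaries; each bin is [lo, hi) except the
    last, which also includes its upper edge. Bins with fewer than two points
    return None.
    """
    last = len(edges) - 2
    spreads = []
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        in_bin = [y for x, y in zip(xs, ys)
                  if lo <= x < hi or (i == last and x == hi)]
        spreads.append(pstdev(in_bin) if len(in_bin) >= 2 else None)
    return spreads


# Made-up values: percent white population and food insecurity (%).
pct_white = [20, 25, 30, 35, 80, 85, 90, 95]
insecurity = [10, 30, 12, 28, 14, 16, 15, 15]
spreads = spread_by_bin(pct_white, insecurity, edges=[0, 50, 100])
```

Running the same function on the adult and child columns would show whether the low-percentage bins are wider for adults only.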

Below are the same plots for child food insecurity. All three have roughly the same shapes as the adult plots, but each is shifted slightly upward because child food insecurity rates are higher than adult rates.

One difference I found interesting is in the far-right plot for children versus adults. In the adult plot above, the spread of food insecurity increases as the percent white population decreases; for child food insecurity, the spread does not increase. This leads me to believe that child food insecurity is more strongly correlated with percent white population than adult food insecurity is.

![child](child.png)

We can also look at how the use of various government food programs relates to child food insecurity levels. This dataset includes data on SNAP and WIC, as well as school lunch, school breakfast, and summer food programs.

In general, places where more people use these resources have higher food insecurity. The correlation is not strong for every program: the trend lines for the lunch program and the summer food program are almost horizontal. For the school breakfast, WIC, and SNAP plots, however, the increase is more apparent.

![snap](snap.png)
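
The slope of a least-squares trend line makes "almost horizontal" concrete. A minimal sketch; the program names mirror the plots, but the participation rates and food insecurity values are invented, and slopes are only comparable across programs if participation is measured on the same scale (assumed here to be percent of population):

```python
def ols_slope(xs, ys):
    """Least-squares slope of the trend line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when every x is the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up participation rates (% of population) for the same five counties.
child_insecurity = [15, 18, 21, 24, 27]
participation = {
    "SNAP": [8, 11, 14, 17, 20],
    "summer food": [4, 1, 3, 5, 2],
}
slopes = {name: ols_slope(xs, child_insecurity)
          for name, xs in participation.items()}
```

With these invented numbers, SNAP has a clearly positive slope while the summer food slope is zero, the numeric counterpart of a flat trend line.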

## Methods
I used K Nearest Neighbors (KNN), Decision Trees, Neural Networks, and AdaBoost to test our hypothesis. I chose these methods because their use cases are well documented and there is a large knowledge base to draw from. To determine hyperparameters, I first tried a wide range of values, narrowed them down to those that produced higher prediction scores, and then used the documentation to reason about why those values might work better. For instance, Neural Networks, the most complex model, required a lot of fine-tuning of the `hidden_layer_sizes` parameter to get anywhere near a reasonable answer. At first I thought that a very large number of hidden layers would be beneficial; however, after reading the documentation, I found that for the type of data I have (mostly continuous with low variability), many hidden layers would skew the predictions toward an average rather than an actual prediction.

With the hyperparameters determined, I narrowed down to a single model, the one with the highest accuracy score: Decision Trees. I hypothesize that this is due to the large amount of predictive data that is most likely related in some way. The dataset has several years of the same variable and several instances of a variable in different forms, such as the average cost of pretzels and cheese compared to the average cost of milk. Because these variables serve similar purposes, they may have helped the tree be constructed more consistently.
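
The tune-then-select process described above can be sketched with scikit-learn's `GridSearchCV`. This is a hedged illustration, not the code used for this report: the data is synthetic (`make_classification` stands in for the county features and the integer food insecurity score), the grids are illustrative, and the model is chosen on cross-validated accuracy rather than test accuracy, a design choice that keeps the test score an honest final check.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the county data (assumption): integer class labels
# play the role of the integer food insecurity score.
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Each candidate gets a small, illustrative grid; KNN and the neural network
# are wrapped with a scaler because both are sensitive to feature scale.
candidates = {
    "knn": (make_pipeline(StandardScaler(), KNeighborsClassifier()),
            {"kneighborsclassifier__n_neighbors": [3, 5, 11]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0),
                      {"max_depth": [None, 5, 10]}),
    "neural_network": (
        make_pipeline(StandardScaler(),
                      MLPClassifier(max_iter=1000, random_state=0)),
        {"mlpclassifier__hidden_layer_sizes": [(16,), (64,), (32, 16)]}),
    "adaboost": (AdaBoostClassifier(random_state=0),
                 {"n_estimators": [50, 100]}),
}

cv_scores, test_scores = {}, {}
for name, (model, grid) in candidates.items():
    # Cross-validation on the training split picks the hyperparameters.
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    cv_scores[name] = search.best_score_
    # The untouched test split is scored once, with the refit best model.
    test_scores[name] = search.score(X_test, y_test)

# Choose the model on validation accuracy, then report its test accuracy.
best_model = max(cv_scores, key=cv_scores.get)
print(best_model, round(test_scores[best_model], 3))
```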

## Results
Testing the methods and tuning various hyperparameters revealed one model that stood out above the rest. In every test, with accuracy measured by a scikit-learn function, Decision Trees far outperformed AdaBoost, Neural Networks, and K Nearest Neighbors. Initially, Decision Trees produced about 70% accuracy when run on a set of chosen features, but when run against all features of our training dataset, the model scored a whopping 99.54% accuracy against our testing dataset. I was shocked by these results, as I had not anticipated such a strong relationship between food insecurity and the 118 features in our training dataset (chosen out of the 300+ feature source dataset), and I set out to ensure that the results were not simply due to chance.
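
Which scikit-learn function produced the score matters: `accuracy_score` (and a classifier's `.score`) counts exact matches only, while a regressor's `.score` returns R² rather than accuracy. A small sketch that reports exact-match accuracy alongside errors measured in score points may help interpret a figure like 99.54%; the helper names and values are hypothetical:

```python
def exact_accuracy(actual, predicted):
    """Fraction of predictions that exactly equal the actual integer score."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def within_tolerance(actual, predicted, tol=1):
    """Fraction of predictions within tol points of the actual score."""
    hits = sum(abs(a - p) <= tol for a, p in zip(actual, predicted))
    return hits / len(actual)


def mean_absolute_error(actual, predicted):
    """Average absolute gap between prediction and actual, in score points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up integer food insecurity scores for five counties.
actual = [12, 15, 18, 21, 24]
predicted = [12, 16, 18, 20, 24]
print(exact_accuracy(actual, predicted))       # 0.6
print(within_tolerance(actual, predicted))     # 1.0
print(mean_absolute_error(actual, predicted))  # 0.4
```

Here the same predictions score 60% on exact matches but are never more than one point off, so the metric chosen changes the story considerably.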

To further validate the accuracy of the model, I varied the sizes of the training, validation, and testing datasets. I initially set aside 70% of the data for training and validation and 30% for testing, and then tested the model using 90%/10%, 80%/20%, and 60%/40% splits. Surprisingly, the model performed at approximately the same accuracy at each split size, ranging from 97% to 99%. In the end, the 70%/30% split I chose the first time I ran the model turned out to be the best, so I kept it.
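
The split-size check above might look like the following scikit-learn sketch. The data is a synthetic stand-in (assumption), and repeating each split with several random shuffles is an added design choice that makes it less likely a single lucky split drives the result:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); replace with the real features/labels.
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

mean_accuracy = {}
for test_size in (0.1, 0.2, 0.3, 0.4):  # 90/10, 80/20, 70/30, 60/40 splits
    scores = []
    for seed in range(5):  # several shuffles per split size
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed, stratify=y)
        tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
        scores.append(tree.score(X_te, y_te))  # accuracy on held-out rows
    mean_accuracy[test_size] = sum(scores) / len(scores)

for test_size, acc in mean_accuracy.items():
    print(f"train {1 - test_size:.0%} / test {test_size:.0%}: {acc:.3f}")
```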

As for the other models, Neural Networks came in second place, scoring 63.8%; AdaBoost came in third, scoring 36.15%; and K Nearest Neighbors came in fourth, scoring 16.46%. No matter the hyperparameters or split size, none of these models scored higher than 70%, so I set them aside.

Below is a plot of the actual food insecurity scores versus the predicted ones. Because food insecurity is an integer score, many points are stacked on top of each other, making it look as though there is less data than there actually is.

![accuracy](accuracy.png)
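
Because many points land on the same integer coordinates, counting the pairs shows how much data each visible dot hides. Those counts could then size the markers, for example through the `s` argument of Matplotlib's `scatter`. A minimal sketch with made-up scores:

```python
from collections import Counter

# Made-up integer scores; in the real plot these would be the test-set
# actual and predicted food insecurity values.
actual = [20, 20, 20, 21, 22, 22]
predicted = [20, 20, 20, 21, 22, 23]

# How many counties sit on each plotted (actual, predicted) point.
overlaps = Counter(zip(actual, predicted))
points = list(overlaps)  # unique coordinates to draw
marker_sizes = [20 * overlaps[p] for p in points]  # e.g. for scatter's s=
print(overlaps.most_common(1))  # [((20, 20), 3)]
```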

## Projecting Forward
I concluded the analysis with very strong results: 99.54% accuracy using Decision Trees. Though the model produced incredibly accurate results against the historical test data, I think this means very little about future predictions. Because the data is historical and is released multiple years after it is recorded, I am not confident that the predictions will match what is happening in the real world. Policy, demographics, income, and other factors change so much every year that historical data like this is of limited use for predicting the future. Although I do not feel strongly about the model's ability to predict future food insecurity, I do feel strongly that this data can help policymakers analyze where resources are needed nationwide, so that policies can be written to assist counties in need. I hope that this analysis will shed light on some of the issues counties across the United States are having with food insecurity, and that policymakers will take into consideration how strongly variables like poverty rates, distance to a grocery store, and access to fresh vegetables are linked to food insecurity.
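
One way to probe how well historical data predicts later years is to hold out the most recent year instead of splitting at random, so the model is never trained on the year it is scored on. A minimal sketch, assuming each record carries a year field; the record layout, helper name, and values are hypothetical:

```python
def temporal_split(records, year_key="year"):
    """Hold out the most recent year for testing; train on earlier years."""
    latest = max(r[year_key] for r in records)
    train = [r for r in records if r[year_key] < latest]
    test = [r for r in records if r[year_key] == latest]
    return train, test


# Hypothetical record layout: one row per county per year (made-up values).
records = [
    {"county": "A", "year": 2014, "insecurity": 19},
    {"county": "A", "year": 2015, "insecurity": 18},
    {"county": "B", "year": 2014, "insecurity": 24},
    {"county": "B", "year": 2015, "insecurity": 23},
]
train, test = temporal_split(records)
print(len(train), len(test))  # 2 2
```

A large drop in accuracy on the held-out year compared with a random split would support the concern that the model does not carry forward in time.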