# Data Wrangling

This first capstone aims to forecast red meat consumption in the United States over the next 10 years. We will look for patterns in social, economic, and environmental indicators that could predict consumption.

In this notebook, we inspect and clean the datasets for this project. FAO stores its food balance data in two separate sheets: one covering years through 2013 and one covering 2014 onward. Since both sheets share the same column structure, our first step is to combine them into one dataframe.

In [1]:
import pandas as pd
fao08 = pd.read_csv('faostat_08.csv')
fao14 = pd.read_csv('faostat_14.csv')

# Concatenate the two dataframes, which share the same column structure.
# ignore_index=True builds a fresh 0..n-1 index, so no reset_index/drop step is needed.
fao_all = pd.concat([fao08, fao14], ignore_index=True)
fao_all.head(10)

Unnamed: 0,Domain,Area,Element,Item,Year,Unit,Value
0,"Food Balances (-2013, old methodology and popu...",United States of America,Production,Bovine Meat,2008,1000 tonnes,12163.0
1,"Food Balances (-2013, old methodology and popu...",United States of America,Production,Bovine Meat,2009,1000 tonnes,11891.0
2,"Food Balances (-2013, old methodology and popu...",United States of America,Production,Bovine Meat,2010,1000 tonnes,12046.0
3,"Food Balances (-2013, old methodology and popu...",United States of America,Production,Bovine Meat,2011,1000 tonnes,11921.0
4,"Food Balances (-2013, old methodology and popu...",United States of America,Production,Bovine Meat,2012,1000 tonnes,11811.0
5,"Food Balances (-2013, old methodology and popu...",United States of America,Production,Bovine Meat,2013,1000 tonnes,11719.0
6,"Food Balances (-2013, old methodology and popu...",United States of America,Import Quantity,Bovine Meat,2008,1000 tonnes,1235.0
7,"Food Balances (-2013, old methodology and popu...",United States of America,Import Quantity,Bovine Meat,2009,1000 tonnes,1277.0
8,"Food Balances (-2013, old methodology and popu...",United States of America,Import Quantity,Bovine Meat,2010,1000 tonnes,1135.0
9,"Food Balances (-2013, old methodology and popu...",United States of America,Import Quantity,Bovine Meat,2011,1000 tonnes,1017.0
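
Before relying on the combined frame, it can be worth confirming that the two sheets really do share the same columns. The sketch below shows one way to do that. The frames `old` and `new` and the helper `concat_checked` are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for the two FAOSTAT sheets (hypothetical values, not the real data).
old = pd.DataFrame({"Area": ["United States of America"], "Year": [2013], "Value": [1.0]})
new = pd.DataFrame({"Area": ["United States of America"], "Year": [2014], "Value": [2.0]})


def concat_checked(frames):
    """Concatenate frames only if they all have the same columns in the same order."""
    expected = list(frames[0].columns)
    for position, frame in enumerate(frames[1:], start=1):
        if list(frame.columns) != expected:
            raise ValueError(
                f"frame {position} has columns {list(frame.columns)}, expected {expected}"
            )
    # ignore_index=True discards the per-sheet indexes and numbers rows 0..n-1.
    return pd.concat(frames, ignore_index=True)


combined = concat_checked([old, new])
```

With the real data, the call would presumably be `concat_checked([fao08, fao14])`. A mismatch then raises an error instead of silently producing columns filled with NaN.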


To keep things simple, let's check whether we need every column we downloaded. 'Domain' looks ambiguous, so let's inspect its unique values.

In [2]:
fao_all['Domain'].unique()

array(['Food Balances (-2013, old methodology and population)',
       'Food Balances (2014-)'], dtype=object)

Both values only record which source sheet a row came from, which the Year column already tells us, so we can drop this column.

In [3]:
fao_all.drop(labels='Domain', axis=1, inplace=True)
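
Dropping 'Domain' is safe only if it carries no information beyond what 'Year' already provides. A check along these lines, run before the drop, could confirm that. It is a sketch on a hypothetical miniature frame (`toy`), not the real FAO data.

```python
import pandas as pd

# Hypothetical miniature of the combined table; only the two columns of interest.
toy = pd.DataFrame({
    "Domain": ["Food Balances (-2013, old methodology and population)"] * 2
              + ["Food Balances (2014-)"] * 2,
    "Year": [2012, 2013, 2014, 2015],
})

# Earliest and latest year covered by each Domain label.
year_span = toy.groupby("Domain")["Year"].agg(["min", "max"])

# Domain adds nothing beyond Year if every year maps to exactly one label.
domains_per_year = toy.groupby("Year")["Domain"].nunique()
domain_is_redundant = bool((domains_per_year == 1).all())
```

If `domain_is_redundant` came back `False` on the real data, some years would appear under both methodologies, and the column would be worth keeping until those overlaps were resolved.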

In [8]:
fao_all.Element.value_counts(), fao_all.Item.value_counts(), fao_all.Unit.value_counts()

(Other uses (non-food)    55
 Export Quantity          55
 Production               55
 Stock Variation          55
 Losses                   55
 Residuals                55
 Import Quantity          55
 Tourist consumption      55
 Processing               55
 Name: Element, dtype: int64,
 Pigmeat               99
 Poultry Meat          99
 Meat, Other           99
 Bovine Meat           99
 Mutton & Goat Meat    99
 Name: Item, dtype: int64,
 1000 tonnes    311
 Name: Unit, dtype: int64)
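
The Element and Item counts each add up to 495 rows (9 × 55 and 5 × 99), but Unit totals only 311. `value_counts` leaves out missing values by default, so one possible explanation is that Unit is empty for some rows. Below is a minimal sketch of how to check this, using a small hypothetical frame (`toy` and its contents are illustrative, not the FAO data).

```python
import numpy as np
import pandas as pd

# Hypothetical rows; one of them has no Unit recorded.
toy = pd.DataFrame({
    "Element": ["Production", "Losses", "Production"],
    "Unit": ["1000 tonnes", np.nan, "1000 tonnes"],
})

# Default behaviour: NaN is excluded, so the counts can sum to less than len(toy).
counts_default = toy["Unit"].value_counts()

# dropna=False makes the missing entries visible as their own row.
counts_with_nan = toy["Unit"].value_counts(dropna=False)

# Missing values per column, and missing Unit broken down by Element.
missing_per_column = toy.isna().sum()
missing_unit_by_element = toy["Unit"].isna().groupby(toy["Element"]).sum()
```

Running the same calls on `fao_all` would show whether the gap is concentrated in particular elements, which helps decide whether to fill or drop those rows.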

In [None]:
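import matplotlib.pyplot as plt
# Scatter every reported value against its year (all elements and items together).
plt.scatter(fao_all['Year'], fao_all['Value'])
plt.xlabel('Year')
plt.ylabel('Value (1000 tonnes)')
plt.show()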
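
Because the table is in long format, a single scatter mixes every element and item on one axis. One possible next step is to filter to a single element and pivot so that each item becomes its own column. The sketch below uses a hypothetical frame `toy`: the Bovine Meat figures are copied from the preview above, while the Pigmeat figures are placeholders.

```python
import pandas as pd

toy = pd.DataFrame({
    "Element": ["Production"] * 4 + ["Import Quantity"] * 2,
    "Item": ["Bovine Meat", "Bovine Meat", "Pigmeat", "Pigmeat",
             "Bovine Meat", "Bovine Meat"],
    "Year": [2008, 2009, 2008, 2009, 2008, 2009],
    # Pigmeat values (100.0, 110.0) are placeholders, not FAO figures.
    "Value": [12163.0, 11891.0, 100.0, 110.0, 1235.0, 1277.0],
})

# Keep one element, then reshape to one row per year and one column per item.
production = toy[toy["Element"] == "Production"]
wide = production.pivot(index="Year", columns="Item", values="Value")

# wide.plot(marker="o") would then draw one line per item (requires matplotlib).
```

`pivot` raises an error if any (Year, Item) pair appears more than once, which also serves as a check for duplicate rows.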