
The data comes from [gapminder](https://www.gapminder.org/data/). It represents the daily available food supply, defined in the [source spreadsheet](https://docs.google.com/spreadsheets/d/14G6CjF6NblTGf6kkQclpXp3XZ3D4Nkw1I92DB4fOjXo/pub?gid=0#) as:

> The total supply of food available in a country, divided by the population and 365 (the number of days in the year).

In other words, each value is the amount of food available per person per day in a country, measured in kilocalories. The dataset covers 151 countries and 47 years, from 1961 to 2007.
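
As a quick illustration of the quoted definition, the arithmetic can be sketched as below. The function name and the input figures are hypothetical, chosen only so the division is easy to follow; they are not taken from the gapminder data.

```python
def daily_supply_per_person(total_kcal_per_year: float, population: int) -> float:
    """Kilocalories available per person per day, following the quoted definition."""
    return total_kcal_per_year / population / 365


# Hypothetical figures, not real observations.
total_kcal = 9.125e12      # total food supply for one year, in kilocalories
population = 10_000_000    # number of people in the country

print(daily_supply_per_person(total_kcal, population))  # 2500.0
```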

An analysis of this data is available in [this notebook](https://github.com/def-mycroft/gapminder_data/blob/master/analysis_food_kilocalories_country_inequality.ipynb).


# Cleaning Data

Operations performed in this notebook:

* The data is read into a dataframe
* Countries that don't have observations dating back to 1961 are removed
* A series for the yearly average and a series for the yearly standard deviation of the values are created
* The dataframe and both series are written to a file

In [185]:
import pandas as pd
import numpy as np
import pickle_funcs as pk
import matplotlib.pyplot as plt
%matplotlib inline

In [186]:
data = pd.read_csv('food_kilocalories.csv')

In [187]:
# Reset the first column name to 'Country'
columns = list(data.columns)
columns[0] = 'Country'
data.columns = columns

In [188]:
# Drop any row which contains a null. Could add more countries by dropping columns to a certain point.
data.dropna(axis=0, inplace=True)
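
The trade-off mentioned in the comment above (keeping more countries by giving up the earliest years) can be sketched on a toy table. The frame below is hypothetical and uses made-up country labels and values; it is only meant to show how the order of the two drops changes what survives.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table: rows are countries, columns are years.
toy = pd.DataFrame(
    {
        '1961': [2000.0, np.nan, np.nan],
        '1962': [2050.0, 2400.0, np.nan],
        '1963': [2100.0, 2450.0, 1900.0],
    },
    index=['A', 'B', 'C'],
)

# Dropping every row that contains a null keeps only countries with full coverage.
full_history = toy.dropna(axis=0)

# Dropping the earliest year first gives up one year of history
# but lets one more country survive the null filter.
shorter_history = toy.drop(columns=['1961']).dropna(axis=0)

print(list(full_history.index))     # ['A']
print(list(shorter_history.index))  # ['A', 'B']
```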

In [189]:
# Change the index to the 'Country' column
data.index = data['Country']
data.drop('Country', axis=1, inplace=True) # And drop the 'Country' column.
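
The rename, re-index, and drop steps above could also be folded into the read itself. This is a minimal sketch, assuming the country names sit in the first column of the CSV; the inline CSV text and its first header label are placeholders, not the contents of the real file.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for 'food_kilocalories.csv'.
csv_text = "some label,1961,1962\nA,2000,2050\nB,2300,2400\n"

# index_col=0 makes the first column the index at read time;
# rename_axis then names that index 'Country'.
table = pd.read_csv(io.StringIO(csv_text), index_col=0).rename_axis('Country')

print(table.index.name)     # Country
print(list(table.columns))  # ['1961', '1962']
```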

# Creating Average and Stdev Columns

In [190]:
# Collect the mean across countries for each year, along with the year labels.
mean_values = []
mean_years = []
for year in data.columns:
    mean_values.append(data[year].mean())
    mean_years.append(year)

In [191]:
# Collect the standard deviation across countries for each year, along with the year labels.
stdev_values = []
stdev_years = []
for year in data.columns:
    stdev_values.append(data[year].std())
    stdev_years.append(year)

In [192]:
# Create a Series from the years and values
averages = pd.Series(mean_values, index=mean_years, name='Average')

In [193]:
# Create a Series from the years and values
stdevs = pd.Series(stdev_values, index=stdev_years, name='Stdev')
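
For comparison, pandas can compute both series without explicit loops, since `DataFrame.mean()` and `DataFrame.std()` reduce over rows by default and return one value per column. The sketch below uses a hypothetical toy frame rather than the real data, and the `_vec` names are illustrative only.

```python
import pandas as pd

# Hypothetical toy table: rows are countries, columns are years.
toy = pd.DataFrame(
    {'1961': [2000.0, 2300.0, 2600.0], '1962': [2100.0, 2400.0, 2700.0]},
    index=['A', 'B', 'C'],
)

# One value per year column; rename() sets the Series name.
averages_vec = toy.mean().rename('Average')
stdevs_vec = toy.std().rename('Stdev')  # sample standard deviation (ddof=1), the pandas default

# Loop-based equivalent, written in the same style as the cells above.
loop_means = pd.Series(
    [toy.iloc[:, col].mean() for col in range(len(toy.columns))],
    index=toy.columns,
    name='Average',
)

print(averages_vec)
print(stdevs_vec)
```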

# Write Objects to File

In [194]:
data_list = {
    'data':data,
    'averages':averages,
    'stdevs':stdevs
}

In [195]:
# Write object file with pickle
pk.pickle_object(data_list, 'data', test=False)

In [196]:
# Load the main data table back in as an array...
loaded_data = np.array(pk.unpickle_object('data')['data'])

In [197]:
data = np.array(data)

In [198]:
test_results = data != loaded_data

In [199]:
# ...and test to make sure that they are equivalent (should equal zero if correct):
test_results.sum()

0
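
The same round-trip check can be sketched with the standard library. `pickle_funcs` is a local helper module whose internals are not shown here, so this example swaps in Python's built-in `pickle` and a temporary file; the toy frame and the file name are hypothetical. `DataFrame.equals` compares values, index, and columns in one call, and it treats nulls in matching positions as equal, which an elementwise `!=` does not.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical toy objects standing in for the real dataframe and series.
toy = pd.DataFrame(
    {'1961': [2000.0, 2300.0]},
    index=pd.Index(['A', 'B'], name='Country'),
)
payload = {'data': toy, 'averages': toy.mean().rename('Average')}

# Write the dictionary to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.pkl')
    with open(path, 'wb') as handle:
        pickle.dump(payload, handle)
    with open(path, 'rb') as handle:
        restored = pickle.load(handle)

# True when values, index, and column labels all match.
round_trip_ok = restored['data'].equals(toy)
print(round_trip_ok)
```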