Sources
MEAT FOOD SUPPLY QUANTITY
Variable description	Average supply of meat across the population, measured in kilograms per person per year.

Variable time span	1961 – 2018
Data published by	United Nations Food and Agriculture Organization (FAO)
Link	http://www.fao.org/faostat/en/#data/FBS
This dataset is sourced from the UN Food and Agriculture Organization (FAO) and combines data from its Food Balance Sheets into a complete series from 1961 to 2018. 

In the original FAO dataset, food supply data from 1961 to 2013 is stored under its 'old methodology' variable set. Data from 2014 to 2018 is stored under its 'new methodology' for food balance sheets.

I have combined this data to give a complete series from 1961 onwards. No transformations have been made to the original data.
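As a rough illustration of how the two methodology periods could be joined into one wide table, the sketch below merges two tiny stand-in frames on the descriptive columns that both files share. The choice of join keys and the outer join are assumptions, not necessarily how the combined series was actually built; the numbers are copied from the Afghanistan "Grand Total" rows previewed later in this notebook.

```python
import pandas as pd

# Descriptive columns present in both files (see the dtypes listings in this notebook).
# Joining on all of them is an assumption; unit labels could differ between methodologies.
KEYS = ["Area Code", "Area", "Item Code", "Item", "Element Code", "Element", "Unit"]

base = {
    "Area Code": [2], "Area": ["Afghanistan"],
    "Item Code": [2901], "Item": ["Grand Total"],
    "Element Code": [664], "Element": ["Food supply (kcal/capita/day)"],
    "Unit": ["kcal/capita/day"],
}

# Stand-ins for Meat_Historic (old methodology) and Meat_Data (new methodology).
historic = pd.DataFrame({**base, "Y2012": [2100.0], "Y2013": [2090.0]})
recent = pd.DataFrame({**base, "Y2014": [2095.0], "Y2015": [2044.0]})

# An outer join keeps rows that exist in only one of the two periods.
combined = historic.merge(recent, on=KEYS, how="outer")

year_cols = [c for c in combined.columns if c.startswith("Y")]
print(combined[year_cols])
```

With an outer join, an area, item or element that appears in only one period still gets a row, with NaN in the years it does not cover.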

Food supply is defined as food available for human consumption. At country level, it is calculated as the food remaining for human use after deduction of all non-food utilizations, i.e. food = production + imports + stock withdrawals − exports − industrial use − animal feed − seed − wastage − additions to stock.
Wastage includes losses of usable products occurring along distribution chains from farm gate (or port of import) up to the retail level. However, such values do not include consumption-level waste (i.e. retail, restaurant and household waste), so the food supply figure overestimates the average amount of food actually consumed. (FAO Source)
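The food balance formula above can be sketched in a few lines of Python. The function names and all input figures below are made up for illustration; the only thing taken from the dataset is the unit convention (quantities in 1000 tonnes, population in 1000 persons).

```python
def food_supply(production, imports, stock_withdrawals, exports,
                industrial_use, animal_feed, seed, wastage, additions_to_stock):
    """Food available for human consumption, in the same unit as the inputs."""
    return (production + imports + stock_withdrawals
            - exports - industrial_use - animal_feed - seed - wastage
            - additions_to_stock)


def per_capita_kg(supply_thousand_tonnes, population_thousands):
    """Convert a supply in 1000 tonnes into kilograms per person per year."""
    kilograms = supply_thousand_tonnes * 1_000_000   # 1000 tonnes = 1,000,000 kg
    persons = population_thousands * 1_000
    return kilograms / persons


# Hypothetical country: all quantities in 1000 tonnes.
supply = food_supply(production=500, imports=100, stock_withdrawals=20,
                     exports=50, industrial_use=30, animal_feed=40,
                     seed=10, wastage=25, additions_to_stock=15)
kg = per_capita_kg(supply, population_thousands=30_000)
print(supply, kg)  # 450 15.0
```

Because consumption-level waste is not deducted, a figure such as the 15 kg above is an upper bound on what is actually eaten.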



In [7]:
# import packages that we will use
import pandas as pd
import numpy as np
import seaborn as sns

import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import matplotlib
plt.style.use('ggplot')
from matplotlib.pyplot import figure

%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (12,8)

pd.options.mode.chained_assignment = None


The two datasets, Meat_Data and Meat_Historic, are uploaded to MongoDB Atlas.
MongoDB is one of the most widely used NoSQL databases due to its speed, efficiency and ease of use.
MongoDB is also accessible from current programming languages and runtimes such as Java, Python and Node.js.

We chose the MongoDB Atlas service because it can be tested for free by choosing a server tier with shared RAM and CPU and 512 MB of storage, which is adequate for development work. It also provides a MongoDB environment for testing without any installation or infrastructure management.

In [8]:
df = pd.read_csv('Meat_Data.csv')

print(df.shape)
print(df.dtypes)

# select numeric columns
df_numeric = df.select_dtypes(include=[np.number])
numeric_cols = df_numeric.columns.values
print(numeric_cols)

# select non numeric columns
df_non_numeric = df.select_dtypes(exclude=[np.number])
non_numeric_cols = df_non_numeric.columns.values
print(non_numeric_cols)

(292839, 12)
Area Code         int64
Area             object
Item Code         int64
Item             object
Element Code      int64
Element          object
Unit             object
Y2014           float64
Y2015           float64
Y2016           float64
Y2017           float64
Y2018           float64
dtype: object
['Area Code' 'Item Code' 'Element Code' 'Y2014' 'Y2015' 'Y2016' 'Y2017'
 'Y2018']
['Area' 'Item' 'Element' 'Unit']


From these results, we learn that the dataset has 292,839 rows and 12 columns. We also see which features are numeric and which are categorical, which is useful to know before working with the data.

In [11]:
df = pd.read_csv('Meat_Historic.csv')

print(df.shape)
print(df.dtypes)

# select numeric columns
df_numeric = df.select_dtypes(include=[np.number])
numeric_cols = df_numeric.columns.values
print(numeric_cols)

# select non numeric columns
df_non_numeric = df.select_dtypes(exclude=[np.number])
non_numeric_cols = df_non_numeric.columns.values
print(non_numeric_cols)

(238560, 60)
Area Code         int64
Area             object
Item Code         int64
Item             object
Element Code      int64
Element          object
Unit             object
Y1961           float64
Y1962           float64
Y1963           float64
Y1964           float64
Y1965           float64
Y1966           float64
Y1967           float64
Y1968           float64
Y1969           float64
Y1970           float64
Y1971           float64
Y1972           float64
Y1973           float64
Y1974           float64
Y1975           float64
Y1976           float64
Y1977           float64
Y1978           float64
Y1979           float64
Y1980           float64
Y1981           float64
Y1982           float64
Y1983           float64
Y1984           float64
Y1985           float64
Y1986           float64
Y1987           float64
Y1988           float64
Y1989           float64
Y1990           float64
Y1991           float64
Y1992           float64
Y1993           float64
Y1994           float64
Y19

From these results, we learn that the second dataset has 238,560 rows and 60 columns. We also see which features are numeric and which are categorical, which is useful to know before working with the data.
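Since the same inspection is run on both files, it could be wrapped in a small helper. The function name `summarise` and the one-row sample frame are illustrative assumptions; the calls themselves are the ones used in the cells above.

```python
import numpy as np
import pandas as pd


def summarise(df):
    """Return the shape plus the numeric and non-numeric column names of df."""
    numeric = df.select_dtypes(include=[np.number]).columns.tolist()
    non_numeric = df.select_dtypes(exclude=[np.number]).columns.tolist()
    return {"shape": df.shape, "numeric": numeric, "non_numeric": non_numeric}


# One-row stand-in; in the notebook this would be pd.read_csv('Meat_Data.csv') etc.
sample = pd.DataFrame({"Area Code": [2], "Area": ["Afghanistan"], "Y2014": [2095.0]})
summary = summarise(sample)
print(summary)
```

Note that the code columns (Area Code, Item Code, Element Code) are reported as numeric because they are stored as integers, even though they act as identifiers.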

In [3]:
import pandas as pd
from pymongo import MongoClient

# Connect to the Atlas cluster; the database actually used is selected on the next line
client = MongoClient("mongodb+srv://mongo:mongo@cluster0.laxah.mongodb.net/mynewdb?retryWrites=true&w=majority")
db = client.ermesa_db
print(db)
coll = db.meat_data
print(coll)

Database(MongoClient(host=['cluster0-shard-00-01.laxah.mongodb.net:27017', 'cluster0-shard-00-00.laxah.mongodb.net:27017', 'cluster0-shard-00-02.laxah.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, retrywrites=True, w='majority', authsource='admin', replicaset='atlas-i83ime-shard-0', ssl=True), 'ermesa_db')
Collection(Database(MongoClient(host=['cluster0-shard-00-01.laxah.mongodb.net:27017', 'cluster0-shard-00-00.laxah.mongodb.net:27017', 'cluster0-shard-00-02.laxah.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, retrywrites=True, w='majority', authsource='admin', replicaset='atlas-i83ime-shard-0', ssl=True), 'ermesa_db'), 'meat_data')
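The connection string above contains the username and password in plain text. One common alternative, sketched here, is to read them from environment variables and build the URI at runtime. The variable names `MONGO_USER` and `MONGO_PASSWORD`, the fallback values, the helper `build_atlas_uri` and the host `cluster0.example.mongodb.net` are all hypothetical placeholders.

```python
import os
from urllib.parse import quote_plus


def build_atlas_uri(user, password, host, database):
    """Assemble a mongodb+srv URI, percent-encoding the credentials."""
    return (f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}/{database}?retryWrites=true&w=majority")


# Hypothetical environment variable names; the defaults are only for demonstration.
user = os.environ.get("MONGO_USER", "demo_user")
password = os.environ.get("MONGO_PASSWORD", "p@ss/word")

uri = build_atlas_uri(user, password, "cluster0.example.mongodb.net", "mynewdb")
print(uri)
# The resulting string can then be passed to pymongo.MongoClient(uri).
```

Percent-encoding matters because characters such as `@` or `/` in a password would otherwise be read as part of the URI structure.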


In [4]:
# import the data in MongoDB
import pandas as pd

df = pd.read_csv('Meat_Data.csv')

coll.insert_many(df.apply(lambda x: x.to_dict(), axis=1).to_list())

<pymongo.results.InsertManyResult at 0x25033b10940>

In [5]:
coll = db.meat_historic_data
print(coll)
df = pd.read_csv('Meat_Historic.csv')
coll.insert_many(df.apply(lambda x: x.to_dict(), axis=1).to_list())

Collection(Database(MongoClient(host=['cluster0-shard-00-01.laxah.mongodb.net:27017', 'cluster0-shard-00-00.laxah.mongodb.net:27017', 'cluster0-shard-00-02.laxah.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, retrywrites=True, w='majority', authsource='admin', replicaset='atlas-i83ime-shard-0', ssl=True), 'ermesa_db'), 'meat_historic_data')


<pymongo.results.InsertManyResult at 0x25033b11c00>
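Both inserts above build the full list of documents in memory and send it in one call. A possible refinement, shown as a sketch only, is to convert missing values to None (so they are stored as null rather than NaN) and to hand the records to `insert_many` in smaller batches. The helper names `to_records` and `batches` and the batch size are assumptions; whether the CSV files contain empty cells has not been checked here.

```python
import pandas as pd


def to_records(df):
    """Turn a DataFrame into a list of dicts, replacing missing values with None."""
    return [
        {key: (None if pd.isna(value) else value) for key, value in row.items()}
        for row in df.to_dict("records")
    ]


def batches(records, size):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Three-row stand-in with one missing value.
sample = pd.DataFrame({"Area": ["A", "B", "C"], "Y2014": [1.5, float("nan"), 3.0]})
records = to_records(sample)
chunks = list(batches(records, 2))
print(chunks)

# With a live collection, each chunk would be sent separately, e.g.:
# for chunk in batches(records, 10_000):
#     coll.insert_many(chunk)
```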

If we use MongoDB Compass or another tool to connect to the collections we just created, we'll see that MongoDB also generated an _id value in each document. This is because MongoDB requires every document to have a unique _id, and we did not provide one.
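If a predictable identifier is preferred over the generated one, an `_id` field can be added to each record before inserting. The sketch below builds one from the three code columns; that this combination is unique for every row of the real files is an assumption, so the code checks uniqueness before going further.

```python
import pandas as pd

# Two rows modelled on the preview of meat_data (values from the first rows shown below).
sample = pd.DataFrame({
    "Area Code": [2, 2],
    "Item Code": [2501, 2901],
    "Element Code": [511, 664],
    "Y2014": [33371.0, 2095.0],
})

key_cols = ["Area Code", "Item Code", "Element Code"]

# Join the three codes into a string such as "2-2501-511".
sample["_id"] = sample[key_cols].astype(str).apply(lambda row: "-".join(row), axis=1)

# MongoDB rejects duplicate _id values, so verify uniqueness first.
assert sample["_id"].is_unique

records = sample.to_dict("records")
print(records[0]["_id"])  # 2-2501-511
```

Inserting the same file twice with such keys would raise a duplicate key error instead of silently creating a second copy of every document.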

In [9]:
coll = db.meat_data
df1 = pd.DataFrame(list(coll.find({}, {'_id':0})))
df1.head()

Unnamed: 0,Area Code,Area,Item Code,Item,Element Code,Element,Unit,Y2014,Y2015,Y2016,Y2017,Y2018
0,2,Afghanistan,2501,Population,511,Total Population - Both sexes,1000 persons,33371.0,34414.0,35383.0,36296.0,37172.0
1,2,Afghanistan,2501,Population,5301,Domestic supply quantity,1000 tonnes,0.0,0.0,0.0,0.0,0.0
2,2,Afghanistan,2901,Grand Total,664,Food supply (kcal/capita/day),kcal/capita/day,2095.0,2044.0,2034.0,2051.0,2040.0
3,2,Afghanistan,2901,Grand Total,674,Protein supply quantity (g/capita/day),g/capita/day,58.18,56.29,56.13,56.16,55.52
4,2,Afghanistan,2901,Grand Total,684,Fat supply quantity (g/capita/day),g/capita/day,31.23,31.4,31.0,31.39,31.91


In [10]:
coll = db.meat_historic_data
df2 = pd.DataFrame(list(coll.find({}, {'_id':0})))
df2.head()

Unnamed: 0,Area Code,Area,Item Code,Item,Element Code,Element,Unit,Y1961,Y1962,Y1963,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
0,2,Afghanistan,2501,Population,511,Total Population - Both sexes,1000 persons,8954.0,9142.0,9340.0,...,24019.0,24861.0,25631.0,26349.0,27032.0,27708.0,28398.0,29105.0,29825.0,30552.0
1,2,Afghanistan,2901,Grand Total,664,Food supply (kcal/capita/day),kcal/capita/day,2999.0,2917.0,2698.0,...,1967.0,1948.0,1966.0,2046.0,2041.0,2081.0,2104.0,2107.0,2100.0,2090.0
2,2,Afghanistan,2901,Grand Total,674,Protein supply quantity (g/capita/day),g/capita/day,84.91,82.98,77.12,...,55.24,53.51,53.46,56.0,56.96,57.79,58.14,58.91,58.91,58.25
3,2,Afghanistan,2901,Grand Total,684,Fat supply quantity (g/capita/day),g/capita/day,37.51,37.61,38.57,...,34.95,36.75,31.13,32.09,29.72,30.72,33.88,33.08,33.37,33.52
4,2,Afghanistan,2903,Vegetal Products,664,Food supply (kcal/capita/day),kcal/capita/day,2752.0,2672.0,2438.0,...,1726.0,1715.0,1762.0,1839.0,1831.0,1871.0,1888.0,1891.0,1883.0,1873.0
