# Open Food Facts: the carbon “food-print” we do not eat

## Abstract
<i>Everything we do has a carbon footprint, and our diet is no exception. Growing, farming, processing and packaging our food consumes energy and organic resources, which results in the emission of greenhouse gases such as CO<sub>2</sub>. In this project, we use the Open Food Facts dataset to analyze the processed food industry (its manufacturing, product composition, and sales) for the main sources of carbon emissions. We explain how the carbon footprint is distributed, starting with an understanding of the products, followed by a breakdown of production countries and points of sale, and an evaluation of trends in diet composition, with a special focus on products with a high nutrition grade in France and the UK.

With this study, we want to provide a better understanding of the agri-food industry and, eventually, help reduce carbon emissions.</i>

In this notebook, we perform the above analysis on the Open Food Facts database, which we pre-processed using the __Open Food Facts - Cleanse Data__ notebook in the main directory.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Abstract" data-toc-modified-id="Abstract-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Abstract</a></span></li><li><span><a href="#Import-cleansed-data" data-toc-modified-id="Import-cleansed-data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Import cleansed data</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#Set-column-data-types" data-toc-modified-id="Set-column-data-types-2.0.1"><span class="toc-item-num">2.0.1&nbsp;&nbsp;</span>Set column data types</a></span></li></ul></li></ul></li><li><span><a href="#Analyse-data" data-toc-modified-id="Analyse-data-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Analyse data</a></span><ul class="toc-item"><li><span><a href="#Production-/-manufacture-impact" data-toc-modified-id="Production-/-manufacture-impact-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Production / manufacture impact</a></span><ul class="toc-item"><li><span><a href="#Global-distribution-of-global-food-producers" data-toc-modified-id="Global-distribution-of-global-food-producers-3.1.1"><span class="toc-item-num">3.1.1&nbsp;&nbsp;</span>Global distribution of global food producers</a></span><ul class="toc-item"><li><span><a href="#Which-are-the-dominant-global-food-producers-and-manufacturers?" data-toc-modified-id="Which-are-the-dominant-global-food-producers-and-manufacturers?-3.1.1.1"><span class="toc-item-num">3.1.1.1&nbsp;&nbsp;</span>Which are the dominant global food producers and manufacturers?</a></span></li><li><span><a href="#How-is-this-distribution-impacted-when-we-consider-neutral-and-large-carbon-footprint-products?" 
data-toc-modified-id="How-is-this-distribution-impacted-when-we-consider-neutral-and-large-carbon-footprint-products?-3.1.1.2"><span class="toc-item-num">3.1.1.2&nbsp;&nbsp;</span>How is this distribution impacted when we consider neutral and large carbon footprint products?</a></span></li></ul></li><li><span><a href="#Case-study:-Palm-oil" data-toc-modified-id="Case-study:-Palm-oil-3.1.2"><span class="toc-item-num">3.1.2&nbsp;&nbsp;</span>Case study: Palm oil</a></span><ul class="toc-item"><li><span><a href="#Can-we-observe-any-trend-in-the-number-of-products-including-this-oil-(assuming-a-strong-dependence-between-date-the-product-was-added-to-the-database-and-data-the-product-was-invented)?" data-toc-modified-id="Can-we-observe-any-trend-in-the-number-of-products-including-this-oil-(assuming-a-strong-dependence-between-date-the-product-was-added-to-the-database-and-data-the-product-was-invented)?-3.1.2.1"><span class="toc-item-num">3.1.2.1&nbsp;&nbsp;</span>Can we observe any trend in the number of products including this oil (assuming a strong dependence between date the product was added to the database and data the product was invented)?</a></span></li><li><span><a href="#Which-country-use-palm-oils-for-production?" data-toc-modified-id="Which-country-use-palm-oils-for-production?-3.1.2.2"><span class="toc-item-num">3.1.2.2&nbsp;&nbsp;</span>Which country use palm oils for production?</a></span></li></ul></li></ul></li><li><span><a href="#Good-nutrition-impact" data-toc-modified-id="Good-nutrition-impact-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Good nutrition impact</a></span><ul class="toc-item"><li><span><a href="#High-nutrional-products" data-toc-modified-id="High-nutrional-products-3.2.1"><span class="toc-item-num">3.2.1&nbsp;&nbsp;</span>High-nutrional products</a></span><ul class="toc-item"><li><span><a href="#Has-there-been-a-surge-in-high-graded-Products-in-the-UK-/-France-over-the-past-years?" 
data-toc-modified-id="Has-there-been-a-surge-in-high-graded-Products-in-the-UK-/-France-over-the-past-years?-3.2.1.1"><span class="toc-item-num">3.2.1.1&nbsp;&nbsp;</span>Has there been a surge in high graded Products in the UK / France over the past years?</a></span></li><li><span><a href="#What-are-those-products-made-of?" data-toc-modified-id="What-are-those-products-made-of?-3.2.1.2"><span class="toc-item-num">3.2.1.2&nbsp;&nbsp;</span>What are those products made of?</a></span></li><li><span><a href="#Where-do-these-product-come-from-and-where-are-they-manufactured?" data-toc-modified-id="Where-do-these-product-come-from-and-where-are-they-manufactured?-3.2.1.3"><span class="toc-item-num">3.2.1.3&nbsp;&nbsp;</span>Where do these product come from and where are they manufactured?</a></span></li><li><span><a href="#Where-are-those-products-sold?" data-toc-modified-id="Where-are-those-products-sold?-3.2.1.4"><span class="toc-item-num">3.2.1.4&nbsp;&nbsp;</span>Where are those products sold?</a></span></li></ul></li><li><span><a href="#Carbon-footprint-of-nutrionally-high-graded-products" data-toc-modified-id="Carbon-footprint-of-nutrionally-high-graded-products-3.2.2"><span class="toc-item-num">3.2.2&nbsp;&nbsp;</span>Carbon footprint of nutrionally-high graded products</a></span><ul class="toc-item"><li><span><a href="#Are-expansive-and-polluting-products-performing-more-poorly-in-the-nutrition-mark?" data-toc-modified-id="Are-expansive-and-polluting-products-performing-more-poorly-in-the-nutrition-mark?-3.2.2.1"><span class="toc-item-num">3.2.2.1&nbsp;&nbsp;</span>Are expansive and polluting products performing more poorly in the nutrition mark?</a></span></li><li><span><a href="#Can-we-establish-a-meaningful-correlation-between-these-product-and-the-carbon-footprint--or-an-estimated-price-(using-another-dataset-or-creating-our-own-with-web-scraping)?" 
data-toc-modified-id="Can-we-establish-a-meaningful-correlation-between-these-product-and-the-carbon-footprint--or-an-estimated-price-(using-another-dataset-or-creating-our-own-with-web-scraping)?-3.2.2.2"><span class="toc-item-num">3.2.2.2&nbsp;&nbsp;</span>Can we establish a meaningful correlation between these product and the carbon footprint  or an estimated price (using another dataset or creating our own with web scraping)?</a></span></li><li><span><a href="#Is-there-a-general-correlation-between-high-carbon-footprint-and-price?" data-toc-modified-id="Is-there-a-general-correlation-between-high-carbon-footprint-and-price?-3.2.2.3"><span class="toc-item-num">3.2.2.3&nbsp;&nbsp;</span>Is there a general correlation between high carbon footprint and price?</a></span></li></ul></li><li><span><a href="#Plot-distribution-of-availability-of-products-in-countries" data-toc-modified-id="Plot-distribution-of-availability-of-products-in-countries-3.2.3"><span class="toc-item-num">3.2.3&nbsp;&nbsp;</span>Plot distribution of availability of products in countries</a></span></li></ul></li></ul></li></ul></div>

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns
from datetime import datetime

import json
import pickle

import os
import sys
nb_dir = os.path.split(os.getcwd())[0]
if nb_dir not in sys.path:
    sys.path.append(nb_dir)
    
%load_ext autoreload
%autoreload 2
    
import libs.exploring as explore
import libs.visualising as visualize
import libs.cleansing as cleanse

## Import cleansed data

In [2]:
# Import data
open_food_facts_csv_file = "./data/openfoodfacts_clean.csv"

food_facts_pd = pd.read_csv(open_food_facts_csv_file,
                            delimiter=",")


In [None]:
# Mark missing places as 'Unknown' (kept as list-like strings so they can be parsed below)
for column in ['origins_cleaned', 'manufacturing_place_cleaned', 'purchase_places_cleaned']:
    food_facts_pd[column] = food_facts_pd[column].fillna("['Unknown']")
food_facts_pd = food_facts_pd.fillna('')

In [None]:
# Parse the list-like strings into Python lists
food_facts_pd.origins_cleaned = \
    food_facts_pd.origins_cleaned.apply(cleanse.read)

food_facts_pd.manufacturing_place_cleaned = \
    food_facts_pd.manufacturing_place_cleaned.apply(cleanse.read)

food_facts_pd.purchase_places_cleaned = \
    food_facts_pd.purchase_places_cleaned.apply(cleanse.read)
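
The string-to-list conversion above relies on the project helper `cleanse.read`, whose implementation is not shown in this notebook. As a rough sketch of what such a parser could look like, the hypothetical function `parse_list_cell` below (not part of `libs.cleansing`) uses `ast.literal_eval` and passes through values that are already lists, so that re-running the cell on already-parsed data does not fail:

```python
import ast


def parse_list_cell(cell):
    """Hypothetical stand-in for cleanse.read: turn "['a', 'b']" into ['a', 'b']."""
    # Already parsed (e.g. the cell was run twice): return unchanged
    if isinstance(cell, list):
        return cell
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        # Not a valid Python literal: fall back to the 'Unknown' marker
        return ['Unknown']
    # Only accept actual lists; anything else is treated as unknown
    return parsed if isinstance(parsed, list) else ['Unknown']


print(parse_list_cell("['France', 'Spain']"))
print(parse_list_cell(['France']))
print(parse_list_cell("not a list"))
```

Falling back to `['Unknown']` mirrors the marker used when filling missing values; whether that is the right default for malformed entries is a design choice left open here.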


In [None]:
food_facts_pd.head(10)

#### Set column data types

In [None]:
# Convert columns to numeric / datetime; unparseable values become NaN / NaT
food_facts_pd['carbon-footprint_100g'] = pd.to_numeric(food_facts_pd['carbon-footprint_100g'], errors='coerce')
food_facts_pd['energy_100g'] = pd.to_numeric(food_facts_pd['energy_100g'], errors='coerce')
#food_facts_pd['price_per_100g'] = pd.to_numeric(food_facts_pd['price_per_100g'], errors='coerce')
food_facts_pd['created_datetime'] = pd.to_datetime(food_facts_pd['created_datetime'], errors='coerce')
food_facts_pd.dtypes
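
To illustrate what the `'coerce'` option does, here is a small standalone example on made-up values (not taken from the dataset): entries that cannot be parsed become `NaN` (or `NaT` for dates) instead of raising an error.

```python
import pandas as pd

# Made-up raw values, including an empty string and a non-numeric token
raw = pd.Series(['12.5', '', 'n/a', '3'])
numeric = pd.to_numeric(raw, errors='coerce')
print(numeric.isna().tolist())

# Made-up date strings, one of which is not a date
raw_dates = pd.Series(['2017-03-09', 'not a date'])
dates = pd.to_datetime(raw_dates, errors='coerce')
print(dates.isna().tolist())
```

This is why the empty strings introduced by `fillna('')` do not break the conversion: they simply end up as missing values in the typed columns.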

## Analyse data

### Production / manufacture impact

#### Global distribution of global food producers

##### Which are the dominant global food producers and manufacturers?

Where do those products originate?

In [None]:
visualize.plot_occurences_of_distinct_values(food_facts_pd, 'origins_cleaned')

Where are those products manufactured?

In [None]:
visualize.plot_occurences_of_distinct_values(food_facts_pd, 'manufacturing_place_cleaned')

Where are those products bought?

In [None]:
visualize.plot_occurences_of_distinct_values(food_facts_pd, 'purchase_places_cleaned')

##### How is this distribution impacted when we consider neutral and large carbon footprint products? 

In [None]:
#dataset carbon footprint coming from Eaternity

#### Case study: Palm oil

##### Can we observe any trend in the number of products including this oil (assuming a strong dependence between date the product was added to the database and data the product was invented)?

In [None]:
# Extract products whose ingredient list mentions palm (case-insensitive; missing values count as no match)
palm_oil_pd = food_facts_pd[food_facts_pd.ingredients_text.str.contains("palm", case=False, na=False)]

In [None]:
print('{0:.2f} % of the products in the dataset contain palm oil'.format(100 * palm_oil_pd.shape[0] / food_facts_pd.shape[0]))

In [None]:
palm_oil_pd

In [None]:
#palm_oil_pd.groupby('main_category')

##### Which country use palm oils for production?

In [None]:
# Count palm-oil products per origin country (entries of origins_cleaned are lists)
a = palm_oil_pd.origins_cleaned.explode().value_counts()

### Good nutrition impact

#### High-nutrional products

##### Has there been a surge in high graded Products in the UK / France over the past years?

In [None]:
nutrition_fr = food_facts_pd[['created_datetime','nutrition-score-fr_100g']]
nutrition_fr = nutrition_fr[nutrition_fr['nutrition-score-fr_100g'] != '']
nutrition_over_time = nutrition_fr.sort_values(by = 'created_datetime')
nutrition_over_time.dropna()
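
One possible way to approach the question above is to aggregate the scores per year and country. The sketch below uses a small made-up frame; the `country` column and the threshold of 0 for a "high-graded" product are assumptions made for illustration only (lower values of the French nutrition score indicate better nutritional quality).

```python
import pandas as pd

# Made-up sample; 'country' is a hypothetical single-valued column
toy = pd.DataFrame({
    'created_datetime': pd.to_datetime(
        ['2015-05-01', '2015-09-12', '2016-02-03', '2016-07-21', '2017-01-15']),
    'country': ['France', 'France', 'France', 'United Kingdom', 'France'],
    'nutrition-score-fr_100g': [12, -2, -4, 3, -1],
})

# Flag products whose score is below the (assumed) threshold
toy['high_graded'] = toy['nutrition-score-fr_100g'] < 0

# Year in which each product was added to the database
year = toy['created_datetime'].dt.year.rename('year')

# Share of high-graded products among those added each year, per country
share_per_year = (toy
                  .groupby([year, 'country'])['high_graded']
                  .mean()
                  .unstack('country'))
print(share_per_year)
```

Using a share rather than a raw count keeps the result comparable across years, since the number of products added to the database varies from year to year.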

##### What are those products made of?
What is their composition? Do they contain many additives? Where are these products sold?

##### Where do these product come from and where are they manufactured?

##### Where are those products sold?

#### Carbon footprint of nutrionally-high graded products
Intuitively, one would expect most products with a high nutrition grade to be unprocessed, plant-based foods (fruit, vegetables, …) that require little manufacturing and therefore have a small carbon footprint.

In [None]:
carbon_footprints = food_facts_pd[food_facts_pd['carbon-footprint_100g'].notna()]

First, we should get a feel for the data we are dealing with. To that end, we visualize the origin and composition of the products.

In [None]:
visualize.plot_column_composition(carbon_footprints, columns=['countries_en', 'stores', 'main_category', 'labels_en'])

##### Are expansive and polluting products performing more poorly in the nutrition mark?

##### Can we establish a meaningful correlation between these product and the carbon footprint  or an estimated price (using another dataset or creating our own with web scraping)? 

In [None]:
# Carbon footprint versus price per 100 g, clustered by main category
visualize.plot_cluster_by_tags(df=carbon_footprints,
                               plot2D_features=["carbon-footprint_100g", "price_per_100g"],
                               cluster="main_category")

##### Is there a general correlation between high carbon footprint and price? 
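
A rank correlation is one simple way to probe this question. The sketch below uses made-up numbers rather than values from the dataset; Spearman's coefficient is chosen here as an assumption, because it does not require the relationship to be linear.

```python
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    'carbon-footprint_100g': [50.0, 120.0, 300.0, 80.0, 410.0],
    'price_per_100g': [0.40, 0.90, 0.80, 0.70, 2.50],
})

# Pairwise Spearman rank correlation between the two columns
corr = toy.corr(method='spearman')
rho = corr.loc['carbon-footprint_100g', 'price_per_100g']
print('Spearman rho: {0:.2f}'.format(rho))
```

A coefficient close to 1 or -1 would point to a monotonic relationship, while a value near 0 would suggest that price and carbon footprint are largely unrelated in the sample.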

#### Plot distribution of availability of products in countries

In [None]:
from collections import Counter

# Count the number of products available in each country.
# Assumes each countries_en entry is a collection (e.g. a list) of country names;
# if the column holds raw strings, split them into lists first.
country_count = Counter()
for countries in food_facts_pd.countries_en:
    # set() ensures a country is counted at most once per product
    country_count.update(set(countries))

In [None]:
country_count_pd = pd.DataFrame(list(country_count.items()), columns=['Country', 'Value'])
country_count_pd['Country'] = country_count_pd['Country'].str.upper()
country_count_pd

In [None]:
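import folium

# Country geometries for plotting
country_geo = './data/world-countries.json'

# Avoid shadowing the built-in `map`; the 'Mapbox Bright' tiles are no longer
# available in recent folium releases, so the default OpenStreetMap tiles are used
world_map = folium.Map(location=[0, 0], tiles='OpenStreetMap', zoom_start=1.5)

# Choropleth maps bind pandas DataFrames and JSON geometries.
# Note: the values in 'Country' must match the ids referenced by key_on.
folium.Choropleth(geo_data=country_geo,
                  data=country_count_pd,
                  columns=['Country', 'Value'],
                  fill_color='YlGnBu',
                  key_on='feature.id',
                  fill_opacity=0.7,
                  line_opacity=0.2,
                  ).add_to(world_map)
world_map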