# Are we consuming more locally?

## Research questions

1. Where do the products we consume in everyday life come from?

    - Which countries produce the primary resources (raw ingredients) consumed in Switzerland?
    - Which countries manufacture most of the products consumed in Switzerland?


2. Is there a trend over time toward consuming more local products?

    - Do new products mostly use primary resources from Switzerland, or from other European countries?
    - Are new products mostly manufactured in Switzerland, or in other European countries?
    - Is there a trend over time for local products to promote their origin?

## Datasets

Open Food Facts (https://world.openfoodfacts.org/data)

Additional datasets from the opendata.swiss agriculture group (https://opendata.swiss/fr/group/agriculture):

   - “Evolution de la consommation de denrées alimentaires en Suisse” (evolution of food consumption in Switzerland): https://opendata.swiss/fr/dataset/entwicklung-des-nahrungsmittelverbrauches-in-der-schweiz-je-kopf-und-jahr1
   - “Dépenses fédérales pour l’agriculture et l’alimentation” (federal expenditure on agriculture and food): https://opendata.swiss/fr/dataset/bundesausgaben-fur-die-landwirtschaft-und-die-ernahrung1

## TODO

   - Clean and explore the datasets
   - Run a descriptive analysis
   - Determine the list of products that are sold in Switzerland
   - Classify these products into categories:
        - products entirely originating from Switzerland
        - products partially originating from Switzerland (manufactured in Switzerland but with ingredients from another country)
        - products not originating from Switzerland
   - Draw statistics from the import balances to determine which countries produce most of the raw ingredients and manufacture most of the products consumed in Switzerland.
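
The filtering, classification and counting steps in the list above could be sketched roughly as follows. This is only an illustration on a handful of made-up rows, not the final pipeline: `origins_tags` and `manufacturing_places_tags` appear in the Open Food Facts schema, but the `countries_en` column, the tag values, the `SWISS_TAGS` set and the helper names (`tag_set`, `classify_origin`, `count_tags`) are assumptions. In particular, treating any product with Swiss ingredients or Swiss manufacturing (but not both exclusively) as "partially Swiss" is one possible reading of the categories.

```python
import pandas as pd

# Hypothetical sample rows; real tag values in Open Food Facts may differ.
products = pd.DataFrame({
    "product_name": ["Gruyère", "Chocolate bar", "Olive oil", "Mystery snack", "Baguette"],
    "countries_en": ["Switzerland", "Switzerland,France", "Switzerland", "Switzerland", "France"],
    "origins_tags": ["suisse", "ghana,suisse", "italie", None, "france"],
    "manufacturing_places_tags": ["suisse", "suisse", "italie", None, "france"],
})

# Assumed spellings under which Switzerland may appear in the tag columns.
SWISS_TAGS = {"suisse", "switzerland", "schweiz", "svizzera"}


def tag_set(value):
    """Split a comma-separated tag string into a set of lowercase tags."""
    if pd.isna(value):
        return set()
    return {tag.strip().lower() for tag in str(value).split(",") if tag.strip()}


def classify_origin(row):
    """Assign one of the TODO categories, or 'unknown' when a field is missing."""
    origins = tag_set(row["origins_tags"])
    places = tag_set(row["manufacturing_places_tags"])
    if not origins or not places:
        return "unknown"
    if origins <= SWISS_TAGS and places <= SWISS_TAGS:
        return "entirely Swiss"
    if (origins | places) & SWISS_TAGS:
        return "partially Swiss"
    return "not Swiss"


def count_tags(series):
    """Count how often each tag occurs in a comma-separated tag column."""
    return series.dropna().str.split(",").explode().str.strip().value_counts()


# Step 1: keep only the products sold in Switzerland.
sold_in_ch = products[products["countries_en"].str.contains("Switzerland", na=False)].copy()

# Step 2: classify each remaining product.
sold_in_ch["origin_class"] = sold_in_ch.apply(classify_origin, axis=1)

# Step 3: count the countries providing ingredients and manufacturing.
origin_counts = count_tags(sold_in_ch["origins_tags"])
manufacturing_counts = count_tags(sold_in_ch["manufacturing_places_tags"])

print(sold_in_ch[["product_name", "origin_class"]])
print(origin_counts)
print(manufacturing_counts)
```

Products with a missing origin or manufacturing place are kept in a separate "unknown" class rather than dropped, so the share of unusable rows stays visible in the descriptive analysis.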


In [9]:
# Imports
import re
from datetime import timedelta

import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats as stats
import matplotlib.pyplot as plt

# findspark adds the local Spark installation to sys.path so that pyspark can be imported
import findspark
findspark.init()
import pyspark

from pyspark.sql import *
from pyspark.sql import functions as F
# Note: this wildcard import shadows Python builtins such as min, max, sum and round
from pyspark.sql.functions import *

# Reuse the active Spark session if there is one, otherwise create it
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

In [2]:
DATA_FOLDER = 'data'

In [13]:
# low_memory=False parses the file in one pass, which avoids the mixed-dtype warning
dataset = pd.read_csv(DATA_FOLDER + "/en.openfoodfacts.org.products.csv", sep='\t', low_memory=False)


In [15]:
dataset.head()

| Unnamed: 0 | code | url | creator | created_t | created_datetime | last_modified_t | last_modified_datetime | product_name | generic_name | quantity | ... | carbon-footprint_100g | nutrition-score-fr_100g | nutrition-score-uk_100g | glycemic-index_100g | water-hardness_100g | choline_100g | phylloquinone_100g | beta-glucan_100g | inositol_100g | carnitine_100g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17 | http://world-en.openfoodfacts.org/product/0000... | kiliweb | 1529059080 | 2018-06-15T10:38:00Z | 1529059204 | 2018-06-15T10:40:04Z | Vitória crackers |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | 31 | http://world-en.openfoodfacts.org/product/0000... | isagoofy | 1539464774 | 2018-10-13T21:06:14Z | 1539464817 | 2018-10-13T21:06:57Z | Cacao |  | 130 g | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | 123 | http://world-en.openfoodfacts.org/product/0000... | kiliweb | 1535737982 | 2018-08-31T17:53:02Z | 1535737986 | 2018-08-31T17:53:06Z | Sauce Sweety chili 0% |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | 291 | http://world-en.openfoodfacts.org/product/0000... | kiliweb | 1534239669 | 2018-08-14T09:41:09Z | 1534239732 | 2018-08-14T09:42:12Z | Mendiants |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 4 | 949 | http://world-en.openfoodfacts.org/product/0000... | kiliweb | 1523440813 | 2018-04-11T10:00:13Z | 1523440823 | 2018-04-11T10:00:23Z | Salade de carottes râpées |  |  | ... |  |  |  |  |  |  |  |  |  |  |


In [31]:
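# header=True takes the column names from the first line; DROPMALFORMED skips rows that cannot be parsed
df = spark.read.csv(DATA_FOLDER + "/en.openfoodfacts.org.products.csv", header=True, mode="DROPMALFORMED", sep='\t')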

In [30]:
df.printSchema()

root
 |-- code: string (nullable = true)
 |-- url: string (nullable = true)
 |-- creator: string (nullable = true)
 |-- created_t: string (nullable = true)
 |-- created_datetime: string (nullable = true)
 |-- last_modified_t: string (nullable = true)
 |-- last_modified_datetime: string (nullable = true)
 |-- product_name: string (nullable = true)
 |-- generic_name: string (nullable = true)
 |-- quantity: string (nullable = true)
 |-- packaging: string (nullable = true)
 |-- packaging_tags: string (nullable = true)
 |-- brands: string (nullable = true)
 |-- brands_tags: string (nullable = true)
 |-- categories: string (nullable = true)
 |-- categories_tags: string (nullable = true)
 |-- categories_en: string (nullable = true)
 |-- origins: string (nullable = true)
 |-- origins_tags: string (nullable = true)
 |-- manufacturing_places: string (nullable = true)
 |-- manufacturing_places_tags: string (nullable = true)
 |-- labels: string (nullable = true)
 |-- labels_tags: string (nullable = true)

In [41]:
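# Additional dataset: “Evolution de la consommation de denrées alimentaires en Suisse” (evolution of food consumption in Switzerland)
# header=4: the column names are on the fifth row of the sheet
df_ev_conso = pd.read_excel(DATA_FOLDER + "/je-f-07.06.02.xlsx", header=4, sheet_name='Dès 2007')
# Drop the rows that are entirely empty
df_ev_conso = df_ev_conso.dropna(how='all')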

In [42]:
df_ev_conso

| Unnamed: 0 | Etat des produits | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 p |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Céréales | Grain | 98.013754 | 93.737296 | 96.762305 | 93.884316 | 99.974945 | 91.893762 | 90.099344 | 89.972571 | 92.58782 | 87.750283 |
| Pommes de terre | Fraîches, non parées | 41.7786 | 46.869346 | 46.268682 | 47.817157 | 44.481201 | 48.713089 | 51.708254 | 41.844096 | 50.635046 | 47.44042 |
| Sucre | Sucre raffiné | 43.167207 | 43.590842 | 40.97456 | 37.465394 | 37.445127 | 36.370639 | 40.214113 | 37.856708 | 38.065779 | 36.721525 |
| Miel | Miel | 1.348237 | 1.210427 | 1.288076 | 1.321601 | 1.432108 | 1.158042 | 1.392348 | 1.136757 | 1.435543 | 1.132838 |
| Légumes 1 | Frais, non parés | 103.237925 | 105.738089 | 108.198836 | 107.22766 | 108.150723 | 106.543807 | 105.200511 | 104.577329 | 104.155867 | 102.678961 |
| Fruits 1 | Frais, non parés | 118.882028 | 123.689979 | 124.171271 | 119.810187 | 117.02879 | 121.106067 | 119.261679 | 114.949381 | 115.087259 | 115.868359 |
| jus de légumes et de fruits | Jus | 29.25341 | 32.029329 | 29.792858 | 26.973705 | 25.599959 | 28.426779 | 27.256378 | 23.147393 | 22.56969 | 22.312123 |
| Huiles et graisses végétales | Huile | 16.16735 | 16.141488 | 17.108515 | 17.395802 | 17.695043 | 16.176501 | 17.683895 | 17.861493 | 16.843297 | 17.421464 |
| Viande | Viande désossée | 51.799507 | 52.616272 | 51.150578 | 52.402102 | 52.228737 | 50.350228 | 50.541678 | 50.680753 | 49.827563 | 49.227692 |
| de boeuf | Viande désossée | 11.104512 | 11.678009 | 10.972415 | 11.11641 | 11.078811 | 10.827222 | 11.337183 | 11.140308 | 10.954513 | 10.917562 |


In [43]:
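# Additional dataset: “Dépenses fédérales pour l’agriculture et l’alimentation” (federal expenditure on agriculture and food)
# header=3: the column names are on the fourth row of the sheet
dep_fed_al = pd.read_excel(DATA_FOLDER + "/je-f-07.02.03.02.04.xlsx", header=3, sheet_name='T 07.02.03.02.04')
# Drop the rows that are entirely empty
dep_fed_al = dep_fed_al.dropna(how='all')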

In [44]:
dep_fed_al

| Unnamed: 0 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | ... | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| En millions de francs | 2513.28 | 2902.51 | 2966.54 | 3223.44 | 3304.16 | 3355.62 | 3764.54 | 3728.49 | 3742.99 | 4028.13 | ... | 3550.87 | 3692.28 | 3665.7 | 3663.02 | 3711.11 | 3705.97 | 3692.51 | 3667.27 | 3658.02 | 3651.97 |
| Indice 1990 = 100 | 100 | 115.487 | 118.034 | 128.256 | 131.468 | 133.515 | 149.786 | 148.351 | 148.928 | 160.273 | ... | 141.284 | 146.911 | 145.853 | 145.746 | 147.66 | 147.455 | 146.92 | 145.915 | 145.547 | 145.307 |
| Administration, exécution et contrôle | 25.5316 | 27.9445 | 34.5693 | 34.9648 | 34.5059 | 33.7511 | 33.9551 | 35.3384 | 37.5954 | 40.9842 | ... | 103.99 | 110.646 | 115.068 | 118.362 | 122.594 | 121.58 | 122.639 | 121.9 | 119.066 | 115.32 |
| Administration | 25.5316 | 27.9445 | 34.5693 | 34.9648 | 34.5059 | 33.7511 | 33.9551 | 35.3384 | 37.5954 | 40.9842 | ... | 47.7666 | 51.8475 | 55.2193 | 55.1342 | 54.5769 | 54.2367 | 55.8413 | 54.6635 | 53.7946 | 51.8627 |
| Vulgarisation | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 11.3264 | 11.15 | 12.1774 | 12.0389 | 12 | 11.9972 | 11.9907 | 11.8702 | 11.5978 | 11.6199 |
| Exécution et contrôle | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 44.8971 | 47.6487 | 47.6711 | 51.1889 | 56.017 | 55.3459 | 54.8072 | 55.3661 | 53.6737 | 51.8376 |
| Amélioration des bases de production | 228.481 | 258.363 | 214.028 | 225.648 | 213.423 | 164.163 | 156.527 | 152.18 | 163.718 | 157.664 | ... | 200.212 | 177.771 | 178.787 | 143.227 | 189.71 | 187.406 | 181.907 | 157.544 | 144.163 | 135.075 |
| Améliorations structurelles | 156.629 | 181.957 | 132.111 | 146.864 | 137.128 | 92.8777 | 91.0377 | 88.0971 | 95.9802 | 95.8635 | ... | 139.501 | 129.792 | 132 | 95.9998 | 141 | 138.808 | 134.225 | 109.943 | 96.6955 | 86.3919 |
| Améliorations de l'élevage | 39.6888 | 41.864 | 46.002 | 45.1904 | 44.1103 | 42.3632 | 40.5382 | 38.8595 | 39.6513 | 34.2381 | ... | 49.6232 | 45.8841 | 45.1557 | 45.7281 | 46.8031 | 46.4854 | 45.627 | 46.2907 | 46.9538 | 46.5638 |
| Protection des plantes | 32.1631 | 34.5424 | 35.9143 | 33.5941 | 32.1853 | 28.922 | 24.9516 | 25.2233 | 28.086 | 27.5623 | ... | 11.088 | 2.09423 | 1.63093 | 1.49867 | 1.90723 | 2.11288 | 2.05525 | 1.31005 | 0.514118 | 2.11973 |
