# SPARK GROUP C ASSIGNMENT

**Purpose:** Data Analysis to find out the <font color=blue >**EATING HABITS IN EUROPEAN COUNTRIES**</font> <br>
**Focus Country:** <font color=blue >Germany</font>

**Author:** S2-3 (Group C) <br>
**Contact:** <br> pierremulliez@student.ie.edu <br> mate.vilic@student.ie.edu <br> elsaarnaiz@student.ie.edu <br> olga.frech@student.ie.edu <br> alienor@student.ie.edu <br> ggerman.souza@student.ie.edu <br> andreimenshchikov@student.ie.edu <br> jorge.campos@student.ie.edu

**Client:** Prof. Raúl Marín & <font color=red >ACME CORPORATION</font>  

**Code created:** 2021-02-21 <br>
**Last updated:** 2021-03-07

**Comment:** It is all about Food Products 


## Agenda
1. PySpark **environment setup**
2. Data source and **Spark data abstraction** - DataFrame **set up**
3. Exploratory **data analysis**
4. **Subset** data analysis
    1. Find the **oldest product**
    2. Find the **newest product**
    3. **Average product age**, where age means how long the product has been in the system
    4. **List of other countries** where products are sold too
    5. Identify **category of products** and the compute:
  1. **Number** of **products** by category
  2. **List** containing names of **products by category** 
    6. Identify traces and compute:
  1. **Number** of products **by trace**
  2. **List** containing names of **products by trace**
    7. Data quality analysis on **fields of interest** (see appendix 1):
  1. Number of products with **complete info**
  2. % of products **without complete analysis per 100g**
  3. % of products **without additives info**
  4. % of products **without traces info**
    8. **Data profiling** on fields of interest (see appendix 1):
  1. Stats on **analysis per 100g fields**
5. Analysis to find out **which products are safe**
  1. **Authorized additives** in the EU
  2. Products with **not authorized additives** by the EU 
  3. Introducing a **food traffic light**
  4. **Ratio** for each food traffic light
  5. **Extra analysis** of the most famous German product: **Wurst**

Let's start:

## 1. PySpark environment setup

In [1]:
#Importing APIs

import pandas as pd
import findspark
findspark.init()

from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import StructType, StructField, IntegerType, DoubleType, StringType
from pyspark.sql.window import Window
from IPython.display import display, Markdown
from pyspark.sql import SQLContext
from pyspark.sql import functions as F

sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

sqlContext = SQLContext(spark)

## 2. Data source and Spark data abstraction - DataFrame setup

In [2]:
#Load dataset

#noTT.csv is the first dataset without titles, we cocatenated that one with title.csv to do the Analysis 
#complete.csv is the updated dataset with titles, we ran our codes again with this dataset and received slightly different results 

resto = spark.read \
                 .option("inferSchema", "true") \
                 .option("delimiter", "\t") \
                 .option("mode","PERMISSIVE") \
                 .option("header", "false") \
                 .csv("noTT.csv")
concate = spark.read \
                 .option("inferSchema", "true") \
                 .option("delimiter", "\t") \
                 .option("mode","PERMISSIVE") \
                 .option("header", "true") \
                 .csv("complete.csv")
tt = spark.read \
                 .option("inferSchema", "true") \
                 .option("delimiter", ";") \
                 .option("mode","PERMISSIVE") \
                 .option("header", "true") \
                 .csv("title.csv")

In [5]:
resto.show(3)
rest = resto.drop("C_185")
#.drop("C_173").drop("C_174").drop("C_175")
rest.show(1)
tt.show(1)
concate_with_title = tt.union(rest)
 

+----------+--------------------+-------------+----------+-------------------+----------+-------------------+--------------------+----+----------------+-----+---------+---------+----+------------------+------------------+--------------------+--------------------+--------------------+----+----+----+-------+-------+--------------------+--------------------+--------------------+----+----+----+----+----+----+----+-----------+----------+-------+--------------------+--------------------+----+--------------------+--------------------+--------------------+------+-----+----+----+----+--------------------+--------------------+----+----+----+----+----+----+----+----+----+-------------+-------+--------------------+--------------------+--------------------+----+--------------+-----------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+------+-----+------+----+----+---------------+----+----+----+----+----+----+----+----+--

## 3. Exploratory **data analysis**

In [6]:
concate.show(2)

+----+--------------------+-------------+----------+-------------------+---------------+----------------------+--------------------+------------------------+------------+--------+---------+--------------+--------------+------------------+------------------+-----------+---------------+-------------+-------+------------+----------+--------------------+-------------------------+-------------+----------------+-------------+---------+--------------+------------------------+------+-----------+---------------+------+---------+--------------+------------+----------------+---------+------------+------+-----------+---------+------------+----------------+-------------+-----------+---------+--------------+------------+---------------------------+-------------------------+------------------------------+---------------------------------------+-------------------------------------+------------------------------------------+----------------+----------------+----------+-------------+-------------+-----

In [7]:
pdcont = concate.toPandas()

In [8]:
pd.set_option('display.max_columns', None)
pdcont.head(5)

Unnamed: 0,code,url,creator,created_t,created_datetime,last_modified_t,last_modified_datetime,product_name,abbreviated_product_name,generic_name,quantity,packaging,packaging_tags,packaging_text,brands,brands_tags,categories,categories_tags,categories_en,origins,origins_tags,origins_en,manufacturing_places,manufacturing_places_tags,labels,labels_tags,labels_en,emb_codes,emb_codes_tags,first_packaging_code_geo,cities,cities_tags,purchase_places,stores,countries,countries_tags,countries_en,ingredients_text,allergens,allergens_en,traces,traces_tags,traces_en,serving_size,serving_quantity,no_nutriments,additives_n,additives,additives_tags,additives_en,ingredients_from_palm_oil_n,ingredients_from_palm_oil,ingredients_from_palm_oil_tags,ingredients_that_may_be_from_palm_oil_n,ingredients_that_may_be_from_palm_oil,ingredients_that_may_be_from_palm_oil_tags,nutriscore_score,nutriscore_grade,nova_group,pnns_groups_1,pnns_groups_2,states,states_tags,states_en,brand_owner,main_category,main_category_en,image_url,image_small_url,image_ingredients_url,image_ingredients_small_url,image_nutrition_url,image_nutrition_small_url,energy-kj_100g,energy-kcal_100g,energy_100g,energy-from-fat_100g,fat_100g,saturated-fat_100g,-butyric-acid_100g,-caproic-acid_100g,-caprylic-acid_100g,-capric-acid_100g,-lauric-acid_100g,-myristic-acid_100g,-palmitic-acid_100g,-stearic-acid_100g,-arachidic-acid_100g,-behenic-acid_100g,-lignoceric-acid_100g,-cerotic-acid_100g,-montanic-acid_100g,-melissic-acid_100g,monounsaturated-fat_100g,polyunsaturated-fat_100g,omega-3-fat_100g,-alpha-linolenic-acid_100g,-eicosapentaenoic-acid_100g,-docosahexaenoic-acid_100g,omega-6-fat_100g,-linoleic-acid_100g,-arachidonic-acid_100g,-gamma-linolenic-acid_100g,-dihomo-gamma-linolenic-acid_100g,omega-9-fat_100g,-oleic-acid_100g,-elaidic-acid_100g,-gondoic-acid_100g,-mead-acid_100g,-erucic-acid_100g,-nervonic-acid_100g,trans-fat_100g,cholesterol_100g,carbohydrates_100g,sugars_100g,-sucrose_100g,-glucose_100g,-fructose_100g,-lactose_100g,-maltose_100g,-maltodextrins_100g,starch_100g,polyols_100g,fiber_100g,-soluble-fiber_100g,-insoluble-fiber_100g,proteins_100g,casein_100g,serum-proteins_100g,nucleotides_100g,salt_100g,sodium_100g,alcohol_100g,vitamin-a_100g,beta-carotene_100g,vitamin-d_100g,vitamin-e_100g,vitamin-k_100g,vitamin-c_100g,vitamin-b1_100g,vitamin-b2_100g,vitamin-pp_100g,vitamin-b6_100g,vitamin-b9_100g,folates_100g,vitamin-b12_100g,biotin_100g,pantothenic-acid_100g,silica_100g,bicarbonate_100g,potassium_100g,chloride_100g,calcium_100g,phosphorus_100g,iron_100g,magnesium_100g,zinc_100g,copper_100g,manganese_100g,fluoride_100g,selenium_100g,chromium_100g,molybdenum_100g,iodine_100g,caffeine_100g,taurine_100g,ph_100g,fruits-vegetables-nuts_100g,fruits-vegetables-nuts-dried_100g,fruits-vegetables-nuts-estimate_100g,collagen-meat-protein-ratio_100g,cocoa_100g,chlorophyl_100g,carbon-footprint_100g,carbon-footprint-from-meat-or-fish_100g,nutrition-score-fr_100g,nutrition-score-uk_100g,glycemic-index_100g,water-hardness_100g,choline_100g,phylloquinone_100g,beta-glucan_100g,inositol_100g,carnitine_100g
0,17.0,http://world-en.openfoodfacts.org/product/0000...,kiliweb,1591989744,2020-06-12 21:22:24,1609478763,2021-01-01 06:26:03,Vitória crackers,,,,barquette,barquette,,,,,,,,,,,,,,,,,,,,,,Allemagne,en:germany,Germany,,,,,,,,,,,,,,,,,,,,,,,unknown,unknown,"en:to-be-completed, en:nutrition-facts-complet...","en:to-be-completed,en:nutrition-facts-complete...","To be completed,Nutrition facts completed,Ingr...",,,,,,,,https://static.openfoodfacts.org/images/produc...,https://static.openfoodfacts.org/images/produc...,,375.0,1569.0,,7.0,3.08,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,70.099998,15.0,,,,,,,,,,,,7.8,,,,1.4,0.56,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,5.0,http://world-en.openfoodfacts.org/product/0000...,waistline-app,1559391422,2019-06-01 14:17:02,1609173092,2020-12-28 17:31:32,Katsuobushi (Dried and smoked bonito flakes),,,,,,,Wadakyu Europe S.L,wadakyu-europe-s-l,Katsoubushi,en:katsoubushi,Katsoubushi,,,,,,Made in Spain,en:made-in-spain,Made in Spain,,,,,,,,Germany,en:germany,Germany,,,,,,,100g,100.0,,,,,,,,,,,,1.0,b,,unknown,unknown,"en:to-be-completed, en:nutrition-facts-complet...","en:to-be-completed,en:nutrition-facts-complete...","To be completed,Nutrition facts completed,Ingr...",,en:katsoubushi,Katsoubushi,https://static.openfoodfacts.org/images/produc...,https://static.openfoodfacts.org/images/produc...,https://static.openfoodfacts.org/images/produc...,https://static.openfoodfacts.org/images/produc...,https://static.openfoodfacts.org/images/produc...,https://static.openfoodfacts.org/images/produc...,,369.0,1540.0,,5.2,1.7,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.1,0.1,,,,,,,,,,,,80.6,,,,0.3,0.12,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,
2,111111.0,http://world-en.openfoodfacts.org/product/0000...,prepperapp,1613302516,2021-02-14 12:35:16,1613302516,2021-02-14 12:35:16,blabblub,,,100g,,,,lecker schmecker,lecker-schmecker,,,,,,,,,,,,,,,,,,,en:germany,en:germany,Germany,,,,,,,,,,,,,,,,,,,,,,,unknown,unknown,"en:to-be-completed, en:nutrition-facts-complet...","en:to-be-completed,en:nutrition-facts-complete...","To be completed,Nutrition facts completed,Ingr...",,,,,,,,,,,111.0,464.0,,222.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,222.0,12.0,,,,,,,,,,,,12.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,20424640.0,http://world-en.openfoodfacts.org/product/0000...,twoflower,1489527029,2017-03-14 22:30:29,1547121879,2019-01-10 13:04:39,Belgische Pralinen,,Pralinenmischung,250 g,21 PAP,21-pap,,J. D. Gross,j-d-gross,"Imbiss, Süßwaren, Konfekt, Schokoladenkonfekt,...","en:snacks,en:sweet-snacks,en:confectioneries,e...","Snacks,Sweet snacks,Confectioneries,Chocolate ...",,,,Belgien,belgien,"Nachhaltige Agrikultur, UTZ Certified, UTZ Cer...","en:sustainable-farming,en:utz-certified,en:utz...","Sustainable farming,UTZ Certified,UTZ Certifie...",,,,,,,Lidl,Deutschland,en:germany,Germany,"Zucker, Kakaomasse, Kakaobutter, _Vollmilchpul...","en:milk,en:nuts,en:soybeans,de:Pisatazien",,"en:eggs,en:gluten,en:nuts,de:Alkohol","en:eggs,en:gluten,en:nuts,de:alkohol","Eggs,Gluten,Nuts,de:alkohol","12,5 g",12.5,,3.0,,"en:e322,en:e420,en:e422","E322 - Lecithins,E420 - Sorbitol,E422 - Glycerol",0.0,,,0.0,,,26.0,e,4.0,Sugary snacks,Sweets,"en:to-be-checked, en:complete, en:nutrition-fa...","en:to-be-checked,en:complete,en:nutrition-fact...","To be checked,Complete,Nutrition facts complet...",,en:bonbons,Bonbons,https://static.openfoodfacts.org/images/produc...,https://static.openfoodfacts.org/images/produc...,https://static.openfoodfacts.org/images/produc...,https://static.openfoodfacts.org/images/produc...,https://static.openfoodfacts.org/images/produc...,https://static.openfoodfacts.org/images/produc...,2257.0,,2257.0,,33.3,21.1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,53.8,51.5,,,,,,,,,,,,4.7,,,,0.09,0.036,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,26.0,,,,,,,,
4,29035175.0,http://world-en.openfoodfacts.org/product/0000...,waistline-app,1549218790,2019-02-03 19:33:10,1549218790,2019-02-03 19:33:10,Schoko Duo,,,,,,,Biscotto,biscotto,,,,,,,,,,,,,,,,,,,en:DE,en:germany,Germany,,,,,,,14g,14.0,,,,,,,,,,,,,,,unknown,unknown,"en:to-be-completed, en:nutrition-facts-complet...","en:to-be-completed,en:nutrition-facts-complete...","To be completed,Nutrition facts completed,Ingr...",,,,,,,,,,,500.0,2090.0,,24.3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,62.9,,,,,,,,,,,,,7.14,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [9]:
pdcont.describe()

Unnamed: 0,code,created_t,last_modified_t,serving_quantity,additives_n,ingredients_from_palm_oil_n,ingredients_that_may_be_from_palm_oil_n,nutriscore_score,nova_group,energy-kj_100g,energy-kcal_100g,energy_100g,energy-from-fat_100g,fat_100g,saturated-fat_100g,-caprylic-acid_100g,-capric-acid_100g,-lauric-acid_100g,-myristic-acid_100g,-palmitic-acid_100g,-stearic-acid_100g,-arachidic-acid_100g,-behenic-acid_100g,-cerotic-acid_100g,monounsaturated-fat_100g,polyunsaturated-fat_100g,omega-3-fat_100g,-alpha-linolenic-acid_100g,-eicosapentaenoic-acid_100g,-docosahexaenoic-acid_100g,omega-6-fat_100g,-linoleic-acid_100g,-arachidonic-acid_100g,-gamma-linolenic-acid_100g,-dihomo-gamma-linolenic-acid_100g,omega-9-fat_100g,-oleic-acid_100g,-gondoic-acid_100g,trans-fat_100g,cholesterol_100g,carbohydrates_100g,sugars_100g,-sucrose_100g,-glucose_100g,-fructose_100g,-lactose_100g,-maltose_100g,-maltodextrins_100g,starch_100g,polyols_100g,fiber_100g,-soluble-fiber_100g,-insoluble-fiber_100g,proteins_100g,salt_100g,sodium_100g,alcohol_100g,vitamin-a_100g,beta-carotene_100g,vitamin-d_100g,vitamin-e_100g,vitamin-k_100g,vitamin-c_100g,vitamin-b1_100g,vitamin-b2_100g,vitamin-pp_100g,vitamin-b6_100g,vitamin-b9_100g,folates_100g,vitamin-b12_100g,biotin_100g,pantothenic-acid_100g,silica_100g,bicarbonate_100g,potassium_100g,chloride_100g,calcium_100g,phosphorus_100g,iron_100g,magnesium_100g,zinc_100g,copper_100g,manganese_100g,fluoride_100g,selenium_100g,chromium_100g,molybdenum_100g,iodine_100g,caffeine_100g,taurine_100g,ph_100g,fruits-vegetables-nuts_100g,fruits-vegetables-nuts-dried_100g,fruits-vegetables-nuts-estimate_100g,collagen-meat-protein-ratio_100g,cocoa_100g,carbon-footprint_100g,carbon-footprint-from-meat-or-fish_100g,nutrition-score-fr_100g,choline_100g,phylloquinone_100g,beta-glucan_100g,inositol_100g,carnitine_100g
count,79593.0,79593.0,79593.0,19453.0,36131.0,36131.0,36131.0,35579.0,31690.0,19326.0,47594.0,55418.0,5.0,54943.0,51422.0,1.0,1.0,3.0,1.0,2.0,1.0,7.0,2.0,1.0,477.0,475.0,146.0,40.0,3.0,6.0,29.0,5.0,3.0,1.0,1.0,1.0,1.0,2.0,251.0,241.0,55129.0,54726.0,13.0,9.0,11.0,99.0,3.0,1.0,30.0,113.0,14055.0,11.0,8.0,54786.0,49904.0,49904.0,2536.0,377.0,3.0,271.0,449.0,30.0,706.0,400.0,296.0,345.0,440.0,266.0,14.0,444.0,176.0,285.0,5.0,41.0,362.0,110.0,1217.0,179.0,535.0,599.0,190.0,56.0,54.0,70.0,47.0,21.0,18.0,113.0,46.0,8.0,7.0,346.0,13.0,540.0,1.0,489.0,22.0,156.0,35579.0,5.0,3.0,2.0,5.0,1.0
mean,2.86099e+35,1552983000.0,1592304000.0,134.226385,1.212615,0.011126,0.026404,8.883752,3.210603,1129.487759,266.262468,1095.838304,269.22,13.580277,5.307997,7.4,6.2,47.433333,18.9,4.051,3.0,0.005184,0.000665,0.0,23.776689,11.889275,7.297462,5.571745,0.082667,0.6894667,15.658966,0.356994,0.000544,2.8,7.3,1.1,5.9,0.6787,0.045135,0.14673,25.421314,12.033902,10.276925,7.321111,12.16455,1.101111,7.3664,0.008,10.176,59.403363,4.170846,2.9,2.25,8.160791,1.362798,0.545087,3.813683,8.138353,0.003327,0.1194782,0.321905,0.633593,1.371714,1.429291,0.013529,0.171558,1.071977,0.8656025,0.000167,0.01529034,0.241619,0.092201,0.118171,0.076468,1.97362,4.117882,1.456739,3.671081,0.213604,0.560881,0.09873,0.008614,0.044992,0.249075,0.297907,1.904876,1.222282,1.880639,1.012103,0.5575,6.214286,40.29659,46.888462,49.003611,15.0,45.944172,130.81929,574.119103,8.883752,0.0524,35.2,4.0,0.01854,0.03
std,5.266148e+37,51139930.0,27632470.0,856.584175,1.881728,0.105682,0.190619,8.980991,1.090647,840.502532,667.686511,2604.932654,212.270917,18.219801,8.462249,,,2.205297,,5.726151,,0.007581,0.000191,,23.790815,15.110729,10.973833,9.958093,0.025813,1.277207,13.41793,0.514587,0.00083,,,,,0.216799,0.34388,2.144605,27.479587,18.349152,10.238663,5.257924,15.65314,2.93978,12.673154,,17.82637,35.768674,6.440675,2.805352,2.866058,10.002972,11.982079,4.789758,9.442618,70.039311,0.002348,0.9115365,2.29693,3.468861,17.199828,20.836311,0.093183,1.421749,20.467098,10.6355,0.000187,0.161029,2.322659,0.650806,0.227955,0.107582,31.544547,38.389042,27.196349,44.44781,2.759272,5.444018,0.566431,0.035138,0.257959,2.031172,2.042105,8.728689,4.709868,10.201105,5.215204,0.684893,1.909126,38.106603,36.472959,27.405687,,21.532093,141.201078,681.034262,8.980991,0.0545,30.507703,0.0,0.02667,
min,1.0,1329036000.0,1421521000.0,0.0,0.0,0.0,0.0,-14.0,1.0,0.0,0.0,0.0,59.0,0.0,0.0,7.4,6.2,45.0,18.9,0.002,3.0,0.0,0.00053,0.0,0.0,0.0,0.0,2e-05,0.053,3.5e-09,0.52,0.00027,1.2e-05,2.8,7.3,1.1,5.9,0.5254,0.0,0.0,0.0,0.0,2.4e-05,2e-08,1.1e-07,0.0,0.0032,0.008,0.0,0.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0016,0.0,1e-06,4e-06,0.0,0.0,0.0,0.0,0.0,2e-08,1.7e-05,0.0,1e-06,0.0,2.6e-05,3.2e-05,0.0,5e-06,0.0,2e-10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5e-07,3e-06,0.0,0.004,0.04,2.4,0.0,2.0,0.0,15.0,1.0,0.0,1.96,-14.0,0.011,1e-06,4.0,0.0039,0.03
25%,4002240000000.0,1528108000.0,1585485000.0,30.0,0.0,0.0,0.0,1.0,3.0,399.0,88.0,352.0,74.1,0.8,0.2,7.4,6.2,46.5,18.9,2.0265,3.0,0.00048,0.000597,0.0,2.8,2.2,1.035,0.0985,0.074,0.00365,6.0,0.0047,6.6e-05,2.8,7.3,1.1,5.9,0.60205,0.0,0.0,3.8,1.0,4.7,4.79,3.895,0.0,0.0496,0.008,2.1,21.0,0.5,1.0,0.75,1.4,0.06,0.024,0.0,0.0,0.00199,7.5e-07,0.0025,5e-06,0.008,0.00023,0.00021,0.0032,0.00028,3.3475e-05,4.8e-05,3.8e-07,8e-06,0.0012,0.00183,0.0226,0.000318,0.0014,0.016,0.203,0.000705,0.00347,0.0021,0.0002,2.3e-05,1.6e-05,1.1e-05,8e-06,1e-05,3.1e-05,0.0285,0.35,5.95,0.0,13.4,25.0,15.0,30.0,0.431098,158.875,1.0,0.012,25.800001,4.0,0.0049,0.03
50%,4056489000000.0,1563968000.0,1600088000.0,100.0,0.0,0.0,0.0,8.0,4.0,1041.0,240.0,987.0,251.0,5.5,1.7,7.4,6.2,48.0,18.9,4.051,3.0,0.0012,0.000665,0.0,16.0,6.0,3.05,0.16,0.095,0.02945,9.5,0.08,0.00012,2.8,7.3,1.1,5.9,0.6787,0.0,0.0,12.2,3.9,8.8,6.0,6.0,0.1,0.096,0.008,2.75,64.0,2.5,2.0,2.0,5.6,0.45,0.18,0.0,0.00012,0.00238,2.5e-06,0.009,2.1e-05,0.0213,0.000445,0.000455,0.0064,0.000595,7.18e-05,0.000119,6.3e-07,1.7e-05,0.002,0.032,0.034,0.002905,0.00395,0.12,0.289,0.0027,0.041,0.00252,0.000515,0.000535,6.6e-05,2e-05,8.2e-06,4.3e-05,0.0001,0.035,0.4,7.0,30.0,49.0,50.0,15.0,40.0,71.0,487.35,8.0,0.015,51.6,4.0,0.0069,0.03
75%,4316269000000.0,1590832000.0,1612203000.0,125.0,2.0,0.0,0.0,15.0,4.0,1645.0,379.0,1582.0,418.0,22.0,7.3,7.4,6.2,48.65,18.9,6.0755,3.0,0.007065,0.000733,0.0,36.0,16.0,7.975,6.425,0.0975,0.6695,19.0,0.5,0.00081,2.8,7.3,1.1,5.9,0.75535,0.0,0.005,49.0,13.0,10.0,10.0,10.5,0.1,11.048,0.008,9.3225,96.0,5.685,3.95,2.0,11.9,1.4,0.56,4.8,0.000551,0.00419,7.5e-06,0.02,5.3e-05,0.044975,0.001,0.0013,0.014,0.001565,0.00019855,0.000214,2.1e-06,5e-05,0.006,0.032,0.0652,0.25,0.020625,0.208,0.425,0.0048,0.13,0.004812,0.00155,0.00275,0.000875,5.3e-05,8e-05,0.000127,0.002,0.07375,0.4,7.25,78.875,76.0,70.0,15.0,60.0,259.25,636.4,15.0,0.108,52.8,4.0,0.011,0.03
max,1.043065e+40,1614040000.0,1614042000.0,100000.0,19.0,2.0,3.0,40.0,4.0,31246.0,137000.0,573000.0,544.0,405.5,100.0,7.4,6.2,49.3,18.9,8.1,3.0,0.02,0.0008,0.0,78.0,72.0,50.6,46.0,0.1,3.2,54.5,1.2,0.0015,2.8,7.3,1.1,5.9,0.832,5.0,33.3,1400.0,545.0,34.0,15.0,44.0,12.0,22.0,0.008,77.0,100.0,100.0,10.0,9.0,804.0,1710.0,682.0,100.0,800.0,0.006,8.4,28.0,19.0,430.0,382.0,1.2,16.0,429.0,166.0,0.000727,2.5,28.0,7.4,0.525,0.415,600.0,400.0,800.0,595.0,60.9,94.0,5.556,0.25,1.84,17.0,14.0,40.0,20.0,91.0,35.0,2.22,7.7,100.0,95.0,100.0,15.0,100.0,440.0,3580.0,40.0,0.116,54.0,4.0,0.066,0.03


In [10]:
#pdfood = pd.read_csv("nooT.csv", sep = "\t")

In [11]:
#checking the right position of the columns - concate.filter
#for ingredients we have the right position

concate.filter(col("ingredients_text") != "").select("ingredients_text").show(25)
concate_with_title.filter(col("ingredients_text") != "").select("ingredients_text").show(25)

+--------------------+
|    ingredients_text|
+--------------------+
|Zucker, Kakaomass...|
|Zucker,Palmöl, Ha...|
|Proteinmischung (...|
|Proteinmischung (...|
|100% Soja-Protein...|
|Molkenproteinkonz...|
|Molkenproteinkonz...|
|Molkenproteinkonz...|
|sugar, corn syrup...|
|sugar, corn syrup...|
|Natürliches Miner...|
|Proteinmischung (...|
|100% Soja-Protein...|
|Enthält: 10 mg Me...|
|Tapioca Syrup, Ve...|
|         quinoa 100%|
|Sojaproteinisolat...|
|Heringsfilets (88...|
|Cucumbers, water,...|
|Composition pour ...|
|corn syrup, sugar...|
|water, soybean oi...|
|Branntweinessig, ...|
|Distilled Vinegar...|
|Branntweinessig, ...|
+--------------------+
only showing top 25 rows

+--------------------+
|    ingredients_text|
+--------------------+
|Zucker, Kakaomass...|
|Zucker,Palmöl, Ha...|
|Proteinmischung (...|
|Proteinmischung (...|
|100% Soja-Protein...|
|Molkenproteinkonz...|
|Molkenproteinkonz...|
|Molkenproteinkonz...|
|sugar, corn syrup...|
|sugar, corn syrup...|
|Natürli

- Creator
- Created_datetime
- Last_modified_datetime
- Product_name
- Countries_en
- Traces_en
- Additives_tags
- Main_category_en
- Image_url
- Quantity
- Packaging_tags
- Categories_en
- Ingredients_text
- Additives_en
- Energy-kcal_100g
- Fat_100g
- Saturated-fat_100g
- Sugars_100g
- Salt_100g/sodium_100g

In [4]:
prepared_df = concate.select("creator",
"created_t",                             
"last_modified_t",
"product_name",
"countries_en",
"traces_en",
"additives_en",
"main_category",
"image_url",
"quantity",
"packaging_tags",
"categories",
"ingredients_text",
"additives_tags",
"energy-kj_100g",
"fat_100g",
"saturated-fat_100g",
"sugars_100g",
"salt_100g",
"sodium_100g" )

#note: could not find additive, replaced with additives_en

In [5]:
prepared_df.show(3)

+-------------+----------+---------------+--------------------+------------+---------+------------+--------------+--------------------+--------+--------------+-----------+----------------+--------------+--------------+--------+------------------+-----------+---------------+----------------+
|      creator| created_t|last_modified_t|        product_name|countries_en|traces_en|additives_en| main_category|           image_url|quantity|packaging_tags| categories|ingredients_text|additives_tags|energy-kj_100g|fat_100g|saturated-fat_100g|sugars_100g|      salt_100g|     sodium_100g|
+-------------+----------+---------------+--------------------+------------+---------+------------+--------------+--------------------+--------+--------------+-----------+----------------+--------------+--------------+--------+------------------+-----------+---------------+----------------+
|      kiliweb|1591989744|     1609478763|    Vitória crackers|     Germany|     null|        null|          null|          

In [6]:
prepared_df.filter(col("salt_100g").isNotNull()).count()

49904

# 4. Subset data analysis 

## Find the Newest product:

In [7]:
prepared_df.select("created_t", "product_name").filter(col("product_name").isNotNull()).sort(col("created_t"),ascending = False).show(1,0)

+----------+-------------------+
|created_t |product_name       |
+----------+-------------------+
|1614038706|Vegetarische Wiener|
+----------+-------------------+
only showing top 1 row



In [10]:
#oldest date 
new = 1614038706 / 60 / 60 / 24 / 365

In [13]:
new = 1970 + new

In [14]:
new

2021.1808316210045

## Find the Oldest product:

In [15]:
prepared_df.select("created_t", "Product_name").sort(col("created_t"),ascending = True).show(1,0)

+----------+----------------+
|created_t |Product_name    |
+----------+----------------+
|1329035567|Milka Ganze Nuss|
+----------+----------------+
only showing top 1 row



In [16]:
#oldest date 
old = 1329035567 / 60 / 60 / 24 / 365
old = 1970 + old
old

2012.1434413685947

## Average product age, where age means how long the product has been in the system:

In [17]:
#convertion from unix format seconds to days to years = 3600*24*365
prepared_df.filter(col("created_t") > 0 ).agg((avg("created_t")/31536000).alias("average time (years)")).show()


+--------------------+
|average time (years)|
+--------------------+
|   49.24477740652864|
+--------------------+



In [21]:
average_years = 2021 - (1970 + 49)
print("Average years in since creation is " + str(average_years))

Average years in since creation is 2


## List of other countries where products are sold too:

In [19]:
prepared_df.select(explode(split(col("countries_en"), ",")).alias("indivcountry"))  \
            .filter((col("indivcountry") != "Allemagne") & (col("indivcountry") != "Germany") & (col("indivcountry") != "Deutschland") & ( col("indivcountry") != "en:DE") & ( col("indivcountry") != "en:de")) \
            .select("indivcountry").distinct() \
            .show(15,0) 
   

+----------------------+
|indivcountry          |
+----------------------+
|Middle-east-africa    |
|Côte d'Ivoire         |
|Luxemburgo            |
|Czech-republic-čeština|
|Greece-ελληνικά       |
|Russia                |
|Paraguay              |
|Estados-unidos        |
|Hungary-magyar        |
|Romania-romană        |
|Hong-kong-粵語        |
|Malaysia-中文         |
|Suiza                 |
|Senegal               |
|Tschechien            |
+----------------------+
only showing top 15 rows



In [20]:
#Since we have various different names for the same country, we receive 350 distinct countries but it is at least an indicator that many countries share the same products  

from pyspark.sql.functions import countDistinct, avg, stddev

prepared_df.select(explode(split(col("countries_en"), ",")).alias("indivcountry"))  \
            .filter((col("indivcountry") != "Allemagne") & (col("indivcountry") != "Germany") & (col("indivcountry") != "Deutschland") & ( col("indivcountry") != "en:DE") & ( col("indivcountry") != "en:de")) \
            .select(countDistinct("indivcountry").alias("Distinct Countries")).show()

+------------------+
|Distinct Countries|
+------------------+
|               350|
+------------------+



## Identify category of products and compute:
 - Number of products by category
 - List containing names of products by category 

In [21]:
prepared_df.filter(col("categories") != 'null').groupby("categories") \
            .agg(count("product_name").alias("product_total")) \
            .sort("product_total",ascending = False) \
            .show(40,0)

+----------------------------------------------------------------------------------------------------------------------------------------+-------------+
|Categories                                                                                                                              |Product_total|
+----------------------------------------------------------------------------------------------------------------------------------------+-------------+
|Milchprodukte, Fermentierte Lebensmittel, Fermentierte Milch, Joghurt                                                                   |394          |
|Milchprodukte, Fermentierte Lebensmittel, Fermentierte Milch, Käse                                                                      |354          |
|Milchprodukte, Fermentierte Lebensmittel, Fermentierte Milch, Käse, Frischkäse                                                          |266          |
|Fleisch, Zubereitetes Fleisch, Würste                                            

In [46]:
#List containing names of products by category
(prepared_df
  .filter(col("categories") != 'null')
  .groupby("categories")
  .agg(F.collect_list("product_name").alias("product_list"),count("product_name").alias("product_total"))
  .sort("product_total",ascending =False)
  .show(10))

+--------------------+--------------------+-------------+
|          categories|        product_list|product_total|
+--------------------+--------------------+-------------+
|Milchprodukte, Fe...|[LAC Fruchtjoghur...|          394|
|Milchprodukte, Fe...|[Butterkäse cremi...|          354|
|Milchprodukte, Fe...|[Frischkäse Kräut...|          266|
|Fleisch, Zubereit...|[Bierschinken, Pr...|          260|
|Lebensmittel, Saucen|[Plum sauce, Scha...|          240|
|      Brotaufstriche|[Reissirup-Aufstr...|          215|
|Imbiss, Süßwaren,...|[Digestives, Crea...|          207|
|Pflanzliche Leben...|[Gabel Spaghetti,...|          205|
|Imbiss, Süßwaren,...|[Mandel Orange, S...|          198|
|Fleisch, Zubereit...|[Salami geräucher...|          185|
+--------------------+--------------------+-------------+
only showing top 10 rows



## Identify traces and compute: 
 - Number of products by trace 
 - List containing names of products by trace

In [47]:
prepared_df.filter(col("traces_en") != 'null').groupby("traces_en") \
            .agg(count("product_name").alias("product_total")) \
            .sort("product_total",ascending =False) \
            .show(40,0)

+---------------------------------------+-------------+
|traces_en                              |product_total|
+---------------------------------------+-------------+
|Nuts                                   |1339         |
|Nuts,Peanuts                           |472          |
|Soybeans                               |468          |
|Milk                                   |459          |
|Celery,Mustard                         |451          |
|Sesame seeds                           |326          |
|Celery                                 |249          |
|Gluten,Nuts                            |248          |
|Gluten                                 |239          |
|Milk,Nuts                              |229          |
|Nuts,Soybeans                          |205          |
|Eggs                                   |181          |
|Mustard                                |155          |
|Nuts,Sesame seeds                      |150          |
|Eggs,Gluten,Nuts,Peanuts               |126    

In [48]:
#List containing names of products by trace
(prepared_df
  .filter(col("traces_en") != 'null')
  .groupby("traces_en")
  .agg(F.collect_list("product_name").alias("product_list"),count("product_name").alias("product_total"))
  .sort("product_total",ascending =False)
  .show(10))

+--------------+--------------------+-------------+
|     traces_en|        product_list|product_total|
+--------------+--------------------+-------------+
|          Nuts|[Baumkuchen, Frui...|         1339|
|  Nuts,Peanuts|[Noisettes grille...|          472|
|      Soybeans|[Hühner-Nudeltopf...|          468|
|          Milk|[Gauda jung, Vita...|          459|
|Celery,Mustard|[Sülzkotelett, Sa...|          451|
|  Sesame seeds|[Flûtes Salées, S...|          326|
|        Celery|[Weißkrautsalat k...|          249|
|   Gluten,Nuts|[Flips Cacahuètes...|          248|
|        Gluten|[Yogo Drink, Lins...|          239|
|     Milk,Nuts|[Frischkäse, Anan...|          229|
+--------------+--------------------+-------------+
only showing top 10 rows



## Data quality analysis on fields of interest (see appendix 1): 
 - Number of products with complete info
 - % of products without complete analysis per 100g
 - % of products without additives info
 - % of products without traces info

In [51]:
#df filtering relevant columns 

prepared_df.filter((col("creator").isNotNull()) & 
                   (col("last_modified_t").isNotNull()) &
                  (col("product_name").isNotNull()) & 
                   (col("countries_en").isNotNull()) & 
                   (col("traces_en").isNotNull()) &
                   (col("additives_en").isNotNull()) & 
                   (col("additives_tags").isNotNull()) &
                   (col("packaging_tags").isNotNull()) & 
                   (col("categories").isNotNull()) & 
                   (col("ingredients_text").isNotNull()) & 
                   (col("energy-kcal_100g").isNotNull()) & 
                   (col("fat_100g").isNotNull()) &
                   (col("saturated-fat_100g").isNotNull()) & 
                   (col("sugars_100g").isNotNull()) & 
                   (col("salt_100g").isNotNull()) &
                   (col("sodium_100g").isNotNull())).count()


3955

In [50]:
#% of products without complete analysis per 100g

productsid = prepared_df.select(col("created_t")).count()
null100 = prepared_df.filter((col("energy-kcal_100g").isNotNull()) & 
                   (col("fat_100g").isNotNull()) &
                   (col("saturated-fat_100g").isNotNull()) & 
                   (col("sugars_100g").isNotNull()) & 
                   (col("salt_100g").isNotNull()) &
                   (col("sodium_100g").isNotNull()) ).count()
(100 * null100 / productsid)

52.82499717311824

In [52]:
#% of products without additives info

nulladd = prepared_df.filter( col("additives_tags").isNull()).count()
print("Ratio of null additive to all product")
(100 * nulladd / productsid)

Ratio of null additive to all product


78.98307640119106

In [53]:
#% of products without traces info

nulltrace = prepared_df.filter( col("traces_en").isNull()).count()
print("Ratio of null traces to all product")
(100 * nulltrace / productsid)

Ratio of null traces to all product


84.44209917957609

In [29]:
#dictanalysis = [prepared_df.filter(col(column).isNotNull()).count() for column in prepared_df.schema.names]

## Data profiling on fields of interest (see appendix 1): 
 - Stats on analysis per 100g fields

In [54]:
print ("Summary of columns Fat_100g, Saturated Sugars_100g, Salt_100g and Sodium_100g:")
prepared_df.select("Fat_100g","Saturated-fat_100g","Sugars_100g","Salt_100g","Sodium_100g").summary().show()

print("Checking for nulls on columns Fat_100g, Saturated Sugars_100g, Salt_100g and Sodium_100g:")
prepared_df.select([count(when(col(c).isNull(), c)).alias(c) for c in ["Fat_100g","Saturated-fat_100g","Sugars_100g","Salt_100g","Sodium_100g"]]).show()

Summary of columns Fat_100g, Saturated Sugars_100g, Salt_100g and Sodium_100g:
+-------+------------------+------------------+------------------+------------------+------------------+
|summary|          Fat_100g|Saturated-fat_100g|       Sugars_100g|         Salt_100g|       Sodium_100g|
+-------+------------------+------------------+------------------+------------------+------------------+
|  count|             54943|             51422|             54726|             49904|             49904|
|   mean|13.580277425221896|   5.3079965798318|12.033901678974646| 1.362798136966866|0.5450869765681883|
| stddev|18.219800973810337| 8.462248779070036| 18.34915219315741|11.982079340684457| 4.789758286301969|
|    min|               0.0|               0.0|               0.0|               0.0|               0.0|
|    25%|               0.8|               0.2|               1.0|              0.06|             0.024|
|    50%|               5.5|               1.7|               3.9|              0

# 5. Data analysis to find out which products are safe:

## Authorized additives in the EU 

In [31]:
#Authorized additive in the EU 
#Retrieved from https://www.food.gov.uk/business-guidance/approved-additives-and-e-numbers

additives = spark.read \
                 .option("inferSchema", "true") \
                 .option("delimiter", ";") \
                 .option("mode","PERMISSIVE") \
                 .option("header", "true") \
                 .csv("authorised_additive_eu.csv")
additives.show()

+---------+--------------------+
|E numbers|           Additives|
+---------+--------------------+
|     E100|            Curcumin|
|     E101|      (i) Riboflavin|
|     null|(ii) Riboflavin-5...|
|     E102|          Tartrazine|
|     E104|    Quinoline yellow|
|     E110|Sunset Yellow FCF...|
|     E120|Cochineal; Carmin...|
|     E122|Azorubine; Carmoi...|
|     E123|            Amaranth|
|     E124|Ponceau 4R; Cochi...|
|     E127|         Erythrosine|
|     E129|       Allura Red AC|
|     E131|       Patent Blue V|
|     E132|lndigotine; Indig...|
|     E133|  Brilliant Blue FCF|
|     E140|Chlorophylls and ...|
|     E141|Copper complexes ...|
|     E142|             Green S|
|    E150a|       Plain caramel|
|    E150b|Caustic sulphite ...|
+---------+--------------------+
only showing top 20 rows



In [32]:
aditive_df = prepared_df.select(explode(split(col("additives_en"), ",")).alias("additives_list"), "Product_name","Fat_100g","Saturated-fat_100g","Sugars_100g","Salt_100g","Sodium_100g").withColumn("E numbers", regexp_extract("additives_list", r"E\d+",0))
health_df = aditive_df.join(additives,"E numbers",how="left")

health_df.show(3)

+---------+----------------+------------------+--------+------------------+-----------+---------+-----------+------------+
|E numbers|  additives_list|      Product_name|Fat_100g|Saturated-fat_100g|Sugars_100g|Salt_100g|Sodium_100g|   Additives|
+---------+----------------+------------------+--------+------------------+-----------+---------+-----------+------------+
|     E322|E322 - Lecithins|Belgische Pralinen|    33.3|              21.1|       51.5|     0.09|      0.036|   Lecithins|
|     E420| E420 - Sorbitol|Belgische Pralinen|    33.3|              21.1|       51.5|     0.09|      0.036|(i) Sorbitol|
|     E422| E422 - Glycerol|Belgische Pralinen|    33.3|              21.1|       51.5|     0.09|      0.036|    Glycerol|
+---------+----------------+------------------+--------+------------------+-----------+---------+-----------+------------+
only showing top 3 rows



## Products with additives not authorized by the EU 

In [58]:
#Potential danger products?

non_registeredA = health_df.filter(col("additives").isNull()).select("product_name","additives_list")
non_registeredA.show(10,0)
print("total products with unkown additives ")
non_registeredA.count()

+--------------------------------+---------------------------------------------------------------------------------------+
|product_name                    |additives_list                                                                         |
+--------------------------------+---------------------------------------------------------------------------------------+
|Sour Fruit Gummies              |E428 - Gelatine                                                                        |
|Vita Cola pur                   |E150d - Sulphite ammonia caramel                                                       |
|Chocolate and caramel candy     |E1400 - Dextrin                                                                        |
|Zesty Italian                   |E160b - Annatto                                                                        |
|Bröd Mjukkaka                   |E472e - Mono- and diacetyltartaric acid esters of mono- and diglycerides of fatty acids|
|Sweet Baby Ray'

5299

##  Introducing a food trafic light with the health factor of each product 

In [34]:
#Defining and categorizing the food trafic light by contents of sugar, fat, salt and saturated fat 
#According to the British Nutrition Foundation: https://www.nutrition.org.uk/

health_dflight = health_df.withColumn('foodlighting', when(col("Additives").isNull(), "darkred")\
                                                     .when((col("Sugars_100g") > 22.5) | (col("Fat_100g") > 17.5) | (col("Salt_100g") > 1.5) | (col("Saturated-fat_100g") > 5), "red")\
                                                    .otherwise(when((col("Sugars_100g") > 5) | (col("Fat_100g") > 3) | (col("Salt_100g") > 0.3) | (col("Saturated-fat_100g") > 1.5), "orange").otherwise("green"))\
                                                    )\
                        .select("Product_name","foodlighting").distinct()

In [35]:
health_dflight.show(10,0)

+---------------------------------+------------+
|Product_name                     |foodlighting|
+---------------------------------+------------+
|Gaufrettes fines                 |red         |
|French Dressing                  |red         |
|Chorizo-salami - Dulano - 125 G  |red         |
|Double Concentré de Tomates      |orange      |
|Crunchy muesli - Chocolate & Nuts|red         |
|Pralinés Selection               |darkred     |
|Hähnchenbrustfilet, Klassik      |red         |
|Ce'Real                          |red         |
|Kabanossin mild                  |red         |
|Curry gewürz ketchup             |red         |
+---------------------------------+------------+
only showing top 10 rows



In [36]:
health_dflight.filter(col("foodlighting") == 'green').show(10,0)
health_dflight.filter(col("foodlighting") == 'orange').show(10,0)
health_dflight.filter(col("foodlighting") == 'red').show(10,0)

+--------------------------------------+------------+
|Product_name                          |foodlighting|
+--------------------------------------+------------+
|High protein chocolate pudding        |green       |
|Extra Professionell White             |green       |
|Guma rozpuszczalna o smakach owocowych|green       |
|Thymian Lutsch Pastillen              |green       |
|Apfel-Grape                           |green       |
|Heilwasser mit Kohlensäure            |green       |
|Rocka Milk                            |green       |
|Xucker Light                          |green       |
|Fein gehackte Tomaten in Tomatensaft  |green       |
|Boemboe Nasi Goreng                   |green       |
+--------------------------------------+------------+
only showing top 10 rows

+-----------------------------------+------------+
|Product_name                       |foodlighting|
+-----------------------------------+------------+
|Double Concentré de Tomates        |orange      |
|Schweppes Rus

## Ratio for each light given the dataset

In [37]:
healthy = health_dflight.filter(col("foodlighting") == 'green').count()
midhealthy = health_dflight.filter(col("foodlighting") == 'orange').count()
unhealthy = health_dflight.filter(col("foodlighting") == 'red').count()
unauthorized = health_dflight.filter(col("foodlighting") == 'darkred').count()
allP = health_dflight.count()
#Healthy food ratio
print("Healthy products: "+ str((healthy/allP)*100))
#Unhealthy product
print("Unhealthy products: "+ str((unhealthy/allP)*100))
#midhealthy product 
print("Midhealthy products: "+ str((midhealthy/allP)*100))
#forbidden additives 
print("Forbidden products: "+ str((unauthorized/allP)*100))

Healthy products: 10.17864099282097
Unhealthy products: 45.56736601925538
Midhealthy products: 23.050809727864653
Forbidden products: 21.203183260058992


## Extra analysis of the most famous German product: Wurst (sausage) :)

In [38]:
df = health_dflight.withColumn('sausages',when((col('Product_name').contains('Wurst')) | (col('Product_name').contains('wurst'))| (col('Product_name').contains('sausages')),True).otherwise(False))

In [39]:
df.filter(col('sausages') == True).select(col("*")).distinct().count()

284

In [60]:
df.filter(col('sausages') == True).select(col("*")).distinct().show(20,0)

+-----------------------------------------+------------+--------+
|Product_name                             |foodlighting|sausages|
+-----------------------------------------+------------+--------+
|Biewurst                                 |red         |true    |
|Gutsmettwurst                            |red         |true    |
|Oma Hildes Leberwurst Hausmacher Art     |red         |true    |
|Geflügel-Bratwurst                       |red         |true    |
|Original Pfälzer Bratwurst               |red         |true    |
|Kaiser-Jagdwurst                         |red         |true    |
|Rostbratwurst / Schwein/Grill            |red         |true    |
|Sahne-Leberwurst                         |red         |true    |
|Schinken-Zwiebelmettwurst                |red         |true    |
|Fleischwurst geräuchert                  |red         |true    |
|Leberwurst mit Kalbsleber und Kalbfleisch|red         |true    |
|Pommersche Leberwurst mit Kalbfleisch    |red         |true    |
|Bio Jagdw

Conclusion about the dataset... 