In [10]:
import pandas as pd
import json
pd.set_option('display.max_colwidth', None)
pd.set_option('display.min_rows', 25)
pd.set_option('display.max_columns', None)

In [11]:
FILENAME = '../../datasets/products_0.995_cleaned.csv'
df = pd.read_csv(FILENAME)


# Preprocessing

In [12]:
def to_json(df: pd.DataFrame, filename):
    # Export one JSON object per row, in the shape the charts consume.
    df.to_json(filename, indent=4, orient='records')

## Adding additives danger

In [13]:
with open('additives_danger.json', 'r') as json_file:
    ADDITIVES_DANGER = { additive['code']: 4 - additive['danger'] for additive in json.load(json_file) }


NOT_FOUND = []
def get_codes_by_level(level: int) -> list[str]:
    # Iterate over (code, level) pairs; iterating the dict alone yields only keys.
    return [k for k, v in ADDITIVES_DANGER.items() if v == level]

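def get_danger(additives):
    # Missing values (NaN) are not strings: treat them as "no additives".
    if not isinstance(additives, str): return 0, 0, 0
    dangers = []
    for additive in additives.split(','):
        # Keep the part before the first '-' as the additive code.
        code = additive.split('-')[0].strip()
        if code in ADDITIVES_DANGER:
            dangers.append(ADDITIVES_DANGER[code])
    if len(dangers) == 0: return 0, 0, 0
    # (min, average, max) over the additives with a known score.
    return min(dangers), sum(dangers) / len(dangers), max(dangers)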


df[['additives_min_danger', 'additives_average_danger', 'additives_max_danger']] = df.apply(lambda r: get_danger(r['additives']), axis=1, result_type="expand")

In [14]:
def get_additives_count_hazard(additives, level):
    if type(additives) != str: return 0
    hazard = 0
    for additive in additives.split(','):
        code = additive.split('-')[0].strip()
        if ADDITIVES_DANGER.get(code, -1) == level: hazard += 1
    return hazard

# One count column per danger level (0 to 3).
for level in range(4):
    df[f'additives_{level}_count'] = df['additives'].apply(get_additives_count_hazard, level=level)

In [21]:
df['has_additives'] = df['additives'].notna()
df['has_additives_3'] = df['additives_3_count'] != 0
df['additives_count'] = df['additives'].apply(lambda x: len(x.split(',')) if type(x) == str else 0)
df['ingredients_count'] = df['ingredients_tags'].apply(lambda x: len(x.split(',')) if type(x) == str else 0)

# Analysis

## Introduction
Additives are substances that are added to food products during processing to maintain or improve certain properties such as appearance, freshness, taste or texture.

As eating habits have changed, the food industry has had to transform itself to produce in ever greater quantities.

Additives are therefore sometimes necessary to ensure that processed foods remain safe and wholesome throughout their journey from factories, through transportation, to warehouses, stores, and finally consumers.

Nevertheless, many products contain non-essential additives that serve no real need and are present only to make the product more attractive in taste or appearance (artificial sweeteners, food colorings, and flavor enhancers). Other products contain preservatives to extend their shelf life.

[1] https://food.ec.europa.eu/safety/food-improvement-agents/additives_en 

[2] https://www.who.int/news-room/fact-sheets/detail/food-additives

[3] https://www.fda.gov/food/food-ingredients-packaging/overview-food-ingredients-additives-colors


In [16]:
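# Count how often each additive appears and keep the 50 most frequent.
# Naming the columns explicitly avoids relying on the default column names of
# value_counts().reset_index(), which differ between pandas versions.
bubble_chart_additives = df['additives'].str.split(pat=',').explode(ignore_index=True).value_counts().head(50).rename_axis('additive').reset_index(name='count')
# Look up the danger score by additive code; unknown codes default to 0.
bubble_chart_additives['dangerosity'] = bubble_chart_additives['additive'].apply(lambda a: ADDITIVES_DANGER.get(a.split('-')[0].strip(), 0))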
to_json(bubble_chart_additives, 'graph/bubble_chart_additives.json')

## Dangerousness

However, many additives present health risks, and it is important to be aware of what we consume. Organizations such as the World Health Organization (WHO), the Food and Drug Administration (FDA) in the United States, and the European Food Safety Authority (EFSA) evaluate and regulate the use of additives and set limits on permitted quantities.

Even so, some of these substances still present risks, and many scientific studies show a correlation between high consumption of certain additives and adverse health effects such as an increased risk of cardiovascular disease and cancer.

For our project, we used data from the company Yuka, which reviewed numerous scientific studies [4] on the effects of additive consumption in order to assign each additive a score according to its dangerousness:
- 0 : No risk
- 1 : Limited risk
- 2 : Moderate risk
- 3 : Hazardous

[4] https://help.yuka.io/l/fr/article/bf5vi9gytc
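
To make the scale concrete, here is a minimal sketch of how the per-additive scores can be summarized for one product as a minimum, an average, and a maximum. The `TOY_SCORES` mapping, the codes `X1` to `X3`, and the sample tag string are invented for illustration and are not real Yuka ratings; the `code - name` tag layout is an assumption taken from how the preprocessing code splits the `additives` column.

```python
# Hypothetical scores for illustration only; these are not real Yuka ratings.
TOY_SCORES = {"X1": 0, "X2": 1, "X3": 3}

def summarize_danger(additives, scores):
    """Return (min, mean, max) score over the additives that have a known score."""
    # Missing values (e.g. NaN) are not strings: treat them as "no additives".
    if not isinstance(additives, str):
        return 0, 0, 0
    # Each tag is assumed to look like "<code> - <name>"; keep only the code.
    codes = (tag.split("-")[0].strip() for tag in additives.split(","))
    found = [scores[code] for code in codes if code in scores]
    if not found:
        return 0, 0, 0
    return min(found), sum(found) / len(found), max(found)

# "X9" has no score in the toy mapping, so it is ignored.
summary = summarize_danger("X1 - first, X3 - third, X9 - unknown", TOY_SCORES)
print(summary)
```

Additives without a known score are skipped rather than counted as 0, so they do not pull the average down.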

In [24]:
print(df['has_additives'].mean())
print(df['has_additives_3'].mean())
print(df[df['has_additives']]['additives_count'].mean())

0.45714986309207567
0.1955126895255221
3.0057214375111747


## Food categories

The presence and danger of additives depend strongly on the product category. We find many more additives in cold cuts, sweetened drinks, and ready-made meals than in vegetables, pasta, or plant-based milks, for example.

The following graph shows these two variables, presence (radius) and dangerousness (color), by product category. Cold cuts and sodas clearly stand out as the most dangerous products.
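
The per-category aggregation behind such a chart can be sketched on a toy table. The three rows and their values below are invented for illustration; only the column names `categories` and `additives_average_danger` come from this notebook, and the output column names are chosen for the example.

```python
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "categories": ["Sodas,Beverages", "Sodas", "Pasta"],
    "additives_average_danger": [3.0, 1.0, 0.0],
})

# Split the comma-separated categories, produce one row per
# (product, category) pair, then aggregate per category.
per_category = (
    toy.assign(category=toy["categories"].str.split(","))
       .explode("category")
       .groupby("category")["additives_average_danger"]
       .agg(dangerosity="mean", count="count")
       .reset_index()
)
print(per_category)
```

In this sketch a product listed under several categories contributes to each of them, and `count` is the number of products in the category.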

In [27]:
df['categories_splitted'] = df['categories'].str.split(',')

bubble_chart_categories = df.explode(['categories_splitted']).groupby('categories_splitted')['additives_average_danger'].agg(['mean', 'count']).query('count > 25').reset_index()
bubble_chart_categories.columns = ['categories', 'dangerosity', 'count']
bubble_chart_categories['group'] = 1
to_json(bubble_chart_categories, 'graph/bubble_chart_categories.json')

## NOVA Index

The NOVA index indicates how processed a product is. Processing is a broad term: it covers mixing different products to create a new one as well as steps such as cooking, freezing, drying, fermenting and, of course, adding additives. The index has four groups:
- 1 : Unprocessed or minimally processed foods
- 2 : Processed culinary ingredients
- 3 : Processed foods
- 4 : Ultra-processed food and drink products

[5] https://world.openfoodfacts.org/nova
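
As a rough sketch of the data behind a stacked bar chart per NOVA group, the example below averages additive counts by danger level within each group. The toy rows are invented, and only two of the four count columns are used to keep it short.

```python
import pandas as pd

# Toy data, invented for illustration: one NOVA 1 product, two NOVA 4 products.
toy = pd.DataFrame({
    "nova_group": [1, 4, 4],
    "additives_0_count": [0, 2, 0],
    "additives_3_count": [0, 1, 3],
})

count_cols = ["additives_0_count", "additives_3_count"]

# Average number of additives of each danger level per NOVA group.
stacked = toy.groupby("nova_group")[count_cols].mean().reset_index()

# One record per NOVA group, the same shape as the exported JSON.
records = stacked.to_dict(orient="records")
print(records)
```

Each record holds one bar of the chart, with one segment per danger level.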

In [28]:
df.groupby('nova_group')[['additives_0_count', 'additives_1_count', 'additives_2_count', 'additives_3_count']] \
    .mean() \
    .reset_index() \
    .to_json('graph/nova_stacked_bar_chart.json', indent=4, orient='records')

> Explain graph

## Vegan and Vegetarian