# Introduction

This document summarizes the datasets, methodology, and statistics used to estimate the suffering contained in egg boxes.

We start by importing the complete Open Food Facts database, downloaded on 31 March 2025.

From this database, we keep only the columns (`minicol`) needed to compute the suffering weight, as defined in the code.




In [3]:
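import sys
sys.path.append("backend")  # make modules from the local `backend` directory importable

import duckdb
import requests

import pandas as pd
# Show up to 100 rows and up to 1000 characters per cell when displaying DataFrames
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_colwidth', 1000)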


In [4]:
LOCAL_PARQUET = "../data/food.parquet"  # forward slashes also work on Windows
SOURCE_PARQUET = 'https://huggingface.co/datasets/openfoodfacts/product-database/resolve/main/food.parquet'

download = input('Download the latest parquet file (4 GB)? y/n ')

if download.strip().lower() == 'y':
    # Stream the file to disk in 1 MiB chunks instead of loading 4 GB into memory
    with requests.get(SOURCE_PARQUET, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(LOCAL_PARQUET, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024 * 1024):
                f.write(chunk)
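The cell above writes directly to the final file name, so an interrupted 4 GB download leaves a truncated `food.parquet` behind. The sketch below uses only the standard library. It streams to a temporary `.part` file and renames it only once the copy has finished. The function name `download_file`, the 1 MiB chunk size and the 60 s timeout are assumptions, not part of the original notebook.

```python
import os
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str, chunk_size: int = 1 << 20, overwrite: bool = False) -> Path:
    """Stream `url` to `dest`, writing to a temporary '.part' file first (hypothetical helper)."""
    dest_path = Path(dest)
    if dest_path.exists() and not overwrite:
        return dest_path  # keep the existing local copy
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = dest_path.with_name(dest_path.name + ".part")
    # urlopen follows redirects and raises HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(url, timeout=60) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    os.replace(tmp_path, dest_path)  # only a complete file ever gets the final name
    return dest_path
```

Usage would be `download_file(SOURCE_PARQUET, LOCAL_PARQUET)`. Skipping an existing file by default avoids re-downloading 4 GB each time the notebook is re-run; pass `overwrite=True` to refresh it.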



In [5]:


minicol = ['code',             # ID
           'categories_tags',  # already present
           'labels_tags',      # already present
           'product_name',     # already present
           'generic_name',
           'quantity',
           'product_quantity_unit',
           'product_quantity',
           'allergens_tags',
           'ingredients_tags',
           'ingredients',
           'countries_tags',
           'images',
           ]

duckdb.execute(f"CREATE OR REPLACE VIEW db_col AS SELECT {','.join(minicol)} FROM '{LOCAL_PARQUET}'")
duckdb.execute("SUMMARIZE db_col").df()

| | column_name | column_type | min | max | approx_unique | avg | std | q25 | q50 | q75 | count | null_percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | code | VARCHAR | | 9999999999999 | 3431918 | | | | | | 3881437 | 0.0 |
| 1 | categories_tags | VARCHAR[] | [] | [zh:黑芝麻醬] | 185549 | | | | | | 3881437 | 53.28 |
| 2 | labels_tags | VARCHAR[] | [] | [zh:饼干] | 112383 | | | | | | 3881437 | 60.83 |
| 3 | product_name | STRUCT(lang VARCHAR, "text" VARCHAR)[] | [] | [{'lang': zu, 'text': kungu baltmaize}] | 2687474 | | | | | | 3881437 | 0.0 |
| 4 | generic_name | STRUCT(lang VARCHAR, "text" VARCHAR)[] | [] | [{'lang': zh, 'text': 鹽水香蕉蕾}] | 134609 | | | | | | 3881437 | 0.0 |
| 5 | quantity | VARCHAR | | 😐 | 64588 | | | | | | 3881437 | 59.34 |
| 6 | product_quantity_unit | VARCHAR | % | mmol/l | 4 | | | | | | 3881437 | 73.77 |
| 7 | product_quantity | VARCHAR | 0 | 999999999999999999 | 7534 | | | | | | 3881437 | 67.67 |
| 8 | allergens_tags | VARCHAR[] | [] | [zh:乳制品, zh:小麦, zh:豆制品] | 13530 | | | | | | 3881437 | 0.72 |
| 9 | ingredients_tags | VARCHAR[] | [] | [zu:salt, zu:anti-caking-agent, zu:potassium-iodate] | 836663 | | | | | | 3881437 | 70.84 |


# Identifying eggs

## Additive approach

The database summary shows 3,881,437 products (about 3.88 million).
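The `null_percentage` column of the summary can be turned into approximate counts of filled-in values. The figures below are copied from the summary above. The percentages are rounded to two decimals, so each estimate is only accurate to about ±194 rows (0.005% of 3,881,437). A non-null value is not necessarily usable: the minimum of `quantity` is an empty string.

```python
# Figures copied from the SUMMARIZE output (null_percentage is rounded to 2 decimals)
TOTAL_ROWS = 3_881_437
NULL_PERCENTAGE = {
    "categories_tags": 53.28,
    "quantity": 59.34,
    "product_quantity": 67.67,
}


def approx_non_null(total_rows: int, null_pct: float) -> int:
    """Approximate number of non-NULL values from a rounded null percentage."""
    return round(total_rows * (1 - null_pct / 100))


for column, pct in NULL_PERCENTAGE.items():
    print(f"{column}: ~{approx_non_null(TOTAL_ROWS, pct):,} non-null values")
```

This gives roughly 1.81 million, 1.58 million and 1.25 million non-null values for `categories_tags`, `quantity` and `product_quantity` respectively.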

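Note in passing that `quantity` and `product_quantity`, on which our approach relies, will need substantial cleaning. They are missing for 59.34% and 67.67% of products respectively, and the summary shows values as odd as `😐` for `quantity` and `999999999999999999` for `product_quantity`.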
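A first cleaning pass could extract an egg count from the free-text `quantity` field. The minimal heuristic below was checked against the kinds of values visible in the egg samples further down: `6 œufs`, `30 pcs`, `4x Gros Œufs`, `10 шт`, `6 unités`, `6pcs`, `10`, and a mass such as `50 g`. The function name and both unit lists are assumptions and deliberately incomplete. A value such as `6 large eggs` is returned as `None` for manual review rather than guessed.

```python
import re
from typing import Optional

# Hypothetical, non-exhaustive unit lists (French, English and Russian spellings)
COUNT_PREFIXES = ("œuf", "oeuf", "egg", "pc", "unit", "шт", "x")
MASS_OR_VOLUME_UNITS = {"g", "kg", "mg", "oz", "lb", "ml", "cl", "l"}

# A leading integer, optional spaces, then the first "word" that follows it
QUANTITY_RE = re.compile(r"^\s*(\d+)\s*([^\d\s]*)")


def parse_egg_count(quantity: object) -> Optional[int]:
    """Return the number of eggs announced by a free-text quantity, or None if unsure."""
    if not isinstance(quantity, str):  # None, NaN, numbers...
        return None
    match = QUANTITY_RE.match(quantity)
    if match is None:
        return None
    count, unit = int(match.group(1)), match.group(2).lower()
    if unit in MASS_OR_VOLUME_UNITS:
        return None  # a weight or volume, not a number of eggs
    if unit == "" or unit.startswith(COUNT_PREFIXES):
        return count  # bare number or a count-like unit
    return None  # e.g. "6 large eggs": left for manual review
```

Applied to a DataFrame, this would be `df["quantity"].apply(parse_egg_count)`. Returning `None` rather than a guess keeps uncertain rows visible, so they can be counted and inspected.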

Identifying egg boxes looks fairly straightforward. With the taxonomic approach, it should be enough to select the products whose `categories_tags` contain `en:chicken-eggs`.

How many are there?


In [6]:
duckdb.execute("SELECT COUNT(*) FROM db_col WHERE 'en:chicken-eggs' IN categories_tags").df()


| | count_star() |
|---|---|
| 0 | 4608 |


## Subtractive approach

We assume that the taxonomy is incomplete or imperfect and try a subtractive approach. Rather than selecting chicken eggs directly, we select all eggs (`en:eggs`) and remove everything identified as eggs from other animals, assuming that an egg is a chicken egg by default.

To choose between the two approaches, we compare the number of products returned by this approach with the previous count. We also sample a few products to check that the result makes sense.
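To make the difference between the two filters concrete, here is a minimal pure-Python sketch applied to toy records. The product codes, tag lists and the small exclusion subset are invented for illustration. Products kept by the subtractive filter but missing from the additive one are the ones worth sampling by hand.

```python
from typing import Iterable, Optional

# Hypothetical toy records mimicking the `code` and `categories_tags` columns
PRODUCTS = [
    {"code": "A", "categories_tags": ["en:farming-products", "en:eggs", "en:chicken-eggs"]},
    {"code": "B", "categories_tags": ["en:farming-products", "en:eggs"]},  # species not tagged
    {"code": "C", "categories_tags": ["en:farming-products", "en:eggs", "en:quail-eggs"]},
    {"code": "D", "categories_tags": ["en:snacks", "en:eggs", "en:scotch-eggs"]},
    {"code": "E", "categories_tags": None},  # no categories at all
]
# A small subset of the exclusion set used by the subtractive approach
EXCLUDED = {"en:quail-eggs", "en:scotch-eggs", "en:snacks"}


def additive(tags: Optional[Iterable[str]]) -> bool:
    """Keep products explicitly tagged as chicken eggs."""
    return "en:chicken-eggs" in (tags if tags is not None else [])


def subtractive(tags: Optional[Iterable[str]]) -> bool:
    """Keep egg products, minus anything carrying an excluded tag."""
    tag_set = set(tags if tags is not None else [])
    return "en:eggs" in tag_set and not (tag_set & EXCLUDED)


additive_codes = {p["code"] for p in PRODUCTS if additive(p["categories_tags"])}
subtractive_codes = {p["code"] for p in PRODUCTS if subtractive(p["categories_tags"])}

print("additive:", sorted(additive_codes))
print("subtractive:", sorted(subtractive_codes))
print("to inspect:", sorted(subtractive_codes - additive_codes))
```

Missing tag lists (`None`) are treated as empty. This mirrors the SQL filters, where a NULL `categories_tags` never satisfies `'en:eggs' IN categories_tags`.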


In [7]:
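# One deduplicated tag list per product tagged en:eggs, then collect the English tags ending in "-eggs"
ltags=duckdb.execute("SELECT list_distinct(categories_tags) FROM db_col WHERE 'en:eggs' IN categories_tags").df()
set(x for xs in ltags.iloc[:, 0] for x in xs if x.startswith("en:") and x.endswith("-eggs"))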


{'en:barn-chicken-eggs',
 'en:boiled-eggs',
 'en:british-free-range-eggs',
 'en:brown-eggs',
 'en:cage-chicken-eggs',
 'en:caged-chicken-eggs',
 'en:century-eggs',
 'en:chicken-eggs',
 'en:chocolate-eggs',
 'en:duck-eggs',
 'en:easter-eggs',
 'en:farming-products-eggs',
 'en:filled-chocolate-eggs',
 'en:fish-and-meat-and-eggs',
 'en:fish-eggs',
 'en:free-range-chicken-eggs',
 'en:free-range-duck-eggs',
 'en:free-range-large-eggs',
 'en:free-range-organic-large-chicken-eggs',
 'en:fresh-chicken-eggs',
 'en:fresh-eggs',
 'en:frozen-chicken-eggs',
 'en:frozen-eggs',
 'en:grade-a-eggs',
 'en:grade-aa-eggs',
 'en:hard-cooked-peeled-eggs',
 'en:labeled-eggs',
 'en:large-chicken-eggs',
 'en:large-eggs',
 'en:large-free-run-chicken-eggs',
 'en:large-organic-chicken-eggs',
 'en:large-organic-free-range-chicken-eggs',
 'en:medium-organic-free-range-chicken-eggs',
 'en:organic-chicken-eggs',
 'en:organic-eggs',
 'en:organic-free-range-chicken-eggs',
 'en:organic-large-brown-chicken-eggs',
 'en:pi

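With this approach we could not find tags for eggs such as "ostrich eggs" or "guineafowl eggs".
We end up with the following set of tags to exclude. Besides eggs from other animals, it also removes chocolate eggs, egg-based dishes and broad categories of processed products (meals, snacks, meat products, breads):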

In [8]:
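pas_poule = {
    # eggs from other animals (fish roe included)
    'en:duck-eggs',
    'en:free-range-duck-eggs',
    'en:fish-eggs',
    'en:quail-eggs',
    'en:raw-quail-eggs',
    # chocolate/Easter eggs and egg-based dishes
    'en:chocolate-eggs',
    'en:easter-eggs',
    'en:savoury-eggs',
    'en:scotch-eggs',
    'en:streamed-eggs',
    # broad categories of processed products
    'en:meals',
    'en:snacks',
    'en:meats-and-their-products',
    'en:breads',
}

# Builds "... 'en:eggs' IN categories_tags AND 'tag1' NOT IN categories_tags AND 'tag2' NOT IN ..."
# sorted() keeps the query text identical from one run to the next (set order is not stable)
joint = "' NOT IN categories_tags AND '".join(sorted(pas_poule))
request = ("CREATE OR REPLACE VIEW eggs AS SELECT * FROM db_col WHERE 'en:eggs' IN categories_tags AND '"
           + joint + "' NOT IN categories_tags")
# print(request)

duckdb.execute(request)
eggs_from_parquet_duckdb = duckdb.execute("FROM eggs").df()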
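The query above interpolates tags directly into SQL. A tag containing a single quote (for instance a French tag with an apostrophe) would break it, and an empty exclusion set would leave a meaningless `'' NOT IN categories_tags` condition. The minimal sketch below escapes literals using the standard SQL rule of doubling single quotes, and omits the exclusion part when there is nothing to exclude. The helper names are hypothetical. `source` and `column` are inserted verbatim, so they must come from trusted code.

```python
from typing import Iterable


def sql_string_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_exclusion_clause(tags: Iterable[str], column: str = "categories_tags") -> str:
    """Return "'a' NOT IN col AND 'b' NOT IN col ..." (sorted for a stable query text)."""
    return " AND ".join(f"{sql_string_literal(tag)} NOT IN {column}" for tag in sorted(tags))


def build_eggs_view_query(excluded_tags: Iterable[str], source: str = "db_col") -> str:
    """Rebuild the `eggs` view query; no dangling condition when the set is empty."""
    query = f"CREATE OR REPLACE VIEW eggs AS SELECT * FROM {source} WHERE 'en:eggs' IN categories_tags"
    clause = build_exclusion_clause(excluded_tags)
    return f"{query} AND {clause}" if clause else query
```

In the notebook this would be called as `duckdb.execute(build_eggs_view_query(pas_poule))`.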

In [9]:
eggs_from_parquet_duckdb.sample(50, random_state=10)

,code,categories_tags,labels_tags,product_name,generic_name,quantity,product_quantity_unit,product_quantity,allergens_tags,ingredients_tags,ingredients,countries_tags,images
6068,4956455172358,"[en:farming-products, en:eggs, en:chicken-eggs]",[],[],[],10,,0.0,[],,,[en:hong-kong],"[{'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 75}, '200': None, '400': {'h': 400, 'w': 300}, 'full': {'h': 4032, 'w': 3024}}, 'uploaded_t': 1653055967, 'uploader': 'openfoodfacts-contributors'}, {'key': '3', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 75}, '200': None, '400': {'h': 400, 'w': 300}, 'full': {'h': 4032, 'w': 3024}}, 'uploaded_t': 1653055972, 'uploader': 'openfoodfacts-contributors'}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 75}, '200': None, '400': {'h': 400, 'w': 300}, 'full': {'h': 4032, 'w': 3024}}, 'uploaded_t': 1653055957, 'uploader': 'openfoodfacts-contributors'}, {'key': 'front_en', 'imgid': 1, 'rev': 3, 'sizes': {'100': {'h': 100, 'w': 75}, '200': {'h': 200, 'w': 150}, '400': {'h': 400, 'w': 300}, 'full': {'h': 4032, 'w': 3024}}, 'uploaded_t': None, 'uploader': None}, {'key': 'ingredients_en', 'imgid': 2, 'rev': 5, 'sizes': {'100': {'h': 100, 'w': 75}, '200': {'h': 200, 'w': 150}, '..."
5960,80569114,"[en:farming-products, en:eggs, en:chicken-eggs]",,"[{'lang': 'main', 'text': 'Uova'}, {'lang': 'it', 'text': 'Uova'}]",[],,,,[],,,[en:italy],"[{'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 50, 'w': 100}, '200': None, '400': {'h': 201, 'w': 400}, 'full': {'h': 343, 'w': 681}}, 'uploaded_t': 1649609565, 'uploader': 'kiliweb'}, {'key': 'front_it', 'imgid': 1, 'rev': 3, 'sizes': {'100': {'h': 100, 'w': 97}, '200': {'h': 200, 'w': 194}, '400': {'h': 400, 'w': 388}, 'full': {'h': 1084, 'w': 1051}}, 'uploaded_t': None, 'uploader': None}, {'key': 'nutrition_it', 'imgid': 2, 'rev': 5, 'sizes': {'100': {'h': 50, 'w': 100}, '200': {'h': 101, 'w': 200}, '400': {'h': 201, 'w': 400}, 'full': {'h': 343, 'w': 681}}, 'uploaded_t': None, 'uploader': None}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 97}, '200': None, '400': {'h': 400, 'w': 388}, 'full': {'h': 1084, 'w': 1051}}, 'uploaded_t': 1649609564, 'uploader': 'kiliweb'}]"
7045,20926519,"[en:farming-products, en:eggs]",,"[{'lang': 'main', 'text': '6 Corn Fed Free Range Irish Eggs'}, {'lang': 'en', 'text': '6 Corn Fed Free Range Irish Eggs'}]",[],,,,[en:eggs],[en:egg],"[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:egg"",""text"":""Eggs"",""vegetarian"":""yes"",""ciqual_food_code"":""22000"",""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]",[en:ireland],"[{'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 19, 'w': 100}, '200': None, '400': {'h': 74, 'w': 400}, 'full': {'h': 540, 'w': 2900}}, 'uploaded_t': 1705231694, 'uploader': 'jaame'}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 75, 'w': 100}, '200': None, '400': {'h': 300, 'w': 400}, 'full': {'h': 3024, 'w': 4032}}, 'uploaded_t': 1695989738, 'uploader': 'smoothie-app'}, {'key': 'front_en', 'imgid': 1, 'rev': 3, 'sizes': {'100': {'h': 75, 'w': 100}, '200': {'h': 150, 'w': 200}, '400': {'h': 300, 'w': 400}, 'full': {'h': 3024, 'w': 4032}}, 'uploaded_t': None, 'uploader': None}, {'key': 'nutrition_en', 'imgid': 2, 'rev': 8, 'sizes': {'100': {'h': 19, 'w': 100}, '200': {'h': 37, 'w': 200}, '400': {'h': 74, 'w': 400}, 'full': {'h': 540, 'w': 2900}}, 'uploaded_t': None, 'uploader': None}]"
7640,9120125190026,"[en:farming-products, en:eggs, en:chicken-eggs]","[en:organic, en:eu-organic, en:bio-austria]","[{'lang': 'main', 'text': 'Grundnig's Bio Eier'}, {'lang': 'de', 'text': 'Grundnig's Bio Eier'}]",[],6pcs,,0.0,[],,,[en:germany],"[{'key': 'front_de', 'imgid': 1, 'rev': 3, 'sizes': {'100': {'h': 67, 'w': 100}, '200': {'h': 133, 'w': 200}, '400': {'h': 266, 'w': 400}, 'full': {'h': 532, 'w': 800}}, 'uploaded_t': None, 'uploader': None}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 67, 'w': 100}, '200': None, '400': {'h': 266, 'w': 400}, 'full': {'h': 532, 'w': 800}}, 'uploaded_t': 1730562034, 'uploader': 'prepperapp'}]"
1831,3560071098278,"[en:farming-products, en:eggs, en:chicken-eggs, en:barn-chicken-eggs, en:free-range-chicken-eggs, en:fresh-eggs]","[en:fed-without-gmos, en:french-eggs, en:made-in-france, en:nutriscore, en:nutriscore-grade-a, fr:poules-nourries-sans-ogm, fr:oeufs-de-poules-elevees-au-sol]","[{'lang': 'main', 'text': 'Œufs de poule élevées au sol'}, {'lang': 'fr', 'text': 'Œufs de poule élevées au sol'}]","[{'lang': 'main', 'text': '6 œufs frais (Catégorie A) de poules élevées au sol.'}, {'lang': 'fr', 'text': '6 œufs frais (Catégorie A) de poules élevées au sol.'}]",6 œufs,,0.0,[en:eggs],"[en:fresh-egg, en:egg]","[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:fresh-egg"",""text"":""œufs frais"",""vegetarian"":""yes"",""ciqual_food_code"":""22000"",""percent"":100.0,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":""en:france"",""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]",[en:france],"[{'key': '14', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 46, 'w': 100}, '200': None, '400': {'h': 186, 'w': 400}, 'full': {'h': 1024, 'w': 2208}}, 'uploaded_t': 1729915548, 'uploader': 'org-carrefour'}, {'key': '4', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 75}, '200': None, '400': {'h': 400, 'w': 300}, 'full': {'h': 4032, 'w': 3024}}, 'uploaded_t': 1549906106, 'uploader': 'openfoodfacts-contributors'}, {'key': '8', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 32, 'w': 100}, '200': None, '400': {'h': 129, 'w': 400}, 'full': {'h': 761, 'w': 2365}}, 'uploaded_t': 1616944085, 'uploader': 'org-carrefour'}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 75}, '200': None, '400': {'h': 400, 'w': 300}, 'full': {'h': 4032, 'w': 3024}}, 'uploaded_t': 1549906066, 'uploader': 'openfoodfacts-contributors'}, {'key': 'packaging_fr', 'imgid': 11, 'rev': 39, 'sizes': {'100': {'h': 72, 'w': 100}, '200': {'h': 144, 'w': 200}, '400': {'h': 2..."
4426,8008801506693,"[en:farming-products, en:eggs, en:chicken-eggs]",,"[{'lang': 'main', 'text': 'Uova'}, {'lang': 'it', 'text': 'Uova'}]",[],,,,[],,,[en:italy],"[{'key': 'nutrition_it', 'imgid': 3, 'rev': 9, 'sizes': {'100': {'h': 51, 'w': 100}, '200': {'h': 103, 'w': 200}, '400': {'h': 206, 'w': 400}, 'full': {'h': 1054, 'w': 2050}}, 'uploaded_t': None, 'uploader': None}, {'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 62, 'w': 100}, '200': None, '400': {'h': 247, 'w': 400}, 'full': {'h': 927, 'w': 1501}}, 'uploaded_t': 1615744938, 'uploader': 'kiliweb'}, {'key': '3', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 51, 'w': 100}, '200': None, '400': {'h': 206, 'w': 400}, 'full': {'h': 1054, 'w': 2050}}, 'uploaded_t': 1618044855, 'uploader': 'kiliweb'}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 82, 'w': 100}, '200': None, '400': {'h': 328, 'w': 400}, 'full': {'h': 1200, 'w': 1462}}, 'uploaded_t': 1615744938, 'uploader': 'kiliweb'}, {'key': 'front_it', 'imgid': 1, 'rev': 3, 'sizes': {'100': {'h': 82, 'w': 100}, '200': {'h': 164, 'w': 200}, '400': {'h': 328, 'w': 400}, 'full': {'h': 1200, 'w': 1462}},..."
838,3543692665708,"[en:farming-products, en:eggs, en:chicken-eggs]","[en:organic, en:eu-organic, en:fr-bio-01, en:green-dot, fr:ab-agriculture-biologique]","[{'lang': 'main', 'text': '6 oeufs biologiques frais'}, {'lang': 'fr', 'text': '6 oeufs biologiques frais'}]","[{'lang': 'main', 'text': '6 Oeufs Biologiques Frais'}, {'lang': 'fr', 'text': '6 Oeufs Biologiques Frais'}]",6 Oeufs,,0.0,[en:eggs],"[en:free-range-eggs, en:egg]","[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:free-range-eggs"",""text"":""Oeufs élevés en plein air"",""vegetarian"":""yes"",""ciqual_food_code"":""22000"",""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]",[en:france],"[{'key': 'front', 'imgid': 1, 'rev': 8, 'sizes': {'100': {'h': 65, 'w': 100}, '200': {'h': 129, 'w': 200}, '400': {'h': 258, 'w': 400}, 'full': {'h': 1121, 'w': 1736}}, 'uploaded_t': None, 'uploader': None}, {'key': '5', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 31, 'w': 100}, '200': None, '400': {'h': 125, 'w': 400}, 'full': {'h': 941, 'w': 3000}}, 'uploaded_t': 1535639287, 'uploader': 'kiliweb'}, {'key': '4', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 95, 'w': 100}, '200': None, '400': {'h': 379, 'w': 400}, 'full': {'h': 1200, 'w': 1265}}, 'uploaded_t': 1535639285, 'uploader': 'kiliweb'}, {'key': 'ingredients', 'imgid': 2, 'rev': 10, 'sizes': {'100': {'h': 49, 'w': 100}, '200': {'h': 97, 'w': 200}, '400': {'h': 195, 'w': 400}, 'full': {'h': 968, 'w': 1986}}, 'uploaded_t': None, 'uploader': None}, {'key': 'front_fr', 'imgid': 4, 'rev': 14, 'sizes': {'100': {'h': 95, 'w': 100}, '200': {'h': 190, 'w': 200}, '400': {'h': 379, 'w': 400}, 'full': {'h': 1200, 'w': 1265}..."
341,3251320080419,"[en:farming-products, en:eggs, en:chicken-eggs, en:free-range-chicken-eggs, fr:oeufs-fermiers-label-rouge]","[en:fed-without-gmos, en:no-gmos, en:green-dot, en:nutriscore, en:nutriscore-grade-a, en:pgi, fr:label-rouge, fr:fermier, fr:oeufs-de-poules-elevees-en-liberte, fr:oeufs-fermiers]","[{'lang': 'main', 'text': '4 oeufs fermiers label rouge de'}, {'lang': 'fr', 'text': '4 oeufs fermiers label rouge de'}]","[{'lang': 'main', 'text': '4 Œufs fermiers Label Rouge de Loué de poules élevées en liberté'}, {'lang': 'fr', 'text': '4 Œufs fermiers Label Rouge de Loué de poules élevées en liberté'}]",4x Gros Œufs,,0.0,[en:eggs],[en:egg],"[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:egg"",""text"":""Œufs"",""vegetarian"":""yes"",""ciqual_food_code"":""22000"",""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]",[en:france],"[{'key': '8', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 80, 'w': 100}, '200': None, '400': {'h': 320, 'w': 400}, 'full': {'h': 1601, 'w': 2000}}, 'uploaded_t': 1558098324, 'uploader': 'fermiers-de-loue'}, {'key': '4', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 67, 'w': 100}, '200': None, '400': {'h': 267, 'w': 400}, 'full': {'h': 683, 'w': 1024}}, 'uploaded_t': 1538735635, 'uploader': 'beniben'}, {'key': '21', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 89, 'w': 100}, '200': None, '400': {'h': 357, 'w': 400}, 'full': {'h': 2553, 'w': 2858}}, 'uploaded_t': 1653217383, 'uploader': 'didierg'}, {'key': '18', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 100}, '200': None, '400': {'h': 400, 'w': 400}, 'full': {'h': 3543, 'w': 3543}}, 'uploaded_t': 1610120248, 'uploader': 'org-loue'}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 9, 'w': 100}, '200': None, '400': {'h': 35, 'w': 400}, 'full': {'h': 51, 'w': 579}}, 'uploaded_t': 1368..."
318,3250391411528,"[en:farming-products, en:eggs, en:chicken-eggs, en:cage-chicken-eggs]",[],"[{'lang': 'main', 'text': 'Oeuf frais 30'}, {'lang': 'fr', 'text': 'Oeuf frais 30'}]",[],30 pcs,,0.0,[],[fr:30-oeufs-frais],"[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":0,""percent_estimate"":100.0,""vegan"":null,""id"":""fr:30-oeufs-frais"",""text"":""30 Œufs frais"",""vegetarian"":null,""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":null,""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]",[en:france],"[{'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 75}, '200': None, '400': {'h': 400, 'w': 300}, 'full': {'h': 3264, 'w': 2448}}, 'uploaded_t': 1517937238, 'uploader': 'kiliweb'}, {'key': '3', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 34, 'w': 100}, '200': None, '400': {'h': 134, 'w': 400}, 'full': {'h': 687, 'w': 2050}}, 'uploaded_t': 1591782457, 'uploader': 'kiliweb'}, {'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 88, 'w': 100}, '200': None, '400': {'h': 351, 'w': 400}, 'full': {'h': 2608, 'w': 2976}}, 'uploaded_t': 1537289797, 'uploader': 'openfoodfacts-contributors'}, {'key': 'front_fr', 'imgid': 2, 'rev': 8, 'sizes': {'100': {'h': 88, 'w': 100}, '200': {'h': 175, 'w': 200}, '400': {'h': 351, 'w': 400}, 'full': {'h': 2608, 'w': 2976}}, 'uploaded_t': None, 'uploader': None}, {'key': 'ingredients_fr', 'imgid': 3, 'rev': 16, 'sizes': {'100': {'h': 34, 'w': 100}, '200': {'h': 67, 'w': 200}, '400': {'h': 134, 'w': 400}, 'full': {'h..."
1045,3760200840000,"[en:farming-products, en:eggs, en:chicken-eggs, en:free-range-chicken-eggs, en:fresh-eggs, fr:oeufs-labellises-biologiques]","[en:organic, en:eu-organic, en:fr-bio-01, fr:ab-agriculture-biologique]","[{'lang': 'main', 'text': '6 oeufs Bio'}, {'lang': 'fr', 'text': '6 oeufs Bio'}]",[],6 unités,,0.0,[],,,[en:france],"[{'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 75}, '200': None, '400': {'h': 400, 'w': 300}, 'full': {'h': 1333, 'w': 1000}}, 'uploaded_t': 1533309735, 'uploader': 'date-limite-app'}, {'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 69, 'w': 100}, '200': None, '400': {'h': 275, 'w': 400}, 'full': {'h': 1200, 'w': 1743}}, 'uploaded_t': 1535630015, 'uploader': 'kiliweb'}, {'key': 'front_fr', 'imgid': 2, 'rev': 8, 'sizes': {'100': {'h': 69, 'w': 100}, '200': {'h': 138, 'w': 200}, '400': {'h': 275, 'w': 400}, 'full': {'h': 1200, 'w': 1743}}, 'uploaded_t': None, 'uploader': None}, {'key': '3', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 28, 'w': 100}, '200': None, '400': {'h': 113, 'w': 400}, 'full': {'h': 652, 'w': 2306}}, 'uploaded_t': 1535630018, 'uploader': 'kiliweb'}, {'key': 'ingredients_fr', 'imgid': 3, 'rev': 11, 'sizes': {'100': {'h': 28, 'w': 100}, '200': {'h': 57, 'w': 200}, '400': {'h': 113, 'w': 400}, 'full': {'h': 652, 'w'..."


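Converting DuckDB nested values (lists and structs, which arrive in pandas as NumPy arrays, lists or dicts) to JSON strings, so that they can be written to CSV: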

In [10]:
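import json
import numpy as np

# Detect the columns holding nested values (DuckDB lists/structs) by inspecting
# the first 20 non-null values of each column
cols_to_json = []

for col in eggs_from_parquet_duckdb.columns:
    sample = eggs_from_parquet_duckdb[col].dropna().head(20)
    if sample.apply(lambda x: isinstance(x, (list, dict, np.ndarray))).any():
        cols_to_json.append(col)

# `ingredients` is already a JSON string; it is listed so that it gets decoded on re-import
# (dict.fromkeys removes duplicates while keeping the order)
cols_to_json_for_import = list(dict.fromkeys(cols_to_json + ['ingredients']))
cols_to_json_for_import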

In [11]:
eggs_from_parquet = eggs_from_parquet_duckdb.copy()

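def ndarray_to_json(arr):
    """Serialize lists, dicts and NumPy arrays to JSON strings; return other values unchanged."""
    if isinstance(arr, (list, dict)):
        return json.dumps(arr)
    elif isinstance(arr, np.ndarray):
        return json.dumps(arr.tolist())
    else:
        return arr  # other values (strings, numbers, None) are left as-is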

for col in cols_to_json_for_import:
    eggs_from_parquet[col] = eggs_from_parquet_duckdb[col].apply(ndarray_to_json)
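By default, `json.dumps` escapes non-ASCII characters. This is why the Russian product name in the exported CSV appears as `\u042f\u0439...`. The hypothetical variant of `ndarray_to_json` below keeps such text readable (`ensure_ascii=False`) and uses a `default` hook to handle arrays nested inside dicts. It relies on duck typing (`tolist`, `ndim`) so that the sketch itself does not need to import NumPy. The resulting CSV must then be written and read as UTF-8, which is the pandas default for `to_csv`.

```python
import json
from typing import Any


def _json_default(obj: Any) -> Any:
    """Fallback for json.dumps: convert array-like objects (e.g. NumPy arrays) to lists."""
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def nested_to_json(value: Any) -> Any:
    """Serialize lists, dicts and array-like cells to JSON; return scalars unchanged."""
    is_array_like = hasattr(value, "tolist") and getattr(value, "ndim", 0) > 0
    if isinstance(value, (list, dict)) or is_array_like:
        # ensure_ascii=False keeps Cyrillic, Chinese and accented text readable in the CSV
        return json.dumps(value, default=_json_default, ensure_ascii=False)
    return value
```

It would replace `ndarray_to_json` in the loop: `eggs_from_parquet[col] = eggs_from_parquet_duckdb[col].apply(nested_to_json)`.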

In [12]:
with open("../data/cols_to_json.txt", "w") as f:
    json.dump(cols_to_json_for_import, f)

eggs_from_parquet.to_csv("../data/eggs_from_parquet.csv", index=False)
eggs_from_parquet

,code,categories_tags,labels_tags,product_name,generic_name,quantity,product_quantity_unit,product_quantity,allergens_tags,ingredients_tags,ingredients,countries_tags,images
0,00003100,"[""en:farming-products"", ""en:eggs""]",[],"[{""lang"": ""main"", ""text"": ""Hard Boiled Eggs""}, {""lang"": ""fr"", ""text"": ""Hard Boiled Eggs""}]",[],2,,0.0,"[""en:eggs""]","[""fr:eggs"", ""en:e330"", ""fr:sodium-benzoate"", ""fr:nisin-preparation""]","[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":0,""percent_estimate"":100.0,""vegan"":null,""id"":""fr:eggs"",""text"":""Eggs"",""vegetarian"":null,""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":[{""percent_max"":100.0,""percent_min"":25.0,""is_in_taxonomy"":0,""percent_estimate"":62.5,""vegan"":null,""id"":""fr:eggs"",""text"":""Eggs"",""vegetarian"":null,""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":null,""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null},{""percent_max"":50.0,""percent_min"":0.0,""is_in_taxonomy"":1,""percent_estimate"":18.75,""vegan"":""yes"",""id"":""en:e330"",""text"":""Citric Acid"",""vegetarian"":""yes"",""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":null,""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}...","[""en:france""]","[{""key"": ""front"", ""imgid"": 1, ""rev"": 3, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": {""h"": 200, ""w"": 150}, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 2666, ""w"": 2000}}, ""uploaded_t"": null, ""uploader"": null}, {""key"": ""nutrition_fr"", ""imgid"": 3, ""rev"": 18, ""sizes"": {""100"": {""h"": 100, ""w"": 85}, ""200"": {""h"": 200, ""w"": 170}, ""400"": {""h"": 400, ""w"": 340}, ""full"": {""h"": 785, ""w"": 668}}, ""uploaded_t"": null, ""uploader"": null}, {""key"": ""1"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 2666, ""w"": 2000}}, ""uploaded_t"": 1415119256, ""uploader"": ""openfoodfacts-contributors""}, {""key"": ""ingredients_en"", ""imgid"": 3, ""rev"": 22, ""sizes"": {""100"": {""h"": 16, ""w"": 100}, ""200"": {""h"": 31, ""w"": 200}, ""400"": {""h"": 63, ""w"": 400}, ""full"": {""h"": 98, ""w"": 624}}, ""uploaded_t"": null, ""uploader"": null}, {""key"": ""2"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 70}, ""200"": null, ""400"": {""h"": 400, ""w"": 278}, ""full"": {""h"": 1002,..."
1,0011110797698,"[""en:farming-products"", ""en:eggs"", ""en:undefined""]",,"[{""lang"": ""main"", ""text"": ""Natural Grade Aa Large Brown Eggs""}, {""lang"": ""en"", ""text"": ""Natural Grade Aa Large Brown Eggs""}]",[],50 g,g,50.0,[],"[""en:large-brown-eggs""]","[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":0,""percent_estimate"":100.0,""vegan"":null,""id"":""en:large-brown-eggs"",""text"":""LARGE BROWN EGGS"",""vegetarian"":null,""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":null,""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]","[""en:united-states""]","[{""key"": ""1"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 45}, ""200"": null, ""400"": {""h"": 400, ""w"": 178}, ""full"": {""h"": 1200, ""w"": 534}}, ""uploaded_t"": 1626629587, ""uploader"": ""kiliweb""}, {""key"": ""5"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 4000, ""w"": 3000}}, ""uploaded_t"": 1730566573, ""uploader"": ""mcnerneyd""}, {""key"": ""3"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 83}, ""200"": null, ""400"": {""h"": 400, ""w"": 333}, ""full"": {""h"": 600, ""w"": 500}}, ""uploaded_t"": 1649794716, ""uploader"": ""foodvisor""}, {""key"": ""4"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 4000, ""w"": 3000}}, ""uploaded_t"": 1730566511, ""uploader"": ""mcnerneyd""}, {""key"": ""2"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 43}, ""200"": null, ""400"": {""h"": 400, ""w"": 172}, ""full"": {""h"": 1200, ""w"": 515}}, ""uploaded_t"": 162662958..."
2,0011110806543,"[""en:farming-products"", ""en:eggs""]",,"[{""lang"": ""main"", ""text"": ""100% Egg Whites""}, {""lang"": ""en"", ""text"": ""100% Egg Whites""}]",[],,,,"[""en:eggs""]","[""en:egg-white"", ""en:egg""]","[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:egg-white"",""text"":""egg whites"",""vegetarian"":""yes"",""ciqual_food_code"":""22001"",""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-organic-code0"",""processing"":null,""labels"":""en:organic"",""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]","[""en:united-states""]",[]
3,0011110828897,"[""en:farming-products"", ""en:eggs""]",,"[{""lang"": ""main"", ""text"": ""Kroger, break-free, real egg product""}, {""lang"": ""en"", ""text"": ""Kroger, break-free, real egg product""}]",[],,,,"[""en:eggs""]","[""en:egg-white"", ""en:egg"", ""en:contains-1-and-less-of-the-following"", ""en:e415"", ""en:salt"", ""en:onion"", ""en:vegetable"", ""en:root-vegetable"", ""en:onion-family-vegetable"", ""en:natural-flavouring"", ""en:flavouring"", ""en:colour"", ""en:vitamins"", ""en:minerals"", ""en:iron"", ""en:d-alpha-tocopheryl-acetate"", ""en:vitamin-e"", ""en:zinc-sulfate"", ""en:zinc"", ""en:calcium-pantothenate"", ""en:pantothenic-acid"", ""en:vitamin-b12"", ""en:e101"", ""en:thiamin-mononitrate"", ""en:thiamin"", ""en:pyridoxine-hydrochloride"", ""en:vitamin-b6"", ""en:folic-acid"", ""en:folate"", ""en:biotin"", ""en:cholecalciferol"", ""en:vitamin-d"", ""en:e412"", ""en:includes-beta-carotene"", ""en:e516"", ""en:ferric-orthophosphate""]","[{""percent_max"":99.0,""percent_min"":99.0,""is_in_taxonomy"":1,""percent_estimate"":99.0,""vegan"":""no"",""id"":""en:egg-white"",""text"":""Egg whites"",""vegetarian"":""yes"",""ciqual_food_code"":""22001"",""percent"":99.0,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null},{""percent_max"":1.0,""percent_min"":0.0,""is_in_taxonomy"":0,""percent_estimate"":0.5,""vegan"":null,""id"":""en:contains-1-and-less-of-the-following"",""text"":""contains 1% and less of the following"",""vegetarian"":null,""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":[{""percent_max"":1.0,""percent_min"":0.0,""is_in_taxonomy"":1,""percent_estimate"":0.5,""vegan"":""yes"",""id"":""en:e412"",""text"":""guar gum"",""vegetarian"":""yes"",""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":null,""processing"":null,""labels"":null,""origins"":null,""ecobalyse_...","[""en:united-states""]",[]
4,0011110846037,"[""en:farming-products"", ""en:eggs""]",,"[{""lang"": ""main"", ""text"": ""100% Liquid Egg Whites""}, {""lang"": ""en"", ""text"": ""100% Liquid Egg Whites""}]",[],,,,"[""en:eggs""]","[""en:liquid-egg-white"", ""en:egg"", ""en:egg-white""]","[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:liquid-egg-white"",""text"":""liquid egg whites"",""vegetarian"":""yes"",""ciqual_food_code"":""22001"",""percent"":100.0,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]","[""en:united-states""]",[]
...,...,...,...,...,...,...,...,...,...,...,...,...,...
7911,8901662042198,"[""en:farming-products"", ""en:eggs"", ""en:egg-powder"", ""en:egg-curry-masala""]",,"[{""lang"": ""main"", ""text"": ""Egg Curry Masala""}, {""lang"": ""en"", ""text"": ""Egg Curry Masala""}]",[],,,,[],,,"[""en:india""]","[{""key"": ""2"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 75, ""w"": 100}, ""200"": null, ""400"": {""h"": 300, ""w"": 400}, ""full"": {""h"": 3120, ""w"": 4160}}, ""uploaded_t"": 1750132645, ""uploader"": ""yashpowar4444""}, {""key"": ""1"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 4160, ""w"": 3120}}, ""uploaded_t"": 1750132507, ""uploader"": ""yashpowar4444""}, {""key"": ""3"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 4160, ""w"": 3120}}, ""uploaded_t"": 1750132741, ""uploader"": ""yashpowar4444""}, {""key"": ""ingredients_en"", ""imgid"": 2, ""rev"": 5, ""sizes"": {""100"": {""h"": 75, ""w"": 100}, ""200"": {""h"": 150, ""w"": 200}, ""400"": {""h"": 300, ""w"": 400}, ""full"": {""h"": 3120, ""w"": 4160}}, ""uploaded_t"": null, ""uploader"": null}, {""key"": ""front_en"", ""imgid"": 1, ""rev"": 3, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": {""h"": 200, ""w"": 150}, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h..."
7912,4620060200279,"[""en:farming-products"", ""en:eggs"", ""en:chicken-eggs""]",[],"[{""lang"": ""main"", ""text"": ""\u042f\u0439\u0446\u043e \u043a\u0443\u0440\u0438\u043d\u043e\u0435 \u043f\u0438\u0449\u0435\u0432\u043e\u0435 \u0441\u0442\u043e\u043b\u043e\u0432\u043e\u0435 C1""}, {""lang"": ""ru"", ""text"": ""\u042f\u0439\u0446\u043e \u043a\u0443\u0440\u0438\u043d\u043e\u0435 \u043f\u0438\u0449\u0435\u0432\u043e\u0435 \u0441\u0442\u043e\u043b\u043e\u0432\u043e\u0435 C1""}]",[],10 шт,,0.0,[],,,"[""en:russia""]","[{""key"": ""1"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 4624, ""w"": 3468}}, ""uploaded_t"": 1750249152, ""uploader"": ""food-facts-hound""}, {""key"": ""front_ru"", ""imgid"": 1, ""rev"": 4, ""sizes"": {""100"": {""h"": 75, ""w"": 100}, ""200"": {""h"": 150, ""w"": 200}, ""400"": {""h"": 300, ""w"": 400}, ""full"": {""h"": 3468, ""w"": 4624}}, ""uploaded_t"": null, ""uploader"": null}]"
7913,34509712,"[""en:farming-products"", ""en:eggs"", ""en:chicken-eggs"", ""en:barn-chicken-eggs""]",,[],[],,,,[],,,"[""en:france""]",[]
7914,8995656320063,"[""en:farming-products"", ""en:eggs"", ""en:hard-boiled-egg"", ""en:egg-white-cooked""]",[],"[{""lang"": ""main"", ""text"": ""White Eggs""}, {""lang"": ""en"", ""text"": ""White Eggs""}]",[],,,,[],,,"[""en:india""]","[{""key"": ""1"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 75, ""w"": 100}, ""200"": null, ""400"": {""h"": 300, ""w"": 400}, ""full"": {""h"": 3072, ""w"": 4096}}, ""uploaded_t"": 1750353516, ""uploader"": ""smoothie-app""}, {""key"": ""2"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 4096, ""w"": 3072}}, ""uploaded_t"": 1750353549, ""uploader"": ""smoothie-app""}, {""key"": ""4"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 4096, ""w"": 3072}}, ""uploaded_t"": 1750353615, ""uploader"": ""smoothie-app""}, {""key"": ""3"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 4096, ""w"": 3072}}, ""uploaded_t"": 1750353594, ""uploader"": ""smoothie-app""}, {""key"": ""packaging_en"", ""imgid"": 4, ""rev"": 9, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": {""h"": 200, ""w"": 150}, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 409..."
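The exported CSV and `cols_to_json.txt` are meant to be read back together. The minimal sketch below uses only the standard library and assumes the two files were written by the previous cell. It decodes the listed JSON columns and leaves every other value as a string, so product codes keep their leading zeros (e.g. `00003100`). With pandas, the same result needs `pd.read_csv(..., dtype={'code': str})`; otherwise `code` is parsed as an integer. The function name `load_eggs_csv` and the 10 MB field limit are assumptions.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List


def load_eggs_csv(csv_path, cols_path) -> List[Dict[str, Any]]:
    """Read the exported CSV and decode the JSON-encoded columns listed in cols_path."""
    # dict.fromkeys drops duplicate column names so no cell is decoded twice
    json_cols = list(dict.fromkeys(json.loads(Path(cols_path).read_text(encoding="utf-8"))))
    # Some cells (images, ingredients) are long; raise the csv module's per-field limit
    csv.field_size_limit(max(csv.field_size_limit(), 10_000_000))
    rows = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for col in json_cols:
                cell = row.get(col)
                # pandas writes missing values (NaN/None) as empty strings
                row[col] = json.loads(cell) if cell else None
            rows.append(row)
    return rows
```

For this notebook, the call would be `load_eggs_csv("../data/eggs_from_parquet.csv", "../data/cols_to_json.txt")`.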
