# Introduction

This document summarizes the datasets, methodology, and statistics used to estimate the suffering embodied in boxes of eggs.

We start by importing the complete Open Food Facts database, obtained on 31 March 2025.

From this database, we keep only the columns (goodcol, listed as `minicol` in the cell below) needed to compute the suffering weight, as defined in the code.




In [1]:
import sys
import duckdb
sys.path.append("backend")

import pandas as pd
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_colwidth', 1000)


In [2]:


minicol=['code', # ID
 'categories_tags', # already present
 'labels_tags', # already present
 'product_name', # already present
 'generic_name',
 'quantity',
 'product_quantity_unit',
 'product_quantity',
 'allergens_tags',
 'ingredients_tags',
 'ingredients',
 'countries_tags',
 'images',
 ]

duckdb.execute(f"CREATE OR REPLACE VIEW db_col AS SELECT {','.join(minicol)} FROM 'C:/Users/DELL/Desktop/Data4Good local/food.parquet'")
duckdb.execute("SUMMARIZE db_col").df()

Unnamed: 0,column_name,column_type,min,max,approx_unique,avg,std,q25,q50,q75,count,null_percentage
0,code,VARCHAR,,9999999999999,3493842,,,,,,3864051,0.0
1,categories_tags,VARCHAR[],[],[zh:黑芝麻醬],188438,,,,,,3864051,53.46
2,labels_tags,VARCHAR[],[],[zh:饼干],106922,,,,,,3864051,60.97
3,product_name,"STRUCT(lang VARCHAR, ""text"" VARCHAR)[]",[],"[{'lang': zh, 'text': 風味發酵乳}]",2517162,,,,,,3864051,0.0
4,generic_name,"STRUCT(lang VARCHAR, ""text"" VARCHAR)[]",[],"[{'lang': zh, 'text': 鹽水香蕉蕾}]",122360,,,,,,3864051,0.0
5,quantity,VARCHAR,,😐,64588,,,,,,3864051,59.46
6,product_quantity_unit,VARCHAR,%,mmol/l,4,,,,,,3864051,73.96
7,product_quantity,VARCHAR,0,999999999999999999,7534,,,,,,3864051,67.81
8,allergens_tags,VARCHAR[],[],"[zh:乳制品, zh:小麦, zh:豆制品]",13530,,,,,,3864051,0.69
9,ingredients_tags,VARCHAR[],[],"[zu:salt, zu:anti-caking-agent, zu:potassium-iodate]",882416,,,,,,3864051,71.01


# Identifying eggs

## Additive approach

The database summary shows about 3.86 million products (3,864,051 rows).

We note in passing that `quantity` and `product_quantity`, which form the basis of our approach, will need substantial cleaning.
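
To give an idea of the cleaning involved, here is a minimal sketch of how an egg count might be extracted from free-text `quantity` values such as `6 Eggs`, `1 dozen`, or `600 g`, which all appear in the samples further down. The helper `parse_egg_count` and its rules are hypothetical, not part of the project code, and cover only the simplest cases.

```python
import re

# Hypothetical helper, not part of the original notebook: extracts an egg
# count from a free-text `quantity` value when one is stated unambiguously.
DOZEN = 12
WEIGHT_UNITS = re.compile(r"\d\s*(g|kg|ml|l|oz|lb)\b", re.IGNORECASE)

def parse_egg_count(quantity):
    """Return the number of eggs stated in `quantity`, or None if unclear."""
    if quantity is None:
        return None
    text = quantity.strip().lower()
    # "1 dozen", "2 dozen" -> multiples of 12
    match = re.fullmatch(r"(\d+)\s*dozen", text)
    if match:
        return int(match.group(1)) * DOZEN
    # A weight or volume such as "600 g" gives no direct egg count
    if WEIGHT_UNITS.search(text):
        return None
    # "6", "6 eggs", "10 oeufs" -> leading integer
    match = re.match(r"(\d+)\b", text)
    return int(match.group(1)) if match else None
```

Values that cannot be interpreted return `None` rather than a guess, so they can be counted and reviewed separately.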

Identifying boxes of eggs looks fairly straightforward: under the taxonomic approach, it would suffice to select the products that have `en:chicken-eggs` in `categories_tags`.

How many are there?


In [3]:
duckdb.execute("SELECT COUNT(*) FROM db_col WHERE 'en:chicken-eggs' IN categories_tags").df()


Unnamed: 0,count_star()
0,4321


## Subtractive approach

We assume that the taxonomy is incomplete or imperfect and try a subtractive approach:
rather than taking the items tagged as chicken eggs, we select all eggs and remove everything identified as eggs of other animals, assuming that an egg is a chicken egg by default.

To arbitrate between the two approaches, we compare the number of items this approach yields with the previous count, and sample a few items to check that the result makes sense.
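
The comparison can be illustrated on toy data. The sketch below uses made-up product codes and tags (not taken from the database) and plain Python sets to show which products each approach keeps and where they disagree.

```python
# Toy products (hypothetical codes and tags), each mapped to its categories_tags.
products = {
    "A": ["en:eggs", "en:chicken-eggs"],
    "B": ["en:eggs"],                       # species not tagged: assumed chicken
    "C": ["en:eggs", "en:duck-eggs"],
    "D": ["en:snacks", "en:eggs", "en:chocolate-eggs"],
}
excluded = {"en:duck-eggs", "en:chocolate-eggs", "en:snacks"}

# Additive: keep only products explicitly tagged as chicken eggs.
additive = {code for code, tags in products.items() if "en:chicken-eggs" in tags}

# Subtractive: keep every egg product that carries none of the excluded tags.
subtractive = {
    code for code, tags in products.items()
    if "en:eggs" in tags and excluded.isdisjoint(tags)
}

only_subtractive = subtractive - additive   # candidates to inspect by hand
only_additive = additive - subtractive      # chicken eggs lost to an exclusion
```

Here `only_subtractive` contains the products that the subtractive approach adds; these are the ones worth sampling to judge whether the default "chicken egg" assumption holds.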


In [4]:
ltags=duckdb.execute("SELECT list_distinct(categories_tags) FROM db_col WHERE 'en:eggs' IN categories_tags").df()
set(x for xs in ltags.iloc[:, 0] for x in xs if x.startswith("en") and x.endswith("-eggs"))


{'en:barn-chicken-eggs',
 'en:boiled-eggs',
 'en:british-free-range-eggs',
 'en:brown-eggs',
 'en:cage-chicken-eggs',
 'en:caged-chicken-eggs',
 'en:century-eggs',
 'en:chicken-eggs',
 'en:chocolate-eggs',
 'en:duck-eggs',
 'en:easter-eggs',
 'en:farming-products-eggs',
 'en:filled-chocolate-eggs',
 'en:fish-and-meat-and-eggs',
 'en:fish-eggs',
 'en:free-range-chicken-eggs',
 'en:free-range-duck-eggs',
 'en:free-range-large-eggs',
 'en:free-range-organic-large-chicken-eggs',
 'en:fresh-chicken-eggs',
 'en:fresh-eggs',
 'en:frozen-chicken-eggs',
 'en:frozen-eggs',
 'en:grade-a-eggs',
 'en:grade-aa-eggs',
 'en:hard-cooked-peeled-eggs',
 'en:labeled-eggs',
 'en:large-chicken-eggs',
 'en:large-eggs',
 'en:large-free-run-chicken-eggs',
 'en:large-organic-chicken-eggs',
 'en:large-organic-free-range-chicken-eggs',
 'en:medium-organic-free-range-chicken-eggs',
 'en:organic-chicken-eggs',
 'en:organic-eggs',
 'en:organic-free-range-chicken-eggs',
 'en:organic-large-brown-chicken-eggs',
 'en:pi

With this approach we did not manage to find items such as "ostrich eggs", "guineafowl eggs", etc.
We arrive at the following set of items to exclude:

In [5]:
# pas_poule ("not chicken"): category tags that disqualify a product
pas_poule = {
    'en:chocolate-eggs',
    'en:duck-eggs',
    'en:easter-eggs',
    'en:fish-eggs',
    'en:free-range-duck-eggs',
    'en:quail-eggs',
    'en:raw-quail-eggs',
    'en:savoury-eggs',
    'en:scotch-eggs',
    'en:streamed-eggs',
    'en:meals',
    'en:snacks',
    'en:meats-and-their-products',
    'en:breads',
}


joint="' NOT IN categories_tags AND '".join(list(pas_poule))
request="CREATE OR REPLACE VIEW eggs AS SELECT * FROM db_col WHERE 'en:eggs' IN categories_tags AND '" +\
         joint+"' NOT IN categories_tags"
#print(request)

duckdb.execute(request)
eggs_from_parquet_duckdb=duckdb.execute("FROM eggs").df()
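
The query string above is assembled by joining fragments around the excluded tags. As a more readable variant, the same predicate could be built clause by clause. This is only a sketch: `build_egg_filter` is a hypothetical helper, and the two-tag set in the example is shortened for illustration.

```python
def build_egg_filter(excluded_tags):
    """Build the WHERE clause used for the `eggs` view.

    Tags are interpolated into the SQL text, so they must be trusted
    constants (as here), never user input.
    """
    clauses = ["'en:eggs' IN categories_tags"]
    # sorted() makes the generated SQL reproducible; set order is arbitrary
    clauses += [f"'{tag}' NOT IN categories_tags" for tag in sorted(excluded_tags)]
    return " AND ".join(clauses)

where = build_egg_filter({"en:duck-eggs", "en:chocolate-eggs"})
request = f"CREATE OR REPLACE VIEW eggs AS SELECT * FROM db_col WHERE {where}"
```

Sorting the tags does not change the result of the query; it only keeps the generated text stable from one run to the next, which makes it easier to log or compare.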

In [6]:
eggs_from_parquet_duckdb.sample(50, random_state=10)

Unnamed: 0,code,categories_tags,labels_tags,product_name,generic_name,quantity,product_quantity_unit,product_quantity,allergens_tags,ingredients_tags,ingredients,countries_tags,images
1724,3760165984078,"[en:farming-products, en:eggs, en:chicken-eggs, en:free-range-chicken-eggs]","[en:organic, en:eu-organic, en:fr-bio-10, fr:ab-agriculture-biologique]","[{'lang': 'main', 'text': '6 oeufs bio plein air'}, {'lang': 'fr', 'text': '6 oeufs bio plein air'}]",[],,,,[],,,[en:france],"[{'key': 'front_fr', 'imgid': 1, 'rev': 4, 'sizes': {'100': {'h': 56, 'w': 100}, '200': {'h': 112, 'w': 200}, '400': {'h': 225, 'w': 400}, 'full': {'h': 1152, 'w': 2050}}, 'uploaded_t': None, 'uploader': None}, {'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 56, 'w': 100}, '200': None, '400': {'h': 225, 'w': 400}, 'full': {'h': 1153, 'w': 2050}}, 'uploaded_t': 1546185531, 'uploader': 'kiliweb'}, {'key': 'ingredients_fr', 'imgid': 2, 'rev': 7, 'sizes': {'100': {'h': 56, 'w': 100}, '200': {'h': 112, 'w': 200}, '400': {'h': 225, 'w': 400}, 'full': {'h': 1153, 'w': 2050}}, 'uploaded_t': None, 'uploader': None}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 56, 'w': 100}, '200': None, '400': {'h': 225, 'w': 400}, 'full': {'h': 1152, 'w': 2050}}, 'uploaded_t': 1546185529, 'uploader': 'kiliweb'}]"
4969,8480010200683,"[en:farming-products, en:eggs]","[en:organic, en:eu-organic, en:es-eco-025-na, en:green-dot, en:nutriscore]","[{'lang': 'main', 'text': 'Huevos'}, {'lang': 'es', 'text': 'Huevos'}]",[],,,,[],,,[en:spain],"[{'key': 'nutrition_es', 'imgid': 3, 'rev': 9, 'sizes': {'100': {'h': 78, 'w': 100}, '200': {'h': 156, 'w': 200}, '400': {'h': 313, 'w': 400}, 'full': {'h': 1200, 'w': 1534}}, 'uploaded_t': None, 'uploader': None}, {'key': 'front_es', 'imgid': 1, 'rev': 15, 'sizes': {'100': {'h': 75, 'w': 100}, '200': {'h': 150, 'w': 200}, '400': {'h': 300, 'w': 400}, 'full': {'h': 901, 'w': 1200}}, 'uploaded_t': None, 'uploader': None}, {'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 85, 'w': 100}, '200': None, '400': {'h': 341, 'w': 400}, 'full': {'h': 1200, 'w': 1407}}, 'uploaded_t': 1627922756, 'uploader': 'kiliweb'}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 75}, '200': None, '400': {'h': 400, 'w': 300}, 'full': {'h': 1200, 'w': 901}}, 'uploaded_t': 1627922755, 'uploader': 'kiliweb'}, {'key': '3', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 78, 'w': 100}, '200': None, '400': {'h': 313, 'w': 400}, 'full': {'h': 1200, 'w': 1534}}, 'uploaded_..."
5302,4056489000792,"[en:farming-products, en:eggs, en:chicken-eggs, en:free-range-chicken-eggs]","[en:british-lion-quality, en:rspca-assured]","[{'lang': 'main', 'text': 'Free Range Very Large Eggs'}, {'lang': 'en', 'text': 'Free Range Very Large Eggs'}]",[],6 Eggs,,0.0,[en:eggs],"[en:free-range-eggs, en:egg]","[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:free-range-eggs"",""text"":""Free range eggs"",""vegetarian"":""yes"",""ciqual_food_code"":""22000"",""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]",[en:united-kingdom],"[{'key': 'ingredients_en', 'imgid': 1, 'rev': 5, 'sizes': {'100': {'h': 75, 'w': 100}, '200': {'h': 150, 'w': 200}, '400': {'h': 300, 'w': 400}, 'full': {'h': 3072, 'w': 4096}}, 'uploaded_t': None, 'uploader': None}, {'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 76, 'w': 100}, '200': None, '400': {'h': 302, 'w': 400}, 'full': {'h': 1889, 'w': 2501}}, 'uploaded_t': 1740322635, 'uploader': 'nah'}, {'key': 'nutrition_en', 'imgid': 2, 'rev': 16, 'sizes': {'100': {'h': 90, 'w': 100}, '200': {'h': 180, 'w': 200}, '400': {'h': 360, 'w': 400}, 'full': {'h': 1490, 'w': 1656}}, 'uploaded_t': None, 'uploader': None}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 75, 'w': 100}, '200': None, '400': {'h': 300, 'w': 400}, 'full': {'h': 3072, 'w': 4096}}, 'uploaded_t': 1638527515, 'uploader': 'waistline-app'}, {'key': 'front_en', 'imgid': 1, 'rev': 3, 'sizes': {'100': {'h': 75, 'w': 100}, '200': {'h': 150, 'w': 200}, '400': {'h': 300, 'w': 400}, 'full': {'h': 30..."
4489,8029885009108,"[en:farming-products, en:eggs, en:chicken-eggs]",[],"[{'lang': 'main', 'text': 'Uova sode colorate'}, {'lang': 'it', 'text': 'Uova sode colorate'}]",[],,,,[],,,[en:italy],"[{'key': 'nutrition_it', 'imgid': 2, 'rev': 5, 'sizes': {'100': {'h': 65, 'w': 100}, '200': {'h': 131, 'w': 200}, '400': {'h': 261, 'w': 400}, 'full': {'h': 1200, 'w': 1836}}, 'uploaded_t': None, 'uploader': None}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 90, 'w': 100}, '200': None, '400': {'h': 361, 'w': 400}, 'full': {'h': 1200, 'w': 1331}}, 'uploaded_t': 1616951439, 'uploader': 'kiliweb'}, {'key': 'front_it', 'imgid': 1, 'rev': 3, 'sizes': {'100': {'h': 90, 'w': 100}, '200': {'h': 180, 'w': 200}, '400': {'h': 361, 'w': 400}, 'full': {'h': 1200, 'w': 1331}}, 'uploaded_t': None, 'uploader': None}, {'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 65, 'w': 100}, '200': None, '400': {'h': 261, 'w': 400}, 'full': {'h': 1200, 'w': 1836}}, 'uploaded_t': 1616951440, 'uploader': 'kiliweb'}]"
1831,8856294057280,"[en:farming-products, en:eggs]",[],"[{'lang': 'main', 'text': 'S Pure Hen Egg'}, {'lang': 'en', 'text': 'S Pure Hen Egg'}, {'lang': 'th', 'text': 'เอสเพียว ไข่ไก่'}]","[{'lang': 'main', 'text': 'Hen Egg'}, {'lang': 'en', 'text': 'Hen Egg'}, {'lang': 'th', 'text': 'ไข่ไก่'}]",600 g,g,600.0,[en:eggs],[en:egg],"[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:egg"",""text"":""Egg"",""vegetarian"":""yes"",""ciqual_food_code"":""22000"",""percent"":100.0,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]",[en:thailand],"[{'key': '4', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 28, 'w': 100}, '200': None, '400': {'h': 111, 'w': 400}, 'full': {'h': 1063, 'w': 3833}}, 'uploaded_t': 1550417691, 'uploader': 'n6o6n6'}, {'key': 'nutrition_en', 'imgid': 2, 'rev': 15, 'sizes': {'100': {'h': 34, 'w': 100}, '200': {'h': 68, 'w': 200}, '400': {'h': 135, 'w': 400}, 'full': {'h': 1111, 'w': 3281}}, 'uploaded_t': None, 'uploader': None}, {'key': 'ingredients_th', 'imgid': 1, 'rev': 22, 'sizes': {'100': {'h': 43, 'w': 100}, '200': {'h': 87, 'w': 200}, '400': {'h': 174, 'w': 400}, 'full': {'h': 1626, 'w': 3744}}, 'uploaded_t': None, 'uploader': None}, {'key': 'front_th', 'imgid': 1, 'rev': 12, 'sizes': {'100': {'h': 43, 'w': 100}, '200': {'h': 87, 'w': 200}, '400': {'h': 174, 'w': 400}, 'full': {'h': 1626, 'w': 3744}}, 'uploaded_t': None, 'uploader': None}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 43, 'w': 100}, '200': None, '400': {'h': 174, 'w': 400}, 'full': {'h': 1626, 'w': 3744}}..."
6250,8009655000023,"[en:farming-products, en:eggs]",[],"[{'lang': 'main', 'text': 'Gusto tondo uova'}, {'lang': 'it', 'text': 'Gusto tondo uova'}]",[],,,,[],,,[en:italy],"[{'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 66, 'w': 100}, '200': None, '400': {'h': 263, 'w': 400}, 'full': {'h': 1200, 'w': 1822}}, 'uploaded_t': 1665852146, 'uploader': 'kiliweb'}, {'key': 'front_it', 'imgid': 1, 'rev': 3, 'sizes': {'100': {'h': 66, 'w': 100}, '200': {'h': 132, 'w': 200}, '400': {'h': 263, 'w': 400}, 'full': {'h': 1200, 'w': 1822}}, 'uploaded_t': None, 'uploader': None}]"
3928,8431876288377,"[en:farming-products, en:eggs]","[en:no-preservatives, en:no-colorings, en:nutriscore, en:nutriscore-grade-a]","[{'lang': 'main', 'text': 'Clara de huevo'}, {'lang': 'es', 'text': 'Clara de huevo'}]",[],,,,[en:eggs],"[en:egg-white, en:egg]","[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:egg-white"",""text"":""Clara de huevo"",""vegetarian"":""yes"",""ciqual_food_code"":""22001"",""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]",[en:spain],"[{'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 40}, '200': None, '400': {'h': 400, 'w': 162}, 'full': {'h': 6605, 'w': 2675}}, 'uploaded_t': 1601975469, 'uploader': 'org-carrefour-espana'}, {'key': '3', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 87, 'w': 100}, '200': None, '400': {'h': 347, 'w': 400}, 'full': {'h': 603, 'w': 696}}, 'uploaded_t': 1681290302, 'uploader': 'kiliweb'}, {'key': 'nutrition_es', 'imgid': 3, 'rev': 10, 'sizes': {'100': {'h': 87, 'w': 100}, '200': {'h': 173, 'w': 200}, '400': {'h': 347, 'w': 400}, 'full': {'h': 603, 'w': 696}}, 'uploaded_t': None, 'uploader': None}, {'key': 'front_es', 'imgid': 2, 'rev': 8, 'sizes': {'100': {'h': 100, 'w': 39}, '200': {'h': 200, 'w': 78}, '400': {'h': 400, 'w': 157}, 'full': {'h': 1169, 'w': 458}}, 'uploaded_t': None, 'uploader': None}, {'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 39}, '200': None, '400': {'h': 400, 'w': 157}, 'full': {'h': 1169, 'w': 458}}, 'u..."
7117,3770008965003,"[en:farming-products, en:eggs]","[en:organic, en:eu-organic, en:fr-bio-10]","[{'lang': 'main', 'text': '6 Oeufs moyens'}, {'lang': 'fr', 'text': '6 Oeufs moyens'}]",[],6,,0.0,[en:eggs],[en:egg],"[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:egg"",""text"":""oeufs"",""vegetarian"":""yes"",""ciqual_food_code"":""22000"",""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]",[en:france],"[{'key': 'packaging_fr', 'imgid': 3, 'rev': 7, 'sizes': {'100': {'h': 40, 'w': 100}, '200': {'h': 81, 'w': 200}, '400': {'h': 161, 'w': 400}, 'full': {'h': 493, 'w': 1223}}, 'uploaded_t': None, 'uploader': None}, {'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 33, 'w': 100}, '200': None, '400': {'h': 132, 'w': 400}, 'full': {'h': 699, 'w': 2122}}, 'uploaded_t': 1714411060, 'uploader': 'cedille'}, {'key': 'front_fr', 'imgid': 1, 'rev': 3, 'sizes': {'100': {'h': 68, 'w': 100}, '200': {'h': 136, 'w': 200}, '400': {'h': 273, 'w': 400}, 'full': {'h': 1553, 'w': 2277}}, 'uploaded_t': None, 'uploader': None}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 68, 'w': 100}, '200': None, '400': {'h': 273, 'w': 400}, 'full': {'h': 1553, 'w': 2277}}, 'uploaded_t': 1714411028, 'uploader': 'cedille'}, {'key': 'ingredients_fr', 'imgid': 2, 'rev': 5, 'sizes': {'100': {'h': 33, 'w': 100}, '200': {'h': 66, 'w': 200}, '400': {'h': 132, 'w': 400}, 'full': {'h': 699, 'w':..."
1265,7610029152920,"[en:farming-products, en:eggs, en:chicken-eggs]",[],"[{'lang': 'main', 'text': 'Picknick-EierOeufs de pique-nique'}, {'lang': 'fr', 'text': 'Picknick-EierOeufs de pique-nique'}]",[],10,,0.0,[],,,[en:switzerland],"[{'key': '2', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 50, 'w': 100}, '200': None, '400': {'h': 200, 'w': 400}, 'full': {'h': 1615, 'w': 3225}}, 'uploaded_t': 1555501992, 'uploader': 'eviv-bulgroz'}, {'key': '9', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 35, 'w': 100}, '200': None, '400': {'h': 139, 'w': 400}, 'full': {'h': 331, 'w': 954}}, 'uploaded_t': 1622560359, 'uploader': 'kiliweb'}, {'key': '3', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 64}, '200': None, '400': {'h': 400, 'w': 256}, 'full': {'h': 2071, 'w': 1323}}, 'uploaded_t': 1555502128, 'uploader': 'eviv-bulgroz'}, {'key': '4', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 46, 'w': 100}, '200': None, '400': {'h': 184, 'w': 400}, 'full': {'h': 942, 'w': 2050}}, 'uploaded_t': 1578417257, 'uploader': 'kiliweb'}, {'key': 'nutrition_fr', 'imgid': 9, 'rev': 28, 'sizes': {'100': {'h': 35, 'w': 100}, '200': {'h': 69, 'w': 200}, '400': {'h': 139, 'w': 400}, 'full': {'h': 331, 'w': 954}},..."
5427,715141514711,"[en:farming-products, en:eggs]",[en:kosher],"[{'lang': 'main', 'text': 'Grade AA Large Cage Free White Eggs'}, {'lang': 'en', 'text': 'Grade AA Large Cage Free White Eggs'}]",[],1 dozen,,0.0,[en:eggs],[en:egg],"[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:egg"",""text"":""eggs"",""vegetarian"":""yes"",""ciqual_food_code"":""22000"",""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]",[en:united-states],"[{'key': 'packaging_en', 'imgid': 4, 'rev': 14, 'sizes': {'100': {'h': 29, 'w': 100}, '200': {'h': 58, 'w': 200}, '400': {'h': 117, 'w': 400}, 'full': {'h': 884, 'w': 3024}}, 'uploaded_t': None, 'uploader': None}, {'key': '4', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 29, 'w': 100}, '200': None, '400': {'h': 117, 'w': 400}, 'full': {'h': 884, 'w': 3024}}, 'uploaded_t': 1705867687, 'uploader': 'johnhansen'}, {'key': 'nutrition_en', 'imgid': 2, 'rev': 10, 'sizes': {'100': {'h': 57, 'w': 100}, '200': {'h': 114, 'w': 200}, '400': {'h': 228, 'w': 400}, 'full': {'h': 333, 'w': 583}}, 'uploaded_t': None, 'uploader': None}, {'key': '1', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 100, 'w': 44}, '200': None, '400': {'h': 400, 'w': 175}, 'full': {'h': 1200, 'w': 524}}, 'uploaded_t': 1643134008, 'uploader': 'kiliweb'}, {'key': '3', 'imgid': None, 'rev': None, 'sizes': {'100': {'h': 69, 'w': 100}, '200': None, '400': {'h': 274, 'w': 400}, 'full': {'h': 2072, 'w': 3024}}, 'uploa..."


Conversion of the DuckDB syntax to JSON

In [7]:
import json
import numpy as np

cols_to_json = []

for col in eggs_from_parquet_duckdb.columns:
    sample = eggs_from_parquet_duckdb[col].dropna().head(20)
    if sample.apply(lambda x: isinstance(x, (list, dict, np.ndarray))).any():
        cols_to_json.append(col)

cols_to_json

cols_to_json_for_import = cols_to_json + ['ingredients']

In [8]:
eggs_from_parquet = eggs_from_parquet_duckdb.copy()

def ndarray_to_json(arr):
    if isinstance(arr, (list, dict)):
        return json.dumps(arr)
    elif isinstance(arr, np.ndarray):
        return json.dumps(arr.tolist())
    else:
        return arr  # value left unchanged

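# Note: the debug prints below always inspect 'categories_tags', not `col`,
# so only the first iteration shows a change of type (ndarray -> str).
for col in cols_to_json_for_import: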
    print(col)
    print(type(eggs_from_parquet['categories_tags'][0]))
    print(eggs_from_parquet['categories_tags'][0])
    eggs_from_parquet[col] = eggs_from_parquet_duckdb[col].apply(ndarray_to_json)
    print(type(eggs_from_parquet['categories_tags'][0]))
    print(eggs_from_parquet['categories_tags'][0])
    print()


categories_tags
<class 'numpy.ndarray'>
['en:farming-products' 'en:eggs']
<class 'str'>
["en:farming-products", "en:eggs"]

labels_tags
<class 'str'>
["en:farming-products", "en:eggs"]
<class 'str'>
["en:farming-products", "en:eggs"]

product_name
<class 'str'>
["en:farming-products", "en:eggs"]
<class 'str'>
["en:farming-products", "en:eggs"]

generic_name
<class 'str'>
["en:farming-products", "en:eggs"]
<class 'str'>
["en:farming-products", "en:eggs"]

allergens_tags
<class 'str'>
["en:farming-products", "en:eggs"]
<class 'str'>
["en:farming-products", "en:eggs"]

ingredients_tags
<class 'str'>
["en:farming-products", "en:eggs"]
<class 'str'>
["en:farming-products", "en:eggs"]

countries_tags
<class 'str'>
["en:farming-products", "en:eggs"]
<class 'str'>
["en:farming-products", "en:eggs"]

images
<class 'str'>
["en:farming-products", "en:eggs"]
<class 'str'>
["en:farming-products", "en:eggs"]

ingredients
<class 'str'>
["en:farming-products", "en:eggs"]
<class 'str'>
["en:farming-pro

In [9]:
with open("../data/cols_to_json.txt", "w") as f:
    json.dump(cols_to_json_for_import, f)

eggs_from_parquet.to_csv("../data/eggs_from_parquet.csv", index=False)
eggs_from_parquet

Unnamed: 0,code,categories_tags,labels_tags,product_name,generic_name,quantity,product_quantity_unit,product_quantity,allergens_tags,ingredients_tags,ingredients,countries_tags,images
0,00003100,"[""en:farming-products"", ""en:eggs""]",[],"[{""lang"": ""main"", ""text"": ""Hard Boiled Eggs""}, {""lang"": ""fr"", ""text"": ""Hard Boiled Eggs""}]",[],2,,0.0,"[""en:eggs""]","[""fr:eggs"", ""en:e330"", ""fr:sodium-benzoate"", ""fr:nisin-preparation""]","[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":0,""percent_estimate"":100.0,""vegan"":null,""id"":""fr:eggs"",""text"":""Eggs"",""vegetarian"":null,""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":[{""percent_max"":100.0,""percent_min"":25.0,""is_in_taxonomy"":0,""percent_estimate"":62.5,""vegan"":null,""id"":""fr:eggs"",""text"":""Eggs"",""vegetarian"":null,""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":null,""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null},{""percent_max"":50.0,""percent_min"":0.0,""is_in_taxonomy"":1,""percent_estimate"":18.75,""vegan"":""yes"",""id"":""en:e330"",""text"":""Citric Acid"",""vegetarian"":""yes"",""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":null,""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}...","[""en:france""]","[{""key"": ""front"", ""imgid"": 1, ""rev"": 3, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": {""h"": 200, ""w"": 150}, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 2666, ""w"": 2000}}, ""uploaded_t"": null, ""uploader"": null}, {""key"": ""nutrition_fr"", ""imgid"": 3, ""rev"": 18, ""sizes"": {""100"": {""h"": 100, ""w"": 85}, ""200"": {""h"": 200, ""w"": 170}, ""400"": {""h"": 400, ""w"": 340}, ""full"": {""h"": 785, ""w"": 668}}, ""uploaded_t"": null, ""uploader"": null}, {""key"": ""1"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 
75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 2666, ""w"": 2000}}, ""uploaded_t"": 1415119256, ""uploader"": ""openfoodfacts-contributors""}, {""key"": ""ingredients_en"", ""imgid"": 3, ""rev"": 22, ""sizes"": {""100"": {""h"": 16, ""w"": 100}, ""200"": {""h"": 31, ""w"": 200}, ""400"": {""h"": 63, ""w"": 400}, ""full"": {""h"": 98, ""w"": 624}}, ""uploaded_t"": null, ""uploader"": null}, {""key"": ""2"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 70}, ""200"": null, ""400"": {""h"": 400, ""w"": 278}, ""full"": {""h"": 1002,..."
1,0011110797698,"[""en:farming-products"", ""en:eggs"", ""en:undefined""]",,"[{""lang"": ""main"", ""text"": ""Natural Grade Aa Large Brown Eggs""}, {""lang"": ""en"", ""text"": ""Natural Grade Aa Large Brown Eggs""}]",[],50 g,g,50.0,[],"[""en:large-brown-eggs""]","[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":0,""percent_estimate"":100.0,""vegan"":null,""id"":""en:large-brown-eggs"",""text"":""LARGE BROWN EGGS"",""vegetarian"":null,""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":null,""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]","[""en:united-states""]","[{""key"": ""nutrition_en"", ""imgid"": 2, ""rev"": 6, ""sizes"": {""100"": {""h"": 100, ""w"": 43}, ""200"": {""h"": 200, ""w"": 86}, ""400"": {""h"": 400, ""w"": 172}, ""full"": {""h"": 1200, ""w"": 515}}, ""uploaded_t"": null, ""uploader"": null}, {""key"": ""front_en"", ""imgid"": 1, ""rev"": 4, ""sizes"": {""100"": {""h"": 100, ""w"": 45}, ""200"": {""h"": 200, ""w"": 89}, ""400"": {""h"": 400, ""w"": 178}, ""full"": {""h"": 1200, ""w"": 534}}, ""uploaded_t"": null, ""uploader"": null}, {""key"": ""2"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 43}, ""200"": null, ""400"": {""h"": 400, ""w"": 172}, ""full"": {""h"": 1200, ""w"": 515}}, ""uploaded_t"": 1626629588, ""uploader"": ""kiliweb""}, {""key"": ""3"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 83}, ""200"": null, ""400"": {""h"": 400, ""w"": 333}, ""full"": {""h"": 600, ""w"": 500}}, ""uploaded_t"": 1649794716, ""uploader"": ""foodvisor""}, {""key"": ""ingredients_en"", ""imgid"": 4, ""rev"": 10, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": {""h"": 200, ""w"": 150}, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 4000, ""w""..."
2,0011110806543,"[""en:farming-products"", ""en:eggs""]",,"[{""lang"": ""main"", ""text"": ""100% Egg Whites""}, {""lang"": ""en"", ""text"": ""100% Egg Whites""}]",[],,,,"[""en:eggs""]","[""en:egg-white"", ""en:egg""]","[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:egg-white"",""text"":""egg whites"",""vegetarian"":""yes"",""ciqual_food_code"":""22001"",""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-organic-code0"",""processing"":null,""labels"":""en:organic"",""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]","[""en:united-states""]",[]
3,0011110828897,"[""en:farming-products"", ""en:eggs""]",,"[{""lang"": ""main"", ""text"": ""Kroger, break-free, real egg product""}, {""lang"": ""en"", ""text"": ""Kroger, break-free, real egg product""}]",[],,,,"[""en:eggs""]","[""en:egg-white"", ""en:egg"", ""en:contains-1-and-less-of-the-following"", ""en:e415"", ""en:salt"", ""en:onion"", ""en:vegetable"", ""en:root-vegetable"", ""en:onion-family-vegetable"", ""en:natural-flavouring"", ""en:flavouring"", ""en:colour"", ""en:vitamins"", ""en:minerals"", ""en:iron"", ""en:d-alpha-tocopheryl-acetate"", ""en:vitamin-e"", ""en:zinc-sulfate"", ""en:zinc"", ""en:calcium-pantothenate"", ""en:pantothenic-acid"", ""en:vitamin-b12"", ""en:e101"", ""en:thiamin-mononitrate"", ""en:thiamin"", ""en:pyridoxine-hydrochloride"", ""en:vitamin-b6"", ""en:folic-acid"", ""en:folate"", ""en:biotin"", ""en:cholecalciferol"", ""en:vitamin-d"", ""en:e412"", ""en:includes-beta-carotene"", ""en:e516"", ""en:ferric-orthophosphate""]","[{""percent_max"":99.0,""percent_min"":99.0,""is_in_taxonomy"":1,""percent_estimate"":99.0,""vegan"":""no"",""id"":""en:egg-white"",""text"":""Egg whites"",""vegetarian"":""yes"",""ciqual_food_code"":""22001"",""percent"":99.0,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null},{""percent_max"":1.0,""percent_min"":0.0,""is_in_taxonomy"":0,""percent_estimate"":0.5,""vegan"":null,""id"":""en:contains-1-and-less-of-the-following"",""text"":""contains 1% and less of the following"",""vegetarian"":null,""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":[{""percent_max"":1.0,""percent_min"":0.0,""is_in_taxonomy"":1,""percent_estimate"":0.5,""vegan"":""yes"",""id"":""en:e412"",""text"":""guar 
gum"",""vegetarian"":""yes"",""ciqual_food_code"":null,""percent"":null,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":null,""processing"":null,""labels"":null,""origins"":null,""ecobalyse_...","[""en:united-states""]",[]
4,0011110846037,"[""en:farming-products"", ""en:eggs""]",,"[{""lang"": ""main"", ""text"": ""100% Liquid Egg Whites""}, {""lang"": ""en"", ""text"": ""100% Liquid Egg Whites""}]",[],,,,"[""en:eggs""]","[""en:liquid-egg-white"", ""en:egg"", ""en:egg-white""]","[{""percent_max"":100.0,""percent_min"":100.0,""is_in_taxonomy"":1,""percent_estimate"":100.0,""vegan"":""no"",""id"":""en:liquid-egg-white"",""text"":""liquid egg whites"",""vegetarian"":""yes"",""ciqual_food_code"":""22001"",""percent"":100.0,""from_palm_oil"":null,""ingredients"":null,""ecobalyse_code"":""egg-indoor-code3"",""processing"":null,""labels"":null,""origins"":null,""ecobalyse_proxy_code"":null,""quantity"":null,""quantity_g"":null,""ciqual_proxy_food_code"":null}]","[""en:united-states""]",[]
...,...,...,...,...,...,...,...,...,...,...,...,...,...
7645,6287027360032,"[""en:farming-products"", ""en:eggs""]",,"[{""lang"": ""main"", ""text"": ""Rahima Fresh Egg""}, {""lang"": ""en"", ""text"": ""Rahima Fresh Egg""}]",[],,,,[],,,"[""en:saudi-arabia""]","[{""key"": ""front_en"", ""imgid"": 1, ""rev"": 3, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": {""h"": 200, ""w"": 150}, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 1920, ""w"": 1440}}, ""uploaded_t"": null, ""uploader"": null}, {""key"": ""1"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 1920, ""w"": 1440}}, ""uploaded_t"": 1747978006, ""uploader"": ""openfoodfacts-contributors""}]"
7646,6287027360049,"[""en:farming-products"", ""en:eggs""]",,"[{""lang"": ""main"", ""text"": ""Fresh Egg""}, {""lang"": ""en"", ""text"": ""Fresh Egg""}]",[],,,,[],,,"[""en:saudi-arabia""]","[{""key"": ""front_en"", ""imgid"": 1, ""rev"": 3, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": {""h"": 200, ""w"": 150}, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 1920, ""w"": 1440}}, ""uploaded_t"": null, ""uploader"": null}, {""key"": ""1"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 1920, ""w"": 1440}}, ""uploaded_t"": 1747978055, ""uploader"": ""openfoodfacts-contributors""}]"
7647,6287004270057,"[""en:farming-products"", ""en:eggs""]",,"[{""lang"": ""main"", ""text"": ""Maknoon Egg""}, {""lang"": ""en"", ""text"": ""Maknoon Egg""}]",[],,,,[],,,"[""en:saudi-arabia""]","[{""key"": ""front_en"", ""imgid"": 1, ""rev"": 3, ""sizes"": {""100"": {""h"": 75, ""w"": 100}, ""200"": {""h"": 150, ""w"": 200}, ""400"": {""h"": 300, ""w"": 400}, ""full"": {""h"": 1440, ""w"": 1920}}, ""uploaded_t"": null, ""uploader"": null}, {""key"": ""1"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 75, ""w"": 100}, ""200"": null, ""400"": {""h"": 300, ""w"": 400}, ""full"": {""h"": 1440, ""w"": 1920}}, ""uploaded_t"": 1747978104, ""uploader"": ""openfoodfacts-contributors""}]"
7648,6281106110266,"[""en:farming-products"", ""en:eggs""]",,"[{""lang"": ""main"", ""text"": ""Rahima Egg""}, {""lang"": ""en"", ""text"": ""Rahima Egg""}]",[],,,,[],,,"[""en:saudi-arabia""]","[{""key"": ""1"", ""imgid"": null, ""rev"": null, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": null, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 1920, ""w"": 1440}}, ""uploaded_t"": 1747978147, ""uploader"": ""openfoodfacts-contributors""}, {""key"": ""front_en"", ""imgid"": 1, ""rev"": 3, ""sizes"": {""100"": {""h"": 100, ""w"": 75}, ""200"": {""h"": 200, ""w"": 150}, ""400"": {""h"": 400, ""w"": 300}, ""full"": {""h"": 1920, ""w"": 1440}}, ""uploaded_t"": null, ""uploader"": null}]"
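
When the CSV is read back, the JSON-encoded columns are plain strings and need decoding. The sketch below shows one way to do this using only the standard library; the inline `csv_text` and `json_cols` are stand-ins for `../data/eggs_from_parquet.csv` and the list saved in `cols_to_json.txt`, and how the project's import code actually does it is not shown here.

```python
import csv
import io
import json

# Stand-in for ../data/eggs_from_parquet.csv (two columns, one row)
csv_text = 'code,categories_tags\n00003100,"[""en:farming-products"", ""en:eggs""]"\n'
json_cols = ["categories_tags"]  # stand-in for the list saved in cols_to_json.txt

rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    for col in json_cols:
        # Empty cells (missing values) are kept as None rather than decoded
        row[col] = json.loads(row[col]) if row[col] else None
    rows.append(row)
```

`csv.DictReader` keeps `code` as a string, which preserves leading zeros such as in `00003100`. With pandas, passing `dtype={'code': str}` to `read_csv` would serve the same purpose.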


In [10]:
head_db = duckdb.execute("SELECT * FROM 'C:/Users/DELL/Desktop/Data4Good local/food.parquet' LIMIT 5").df()
