GitHub link to the project: [GitHub](https://github.com/QuentinNav/Clustering-de-produits-alimentaires "Clustering de produits alimentaires")

# Part B: Product comparison

Each product is made up of one or more vectors (one word2vec vector per ingredient), and we want to measure how similar two products are. \
We therefore need a way to compare products whose dimensions differ.

To compare two products, we use cosine similarity between their ingredient vectors.

The overall idea is to define a function that compares all products pairwise. Since we have kept about 700,000 products in the dataset for now, execution speed is a major concern.\
We may have to reduce the size of the dataset to reach an acceptable execution time.

## Importing and installing libraries and functions

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [18]:
from compute_similarities import *
from config import data_path

## Loading the data

In [19]:
df_ingredients_word2vec = pd.read_csv(data_path+"ingredients_word2vec.csv", sep="\t").set_index("ingredient")

df= pd.read_csv(data_path+"cleaned_data.csv", sep="\t", low_memory=False)
df['liste_ingredients'] = df['liste_ingredients'].str.strip('[]').str.replace(" ","").str.replace("'","").str.split(',')
df["product_name"]= df["product_name"].astype(str)

df.head()

Unnamed: 0,product_name,ingredients_tags,liste_ingredients,nombre_ingredients_pre_filtre,nombre_ingredients,nombre_ingredients_perdus,part_ingredients_perdus
0,L.casei,"{'en': ['semi-skimmed-milk', 'dairy', 'milk', ...","[semi-skimmed-milk, dairy, milk, sugar, added-...",14,12,2,0.142857
1,Solène céréales poulet,"{'en': ['antioxidant', 'colour', 'tomato', 've...","[antioxidant, colour, tomato, vegetable, mayon...",32,32,0,0.0
2,Crème dessert chocolat,"{'en': ['whole-milk', 'dairy', 'milk', 'sugar'...","[whole-milk, dairy, milk, sugar, added-sugar, ...",10,10,0,0.0
3,Baguette Poitevin,"{'en': [None, 'water', 'salt', 'yeast', 'glute...","[water, salt, yeast, gluten, e300, wheat-flour...",23,14,9,0.391304
4,Suedois saumon,"{'en': [None, 'water', 'rye-flour', 'flour', '...","[water, rye-flour, flour, cereal-flour, sugar,...",42,36,6,0.142857


## Similarity matrix

To speed up the product comparison algorithms, we precompute the matrix of cosine similarities between all ingredients.

Dimensions: 812 x 812 (one row and one column per ingredient)

In [20]:
df_similarities = compute_similarity_matrix(df_ingredients_word2vec)
df_similarities.head()

Unnamed: 0,semi-skimmed-milk,dairy,milk,sugar,added-sugar,disaccharide,lactic-ferments,ferment,microbial-culture,vitamins,...,wholemeal-oat-flour,lactobacillus-bulgaricus,lactobacillus,streptococcus-thermophilus,lactobacillus-acidophilus,bifidus,zinc-sulfate,pork-collagen,spice-or-bell-pepper,superior-quality-durum-wheat-semolina
semi-skimmed-milk,1.0,0.812481,0.856659,0.408533,0.37441,0.395696,0.825005,0.791201,0.761323,0.39925,...,0.200331,0.664569,0.590097,0.636515,0.615569,0.612478,0.287747,0.391179,0.10134,0.359987
dairy,0.812481,1.0,0.922171,0.631504,0.590832,0.624127,0.630645,0.598705,0.574923,0.252504,...,0.278563,0.415509,0.419602,0.405257,0.446888,0.429913,0.049415,0.252374,0.173808,0.30889
milk,0.856659,0.922171,1.0,0.466385,0.417522,0.456121,0.755143,0.738621,0.71919,0.341347,...,0.108866,0.578011,0.571979,0.581873,0.591607,0.563413,0.180897,0.375677,0.129618,0.254378
sugar,0.408533,0.631504,0.466385,1.0,0.989781,0.998432,0.284654,0.287253,0.249729,0.139359,...,0.467385,0.209229,0.249344,0.210427,0.246889,0.229958,-0.099358,0.338123,0.543702,0.444924
added-sugar,0.37441,0.590832,0.417522,0.989781,1.0,0.989594,0.253843,0.269749,0.22336,0.150211,...,0.480627,0.18385,0.239396,0.188048,0.23888,0.217283,-0.115915,0.379745,0.539926,0.396437
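
The implementation of `compute_similarity_matrix` is not shown in this notebook. As a rough illustration of what such a precomputation involves, here is a minimal pure-Python sketch. The names `cosine_similarity`, `similarity_table` and `toy_embeddings` are hypothetical, and the 2-dimensional vectors are made up for the example; they are not the project's word2vec embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors (an assumption)
    return dot / (norm_u * norm_v)

def similarity_table(embeddings):
    """Nested dict: table[a][b] is the cosine similarity between a and b."""
    return {
        a: {b: cosine_similarity(vec_a, vec_b) for b, vec_b in embeddings.items()}
        for a, vec_a in embeddings.items()
    }

# Made-up 2-dimensional vectors, for illustration only.
toy_embeddings = {"milk": [1.0, 0.0], "cream": [1.0, 1.0], "sugar": [0.0, 2.0]}
toy_table = similarity_table(toy_embeddings)
print(round(toy_table["milk"]["cream"], 4))
```

Computing the table once means that every later product comparison reduces to lookups, which matters when hundreds of thousands of products are compared.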


## Functions for computing similarities between products

### Testing product comparison functions based on max pooling and average pooling

* A is the product for which we are looking for similar products
* B is another product of the dataframe that we compare to A

#### Max pooling

In [None]:
df_test = find_similar_products_max_pooling(df,"Crème dessert chocolat")
df_test.sort_values("similarity",ascending=False).head(15)

100%|██████████| 10/10 [00:16<00:00,  1.67s/it]


Unnamed: 0,product_name,liste_ingredients,similarity
200535,Profiteroles,"[dairy, cream, sugar, added-sugar, disaccharid...",1.0
617940,Génoise au cacao fourrée au praliné,"[sugar, added-sugar, disaccharide, whole-milk,...",1.0
621829,Tourte nougat,"[sugar, added-sugar, disaccharide, whole-milk,...",1.0
268399,Gourmet cupcakes,"[icing-sugar, added-sugar, disaccharide, sugar...",1.0
629896,La recova,"[dairy, milk, condensed-milk, sweetened-conden...",1.0
992,Chocolate Bites,"[sugar, added-sugar, disaccharide, wheat-flour...",1.0
621850,Tourte au Nougat,"[whole-milk, dairy, milk, sugar, added-sugar, ...",1.0
44411,Frosted Cookies,"[flour, vegetable-fat, oil-and-fat, vegetable-...",1.0
468910,6 Macarons gourmands,"[sugar, added-sugar, disaccharide, nut, tree-n...",1.0
24056,"Lunds & byerlys, shortcake, lemon & cream","[lemon, fruit, citrus-fruit, sugar, added-suga...",1.0


We observe that products with many ingredients are heavily favoured.\
Indeed, since we keep, for each ingredient of A, the maximum similarity found with any ingredient of B, a product B with many ingredients is more likely to contain a similar ingredient for each ingredient of A.\
The drawback is that B may also contain many extra ingredients unrelated to A without being penalised.

For B to obtain a good score with this method, the ingredients of A must be included in those of B.
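
The asymmetry described above can be reproduced on a toy example. The sketch below is not the project's `find_similar_products_max_pooling`; it assumes that the per-ingredient maxima are averaged, and the function `max_pool_similarity` and the similarity values in `toy_sim` are invented for illustration.

```python
def max_pool_similarity(a, b, sim):
    """Mean, over the ingredients of `a`, of their best match among `b`."""
    best_matches = [max(sim[x][y] for y in b) for x in a]
    return sum(best_matches) / len(best_matches)

# Invented ingredient-to-ingredient similarities (symmetric, 1.0 on the diagonal).
toy_sim = {
    "milk":  {"milk": 1.0, "sugar": 0.4, "cocoa": 0.3, "flour": 0.2},
    "sugar": {"milk": 0.4, "sugar": 1.0, "cocoa": 0.5, "flour": 0.3},
    "cocoa": {"milk": 0.3, "sugar": 0.5, "cocoa": 1.0, "flour": 0.1},
    "flour": {"milk": 0.2, "sugar": 0.3, "cocoa": 0.1, "flour": 1.0},
}
toy_a = ["milk", "sugar"]
toy_b = ["milk", "sugar", "cocoa", "flour"]

# Every ingredient of toy_a is found in toy_b, so the score is perfect
# even though toy_b contains two unrelated ingredients.
print(max_pool_similarity(toy_a, toy_b, toy_sim))
# In the other direction the extra ingredients of toy_b lower the score.
print(max_pool_similarity(toy_b, toy_a, toy_sim))
```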

#### Average Pooling

In [None]:
df_test2 = find_similar_products_avg_pooling(df,"Crème dessert chocolat")
df_test2.sort_values("similarity",ascending=False).head(15)

100%|██████████| 10/10 [01:41<00:00, 10.12s/it]


Unnamed: 0,product_name,liste_ingredients,similarity
409253,Creme fraiche d'Isigny,"[dairy, cream]",0.642045
612056,Freefrom Crème fraîche Lactosefrei,"[cream, dairy]",0.642045
117526,Farmer s sweet whipped butter,"[cream, dairy]",0.642045
167766,Sour Cream,"[cream, dairy]",0.642045
167764,Sour Cream,"[cream, dairy]",0.642045
167763,Sour Cream,"[cream, dairy]",0.642045
589571,Masło osełkowe,"[cream, dairy]",0.642045
589570,Masło extra,"[cream, dairy]",0.642045
27900,Organic half & half,"[cream, dairy]",0.642045
534702,Crème fraîche,"[cream, dairy]",0.642045


Conversely, products with few ingredients seem to be favoured here.\
Since we keep the mean of the similarities between each ingredient of A and all the ingredients of B, the score is penalised whenever the ingredients of A are "distant" from one another, even if B contains them all.

To obtain a good similarity score with this technique, the ingredients of B must be included in those of A.
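
A toy example makes this bias visible. The sketch below is not the project's `find_similar_products_avg_pooling`; `avg_pool_similarity` is a hypothetical helper that averages over all ingredient pairs, and the values in `toy_sim` are invented.

```python
def avg_pool_similarity(a, b, sim):
    """Mean similarity over every (ingredient of a, ingredient of b) pair."""
    pair_scores = [sim[x][y] for x in a for y in b]
    return sum(pair_scores) / len(pair_scores)

# Invented ingredient-to-ingredient similarities (symmetric, 1.0 on the diagonal).
toy_sim = {
    "milk":  {"milk": 1.0, "sugar": 0.4, "cocoa": 0.3},
    "sugar": {"milk": 0.4, "sugar": 1.0, "cocoa": 0.5},
    "cocoa": {"milk": 0.3, "sugar": 0.5, "cocoa": 1.0},
}
product_a = ["milk", "sugar", "cocoa"]

# Comparing A with an identical product does not give 1.0, because the
# cross terms (milk vs sugar, milk vs cocoa, ...) pull the mean down.
score_identical = avg_pool_similarity(product_a, product_a, toy_sim)
# A single-ingredient product close to all of A's ingredients scores higher.
score_short = avg_pool_similarity(product_a, ["sugar"], toy_sim)
print(score_identical, score_short)
```

With these made-up values, the one-ingredient product outranks an exact copy of A, which mirrors the `[cream, dairy]` products dominating the ranking above.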

## Solution 

To fix this problem, we start again from the max-pooling similarity function, but this time we apply it in both directions (A in B, then B in A).

If we test this method on "Chocolate Bites", one of the products that scored 1.0 with max pooling, we see that the second score is much lower, so this method would not have ranked it as high among the similar products.

In [None]:
A = df.iloc[2]["liste_ingredients"]  # Ingredient list of "Crème dessert chocolat"
B = df.loc[df["product_name"]=="Chocolate Bites","liste_ingredients"].iloc[0]  # Ingredient list of "Chocolate Bites"

In [None]:
print('Max-pooling similarity of the ingredients of A in B: ',similarity_max_A_dans_B(A,B))
print('Max-pooling similarity of the ingredients of B in A: ',similarity_max_A_dans_B(B,A))

Max-pooling similarity of the ingredients of A in B:  1.0
Max-pooling similarity of the ingredients of B in A:  0.6262173263633892


In [None]:
dicts_A = [df_similarities[ingredient].to_dict() for ingredient in A] 
print(A)
print(B)
similarities_both_ways(A,B,dicts_A)

['whole-milk', 'dairy', 'milk', 'sugar', 'added-sugar', 'disaccharide', 'corn-starch', 'starch', 'cocoa', 'e406']
['sugar', 'added-sugar', 'disaccharide', 'wheat-flour', 'cereal', 'flour', 'wheat', 'cereal-flour', 'palm-kernel-oil', 'oil-and-fat', 'vegetable-oil-and-fat', 'palm-kernel-oil-and-fat', 'whey', 'dairy', 'water', 'egg-yolk', 'egg', 'soya-oil', 'vegetable-oil', 'soya-flour', 'legume', 'soya', 'soya-bean', 'sour-cream', 'cream', 'cocoa-powder', 'cocoa', 'skimmed-milk', 'milk', 'corn-syrup', 'raising-agent', 'corn-starch', 'starch', 'wheat-starch', 'salt', 'e170i', 'e170', 'soya-lecithin', 'e322', 'e322i', 'dextrose', 'monosaccharide', 'glucose', 'e471', 'spice', 'condiment', 'e406', 'e490', 'e491', 'e466', 'natural-and-artificial-flavouring', 'flavouring', 'natural-flavouring', 'artificial-flavouring', 'corn-oil', 'e330', 'barley-malt-flour', 'barley', 'barley-flour', 'e375', 'reduced-iron', 'minerals', 'iron', 'thiamin-mononitrate', 'thiamin', 'e101', 'folic-acid', 'folate', 

(1.0, 0.6262173263633892)

The second score (about 0.63) is clearly lower than the first (1.0).
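
The two-direction idea can be sketched in a few lines. This is an illustration under assumptions, not the project's `similarities_both_ways`: the helper names and the values in `toy_sim` are invented, and the per-ingredient maxima are assumed to be averaged. The final mean of the two directions matches the `mean_similarity` column used in the rest of this notebook.

```python
def max_pool_similarity(a, b, sim):
    """Mean, over the ingredients of `a`, of their best match among `b`."""
    return sum(max(sim[x][y] for y in b) for x in a) / len(a)

def both_ways(a, b, sim):
    """Max pooling of a in b, of b in a, and the mean of the two scores."""
    a_in_b = max_pool_similarity(a, b, sim)
    b_in_a = max_pool_similarity(b, a, sim)
    return a_in_b, b_in_a, (a_in_b + b_in_a) / 2

# Invented ingredient-to-ingredient similarities (symmetric, 1.0 on the diagonal).
toy_sim = {
    "milk":  {"milk": 1.0, "sugar": 0.4, "cocoa": 0.3, "flour": 0.2},
    "sugar": {"milk": 0.4, "sugar": 1.0, "cocoa": 0.5, "flour": 0.3},
    "cocoa": {"milk": 0.3, "sugar": 0.5, "cocoa": 1.0, "flour": 0.1},
    "flour": {"milk": 0.2, "sugar": 0.3, "cocoa": 0.1, "flour": 1.0},
}
short_product = ["milk", "sugar"]
long_product = ["milk", "sugar", "cocoa", "flour"]

score_1, score_2, mean_score = both_ways(short_product, long_product, toy_sim)
print(score_1, score_2, mean_score)
```

A product only reaches a mean of 1.0 when each ingredient list is covered by the other, so long ingredient lists no longer win by default.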

### Same example with the new method

In [None]:
%%time
df_test = find_similar_products_max_pooling_both_ways(df, "Crème dessert chocolat")
df_test.head(10)

100%|██████████| 701084/701084 [36:06<00:00, 323.64it/s]


CPU times: user 34min 9s, sys: 49.1 s, total: 34min 58s
Wall time: 36min 7s


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
502997,Creme dessert artisanale,"[dairy, milk, whole-milk, sugar, added-sugar, ...",0.975463,0.966879,0.971171
355916,Crème dessert,"[dairy, milk, pasteurised-milk, whole-milk, ca...",0.978254,0.958944,0.968599
536327,Délices de lait - Crème dessert Cacao,"[whole-milk, dairy, milk, sugar, added-sugar, ...",0.9808,0.952819,0.96681
440923,Crème dessert au chocolat,"[whole-milk, dairy, milk, sugar, added-sugar, ...",0.975463,0.957325,0.966394
528512,Landliebe Sahne Pudding Dunkle Schokolade,"[whole-milk, dairy, milk, modified-starch, sta...",0.945702,0.98495,0.965326
677269,Ovocný košík s kousky jahod,"[milk, dairy, milk-powder, sugar, added-sugar,...",0.96202,0.966869,0.964444
447631,Lactel max bio chocolat,"[semi-skimmed-milk, dairy, milk, rice-starch, ...",0.969071,0.959748,0.96441
500772,Crèmes dessert chocolat,"[dairy, milk, pasteurised-milk, whole-milk, su...",0.945702,0.976264,0.960983
425199,P'tit Goûter au lait cacao,"[whole-milk, dairy, milk, cane-sugar, added-su...",0.951536,0.966104,0.95882
424994,P'tit Gouter Au Lait chocolat,"[whole-milk, dairy, milk, cane-sugar, added-su...",0.951536,0.966104,0.95882


The execution time is very long: about 36 minutes to score the 701,084 candidate products.

### Optimizing the function

#### V2

In [None]:
%%time
df_test = find_similar_products_max_pooling_both_waysV2(df, "Crème dessert chocolat")
df_test.head(10)

CPU times: user 2min 2s, sys: 2.41 s, total: 2min 5s
Wall time: 2min 7s


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
497444,Creme dessert artisanale,"[dairy, milk, whole-milk, sugar, added-sugar, ...",0.978151,0.968251,0.973201
352550,Crème dessert,"[dairy, milk, pasteurised-milk, whole-milk, ca...",0.980159,0.960495,0.970327
436326,Crème dessert au chocolat,"[whole-milk, dairy, milk, sugar, added-sugar, ...",0.978151,0.960278,0.969215
530200,Délices de lait - Crème dessert Cacao,"[whole-milk, dairy, milk, sugar, added-sugar, ...",0.9833,0.953759,0.96853
442920,Lactel max bio chocolat,"[semi-skimmed-milk, dairy, milk, rice-starch, ...",0.972582,0.963024,0.967803
522517,Landliebe Sahne Pudding Dunkle Schokolade,"[whole-milk, dairy, milk, modified-starch, sta...",0.945112,0.985059,0.965086
668393,Ovocný košík s kousky jahod,"[milk, dairy, milk-powder, sugar, added-sugar,...",0.961873,0.965706,0.96379
495263,Crèmes dessert chocolat,"[dairy, milk, pasteurised-milk, whole-milk, su...",0.945112,0.975642,0.960377
535656,Schoko Pudding,"[whole-milk, dairy, milk, added-sugar, disacch...",0.957248,0.962144,0.959696
420849,P'tit Goûter au lait cacao,"[whole-milk, dairy, milk, cane-sugar, added-su...",0.953359,0.965138,0.959248


#### V3

In [None]:
%%time
df_test =find_similar_products_max_pooling_both_waysV3(df, "Crème dessert chocolat")
df_test.head(10)

CPU times: user 1min 36s, sys: 976 ms, total: 1min 37s
Wall time: 1min 37s


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
497444,Creme dessert artisanale,"[dairy, milk, whole-milk, sugar, added-sugar, ...",0.978151,0.968251,0.973201
352550,Crème dessert,"[dairy, milk, pasteurised-milk, whole-milk, ca...",0.980159,0.960495,0.970327
436326,Crème dessert au chocolat,"[whole-milk, dairy, milk, sugar, added-sugar, ...",0.978151,0.960278,0.969215
530200,Délices de lait - Crème dessert Cacao,"[whole-milk, dairy, milk, sugar, added-sugar, ...",0.9833,0.953759,0.96853
442920,Lactel max bio chocolat,"[semi-skimmed-milk, dairy, milk, rice-starch, ...",0.972582,0.963024,0.967803
522517,Landliebe Sahne Pudding Dunkle Schokolade,"[whole-milk, dairy, milk, modified-starch, sta...",0.945112,0.985059,0.965086
668393,Ovocný košík s kousky jahod,"[milk, dairy, milk-powder, sugar, added-sugar,...",0.961873,0.965706,0.96379
495263,Crèmes dessert chocolat,"[dairy, milk, pasteurised-milk, whole-milk, su...",0.945112,0.975642,0.960377
535656,Schoko Pudding,"[whole-milk, dairy, milk, added-sugar, disacch...",0.957248,0.962144,0.959696
420849,P'tit Goûter au lait cacao,"[whole-milk, dairy, milk, cane-sugar, added-su...",0.953359,0.965138,0.959248


#### V3 with multiprocessing

In [None]:
mp.cpu_count()

2

Since Google Colab only provides 2 CPUs, the gain is modest, but we still save about twenty seconds of wall time (1 min 37 s down to 1 min 14 s).

In [None]:
%%time
df_test =find_similar_products_max_pooling_both_waysV3_multiprocessing(df, "Crème dessert chocolat")
df_test.head(10)

CPU times: user 15.1 s, sys: 2.88 s, total: 18 s
Wall time: 1min 14s


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
497444,Creme dessert artisanale,"[dairy, milk, whole-milk, sugar, added-sugar, ...",0.978151,0.968251,0.973201
352550,Crème dessert,"[dairy, milk, pasteurised-milk, whole-milk, ca...",0.980159,0.960495,0.970327
436326,Crème dessert au chocolat,"[whole-milk, dairy, milk, sugar, added-sugar, ...",0.978151,0.960278,0.969215
530200,Délices de lait - Crème dessert Cacao,"[whole-milk, dairy, milk, sugar, added-sugar, ...",0.9833,0.953759,0.96853
442920,Lactel max bio chocolat,"[semi-skimmed-milk, dairy, milk, rice-starch, ...",0.972582,0.963024,0.967803
522517,Landliebe Sahne Pudding Dunkle Schokolade,"[whole-milk, dairy, milk, modified-starch, sta...",0.945112,0.985059,0.965086
668393,Ovocný košík s kousky jahod,"[milk, dairy, milk-powder, sugar, added-sugar,...",0.961873,0.965706,0.96379
495263,Crèmes dessert chocolat,"[dairy, milk, pasteurised-milk, whole-milk, su...",0.945112,0.975642,0.960377
535656,Schoko Pudding,"[whole-milk, dairy, milk, added-sugar, disacch...",0.957248,0.962144,0.959696
420849,P'tit Goûter au lait cacao,"[whole-milk, dairy, milk, cane-sugar, added-su...",0.953359,0.965138,0.959248
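
The wall times reported in the cells above can be converted into speed-up factors with a little arithmetic. In the sketch below, the label "V1" is only a convenient name for the first both-ways version (36 min 7 s), and `to_seconds` is a hypothetical helper.

```python
def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) wall time into seconds."""
    return 60 * minutes + seconds

# Wall times copied from the %%time outputs on the full dataframe.
wall_times = {
    "V1": to_seconds(36, 7),
    "V2": to_seconds(2, 7),
    "V3": to_seconds(1, 37),
    "V3 + multiprocessing": to_seconds(1, 14),
}

baseline = wall_times["V1"]
speedups = {name: baseline / t for name, t in wall_times.items()}

for name, factor in speedups.items():
    print(f"{name}: {wall_times[name]} s, {factor:.1f}x faster than V1")
```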


## Comparing the 3 methods on a smaller sample of the dataframe

In [None]:
df_temp = df.sample(50000)
product_name = df_temp["product_name"].sample().iloc[0]

In [None]:
import multiprocessing as mp
from itertools import repeat

def find_similar_products_max_pooling_both_waysV3_multiprocessing(df_products, product_name):
    # Ingredient list of the reference product (first row matching the name)
    liste_ingredients = df_products.loc[df_products["product_name"]==product_name, "liste_ingredients"].iloc[0]

    # Remove the tested product from the candidates; copy to avoid SettingWithCopyWarning
    df_temp = df_products[df_products["product_name"]!=product_name].copy()
    # One dict per reference ingredient for fast lookups (relies on the global df_similarities)
    list_dicts = [df_similarities[ingredient].to_dict() for ingredient in liste_ingredients]

    # Score every candidate in parallel, one worker per available CPU
    with mp.Pool(mp.cpu_count()) as pool :
        df_temp[["similarity1","similarity2"]] = pool.starmap(similarities_both_ways, zip(repeat(liste_ingredients),df_temp["liste_ingredients"], repeat(list_dicts)))

    # Final score: mean of the two directional similarities
    df_temp["mean_similarity"] = (df_temp["similarity1"] + df_temp["similarity2"])/2

    return df_temp[["product_name","liste_ingredients","similarity1","similarity2","mean_similarity"]].sort_values("mean_similarity",ascending=False)

In [None]:
%%time 
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV2(df_temp, product_name)
df_test.head(10)

Products similar to: Apple juice cocktail from concentrate
CPU times: user 22.5 s, sys: 330 ms, total: 22.8 s
Wall time: 23 s


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
73879,"Clover valley, juice cocktail, apple","[filtered-water, water, high-fructose-corn-syr...",0.96474,1.0,0.98237
308674,"Bogopa, Apple Juice Cocktail","[water, high-fructose-corn-syrup, added-sugar,...",0.952392,1.0,0.976196
205647,Apple Juice Drink From Concentrate,"[water, high-fructose-corn-syrup, added-sugar,...",0.952392,1.0,0.976196
210870,Apple juice cocktail flavored with other natur...,"[water, high-fructose-corn-syrup, added-sugar,...",0.952392,0.996696,0.974544
162362,"Dreamworks, Fruit Punch","[filtered-water, water, high-fructose-corn-syr...",0.987817,0.952982,0.9704
153943,"Mr. Pure, Juice Drink, Apple Cranberry","[water, high-fructose-corn-syrup, added-sugar,...",0.963618,0.969001,0.966309
250398,Cranberry apple flavored juice cocktail from c...,"[filtered-water, water, high-fructose-corn-syr...",0.949193,0.980084,0.964639
2868,Cranberry Apple Juice Cocktail,"[filtered-water, water, high-fructose-corn-syr...",0.949193,0.978741,0.963967
320108,"Goliath, Nectar Tropical","[water, high-fructose-corn-syrup, added-sugar,...",0.968173,0.956363,0.962268
52471,Cranberry apple juice cocktail from concentrat...,"[filtered-water, water, high-fructose-corn-syr...",0.965086,0.947506,0.956296


In [None]:
%%time 
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3(df_temp, product_name)
df_test.head(10)

Products similar to: Apple juice cocktail from concentrate
CPU times: user 10.2 s, sys: 34.9 ms, total: 10.2 s
Wall time: 10.2 s


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
73879,"Clover valley, juice cocktail, apple","[filtered-water, water, high-fructose-corn-syr...",0.96474,1.0,0.98237
205647,Apple Juice Drink From Concentrate,"[water, high-fructose-corn-syrup, added-sugar,...",0.952392,1.0,0.976196
308674,"Bogopa, Apple Juice Cocktail","[water, high-fructose-corn-syrup, added-sugar,...",0.952392,1.0,0.976196
210870,Apple juice cocktail flavored with other natur...,"[water, high-fructose-corn-syrup, added-sugar,...",0.952392,0.996696,0.974544
162362,"Dreamworks, Fruit Punch","[filtered-water, water, high-fructose-corn-syr...",0.987817,0.952982,0.9704
153943,"Mr. Pure, Juice Drink, Apple Cranberry","[water, high-fructose-corn-syrup, added-sugar,...",0.963618,0.969001,0.966309
250398,Cranberry apple flavored juice cocktail from c...,"[filtered-water, water, high-fructose-corn-syr...",0.949193,0.980084,0.964639
2868,Cranberry Apple Juice Cocktail,"[filtered-water, water, high-fructose-corn-syr...",0.949193,0.978741,0.963967
320108,"Goliath, Nectar Tropical","[water, high-fructose-corn-syrup, added-sugar,...",0.968173,0.956363,0.962268
52471,Cranberry apple juice cocktail from concentrat...,"[filtered-water, water, high-fructose-corn-syr...",0.965086,0.947506,0.956296


In [None]:
%%time 
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df_temp, product_name)
df_test.head(10)

Products similar to: Apple juice cocktail from concentrate
CPU times: user 1.61 s, sys: 673 ms, total: 2.28 s
Wall time: 10.2 s


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
73879,"Clover valley, juice cocktail, apple","[filtered-water, water, high-fructose-corn-syr...",0.96474,1.0,0.98237
205647,Apple Juice Drink From Concentrate,"[water, high-fructose-corn-syrup, added-sugar,...",0.952392,1.0,0.976196
308674,"Bogopa, Apple Juice Cocktail","[water, high-fructose-corn-syrup, added-sugar,...",0.952392,1.0,0.976196
210870,Apple juice cocktail flavored with other natur...,"[water, high-fructose-corn-syrup, added-sugar,...",0.952392,0.996696,0.974544
162362,"Dreamworks, Fruit Punch","[filtered-water, water, high-fructose-corn-syr...",0.987817,0.952982,0.9704
153943,"Mr. Pure, Juice Drink, Apple Cranberry","[water, high-fructose-corn-syrup, added-sugar,...",0.963618,0.969001,0.966309
250398,Cranberry apple flavored juice cocktail from c...,"[filtered-water, water, high-fructose-corn-syr...",0.949193,0.980084,0.964639
2868,Cranberry Apple Juice Cocktail,"[filtered-water, water, high-fructose-corn-syr...",0.949193,0.978741,0.963967
320108,"Goliath, Nectar Tropical","[water, high-fructose-corn-syrup, added-sugar,...",0.968173,0.956363,0.962268
52471,Cranberry apple juice cocktail from concentrat...,"[filtered-water, water, high-fructose-corn-syr...",0.965086,0.947506,0.956296


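On this 50,000-product sample, V3 and V3 with multiprocessing have the same wall time (10.2 s), and both are about twice as fast as V2 (23 s). The benefit of multiprocessing only shows on the full dataframe (1 min 14 s versus 1 min 37 s).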

## Other product examples

Example results on 50,000 randomly sampled products, to reduce execution time.

In [10]:
df_temp = df.sample(50000)
product_name = df_temp["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df_temp, product_name)
df_test.head(15)

Products similar to: Nonfat greek yogurt


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
239073,Greek strained yogurt with cherry,"[cane-sugar, added-sugar, disaccharide, sugar,...",0.993806,0.994424,0.994115
239070,"Fage, total greek strained yogurt, cherry","[cane-sugar, added-sugar, disaccharide, sugar,...",0.993806,0.971998,0.982902
239096,Classic greek strained yogurt,"[cane-sugar, added-sugar, disaccharide, sugar,...",0.991402,0.967126,0.979264
239056,Nonfat greek strained yogurt,"[cane-sugar, added-sugar, disaccharide, sugar,...",0.952307,0.993893,0.9731
89176,Premium blended black cherry authentic greek l...,"[cherry, fruit, cane-sugar, added-sugar, disac...",0.972279,0.97078,0.971529
239053,Greek strained yogurt,"[cane-sugar, added-sugar, disaccharide, sugar,...",0.951567,0.984592,0.96808
486014,Yaourt brassé sur lit de myrtilles,"[blueberry, fruit, berries, water, cane-sugar,...",0.952265,0.981168,0.966717
5327,2% milkfat lowfat greek yogurt,"[sugar, added-sugar, disaccharide, corn-starch...",0.956619,0.969788,0.963204
689726,Greek Nonfat Yogurt With Fruit On The Bottom,"[sugar, added-sugar, disaccharide, orange, fru...",0.962068,0.956451,0.959259
239052,Greek Strained Yogurt,"[cane-sugar, added-sugar, disaccharide, sugar,...",0.951567,0.961618,0.956593


In [None]:
df_temp = df.sample(50000)
product_name = df_temp["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df_temp, product_name)
df_test.head(15)

Products similar to: Chocolate sandwich cookies


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
105714,Oreo cookies 12x10.700 oz,"[sugar, added-sugar, disaccharide, flour, palm...",0.991382,0.978811,0.985096
105797,Oreo cookies hot cocoa 1x10.7 oz,"[sugar, added-sugar, disaccharide, flour, palm...",0.991382,0.978811,0.985096
106027,Chocolate creme sandwich cookies,"[sugar, added-sugar, disaccharide, flour, palm...",0.990543,0.978318,0.98443
105764,Oreo cookies 1x20.000 oz,"[sugar, added-sugar, disaccharide, flour, palm...",0.990543,0.971891,0.981217
62369,"Original chocolate chip cookies, original","[flour, sugar, added-sugar, disaccharide, palm...",0.98945,0.970196,0.979823
20085,"Original chocolate chip cookies, original choc...","[flour, sugar, added-sugar, disaccharide, palm...",0.98945,0.970196,0.979823
74687,Chocolate Chip Cookies,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.968883,0.990347,0.979615
93864,Chocolate Sandwich Cookies,"[sugar, added-sugar, disaccharide, flour, vege...",0.991406,0.967362,0.979384
118070,Chocolate Sandwich Cookies,"[sugar, added-sugar, disaccharide, flour, palm...",0.991406,0.966565,0.978985
133970,Original chocolate sandwich cookie cremes,"[flour, sugar, added-sugar, disaccharide, palm...",0.991406,0.966565,0.978985


In [None]:
df_temp = df.sample(50000)
product_name = df_temp["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df_temp, product_name)
df_test.head(15)

Products similar to: Creamy peanut butter


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
112128,"Skippy, peanut butter","[roasted-peanuts, nut, peanut, sugar, added-su...",1.0,1.0,1.0
271659,"Crunchy peanut butter, crunchy","[roasted-peanuts, nut, peanut, sugar, added-su...",1.0,1.0,1.0
252970,Peanut butter,"[roasted-peanuts, nut, peanut, sugar, added-su...",1.0,1.0,1.0
220845,Peanut butter,"[peanut, nut, sugar, added-sugar, disaccharide...",0.996382,1.0,0.998191
110037,"Creamy peanut butter, creamy","[peanut, nut, sugar, added-sugar, disaccharide...",0.996382,1.0,0.998191
109736,Crunchy,"[peanut, nut, sugar, added-sugar, disaccharide...",0.996382,1.0,0.998191
8001,"Roundy's, peanut butter","[peanut, nut, sugar, added-sugar, disaccharide...",0.988989,1.0,0.994495
605693,skippy,"[peanut, nut, peanut-oil, oil-and-fat, vegetab...",0.996382,0.992438,0.99441
10566,Peanut butter,"[peanut, nut, sugar, added-sugar, disaccharide...",0.996382,0.983148,0.989765
53010,Peanut butter,"[peanut, nut, sugar, added-sugar, disaccharide...",0.996382,0.983148,0.989765


In [None]:
df_temp = df.sample(50000)
product_name = df_temp["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df_temp, product_name)
df_test.head(15)

Products similar to: Haselnusskerne


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
678199,Noisettes Grillées,"[hazelnut, nut, tree-nut]",1.0,1.0,1.0
384237,Noisettes torréfiées en poudre issues de l'agr...,"[hazelnut, nut, tree-nut]",1.0,1.0,1.0
454053,Noisettes grillees sans sel,"[hazelnut, nut, tree-nut]",1.0,1.0,1.0
70313,"Seitenbacher, low carb food hazelnuts","[hazelnut, nut, tree-nut]",1.0,1.0,1.0
390242,Noisettes en poudre,"[hazelnut, nut, tree-nut]",1.0,1.0,1.0
519588,Haselnusskerne gemahlen,"[nut, tree-nut, hazelnut]",1.0,1.0,1.0
376560,Noisette complète en poudre,"[hazelnut, nut, tree-nut]",1.0,1.0,1.0
443122,Purée de noisette,"[hazelnut, nut, tree-nut]",1.0,1.0,1.0
487507,Noisettes décortiquées,"[nut, tree-nut, hazelnut]",1.0,1.0,1.0
643867,Damiano Roasted Hazelnut Butter - Organic,"[hazelnut, nut, tree-nut]",1.0,1.0,1.0


In [None]:
df_temp = df.sample(50000)
product_name = df_temp["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df_temp, product_name)
df_test.head(15)

Products similar to: Restaurant style white corn tortillas chips


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
301741,Tortilla Chips,"[corn, cereal, vegetable-oil, oil-and-fat, veg...",1.0,0.962887,0.981444
653949,Gallette di Mais Integrale Biologiche,"[corn, cereal, corn-oil, oil-and-fat, vegetabl...",0.938672,1.0,0.969336
44849,Fritos Lightly Salted Corn Chips 9.75 Ounce Pl...,"[corn, cereal, corn-oil, oil-and-fat, vegetabl...",0.910829,1.0,0.955415
553853,Maiswaffeln mit Meersalz,"[corn, cereal, corn-oil, oil-and-fat, vegetabl...",0.938672,0.971154,0.954913
516816,Maiswaffeln Meersalz,"[corn, cereal, sea-salt, salt, corn-oil, oil-a...",0.938672,0.971154,0.954913
652360,Gallette Mais e Quinoa,"[corn, cereal, corn-oil, oil-and-fat, vegetabl...",0.938672,0.971154,0.954913
49397,"Golden fluff, popcorn","[vegetable-oil, oil-and-fat, vegetable-oil-and...",0.972848,0.926619,0.949734
85419,"Schnucks, authentic restaurant style tortilla ...","[cereal, corn, vegetable-oil, oil-and-fat, veg...",0.887696,1.0,0.943848
674950,Maiz gigante sabor BBQ,"[corn, cereal, vegetable-oil, oil-and-fat, veg...",0.887696,1.0,0.943848
466426,Pop corn salé,"[corn, cereal, sunflower-oil, oil-and-fat, veg...",0.887886,0.997269,0.942578


In [None]:
df_temp = df.sample(50000)
product_name = df_temp["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df_temp, product_name)
df_test.head(15)

Products similar to: Nature's rancher, ground organic chicken


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
321164,Ground chicken,"[chicken, poultry, e392]",1.0,1.0,1.0
102290,85% lean 15% fat ground turkey,"[turkey, poultry, e392]",0.998805,0.983604,0.991204
69984,Ground Turkey,"[turkey, poultry, e392]",0.998805,0.983604,0.991204
36586,Ground turkey,"[turkey, poultry, e392]",0.998805,0.983604,0.991204
102599,"Jennie-o, ground turkey","[turkey, poultry, e392]",0.998805,0.983604,0.991204
102484,Lean ground turkey,"[turkey, poultry, e392]",0.998805,0.983604,0.991204
40517,"Oven roasted chicken breast, oven roasted","[chicken-breast, poultry, chicken, chicken-mea...",1.0,0.807623,0.903812
225503,"Pine Manor Farms, Extra Lean Ground Chicken","[chicken-breast, poultry, chicken, chicken-meat]",0.79617,0.986675,0.891422
217391,Organic boneless and skinless chicken breast,"[chicken-breast, poultry, chicken, chicken-meat]",0.79617,0.986675,0.891422
196494,"No salt added chicken bone broth, chicken","[chicken-broth, poultry, broth, chicken, poult...",0.891577,0.890993,0.891285


Example results on the whole dataframe:



In [24]:
product_name = df["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df, product_name)
df_test.head(12)

Products similar to: Lowes foods, macaroni & cheese


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
11169,Macaroni & cheese dinner,"[wheat-flour, cereal, flour, wheat, cereal-flo...",1.0,0.99547,0.997735
259516,"Lowes foods, macaroni spirals & cheese","[wheat-flour, cereal, flour, wheat, cereal-flo...",1.0,0.99547,0.997735
10993,Dinosaurs macaroni & cheese dinner',"[wheat-flour, cereal, flour, wheat, cereal-flo...",1.0,0.99547,0.997735
11172,Spiral dinner macaroni & cheese,"[wheat-flour, cereal, flour, wheat, cereal-flo...",1.0,0.99547,0.997735
10580,Macaroni & cheese dinner,"[wheat-flour, cereal, flour, wheat, cereal-flo...",1.0,0.99547,0.997735
247591,Macaroni and cheese dinner,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.998796,0.995323,0.99706
246448,Macaroni and cheese dinner,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.998796,0.995323,0.99706
195986,"Market pantry, macaroni & cheese dinner","[wheat-flour, cereal, flour, wheat, cereal-flo...",0.998796,0.995323,0.99706
6457,Macaroni & cheese dinner,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.998796,0.994027,0.996411
5921,Mac and cheese,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.998796,0.994027,0.996411


In [26]:
product_name = df["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df, product_name)
df_test.head(12)

Products similar to: Lemppari


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
596836,Flatbread Street Food,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.976443,0.961098,0.968771
346115,Stenovns Ciabatta Stykker,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.924376,0.998435,0.961405
596688,Reilu täysjyvä,"[whole-wheat-flour, cereal, flour, wheat, cere...",0.943876,0.97622,0.960048
322985,pain spécial,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.951694,0.965241,0.958467
450540,Pain spécial,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.951694,0.965241,0.958467
596689,Reilu,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.921169,0.99537,0.958269
408430,Boule Tranchée Complète,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.948029,0.96487,0.956449
391117,Demi baguettes précuite complètes,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.943125,0.958853,0.950989
653834,Mollete estilo andaluz,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.930122,0.97004,0.950081
596817,Vehnä paahto,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.921169,0.97886,0.950014


In [27]:
product_name = df["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df, product_name)
df_test.head(12)

Products similar to: Tomato Ketchup


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
675646,Ketchup à la tomate,"[tomato, vegetable, vinegar, sugar, added-suga...",1.0,1.0,1.0
675766,Hot Ketchup,"[tomato, vegetable, vinegar, sugar, added-suga...",1.0,1.0,1.0
478925,Ketchup à la tomate,"[tomato, vegetable, vinegar, sugar, added-suga...",1.0,1.0,1.0
478937,Tomato ketchup,"[tomato, vegetable, vinegar, sugar, added-suga...",1.0,1.0,1.0
220,Tomato Ketchup Heinz Ouverture En Bas,"[tomato, vegetable, vinegar, sugar, added-suga...",1.0,1.0,1.0
675564,Ketchup à la tomate,"[tomato, vegetable, vinegar, sugar, added-suga...",1.0,1.0,1.0
675701,Heinz Tomato Ketchup,"[tomato, vegetable, vinegar, sugar, added-suga...",1.0,1.0,1.0
560602,Emblématique ketchup HEINZ,"[tomato, vegetable, vinegar, sugar, added-suga...",1.0,1.0,1.0
647777,Tomato Ketchup (offre Découverte),"[tomato, vegetable, vinegar, sugar, added-suga...",1.0,1.0,1.0
675645,Tomato ketchup,"[tomato, vegetable, vinegar, sugar, added-suga...",1.0,1.0,1.0


In [28]:
product_name = df["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df, product_name)
df_test.head(12)

Products similar to: Mor Braz Bio Blonde (5%)


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
578098,Bière belge Victoria blonde,"[water, hops, plant, cereal, yeast]",1.0,1.0,1.0
489739,Briarde - Ambrée,"[cereal, water, hops, plant, yeast]",1.0,1.0,1.0
501947,On the top,"[water, malt, cereal, wheat, hops, plant, yeast]",1.0,0.969785,0.984892
568604,Hoppel Hammer,"[malt, cereal, wheat, hops, plant, water, yeast]",1.0,0.969785,0.984892
539490,Urstrom,"[water, malt, cereal, yeast, hops, plant]",1.0,0.967949,0.983974
500916,L'Eurélienne Blanche,"[water, malt, cereal, hops, plant, yeast]",1.0,0.967949,0.983974
498236,Lager des étoiles,"[water, malt, cereal, hops, plant, yeast]",1.0,0.967949,0.983974
498237,Gens de la Lune,"[water, malt, cereal, hops, plant, yeast]",1.0,0.967949,0.983974
597438,Terapia Platin - Bere albă nefiltrată,"[water, hops, plant, malt, cereal, yeast]",1.0,0.967949,0.983974
485111,Gallia Brut IPA,"[water, malt, cereal, hops, plant, yeast]",1.0,0.967949,0.983974


In [29]:
product_name = df["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df, product_name)
df_test.head(12)

Products similar to: Légumes vapeur assaisonnés - Trio de haricots et poivrons


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
472770,Légumes vapeur assaisonnés - Sélection de 4 lé...,"[garden-peas, legume, pea, green-peas, green-b...",0.977088,0.983856,0.980472
394991,Légumes Vapeur haricots beurre et plat et poiv...,"[legume, green-bean, vegetable, bell-pepper, r...",0.924962,1.0,0.962481
457723,Légumes Méditerranéens,"[water, extra-virgin-olive-oil, oil-and-fat, v...",0.915054,1.0,0.957527
394992,"Légumes Vapeur petits pois, haricots verts et ...","[green-bean, legume, pea, water, extra-virgin-...",0.914918,0.998499,0.956708
348052,Mélange de légumes et pommes de terre surgelé,"[vegetable, root-vegetable, carrot, legume, gr...",0.939927,0.955844,0.947885
404714,Purée de Céleris - Surgelé,"[celery, vegetable, water, butterfat, dairy, o...",0.89561,1.0,0.947805
374229,Les légumes à la Printanière,"[vegetable, root-vegetable, onion, butterfat, ...",0.927186,0.966873,0.94703
394990,"Légumes Vapeur chou fleur, chou romanesco, bro...","[vegetable, cauliflower, root-vegetable, carro...",0.901705,0.986765,0.944235
363015,Poêlée ratatouille cuisinée,"[vegetable, water, tomato-concentrate, tomato,...",0.901893,0.981897,0.941895
388179,Petits Mélanges vapeur,"[vegetable, broccoli, cauliflower, water, extr...",0.894259,0.987462,0.940861


In [33]:
product_name = df["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df, product_name)
df_test.head(12)

Products similar to: Chicken Breast Nuggets With Rib Meat


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
288073,Chicken Breast Patties,"[water, modified-starch, starch, salt, sugar, ...",0.992367,0.999187,0.995777
288074,Breaded full cooked chicken breast tenders wit...,"[water, modified-starch, starch, salt, sodium,...",0.981804,0.99914,0.990472
288072,Breaded Fully Cooked Chicken Breast Rings With...,"[water, modified-starch, starch, salt, sodium,...",0.981804,0.99914,0.990472
315681,Fully cooked portioned chicken fillet white me...,"[chicken-breast, poultry, chicken, chicken-mea...",0.967513,0.957202,0.962357
210670,"Corn veggie tots, corn","[soya-oil, oil-and-fat, vegetable-oil-and-fat,...",0.967539,0.951326,0.959432
210655,"Cauliflower veggie tots, cauliflower","[cauliflower, vegetable, soya-oil, oil-and-fat...",0.967539,0.949927,0.958733
210656,"Broccoli veggie tots, broccoli","[broccoli, vegetable, soya-oil, oil-and-fat, v...",0.967539,0.948916,0.958227
210671,"Sweet potato & cauliflower veggie tots, sweet ...","[sweet-potato, vegetable, root-vegetable, caul...",0.969926,0.946115,0.958021
158454,Chicken Fried Steak Breading Mix,"[wheat-flour, cereal, flour, wheat, cereal-flo...",0.921436,0.986562,0.953999
316071,Chicken Rings,"[water, coating, reduced-iron, minerals, iron,...",0.967613,0.937969,0.952791


In [31]:
product_name = df["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df, product_name)
df_test.head(12)

Products similar to: Procacci Brothers, Italian Chestnuts


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
379585,Châtaignes entières,"[nut, tree-nut]",1.0,1.0,1.0
354744,"Marrons Entiers Sous Vide, à L'étouffée","[nut, tree-nut]",1.0,1.0,1.0
381317,Marronen,"[nut, tree-nut]",1.0,1.0,1.0
583067,Erdmandel Mehl,"[nut, tree-nut]",1.0,1.0,1.0
230027,"Elizabeth's naturals, raw macadamia nuts","[nut, tree-nut]",1.0,1.0,1.0
381316,Ponthier : Gekochte Maronen,"[nut, tree-nut]",1.0,1.0,1.0
381315,Marrons cuits,"[nut, tree-nut]",1.0,1.0,1.0
502610,Châtaignes entières bio,"[nut, tree-nut]",1.0,1.0,1.0
155675,"Mauna loa, dry roasted macadamia","[nut, tree-nut]",1.0,1.0,1.0
325224,Châtaigne Bouche Rouge G2 CAT 2 BIO France ~5kg,"[nut, tree-nut]",1.0,1.0,1.0
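Several of the result tables above are dominated by products whose ingredient lists are identical, or identical up to ordering (the two beers scoring 1.0 list the same five ingredients in a different order). Since dataset size is the main runtime concern, one possible way to shrink it is to group products by ingredient list and compare each distinct list only once. The sketch below is a minimal illustration in plain Python, not part of the project code; the helper name `group_by_ingredient_list` is hypothetical, and it assumes the similarity score does not depend on ingredient order.

```python
from collections import defaultdict


def group_by_ingredient_list(products):
    """Group product names by ingredient list, ignoring ingredient order.

    `products` is an iterable of (product_name, list_of_ingredients) pairs.
    Hypothetical helper: assumes the similarity score is order-independent.
    """
    groups = defaultdict(list)
    for name, ingredients in products:
        # A sorted tuple is hashable and keeps repeated ingredients, unlike a set
        groups[tuple(sorted(ingredients))].append(name)
    return dict(groups)


# Rows copied from the result tables above
products = [
    ("Châtaignes entières", ["nut", "tree-nut"]),
    ("Marrons cuits", ["nut", "tree-nut"]),
    ("Bière belge Victoria blonde", ["water", "hops", "plant", "cereal", "yeast"]),
    ("Briarde - Ambrée", ["cereal", "water", "hops", "plant", "yeast"]),
    ("Urstrom", ["water", "malt", "cereal", "yeast", "hops", "plant"]),
]

groups = group_by_ingredient_list(products)
for ingredients, names in groups.items():
    print(len(names), ingredients, names)
```

With this grouping, the five sample products collapse to three distinct ingredient lists. Whether the saving is worthwhile on the full dataset depends on how many duplicates it actually contains, which is not measured here.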


In [32]:
product_name = df["product_name"].sample().iloc[0]
print(f"Products similar to: {product_name}")
df_test = find_similar_products_max_pooling_both_waysV3_multiprocessing(df, product_name)
df_test.head(12)

Products similar to: Hot Cocoa With Natural & Artificial Flavors Of Peanut Butter Cup & Fugge


Unnamed: 0,product_name,liste_ingredients,similarity1,similarity2,mean_similarity
126839,Rich & Creamy Hot Cocoa Beverage Mix,"[sugar, added-sugar, disaccharide, corn-syrup-...",1.0,0.989084,0.994542
299781,"Duck Commander, Duck-Cups Cocoa Coffee, Uncle ...","[sugar, added-sugar, disaccharide, cocoa, salt...",0.994241,0.984349,0.989295
80851,Milk chocolate hot cocoa drink mix,"[sugar, added-sugar, disaccharide, corn-syrup-...",0.994241,0.983445,0.988843
32483,Hot Cocoa Drink Mix,"[sugar, added-sugar, disaccharide, corn-syrup-...",0.994241,0.983445,0.988843
84146,"Milk chocolate flavor hot cocoa drink mix, mil...","[sugar, added-sugar, disaccharide, corn-syrup-...",0.994241,0.983445,0.988843
34508,"Milk chocolate hot cocoa drink mix, milk choco...","[sugar, added-sugar, disaccharide, corn-syrup-...",0.994241,0.983445,0.988843
314864,Hot Cocoa,"[sugar, added-sugar, disaccharide, cocoa, skim...",0.994241,0.978067,0.986154
58564,French vanilla cappuccino mix,"[corn-syrup-solids, added-sugar, disaccharide,...",1.0,0.972197,0.986098
132962,Mild coffee and hazelnut flavor cappuccino mix...,"[corn-syrup-solids, added-sugar, disaccharide,...",1.0,0.972197,0.986098
132961,Mild coffee and french vanilla flavor cappucci...,"[corn-syrup-solids, added-sugar, disaccharide,...",1.0,0.972197,0.986098


#### Trial: computing similarities for 100 products on a 10,000-product dataset

In [None]:
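%%time
# Work on a random 10,000-product subset to keep the runtime manageable
df_temp = df.sample(10000)
products_similarities = {}
for i in tqdm(range(100)):
    # Pick one query product at random (the same name can be drawn more than once)
    product_name = df_temp["product_name"].sample().iloc[0]
    # Store {candidate name: mean_similarity} for this query product
    df_similar = find_similar_products_max_pooling_both_waysV3_multiprocessing(df_temp, product_name)
    products_similarities[product_name] = df_similar.set_index("product_name")["mean_similarity"].to_dict()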

100%|██████████| 100/100 [04:06<00:00,  2.46s/it]

CPU times: user 32.4 s, sys: 31.9 s, total: 1min 4s
Wall time: 4min 6s
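To put this timing in perspective, the short calculation below extrapolates the measured wall time to a full all-pairs comparison. It is a back-of-the-envelope estimate only: it assumes the time per query grows linearly with the number of candidate products and that nothing else changes (same machine, same function). It takes the approximate figure of 700,000 products quoted in the introduction.

```python
# Measured above: 100 queries against a 10,000-product subset took 4 min 6 s
wall_time_s = 4 * 60 + 6
n_queries = 100
subset_size = 10_000
full_size = 700_000  # approximate dataset size quoted in the introduction

seconds_per_query = wall_time_s / n_queries

# Querying every product of the 10,000-product subset against that subset
subset_all_pairs_hours = seconds_per_query * subset_size / 3600

# ASSUMPTION: per-query time scales linearly with the number of candidates
full_seconds_per_query = seconds_per_query * full_size / subset_size
full_all_pairs_years = full_seconds_per_query * full_size / (3600 * 24 * 365)

print(f"{seconds_per_query:.2f} s per query on the subset")
print(f"{subset_all_pairs_hours:.1f} h for all pairs on the subset")
print(f"{full_all_pairs_years:.1f} years for all pairs on the full dataset")
```

Under these assumptions, all pairs on the 10,000-product subset would take roughly 6.8 hours, and the full dataset about 3.8 years, which suggests that either a smaller dataset or a faster comparison method is needed.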



