# Recommendation system
The system uses a one-time recommendation approach. For each query product it computes a single 1xK similarity array on demand, so it never builds an NxN similarity matrix. It applies a hard filter on category. It then combines an ingredient-based similarity with a deterministic brand/segment function, weighted 0.8 and 0.2 respectively.
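The sketch below puts the NxN versus 1xK trade-off into numbers. It assumes dense float64 storage, which is only an assumption: a real implementation might use float32 or sparse storage. It uses N = 31056, the size of `df` printed further down. It uses K = 5582, the number of same-category candidates that the ingredient step reports for the example query.

```python
# Back-of-the-envelope memory estimate, assuming dense float64 storage (8 bytes per value)
BYTES_PER_FLOAT64 = 8

N = 31056  # products in df
K = 5582   # same-category candidates for the example query (ingredient step)

full_matrix_bytes = N * N * BYTES_PER_FLOAT64  # every product against every product
one_row_bytes = K * BYTES_PER_FLOAT64          # one query against its candidates

print(f"N x N matrix: {full_matrix_bytes / 1e9:.1f} GB")
print(f"1 x K array:  {one_row_bytes / 1e3:.1f} kB")
```

Under these assumptions the full matrix takes roughly 7.7 GB, while one query row takes about 45 kB. Computing only the query's row, on demand, is what keeps each recommendation cheap.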



In [1]:
import pandas as pd
import numpy as np



**1. Load the dataframe and keep only the columns of interest.**



In [2]:
df = pd.read_csv("brand_categoria.csv", dtype={"code": str})

# keep only the columns of interest, all cast to string
cols = ['code', 'product_name', 'brand_name', 'brand_segment', 'macro_category']
df = df[cols].astype(str)


df.head(20)



,code,product_name,brand_name,brand_segment,macro_category
0,5013965698897,1,1,mass_market,Other
1,88582000939,2,1,mass_market,Other
2,650240025396,zan zusi b b flash 30,1,mass_market,Other
3,3560070084074,deo men marine,1 de carrefour,mass_market,Deodorants
4,6134598000044,savon liquide main life,100 da,mass_market,Hygiene
5,5943044010046,gerocossen,12,mass_market,Other
6,20546472,cien,12,mass_market,Other
7,3700216252688,glycerine vegetale,123gelules,mass_market,Other
8,8859423200212,habino,13000,mass_market,Other
9,6130460000747,,150da,mass_market,Other
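In pandas 2.x and earlier, with the default object-dtype columns, `astype(str)` turns missing values into the literal string `'nan'`. Under the brand rule used later, two products that both lack `brand_name` would then count as the same brand and score 1.0.

The sketch below shows one way to avoid this, assuming missing fields should become empty strings. It reads every column as text and disables NA parsing with `dtype=str` and `keep_default_na=False`, both standard `pd.read_csv` parameters. The two-row CSV is made up for illustration.

```python
import io

import pandas as pd

# Made-up two-row CSV: the second product has no brand_name
csv_text = "code,brand_name,brand_segment\n0012,filorga,luxury\n0034,,mass_market\n"

# Read every column as text; keep_default_na=False stops empty fields from becoming NaN
df_text = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

print(df_text["code"].tolist())        # leading zeros are preserved
print(df_text["brand_name"].tolist())  # the missing brand is kept as an empty string
```

Empty strings still compare equal to each other. The same-brand check would therefore also need a guard, for example skipping the 1.0 score when `query_brand == ''`.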


In [4]:
df_ing = pd.read_csv("ingr_cat.csv", dtype={"code": str})

# cast the text columns to string
text_cols = ['code', 'product_name', 'macro_category', 'inci_name']
df_ing[text_cols] = df_ing[text_cols].astype(str)

# one row per product: its distinct ingredients, sorted and joined into a single string
df_finale = (
    df_ing
    .groupby(['code', 'product_name', 'macro_category'])['inci_name']
    .apply(lambda lst: ", ".join(sorted(set(lst))))
    .reset_index()
    .rename(columns={'inci_name': 'ingredients'})
)

df_finale.head(20)


,code,product_name,macro_category,ingredients
0,14,Süßlupinen Mehl,Other,"Arnica Montana, Avoid Contact With Eyes, Burit..."
1,311,Huile végétale Amande douce,Oils,Oil
2,511,pure collagène marin,Other,"Added Sugarss, Colorings, Cr 6Ad, Croydon, Lac..."
3,561,Huile végétale coco,Oils,Oil
4,587,Macérât huileux Lys,Oils,"Lilium Candidum Flower Extract, Oil"
5,695,olé olé Aloé,Other,"Allantoin, Aloe Barbadensis Leaf Juice, Benzoi..."
6,859,Base lavante,Makeup,"Coco-Glucoside, Decyl Glucoside, Glycerin, Gly..."
7,905,Henné d'Egypte,Other,Lawsonia Inermis Leaf Powder
8,1277,Indian Healing Clay,Other,Natural Calcium Bentonite Clay
9,160332625,Colgate toothpaste,Hygiene,"Cellulose Gum, Citric Acid, Cocamidopropyl Bet..."
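The cell above joins each product's ingredients into one string, and `clean_ingredients` later splits it again on commas. If an individual `inci_name` contains a comma, that round trip cuts it in two. 1,2-Hexanediol is one such INCI name.

The sketch below shows an alternative, assuming the long format of `ingr_cat.csv` (one row per product and ingredient). It normalizes each name on its own and aggregates straight into a set per product. The toy rows are made up.

```python
import pandas as pd

# Made-up long-format table: one row per (product, ingredient), as in ingr_cat.csv
df_long = pd.DataFrame({
    "code": ["1", "1", "1", "2", "2"],
    "inci_name": ["1,2-Hexanediol", "Glycerin", "Aqua", "Glycerin", " glycerin"],
})

# Normalize each name on its own, so a comma inside a name is never split
df_long["ing"] = df_long["inci_name"].str.strip().str.lower()

# One set of distinct ingredients per product, ready for a Jaccard comparison
ingredient_sets = df_long.groupby("code")["ing"].agg(set)
print(ingredient_sets.to_dict())
```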


In [5]:
# codes present in both datasets: makes it easier to pick a valid query product while testing
common_codes = set(df['code']).intersection(set(df_finale['code']))

print(f"Products in df: {len(df)}")
print(f"Products in df_finale: {len(df_finale)}")
print(f"Common products: {len(common_codes)}")
common_codes

Products in df: 31056
Products in df_finale: 15131
Common products: 12304


{'5293796944527',
 '3401351102882',
 '3600531512088',
 '8710908315244',
 '3250391292738',
 '3245678675090',
 '42354840',
 '8001090516152',
 '30173552',
 '7509552844221',
 '4063528017796',
 '4005808925124',
 '5765228838471',
 '3245678562901',
 '8025796003822',
 '7640183490699',
 '8021983810051',
 '2000000000727',
 '5010123726621',
 '5063334005894',
 '4015600862459',
 '3054080045465',
 '3605971307928',
 '3760075070045',
 '3401346674455',
 '4088700011775',
 '20908621',
 '4005900035622',
 '3401560062816',
 '8720181042638',
 '8712561803847',
 '4058172432675',
 '3760099591496',
 '4015000702652',
 '8718951508507',
 '4066447705027',
 '0606345805944',
 '7899846080665',
 '3606000537613',
 '4005808368303',
 '5035832010274',
 '8410412460026',
 '3052505505105',
 '42232216',
 '5900095002789',
 '4017645021082',
 '26023229',
 '0883140040231',
 '20055929',
 '8436045034052',
 '2000000000894',
 '4005900551559',
 '8032755623021',
 '0667557015040',
 '4021457635436',
 '4066447380439',
 '8590031108841',
 ...}

**2. Assigning weights to brand_segment.**

The recommender takes the market segment into account and weights the similarity accordingly.
Products in the same segment score 0.7. Adjacent segments (mass_market/middle, middle/luxury) score 0.5, and mass_market/luxury scores 0.3.

In [6]:
SEGMENT_SIMILARITY = {
    ('middle', 'middle'): 0.7,
    ('mass_market', 'mass_market'): 0.7,
    ('luxury', 'luxury'): 0.7,

    ('mass_market', 'middle'): 0.5,
    ('middle', 'mass_market'): 0.5,

    ('mass_market', 'luxury'): 0.3,
    ('luxury', 'mass_market'): 0.3,

    ('middle', 'luxury'): 0.5,
    ('luxury', 'middle'): 0.5
}
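`SEGMENT_SIMILARITY` lists every pair in both orders, so the two orderings have to be kept in sync by hand. The sketch below is an equivalent lookup keyed on unordered pairs: a `frozenset` of one or two segments. This makes the symmetry hold by construction. `SEGMENT_SIMILARITY_UNORDERED` and `segment_similarity` are hypothetical names, and the scores are copied from the table above.

```python
# Same scores as SEGMENT_SIMILARITY, keyed on unordered pairs of segments
SEGMENT_SIMILARITY_UNORDERED = {
    frozenset({"mass_market"}): 0.7,
    frozenset({"middle"}): 0.7,
    frozenset({"luxury"}): 0.7,
    frozenset({"mass_market", "middle"}): 0.5,
    frozenset({"middle", "luxury"}): 0.5,
    frozenset({"mass_market", "luxury"}): 0.3,
}


def segment_similarity(seg_a, seg_b, default=0.0):
    """Business-rule score for two segments, independent of argument order."""
    return SEGMENT_SIMILARITY_UNORDERED.get(frozenset({seg_a, seg_b}), default)


print(segment_similarity("luxury", "middle"), segment_similarity("middle", "luxury"))
```

A pair of identical segments collapses to a one-element `frozenset`, which is why the same-segment entries have a single member.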


**3. Building the 1xK similarity array.**

For the brand I do not use a cosine similarity. Instead I use a deterministic function based on business rules that account for both brand identity and market segment. A candidate from the same brand scores 1.0. Otherwise the score comes from the segment pair in `SEGMENT_SIMILARITY`, or 0.0 if the pair is not listed.

The system first applies a hard filter on category. This shrinks the search space and keeps the comparison semantically consistent. It then computes a weighted similarity from the ingredients and from the brand/segment rules.
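For illustration, the whole flow described above can be condensed into one function: hard category filter, ingredient Jaccard similarity, brand/segment rule, and 0.8/0.2 weighting. This is a sketch on a made-up four-product catalogue held in a single table. `recommend` and `catalog` are hypothetical names. The notebook itself keeps brand and ingredient data in two dataframes and joins them later.

```python
import pandas as pd

# Subset of the segment table, enough for the toy data
SEGMENT_SCORE = {("luxury", "luxury"): 0.7, ("luxury", "middle"): 0.5, ("middle", "luxury"): 0.5}

# Made-up catalogue: one row per product, ingredients already cleaned into sets
catalog = pd.DataFrame({
    "code": ["q", "a", "b", "c"],
    "macro_category": ["Skin", "Skin", "Skin", "Hair"],
    "brand_name": ["filorga", "filorga", "niod", "filorga"],
    "brand_segment": ["luxury", "luxury", "middle", "luxury"],
    "ingredients": [{"glycerin", "urea"}, {"glycerin"}, {"glycerin", "urea"}, {"glycerin", "urea"}],
})


def jaccard(a, b):
    return len(a & b) / len(a | b) if a and b else 0.0


def recommend(query_code, catalog, w_ing=0.8, w_brand=0.2):
    q = catalog.loc[catalog["code"] == query_code].iloc[0]

    # Hard filter: same category, query excluded -> the K candidates
    mask = (catalog["macro_category"] == q["macro_category"]) & (catalog["code"] != query_code)
    cand = catalog[mask].copy()

    # 1xK ingredient similarity (Jaccard index)
    cand["ing_sim"] = [jaccard(q["ingredients"], s) for s in cand["ingredients"]]

    # 1xK brand/segment similarity: same brand -> 1.0, else the segment-pair score
    cand["brand_sim"] = [
        1.0 if brand == q["brand_name"] else SEGMENT_SCORE.get((q["brand_segment"], seg), 0.0)
        for brand, seg in zip(cand["brand_name"], cand["brand_segment"])
    ]

    cand["final"] = w_ing * cand["ing_sim"] + w_brand * cand["brand_sim"]
    return cand.sort_values("final", ascending=False)[["code", "ing_sim", "brand_sim", "final"]]


print(recommend("q", catalog))
```

Product b scores 0.8 × 1.0 + 0.2 × 0.5 = 0.9. It ranks above a, which shares the brand but scores 0.8 × 0.5 + 0.2 × 1.0 = 0.6. With these weights, ingredient overlap dominates. Product c never enters the ranking because of the category filter.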



In [7]:
def brand_segment_similarity_1xK(
    query_product_id,
    df,
    category_col='macro_category',
    segment_col='brand_segment'
):
    query_product_id = str(query_product_id)

    if query_product_id not in df['code'].values:
        raise ValueError("Product code not found")

    query_row = df[df['code'] == query_product_id].iloc[0]
    query_category = query_row[category_col]
    query_brand = query_row['brand_name']
    query_segment = query_row[segment_col]

    # Hard filter: search only among products in the same category
    df_filtered = df[df[category_col] == query_category].copy()
    # exclude the query product itself from the candidates
    df_filtered = df_filtered[df_filtered['code'] != query_product_id]
    df_filtered['query_brand_name'] = query_brand

    similarities = []

    for _, row in df_filtered.iterrows():
        if row['brand_name'] == query_brand:
            # same brand: maximum similarity
            similarities.append(1.0)
        else:
            # otherwise use the market-segment pair (0.0 if the pair is not listed)
            pair = (query_segment, row[segment_col])
            similarities.append(SEGMENT_SIMILARITY.get(pair, 0.0))
    df_filtered['similarity'] = similarities
    return np.array(similarities), df_filtered
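The `iterrows` loop visits candidates one at a time. The sketch below computes the same thing in vectorized form, assuming the same `brand_name`/`brand_segment` columns and an ordered table shaped like `SEGMENT_SIMILARITY`. It first reduces the table to a segment-to-score mapping for the query's segment. It then maps that over the candidate column and overrides the result with 1.0 where the brand matches. `brand_segment_scores_vectorized` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def brand_segment_scores_vectorized(query_brand, query_segment, candidates, segment_table):
    """Same rule as the loop: 1.0 for the query's brand, else the segment-pair score (0.0 if unlisted)."""
    # Score of every candidate segment when paired with the query's segment
    seg_to_score = {cand_seg: s for (q_seg, cand_seg), s in segment_table.items() if q_seg == query_segment}
    segment_scores = candidates["brand_segment"].map(seg_to_score).fillna(0.0).to_numpy(dtype=float)
    same_brand = (candidates["brand_name"] == query_brand).to_numpy()
    return np.where(same_brand, 1.0, segment_scores)


# Made-up candidates and a subset of the segment table
table = {("luxury", "luxury"): 0.7, ("luxury", "middle"): 0.5, ("luxury", "mass_market"): 0.3}
candidates = pd.DataFrame({
    "brand_name": ["filorga", "niod", "nivea", "unknown"],
    "brand_segment": ["luxury", "middle", "mass_market", "indie"],
})
scores = brand_segment_scores_vectorized("filorga", "luxury", candidates, table)
print(scores)
```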




**4. Applying the function: building the one-time 1xK array for the query product.**


In [8]:
query_id = input('Enter the product code: ').strip()

brand_sim = brand_segment_similarity_1xK(query_id, df)
# this does not produce an NxN similarity matrix but a 1xK array:
# the similarity of the query product to every other product in the same category

# brand_sim is a tuple: the first element is the 1xK array,
# the second is the dataframe of candidate products
# brand_sim == (similarity_array, df_filtered)
# the tuple keeps track of which product each similarity value refers to
brand_sim_array, df_brand_candidates = brand_sim
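The brand step looks the code up in `df`, while the ingredient step looks it up in `df_finale`. A code that exists only in `df` therefore passes the first call and fails at the second. The sketch below is a small hypothetical helper that normalizes the typed code and checks it against `common_codes` up front.

```python
def read_query_code(raw_code, valid_codes):
    """Strip whitespace from a typed product code and require it to be in both datasets."""
    code = str(raw_code).strip()
    if code not in valid_codes:
        raise ValueError(f"Product code {code!r} is not present in both datasets")
    return code


# In the notebook this would be used as:
# query_id = read_query_code(input('Enter the product code: '), common_codes)
print(read_query_code(" 3401351102882 ", {"3401351102882"}))
```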



In [9]:
brand_sim_array[:10]  # first ten similarity values

array([0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3])

In [10]:
df_brand_candidates.sort_values('similarity', ascending=False).head(10)  # top ten candidates


,code,product_name,brand_name,brand_segment,macro_category,query_brand_name,similarity
10873,3401399693984,mousse demaquillante,filorga,luxury,Other,filorga,1.0
10868,3401561313061,bb perfect,filorga,luxury,Other,filorga,1.0
10870,3401360156456,ncef essence,filorga,luxury,Other,filorga,1.0
11990,3346470122000,shalimar,guerlain,luxury,Other,filorga,0.7
13265,6291106031454,desert dusk palette,huda beauty,luxury,Other,filorga,0.7
3098,25514145003,matte clay,beesmade,luxury,Other,filorga,0.7
14343,3474630477056,elixir k ultime,kerastase,luxury,Other,filorga,0.7
7719,8004608242147,momo,davines,luxury,Other,filorga,0.7
5566,3348900012189,fahrenheit,christian dior,luxury,Other,filorga,0.7
12159,8809653230930,pore cleansing balm aha,hanskin,luxury,Other,filorga,0.7


**5. Ingredient-based similarity.**

This function finds the products most similar to a given product by looking only at their ingredients. It first restricts the comparison to products in the same category, so that the comparison is meaningful.
It then compares the ingredients of the chosen product with those of every candidate. It scores their overlap with the Jaccard index, computed by `jaccard_similarity`.
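For reference, here is the Jaccard index as computed by `jaccard_similarity`, with a made-up worked example:

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad J(A, B) = 0 \ \text{if } A = \varnothing \text{ or } B = \varnothing
$$

$$
A = \{\text{glycerin}, \text{urea}, \text{allantoin}\},\quad B = \{\text{glycerin}, \text{urea}, \text{niacinamide}\} \;\Rightarrow\; J(A, B) = \frac{2}{4} = 0.5
$$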

In [11]:
# exclude ingredients that could distort the similarity computation
EXCLUDE_INGREDIENTS = {
    'aqua', 'water', 'parfum', 'fragrance',
    'ci', 'color', 'colour',
    'sodium chloride'
}

def clean_ingredients(ing_string):
    # split the comma-separated string, normalize, and drop fragments of 2 characters or fewer
    ing = [
        i.strip().lower()
        for i in ing_string.split(',')
        if len(i.strip()) > 2
    ]
    # drop excluded terms and colour index codes ("ci 77891", ...);
    # this is a substring test, so 'ci' also removes e.g. "calcium ..." or "... acid"
    ing = [
        i for i in ing
        if not any(x in i for x in EXCLUDE_INGREDIENTS)
        and not i.startswith('ci ')
    ]
    # deduplicate
    return list(set(ing))

In [12]:
df_finale['ingredients_list'] = df_finale['ingredients'].apply(clean_ingredients)

df_finale[['ingredients', 'ingredients_list']].head(10)

,ingredients,ingredients_list
0,"Arnica Montana, Avoid Contact With Eyes, Burit...","[mucous, buriti oil, menthol, tocopherol, phen..."
1,Oil,[oil]
2,"Added Sugarss, Colorings, Cr 6Ad, Croydon, Lac...","[cr 6ad, preservatives, added sugarss, croydon..."
3,Oil,[oil]
4,"Lilium Candidum Flower Extract, Oil","[oil, lilium candidum flower extract]"
5,"Allantoin, Aloe Barbadensis Leaf Juice, Benzoi...","[aloe barbadensis leaf juice, phenoxyethanol, ..."
6,"Coco-Glucoside, Decyl Glucoside, Glycerin, Gly...","[sodium cocoamphoacetate, sodium cocoyl glutam..."
7,Lawsonia Inermis Leaf Powder,[lawsonia inermis leaf powder]
8,Natural Calcium Bentonite Clay,[]
9,"Cellulose Gum, Citric Acid, Cocamidopropyl Bet...","[sodium saccharin, tetrasodium pyrophosphate, ..."
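`clean_ingredients` uses a substring test, so short excluded terms also remove unrelated ingredients. `'ci'` occurs inside "calcium" and inside every "... acid". This is why "Natural Calcium Bentonite Clay" ends up as an empty list in row 8 above.

The sketch below is a whole-word variant using Python's `re` module with `\b` word boundaries. `clean_ingredients_words` is a hypothetical name; the excluded terms are the notebook's own.

```python
import re

EXCLUDE_INGREDIENTS = {
    'aqua', 'water', 'parfum', 'fragrance',
    'ci', 'color', 'colour',
    'sodium chloride'
}

# Match an excluded term only as a whole word (or whole phrase), not inside a longer word
EXCLUDE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in sorted(EXCLUDE_INGREDIENTS)) + r")\b"
)


def clean_ingredients_words(ing_string):
    """Split, normalize and deduplicate ingredients, dropping whole-word matches of excluded terms."""
    ing = {i.strip().lower() for i in ing_string.split(',') if len(i.strip()) > 2}
    return sorted(i for i in ing if not EXCLUDE_PATTERN.search(i))


print(clean_ingredients_words("Natural Calcium Bentonite Clay"))
print(clean_ingredients_words("Aqua, Citric Acid, CI 77891, Glycerin, Sodium Chloride"))
```

Whole-word matching still drops entries such as "rose water", which seems to be the intent of the list. "colorings", however, would now be kept. Whether that suits the data is an assumption to check.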


In [13]:
def jaccard_similarity(set_a, set_b):
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

In [14]:
def ingredient_similarity_1xK(
    query_product_id,
    df_finale,
    category_col='macro_category'
):
    query_product_id = str(query_product_id)

    if query_product_id not in df_finale['code'].values:
        raise ValueError("Product code not found")

    query_row = df_finale[df_finale['code'] == query_product_id].iloc[0]
    query_category = query_row[category_col]
    query_ing = set(query_row['ingredients_list'])

    # Hard filter: same category only, query product excluded
    df_filtered = df_finale[df_finale[category_col] == query_category].copy()
    df_filtered = df_filtered[df_filtered['code'] != query_product_id]

    print(f"Found {len(df_filtered)} products in the same category")

    similarities = []

    for _, row in df_filtered.iterrows():
        sim = jaccard_similarity(
            query_ing,
            set(row['ingredients_list'])
        )
        similarities.append(sim)

    return np.array(similarities), df_filtered



In [15]:
ing_sim_array, df_ing_candidates = ingredient_similarity_1xK(query_id, df_finale)

ing_sim_array[:10]

Found 5582 products in the same category


array([0.02409639, 0.        , 0.02531646, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.07228916, 0.        ])

In [16]:
# add the similarity and an is_query flag, to tell the recommended products
# apart from the original query product
df_ing_candidates_temp = df_ing_candidates[df_ing_candidates['code'] != query_id].copy()
df_ing_candidates_temp['ing_sim'] = ing_sim_array
df_ing_candidates_temp['is_query'] = False
df_ing_candidates_temp = df_ing_candidates_temp.sort_values(by='ing_sim', ascending=False).reset_index(drop=True)


df_query_ing = df_finale[df_finale['code'] == query_id].copy()
df_query_ing['ing_sim'] = 1.0        # maximum similarity with itself
df_query_ing['is_query'] = True

df_ing_candidates = pd.concat(
    [df_query_ing, df_ing_candidates_temp],
    ignore_index=True
)

df_ing_candidates.head(10)



,code,product_name,macro_category,ingredients,ingredients_list,ing_sim,is_query
0,3401351102882,Filorga Hydra Filler 50ML,Other,"1, 2-Hexanediol, Alanine, Aminobutyric Acid, A...","[phenylalanine, asparagine, glucose, sodium gl...",1.0,True
1,769915190731,Natural Moisturizing Factors + HA,Other,"Alanine, Allantoin, Arginine, Aspartic Acid, C...","[cetyl alcohol, phenylalanine, glucose, trehal...",0.157895,False
2,8437018454099,averac caviar,Other,"Arginine, Ascorbyl Palmitate, Aspartic Acid, B...","[caviar extract, phenylalanine, carragenan pul...",0.141593,False
3,850056933285,Leave In - Numéro 5,Other,"Alanine, Algin, Arginine, Aspartic Acid, Astro...","[cetyl alcohol, fructooligosaccharides, phenyl...",0.125984,False
4,850155008532,Beauty insider,Other,"1, 2-Hexanediol, Acetyl Glutamine, Alanine, Al...","[phenylalanine, aloe barbadensis leaf extract,...",0.115702,False
5,769915150100,Multi-Molecular Hyaluronic Complex (MMHC2),Other,"1, 2-Hexanediol, Alanine, Algae Extract, Argin...","[phenylalanine, gallyl glucoside, phenoxyethan...",0.115044,False
6,3700454229541,Structu'r'temps jour,Other,"Acmella Oleracea Extract, Alpha-Isomethyl Iono...","[potassium sorbate, ethylhexyl palmitate, cycl...",0.112245,False
7,2021092899998,Hello Mirror guess my age,Other,"Alpha-Isomethyl Lonone, Anhydroxylitol, Arachi...","[glucose, butylphenyl methylpropional, behenyl...",0.111111,False
8,3700914601962,Princesse de jour,Other,"Algae Extract, Arachidyl Alcohol, Arachidyl Gl...","[cetery alcohol, behenyl alcohol, phenoxyethan...",0.10989,False
9,5904858674268,Odżywka do włosów zniszczonych,Other,"Alanine, Arginine, Aspartic Acid, Behentrimoni...","[cetrimonium chloride, laminaria digitata extr...",0.107527,False


In [17]:
query_product_ingredients = set(
    df_finale[df_finale['code'] == query_id]['ingredients_list'].iloc[0]
)

def get_common_ingredients(candidate_ingredients):
    return list(query_product_ingredients.intersection(set(candidate_ingredients)))

df_ing_candidates['common_ingr'] = df_ing_candidates['ingredients_list'].apply(get_common_ingredients)

df_ing_candidates[['code', 'product_name', 'ingredients_list', 'common_ingr']].head(10)

,code,product_name,ingredients_list,common_ingr
0,3401351102882,Filorga Hydra Filler 50ML,"[phenylalanine, asparagine, glucose, sodium gl...","[phenylalanine, asparagine, glucose, sodium gl..."
1,769915190731,Natural Moisturizing Factors + HA,"[cetyl alcohol, phenylalanine, glucose, trehal...","[phenylalanine, valine, glucose, alanine, chlo..."
2,8437018454099,averac caviar,"[caviar extract, phenylalanine, carragenan pul...","[phenylalanine, tyrosine, valine, chlorphenesi..."
3,850056933285,Leave In - Numéro 5,"[cetyl alcohol, fructooligosaccharides, phenyl...","[phenylalanine, valine, alanine, sodium hyalur..."
4,850155008532,Beauty insider,"[phenylalanine, aloe barbadensis leaf extract,...","[phenylalanine, valine, 2-hexanediol, alanine,..."
5,769915150100,Multi-Molecular Hyaluronic Complex (MMHC2),"[phenylalanine, gallyl glucoside, phenoxyethan...","[phenylalanine, valine, 2-hexanediol, alanine,..."
6,3700454229541,Structu'r'temps jour,"[potassium sorbate, ethylhexyl palmitate, cycl...","[chlorphenesin, dimethicone, sodium hyaluronat..."
7,2021092899998,Hello Mirror guess my age,"[glucose, butylphenyl methylpropional, behenyl...","[glucose, behenyl alcohol, sodium hyaluronate,..."
8,3700914601962,Princesse de jour,"[cetery alcohol, behenyl alcohol, phenoxyethan...","[chlorphenesin, behenyl alcohol, sodium hyalur..."
9,5904858674268,Odżywka do włosów zniszczonych,"[cetrimonium chloride, laminaria digitata extr...","[valine, alanine, potassium sorbate, glycerin,..."


In [19]:
df_ing_candidates.head(10)

,code,product_name,macro_category,ingredients,ingredients_list,ing_sim,is_query,common_ingr
0,3401351102882,Filorga Hydra Filler 50ML,Other,"1, 2-Hexanediol, Alanine, Aminobutyric Acid, A...","[phenylalanine, asparagine, glucose, sodium gl...",1.0,True,"[phenylalanine, asparagine, glucose, sodium gl..."
1,769915190731,Natural Moisturizing Factors + HA,Other,"Alanine, Allantoin, Arginine, Aspartic Acid, C...","[cetyl alcohol, phenylalanine, glucose, trehal...",0.157895,False,"[phenylalanine, valine, glucose, alanine, chlo..."
2,8437018454099,averac caviar,Other,"Arginine, Ascorbyl Palmitate, Aspartic Acid, B...","[caviar extract, phenylalanine, carragenan pul...",0.141593,False,"[phenylalanine, tyrosine, valine, chlorphenesi..."
3,850056933285,Leave In - Numéro 5,Other,"Alanine, Algin, Arginine, Aspartic Acid, Astro...","[cetyl alcohol, fructooligosaccharides, phenyl...",0.125984,False,"[phenylalanine, valine, alanine, sodium hyalur..."
4,850155008532,Beauty insider,Other,"1, 2-Hexanediol, Acetyl Glutamine, Alanine, Al...","[phenylalanine, aloe barbadensis leaf extract,...",0.115702,False,"[phenylalanine, valine, 2-hexanediol, alanine,..."
5,769915150100,Multi-Molecular Hyaluronic Complex (MMHC2),Other,"1, 2-Hexanediol, Alanine, Algae Extract, Argin...","[phenylalanine, gallyl glucoside, phenoxyethan...",0.115044,False,"[phenylalanine, valine, 2-hexanediol, alanine,..."
6,3700454229541,Structu'r'temps jour,Other,"Acmella Oleracea Extract, Alpha-Isomethyl Iono...","[potassium sorbate, ethylhexyl palmitate, cycl...",0.112245,False,"[chlorphenesin, dimethicone, sodium hyaluronat..."
7,2021092899998,Hello Mirror guess my age,Other,"Alpha-Isomethyl Lonone, Anhydroxylitol, Arachi...","[glucose, butylphenyl methylpropional, behenyl...",0.111111,False,"[glucose, behenyl alcohol, sodium hyaluronate,..."
8,3700914601962,Princesse de jour,Other,"Algae Extract, Arachidyl Alcohol, Arachidyl Gl...","[cetery alcohol, behenyl alcohol, phenoxyethan...",0.10989,False,"[chlorphenesin, behenyl alcohol, sodium hyalur..."
9,5904858674268,Odżywka do włosów zniszczonych,Other,"Alanine, Arginine, Aspartic Acid, Behentrimoni...","[cetrimonium chloride, laminaria digitata extr...",0.107527,False,"[valine, alanine, potassium sorbate, glycerin,..."


**6. Joining on product code**

Build a new dataframe by joining the two candidate dataframes on product code, so that the final similarity can be computed for each product.

In [21]:
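df_ing_sim = df_ing_candidates[['code', 'ing_sim']].copy()

df_brand = df_brand_candidates[['code']].copy()
df_brand['brand_sim'] = brand_sim_array

# align the two similarities on product code; the inner join keeps only candidates
# present in both dataframes (the query row, with ing_sim = 1.0, is dropped here)
df_merged = df_ing_sim.merge(
    df_brand,
    on='code',
    how='inner'
)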


In [23]:
# sanity checks on the alignment
assert len(df_merged) > 0
assert df_merged['code'].is_unique

df_merged.head()


,code,ing_sim,brand_sim
0,769915190731,0.157895,0.3
1,850056933285,0.125984,0.3
2,850155008532,0.115702,0.7
3,769915150100,0.115044,0.7
4,3700454229541,0.112245,0.5
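The inner join silently drops candidates that appear in only one of the two dataframes, for example products with ingredient data but no brand/segment row. The sketch below audits this with an outer merge and two standard `DataFrame.merge` parameters:

- `indicator=True` adds a `_merge` column with `both`, `left_only` or `right_only`.
- `validate="one_to_one"` raises if a code is duplicated on either side.

The two toy frames stand in for `df_ing_candidates` and `df_brand_candidates`.

```python
import pandas as pd

# Made-up stand-ins for the two candidate dataframes
ing = pd.DataFrame({"code": ["q", "a", "b", "c"], "ing_sim": [1.0, 0.4, 0.2, 0.1]})
brand = pd.DataFrame({"code": ["a", "b", "d"], "brand_sim": [1.0, 0.5, 0.3]})

audit = ing.merge(brand, on="code", how="outer", indicator=True, validate="one_to_one")
print(audit["_merge"].value_counts())  # codes in both, left only, right only

# Keeping only the 'both' rows reproduces the inner join
merged = audit[audit["_merge"] == "both"].drop(columns="_merge")
print(merged)
```

In the notebook the `left_only` rows would include the query itself, since only the ingredient dataframe keeps it.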


**7. Computing the final similarity**

Ingredients carry the larger weight in determining the most similar product (0.8). Brand and segment carry the smaller weight (0.2).
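The formula is written out below and checked against product 850155008532. Its ing_sim = 0.115702 and brand_sim = 0.7 appear in the merged table above.

$$
s_{\text{final}} = 0.8\, s_{\text{ing}} + 0.2\, s_{\text{brand}}
$$

$$
0.8 \times 0.115702 + 0.2 \times 0.7 = 0.092562 + 0.14 = 0.232562
$$

This matches rank 1 in the final results. The weights also explain why 769915190731, the best ingredient match (0.157895), does not reach the top ten. Its brand score of 0.3 gives 0.8 × 0.157895 + 0.2 × 0.3 = 0.186316, below the tenth result (0.18898).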


In [24]:
df_merged['final_similarity'] = (
    0.8 * df_merged['ing_sim'] +
    0.2 * df_merged['brand_sim']
)


**8. Building the results dataframe with the most similar products**


In [25]:
query_product_name = (
    df.loc[df['code'] == query_id, 'product_name']
    .iloc[0]
)
risultato = pd.DataFrame({
    'query_product_id': query_id,
    'query_product_name': query_product_name,
    'recommended_product_id': df_merged['code'],
    'final_similarity_score': df_merged['final_similarity'],

})

# sort so that the most similar product comes first
risultato = risultato.sort_values(
    by='final_similarity_score',
    ascending=False
).reset_index(drop=True)

risultato['rank'] = risultato.index + 1

# enrich with product name, brand and segment
risultato = risultato.merge(
    df[['code', 'product_name', 'brand_name', 'brand_segment']],
    left_on='recommended_product_id',
    right_on='code',
    how='left'
).drop(columns='code')

risultato = risultato[
    [
        'rank',
        'query_product_id',
        'query_product_name',
        'recommended_product_id',
        'product_name',
        'brand_name',
        'brand_segment',
        'final_similarity_score'
    ]
]

risultato.head(10)



,rank,query_product_id,query_product_name,recommended_product_id,product_name,brand_name,brand_segment,final_similarity_score
0,1,3401351102882,filorga hydra filler 50ml,850155008532,beauty insider,sephora,luxury,0.232562
1,2,3401351102882,filorga hydra filler 50ml,769915150100,multi molecular hyaluronic complex mmhc2,niod,luxury,0.232035
2,3,3401351102882,filorga hydra filler 50ml,651986701308,born this way,too faced cosmetics,luxury,0.208085
3,4,3401351102882,filorga hydra filler 50ml,8500033101122,tula bright start,tula,luxury,0.206667
4,5,3401351102882,filorga hydra filler 50ml,3760096762141,baume divin,qiriness,luxury,0.203636
5,6,3401351102882,filorga hydra filler 50ml,641628005208,dynamic resurfacing facial wash,elemis,luxury,0.196
6,7,3401351102882,filorga hydra filler 50ml,7640114110887,le wrap exfolys,qiriness,luxury,0.191064
7,8,3401351102882,filorga hydra filler 50ml,3700454229541,structu r temps jour,auriege,middle,0.189796
8,9,3401351102882,filorga hydra filler 50ml,773602657070,studio radiance 24hr luminous lift concealer nc11,mac,luxury,0.189123
9,10,3401351102882,filorga hydra filler 50ml,3337871328795,liftactiv supreme peaux normales a mixte,vichy,luxury,0.18898
