## Perfume Scent Searcher Model

This project aims to create search functions that allow users to find perfumes that match their desired scents.

Two search functions are created:
* TF-IDF-based search function
* Universal Sentence Encoder (USE)-based search function

Please note this is an extension to a [fragrance recommendation system project](https://github.com/belleam/perfume). Full details of data cleaning can be found in [this notebook](https://github.com/belleam/perfume/blob/main/perfume.ipynb).

The dataset is from [Kaggle](https://www.kaggle.com/datasets/nandini1999/perfume-recommendation-dataset) and was scraped from the fragrance retailer Luckyscent's e-commerce website.

In [1]:
## Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from sklearn.metrics.pairwise import linear_kernel
from sklearn.metrics.pairwise import cosine_similarity
import re
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
import tensorflow as tf
import tensorflow_hub as hub

### The data

In [2]:
## Loading the dataset
perfume_df = pd.read_csv('perfume_df.csv')
perfume_df.head()

Unnamed: 0,Name,Brand,Description,Notes,Image URL,Description_clean,Notes_clean
0,Tihota Eau de Parfum,Indult,"Rapa Nui for sugar, Tihota is, quite simply, T...","vanilla bean, musk",https://static.luckyscent.com/images/products/...,rapa nui sugar tihota quite simply one one cal...,"vanilla bean, musk"
1,Sola Parfum,Di Ser,A tribute to the expanse of space extending fr...,"lavender, yuzu, lemongrass, magnolia, geranium...",https://static.luckyscent.com/images/products/...,tribute expanse space extending sky flower fru...,"lavender, yuzu, lemongrass, magnolia, geranium..."
2,Kagiroi Parfum,Di Ser,An aromatic ode to the ancient beauty of Japan...,"green yuzu, green shikuwasa, sansho seed, cori...",https://static.luckyscent.com/images/products/...,aromatic ode ancient beauty japan kagiroi repr...,"green yuzu, green shikuwasa, sansho seed, cori..."
3,Velvet Fantasy Eau de Parfum,Montale,Velvet Fantasy is a solar fragrance where citr...,"tangerine, pink pepper, black coffee, leather,...",https://static.luckyscent.com/images/products/...,velvet fantasy solar fragrance citrus velvety ...,"tangerine, pink pepper, black coffee, leather,..."
4,A Blvd. Called Sunset Eau de Parfum,A Lab on Fire,There's no way A Lab On Fire could relocate to...,"bergamot, almond, violet, jasmine, leather, sa...",https://static.luckyscent.com/images/products/...,there's way lab fire could relocate los angele...,"bergamot, almond, violet, jasmine, leather, sa..."


### Scent Searcher #1

Made with TF-IDF

In [3]:
## Construct TF-IDF matrix
tfidf = TfidfVectorizer(stop_words='english')
database = tfidf.fit_transform(perfume_df['Notes_clean'])

In [4]:
## Function that prints the top five recommended fragrances with their scores
def scent_search(query):
    query_vec = tfidf.transform([query])
    # Dot product of the query vector with every perfume's TF-IDF vector
    scores = query_vec.dot(database.transpose())
    scores_array = scores.toarray()[0]
    # Indices sorted from highest to lowest score
    sorted_indices = scores_array.argsort()[::-1]
    for idx in sorted_indices[:5]:
        print('Score:', scores_array[idx], '|', perfume_df['Name'].iloc[idx])

scent_search('bergamot iris lily')

Score: 0.6770140004392624 | Morn To Dusk Eau de Parfum
Score: 0.5477605664857541 | Odalisque Eau de Parfum
Score: 0.4847664720910013 | Hakka Eau de Parfum
Score: 0.44390218130302994 | Feminin Pluriel Eau de Parfum
Score: 0.41625519463486677 | All That Matters Eau de Parfum
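
The scores above can be read as cosine similarities over single words. With its default settings, `TfidfVectorizer` splits each notes string into one-word tokens and L2-normalises every row, which is also why a query for `lily` can match a perfume listing "lily of the valley". The sketch below illustrates this on a small made-up `toy_notes` list; the list and all `toy_` names are assumptions for illustration, not the real dataset.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for perfume_df['Notes_clean']; not the real data
toy_notes = [
    "bergamot, lily, vanilla, musk",
    "lily of the valley, jasmine, iris root",
    "cedar, citrus, iris",
]

toy_tfidf = TfidfVectorizer(stop_words='english')
toy_database = toy_tfidf.fit_transform(toy_notes)

# Multi-word notes are split into single-word tokens;
# "of" and "the" are dropped as English stop words
vocab = sorted(toy_tfidf.vocabulary_)
print(vocab)

# Each row has unit L2 norm by default, so a dot product is a cosine similarity
row_norms = np.sqrt(
    np.asarray(toy_database.multiply(toy_database).sum(axis=1)).ravel()
)
print(row_norms)

# Same scoring step as scent_search, applied to the toy data
toy_query = toy_tfidf.transform(["bergamot iris lily"])
toy_scores = toy_query.dot(toy_database.T).toarray()[0]
print(toy_scores)
```

Because matching happens word by word, the order of the query words does not matter, and a note such as "iris root" counts as a hit for `iris`.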


In [5]:
## Function that returns the names, brands, and notes of the top five matches
def scent_search(query):
    query_vec = tfidf.transform([query])
    scores = query_vec.dot(database.transpose())
    scores_array = scores.toarray()[0]
    # Indices of the five highest-scoring perfumes, best match first
    top_indices = scores_array.argsort()[::-1][:5]
    return perfume_df[['Name', 'Brand', 'Notes']].iloc[top_indices]

scent_search('bergamot iris lily')

Unnamed: 0,Name,Brand,Notes
331,Morn To Dusk Eau de Parfum,Eau d'Italie,"bergamot, lily, vanilla, musk"
2132,Odalisque Eau de Parfum,PARFUMS DE NICOLAI,"lily of the valley, jasmine, iris root"
491,Hakka Eau de Parfum,J-Scent,"bergamot, mint, green leaves, jasmine, lily, i..."
1584,Feminin Pluriel Eau de Parfum,Maison Francis Kurkdjian,"iris, violet, rose, jasmine, lily-of-the-valle..."
277,All That Matters Eau de Parfum,Anamor,"musk, sandalwood, lily of the valley"
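
The two TF-IDF cells show either the scores or the perfume details. As a possible variation, the sketch below returns both in a single DataFrame. The function name `scent_search_with_scores`, its parameters, and the `toy_` data are hypothetical and only serve to keep the example self-contained; they are not part of the original notebook.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

def scent_search_with_scores(query, vectorizer, matrix, df, k=5):
    """Return the top-k rows of df with an added Score column (hypothetical helper)."""
    scores = vectorizer.transform([query]).dot(matrix.T).toarray()[0]
    top = scores.argsort()[::-1][:k]  # best match first
    return df.iloc[top].assign(Score=scores[top])

# Made-up data standing in for perfume_df
toy_df = pd.DataFrame({
    'Name': ['Toy A', 'Toy B', 'Toy C'],
    'Notes': ['bergamot, lily, musk', 'cedar, vetiver', 'iris, rose'],
})
toy_vectorizer = TfidfVectorizer(stop_words='english')
toy_matrix = toy_vectorizer.fit_transform(toy_df['Notes'])

result = scent_search_with_scores(
    'bergamot iris lily', toy_vectorizer, toy_matrix, toy_df, k=2
)
print(result)
```

With the notebook's own objects, the equivalent call would presumably be `scent_search_with_scores(query, tfidf, database, perfume_df)`.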


### Scent Searcher #2

Made with a Universal Sentence Encoder (USE)

In [6]:
## Load the model
embed = hub.load('https://tfhub.dev/google/universal-sentence-encoder/4')

## Embed the notes (the encoder is pretrained, so nothing is fitted here)
embeddings = embed(perfume_df['Notes_clean'])

In [7]:
## Function that prints the top five recommended fragrances, followed by their scores
def search(query):
    query_emb = embed([query])
    # Inner product between the query embedding and every perfume embedding
    linear_similarities = linear_kernel(query_emb, embeddings).flatten()
    # Indices of the five highest scores, best match first
    index = linear_similarities.argsort()[:-6:-1]
    for idx in index:
        print(perfume_df['Name'].iloc[idx])
    for idx in index:
        print('Scores:', linear_similarities[idx])

search('bergamot iris lily')

Le Cri De La Lumiere Eau de Parfum
Odalisque Eau de Parfum
Scarlet Lily Eau de Parfum
10 AM Accord Eau de Parfum
Cedar Woodpecker Eau de Parfum
Scores: 0.73321533
Scores: 0.697061
Scores: 0.6476709
Scores: 0.6092781
Scores: 0.602033


In [8]:
## Function that returns the names, brands, and notes of the top five matches
def search(query):
    query_emb = embed([query])
    linear_similarities = linear_kernel(query_emb, embeddings).flatten()
    # Indices of the five highest scores, best match first
    index = linear_similarities.argsort()[:-6:-1]
    return perfume_df[['Name', 'Brand', 'Notes']].iloc[index]

search('bergamot iris lily')

Unnamed: 0,Name,Brand,Notes
317,Le Cri De La Lumiere Eau de Parfum,Parfum d'Empire,"ambrette, rose, iris"
2132,Odalisque Eau de Parfum,PARFUMS DE NICOLAI,"lily of the valley, jasmine, iris root"
1060,Scarlet Lily Eau de Parfum,Shay & Blue,"lotus blossom, scarlet ariadne lily, ylang-yla..."
1062,10 AM Accord Eau de Parfum,Cinnamon Projects,"cedar, iris, lavender, sage, vetiver"
1158,Cedar Woodpecker Eau de Parfum,Parle Moi de Parfum,"cedar, citrus, iris"
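
`linear_kernel` computes a plain inner product, which equals cosine similarity only when the vectors have unit length. The sketch below shows one way to make the normalisation explicit before ranking. It uses random NumPy vectors as stand-ins for the embeddings so that it runs without downloading the model; `top_k_cosine`, `fake_embeddings`, and `fake_query` are hypothetical names, and the numbers are not real model output.

```python
import numpy as np

def top_k_cosine(query_emb, doc_embs, k=5):
    """Return (indices, scores) of the k rows of doc_embs most similar to query_emb."""
    query_emb = np.asarray(query_emb, dtype=float).ravel()
    doc_embs = np.asarray(doc_embs, dtype=float)
    # Scale everything to unit length so the inner product is a cosine similarity
    query_unit = query_emb / np.linalg.norm(query_emb)
    doc_units = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = doc_units @ query_unit
    top = np.argsort(scores)[::-1][:k]  # best match first
    return top, scores[top]

# Random 512-dimensional vectors standing in for USE embeddings
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(100, 512))
# A query that is a slightly noisy copy of row 42, so row 42 should rank first
fake_query = fake_embeddings[42] + 0.1 * rng.normal(size=512)

indices, scores = top_k_cosine(fake_query, fake_embeddings, k=5)
print(indices, scores)
```

In the notebook, the inputs would presumably be `embed([query]).numpy()` and `embeddings.numpy()`, and the returned indices could be passed to `perfume_df.iloc` as in `search`.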


### Conclusion

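In this example, the USE-based search returns higher scores than the TF-IDF search (0.73 versus 0.68 for the best match), but its results are less likely to contain the queried notes: bergamot does not appear in the displayed notes of any of the five USE matches, whereas two of the TF-IDF matches list it. The scores also come from different vector spaces, so they are not directly comparable between the two methods. The weaker note matching may be due to the USE model being optimised for longer, sentence-like text rather than short lists of keywords.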
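
One rough way to put a number on "includes the same notes as the query" is the fraction of query words that appear in a perfume's notes. The helper `note_overlap` below is a hypothetical metric sketched for illustration, not something computed in the notebook; the example strings are the untruncated notes shown in the outputs above.

```python
import re

def note_overlap(query, notes):
    """Fraction of query words that occur as words in a notes string (hypothetical metric)."""
    query_words = set(query.lower().split())
    note_words = set(re.findall(r"[a-z]+", notes.lower()))
    if not query_words:
        return 0.0
    return len(query_words & note_words) / len(query_words)

query = 'bergamot iris lily'

# Top TF-IDF match: contains bergamot and lily
tfidf_top = note_overlap(query, 'bergamot, lily, vanilla, musk')
# Top USE match: contains only iris
use_top = note_overlap(query, 'ambrette, rose, iris')

print(tfidf_top, use_top)
```

Averaging this value over the top five results of each search function, using the full rather than truncated notes, would give a simple side-by-side comparison of the two methods.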