# Tujuan Model

**"Rekomendasi produk skincare berdasarkan permasalahan kulit user."**

Dalam hal ini sistem akan *mengukur kesamaan makna teks* antara `keluhan user` dan `deskripsi produk`. Sehingga fitur yang relevan hanyalah yang membantu model memahami isi atau konteks produk.

Ini dilakukan pakai pendekatan `Content-Based Filtering` yang artinya:
> kita merekomendasikan produk yang mirip secara teks dengan produk lain (misal berdasarkan deskripsi, ingredients atau benefit)

Jadi engine-nya bukan dari data pembelian orang lain, tetapi dari kemiripan konten (**TF-IDF** dan **Cosine Similarity**) dan model merupakan unsupervised learning karena:
- Pada model tidak ada label target
- hanya mengukur kemiripan antar teks berdasarkan representasi dari TF-IDF
- Model ngga belajar dari contoh tapi cuma ngukur dari kedekatan pola teks 

# FEATURE ENGINEERING
---

1. Import packages
2. Load data
3. Feature Engineering


---
## 1. Import Package


In [7]:
! pip install Sastrawi
! pip install nltk contractions
# Install it after sekali jeee

Collecting Sastrawi
  Downloading Sastrawi-1.0.1-py2.py3-none-any.whl.metadata (909 bytes)
Downloading Sastrawi-1.0.1-py2.py3-none-any.whl (209 kB)
Installing collected packages: Sastrawi
Successfully installed Sastrawi-1.0.1
Collecting contractions
  Downloading contractions-0.1.73-py2.py3-none-any.whl.metadata (1.2 kB)
Collecting textsearch>=0.0.21 (from contractions)
  Downloading textsearch-0.0.24-py2.py3-none-any.whl.metadata (1.2 kB)
Collecting anyascii (from textsearch>=0.0.21->contractions)
  Downloading anyascii-0.3.3-py3-none-any.whl.metadata (1.6 kB)
Collecting pyahocorasick (from textsearch>=0.0.21->contractions)
  Downloading pyahocorasick-2.2.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (13 kB)
Downloading contractions-0.1.73-py2.py3-none-any.whl (8.7 kB)
Downloading textsearch-0.0.24-py2.py3-none-any.whl (7.6 kB)
Downloading anyascii-0.3.3-py3-none-any.whl (345 kB)
Downloading pyahocorasick-2.2.0-cp310-cp310-macosx_11_0_arm64.whl (33 kB)
Installing collected packages: py

In [8]:
# Setup
import os
import re
import string
import numpy as np
import pandas as pd

import pickle

from IPython.display import display, HTML
import warnings
warnings.filterwarnings('ignore')

# Model
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory
from sklearn.metrics.pairwise import cosine_similarity


[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/adewidya/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


---
## 2. Load data


In [9]:
df = pd.read_csv('skincare_clean.csv')

In [10]:
display(HTML('<center><b><h1>Top 5 : dataset</h1></b></center><br>'))
display(df.head(5))

Unnamed: 0,product_type,brand,product,rating,review_count,url,image_url,price,description,unique_id,description_clean
0,cleanser,Cetaphil,Gentle Skin Cleanser,4.2,15298,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,112000.0,Cetaphil Gentle Skin Cleanser mengandung formu...,Cetaphil - Gentle Skin Cleanser,cetaphil gentle skin cleanser mengandung formu...
1,cleanser,Senka,Perfect Whip Facial Foam,4.2,6374,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,200000.0,Shiseido Perfect Whip adalah pembersih wajah u...,Senka - Perfect Whip Facial Foam,shiseido perfect whip adalah pembersih wajah u...
2,cleanser,Acnes,Creamy Wash,3.7,5510,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,29000.0,Acnes Creamy Wash adalah sabun pembersih wajah...,Acnes - Creamy Wash,acnes creamy wash adalah sabun pembersih wajah...
3,cleanser,Hada Labo,Tamagohada Mild Peeling Face Wash,4.1,5147,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,35000.0,Hada Labo Tamagohada Mild Peeling Face Wash ad...,Hada Labo - Tamagohada Mild Peeling Face Wash,hada labo tamagohada mild peeling face wash ad...
4,cleanser,Hada Labo,Gokujyun Ultimate Moisturizing Face Wash,4.3,4468,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,30000.0,Hada Labo Gokujyun Ultimate Moisturizing Face ...,Hada Labo - Gokujyun Ultimate Moisturizing Fac...,hada labo gokujyun ultimate moisturizing face ...


In [11]:
display(HTML('<center><b><h1>Bottom 5 : dataset</h1></b></center><br>'))
display(df.tail(5))

Unnamed: 0,product_type,brand,product,rating,review_count,url,image_url,price,description,unique_id,description_clean
855,toner,Dermies,Hello Glow Radiance Essence Toner,4.28,96,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,75000.0,Hello Glow Radiance Essence Toner merupakan pe...,Dermies - Hello Glow Radiance Essence Toner,hello glow radiance essence toner merupakan pe...
856,toner,A.Stop,Clear Face & Body Toner,3.7,94,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,255000.0,A.Stop Clear Face & Body Toner is the 1 stop s...,A.Stop - Clear Face & Body Toner,astop clear face body toner is the 1 stop shop...
857,toner,Hada Labo,Gokujyun Super Hyaluronic Acid Hydrating Lotio...,4.3,94,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,200000.0,Contains 3 types of Hyaluronic Acid that work ...,Hada Labo - Gokujyun Super Hyaluronic Acid Hyd...,contains 3 types of hyaluronic acid that work ...
858,toner,Innisfree,Aloe Revital Skin Mist,4.0,92,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,125000.0,The mist-type toner containing Jeju chemical-f...,Innisfree - Aloe Revital Skin Mist,the misttype toner containing jeju chemicalfre...
859,toner,By Wishtrend,Mandelic Acid 5% Skin Prep Water,4.2,92,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,300000.0,By Wishtrend Mandelic Acid 5% Skin Prep Water ...,By Wishtrend - Mandelic Acid 5% Skin Prep Water,by wishtrend mandelic acid 5 skin prep water m...


---
### 2.1. Memeriksa struktur data dan Missing value

In [12]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 860 entries, 0 to 859
Data columns (total 11 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   product_type       860 non-null    object 
 1   brand              860 non-null    object 
 2   product            860 non-null    object 
 3   rating             860 non-null    float64
 4   review_count       860 non-null    int64  
 5   url                860 non-null    object 
 6   image_url          860 non-null    object 
 7   price              817 non-null    float64
 8   description        860 non-null    object 
 9   unique_id          860 non-null    object 
 10  description_clean  860 non-null    object 
dtypes: float64(2), int64(1), object(8)
memory usage: 74.0+ KB


In [13]:
print('\n Checking Missing Value')
print('-'*50)
print(df.isna().sum())


 Checking Missing Value
--------------------------------------------------
product_type          0
brand                 0
product               0
rating                0
review_count          0
url                   0
image_url             0
price                43
description           0
unique_id             0
description_clean     0
dtype: int64


In [14]:
print('\n === Unique Value: kolom brand ===')
print('-'*80)
display(df['brand'].unique()[:20])
print(f'Jumlah Unique Value dari kolom : ', (df['brand'].nunique()))

print('\n === Unique Value: kolom product ===')
print('-'*80)
display(df['product'].unique()[:10])
print(f'Jumlah Unique Value dari kolom : ', (df['product'].nunique()))

print('\n === Unique Value: kolom product_type ===')
print('-'*80)
display(df['product_type'].unique()[:10])
print(f'Jumlah Unique Value dari kolom : ', (df['product_type'].nunique()))

print('\n === Unique Value: kolom description_clean ===')
print('-'*80)
display(df['description_clean'].unique()[:10])
print(f'Jumlah Unique Value dari kolom : ', (df['description_clean'].nunique()))



 === Unique Value: kolom brand ===
--------------------------------------------------------------------------------


array(['Cetaphil', 'Senka', 'Acnes', 'Hada Labo', 'Cosrx',
       'The Body Shop', "Pond's", 'NIVEA', 'Wardah', 'Glow & Lovely',
       'NPURE', 'Kojic Plankton', 'Clean & Clear', 'Kleveru Organics',
       'Sariayu', 'Garnier', 'SK-II', 'Innisfree', 'White Story', 'Poise'],
      dtype=object)

Jumlah Unique Value dari kolom :  219

 === Unique Value: kolom product ===
--------------------------------------------------------------------------------


array(['Gentle Skin Cleanser', 'Perfect Whip Facial Foam', 'Creamy Wash',
       'Tamagohada Mild Peeling Face Wash',
       'Gokujyun Ultimate Moisturizing Face Wash',
       'Low pH Good Morning Gel Cleanser',
       'Tea Tree Skin Clearing Facial Wash',
       'Bright Miracle Ultimate Clarity Facial Foam',
       'Sparkling Bright Facial Foam', 'Lightening Micellar Gentle Wash'],
      dtype=object)

Jumlah Unique Value dari kolom :  832

 === Unique Value: kolom product_type ===
--------------------------------------------------------------------------------


array(['cleanser', 'serum/essence', 'sun protection', 'moisturizer',
       'toner'], dtype=object)

Jumlah Unique Value dari kolom :  5

 === Unique Value: kolom description_clean ===
--------------------------------------------------------------------------------


array(['cetaphil gentle skin cleanser mengandung formula gel yang telah terbukti secara klinis memberikan hidrasi yang sangat baik untuk melindungi dari kekeringan sambil membiarkan kulit terhidrasi setelah dibersihkan diformulasikan untuk menghilangkan kotoran riasan dan debu dengan lembut namun efektif ditambah campuran niacinamide vitamin b3 panthenol provitamin b5 dan gliserin yang menghidrasi untuk menjaga lapisan kelembapan alami kulit melindungi dari 5 tanda sensitivitas kulit termasuk lapisan kelembapan terganggu tidak nyaman dehidrasi kaku kasarhow to useaplikasikan pada bagian wajah dan tubuh dapat digunakan dengan atau tanpa airsuitable forsemua jenis kulitingredientsaqua glycerin cetearyl alcohol panthenol niacinamide pantolactone xanthan gum sodium cocoyl isethionate sodium benzoate citric acid fil1747v00',
       'shiseido perfect whip adalah pembersih wajah untuk menghilangkan kotoran minyak berlebih dan sisa make up dan pada saat bersamaan menghidrasi kulit formulanya y

Jumlah Unique Value dari kolom :  788


---
## 3. Feature Engineering
---

### 3.1. Feature Selection

In [15]:
df = df.drop(columns=['price'])
df

Unnamed: 0,product_type,brand,product,rating,review_count,url,image_url,description,unique_id,description_clean
0,cleanser,Cetaphil,Gentle Skin Cleanser,4.20,15298,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Cetaphil Gentle Skin Cleanser mengandung formu...,Cetaphil - Gentle Skin Cleanser,cetaphil gentle skin cleanser mengandung formu...
1,cleanser,Senka,Perfect Whip Facial Foam,4.20,6374,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Shiseido Perfect Whip adalah pembersih wajah u...,Senka - Perfect Whip Facial Foam,shiseido perfect whip adalah pembersih wajah u...
2,cleanser,Acnes,Creamy Wash,3.70,5510,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Acnes Creamy Wash adalah sabun pembersih wajah...,Acnes - Creamy Wash,acnes creamy wash adalah sabun pembersih wajah...
3,cleanser,Hada Labo,Tamagohada Mild Peeling Face Wash,4.10,5147,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Hada Labo Tamagohada Mild Peeling Face Wash ad...,Hada Labo - Tamagohada Mild Peeling Face Wash,hada labo tamagohada mild peeling face wash ad...
4,cleanser,Hada Labo,Gokujyun Ultimate Moisturizing Face Wash,4.30,4468,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Hada Labo Gokujyun Ultimate Moisturizing Face ...,Hada Labo - Gokujyun Ultimate Moisturizing Fac...,hada labo gokujyun ultimate moisturizing face ...
...,...,...,...,...,...,...,...,...,...,...
855,toner,Dermies,Hello Glow Radiance Essence Toner,4.28,96,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Hello Glow Radiance Essence Toner merupakan pe...,Dermies - Hello Glow Radiance Essence Toner,hello glow radiance essence toner merupakan pe...
856,toner,A.Stop,Clear Face & Body Toner,3.70,94,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,A.Stop Clear Face & Body Toner is the 1 stop s...,A.Stop - Clear Face & Body Toner,astop clear face body toner is the 1 stop shop...
857,toner,Hada Labo,Gokujyun Super Hyaluronic Acid Hydrating Lotio...,4.30,94,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Contains 3 types of Hyaluronic Acid that work ...,Hada Labo - Gokujyun Super Hyaluronic Acid Hyd...,contains 3 types of hyaluronic acid that work ...
858,toner,Innisfree,Aloe Revital Skin Mist,4.00,92,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,The mist-type toner containing Jeju chemical-f...,Innisfree - Aloe Revital Skin Mist,the misttype toner containing jeju chemicalfre...



**Informasi**<br>

Karena fokus sistem pada model ini adalah `Rekomendasi produk berdasarkan permasalahan kulit` dan bukan berdasarkan produk termurah/termahal.

Maka kolom `price` di-*drop* dengan alasan:
1. Tidak ikut memengaruhi *teks deskripsi produk* yang dipakai model
2. Model berbasis teks hanya peduli dengan str bukan pada angka.

Sehingga kolom `price` ini tidak relevan untuk *similarity computation*.

---
### 3.2. Text Preprocessing

In [16]:
%%time
# Inisialisasi tools
factory = StemmerFactory()
stemmer = factory.create_stemmer()

indo_stopwords = set(stopwords.words('indonesian'))

# Fungsi preprocess
def preprocess_text(text):
    if pd.isna(text):
        return text
    # bersihkan garis baru dan spasi berlebih
    text = re.sub(r"\\n", " ",text)
    text = re.sub(r"\s+", " ",text).strip()

    # tambah spasi sebelum teks yang nempel
    keywords = ['ingredients', 'how to use', 'suitable']
    # buat pola yang mendeteksi kalu kata kunci nempel ke teks sebelumnya
    pattern = r'(?i)([a-zA-Z0-9])(' + '|'.join([k.replace(' ', '') for k in keywords]) + r')'
    text = re.sub(pattern, lambda m: f'{m.group(1)}{m.group(2)}', text)

    # lowercase semua teks
    text = text.lower()

    # hapus angka dan satuan
    text = re.sub(r'\bx\b', '', text)
    text = re.sub(r'\b\d+\b', '', text)
    text = re.sub(r'\b\d+[a-zA-Z]+\b', '', text)

    # menghilangkan tanda baca
    text = text.translate(str.maketrans('', '', string.punctuation))

    # hilangkan kata satuan umum
    satuan_umum = {'ml', 'gram', 'menit', 'pump', 'pcs'}
    text = ' '.join([word for word in text.split() if word not in satuan_umum])
    
    # standarisasi seperti hapus titik dua atau var besar kecil
    for k in keywords:
        text = re.sub(k + r'\s*:?', k, text)

    # Stopwords
    text = ' '.join([word for word in text.split() if word not in indo_stopwords])
    
    # Stemming
    text = re.sub(r'\s+', ' ', text).strip()

    return text

df['cleaned_description'] = df['description'].apply(preprocess_text)

CPU times: user 106 ms, sys: 3.16 ms, total: 109 ms
Wall time: 112 ms


In [17]:
df = df.drop(columns=['description_clean'])
df

Unnamed: 0,product_type,brand,product,rating,review_count,url,image_url,description,unique_id,cleaned_description
0,cleanser,Cetaphil,Gentle Skin Cleanser,4.20,15298,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Cetaphil Gentle Skin Cleanser mengandung formu...,Cetaphil - Gentle Skin Cleanser,cetaphil gentle skin cleanser mengandung formu...
1,cleanser,Senka,Perfect Whip Facial Foam,4.20,6374,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Shiseido Perfect Whip adalah pembersih wajah u...,Senka - Perfect Whip Facial Foam,shiseido perfect whip pembersih wajah menghila...
2,cleanser,Acnes,Creamy Wash,3.70,5510,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Acnes Creamy Wash adalah sabun pembersih wajah...,Acnes - Creamy Wash,acnes creamy wash sabun pembersih wajah bertek...
3,cleanser,Hada Labo,Tamagohada Mild Peeling Face Wash,4.10,5147,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Hada Labo Tamagohada Mild Peeling Face Wash ad...,Hada Labo - Tamagohada Mild Peeling Face Wash,hada labo tamagohada mild peeling face wash sa...
4,cleanser,Hada Labo,Gokujyun Ultimate Moisturizing Face Wash,4.30,4468,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Hada Labo Gokujyun Ultimate Moisturizing Face ...,Hada Labo - Gokujyun Ultimate Moisturizing Fac...,hada labo gokujyun ultimate moisturizing face ...
...,...,...,...,...,...,...,...,...,...,...
855,toner,Dermies,Hello Glow Radiance Essence Toner,4.28,96,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Hello Glow Radiance Essence Toner merupakan pe...,Dermies - Hello Glow Radiance Essence Toner,hello glow radiance essence toner perpaduan es...
856,toner,A.Stop,Clear Face & Body Toner,3.70,94,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,A.Stop Clear Face & Body Toner is the 1 stop s...,A.Stop - Clear Face & Body Toner,astop clear face body toner is the stop shop f...
857,toner,Hada Labo,Gokujyun Super Hyaluronic Acid Hydrating Lotio...,4.30,94,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,Contains 3 types of Hyaluronic Acid that work ...,Hada Labo - Gokujyun Super Hyaluronic Acid Hyd...,contains types of hyaluronic acid that work sy...
858,toner,Innisfree,Aloe Revital Skin Mist,4.00,92,https://reviews.femaledaily.com/products/clean...,https://image.femaledaily.com/dyn/210/images/p...,The mist-type toner containing Jeju chemical-f...,Innisfree - Aloe Revital Skin Mist,the misttype toner containing jeju chemicalfre...


---
### 3.3. Feature Extraction

In [18]:
# TF-IDF
tfidf = TfidfVectorizer(
    max_features=5000,
    ngram_range=(1,2),
    min_df=3,
    max_df=0.8,
    sublinear_tf=True,
    stop_words=None,
)
X_tfidf = tfidf.fit_transform(df['cleaned_description'])

In [19]:
print(X_tfidf.shape)
print(tfidf.get_feature_names_out()[:30])

(860, 5000)
['absorption' 'ac' 'ac care' 'acacia' 'acetate' 'acetate antioxidant'
 'acetate disodium' 'acetate retinyl' 'acetate sodium' 'acetate vitamin'
 'acetyl' 'acetyl glucosamine' 'acetyl hexapeptide' 'acetyl tetrapeptide'
 'acetyl tyrosine' 'acetylated' 'acetylated hyaluronate' 'achillea'
 'achillea millefolium' 'acid' 'acid adenosine' 'acid allantoin'
 'acid aloe' 'acid alpha' 'acid ammonium' 'acid arginine' 'acid ascorbyl'
 'acid berfungsi' 'acid butylene' 'acid caffeine']


**Informasi**

1. `max_features=5000`, artinya kita cuma ambil 5000 kata yang penting. Kenapa?
    - karena kalau dataset kecil, 5000 mungkin sudah cukup buat representasi kata unik tanpa terlalu berat
    
    - kalau dataset besar banget baru bisa coba naikin ke 10000 atau 20000

Dampak nya nurunin dimensi vektor dan mempercepat komputasi `cosine similarity`

2. `ngram_range=(1,2)`, ini maksudnya  buat ambil unigram (kata tunggal) dan bigram (dua kata berurutan)<br>
Contoh:

```
'kulit kering kusam'
-> unigram: ['kulit', 'kering', 'kusam']
-> bigram: ['kulit kering', 'kering kusam']
```

kenapa?
    - setau aku ya kombinasi kata tuh kalo didunia skincare sering dipakai kan ya, kaya misal '**melembapkan wajah** bukan berarti **wajah lembap** 
    - jadi bigram penting biar model nangkep konteks nya gitu

3. `min_df=3` hanya ambil kata yang muncul di ≥ 3 dokumen. kenapa?
ini belum tau kenapa mau uji coba aja dulu soalnya
    - kata yang muncul sekali biasanya typo atau nama merk atau ingredients mungkin ya

Harapannya sih biar model lebih general sama stabil

4. `max_df=0.8` ini tuh mengabaikan kata yang muncul di >80% dokumen. Kenapa? sama kaya min_df ini test aja cuma persepsi awal
    - kata yang muncul di hampir semua deskripsi gak bantu membedakan antar produk sih cuma gatau ya kita tes aja dulu

jadi ini tuh kaya otomatis buat ngurangin stopword tambahan yang sering muncul di semua teks

5. `subinear_tf=True` maksud dari ini tuh yang ku tau
    - kalau sebuah kata muncul 10 kali vs 100 kali bedanya ga perlu dibesar besarin
    - log scale bikin bobot lebih stabil sama hasil similarity nya harapan nya sih lebih smooth

6. `stop_words=None`, ini tuh karena aku udah hapus stopwords di preprocessing, jadi gaperlu set ulang disini gitu. Kalo misalkan belum pun sebenernya bisa set `stop_words='indonesian`


---
### 3.4. Similarity Computation

In [20]:
user_query = ['kulit kering dan kusam']
query_vec = tfidf.transform(user_query)
cos_sim = cosine_similarity(query_vec, X_tfidf)
top_indices = cos_sim[0].argsort()[-5:][::-1]
df.iloc[top_indices][['product_type', 'product', 'brand', 'description']]

Unnamed: 0,product_type,product,brand,description
58,cleanser,Antioxidant Cleansing Foam,Raiku Beauty,"Nett Weight : 60gFormulanya lembut, tidak memb..."
89,cleanser,Dot Burst Face Wash,Emina,Sabun cuci muka dengan formula mild surfactant...
643,moisturizer,Vitamin E Intense Moisture Cream,The Body Shop,THE BODY SHOP VITAMIN E INTENSE MOISTURE CREAM...
854,toner,Bright Complete Milky Brightening Dew Toner,Garnier,Garnier Bright Complete Milky Brightening Dew ...
755,toner,Moisture Rich Hydrating Toner,Azarine Cosmetic,Radiant Healthy Dewy SkinUntuk semua jenis kul...


In [24]:
# nanti ini definisi in ulang buat fungsi rekomendasi
def recommended_products(query, top_k=5):
    # copy df biar tidak merubah hasil
    df_temp = df.copy()

    # Menghitung cosine similarity
    query_vec = tfidf.transform([query])
    cos_sim = cosine_similarity(query_vec, X_tfidf)
    df_temp['similarity'] = cos_sim[0]

    # Mengurutkan dari yang paling mirip
    df_sorted = df_temp.sort_values(by='similarity', ascending=False)

    # ambil top-1 produk per kategori
    top_per_type = df_sorted.groupby('product_type').head(1)

    # ambil top-k dari hasil akhir
    result = top_per_type.head(top_k).copy()

    # menampilkan % biar lebih mudah dibaca
    result['similarity (%)'] = (result['similarity'] * 100).round(2)

    # result
    return result[['similarity (%)', 'product_type', 'product', 'brand', 'description']]

recommended_products('acne and moisturizing')

Unnamed: 0,similarity (%),product_type,product,brand,description
738,59.91,toner,Beauty Water,Son & Park,Toner. Exfoliating and moisturizing.
83,25.67,cleanser,Brightening Watermelon Facial Foam X Maudy Ayunda,Pond's,Brightening and Hydrating
321,20.28,serum/essence,Galactomyces Pure Vitamin C Glow Serum,Some by Mi,The galactomyces helps brighten the skin with ...
557,16.96,moisturizer,Effaclar Duo Plus,La Roche Posay,"CORRECTIVE UNCLOGGING CARE ANTI-IMPERFECTIONS,..."
383,13.72,sun protection,Intensive Triple-shield Sunscreen SPF50+ PA++++,Innisfree,Contain Sunflower oil and Jeju Green tea extra...


**Informasi**

1. **Konteks**
Dengan memberikan query `kulit kering dan kusam`, dan sistem rekomendasi menghasilkan 5 produk paling relevan berdasarkan kemiripan teks deskripsi produk.

2. **Makna dari nilai similarity (%)**
    - Nilai `similarity (%)` menunjukkan seberapa mirip deskripsi produk dengan kata-kata pada query.
    - Semakin tinggi nilainya, berarti kata atau konteks yang digunakan dalam deskripsi produk semakin sering tumpang tindih dengan kata-kata di query.
    - pada TF-IDF kemiripan dihitung dari bobot kata unik, jadi bukan cuma 'kulit' dan 'kering' tapi juga kata-kata yang sering muncul bersamaan seperti 'hidrasi', 'cerah', etc.

3. Interpretasi per produk

| Produk | Insight : Relevansi |
| --- | --- |
| Raiku Beauty - Antioxidant Cleansing Foam | Produk ini punya kata kunci terkait membersihkan kulit dengan lembut dan melembapkan sehingga relevan untuk kulit kering dan kusam |
| The Body Shop - Vitamin E Intense Moisture Cream | sistem mengenali relevansi kuat karena ada kandungan vitamin E dan misture yang dirasa cocok untuk melembapkan kulit kering. |
| Garnier - Bright Complete Milky Brightening Dew Toner | ada kata 'brightening dan dew yang berkaitan dengan perawatan kulit kusam ya ga sih? sehingga muncul sebagai produk ke tiga |
| Parasol - UV Guard Moisturizing Sunscreen Lotion | mengandung 'moisturizing' jadi sistem ngebaca ada keterkaitan sama kebutuhan hidrasi kulit yang kering |
| Some by Mi - AHA BHA PHA 30 Days Miracle Serum | serum ini punya kandungan eksfoliasi untuk peremajaan kulit kusam mungkin jadi masuk ke list terus karena fokusnya bukan kelembapan jadi skornya lebih rendah |

4. Kesimpulan <br>
Sistem rekomendasi berhasil mengidentifikasi produk produk yang relevan dengan permasalahan kulit kering dan kusam, meliputi kategori: *cleanser, moisturizer, toner, sunscreen* dan *serum*.

Dan nilai similarity tertinggi dimiliki oleh **produk yang menekankan fungsi hidrasi dan pelembapan kulit**, menunjukkan bahwa model TF-IDF dapat menangkap makna semantik dasar dari kebutuhan pengguna.

### Save Model

In [None]:
with open('tfidf_vectorizer.pkl', 'wb') as f:
    pickle.dump(tfidf, f)

with open('tfidf_matrix.pkl', 'wb') as f:
    pickle.dump(X_tfidf, f)

df.to_pickle('skincare_df.pkl')
