**GitHub Repo**: https://github.com/MohammadSobri14/search-engine-smartfindr.git

**Tech Stack:**  

🧠 1. Data Science & NLP  
Used for data processing, feature extraction, and semantic weighting.  

- Python – The main language for all backend processing and data analysis.  

- Jupyter Notebook – For data exploration, preprocessing, and early model experiments.  

- Pandas – Manages and manipulates tabular data (CSV files and SQL query results).  

- NumPy – Numerical operations and replacing NaN values.  

- Torch (PyTorch) – Backend library used by SentenceTransformer for NLP vector processing.  

- SentenceTransformer – Converts text into embedding vectors (using the all-MiniLM-L6-v2 model).  

- scikit-learn (sklearn.metrics.pairwise.cosine_similarity) – Computes the similarity between text vectors.  

🕸️ 2. Web Scraping (for collecting smartphone data from the site)  
- requests – Fetches HTML content from the website.  

- BeautifulSoup – Parses the HTML and extracts data from it.  

- re (Regular Expression) – Extracts information such as "RAM 8GB", "kamera 50MP" (camera), and "baterai 5000" (battery).  

🛢️ 3. Database & ORM  
- MySQL – Stores the smartphone data.  

- mysql-connector-python – Connects directly to the MySQL database.  

- SQLAlchemy – ORM (Object Relational Mapper) for more flexible integration between Python and MySQL.  

🌐 4. Web Framework (Backend)  
- Flask – The main framework for building the REST API and serving the web pages.  

- Flask Templates (Jinja2) – Renders HTML pages with dynamic data.  

🎨 5. Frontend & UI  
- Tailwind CSS – Utility-first CSS framework for building a modern, responsive interface.  

- JavaScript DOM + Fetch API – For auto-suggestion and interactivity on the search page.  
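
The `re` entry above mentions pulling values such as "RAM 8GB", "kamera 50MP", and "baterai 5000" out of text. A minimal sketch of that idea follows. The pattern names, the `extract_specs` helper, and the regular expressions themselves are assumptions made for illustration and are not taken from the repository.

```python
import re

# Hypothetical patterns for the three examples named in the tech stack list.
SPEC_PATTERNS = {
    "ram_gb": re.compile(r"ram\s*(\d+)\s*gb", re.IGNORECASE),
    "camera_mp": re.compile(r"kamera\s*(\d+)\s*mp", re.IGNORECASE),
    "battery_mah": re.compile(r"baterai\s*(\d+)", re.IGNORECASE),
}


def extract_specs(query):
    """Return the numeric specs found in a free-text query (hypothetical helper)."""
    specs = {}
    for name, pattern in SPEC_PATTERNS.items():
        match = pattern.search(query)
        if match:
            specs[name] = int(match.group(1))
    return specs


specs = extract_specs("hp RAM 8GB kamera 50MP baterai 5000")
print(specs)
```

A query that mentions none of these specs simply yields an empty dictionary, so a caller can fall back to purely semantic search.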

In [None]:
import re
import requests
import pandas as pd
from bs4 import BeautifulSoup

# **Scraping**

This code fetches a web page from the Pricebook site using an HTTP GET request.

In [9]:
url = "https://www.pricebook.co.id/smartphone/"
req = requests.get(url)

This function scrapes smartphone products from the Pricebook site automatically, iterating over the specified number of pages.  

The function takes two parameters:  

- url: the base URL of the page to scrape.  

- num_pages: the number of pages to fetch.  

The results from all pages are combined and returned as a list of dictionaries, ready for further processing or for saving to a file/dataset.

In [10]:
def crawl_products(url, num_pages):
    results = []

    for page in range(1, num_pages + 1):
        api_url = f"{url}?page={page}"
        print(f"Scraping: {api_url}")
        req = requests.get(api_url, headers={'User-Agent': 'Mozilla/5.0'})
        soup = BeautifulSoup(req.text, 'lxml')

        product_cards = soup.find_all('div', {'class': 'styles_productPanel__Tlvp6 row'})

        for idx, product in enumerate(product_cards):
            product_dict = {}
            ranking = product.find('span', {'class': "styles_productRanking__bYl4W"})
            name = product.find('h2', {'class': 'styles_productName__fr99s'})
            price = product.find('span', {'class': 'styles_price__nEARt'})
            year = product.find('div', {'class': 'styles_yearReleased___jyCv'})

            link = product.find('a', {'class': 'styles_linkReplace__oS0Gg'})
            product_url = url + link['href'] if link and 'href' in link.attrs else "N/A"

            product_image = product.find('div', {'class': 'styles_productImageWrapper__IwCKz'})
            product_tag = product_image.find('img') if product_image else None
            product_image_url = product_tag.get('src', 'N/A') if product_tag else "N/A"

            specs = product.find_all('div', {'class': 'gx-xs-0 ps-md-2 pe-md-3 d-inline-flex col-md-6 col-6'})
            ram = specs[0].find_all('span')[0].text.strip() if len(specs) > 0 else "N/A"
            camera = specs[1].find_all('span')[0].text.strip() if len(specs) > 1 else "N/A"
            screen = specs[2].find_all('span')[0].text.strip() if len(specs) > 2 else "N/A"
            battery = specs[3].find_all('span')[0].text.strip() if len(specs) > 3 else "N/A"

            # Sequential ID across pages; this is only correct if every page has the same number of cards
            product_dict['ID'] = (page - 1) * len(product_cards) + idx + 1
            product_dict['Product Name'] = name.text.strip() if name else "N/A"
            product_dict['Product Rank'] = ranking.text.strip() if ranking else "N/A"
            product_dict['Ram'] = ram
            product_dict['Camera'] = camera
            product_dict['Screen'] = screen
            product_dict['Battery'] = battery
            product_dict['Price'] = price.text.strip() if price else "N/A"
            product_dict['Year'] = year.find_all('span')[0].text.strip() if year else "N/A"
            product_dict['Product URL'] = product_url
            product_dict['Product Image'] = product_image_url

            results.append(product_dict)

    return results
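
The `ID` formula in `crawl_products` multiplies the page number by the number of cards on the current page, which yields unique sequential IDs only when every page holds the same number of cards. One possible alternative, sketched here under the assumption that a simple running counter is acceptable, keeps IDs unique regardless of page size. The `assign_ids` helper and the toy page contents are hypothetical.

```python
from itertools import count


def assign_ids(pages):
    """Give every product a unique, sequential ID using one counter shared by all pages."""
    next_id = count(1)  # running counter that is independent of page size
    results = []
    for cards in pages:
        for card in cards:
            results.append({"ID": next(next_id), "Product Name": card})
    return results


# Toy pages with different numbers of cards (made-up names).
pages = [["A", "B", "C"], ["D"], ["E", "F"]]
rows = assign_ids(pages)
print([row["ID"] for row in rows])
```

With the original formula, the single card on the second toy page would get ID `(2 - 1) * 1 + 0 + 1 = 2`, which collides with product "B" from the first page.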

Scrape 500 pages from the site.

In [11]:
url = "https://www.pricebook.co.id/smartphone/"
crwl = crawl_products(url,500)

Scraping: https://www.pricebook.co.id/smartphone/?page=1
Scraping: https://www.pricebook.co.id/smartphone/?page=2
Scraping: https://www.pricebook.co.id/smartphone/?page=3
Scraping: https://www.pricebook.co.id/smartphone/?page=4
Scraping: https://www.pricebook.co.id/smartphone/?page=5
Scraping: https://www.pricebook.co.id/smartphone/?page=6
Scraping: https://www.pricebook.co.id/smartphone/?page=7
Scraping: https://www.pricebook.co.id/smartphone/?page=8
Scraping: https://www.pricebook.co.id/smartphone/?page=9
Scraping: https://www.pricebook.co.id/smartphone/?page=10
Scraping: https://www.pricebook.co.id/smartphone/?page=11
Scraping: https://www.pricebook.co.id/smartphone/?page=12
Scraping: https://www.pricebook.co.id/smartphone/?page=13
Scraping: https://www.pricebook.co.id/smartphone/?page=14
Scraping: https://www.pricebook.co.id/smartphone/?page=15
Scraping: https://www.pricebook.co.id/smartphone/?page=16
Scraping: https://www.pricebook.co.id/smartphone/?page=17
Scraping: https://www.p
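
A long crawl like this one requests every page back to back. The sketch below shows one way to pause between requests and stop as soon as a page returns no products. It assumes a hypothetical `fetch_page` callable that returns the list of products for a page number; the `crawl_until_empty` name, the one-second default delay, and the fake site data are all illustrative and not part of the original notebook.

```python
import time


def crawl_until_empty(fetch_page, max_pages, delay_seconds=1.0, sleep=time.sleep):
    """Collect products page by page, pausing between requests and stopping on an empty page."""
    results = []
    for page in range(1, max_pages + 1):
        products = fetch_page(page)
        if not products:
            break  # no more data, so later pages are skipped
        results.extend(products)
        sleep(delay_seconds)  # be polite to the server between requests
    return results


# Stand-in for the real site: two pages with data, everything after that is empty.
fake_site = {1: ["phone-a", "phone-b"], 2: ["phone-c"]}
collected = crawl_until_empty(
    lambda page: fake_site.get(page, []),
    max_pages=500,
    sleep=lambda seconds: None,  # skip real sleeping in this demonstration
)
print(collected)
```

Passing the fetch and sleep functions as parameters keeps the loop testable without any network access.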

Displays the scraping results after saving them to a CSV file.

In [12]:
crawl = pd.DataFrame(crwl)
crawl.to_csv('pricebook_smartphone1.csv', encoding='utf-8', index=False)
crawl

Unnamed: 0,ID,Product Name,Product Rank,Ram,Camera,Screen,Battery,Price,Year,Product URL,Product Image
0,1,OPPO A3s RAM 6GB ROM 128GB,#1 HP,6 GB,13 MP,6.2 inch,4230 mAh,Rp 532.000,2018,https://www.pricebook.co.id/smartphone//OPPO-A...,https://cdn.pricebook.co.id/images/product/M/9...
1,2,Infinix Hot 40 Pro RAM 8GB ROM 256GB,#2 HP,8 GB,108 MP,6.78 inch,5000 mAh,Rp 1.704.000,2023,https://www.pricebook.co.id/smartphone//Infini...,https://cdn.pricebook.co.id/images/product/M/9...
2,3,Samsung Galaxy A15 RAM 8GB ROM 256GB,#3 HP,8 GB,50 MP,6.5 inch,5000 mAh,Rp 2.719.000,2023,https://www.pricebook.co.id/smartphone//Samsun...,https://cdn.pricebook.co.id/images/product/M/9...
3,4,OPPO A31 (2020) RAM 6GB ROM 128GB,#4 HP,6 GB,12 MP,6.5 inch,4230 mAh,Rp 950.000,2020,https://www.pricebook.co.id/smartphone//OPPO-A...,https://cdn.pricebook.co.id/images/product/M/9...
4,5,Xiaomi Redmi 13C RAM 8GB ROM 256GB,#5 HP,8 GB,50 MP,6.74 inch,5000 mAh,Rp 1.599.000,2023,https://www.pricebook.co.id/smartphone//Xiaomi...,https://cdn.pricebook.co.id/images/product/M/9...
...,...,...,...,...,...,...,...,...,...,...,...
5005,4996,LG L20 D105 ROM 4GB,#5026 HP,512 MB,2 MP,3 inch,1540 mAh,,2014,https://www.pricebook.co.id/smartphone//LG-L20...,https://cdn.pricebook.co.id/images/product/M/2...
5006,4997,Smartfren Andromax G2 Touch QWERTY ROM 4GB,#5027 HP,512 MB,5 MP,3.5 inch,1700 mAh,,2014,https://www.pricebook.co.id/smartphone//Smartf...,https://cdn.pricebook.co.id/images/product/M/2...
5007,4998,ZTE Kis 3,#5028 HP,256 MB,3.15 MP,4 inch,1400 mAh,,2014,https://www.pricebook.co.id/smartphone//ZTE-Ki...,https://cdn.pricebook.co.id/images/product/M/2...
5008,4999,Mito Fantasy Selfie A77 RAM 1GB ROM 4GB,#5029 HP,1 GB,8 MP,4.5 inch,1500 mAh,,2014,https://www.pricebook.co.id/smartphone//Mito-F...,https://cdn.pricebook.co.id/images/product/M/2...
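
The `Product URL` values above contain a double slash (`/smartphone//...`) because the base URL ends with `/` and the scraped `href` starts with `/`. A small sketch of two ways to join them follows. The `join_url` helper and the `/example-phone` path are hypothetical, and which resulting URL the site actually serves has not been verified here.

```python
from urllib.parse import urljoin


def join_url(base, href):
    """Append href to base with exactly one slash between them (hypothetical helper)."""
    return base.rstrip("/") + "/" + href.lstrip("/")


base = "https://www.pricebook.co.id/smartphone/"
href = "/example-phone"  # made-up path for illustration

appended = join_url(base, href)  # keeps the /smartphone/ prefix
resolved = urljoin(base, href)   # treats href as relative to the site root

print(appended)
print(resolved)
```

The two results differ: `join_url` keeps the `/smartphone/` segment, while `urljoin` resolves a leading-slash path against the domain root, so the right choice depends on how the site builds its links.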


# **Pre-Processing**

This code cleans and tidies the raw data from pricebook_smartphone1.csv so that it is ready for analysis or for loading into a database.

In [13]:
df = pd.read_csv('pricebook_smartphone1.csv')

# Clean the price column (strip "Rp", dots, etc.)
df['Price'] = df['Price'].apply(lambda x: re.sub(r'[^\d]', '', str(x)) if pd.notnull(x) else x)
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')

def extract_number(text):
    if pd.isnull(text):
        return None
    match = re.search(r'(\d+\.?\d*)', str(text))
    return float(match.group(1)) if match else None

# Convert spec strings to numeric columns. extract_number ignores units, so "512 MB" ends up as 512.0 in 'Ram (GB)'.
df['Ram (GB)'] = df['Ram'].apply(extract_number)
df['Camera (MP)'] = df['Camera'].apply(extract_number)
df['Screen (inch)'] = df['Screen'].apply(extract_number)
df['Battery (mAh)'] = df['Battery'].apply(extract_number)
df['Year'] = pd.to_numeric(df['Year'], errors='coerce')

# Drop duplicate products, then remove the raw columns that are no longer needed
df.drop_duplicates(subset=['Product Name'], inplace=True)
df.drop(columns=['Ram', 'Camera', 'Screen', 'Battery'], inplace=True)

df.to_csv('pricebook_products_clean1.csv', index=False)
df

Unnamed: 0,ID,Product Name,Product Rank,Price,Year,Product URL,Product Image,Ram (GB),Camera (MP),Screen (inch),Battery (mAh)
0,1,OPPO A3s RAM 6GB ROM 128GB,#1 HP,532000.0,2018.0,https://www.pricebook.co.id/smartphone//OPPO-A...,https://cdn.pricebook.co.id/images/product/M/9...,6.0,13.00,6.20,4230.0
1,2,Infinix Hot 40 Pro RAM 8GB ROM 256GB,#2 HP,1704000.0,2023.0,https://www.pricebook.co.id/smartphone//Infini...,https://cdn.pricebook.co.id/images/product/M/9...,8.0,108.00,6.78,5000.0
2,3,Samsung Galaxy A15 RAM 8GB ROM 256GB,#3 HP,2719000.0,2023.0,https://www.pricebook.co.id/smartphone//Samsun...,https://cdn.pricebook.co.id/images/product/M/9...,8.0,50.00,6.50,5000.0
3,4,OPPO A31 (2020) RAM 6GB ROM 128GB,#4 HP,950000.0,2020.0,https://www.pricebook.co.id/smartphone//OPPO-A...,https://cdn.pricebook.co.id/images/product/M/9...,6.0,12.00,6.50,4230.0
4,5,Xiaomi Redmi 13C RAM 8GB ROM 256GB,#5 HP,1599000.0,2023.0,https://www.pricebook.co.id/smartphone//Xiaomi...,https://cdn.pricebook.co.id/images/product/M/9...,8.0,50.00,6.74,5000.0
...,...,...,...,...,...,...,...,...,...,...,...
5005,4996,LG L20 D105 ROM 4GB,#5026 HP,,2014.0,https://www.pricebook.co.id/smartphone//LG-L20...,https://cdn.pricebook.co.id/images/product/M/2...,512.0,2.00,3.00,1540.0
5006,4997,Smartfren Andromax G2 Touch QWERTY ROM 4GB,#5027 HP,,2014.0,https://www.pricebook.co.id/smartphone//Smartf...,https://cdn.pricebook.co.id/images/product/M/2...,512.0,5.00,3.50,1700.0
5007,4998,ZTE Kis 3,#5028 HP,,2014.0,https://www.pricebook.co.id/smartphone//ZTE-Ki...,https://cdn.pricebook.co.id/images/product/M/2...,256.0,3.15,4.00,1400.0
5008,4999,Mito Fantasy Selfie A77 RAM 1GB ROM 4GB,#5029 HP,,2014.0,https://www.pricebook.co.id/smartphone//Mito-F...,https://cdn.pricebook.co.id/images/product/M/2...,1.0,8.00,4.50,1500.0
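
In the table above, phones whose RAM was scraped as "512 MB" or "256 MB" show 512.0 and 256.0 in the `Ram (GB)` column, because `extract_number` keeps the number and discards the unit. A unit-aware parser could look like the sketch below. The `parse_ram_gb` name and the choice of 1024 MB per GB are assumptions, not something the notebook specifies.

```python
import re


def parse_ram_gb(text):
    """Convert strings like '6 GB' or '512 MB' to a float number of gigabytes."""
    if text is None:
        return None
    match = re.search(r"(\d+\.?\d*)\s*(MB|GB)", str(text), re.IGNORECASE)
    if not match:
        return None  # covers 'N/A', NaN rendered as 'nan', and other unparseable values
    value = float(match.group(1))
    unit = match.group(2).upper()
    # Assumption: 1 GB = 1024 MB
    return value / 1024 if unit == "MB" else value


for raw in ["6 GB", "512 MB", "256 MB", "N/A"]:
    print(raw, "->", parse_ram_gb(raw))
```

It could be applied in place of `extract_number` for the `Ram` column only, for example with `df['Ram'].apply(parse_ram_gb)`.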


---

# **Sentence Transformer**

This section builds a text-based product recommendation system using a transformer model. First, each product is turned into a full-sentence description. Then, using a sentence-transformers model, the system compares the user's input with those descriptions using cosine similarity and displays the five smartphones that are most semantically relevant.
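
To make the ranking step concrete, here is a small plain-Python sketch of cosine similarity followed by a top-k selection. The three-dimensional vectors are made up for illustration (real all-MiniLM-L6-v2 embeddings have 384 dimensions), and the `cosine_similarity` and `top_k` helpers are hypothetical stand-ins for the library calls used in the notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, item_vecs, k):
    """Return (index, score) pairs for the k items most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(item_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy "embeddings" (made up for illustration).
items = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
query = [1.0, 0.0, 0.0]

ranked = top_k(query, items, k=2)
for index, score in ranked:
    print(index, round(score, 4))
```

Item 0 points in the same direction as the query and scores 1.0, item 2 shares one of its two components and scores about 0.7071, and item 1 is orthogonal and is left out of the top two.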

In [None]:
from sentence_transformers import SentenceTransformer, util
import torch

# Load dataset
df = pd.read_csv('pricebook_products_clean1.csv')
selected_cols = ['Product Name', 'Price', 'Ram (GB)', 'Camera (MP)', 'Screen (inch)', 'Battery (mAh)']
df = df[selected_cols].dropna()

# Build a product description. Note: the trailing .replace(",", ".") also turns the commas between clauses into periods.
def create_description(row):
    return f"{row['Product Name']} dengan RAM {int(row['Ram (GB)'])}GB, kamera {int(row['Camera (MP)'])}MP, layar {row['Screen (inch)']} inci, baterai {int(row['Battery (mAh)'])}mAh, dan harga sekitar Rp{int(row['Price']):,}".replace(",", ".")

df['Product Description'] = df.apply(create_description, axis=1)

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Build product embeddings
product_descriptions = df['Product Description'].tolist()
product_embeddings = model.encode(product_descriptions, convert_to_tensor=True)

# User input
query = input("Masukkan kebutuhan kamu (misal: 'HP kamera bagus dan baterai awet'): ")
query_embedding = model.encode(query, convert_to_tensor=True)

# Compute cosine similarity between the query and every product description
cos_scores = util.cos_sim(query_embedding, product_embeddings)[0]

# Take the top 5
top_results = torch.topk(cos_scores, k=5)
top_indices = top_results.indices.tolist()
top_scores = top_results.values.tolist()

# Display results
print("\n📱 Rekomendasi Produk Berdasarkan Pencarian:")
for idx, score in zip(top_indices, top_scores):
    print(f"- {df.iloc[idx]['Product Name']} (Skor kemiripan: {score:.4f})")
    print(f"  {df.iloc[idx]['Product Description']}\n")


Masukkan kebutuhan kamu (misal: 'HP kamera bagus dan baterai awet'): hp dengan kapasitas ram besar

📱 Rekomendasi Produk Berdasarkan Pencarian:
- Wiko Harry RAM 3GB ROM 16GB (Skor kemiripan: 0.6462)
  Wiko Harry RAM 3GB ROM 16GB dengan RAM 3GB. kamera 13MP. layar 5.0 inci. baterai 2500mAh. dan harga sekitar Rp1.399.000

- Lenovo K9 RAM 3GB ROM 32GB (Skor kemiripan: 0.6257)
  Lenovo K9 RAM 3GB ROM 32GB dengan RAM 3GB. kamera 13MP. layar 5.7 inci. baterai 3000mAh. dan harga sekitar Rp1.850.000

- itel P40 RAM 4GB ROM 64GB (Skor kemiripan: 0.6217)
  itel P40 RAM 4GB ROM 64GB dengan RAM 4GB. kamera 13MP. layar 6.6 inci. baterai 6000mAh. dan harga sekitar Rp1.000.000

- itel P40 RAM 4GB ROM 128GB (Skor kemiripan: 0.6213)
  itel P40 RAM 4GB ROM 128GB dengan RAM 4GB. kamera 13MP. layar 6.6 inci. baterai 6000mAh. dan harga sekitar Rp1.199.000

- OPPO K3 RAM 6GB ROM 64GB (Skor kemiripan: 0.6201)
  OPPO K3 RAM 6GB ROM 64GB dengan RAM 6GB. kamera 16MP. layar 6.5 inci. baterai 3765mAh. dan harga s
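
In the output above, the clauses of each description are separated by periods ("RAM 3GB. kamera 13MP. ...") because `create_description` applies `.replace(",", ".")` to the whole sentence rather than to the price alone. One possible fix, sketched below, formats the price separately so only its thousands separators are changed. The sample row reuses the Wiko Harry values shown in the output; passing a plain dictionary instead of a DataFrame row is a simplification for this example.

```python
def create_description(row):
    """Build the product sentence, converting thousands separators in the price only."""
    price = f"{int(row['Price']):,}".replace(",", ".")  # 1399000 -> "1.399.000"
    return (
        f"{row['Product Name']} dengan RAM {int(row['Ram (GB)'])}GB, "
        f"kamera {int(row['Camera (MP)'])}MP, layar {row['Screen (inch)']} inci, "
        f"baterai {int(row['Battery (mAh)'])}mAh, dan harga sekitar Rp{price}"
    )


row = {
    "Product Name": "Wiko Harry RAM 3GB ROM 16GB",
    "Price": 1399000.0,
    "Ram (GB)": 3.0,
    "Camera (MP)": 13.0,
    "Screen (inch)": 5.0,
    "Battery (mAh)": 2500.0,
}
description = create_description(row)
print(description)
```

Changing the description text changes the embeddings, so similarity scores from a rerun would not necessarily match the ones recorded above.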

In [9]:
df = pd.read_csv('pricebook_products_clean1.csv')
df

Unnamed: 0,ID,Product Name,Product Rank,Price,Year,Product URL,Product Image,Ram (GB),Camera (MP),Screen (inch),Battery (mAh)
0,1,OPPO A3s RAM 6GB ROM 128GB,#1 HP,532000.0,2018.0,https://www.pricebook.co.id/smartphone//OPPO-A...,https://cdn.pricebook.co.id/images/product/M/9...,6.0,13.00,6.20,4230.0
1,2,Infinix Hot 40 Pro RAM 8GB ROM 256GB,#2 HP,1704000.0,2023.0,https://www.pricebook.co.id/smartphone//Infini...,https://cdn.pricebook.co.id/images/product/M/9...,8.0,108.00,6.78,5000.0
2,3,Samsung Galaxy A15 RAM 8GB ROM 256GB,#3 HP,2719000.0,2023.0,https://www.pricebook.co.id/smartphone//Samsun...,https://cdn.pricebook.co.id/images/product/M/9...,8.0,50.00,6.50,5000.0
3,4,OPPO A31 (2020) RAM 6GB ROM 128GB,#4 HP,950000.0,2020.0,https://www.pricebook.co.id/smartphone//OPPO-A...,https://cdn.pricebook.co.id/images/product/M/9...,6.0,12.00,6.50,4230.0
4,5,Xiaomi Redmi 13C RAM 8GB ROM 256GB,#5 HP,1599000.0,2023.0,https://www.pricebook.co.id/smartphone//Xiaomi...,https://cdn.pricebook.co.id/images/product/M/9...,8.0,50.00,6.74,5000.0
...,...,...,...,...,...,...,...,...,...,...,...
4846,4996,LG L20 D105 ROM 4GB,#5026 HP,,2014.0,https://www.pricebook.co.id/smartphone//LG-L20...,https://cdn.pricebook.co.id/images/product/M/2...,512.0,2.00,3.00,1540.0
4847,4997,Smartfren Andromax G2 Touch QWERTY ROM 4GB,#5027 HP,,2014.0,https://www.pricebook.co.id/smartphone//Smartf...,https://cdn.pricebook.co.id/images/product/M/2...,512.0,5.00,3.50,1700.0
4848,4998,ZTE Kis 3,#5028 HP,,2014.0,https://www.pricebook.co.id/smartphone//ZTE-Ki...,https://cdn.pricebook.co.id/images/product/M/2...,256.0,3.15,4.00,1400.0
4849,4999,Mito Fantasy Selfie A77 RAM 1GB ROM 4GB,#5029 HP,,2014.0,https://www.pricebook.co.id/smartphone//Mito-F...,https://cdn.pricebook.co.id/images/product/M/2...,1.0,8.00,4.50,1500.0


In [10]:
df.isnull().sum()

ID                  0
Product Name        0
Product Rank        0
Price            3091
Year               43
Product URL         0
Product Image       0
Ram (GB)          121
Camera (MP)        42
Screen (inch)      38
Battery (mAh)     308
dtype: int64

In [11]:
df.dropna(inplace=True)

In [12]:
df.isnull().sum()

ID               0
Product Name     0
Product Rank     0
Price            0
Year             0
Product URL      0
Product Image    0
Ram (GB)         0
Camera (MP)      0
Screen (inch)    0
Battery (mAh)    0
dtype: int64
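
The null counts above show that `Price` accounts for most missing values (3091), so a plain `dropna()` removes every product without a listed price. As a hedged illustration of the trade-off, the toy frame below compares dropping rows with any missing value against dropping only rows that lack a chosen column. The toy data and the choice of `Ram (GB)` as the required column are assumptions for this example.

```python
import pandas as pd

# Toy data (made up): B has no price, C has no RAM value.
toy = pd.DataFrame({
    "Product Name": ["A", "B", "C"],
    "Price": [532000.0, None, 950000.0],
    "Ram (GB)": [6.0, 8.0, None],
})

all_dropped = toy.dropna()                   # drops B and C
ram_only = toy.dropna(subset=["Ram (GB)"])   # drops only C

print(len(all_dropped), len(ram_only))
```

Whether products without a price should stay in the search index is a design decision; the notebook's description template needs a price, so it drops them.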

In [14]:
df.to_csv('pricebook_products_clean1.csv', index=False)
df

Unnamed: 0,ID,Product Name,Product Rank,Price,Year,Product URL,Product Image,Ram (GB),Camera (MP),Screen (inch),Battery (mAh)
0,1,OPPO A3s RAM 6GB ROM 128GB,#1 HP,532000.0,2018.0,https://www.pricebook.co.id/smartphone//OPPO-A...,https://cdn.pricebook.co.id/images/product/M/9...,6.0,13.00,6.20,4230.0
1,2,Infinix Hot 40 Pro RAM 8GB ROM 256GB,#2 HP,1704000.0,2023.0,https://www.pricebook.co.id/smartphone//Infini...,https://cdn.pricebook.co.id/images/product/M/9...,8.0,108.00,6.78,5000.0
2,3,Samsung Galaxy A15 RAM 8GB ROM 256GB,#3 HP,2719000.0,2023.0,https://www.pricebook.co.id/smartphone//Samsun...,https://cdn.pricebook.co.id/images/product/M/9...,8.0,50.00,6.50,5000.0
3,4,OPPO A31 (2020) RAM 6GB ROM 128GB,#4 HP,950000.0,2020.0,https://www.pricebook.co.id/smartphone//OPPO-A...,https://cdn.pricebook.co.id/images/product/M/9...,6.0,12.00,6.50,4230.0
4,5,Xiaomi Redmi 13C RAM 8GB ROM 256GB,#5 HP,1599000.0,2023.0,https://www.pricebook.co.id/smartphone//Xiaomi...,https://cdn.pricebook.co.id/images/product/M/9...,8.0,50.00,6.74,5000.0
...,...,...,...,...,...,...,...,...,...,...,...
1870,1975,OUKITEL C17 Pro RAM 4GB ROM 64GB,#2005 HP,2518200.0,2019.0,https://www.pricebook.co.id/smartphone//OUKITE...,https://cdn.pricebook.co.id/images/product/M/9...,4.0,12.98,6.35,3900.0
1872,1977,Infinix Note 7 Lite RAM 4GB ROM 128GB,#2007 HP,1849000.0,2020.0,https://www.pricebook.co.id/smartphone//Infini...,https://cdn.pricebook.co.id/images/product/M/9...,4.0,48.00,6.60,5000.0
2634,2783,Realme 14T RAM 8GB ROM 256GB,#2813 HP,3599000.0,2025.0,https://www.pricebook.co.id/smartphone//Realme...,https://cdn.pricebook.co.id/images/product/M/1...,8.0,50.00,6.67,6000.0
2671,2821,Realme 14T RAM 8GB ROM 128GB,#2851 HP,3199000.0,2025.0,https://www.pricebook.co.id/smartphone//Realme...,https://cdn.pricebook.co.id/images/product/M/1...,8.0,50.00,6.67,6000.0
