
# LapPick: A Laptop Recommendation System Using NLP
### Laskar AI Capstone Project

**Date Created:** 20 May 2025

**Team:** LAI25-SM035  
**Members:**  
- A533YBM071 – ARLIYANDI – STIKOM EL RAHMA  
- A006YBF160 – FATHUR RAHMAN AL FARIZY – Universitas Brawijaya  
- A245YBF227 – IRFAN FAJAR MUTTAQIN – Universitas Kristen Satya Wacana Salatiga  
- A011XBF457 – SHOFURA TSABITAH RAHMAH – Universitas Padjadjaran  

---

## Project Description
LapPick is a laptop recommendation system based on Natural Language Processing (NLP). It helps prospective buyers choose a laptop that fits their needs (gaming, graphic design, office work, etc.) and their budget.


In [1]:
# Data Processing and Numerical Operations
import pandas as pd
import numpy as np
import re

# Natural Language Processing (NLP)
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
import nltk
nltk.download('punkt')

# Machine Learning and Model Evaluation
from sklearn.model_selection import train_test_split


# TensorFlow (recommendation model)
import tensorflow as tf

# General Settings
import joblib
import warnings
warnings.filterwarnings('ignore')

[nltk_data] Downloading package punkt to /home/f2rra/nltk_data...
[nltk_data]   Package punkt is already up-to-date!



## 1. Data Collection

### Objective
Collect laptop specification data from several sources (e-commerce sites and public datasets).

### Steps
1. Scrape laptop listings from e-commerce sites.
2. Download datasets from Kaggle.

### Implementation


In [2]:

df_a = pd.read_csv('datasets/laptop_data.csv')
df_b = pd.read_csv('datasets/LaptopPricePrediction.csv')
df_c = pd.read_csv('datasets/laptops.csv')


In [3]:
df_a.head()

Unnamed: 0.1,Unnamed: 0,Company,TypeName,Inches,ScreenResolution,Cpu,Ram,Memory,Gpu,OpSys,Weight,Price
0,0,Apple,Ultrabook,13.3,IPS Panel Retina Display 2560x1600,Intel Core i5 2.3GHz,8GB,128GB SSD,Intel Iris Plus Graphics 640,macOS,1.37kg,71378.6832
1,1,Apple,Ultrabook,13.3,1440x900,Intel Core i5 1.8GHz,8GB,128GB Flash Storage,Intel HD Graphics 6000,macOS,1.34kg,47895.5232
2,2,HP,Notebook,15.6,Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8GB,256GB SSD,Intel HD Graphics 620,No OS,1.86kg,30636.0
3,3,Apple,Ultrabook,15.4,IPS Panel Retina Display 2880x1800,Intel Core i7 2.7GHz,16GB,512GB SSD,AMD Radeon Pro 455,macOS,1.83kg,135195.336
4,4,Apple,Ultrabook,13.3,IPS Panel Retina Display 2560x1600,Intel Core i5 3.1GHz,8GB,256GB SSD,Intel Iris Plus Graphics 650,macOS,1.37kg,96095.808


In [4]:
df_a.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1303 entries, 0 to 1302
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        1303 non-null   int64  
 1   Company           1303 non-null   object 
 2   TypeName          1303 non-null   object 
 3   Inches            1303 non-null   float64
 4   ScreenResolution  1303 non-null   object 
 5   Cpu               1303 non-null   object 
 6   Ram               1303 non-null   object 
 7   Memory            1303 non-null   object 
 8   Gpu               1303 non-null   object 
 9   OpSys             1303 non-null   object 
 10  Weight            1303 non-null   object 
 11  Price             1303 non-null   float64
dtypes: float64(2), int64(1), object(9)
memory usage: 122.3+ KB


In [5]:
df_b.head()

Unnamed: 0.1,Unnamed: 0,Name,Processor,RAM,Operating System,Storage,Display,Warranty,Price,rating
0,0,Lenovo Ideapad S145 Core i5 10th Gen - (8 GB/1...,Intel Core i5 Processor (10th Gen),8 GB DDR4 RAM,64 bit Windows 10 Operating System,1 TB HDD,39.62 cm (15.6 inch) Display,1 Year Onsite Warranty,"₹43,990",3.9
1,1,Lenovo IdeaPad Core i3 11th Gen - (8 GB/256 GB...,Intel Core i3 Processor (11th Gen),8 GB DDR4 RAM,64 bit Windows 10 Operating System,256 GB SSD,35.56 cm (14 Inch) Display,1 Year Onsite Warranty,"₹43,990",4.2
2,2,HP Pentium Quad Core - (8 GB/256 GB SSD/Window...,Intel Pentium Quad Core Processor,8 GB DDR4 RAM,64 bit Windows 10 Operating System,256 GB SSD,35.56 cm (14 inch) Display,1 Year Onsite Warranty,"₹31,490",4.6
3,3,HP 14s Core i3 11th Gen - (8 GB/256 GB SSD/Win...,Intel Core i3 Processor (11th Gen),8 GB DDR4 RAM,64 bit Windows 10 Operating System,256 GB SSD,35.56 cm (14 inch) Display,1 Year Onsite Warranty,"₹40,990",4.1
4,4,HP 15s Athlon Dual Core - (4 GB/1 TB HDD/Windo...,AMD Athlon Dual Core Processor,4 GB DDR4 RAM,64 bit Windows 10 Operating System,1 TB HDD,39.62 cm (15.6 inch) Display,1 Year Onsite Warranty,"₹27,490",4.1


In [6]:
df_b.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 550 entries, 0 to 549
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        550 non-null    int64  
 1   Name              550 non-null    object 
 2   Processor         550 non-null    object 
 3   RAM               550 non-null    object 
 4   Operating System  550 non-null    object 
 5   Storage           550 non-null    object 
 6   Display           550 non-null    object 
 7   Warranty          550 non-null    object 
 8   Price             550 non-null    object 
 9   rating            550 non-null    float64
dtypes: float64(1), int64(1), object(8)
memory usage: 43.1+ KB


In [7]:
df_c.head()

Unnamed: 0,Laptop,Status,Brand,Model,CPU,RAM,Storage,Storage type,GPU,Screen,Touch,Final Price
0,ASUS ExpertBook B1 B1502CBA-EJ0436X Intel Core...,New,Asus,ExpertBook,Intel Core i5,8,512,SSD,,15.6,No,1009.0
1,Alurin Go Start Intel Celeron N4020/8GB/256GB ...,New,Alurin,Go,Intel Celeron,8,256,SSD,,15.6,No,299.0
2,ASUS ExpertBook B1 B1502CBA-EJ0424X Intel Core...,New,Asus,ExpertBook,Intel Core i3,8,256,SSD,,15.6,No,789.0
3,MSI Katana GF66 12UC-082XES Intel Core i7-1270...,New,MSI,Katana,Intel Core i7,16,1000,SSD,RTX 3050,15.6,No,1199.0
4,HP 15S-FQ5085NS Intel Core i5-1235U/16GB/512GB...,New,HP,15S,Intel Core i5,16,512,SSD,,15.6,No,669.01


In [8]:
df_c.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2160 entries, 0 to 2159
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Laptop        2160 non-null   object 
 1   Status        2160 non-null   object 
 2   Brand         2160 non-null   object 
 3   Model         2160 non-null   object 
 4   CPU           2160 non-null   object 
 5   RAM           2160 non-null   int64  
 6   Storage       2160 non-null   int64  
 7   Storage type  2118 non-null   object 
 8   GPU           789 non-null    object 
 9   Screen        2156 non-null   float64
 10  Touch         2160 non-null   object 
 11  Final Price   2160 non-null   float64
dtypes: float64(2), int64(2), object(8)
memory usage: 202.6+ KB



## 2. Data Cleaning and Preprocessing

### Objective
Clean the data so that it is consistent and ready to be processed by the model.

### Steps
1. Remove duplicates.
2. Handle missing values.
3. Normalize text (for example, convert to lowercase).
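
The three steps above can be sketched on a small, made-up table. This is only an illustration: the column names follow `laptops.csv`, but the rows in `toy` are hypothetical and are not taken from the project data.

```python
import pandas as pd

# Hypothetical toy rows; only the column names follow laptops.csv.
toy = pd.DataFrame({
    "Brand": ["Asus", "Asus", "HP", "MSI"],
    "GPU": ["RTX 3050", "RTX 3050", None, "GTX 1650"],
    "Touch": [" No", " No", "Yes ", "no"],
})

# 1. Remove exact duplicate rows.
toy = toy.drop_duplicates().reset_index(drop=True)

# 2. Handle missing values: a missing GPU is treated as integrated graphics.
toy["GPU"] = toy["GPU"].fillna("Integrated")

# 3. Normalize text: trim whitespace and convert to lowercase.
toy["Touch"] = toy["Touch"].str.strip().str.lower()

print(toy)
```

Assigning the result of `fillna` back to the column, rather than calling it with `inplace=True` on a column selection, avoids the chained-assignment warnings raised by recent pandas versions.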

### Implementation


In [9]:
# Example data cleaning
df = pd.read_csv('datasets/laptops.csv')

# Drop the descriptive name; the specification columns carry the same information
df.drop(columns=['Laptop'], inplace=True)

# Handle missing values
# Replace null values in the 'GPU' column with 'Integrated'
df['GPU'] = df['GPU'].fillna('Integrated')

# Fill null values in the 'Screen' column (one value per missing row, in index order)
screen_na_index = df[df['Screen'].isnull()].index
screens = np.array([
    15.6, # Acer Extensa 15 EX215-54
    15.6, # HP ENVY x360 2-in-1 Laptop 15-ew0008np
    15.6, # Lenovo IdeaPad Gaming 3 15ACH6
    15.6 # Lenovo ThinkPad P15 Gen 2
])
df.loc[screen_na_index, 'Screen'] = screens

# Fill null values in the 'Storage type' column (one value per missing row, in index order)
storage_type_na_index = df[df['Storage type'].isnull()].index
storage_types = np.array([
    "eMMC", # ASUS Chromebook CX1400CNA-BV0210
    "SSD",  # Portátil Alurin Flex Advance Intel Core I5-1155G7
    "SSD",  # ASUS ROG Strix G16 G614JZ-N3008
    "eMMC", # Prixton Flex Pro Intel Celeron N4020
    "SSD",  # Apple MacBook Pro Intel Core i5
    "SSD",  # Alurin AMD R5 5500U
    "SSD",  # Alurin Intel Core I7 12th
    "SSD",  # ASUS F515EA-BQ1625W
    "eMMC", # HP Chromebook x360 11 G3 Education Edition
    "eMMC", # HP Chromebook 11 G9
    "SSD",  # Alurin Flex Advance Intel Core i5-1155G7
    "SSD",  # PcCom Revolt 3050 Intel Core i7-13700H
    "SSD",  # PcCom Revolt 3050 Intel Core i7-13700H
    "SSD",  # Apple MacBook Pro Intel Core i5
    "SSD",  # Microsoft Surface Pro 7 Intel Core i5
    "SSD",  # Apple MacBook Air i5
    "SSD",  # Apple MacBook Air i5
    "SSD",  # HP 15S-EQ1148NS AMD Athlon Silver
    "SSD",  # Prixton Netbook Pro Intel Celeron N4020
    "SSD",  # ASUS VivoBook F515EA-BQ3013W Intel Core i5
    "SSD",  # Apple MacBook Intel Core M3
    "SSD",  # Apple MacBook Intel Core M3
    "SSD",  # Apple MacBook Intel Core i5
    "HDD",  # HP ProBook 640 G3 Intel Core i5-7200U
    "SSD",  # HP Victus 16-E0006NP AMD Ryzen 7
    "eMMC", # ASUS VivoBook 13 Slate OLED T3300KA
    "SSD",  # Apple MacBook Pro Intel Core i5
    "SSD",  # Apple MacBook Pro Touch Bar Intel Core i7
    "eMMC", # ASUS Chromebook CR1 CR1100CKA
    "SSD",  # HP ProBook 430 G6 Intel Core i5
    "SSD",  # Lenovo V14 IIL Intel Core i5
    "SSD",  # Lenovo V15 G2 ITL Intel Core i3
    "eMMC", # Lenovo Yoga Chromebook C630
    "SSD",  # Medion Akoya E4251 Intel Celeron
    "SSD",  # Microsoft Surface Pro 7 Intel Core i7
    "SSD",  # Microsoft Surface Pro 7 Intel Core i7
    "eMMC", # Samsung Chromebook 2 Intel Celeron
    "SSD",  # Thomson NEO Z3 Qualcomm Snapdragon
    "SSD",  # Apple MacBook Air i5
    "SSD",  # Apple MacBook Intel Core M3
    "HDD",  # HP OMEN 15-DC0005NS Intel Core i7
    "eMMC"  # Medion Akoya E4251 Intel Celeron
])
df.loc[storage_type_na_index, 'Storage type'] = storage_types

df.dropna(inplace=True)  # Drop any rows that still contain nulls

print(df.head())
print(df.info())

  Status   Brand       Model            CPU  RAM  Storage Storage type  \
0    New    Asus  ExpertBook  Intel Core i5    8      512          SSD   
1    New  Alurin          Go  Intel Celeron    8      256          SSD   
2    New    Asus  ExpertBook  Intel Core i3    8      256          SSD   
3    New     MSI      Katana  Intel Core i7   16     1000          SSD   
4    New      HP         15S  Intel Core i5   16      512          SSD   

          GPU  Screen Touch  Final Price  
0  Integrated    15.6    No      1009.00  
1  Integrated    15.6    No       299.00  
2  Integrated    15.6    No       789.00  
3    RTX 3050    15.6    No      1199.00  
4  Integrated    15.6    No       669.01  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2160 entries, 0 to 2159
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Status        2160 non-null   object 
 1   Brand         2160 non-null   object 
 2   Model        

In [10]:
# Convert the price to Indonesian rupiah (fixed rate: 1 USD = 16,500 IDR; assumes the source prices are in USD)
df['Final Price'] = df['Final Price'] * 16500

# Make sure the RAM and Storage columns are numeric
df['RAM'] = df['RAM'].astype(int)
df['Storage'] = df['Storage'].astype(int)

# Normalize text, then encode Touch as 1 (yes) / 0 (no)
df['Touch'] = df['Touch'].str.strip().str.lower()
df['Touch'] = df['Touch'].replace({'yes': 1, 'no': 0})

# Preview
df.head()


Unnamed: 0,Status,Brand,Model,CPU,RAM,Storage,Storage type,GPU,Screen,Touch,Final Price
0,New,Asus,ExpertBook,Intel Core i5,8,512,SSD,Integrated,15.6,0,16648500.0
1,New,Alurin,Go,Intel Celeron,8,256,SSD,Integrated,15.6,0,4933500.0
2,New,Asus,ExpertBook,Intel Core i3,8,256,SSD,Integrated,15.6,0,13018500.0
3,New,MSI,Katana,Intel Core i7,16,1000,SSD,RTX 3050,15.6,0,19783500.0
4,New,HP,15S,Intel Core i5,16,512,SSD,Integrated,15.6,0,11038665.0


In [11]:
df.to_csv('cleaned_dataset.csv', index=False)


## 3. Understanding User Needs (NLP)

### Objective
Interpret the user's free-text input to understand what they need.

### Steps
1. Tokenization and stemming.
2. TF-IDF computation.
3. Building the requirement vector.
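
To make steps 2 and 3 concrete, here is a minimal hand-rolled TF-IDF sketch on a made-up three-sentence corpus. It uses the smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization, which are the defaults of scikit-learn's `TfidfVectorizer`. Tokenization here is a plain whitespace split and stemming is skipped, so the numbers are illustrative and will not necessarily match the vectorizer used in this notebook. The corpus and the helper `tfidf_vector` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, already-lowercased example sentences (not from the project data).
corpus = [
    "laptop untuk main game",
    "laptop untuk edit video",
    "laptop ringan untuk kantor",
]

docs = [text.split() for text in corpus]
vocab = sorted({tok for doc in docs for tok in doc})
n_docs = len(docs)

# Document frequency: number of documents that contain each term.
doc_freq = {term: sum(term in doc for doc in docs) for term in vocab}

# Smoothed inverse document frequency.
idf = {term: math.log((1 + n_docs) / (1 + doc_freq[term])) + 1 for term in vocab}

def tfidf_vector(tokens):
    """Return an L2-normalized TF-IDF vector over `vocab`."""
    counts = Counter(tokens)
    raw = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0
    return [v / norm for v in raw]

vectors = [tfidf_vector(doc) for doc in docs]
for term, weight in zip(vocab, vectors[0]):
    print(f"{term:>7}: {weight:.3f}")
```

Words that appear in every sentence ("laptop", "untuk") receive the lowest IDF, so the distinguishing words ("main", "game") dominate the first sentence's vector.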

### Implementation


In [12]:
# NLP example with TF-IDF
# Load simulated user-preference data
df_pref = pd.read_csv('simulasi_preferensi_pengguna.csv')
df_pref.head()

Unnamed: 0,Teks Kebutuhan,Label
0,Saya butuh laptop untuk edit video dan desain ...,desain
1,Main Valorant dan GTA V tanpa lag,gaming
2,"Laptop untuk bekerja, menggunakan Zoom dan Mic...",kantor
3,Laptop ringan untuk browsing dan mengetik,umum
4,Untuk main game berat seperti Cyberpunk dan El...,gaming


In [13]:
# Initialize the Indonesian stemmer
stemmer = StemmerFactory().create_stemmer()

def preprocess(text):
    text = text.lower()
    text = re.sub(r'[^a-z\s]', '', text)
    return stemmer.stem(text)

In [14]:
# Apply preprocessing to the text column
df_pref['preprocessed'] = df_pref['Teks Kebutuhan'].apply(preprocess)

# TF-IDF Vectorization
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df_pref["preprocessed"]).toarray()

In [15]:
# Encode label
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(df_pref["Label"])

In [16]:
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [17]:
# Build model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(4, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [18]:
# Train model
model.fit(X_train, y_train, epochs=30, batch_size=4, validation_split=0.2)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x738f9ff06b20>

In [19]:
# Evaluate
loss, acc = model.evaluate(X_test, y_test)
print("Accuracy:", acc)

Accuracy: 0.5


In [20]:
# Save the TensorFlow model
model.save("lappick_model.h5")

# Save the vectorizer and label encoder
joblib.dump(vectorizer, "tfidf_vectorizer.pkl")
joblib.dump(label_encoder, "label_encoder.pkl")

['label_encoder.pkl']


## 4. Recommendation System

### Objective
Recommend laptops that match the user's input.

### Steps
1. Compute vector similarity (cosine similarity).
2. Show the laptops with the highest scores.
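
The implementation in this section filters laptops by category rules and sorts them by price, so the cosine-similarity step listed above is not shown in code. As a rough sketch of what that step could look like, the example below ranks a few laptops against a user vector. The three-dimensional feature vectors, the laptop names, and the helper `cosine_similarity` are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors: [gaming, design, office] suitability.
user_vector = [1.0, 0.2, 0.0]
laptops = {
    "laptop_a": [0.9, 0.3, 0.1],
    "laptop_b": [0.1, 0.2, 0.9],
    "laptop_c": [0.5, 0.5, 0.5],
}

# Rank laptops from most to least similar to the user's needs.
ranked = sorted(
    laptops.items(),
    key=lambda item: cosine_similarity(user_vector, item[1]),
    reverse=True,
)
for name, vec in ranked:
    print(name, round(cosine_similarity(user_vector, vec), 3))
```

Cosine similarity compares direction rather than magnitude, so a laptop whose profile points the same way as the user's needs ranks highest regardless of how large its raw scores are.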

### Implementation


In [21]:
# Load all components
model = tf.keras.models.load_model("lappick_model.h5")
vectorizer = joblib.load("tfidf_vectorizer.pkl")
label_encoder = joblib.load("label_encoder.pkl")
df_laptop = pd.read_csv("cleaned_dataset.csv")

In [22]:
# Preprocess
stemmer = StemmerFactory().create_stemmer()
def preprocess(text):
    text = text.lower()
    text = re.sub(r'[^a-z\s]', '', text)
    return stemmer.stem(text)

In [23]:
def ekstrak_budget(teks):
    # Find a number followed by "juta" or "jt" (million) and convert it to rupiah
    match = re.search(r'(\d+)\s*(juta|jt)', teks.lower())
    if match:
        angka = int(match.group(1)) * 1_000_000
        return angka
    return None  # No budget found
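
`ekstrak_budget` only recognizes whole numbers followed by "juta" or "jt". A possible extension, shown here as a sketch, also accepts decimal amounts such as "7,5 juta" or "7.5jt". The function name `extract_budget_idr` is hypothetical and is not part of the notebook.

```python
import re

def extract_budget_idr(text):
    """Hypothetical variant of ekstrak_budget that also accepts decimals such as '7,5 juta'."""
    # Number with an optional decimal part (comma or dot), then "juta" or "jt".
    match = re.search(r'(\d+(?:[.,]\d+)?)\s*(?:juta|jt)\b', text.lower())
    if match is None:
        return None
    amount = float(match.group(1).replace(',', '.'))
    return int(round(amount * 1_000_000))

print(extract_budget_idr("budget 10 juta"))
print(extract_budget_idr("maksimal 7,5jt"))
print(extract_budget_idr("laptop gaming murah"))
```

Because the pattern requires the unit right after the number, other numbers in the sentence (for example "1000 FPS") are skipped.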


In [24]:
def rekomendasi(teks_user, top_n=5):
    teks_clean = preprocess(teks_user)
    vektor_input = vectorizer.transform([teks_clean]).toarray()
    prediksi = model.predict(vektor_input)[0]
    label_index = np.argmax(prediksi)
    label = label_encoder.inverse_transform([label_index])[0]

    # Filter by predicted category
    df = df_laptop.copy()
    if label == 'gaming':
        df = df[(df['RAM'] >= 8) & df['GPU'].str.contains("RTX|GTX|Radeon", na=False, case=False)]
    elif label == 'desain':
        # Each comparison needs its own parentheses because & binds tighter than >=
        df = df[(df['RAM'] >= 8) &
                (df['Storage'] >= 256) &
                df['CPU'].str.contains("i5|i7|Ryzen 5|Ryzen 7", na=False, case=False) &
                df['GPU'].str.contains("RTX|GTX|Radeon", na=False, case=False)]
    elif label == 'kantor':
        df = df[(df['RAM'] >= 4) & (df['Storage'] >= 256)]
    else:
        df = df[df['RAM'] >= 4]

    # Additionally, filter by budget when the user states one
    budget = ekstrak_budget(teks_user)
    if budget:
        df = df[df['Final Price'] <= budget]

    hasil = df.sort_values(by="Final Price").head(top_n)
    return label, hasil.reset_index(drop=True)


In [25]:
# input_user = "Saya ingin laptop untuk main game FPS dan edit video ringan"
input_user = "Saya ingin laptop untuk main game Valorant 1000 FPS dengan budget 10 juta"
label, hasil_rekomendasi = rekomendasi(input_user)

print("Kategori:", label)
print("Hasil Rekomendasi:")
print(hasil_rekomendasi)

Kategori: gaming
Hasil Rekomendasi:
        Status Brand     Model            CPU  RAM  Storage Storage type  \
0  Refurbished  Acer     Nitro  Intel Core i5    8      256          SSD   
1  Refurbished    HP  Pavilion  Intel Core i5    8      512          SSD   

        GPU  Screen  Touch  Final Price  
0  GTX 1650    15.6      0    7880235.0  
1  GTX 1050    16.1      0    8998770.0  



## 5. Model Evaluation

### Objective
Measure the accuracy and effectiveness of the recommendation model.

### Methods
- Precision, Recall, F1-Score
- Cosine Similarity

### Implementation
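
No evaluation code is included for this section. As a starting point, the sketch below computes per-class precision, recall, and F1 from a list of true and predicted labels. The labels are made up and are not results from the trained model. In this notebook the inputs would presumably come from `y_test` and the arg-max of `model.predict(X_test)`, which is an assumption, and the helper `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and F1 from true positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels, not results from the trained model.
y_true = ["gaming", "gaming", "desain", "kantor", "umum", "kantor"]
y_pred = ["gaming", "desain", "desain", "kantor", "kantor", "kantor"]

for label in sorted(set(y_true)):
    p, r, f = precision_recall_f1(y_true, y_pred, label)
    print(f"{label:>7}  precision={p:.2f}  recall={r:.2f}  f1={f:.2f}")
```

Reporting these metrics per class is more informative than a single accuracy figure when some categories have only a handful of examples.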
