# **Classifying Google Review Data (Shopee App)**
---
This sentiment classification project on user reviews of the Shopee app begins by importing the required libraries and exploring a dataset of review text with sentiment labels (**positive, neutral, negative**). After text cleaning and an analysis of the class distribution, the data turns out to be substantially imbalanced, so random oversampling is applied to the minority class (neutral) to narrow the gap between classes. The resampled data is then used to train four schemes built on three algorithms: **Logistic Regression** (once with TF-IDF and once with Bag-of-Words features), **Support Vector Machine (SVM)**, and **CNN-LSTM** as the deep learning method. Each scheme is evaluated on held-out test data using accuracy, precision, recall, and F1-score to assess how fairly it classifies the three sentiment classes.

🎯 **This project aims to:**
1. Build a classification model that identifies user sentiment toward the Shopee app,
2. Compare the performance of traditional models against deep learning,
3. Reduce data bias through oversampling so that predictions are fair and balanced across classes.


# **1. Import Libraries**

In [75]:
import pandas as pd
import numpy as np
import joblib
import sklearn
from sklearn.utils import resample
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.metrics import classification_report, accuracy_score

In [35]:
!pip install tensorflow



In [79]:
import tensorflow as tf
import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense, Dropout

# **2. Data Exploration**

In [12]:
# Read the dataset scraped from Google reviews
url = 'shopee_reviews.csv'
df = pd.read_csv(url)

In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37840 entries, 0 to 37839
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   reviewId              37840 non-null  object
 1   userName              37840 non-null  object
 2   userImage             37840 non-null  object
 3   content               37840 non-null  object
 4   score                 37840 non-null  int64 
 5   thumbsUpCount         37840 non-null  int64 
 6   reviewCreatedVersion  37840 non-null  object
 7   at                    37840 non-null  object
 8   replyContent          37840 non-null  object
 9   repliedAt             37840 non-null  object
 10  appVersion            37840 non-null  object
 11  text_clean            37840 non-null  object
 12  text_slangwords       37840 non-null  object
 13  text_tokenizing       37840 non-null  object
 14  text_stopword         37840 non-null  object
 15  text_akhir            37840 non-null

In [15]:
clean_df = df.copy()

# Drop rows containing NaN and duplicate rows
clean_df = clean_df.dropna()
clean_df = clean_df.drop_duplicates()

print("Jumlah baris dan kolom setelah cleaning:", clean_df.shape)

Jumlah baris dan kolom setelah cleaning: (37840, 18)


In [16]:
clean_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37840 entries, 0 to 37839
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   reviewId              37840 non-null  object
 1   userName              37840 non-null  object
 2   userImage             37840 non-null  object
 3   content               37840 non-null  object
 4   score                 37840 non-null  int64 
 5   thumbsUpCount         37840 non-null  int64 
 6   reviewCreatedVersion  37840 non-null  object
 7   at                    37840 non-null  object
 8   replyContent          37840 non-null  object
 9   repliedAt             37840 non-null  object
 10  appVersion            37840 non-null  object
 11  text_clean            37840 non-null  object
 12  text_slangwords       37840 non-null  object
 13  text_tokenizing       37840 non-null  object
 14  text_stopword         37840 non-null  object
 15  text_akhir            37840 non-null

In [18]:
# Check whether the polarity classes are balanced
print(clean_df['polarity'].value_counts())

polarity
positive    19773
negative    15616
neutral      2451
Name: count, dtype: int64
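
The degree of imbalance can be quantified directly from the counts printed above. The following is a minimal sketch that uses only those three numbers; the variable names are illustrative and not part of the original notebook.

```python
# Class counts copied from the value_counts() output above.
counts = {"positive": 19773, "negative": 15616, "neutral": 2451}

total = sum(counts.values())

# Share of each class in the full dataset.
shares = {label: n / total for label, n in counts.items()}

# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

for label, share in shares.items():
    print(f"{label:<9} {share:.1%}")
print(f"largest/smallest ratio: {imbalance_ratio:.1f}")
```

The neutral class accounts for roughly 6.5% of the rows, about one eighth the size of the positive class, which is the gap the oversampling step is meant to narrow.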


# **3. Handling Class Imbalance (Oversampling)**

In [21]:
from sklearn.utils import resample

# Split the data by class
neutral_df = clean_df[clean_df['polarity'] == 'neutral']
positive_df = clean_df[clean_df['polarity'] == 'positive']
negative_df = clean_df[clean_df['polarity'] == 'negative']

# Oversample the neutral class to a target of 12,500 rows.
# This is still below positive (19,773) and negative (15,616),
# so the gap narrows but the classes are not fully equalized.
neutral_oversampled = resample(
    neutral_df,
    replace=True,           # sample with replacement
    n_samples=12500,        # target row count
    random_state=42
)

# Recombine all classes
balanced_df = pd.concat([positive_df, negative_df, neutral_oversampled])

# Check the resulting distribution
print(balanced_df['polarity'].value_counts())

polarity
positive    19773
negative    15616
neutral     12500
Name: count, dtype: int64
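
One caveat with oversampling before the train/test split: duplicated neutral rows can land on both sides of the split, which may inflate test scores for that class. A common alternative is to split first and oversample only the training portion. The sketch below illustrates this on a toy DataFrame and is not the procedure used in this notebook; the column names mirror the notebook, while the toy rows and the target count of 30 are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Toy data standing in for clean_df (hypothetical rows, not the real reviews).
toy_df = pd.DataFrame({
    "text_akhir": [f"ulasan {i}" for i in range(100)],
    "polarity": ["positive"] * 50 + ["negative"] * 40 + ["neutral"] * 10,
})

# 1. Split first, so the test set contains only original, unduplicated rows.
train_df, test_df = train_test_split(
    toy_df, test_size=0.2, random_state=42, stratify=toy_df["polarity"]
)

# 2. Oversample the minority class inside the training split only.
train_neutral = train_df[train_df["polarity"] == "neutral"]
train_rest = train_df[train_df["polarity"] != "neutral"]
neutral_up = resample(
    train_neutral,
    replace=True,      # sample with replacement
    n_samples=30,      # illustrative target for the toy data
    random_state=42,
)
train_balanced = pd.concat([train_rest, neutral_up])

print(train_balanced["polarity"].value_counts())
print(test_df["polarity"].value_counts())
```

With this ordering, no row index from the test set appears in the resampled training set, so test metrics reflect reviews the model has not seen.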


In [24]:
# Initialize the encoder
le = LabelEncoder()

# Convert the string labels to integers
balanced_df["label"] = le.fit_transform(balanced_df["polarity"])
print(dict(zip(le.classes_, le.transform(le.classes_))))

{'negative': np.int64(0), 'neutral': np.int64(1), 'positive': np.int64(2)}


In [25]:
balanced_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 47889 entries, 5 to 19556
Data columns (total 19 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   reviewId              47889 non-null  object
 1   userName              47889 non-null  object
 2   userImage             47889 non-null  object
 3   content               47889 non-null  object
 4   score                 47889 non-null  int64 
 5   thumbsUpCount         47889 non-null  int64 
 6   reviewCreatedVersion  47889 non-null  object
 7   at                    47889 non-null  object
 8   replyContent          47889 non-null  object
 9   repliedAt             47889 non-null  object
 10  appVersion            47889 non-null  object
 11  text_clean            47889 non-null  object
 12  text_slangwords       47889 non-null  object
 13  text_tokenizing       47889 non-null  object
 14  text_stopword         47889 non-null  object
 15  text_akhir            47889 non-null  obj

# **4. Comparing Logistic Regression, SVM, and CNN-LSTM for Multiclass Sentiment Classification**

In [28]:
x_balanced = balanced_df['text_akhir']
y_balanced = balanced_df['polarity']

###### **SCHEME 1**
###### **TRAINING AND EVALUATING A LOGISTIC REGRESSION CLASSIFIER WITH OVERSAMPLING**
---
This scheme classifies text with Logistic Regression on features extracted by TF-IDF (Term Frequency-Inverse Document Frequency).

Model: **Logistic Regression**

Feature extraction: **TF-IDF**

Data split: **70/30**

In [41]:
# 1. Split data 70/30
X_train, X_test, y_train, y_test = train_test_split(x_balanced, y_balanced, test_size=0.3, random_state=42, stratify=y_balanced)

# 2. Extract features with TF-IDF
tfidf_vectorizer_logis = TfidfVectorizer(max_features=10000)
X_train_tfidf = tfidf_vectorizer_logis.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer_logis.transform(X_test)

# 3. Train the Logistic Regression model
model_logis_tf = LogisticRegression(
    solver='sag',
    max_iter=1000,
    random_state=42)
model_logis_tf.fit(X_train_tfidf, y_train)

# 4. Predict on the training and test sets
y_pred_train = model_logis_tf.predict(X_train_tfidf)
y_pred_test = model_logis_tf.predict(X_test_tfidf)

# 5. Evaluate accuracy
print("Train Accuracy:", accuracy_score(y_train, y_pred_train))
print("Test Accuracy:", accuracy_score(y_test, y_pred_test))

# 6. Classification Report
print("\nClassification Report on Test Set:\n")
print(classification_report(y_test, y_pred_test))

Train Accuracy: 0.9193663862538035
Test Accuracy: 0.8681701120623652

Classification Report on Test Set:

              precision    recall  f1-score   support

    negative       0.88      0.87      0.87      4685
     neutral       0.80      0.82      0.81      3750
    positive       0.90      0.90      0.90      5932

    accuracy                           0.87     14367
   macro avg       0.86      0.86      0.86     14367
weighted avg       0.87      0.87      0.87     14367
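
The `macro avg` and `weighted avg` rows differ only in how the per-class scores are combined. The sketch below recomputes both F1 averages from the rounded per-class values in the report above; because the inputs are already rounded, results recomputed this way could in general differ from scikit-learn's in the last digit.

```python
# Per-class F1 and support copied from the classification report above.
f1 = {"negative": 0.87, "neutral": 0.81, "positive": 0.90}
support = {"negative": 4685, "neutral": 3750, "positive": 5932}

# Macro average: every class counts equally, regardless of size.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class is weighted by its number of test samples.
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())

print(f"macro avg F1:    {macro_f1:.2f}")
print(f"weighted avg F1: {weighted_f1:.2f}")
```

The macro average is the more conservative figure here because the weakest class (neutral) carries the same weight as the two larger classes.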



In [43]:
# Save the model and vectorizer
joblib.dump(model_logis_tf, 'logistic_tf_model.joblib')
joblib.dump(tfidf_vectorizer_logis, 'tfidf_vectorizer_logis.joblib')

['tfidf_vectorizer_logis.joblib']

###### **SCHEME 2**
###### **TRAINING AND EVALUATING A LOGISTIC REGRESSION CLASSIFIER WITH OVERSAMPLING**
---
This scheme classifies text with Logistic Regression on features extracted by Bag-of-Words (BoW).

Model: **Logistic Regression**

Feature extraction: **BoW**

Data split: **80/20**
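
This scheme configures `CountVectorizer` with `stop_words='english'`. The sketch below, on a made-up mixed-language sentence, suggests that this setting filters English function words only and leaves Indonesian ones in the vocabulary. Since the `text_akhir` column appears to have gone through a stop-word step already, the setting probably has little effect on this dataset; that is an inference from the column names, not something the notebook verifies.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up sentence mixing Indonesian and English words, for illustration only.
docs = ["barang yang saya terima rusak dan the box is broken"]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Terms that survived stop-word filtering.
kept_terms = sorted(vectorizer.vocabulary_)
print(kept_terms)

# Bag-of-Words stores raw counts: one row per document, one column per kept term.
print(X.toarray())
```

The English words "the" and "is" are dropped, while Indonesian function words such as "yang" and "dan" remain as features.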

In [45]:
# 1. Split data
X_train, X_test, y_train, y_test = train_test_split(x_balanced, y_balanced, test_size=0.2, random_state=42, stratify=y_balanced)

# 2. Initialize CountVectorizer (BoW)
bow_vectorizer = CountVectorizer(
    max_features=30000,
    ngram_range=(1, 1),
    stop_words='english'
)

# 3. Extract features
X_train_bow = bow_vectorizer.fit_transform(X_train)
X_test_bow = bow_vectorizer.transform(X_test)

# 4. Train the classifier
model_logis_bow = LogisticRegression(max_iter=1000, random_state=42)
model_logis_bow.fit(X_train_bow, y_train)

# 5. Predict on the training and test sets
y_pred_train = model_logis_bow.predict(X_train_bow)
y_pred_test = model_logis_bow.predict(X_test_bow)

# 6. Evaluate accuracy
print("Train Accuracy:", accuracy_score(y_train, y_pred_train))
print("Test Accuracy:", accuracy_score(y_test, y_pred_test))

# 7. Classification Report
print("\nClassification Report on Test Set:\n")
print(classification_report(y_test, y_pred_test))

Train Accuracy: 0.9628044164861267
Test Accuracy: 0.9023804552098559

Classification Report on Test Set:

              precision    recall  f1-score   support

    negative       0.95      0.85      0.90      3123
     neutral       0.80      0.96      0.87      2500
    positive       0.94      0.91      0.93      3955

    accuracy                           0.90      9578
   macro avg       0.90      0.91      0.90      9578
weighted avg       0.91      0.90      0.90      9578



In [46]:
# Save the model and vectorizer
joblib.dump(model_logis_bow, 'logistic_bow_model.joblib')
joblib.dump(bow_vectorizer, 'bow_vectorizer.joblib')

['bow_vectorizer.joblib']

###### **SCHEME 3**
###### **TRAINING AND EVALUATING A LINEAR SUPPORT VECTOR CLASSIFIER**
---

This scheme classifies text with Linear Support Vector Classification (LinearSVC) on features extracted by TF-IDF (Term Frequency-Inverse Document Frequency).

Model: **SVM (LinearSVC)**

Feature extraction: **TF-IDF**

Data split: **80/20**

In [33]:
# 1. Split data
X_train, X_test, y_train, y_test = train_test_split(
    x_balanced, y_balanced, test_size=0.2, random_state=42, stratify=y_balanced
)

# 2. Pipeline: TF-IDF + LinearSVC
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(
        max_features=10000,
        ngram_range=(1, 3),
        min_df=5,
        stop_words='english'
    )),
    ('svc', LinearSVC(
        class_weight='balanced',
        random_state=42
    ))
])

# 3. Hyperparameter tuning
param_grid = {
    'svc__C': [0.1, 0.5, 1, 2, 5, 10],
    'svc__tol': [1e-3, 1e-4],
    'svc__max_iter': [1000, 2000]
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

grid_search = GridSearchCV(
    estimator=pipeline,
    param_grid=param_grid,
    cv=cv,
    scoring='accuracy',
    verbose=2,
    n_jobs=-1
)

# 4. Training
grid_search.fit(X_train, y_train)

# 5. Report the best model
print("Best Parameters:", grid_search.best_params_)
print("Best CV Accuracy:", grid_search.best_score_)

best_model = grid_search.best_estimator_

# 6. Evaluate on the training set
y_train_pred = best_model.predict(X_train)
train_accuracy = accuracy_score(y_train, y_train_pred)

# 7. Evaluate on the test set
y_test_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_test_pred)

# 8. Print the evaluation results
print("\nTrain Accuracy:", train_accuracy)
print("Test Accuracy:", test_accuracy)
print("\nClassification Report on Test Set:")
print(classification_report(y_test, y_test_pred))

Fitting 5 folds for each of 24 candidates, totalling 120 fits
Best Parameters: {'svc__C': 10, 'svc__max_iter': 1000, 'svc__tol': 0.001}
Best CV Accuracy: 0.9120878655220113

Train Accuracy: 0.9939181958184333
Test Accuracy: 0.9155356024222175

Classification Report on Test Set:
              precision    recall  f1-score   support

    negative       0.93      0.87      0.90      3123
     neutral       0.87      0.98      0.92      2500
    positive       0.94      0.91      0.92      3955

    accuracy                           0.92      9578
   macro avg       0.91      0.92      0.91      9578
weighted avg       0.92      0.92      0.92      9578
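
The "24 candidates, totalling 120 fits" line in the log follows from the size of the parameter grid and the number of folds. A short sketch using scikit-learn's `ParameterGrid` with the same grid as this scheme reproduces the arithmetic:

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the GridSearchCV call for this scheme.
param_grid = {
    "svc__C": [0.1, 0.5, 1, 2, 5, 10],
    "svc__tol": [1e-3, 1e-4],
    "svc__max_iter": [1000, 2000],
}

# Every combination of the listed values is one candidate: 6 * 2 * 2.
n_candidates = len(ParameterGrid(param_grid))

# Each candidate is trained once per cross-validation fold.
n_splits = 5
total_fits = n_candidates * n_splits

print(n_candidates, "candidates,", total_fits, "fits")
```

Because the cost grows multiplicatively with each added parameter, it is worth keeping the grid small when every fit includes TF-IDF over n-grams up to length three.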



In [47]:
# Save the best pipeline
joblib.dump(best_model, 'best_svc_tfidf_pipeline.joblib')

['best_svc_tfidf_pipeline.joblib']
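
Saving the whole pipeline, rather than the classifier alone, means the vectorizer and the model are restored together and raw text can be passed straight to `predict`. The sketch below shows the round trip on a toy pipeline trained on four made-up phrases and written to a temporary directory; it does not load the notebook's actual file.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy training data (hypothetical phrases, not from the dataset).
toy_texts = ["barang bagus", "sangat puas", "barang rusak", "sangat kecewa"]
toy_labels = ["positive", "positive", "negative", "negative"]

toy_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svc", LinearSVC(random_state=42)),
])
toy_pipeline.fit(toy_texts, toy_labels)

# Save and reload inside a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_pipeline.joblib")
    joblib.dump(toy_pipeline, path)
    restored = joblib.load(path)

# The restored pipeline accepts raw strings and reproduces the predictions.
same_predictions = (
    list(restored.predict(toy_texts)) == list(toy_pipeline.predict(toy_texts))
)
print("predictions match:", same_predictions)
```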

###### **SCHEME 4**
###### **TRAINING AND EVALUATING A CNN-LSTM MODEL WITH WORD EMBEDDINGS**
---
This scheme classifies text with a deep learning architecture that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM), using word embeddings learned by a Keras Embedding layer as features.

Model: **CNN-LSTM**

Feature extraction: **Word Embedding (Keras Embedding Layer)**

Data split: **80/20**

Output activation: **Softmax**

Loss function: **Categorical Crossentropy**

Optimizer: **Adam**

The convolution layer captures local features of the text, and the LSTM then processes the resulting sequence to predict the final class (positive, neutral, negative). This architecture suits multiclass text classification.
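
To see how the convolution and pooling layers shorten the sequence before it reaches the LSTM, the sketch below traces the sequence length through the layer settings used in this scheme (input length 100, kernel size 5, pool size 2). It assumes the Keras defaults of `padding='valid'` and a convolution stride of 1; the helper functions are illustrative and not part of Keras.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def pool1d_output_length(length, pool_size, stride=None):
    """Output length of 1D max pooling with 'valid' padding.

    The stride defaults to the pool size, as in Keras.
    """
    stride = stride or pool_size
    return (length - pool_size) // stride + 1


max_len = 100  # padded sequence length fed to the Embedding layer

after_conv = conv1d_output_length(max_len, kernel_size=5)
after_pool = pool1d_output_length(after_conv, pool_size=2)

print("timesteps after Conv1D:      ", after_conv)
print("timesteps after MaxPooling1D:", after_pool)
```

Under these assumptions the LSTM processes 48 timesteps of 64 features each instead of 100 timesteps of 128-dimensional embeddings.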

In [49]:
# Split the data (stratified to preserve class proportions)
train_df, test_df = train_test_split(
    balanced_df,
    test_size=0.2,
    random_state=42,
    stratify=balanced_df["label"]
)

# Tokenizer
tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(train_df["text_akhir"])

# Convert the text to sequences of integer token IDs
X_train_seq = tokenizer.texts_to_sequences(train_df["text_akhir"])
X_test_seq = tokenizer.texts_to_sequences(test_df["text_akhir"])

# Padding
max_len = 100  # adjustable
X_train_pad = pad_sequences(X_train_seq, maxlen=max_len, padding='post')
X_test_pad = pad_sequences(X_test_seq, maxlen=max_len, padding='post')

# One-hot encode label
y_train = to_categorical(train_df["label"])
y_test = to_categorical(test_df["label"])

model_cnn = Sequential()
model_cnn.add(Embedding(input_dim=10000, output_dim=128, input_length=max_len))
model_cnn.add(Conv1D(filters=64, kernel_size=5, activation='relu'))
model_cnn.add(MaxPooling1D(pool_size=2))
model_cnn.add(LSTM(units=64, return_sequences=False))
model_cnn.add(Dropout(0.5))
model_cnn.add(Dense(3, activation='softmax'))  # 3 classes

model_cnn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])



In [50]:
history = model_cnn.fit(
    X_train_pad, y_train,
    validation_data=(X_test_pad, y_test),
    epochs=5,
    batch_size=64
)

Epoch 1/5
599/599 ━━━━━━━━━━━━━━━━━━━━ 79s 123ms/step - accuracy: 0.5213 - loss: 0.9388 - val_accuracy: 0.8469 - val_loss: 0.4225
Epoch 2/5
599/599 ━━━━━━━━━━━━━━━━━━━━ 81s 122ms/step - accuracy: 0.8906 - loss: 0.3132 - val_accuracy: 0.8978 - val_loss: 0.2899
Epoch 3/5
599/599 ━━━━━━━━━━━━━━━━━━━━ 71s 118ms/step - accuracy: 0.9542 - loss: 0.1405 - val_accuracy: 0.9217 - val_loss: 0.2571
Epoch 4/5
599/599 ━━━━━━━━━━━━━━━━━━━━ 71s 119ms/step - accuracy: 0.9707 - loss: 0.0968 - val_accuracy: 0.9234 - val_loss: 0.2880
Epoch 5/5
599/599 ━━━━━━━━━━━━━━━━━━━━ 83s 121ms/step - accuracy: 0.9810 - loss: 0.0646 - val_accuracy: 0.9238 - val_loss: 0.2366


In [51]:
y_pred = model_cnn.predict(X_test_pad)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true = np.argmax(y_test, axis=1)

print("Accuracy:", accuracy_score(y_true, y_pred_classes))
target_names = ["negative", "neutral", "positive"]
print(classification_report(y_true, y_pred_classes, target_names=target_names))

300/300 ━━━━━━━━━━━━━━━━━━━━ 7s 22ms/step
Accuracy: 0.9237836709125078
              precision    recall  f1-score   support

    negative       0.96      0.89      0.93      3123
     neutral       0.91      0.91      0.91      2500
    positive       0.91      0.96      0.93      3955

    accuracy                           0.92      9578
   macro avg       0.93      0.92      0.92      9578
weighted avg       0.93      0.92      0.92      9578
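
The step counts in the progress bars (599 per training epoch, 300 for prediction) follow from the split sizes and batch sizes. The sketch below reproduces them, assuming that `train_test_split` rounds the test size up and that `predict` uses the Keras default batch size of 32.

```python
import math

total_rows = 47889        # rows in balanced_df
test_fraction = 0.2
train_batch_size = 64     # batch_size passed to fit()
predict_batch_size = 32   # assumed Keras default for predict()

# The test size is rounded up; the remaining rows form the training set.
test_rows = math.ceil(test_fraction * total_rows)
train_rows = total_rows - test_rows

# The last batch may be smaller, so the step count is rounded up as well.
steps_per_epoch = math.ceil(train_rows / train_batch_size)
predict_steps = math.ceil(test_rows / predict_batch_size)

print("train rows:", train_rows, "| test rows:", test_rows)
print("steps per epoch:", steps_per_epoch, "| predict steps:", predict_steps)
```

The test-set size of 9,578 also matches the total support in the classification report, which confirms that this scheme was evaluated on a 20% split.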



In [52]:
# Save the tokenizer and the model right away
import pickle

with open("tokenizer.pkl", "wb") as f:
    pickle.dump(tokenizer, f)

model_cnn.save("cnn_lstm_model.h5")



# **5. Validating the Models on New Text**

In [61]:
teks_baru = [
    "Saya benci ini sangat mengecewakan",
    "Barang yang saya terima rusak dan sangat mengecewakan",
    "Saya baru menerima paketnya hari ini",
    "Produk sesuai dskripsi",
    "Belum sempat digunakan"
]
print(teks_baru)

['Saya benci ini sangat mengecewakan', 'Barang yang saya terima rusak dan sangat mengecewakan', 'Saya baru menerima paketnya hari ini', 'Produk sesuai dskripsi', 'Belum sempat digunakan']


In [67]:
# Transform and predict with the Logistic Regression model (TF-IDF)
print("--- Hasil Prediksi Model Logistic Regression (TF-IDF) ---")
teks_baru_tfidf = tfidf_vectorizer_logis.transform(teks_baru)
prediksi_kelas = model_logis_tf.predict(teks_baru_tfidf)

for teks, label in zip(teks_baru, prediksi_kelas):
    print(f"Teks: {teks}\nPrediksi Label: {label}\n")

--- Hasil Prediksi Model Logistic Regression (TF-IDF) ---
Teks: Saya benci ini sangat mengecewakan
Prediksi Label: negative

Teks: Barang yang saya terima rusak dan sangat mengecewakan
Prediksi Label: negative

Teks: Saya baru menerima paketnya hari ini
Prediksi Label: neutral

Teks: Produk sesuai dskripsi
Prediksi Label: positive

Teks: Belum sempat digunakan
Prediksi Label: neutral



In [68]:
# Transform and predict with the Logistic Regression model (BoW)
print("--- Hasil Prediksi Model Logistic Regression (BoW) ---")
X_new = bow_vectorizer.transform(teks_baru)
y_pred_logis_bow = model_logis_bow.predict(X_new)

for teks, label in zip(teks_baru, y_pred_logis_bow):
    print(f"Teks: {teks}\nPrediksi Label: {label}\n")

--- Hasil Prediksi Model Logistic Regression (BoW) ---
Teks: Saya benci ini sangat mengecewakan
Prediksi Label: negative

Teks: Barang yang saya terima rusak dan sangat mengecewakan
Prediksi Label: negative

Teks: Saya baru menerima paketnya hari ini
Prediksi Label: neutral

Teks: Produk sesuai dskripsi
Prediksi Label: positive

Teks: Belum sempat digunakan
Prediksi Label: neutral



In [69]:
# Transform and predict with the SVM model (TF-IDF + SVC via the pipeline)
print("--- Hasil Prediksi Model SVM ---")
y_pred_svm = best_model.predict(teks_baru)
for teks, label in zip(teks_baru, y_pred_svm):
    print(f"Teks: {teks}\nPrediksi Label: {label}\n")

--- Hasil Prediksi Model SVM ---
Teks: Saya benci ini sangat mengecewakan
Prediksi Label: negative

Teks: Barang yang saya terima rusak dan sangat mengecewakan
Prediksi Label: negative

Teks: Saya baru menerima paketnya hari ini
Prediksi Label: positive

Teks: Produk sesuai dskripsi
Prediksi Label: positive

Teks: Belum sempat digunakan
Prediksi Label: neutral



In [71]:
# Tokenize
sequences_baru = tokenizer.texts_to_sequences(teks_baru)

# Pad to the sequence length used in training
X_baru_pad = pad_sequences(sequences_baru, maxlen=max_len, padding='post')

y_pred_probs = model_cnn.predict(X_baru_pad)  # output: class probabilities
y_pred_classes = np.argmax(y_pred_probs, axis=1)

label_mapping = {0: "negative", 1: "neutral", 2: "positive"}

print("--- Hasil Prediksi Model CNN ---")
for teks, pred in zip(teks_baru, y_pred_classes):
    label = label_mapping[pred]  # look up the label name in the mapping
    print(f"Teks: {teks}\nPrediksi Label: {label}\n")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 131ms/step
--- Hasil Prediksi Model CNN ---
Teks: Saya benci ini sangat mengecewakan
Prediksi Label: negative

Teks: Barang yang saya terima rusak dan sangat mengecewakan
Prediksi Label: negative

Teks: Saya baru menerima paketnya hari ini
Prediksi Label: positive

Teks: Produk sesuai dskripsi
Prediksi Label: positive

Teks: Belum sempat digunakan
Prediksi Label: positive



# **6. Conclusion**

---

Across the four evaluated schemes, CNN-LSTM achieved the highest test accuracy (92.4%) with balanced F1-scores across all classes (0.91 to 0.93). SVM (LinearSVC with TF-IDF) came second at 91.5%, close to its cross-validation accuracy of 91.2%, which points to stable performance. Logistic Regression with BoW was also reliable (90.2%) and suits lightweight use. Logistic Regression with TF-IDF had the lowest accuracy (86.8%) and was the least accurate on the neutral class (F1-score 0.81); note that it was evaluated on a 70/30 split, while the other three schemes used 80/20. On the five new sentences, however, both Logistic Regression models (TF-IDF and BoW) handled neutral and ambiguous text more consistently: they labeled "Saya baru menerima paketnya hari ini" as neutral, where SVM and CNN-LSTM predicted positive. CNN-LSTM is recommended for maximum performance, SVM as a balanced choice, and Logistic Regression with BoW for efficiency.