# 07: Sentiment Prediction with a Fine-Tuned IndoBERT Model

This notebook uses a **fine-tuned IndoBERT model** to predict sentiment for:

1. A single, manually entered text.
2. A batch of texts from a list or DataFrame.

Goals of this notebook:
- Show how to use the model in a realistic scenario.
- Show example predictions together with their probabilities.
- Provide utility functions that can be reused in other applications.

----

## Import Libraries

In [5]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np
import pandas as pd
import os
from pathlib import Path

## Load Model

In [8]:
model_dir = "../models/indobert_sentiment_final"  # adjust if your path differs

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)

model.eval()

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(50000, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

## Prepare the label mapping (0, 1, 2 → negative/neutral/positive)

In [9]:
id2label = {
    0: "negative",
    1: "neutral",
    2: "positive"
}

label2id = {v: k for k, v in id2label.items()}
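
As a quick sanity check, the two dictionaries should be exact inverses of each other, and the ids should run from 0 to n-1 because they index positions in the model's output logits. The sketch below assumes a helper named `check_label_mapping`, which is hypothetical and not part of the original notebook.

```python
def check_label_mapping(id2label: dict, label2id: dict) -> bool:
    """Return True if the two mappings are consistent (hypothetical helper)."""
    # ids must be contiguous 0..n-1 so they line up with logit positions
    if sorted(id2label) != list(range(len(id2label))):
        return False
    # label2id must be the exact inverse of id2label
    return label2id == {label: idx for idx, label in id2label.items()}


id2label = {0: "negative", 1: "neutral", 2: "positive"}
label2id = {v: k for k, v in id2label.items()}
mapping_ok = check_label_mapping(id2label, label2id)
```

If the mapping was stored at training time, it may also be available as `model.config.id2label`. Whether that attribute holds these label names depends on how the model was saved.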

## Prediction function for a single text

In [10]:
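def predict_sentiment(text: str):
    """Predict the sentiment of one text; print and return (label, probabilities)."""
    # Tokenize the input, padding/truncating to 128 tokens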
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding="max_length",
        max_length=128
    )
    
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.softmax(logits, dim=-1).cpu().numpy()[0]
    
    pred_id = int(np.argmax(probs))
    pred_label = id2label[pred_id]
    
    prob_dict = {id2label[i]: float(probs[i]) for i in range(len(probs))}
    
    print(f"Text       : {text}")
    print(f"Prediction : {pred_label}")
    print("Probabilities:")
    for lbl, p in prob_dict.items():
        print(f"  {lbl:<8}: {p:.4f}")
    print("-" * 50)
    
    return pred_label, prob_dict

In [18]:
predict_sentiment("kereta cepat whoosh ini keren banget, bangga sama Indonesia")
predict_sentiment("proyek ini buang-buang uang rakyat, parah banget")
predict_sentiment("ya biasa aja sih, nggak ngaruh juga ke hidup saya")
predict_sentiment("korupsi itu merugikan masyarakat kecil")
predict_sentiment("seneng banget akhirnya bisa nyobain whoosh")

Text       : kereta cepat whoosh ini keren banget, bangga sama Indonesia
Prediction : neutral
Probabilities:
  negative: 0.0007
  neutral : 0.9883
  positive: 0.0110
--------------------------------------------------
Text       : proyek ini buang-buang uang rakyat, parah banget
Prediction : negative
Probabilities:
  negative: 0.9951
  neutral : 0.0013
  positive: 0.0036
--------------------------------------------------
Text       : ya biasa aja sih, nggak ngaruh juga ke hidup saya
Prediction : neutral
Probabilities:
  negative: 0.0010
  neutral : 0.9924
  positive: 0.0066
--------------------------------------------------
Text       : korupsi itu merugikan masyarakat kecil
Prediction : negative
Probabilities:
  negative: 0.9929
  neutral : 0.0034
  positive: 0.0037
--------------------------------------------------
Text       : seneng banget akhirnya bisa nyobain whoosh
Prediction : neutral
Probabilities:
  negative: 0.0007
  neutral : 0.9907
  positive: 0.0086
--------------------------------------------------

('neutral',
 {'negative': 0.0007437911117449403,
  'neutral': 0.990704357624054,
  'positive': 0.00855178665369749})

## Batch prediction function (list of texts)

In [14]:
def predict_batch(text_list):
    """Predict sentiment for a list of texts; return one DataFrame row per text."""
    inputs = tokenizer(
        text_list,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=128
    )
    
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.softmax(logits, dim=-1).cpu().numpy()
    
    pred_ids = np.argmax(probs, axis=1)
    pred_labels = [id2label[int(i)] for i in pred_ids]
    
    results = []
    for text, lbl, p in zip(text_list, pred_labels, probs):
        prob_dict = {id2label[i]: float(p[i]) for i in range(len(p))}
        results.append({
            "text": text,
            "pred_label": lbl,
            "prob_negative": prob_dict["negative"],
            "prob_neutral": prob_dict["neutral"],
            "prob_positive": prob_dict["positive"],
        })
    
    return pd.DataFrame(results)
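
`predict_batch` tokenizes the whole list and runs it through the model in a single forward pass, which may use a lot of memory for long lists. One possible workaround is to split the input into smaller chunks first. The helper name `iter_chunks` and the default chunk size of 32 are assumptions for illustration, not part of the original notebook.

```python
from typing import Iterator, List


def iter_chunks(items: List[str], chunk_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


# Small demonstration with placeholder strings
chunks = list(iter_chunks(["a", "b", "c", "d", "e"], chunk_size=2))
```

Each chunk could then be passed to `predict_batch`, and the resulting DataFrames combined with `pd.concat(..., ignore_index=True)`.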

In [15]:
texts = [
    "kereta cepat whoosh ini inovasi bagus buat Indonesia",
    "ini proyek korupsi yang merugikan negara",
    "biasa aja, ga ngaruh ke hidup saya"
]

df_pred = predict_batch(texts)
df_pred

| | text | pred_label | prob_negative | prob_neutral | prob_positive |
|---|---|---|---|---|---|
| 0 | kereta cepat whoosh ini inovasi bagus buat Ind... | neutral | 0.000552 | 0.990998 | 0.00845 |
| 1 | ini proyek korupsi yang merugikan negara | negative | 0.988981 | 0.007251 | 0.003768 |
| 2 | biasa aja, ga ngaruh ke hidup saya | neutral | 0.001249 | 0.993554 | 0.005197 |


## (Optional) Use your own dataset to check real examples

In [16]:
df_full = pd.read_csv("../data/dataset_preprocessed.csv")

sample_df = df_full.sample(10, random_state=42)[["comment", "sentiment"]]
sample_df

| | comment | sentiment |
|---|---|---|
| 309 | Orang zalim seringkali mengandalkan kekuasaan,... | neutral |
| 139 | mana bsa berani usut kereta cepat | negative |
| 499 | KPK.di kendalikan sama Jokowi\nSeharusnya tent... | negative |
| 854 | Muliono manusia berhati iblis | negative |
| 88 | Yg di tangkap & di hukum jokodok | negative |
| 398 | Aiman kompoorrr | neutral |
| 905 | Cacat Konstitusi,Pemilu,Awal yang Buruk? | neutral |
| 107 | Hati2 jangan bangun rute woosh lagi ke surabay... | negative |
| 59 | Yang harus di tangkap yang buat kebijakan dan ... | negative |
| 534 | Oknum korupsi di whoosh harus diusut, biar dip... | negative |


In [17]:
pred_df = predict_batch(sample_df["comment"].tolist())
pred_df["true_label"] = sample_df["sentiment"].values
pred_df

| | text | pred_label | prob_negative | prob_neutral | prob_positive | true_label |
|---|---|---|---|---|---|---|
| 0 | Orang zalim seringkali mengandalkan kekuasaan,... | negative | 0.982348 | 0.004064 | 0.013588 | neutral |
| 1 | mana bsa berani usut kereta cepat | negative | 0.9929 | 0.003107 | 0.003993 | negative |
| 2 | KPK.di kendalikan sama Jokowi\nSeharusnya tent... | negative | 0.996901 | 0.000986 | 0.002113 | negative |
| 3 | Muliono manusia berhati iblis | neutral | 0.097392 | 0.890316 | 0.012292 | negative |
| 4 | Yg di tangkap & di hukum jokodok | negative | 0.995841 | 0.001904 | 0.002255 | negative |
| 5 | Aiman kompoorrr | neutral | 0.00074 | 0.995752 | 0.003507 | neutral |
| 6 | Cacat Konstitusi,Pemilu,Awal yang Buruk? | neutral | 0.000601 | 0.991494 | 0.007905 | neutral |
| 7 | Hati2 jangan bangun rute woosh lagi ke surabay... | negative | 0.994215 | 0.001052 | 0.004732 | negative |
| 8 | Yang harus di tangkap yang buat kebijakan dan ... | negative | 0.995586 | 0.00231 | 0.002104 | negative |
| 9 | Oknum korupsi di whoosh harus diusut, biar dip... | negative | 0.995099 | 0.000717 | 0.004185 | negative |
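
To summarize how the predictions compare with the true labels, the sketch below computes accuracy and counts each (true, predicted) pair. The two lists are transcribed by hand from the `pred_label` and `true_label` columns of the output above, in row order, so the code runs without the model or the dataset.

```python
from collections import Counter

# Transcribed from the 10-row output above (rows 0..9).
pred_labels = ["negative", "negative", "negative", "neutral", "negative",
               "neutral", "neutral", "negative", "negative", "negative"]
true_labels = ["neutral", "negative", "negative", "negative", "negative",
               "neutral", "neutral", "negative", "negative", "negative"]

# Fraction of rows where the prediction equals the true label
matches = sum(p == t for p, t in zip(pred_labels, true_labels))
accuracy = matches / len(true_labels)

# Count (true, predicted) pairs to see which confusions occur
confusion = Counter(zip(true_labels, pred_labels))
```

On this sample, 8 of the 10 predictions match (rows 0 and 3 differ). Ten rows are too few for a reliable estimate, so this is only a spot check. With the DataFrame itself, the same accuracy could be computed as `(pred_df["pred_label"] == pred_df["true_label"]).mean()`.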
