# Sentiment Analysis on Indonesian Tweets about PPKM  
### Using SentenceTransformer Embeddings, Logistic Regression, and FAISS Similarity Search  
Author: *Sumitra Adriansyah*  


## Model Prediction

### Install & Load Libraries

In [6]:
!pip install -q sentence-transformers huggingface_hub faiss-cpu scikit-learn joblib

In [8]:
import json

import faiss
import joblib
import numpy as np
from huggingface_hub import hf_hub_download, snapshot_download
from sentence_transformers import SentenceTransformer

### Load Model

In [16]:
repo_id = "sumitraadrian/ppkm-sentiment-model"

# Download only the SentenceTransformer folder from the Hub repository.
encoder_path = snapshot_download(
    repo_id=repo_id,
    allow_patterns=["sentence_transformer/*"]
)
encoder = SentenceTransformer(f"{encoder_path}/sentence_transformer")

# Load the trained Logistic Regression classifier.
clf_path = hf_hub_download(repo_id, "classifier.pkl")
clf = joblib.load(clf_path)

# Load the FAISS index used for similarity search.
faiss_path = hf_hub_download(repo_id, "faiss.index")
index = faiss.read_index(faiss_path)

print("All components loaded successfully!")


All components loaded successfully!
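
The three components only work together if they agree on the embedding size. The sketch below is an optional sanity check, not part of the original notebook; `check_components` is a hypothetical helper name. It assumes the encoder exposes `get_sentence_embedding_dimension()`, the FAISS index exposes `d` and `ntotal`, and the scikit-learn classifier exposes `n_features_in_` (skipped if that attribute is missing).

```python
def check_components(encoder, clf, index):
    """Verify that encoder, classifier and FAISS index share one embedding size.

    Hypothetical helper: returns (embedding_dim, number_of_indexed_vectors)
    or raises ValueError on a mismatch.
    """
    dim = encoder.get_sentence_embedding_dimension()

    # The FAISS index stores vectors of a fixed dimension `d`.
    if index.d != dim:
        raise ValueError(
            f"FAISS index dimension {index.d} != encoder dimension {dim}"
        )

    # scikit-learn estimators record the training feature count when fitted.
    clf_dim = getattr(clf, "n_features_in_", None)
    if clf_dim is not None and clf_dim != dim:
        raise ValueError(
            f"Classifier expects {clf_dim} features, encoder produces {dim}"
        )

    return dim, index.ntotal
```

In the notebook this could be called as `dim, n_vectors = check_components(encoder, clf, index)` right after loading.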


### Upload Prediction File & Results

In [13]:
from google.colab import files
import pandas as pd

uploaded = files.upload()

csv_name = list(uploaded.keys())[0]
df = pd.read_csv(csv_name)

print(df.head())
print("Number of rows:", len(df))

Saving small_sample_data.csv to small_sample_data.csv
                                                text
0  krn masih ingat, 2019 itu sampai bbrp bulan pe...
1  apa itu obat molnupiravir viral viraltiktok vi...
2  pas covid gue akan setuju sama gubernur jakart...
3  satu nyawa terlalu sayang untuk meninggal kare...
4  sertu tri bintoro melaksanakan kegiatan secara...
Number of rows: 10
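
`google.colab.files` is only importable inside Google Colab, so the upload cell fails in a local Jupyter session. A minimal sketch of a fallback follows; `get_csv_path` is a hypothetical helper, and the default file name is simply the sample file used above, assumed to sit in the working directory.

```python
def get_csv_path(default_path="small_sample_data.csv"):
    """Return the CSV file name to read.

    Inside Colab, open the upload widget and return the first uploaded file.
    Elsewhere, fall back to `default_path` (an assumed local file).
    """
    try:
        from google.colab import files  # only available inside Colab
    except ImportError:
        return default_path

    uploaded = files.upload()  # dict mapping file name -> file content
    if not uploaded:
        raise ValueError("No file was uploaded")
    return next(iter(uploaded))
```

The result can then be passed to `pd.read_csv(get_csv_path())` in place of `csv_name`.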


In [20]:
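# Map the classifier's class ids to sentiment names.
label_map = {
    "0": "positive",
    "1": "neutral",
    "2": "negative"
}

def predict_sentiment(text):
    """Return (label, confidence, nearest FAISS id) for a single text."""
    emb = encoder.encode([text])  # shape: (1, embedding_dim)
    pred = clf.predict(emb)[0]
    conf = clf.predict_proba(emb).max()  # probability of the most likely class

    # FAISS nearest neighbour; FAISS expects a float32 array
    _, I = index.search(np.asarray(emb, dtype="float32"), k=1)
    nearest_index = int(I[0][0])

    # Fall back to the raw prediction if it has no entry in label_map
    label_name = label_map.get(str(pred), pred)

    return label_name, float(conf), nearest_index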
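
`predict_sentiment` encodes one text per call, which is fine for ten rows but slow for larger files. The sketch below shows one possible batched variant; `predict_batch` is a hypothetical helper that is not in the original notebook. It takes the loaded components as arguments and assumes `encoder.encode` accepts a list of strings and that `index.search` returns one row of neighbour ids per input.

```python
def predict_batch(texts, encoder, clf, index, label_map):
    """Predict sentiment for many texts with a single encoding pass.

    Returns a list of (label, confidence, nearest_id) tuples, one per text,
    in the same order as `texts`.
    """
    texts = list(texts)
    emb = encoder.encode(texts)        # one embedding row per text
    preds = clf.predict(emb)           # predicted class id per row
    probas = clf.predict_proba(emb)    # class probabilities per row
    _, nearest = index.search(emb, 1)  # closest indexed vector per row

    results = []
    for pred, proba, ids in zip(preds, probas, nearest):
        # Fall back to the raw class id if it is missing from label_map.
        label = label_map.get(str(pred), pred)
        results.append((label, float(max(proba)), int(ids[0])))
    return results
```

With the notebook's objects, `predict_batch(df["text"], encoder, clf, index, label_map)` would return the same three values per row that `predict_sentiment` produces.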

In [25]:
df["label_pred"], df["confidence"], df["nearest_id"] = zip(*df["text"].apply(predict_sentiment))
pd.set_option('display.max_rows', None)

df

|   | text | label_pred | confidence | nearest_id |
|---|------|------------|------------|------------|
| 0 | krn masih ingat, 2019 itu sampai bbrp bulan pe... | positive | 0.594337 | 8550 |
| 1 | apa itu obat molnupiravir viral viraltiktok vi... | neutral | 0.984278 | 2784 |
| 2 | pas covid gue akan setuju sama gubernur jakart... | positive | 0.519536 | 3729 |
| 3 | satu nyawa terlalu sayang untuk meninggal kare... | positive | 0.990181 | 17786 |
| 4 | sertu tri bintoro melaksanakan kegiatan secara... | neutral | 0.967057 | 345 |
| 5 | status ppkm di wilayah jabodetabek naik menjad... | neutral | 0.87859 | 11592 |
| 6 | perpanjangan ppkm mikro level 2 dan 3 di aceh ... | neutral | 0.996641 | 373 |
| 7 | terus karena covid diundur jadi juli, terus di... | negative | 0.914634 | 13569 |
| 8 | pkkm diperpanjang, pandemi covid-19 ini paling... | positive | 0.648368 | 698 |
| 9 | kesel bgt liat manusia² yg lg pkkm malah main ... | negative | 0.980317 | 14249 |
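
Several predictions above have confidence close to 0.5, so it can help to count labels and flag uncertain rows for manual review. The following is a minimal sketch using the values from the table; `summarize_predictions` is a hypothetical helper, and the 0.6 threshold is an arbitrary assumption rather than a value taken from the model.

```python
from collections import Counter

def summarize_predictions(labels, confidences, threshold=0.6):
    """Count predicted labels and list row positions below `threshold`."""
    counts = Counter(labels)
    low_confidence_rows = [
        i for i, conf in enumerate(confidences) if conf < threshold
    ]
    return counts, low_confidence_rows

# Values copied from the prediction table above.
labels = ["positive", "neutral", "positive", "positive", "neutral",
          "neutral", "neutral", "negative", "positive", "negative"]
confidences = [0.594337, 0.984278, 0.519536, 0.990181, 0.967057,
               0.87859, 0.996641, 0.914634, 0.648368, 0.980317]

counts, low_rows = summarize_predictions(labels, confidences)
print(dict(counts))  # {'positive': 4, 'neutral': 4, 'negative': 2}
print(low_rows)      # [0, 2]
```

In the notebook, the same inputs are available as `df["label_pred"]` and `df["confidence"]`.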
