# Advanced Sentiment Analysis on Economic News (IndoBERT)

This notebook applies a pretrained Indonesian language model (IndoBERT)
to analyze sentiment in economic news narratives and compares the results
with a baseline lexicon-based approach.

## 1. Load Libraries and Model

In [1]:
!pip install transformers torch



In [2]:
from transformers import pipeline



In [3]:
sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model="indobenchmark/indobert-base-p1"
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at indobenchmark/indobert-base-p1 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Device set to use cpu
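
The warning above reports that `classifier.weight` and `classifier.bias` are newly initialized, which usually means the checkpoint was not fine-tuned for classification. One informal extra check is whether the label names are still the generic `LABEL_<n>` placeholders that `transformers` assigns by default. The helper below is a minimal sketch: `has_placeholder_labels` is a hypothetical name, the two dictionaries are illustrative inputs rather than values read from the checkpoint, and in the notebook the input would be `sentiment_pipeline.model.config.id2label`.

```python
import re

def has_placeholder_labels(id2label):
    """Return True if every label name looks like the default 'LABEL_<n>'."""
    names = list(id2label.values())
    # An empty mapping tells us nothing, so treat it as "not placeholder".
    return bool(names) and all(re.fullmatch(r"LABEL_\d+", n) for n in names)

# Illustrative inputs (assumed for this sketch, not read from the model).
generic = {0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2"}
named = {0: "positive", 1: "neutral", 2: "negative"}

print(has_placeholder_labels(generic))  # True
print(has_placeholder_labels(named))    # False
```

Placeholder names are only a hint, since a fine-tuned model can keep them; the initialization warning is the stronger signal.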


In [4]:
sentiment_pipeline("harga beras naik dan masyarakat merasa terbebani")

[{'label': 'LABEL_2', 'score': 0.2837033271789551}]
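
A top score of about 0.28 deserves a sanity check. Softmax probabilities sum to 1, so the largest of n class probabilities is at least 1/n, and a top score p therefore implies at least ceil(1/p) classes. The sketch below applies that bound to the score printed above; `min_classes_for_top_score` is a hypothetical helper name.

```python
import math

def min_classes_for_top_score(top_score):
    """Smallest number of classes compatible with this top softmax score."""
    if not 0.0 < top_score <= 1.0:
        raise ValueError("top_score must be in (0, 1]")
    return math.ceil(1.0 / top_score)

score = 0.2837033271789551  # score from the prediction above
n_min = min_classes_for_top_score(score)
print(n_min)  # 4
```

With at least four classes, a top score of 0.28 is only slightly above the 0.25 that a uniform guess over four labels would give, so the prediction carries little information.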

## 2. Load and Prepare Text Data


In [5]:
import pandas as pd

df = pd.read_csv("clean_text_data.csv")
df.head()


Unnamed: 0,clean_text
0,kkp setor pnbp rp m disokong izin pemanfaatan ...
1,bukan rp ribu gus ipul usul purbaya tambah ban...
2,amran bongkar ton bawang bombai ilegal selundu...
3,viva yoga koperasi tingkatkan aktivitas ekonom...
4,purbaya beri kredit rp t untuk industri furnit...


In [6]:
df_sample = df.head(50).copy()  # explicit copy avoids SettingWithCopyWarning on later assignments


## 3. Apply IndoBERT Sentiment Analysis


In [7]:
results = sentiment_pipeline(df_sample["clean_text"].tolist())
results[:5]


[{'label': 'LABEL_2', 'score': 0.31672319769859314},
 {'label': 'LABEL_2', 'score': 0.2669069170951843},
 {'label': 'LABEL_2', 'score': 0.28942936658859253},
 {'label': 'LABEL_2', 'score': 0.2518157362937927},
 {'label': 'LABEL_2', 'score': 0.30059751868247986}]
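
Summarizing the whole batch is more informative than reading the first five rows. The sketch below counts the predicted labels and reports the score range. It uses the five results printed above as stand-in data; in the notebook the full `results` list would be passed instead. `summarize_predictions` is a hypothetical helper.

```python
from collections import Counter

def summarize_predictions(results):
    """Count predicted labels and report the range of top scores."""
    label_counts = Counter(r["label"] for r in results)
    scores = [r["score"] for r in results]
    return {
        "label_counts": dict(label_counts),
        "min_score": min(scores),
        "max_score": max(scores),
    }

# The five predictions shown above.
results = [
    {"label": "LABEL_2", "score": 0.31672319769859314},
    {"label": "LABEL_2", "score": 0.2669069170951843},
    {"label": "LABEL_2", "score": 0.28942936658859253},
    {"label": "LABEL_2", "score": 0.2518157362937927},
    {"label": "LABEL_2", "score": 0.30059751868247986},
]

summary = summarize_predictions(results)
print(summary["label_counts"])  # {'LABEL_2': 5}
print(summary["max_score"] < 1 / 3)  # True
```

A single dominant label combined with uniformly low scores is a pattern worth investigating before any interpretation.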

In [8]:
df_sample["bert_label"] = [r["label"] for r in results]
df_sample["bert_score"] = [r["score"] for r in results]

df_sample.head()



Unnamed: 0,clean_text,bert_label,bert_score
0,kkp setor pnbp rp m disokong izin pemanfaatan ...,LABEL_2,0.316723
1,bukan rp ribu gus ipul usul purbaya tambah ban...,LABEL_2,0.266907
2,amran bongkar ton bawang bombai ilegal selundu...,LABEL_2,0.289429
3,viva yoga koperasi tingkatkan aktivitas ekonom...,LABEL_2,0.251816
4,purbaya beri kredit rp t untuk industri furnit...,LABEL_2,0.300598


In [10]:
from google.colab import sheets
sheet = sheets.InteractiveSheet(df=df_sample)

https://docs.google.com/spreadsheets/d/1jBiCHauVuWP3puG3EeyCcXxyxHCZ_848xHLMjeO2_98/edit#gid=0


## 4. Map Model Labels to Human-Readable Sentiment


In [11]:
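# Only LABEL_0 and LABEL_1 are mapped; any other label the model emits
# (such as LABEL_2) becomes NaN after .map().
label_mapping = {
    "LABEL_0": "negative",
    "LABEL_1": "positive"
}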

df_sample["bert_sentiment"] = df_sample["bert_label"].map(label_mapping)
df_sample[["bert_label", "bert_sentiment"]].head()



Unnamed: 0,bert_label,bert_sentiment
0,LABEL_2,
1,LABEL_2,
2,LABEL_2,
3,LABEL_2,
4,LABEL_2,
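
The empty `bert_sentiment` column above is what `Series.map` produces when a value has no key in the mapping: `LABEL_2` is not in `label_mapping`, so it becomes `NaN`. A small guard run before the mapping makes this visible. The sketch below is plain Python, and `find_unmapped_labels` is a hypothetical helper.

```python
def find_unmapped_labels(labels, mapping):
    """Return the sorted set of labels that have no entry in the mapping."""
    return sorted(set(labels) - set(mapping))

label_mapping = {"LABEL_0": "negative", "LABEL_1": "positive"}
predicted = ["LABEL_2"] * 5  # the five labels shown in the table above

unmapped = find_unmapped_labels(predicted, label_mapping)
print(unmapped)  # ['LABEL_2']
```

In the notebook the same call could take `df_sample["bert_label"]` as its first argument, and a non-empty result would be a reason to stop and revisit the mapping.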


## 5. Compare with Baseline Sentiment


In [12]:
df_baseline = pd.read_csv("baseline_sentiment_result.csv")
df_baseline.head()


Unnamed: 0,tokens_clean,sentiment_score,sentiment
0,"['kkp', 'setor', 'pnbp', 'rp', 'm', 'disokong'...",0,neutral
1,"['rp', 'ribu', 'gus', 'ipul', 'usul', 'purbaya...",0,neutral
2,"['amran', 'bongkar', 'ton', 'bawang', 'bombai'...",0,neutral
3,"['viva', 'yoga', 'koperasi', 'tingkatkan', 'ak...",0,neutral
4,"['purbaya', 'kredit', 'rp', 't', 'industri', '...",0,neutral


In [13]:
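# Rows are aligned by position, which assumes both CSV files list the
# same articles in the same order.
df_compare = df_baseline.head(50).copy()
df_compare["bert_sentiment"] = df_sample["bert_sentiment"].values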

df_compare[["sentiment", "bert_sentiment"]].head()


Unnamed: 0,sentiment,bert_sentiment
0,neutral,
1,neutral,
2,neutral,
3,neutral,
4,neutral,
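
A row-by-row comparison only makes sense where both methods produced a label. The sketch below computes an agreement rate restricted to such rows. `agreement_rate` is a hypothetical helper, `None` stands in for the missing values shown above, and the second example uses made-up labels purely for illustration.

```python
def agreement_rate(baseline, model):
    """Return (share of matching labels, number of comparable rows)."""
    if len(baseline) != len(model):
        raise ValueError("inputs must have the same length")
    # Keep only rows where both methods produced a label.
    pairs = [(b, m) for b, m in zip(baseline, model)
             if b is not None and m is not None]
    if not pairs:
        return None, 0
    matches = sum(b == m for b, m in pairs)
    return matches / len(pairs), len(pairs)

# First five rows from the table above: no comparable rows at all.
print(agreement_rate(["neutral"] * 5, [None] * 5))  # (None, 0)

# Made-up labels, for illustration only.
toy_baseline = ["neutral", "negative", "positive", "negative", "neutral"]
toy_model = ["negative", "negative", None, "positive", "neutral"]
print(agreement_rate(toy_baseline, toy_model))  # (0.5, 4)
```

Reporting the number of comparable rows next to the rate keeps a result based on very few rows from being over-read.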


In [14]:
print("Baseline sentiment distribution:")
print(df_compare["sentiment"].value_counts())

print("\nIndoBERT sentiment distribution:")
print(df_compare["bert_sentiment"].value_counts())


Baseline sentiment distribution:
sentiment
neutral     46
negative     3
positive     1
Name: count, dtype: int64

IndoBERT sentiment distribution:
bert_sentiment
negative    2
Name: count, dtype: int64
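
The IndoBERT counts above sum to 2, not 50, because `value_counts()` excludes missing values by default; passing `dropna=False` would include them. The same idea in plain Python is sketched below, with an explicit bucket for labels that have no mapping. The input is the five labels shown earlier, and `sentiment_distribution` is a hypothetical helper.

```python
from collections import Counter

def sentiment_distribution(labels, mapping, missing="unmapped"):
    """Count mapped sentiments, keeping unmapped labels in their own bucket."""
    return Counter(mapping.get(label, missing) for label in labels)

label_mapping = {"LABEL_0": "negative", "LABEL_1": "positive"}
labels = ["LABEL_2"] * 5  # the first five predicted labels

dist = sentiment_distribution(labels, label_mapping)
print(dict(dist))  # {'unmapped': 5}
```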


## 6. Notes on Interpretation

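The IndoBERT column looks more selective than the baseline, but this is
an artifact of the setup rather than a sign of model confidence. Only 2
of the 50 sampled texts receive a sentiment label because `label_mapping`
covers `LABEL_0` and `LABEL_1` only; the other 48 predictions map to
`NaN`, and `value_counts()` leaves missing values out by default.

The predictions themselves are also not meaningful. The checkpoint
`indobenchmark/indobert-base-p1` is a pretrained base model, and the
loading warning in Section 1 reports that its classification head
(`classifier.weight`, `classifier.bias`) is newly initialized. The top
scores of roughly 0.25 to 0.32 are consistent with that: they sit close
to chance for a head with four or more labels. A fair comparison with
the lexicon baseline requires a checkpoint fine-tuned for Indonesian
sentiment classification, or fine-tuning this one on labeled data.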
