# I. Introduction


*This notebook classifies the news titles in an existing dataset into three sentiment categories (positive, negative, and neutral) using a pretrained model.*

*Model : [indonesian-roberta-base-sentiment-classifier](https://huggingface.co/w11wo/indonesian-roberta-base-sentiment-classifier)*

*Stop words and slang dictionary : [NLP_bahasa_resources](https://github.com/louisowen6/NLP_bahasa_resources) (lemmatization uses the `nlp_id` package)*

*Prepared by* : **Achmad Dhani & Faris Arief Mawardi**

# II. Import Libraries and Setting Up Functions

## 2.1 Libraries

In [1]:
import warnings
warnings.filterwarnings("ignore")

In [32]:
import ast
import re
from collections import Counter

import nltk
import numpy as np
import pandas as pd
from nlp_id.lemmatizer import Lemmatizer
from nltk.tokenize import word_tokenize
from transformers import pipeline

In [3]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/achmaddhani/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [4]:
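# Load the stop-word list and the slang-to-standard word mapping
with open('/Users/achmaddhani/projects/resources/NLP_bahasa_resources/combined_stop_words.txt', 'r') as file_1:
    # Assumes whitespace-separated words; a set gives exact-word membership
    # (a raw string would match substrings instead)
    stop_words = set(file_1.read().split())

with open('/Users/achmaddhani/projects/resources/NLP_bahasa_resources/combined_slang_words.txt', 'r') as file_2:
    word_variations_string = file_2.read()
    # Parse the dict literal that maps variant spellings to standard words
    word_variations = ast.literal_eval(word_variations_string)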
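
The stop-word filter used later depends on how `in` behaves for the loaded object. The following is a minimal sketch, assuming the stop-word file holds one word per line; the words and tokens here are hypothetical and are not taken from `combined_stop_words.txt`. It compares membership tests against a raw string and against a set.

```python
# Hypothetical stop-word file contents, one word per line (assumed format).
raw_text = "yang\ndi\ndan\nuntuk\n"

stop_words_str = raw_text                # raw string: `in` does substring matching
stop_words_set = set(raw_text.split())   # set: `in` does exact-word matching

tokens = ["un", "dan", "dana"]

# Tokens that each representation would treat as stop words
removed_by_str = [t for t in tokens if t in stop_words_str]
removed_by_set = [t for t in tokens if t in stop_words_set]

print(removed_by_str)
print(removed_by_set)
```

With a raw string, any token that happens to be a fragment of a stop word is also dropped, which a set avoids.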

## 2.2 Model

In [5]:
pretrained_name = "w11wo/indonesian-roberta-base-sentiment-classifier"

nlp = pipeline(
    "sentiment-analysis",
    model=pretrained_name,
    tokenizer=pretrained_name
)

## 2.3 Functions

In [6]:
special_char_removal = re.compile(r"[^a-zA-Z\s\']")
lemmatizer = Lemmatizer()

def text_preprocessing(text):
    """Lowercase, strip non-letters, normalize slang, lemmatize, and drop stop words."""
    text = text.lower()
    text = special_char_removal.sub(" ", text)
    tokens = word_tokenize(text)  # tokenization

    tokens = [word_variations.get(word, word) for word in tokens]  # map variant/slang spellings to standard words
    tokens = [lemmatizer.lemmatize(word) for word in tokens]  # lemmatize to get the root of each word
    tokens = [word for word in tokens if word not in stop_words]  # remove stop words

    processed_text = ' '.join(tokens)

    return processed_text

def sentiment_result(text):
    """Return the model's top label for `text`, translated to Indonesian."""
    result = nlp(text)
    if result[0]['label'] == 'positive':
        return 'positif'
    elif result[0]['label'] == 'negative':
        return 'negatif'
    else:
        return 'netral'
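
To make the order of steps in `text_preprocessing` easier to follow, here is a simplified sketch that runs without the external resources. It assumes a hypothetical two-entry slang dictionary and a three-word stop list, uses a plain whitespace split in place of `word_tokenize`, and skips lemmatization entirely, so its output will differ from the real function.

```python
import re

# Hypothetical stand-ins for the real resources (not from NLP_bahasa_resources).
toy_slang = {"gak": "tidak", "bgt": "banget"}
toy_stop_words = {"yang", "di", "dan"}

non_letters = re.compile(r"[^a-zA-Z\s']")

def toy_preprocess(text):
    # 1. lowercase and replace anything that is not a letter, space, or apostrophe
    text = non_letters.sub(" ", text.lower())
    # 2. whitespace split stands in for word_tokenize
    tokens = text.split()
    # 3. map slang spellings to their standard forms
    tokens = [toy_slang.get(tok, tok) for tok in tokens]
    # 4. drop stop words (lemmatization is skipped in this sketch)
    tokens = [tok for tok in tokens if tok not in toy_stop_words]
    return " ".join(tokens)

result = toy_preprocess("Prabowo-Gibran gak hadir di Debat 2023!")
print(result)
```

The hyphen, digits, and exclamation mark are replaced by spaces, `gak` becomes `tidak`, and `di` is removed.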

# III. Sentiment Analysis

## 3.1 Predicting Sentiment

**Testing the model**

In [7]:
test = nlp('saya senang sekali hari ini')
print(test)

[{'label': 'positive', 'score': 0.99562007188797}]


**Predicting**

In [20]:
df= pd.read_csv('./media_datasets/cleaned_media.csv')

In [21]:
df

Unnamed: 0,judul_berita,portal_media,tanggal_publikasi,url
0,Live Now! Tim Prabowo-Gibran Beberkan Visi-Pro...,CNBC Indonesia,2023-11-15,https://www.cnbcindonesia.com/news/20231115130...
1,Waketum PAN Analogikan Prabowo-Gibran seperti ...,detik.com,2023-11-15,https://news.detik.com/pemilu/d-7038627/waketu...
2,"Prabowo-Gibran Nomor Urut 2, Gerindra Jatim: P...",detik.com,2023-11-15,https://www.detik.com/jatim/berita/d-7038617/p...
3,TKN Prabowo-Gibran tekankan fitnah rusak cara ...,Antaranews.com,2023-11-15,https://www.antaranews.com/berita/3824628/tkn-...
4,"Gibran Salim ke Megawati, TKN Prabowo-Gibran: ...",Kompas.com,2023-11-15,https://nasional.kompas.com/read/2023/11/15/21...
...,...,...,...,...
1981,Mahfud Md ke Pendukung: Jangan Terpengaruh Has...,detik.com,2023-12-15,https://www.detik.com/jabar/berita/d-7092431/m...
1982,Alumni HMI Prihatin Banyak Caleg Tak Pasang Fo...,DRberita.ID,2023-12-15,https://www.drberita.id/politik/alumni-hmi-pri...
1983,Gibran Slated for First Campaign Outside of Java,Jakarta Globe,2023-12-15,https://jakartaglobe.id/news/gibran-slated-for...
1984,"Anies Singgung Soal Oposisi, Prabowo Singgung ...",Kompas.com,2023-12-15,https://www.kompas.tv/video/469419/anies-singg...


In [22]:
df['hasil_sentimen']= df['judul_berita'].apply(sentiment_result)

In [23]:
df.sample(10)

Unnamed: 0,judul_berita,portal_media,tanggal_publikasi,url,hasil_sentimen
1924,Ketua TKD Prabowo-Gibran Sulteng Minta Koalisi...,TribunNews,2023-12-15,https://palu.tribunnews.com/2023/12/15/ketua-t...,netral
1427,"Indikator Politik: Prabowo-Gibran 45,8%, Mengu...",Databoks,2023-12-10,https://databoks.katadata.co.id/datapublish/20...,netral
1352,Prabowo-Gibran Diyakini Mampu Wujudkan Harapan...,detik.com,2023-12-09,https://news.detik.com/pemilu/d-7080978/prabow...,netral
1711,Lima Rekomendasi Relawan Penerus Negeri untuk ...,Merdeka.com,2023-12-13,https://www.merdeka.com/politik/lima-rekomenda...,netral
963,"Berkunjung ke Banten, Prabowo Dianggap Peduli ...",Radar Banten,2023-12-04,https://www.radarbanten.co.id/2023/12/04/berku...,netral
1221,Fahri Yakin Prabowo-Gibran Memenangi Pertarung...,Lombok Post,2023-12-07,https://lombokpost.jawapos.com/politika/150344...,netral
118,PSI Aceh Dukung Tim Hukum Prabowo-Gibran Lapor...,Serambinews.com,2023-11-17,https://aceh.tribunnews.com/2023/11/17/psi-ace...,netral
958,Sederet Blunder yang Diciptakan Gibran saat Ka...,Kabar24,2023-12-04,https://kabar24.bisnis.com/read/20231204/15/17...,netral
908,"Bantah Dukung Prabowo-Gibran, Abuya Muhtadi: J...",Video,2023-12-04,https://video.kompas.com/watch/1073762/bantah-...,netral
1064,Ketua Team Pemenangan Getar 08 Ir.Bambang Pria...,MediaPATRIOT.CO.ID,2023-12-05,https://www.mediapatriot.co.id/2023/12/05/ketu...,netral


In [24]:
df['hasil_sentimen'].value_counts()

hasil_sentimen
netral     1794
negatif     133
positif      59
Name: count, dtype: int64
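
The counts above are easier to compare as shares of the total. A small sketch of that arithmetic, using only the three counts printed in the output:

```python
# Label counts copied from the value_counts() output
counts = {"netral": 1794, "negatif": 133, "positif": 59}

total = sum(counts.values())

# Percentage share of each label, rounded to one decimal place
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)
print(shares)
```

Roughly nine out of ten titles are labeled neutral, which is worth keeping in mind when reading any downstream chart.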

## 3.2 Text Processing

In [25]:
df['olahan_teks']= df['judul_berita'].apply(text_preprocessing)

In [26]:
df.sample(10)

Unnamed: 0,judul_berita,portal_media,tanggal_publikasi,url,hasil_sentimen,olahan_teks
432,Relawan Taruna Pro Gibran Deklarasi Dukung Pra...,detik.com,2023-11-25,https://news.detik.com/pemilu/d-7056414/relawa...,netral,rawan taruna profesional gibran deklarasi duku...
91,Nasib IKN Setelah Jokowi hingga Kesepakatan Ha...,CNBC Indonesia,2023-11-16,https://www.cnbcindonesia.com/news/20231116210...,netral,nasib jokowi sepakat hamas israel
808,RKB Gaet Anak Muda Pemilih Prabowo-Gibran Lewa...,Viva,2023-12-02,https://www.viva.co.id/berita/nasional/1663670...,netral,rkb gaet muda pilih prabowo gibran motor milen...
1133,"Jadi Pemimpin Nasional, Bibit Waluyo Minta Gib...",Soloraya,2023-12-06,https://soloraya.solopos.com/jadi-pemimpin-nas...,netral,pimpin nasional bibit waluyo gibran kuasa panc...
323,Muzani ke Kader Gerindra Yogyakarta: Prabowo-G...,detik.com,2023-11-22,https://news.detik.com/pemilu/d-7051438/muzani...,netral,muzani kader gerindra yogyakarta prabowo gibra...
535,Forum Pendiri Demokrat Manuver ke Ganjar-Mahfu...,Kabar24,2023-11-28,https://kabar24.bisnis.com/read/20231128/15/17...,netral,forum demokrat manuver ganjar mahfud batal duk...
1385,"Ganjar–Mahfud Nobar, Prabowo–Gibran Konsolidasi",NUSABALI.com,2023-12-09,https://www.nusabali.com/berita/156617/ganjar-...,netral,ganjar mahfud nonton bareng prabowo gibran kon...
1662,Hendri Satrio Ungkap Gosip Alasan Ketum Parpol...,Optika.id,2023-12-12,https://optika.id/news-63490-hendri-satrio-ung...,netral,hendri satrio gosip alas tum partai politik pa...
1013,Bawaslu Telusuri Unsur Kampanye dalam Iklan Su...,Kompas.com,2023-12-05,https://nasional.kompas.com/read/2023/12/05/20...,netral,bawaslu telusur unsur kampanye dalam iklan sus...
1486,Gibran disambut Said Aqil saat kunjungi Ponpes...,Antaranews.com,2023-12-10,https://www.antaranews.com/berita/3864885/gibr...,netral,gibran sambut said aqil kunjung pondok pesantr...


## 3.3 Data Engineering

**Cleaning irrelevant news**

In [28]:
df= df[df['olahan_teks'].str.contains('prabowo|gibran', case=False, na=False)]
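
As an illustration of what this filter keeps, the sketch below applies the same pattern with the standard `re` module to a few hypothetical processed titles. Skipping non-string values mirrors `na=False`, and `re.IGNORECASE` mirrors `case=False`.

```python
import re

pattern = re.compile(r"prabowo|gibran", flags=re.IGNORECASE)

# Hypothetical processed titles; None stands in for a missing value
titles = [
    "prabowo kunjung banten",
    "anies singgung oposisi",
    "Gibran kampanye",
    None,
]

# Keep only string entries that mention either keyword anywhere in the text
kept = [t for t in titles if isinstance(t, str) and pattern.search(t)]

print(kept)
```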

**Preparing a dataset for Tableau**

In [33]:
# splitting the text into words and associating each word with its publication date
word_date_pairs = [(word, date) for text, date in zip(df['olahan_teks'], df['tanggal_publikasi']) for word in text.split()]

# counting the frequency of each word-date pair
word_date_freq = Counter(word_date_pairs)

# creating a new DataFrame from the word-date-frequency data
word_freq_date_df = pd.DataFrame([(word, date, freq) for (word, date), freq in word_date_freq.items()], columns=['Word', 'Date', 'Frequency'])

# saving the word-date-frequency table to CSV
word_freq_date_df.to_csv('kata_tanggal.csv', index=False)
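
To show the shape of the word-date frequency data, here is the same counting logic on three hypothetical rows, using only the standard library. The texts and dates are made up for illustration.

```python
from collections import Counter

# Hypothetical (processed text, publication date) rows
rows = [
    ("prabowo gibran kampanye", "2023-12-01"),
    ("gibran kampanye", "2023-12-01"),
    ("prabowo debat", "2023-12-02"),
]

# One (word, date) pair per word occurrence
pairs = [(word, date) for text, date in rows for word in text.split()]

# Count how often each word appears on each date
freq = Counter(pairs)

# Flatten into (Word, Date, Frequency) records, one per distinct pair
records = [(word, date, n) for (word, date), n in freq.items()]

for record in records:
    print(record)
```

Each distinct word-date pair becomes one record, so the same word appears on separate rows for separate dates.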

**Saving files**

In [35]:
df.to_csv('./media_datasets/media_sentiment_analysis.csv', index=False)

# IV. Conclusion

The pretrained model labeled most titles as neutral: before filtering, 1,794 titles were classified as neutral, 133 as negative, and 59 as positive. The titles were then processed with slang normalization, lemmatization, and stop-word removal. A small data engineering step at the end kept only entries whose processed text contains the keywords "prabowo" or "gibran", so that the dataset stays relevant. The final dataset has been saved to `media_sentiment_analysis.csv`, and a separate word-by-date frequency dataset (`kata_tanggal.csv`) has been created for Tableau visualizations such as word clouds.