# Twitter Sentiment Analysis


*Notes: This research crawls tweets from Twitter for sentiment analysis regarding Prabowo-Gibran as the presidential and vice-presidential candidates in the 2024 Indonesian election.*

*This notebook adds extra information to the existing dataset, such as each account's follower count and verification status.*

*Prepared by* : **Achmad Dhani & Faris Arief Mawardi**

## I. Introduction

**Background:**

The sentiment analysis aims to gauge public opinion and feelings expressed on social media, specifically Twitter, regarding the political alliance between Prabowo Subianto and Gibran Rakabuming Raka and their active participation in the campaign leading up to the presidential and vice-presidential election scheduled for February 2024.

### 5W1H Key Factors:

**Who:**
- Prabowo Subianto and Gibran Rakabuming Raka, potential presidential and vice-presidential candidates.
- Twitter users expressing opinions and sentiments about these potential candidates.

**What:**
- Sentiment analysis on Twitter data discussing the alliance and active participation of Prabowo and Gibran in the 2024 Indonesian election.
- Gathering tweets, analyzing the sentiment expressed, and understanding public opinion regarding this political alliance.

**When:**
- During the campaign leading up to the 2024 Indonesian election.
- The data collection period is November 19 to December 17, 2023.

**Where:**
- Twitter platform, particularly tweets written in Bahasa Indonesia, discussing #PrabowoGibran2024 or related hashtags.
- Focus might extend to specific regions within Indonesia where opinions may vary.

**Why:**
- To understand public sentiment, feelings, and opinions toward the candidacy of Gibran and Prabowo.
- To provide insights into the potential political alliance's reception among the electorate.

**How:**
- Collecting tweets related to #PrabowoGibran2024 and conducting sentiment analysis.
- Using Natural Language Processing (NLP) techniques to analyze the sentiment of tweets.
- Aggregating data, processing, and interpreting sentiment scores to derive insights.


**Problem Statement:**

This study analyzes the sentiment polarity and intensity of Twitter discussions surrounding the Prabowo-Gibran alliance ahead of the 2024 Indonesian election. The objective is to understand how public sentiment might affect their candidacy and overall political prospects as they campaign for the presidential and vice-presidential election scheduled for February 2024.

## II. Import Libraries and Packages

**Install Selenium**

In [1]:
pip install selenium

Note: you may need to restart the kernel to use updated packages.


**Import Libraries**

In [2]:
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from time import sleep
import pandas as pd

## III. Loading Data

In this section, we load the dataset obtained by scraping with tweet harvest, a crawler built by *Helmi Satria (helmisatria.com)*.

*Notebook source:* [Google Colab](https://colab.research.google.com/drive/1f0dsbESPorxvS4CdFJF-FYW9c63_szg_#scrollTo=4UIL1x21P9rQ)

In [5]:
data = pd.read_csv('Prabowo-Gibran_15112023.csv', delimiter=";")
data

Unnamed: 0,created_at,id_str,full_text,quote_count,reply_count,retweet_count,favorite_count,lang,user_id_str,conversation_id_str,username,tweet_url
0,Wed Nov 15 23:59:49 +0000 2023,1724940299250827575,@xquitavee @prabowo @gibran_tweet @psi_id @jok...,0,0,0,0,in,1420228891994517507,1724591423444713876,AmirMah36541437,https://twitter.com/AmirMah36541437/status/172...
1,Wed Nov 15 23:59:18 +0000 2023,1724940167767785893,@yehovarapha_ Boleh gak mutualan sama pendukun...,0,0,0,0,in,781527721,1724770699737444761,numadayana,https://twitter.com/numadayana/status/17249401...
2,Wed Nov 15 23:59:00 +0000 2023,1724940091058163837,Nurul Arifin Apresiasi Keputusan KPU Atas Pene...,0,0,49,11,in,721898354502930433,1724940091058163837,golkarpedia,https://twitter.com/golkarpedia/status/1724940...
3,Wed Nov 15 23:58:29 +0000 2023,1724939961080991996,@uludagerdi @hariqosatria @prabowo @gibran_twe...,0,0,0,0,in,891872532,1723518365321568366,hagj12,https://twitter.com/hagj12/status/172493996108...
4,Wed Nov 15 23:56:01 +0000 2023,1724939341246697524,Saatnya Rakyat memilih untuk Indonesia....Prog...,0,2,9,22,in,1227499861580251136,1724939341246697524,Lembayung071,https://twitter.com/Lembayung071/status/172493...
...,...,...,...,...,...,...,...,...,...,...,...,...
91,Wed Nov 15 20:26:52 +0000 2023,1724886707932397694,Menurut Kalian Gimana Gaes! Tinggalkan Komenta...,0,0,0,0,in,1369326817643991045,1724886707932397694,evavpos,https://twitter.com/evavpos/status/17248867079...
92,Wed Nov 15 20:21:01 +0000 2023,1724885233168945375,@dimaskerik @Uki23 tetap Jokowi dihati Prabow...,0,0,0,0,in,1689679655979323392,1724486254736412774,roy99siandong,https://twitter.com/roy99siandong/status/17248...
93,Wed Nov 15 20:19:20 +0000 2023,1724884813054914584,@Projo_Pusat @prabowo @gibran_tweet @RcyberPro...,0,0,0,0,in,1860445802,1724733186205442358,Simbah_merapi,https://twitter.com/Simbah_merapi/status/17248...
94,Wed Nov 15 20:15:23 +0000 2023,1724883819374031042,@MichelAdam7__ Antum buat dan upload Video ini...,0,0,0,0,in,829083735412781057,1724609333798068252,avivink_jhi,https://twitter.com/avivink_jhi/status/1724883...


In [191]:
data = data.sample(520)
data

Unnamed: 0,created_at,id_str,full_text,quote_count,reply_count,retweet_count,favorite_count,lang,user_id_str,conversation_id_str,username,tweet_url
140,Wed Nov 22 18:43:59 +0000 2023,1727397528935743760,@DedynurPalakka @prabowo @gibran_tweet Keren ...👍,0,0,0,1,hu,1580291944324141056,1727314324778291430,lahedom99,https://twitter.com/lahedom99/status/172739752...
352,Wed Nov 22 14:47:39 +0000 2023,1727338054573252665,Oktober-3 November 2023. Elektabilitas pasanga...,0,0,0,0,in,1623634307125510150,1727337734443032817,fachriman13,https://twitter.com/fachriman13/status/1727338...
372,Wed Nov 22 14:46:22 +0000 2023,1727337734443032817,Survei: Kritik Dinasti Politik Tak Berdampak t...,0,1,0,2,in,1623634307125510150,1727337734443032817,fachriman13,https://twitter.com/fachriman13/status/1727337...
569,Wed Nov 22 14:19:09 +0000 2023,1727330881633935541,"Jiwa muda tak kenal lelah, bersama Prabowo - G...",0,0,0,0,in,1676789299784843264,1727330881633935541,SRedhina63911,https://twitter.com/SRedhina63911/status/17273...
679,Wed Nov 22 13:38:37 +0000 2023,1727320682353357205,Prabowo Gibran terus melonjak dan mendominasi ...,0,145,1,166,in,1051356727793217537,1727320682353357205,gibran_gen,https://twitter.com/gibran_gen/status/17273206...
...,...,...,...,...,...,...,...,...,...,...,...,...
430,Wed Nov 22 14:34:02 +0000 2023,1727334630641295623,"Dalam sambutannya, AHY sempat memaparkan hasil...",0,1,0,0,in,909824927695589376,1727334616187699528,golbachgoals,https://twitter.com/golbachgoals/status/172733...
475,Wed Nov 22 14:26:27 +0000 2023,1727332722283258339,"Bersama Prabowo - Gibran, jiwa muda kita menja...",0,0,0,0,in,1682274873375346688,1727332722283258339,NurmilaYay38309,https://twitter.com/NurmilaYay38309/status/172...
332,Wed Nov 22 14:52:00 +0000 2023,1727339149580271977,@Guard31Tony @Bambang01250678 @Mone_fb @ErikaM...,0,1,0,0,in,1362017926304309248,1727314541476974815,OmpungGuru,https://twitter.com/OmpungGuru/status/17273391...
642,Wed Nov 22 14:07:44 +0000 2023,1727328011920519396,"Kunted, besok kau ngga ke istana ? Besok @joko...",0,0,0,1,in,1476461823465431040,1727328011920519396,Guard31Tony,https://twitter.com/Guard31Tony/status/1727328...


## IV. Data Exploration

**4.1 Converting the `created_at` Feature to datetime Format**

In [253]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 477 entries, 0 to 476
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   created_at           477 non-null    object
 1   id_str               477 non-null    int64 
 2   full_text            477 non-null    object
 3   quote_count          477 non-null    int64 
 4   reply_count          477 non-null    int64 
 5   retweet_count        477 non-null    int64 
 6   favorite_count       477 non-null    int64 
 7   lang                 477 non-null    object
 8   user_id_str          477 non-null    int64 
 9   conversation_id_str  477 non-null    int64 
 10  username             477 non-null    object
 11  tweet_url            477 non-null    object
dtypes: int64(7), object(5)
memory usage: 44.8+ KB


In [6]:
# Converting the 'created_at' column to datetime format
data['created_at'] = pd.to_datetime(data['created_at'])

# Sorting the DataFrame based on the 'created_at' column in descending order
data.sort_values(by='created_at', ascending=False, inplace=True)

# Displaying the DataFrame with the 'created_at' column cleaned and sorted
data


Unnamed: 0,created_at,id_str,full_text,quote_count,reply_count,retweet_count,favorite_count,lang,user_id_str,conversation_id_str,username,tweet_url
0,2023-11-15 23:59:49+00:00,1724940299250827575,@xquitavee @prabowo @gibran_tweet @psi_id @jok...,0,0,0,0,in,1420228891994517507,1724591423444713876,AmirMah36541437,https://twitter.com/AmirMah36541437/status/172...
1,2023-11-15 23:59:18+00:00,1724940167767785893,@yehovarapha_ Boleh gak mutualan sama pendukun...,0,0,0,0,in,781527721,1724770699737444761,numadayana,https://twitter.com/numadayana/status/17249401...
2,2023-11-15 23:59:00+00:00,1724940091058163837,Nurul Arifin Apresiasi Keputusan KPU Atas Pene...,0,0,49,11,in,721898354502930433,1724940091058163837,golkarpedia,https://twitter.com/golkarpedia/status/1724940...
3,2023-11-15 23:58:29+00:00,1724939961080991996,@uludagerdi @hariqosatria @prabowo @gibran_twe...,0,0,0,0,in,891872532,1723518365321568366,hagj12,https://twitter.com/hagj12/status/172493996108...
4,2023-11-15 23:56:01+00:00,1724939341246697524,Saatnya Rakyat memilih untuk Indonesia....Prog...,0,2,9,22,in,1227499861580251136,1724939341246697524,Lembayung071,https://twitter.com/Lembayung071/status/172493...
...,...,...,...,...,...,...,...,...,...,...,...,...
91,2023-11-15 20:26:52+00:00,1724886707932397694,Menurut Kalian Gimana Gaes! Tinggalkan Komenta...,0,0,0,0,in,1369326817643991045,1724886707932397694,evavpos,https://twitter.com/evavpos/status/17248867079...
92,2023-11-15 20:21:01+00:00,1724885233168945375,@dimaskerik @Uki23 tetap Jokowi dihati Prabow...,0,0,0,0,in,1689679655979323392,1724486254736412774,roy99siandong,https://twitter.com/roy99siandong/status/17248...
93,2023-11-15 20:19:20+00:00,1724884813054914584,@Projo_Pusat @prabowo @gibran_tweet @RcyberPro...,0,0,0,0,in,1860445802,1724733186205442358,Simbah_merapi,https://twitter.com/Simbah_merapi/status/17248...
94,2023-11-15 20:15:23+00:00,1724883819374031042,@MichelAdam7__ Antum buat dan upload Video ini...,0,0,0,0,in,829083735412781057,1724609333798068252,avivink_jhi,https://twitter.com/avivink_jhi/status/1724883...
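
The raw `created_at` strings follow one fixed layout (for example `Wed Nov 15 23:59:49 +0000 2023`), so the format can be stated explicitly rather than inferred. The sketch below checks such a format string with the standard library only; the names `TWITTER_TIME_FORMAT` and `parse_created_at` are assumptions made for illustration, and English day and month names are assumed to be parseable in the active locale.

```python
from datetime import datetime

# Hypothetical constant: layout of the raw created_at strings shown above.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def parse_created_at(raw: str) -> datetime:
    """Parse one raw created_at string into a timezone-aware datetime."""
    return datetime.strptime(raw, TWITTER_TIME_FORMAT)

parsed = parse_created_at("Wed Nov 15 23:59:49 +0000 2023")
print(parsed.isoformat())
```

The same format string can also be passed to `pd.to_datetime` through its `format` argument, which avoids relying on automatic format inference.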


**4.2 Extracting Follower Counts and Account Verification Status from Twitter**

In [7]:
# Import libraries
import os
import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

# Initialize WebDriver
driver = webdriver.Edge()
driver.get("https://twitter.com/i/flow/login")

# Log in. Credentials are read from environment variables rather than hardcoded;
# the variable names TWITTER_USERNAME and TWITTER_PASSWORD are assumptions.
time.sleep(5)
username_field = driver.find_element(By.XPATH, "//input[@name='text']")
username_field.send_keys(os.environ["TWITTER_USERNAME"])
next_button = driver.find_element(By.XPATH, "//span[contains(text(),'Next')]")
next_button.click()

time.sleep(5)
password_field = driver.find_element(By.XPATH, "//input[@name='password']")
password_field.send_keys(os.environ["TWITTER_PASSWORD"])
log_in = driver.find_element(By.XPATH, "//span[contains(text(),'Log in')]")
log_in.click()

# Wait for the login process to complete
time.sleep(5)

# Lists to store verified status and follower counts
VerifiedStatus = []
FollowersCount = []

# Function to scrape profile information
def scrape_profile_info(username):
    driver.get(f"https://twitter.com/{username}")
    time.sleep(5)
    
    # Initialize verified_status and follower_count
    verified_status = "Not Verified"
    follower_count = "Follower Count Not Found"
    
    try:
        # Check if the account is verified
        verified_element = driver.find_element(By.XPATH, "/html/body/div[1]/div/div/div[2]/main/div/div/div/div/div/div[3]/div/div/div/div/div[2]/div[1]/div/div[1]/div/div/span/span[2]/span/span/div/div/svg")
        verified_status = "Verified"
    except NoSuchElementException:
        pass
    
    try:
        # Get the follower count
        follower_element = driver.find_element(By.XPATH, "/html/body/div[1]/div/div/div[2]/main/div/div/div/div/div/div[3]/div/div/div/div/div[5]/div[2]/a/span[1]/span")
        follower_count = follower_element.text
    except NoSuchElementException:
        pass
    
    return verified_status, follower_count

# Iterate through each username in the DataFrame
for username in data['username']:
    verified_status, follower_count = scrape_profile_info(username)
    VerifiedStatus.append(verified_status)
    FollowersCount.append(follower_count)

# Add the verified status and follower count columns to the DataFrame
data['VerifiedStatus'] = VerifiedStatus
data['FollowersCount'] = FollowersCount

# Display the updated DataFrame
data

Unnamed: 0,created_at,id_str,full_text,quote_count,reply_count,retweet_count,favorite_count,lang,user_id_str,conversation_id_str,username,tweet_url,VerifiedStatus,FollowersCount
0,2023-11-15 23:59:49+00:00,1724940299250827575,@xquitavee @prabowo @gibran_tweet @psi_id @jok...,0,0,0,0,in,1420228891994517507,1724591423444713876,AmirMah36541437,https://twitter.com/AmirMah36541437/status/172...,Not Verified,Follower Count Not Found
1,2023-11-15 23:59:18+00:00,1724940167767785893,@yehovarapha_ Boleh gak mutualan sama pendukun...,0,0,0,0,in,781527721,1724770699737444761,numadayana,https://twitter.com/numadayana/status/17249401...,Not Verified,10.5K
2,2023-11-15 23:59:00+00:00,1724940091058163837,Nurul Arifin Apresiasi Keputusan KPU Atas Pene...,0,0,49,11,in,721898354502930433,1724940091058163837,golkarpedia,https://twitter.com/golkarpedia/status/1724940...,Not Verified,2306
3,2023-11-15 23:58:29+00:00,1724939961080991996,@uludagerdi @hariqosatria @prabowo @gibran_twe...,0,0,0,0,in,891872532,1723518365321568366,hagj12,https://twitter.com/hagj12/status/172493996108...,Not Verified,Follower Count Not Found
4,2023-11-15 23:56:01+00:00,1724939341246697524,Saatnya Rakyat memilih untuk Indonesia....Prog...,0,2,9,22,in,1227499861580251136,1724939341246697524,Lembayung071,https://twitter.com/Lembayung071/status/172493...,Not Verified,12.2K
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,2023-11-15 20:26:52+00:00,1724886707932397694,Menurut Kalian Gimana Gaes! Tinggalkan Komenta...,0,0,0,0,in,1369326817643991045,1724886707932397694,evavpos,https://twitter.com/evavpos/status/17248867079...,Not Verified,30
92,2023-11-15 20:21:01+00:00,1724885233168945375,@dimaskerik @Uki23 tetap Jokowi dihati Prabow...,0,0,0,0,in,1689679655979323392,1724486254736412774,roy99siandong,https://twitter.com/roy99siandong/status/17248...,Not Verified,Follower Count Not Found
93,2023-11-15 20:19:20+00:00,1724884813054914584,@Projo_Pusat @prabowo @gibran_tweet @RcyberPro...,0,0,0,0,in,1860445802,1724733186205442358,Simbah_merapi,https://twitter.com/Simbah_merapi/status/17248...,Not Verified,72
94,2023-11-15 20:15:23+00:00,1724883819374031042,@MichelAdam7__ Antum buat dan upload Video ini...,0,0,0,0,in,829083735412781057,1724609333798068252,avivink_jhi,https://twitter.com/avivink_jhi/status/1724883...,Not Verified,15
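
The loop above opens one profile page per row, even when the same username appears in several rows. One way to reduce page loads is to cache results per username. The following is a minimal sketch of that idea, not the notebook's method: `scrape_unique` and `fake_fetch` are hypothetical names, and the stand-in fetcher returns made-up values so the example runs without a browser.

```python
from typing import Callable, Dict, Iterable, List, Tuple

ProfileInfo = Tuple[str, str]  # (verified_status, follower_count)

def scrape_unique(usernames: Iterable[str],
                  fetch: Callable[[str], ProfileInfo]) -> List[ProfileInfo]:
    """Call fetch once per distinct username; return one result per input row."""
    cache: Dict[str, ProfileInfo] = {}
    results: List[ProfileInfo] = []
    for name in usernames:
        if name not in cache:
            cache[name] = fetch(name)  # in the notebook this could be scrape_profile_info
        results.append(cache[name])
    return results

# Stand-in fetcher with hypothetical return values, used only to exercise the cache.
calls: List[str] = []

def fake_fetch(name: str) -> ProfileInfo:
    calls.append(name)
    return ("Not Verified", "0")

rows = scrape_unique(["fachriman13", "gibran_gen", "fachriman13"], fake_fetch)
print(len(rows), len(calls))
```

Because the result list keeps row order, it can be split into the two columns in the same way as the original lists.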


**4.3 Predicting Sentiment Using indonesian-roberta-base-sentiment-classifier**

In [12]:
import warnings
warnings.filterwarnings("ignore")

In [18]:
pip install transformers

Note: you may need to restart the kernel to use updated packages.


In [9]:
from transformers import pipeline
import pandas as pd




In [10]:
pretrained_name = "w11wo/indonesian-roberta-base-sentiment-classifier"

nlp = pipeline(
    "sentiment-analysis",
    model=pretrained_name,
    tokenizer=pretrained_name
)




Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFRobertaForSequenceClassification: ['roberta.embeddings.position_ids']
- This IS expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.


In [11]:
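def sentiment_result(text):
    # Classify the text, then map the model's English label
    # to the Indonesian label used in the dataset
    result = nlp(text)
    if result[0]['label'] == 'positive':
        return 'positif'
    elif result[0]['label'] == 'negative':
        return 'negatif'
    else:
        return 'netral'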

In [12]:
test = nlp('saya sedih sekali hari ini')

In [13]:
test[0]['label']

'negative'

In [14]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96 entries, 0 to 95
Data columns (total 14 columns):
 #   Column               Non-Null Count  Dtype              
---  ------               --------------  -----              
 0   created_at           96 non-null     datetime64[ns, UTC]
 1   id_str               96 non-null     int64              
 2   full_text            96 non-null     object             
 3   quote_count          96 non-null     int64              
 4   reply_count          96 non-null     int64              
 5   retweet_count        96 non-null     int64              
 6   favorite_count       96 non-null     int64              
 7   lang                 96 non-null     object             
 8   user_id_str          96 non-null     int64              
 9   conversation_id_str  96 non-null     int64              
 10  username             96 non-null     object             
 11  tweet_url            96 non-null     object             
 12  VerifiedStatus       96 

In [15]:
data['sentiment_label']= data['full_text'].apply(sentiment_result)

In [16]:
data

Unnamed: 0,created_at,id_str,full_text,quote_count,reply_count,retweet_count,favorite_count,lang,user_id_str,conversation_id_str,username,tweet_url,VerifiedStatus,FollowersCount,sentiment_label
0,2023-11-15 23:59:49+00:00,1724940299250827575,@xquitavee @prabowo @gibran_tweet @psi_id @jok...,0,0,0,0,in,1420228891994517507,1724591423444713876,AmirMah36541437,https://twitter.com/AmirMah36541437/status/172...,Not Verified,Follower Count Not Found,netral
1,2023-11-15 23:59:18+00:00,1724940167767785893,@yehovarapha_ Boleh gak mutualan sama pendukun...,0,0,0,0,in,781527721,1724770699737444761,numadayana,https://twitter.com/numadayana/status/17249401...,Not Verified,10.5K,negatif
2,2023-11-15 23:59:00+00:00,1724940091058163837,Nurul Arifin Apresiasi Keputusan KPU Atas Pene...,0,0,49,11,in,721898354502930433,1724940091058163837,golkarpedia,https://twitter.com/golkarpedia/status/1724940...,Not Verified,2306,netral
3,2023-11-15 23:58:29+00:00,1724939961080991996,@uludagerdi @hariqosatria @prabowo @gibran_twe...,0,0,0,0,in,891872532,1723518365321568366,hagj12,https://twitter.com/hagj12/status/172493996108...,Not Verified,Follower Count Not Found,netral
4,2023-11-15 23:56:01+00:00,1724939341246697524,Saatnya Rakyat memilih untuk Indonesia....Prog...,0,2,9,22,in,1227499861580251136,1724939341246697524,Lembayung071,https://twitter.com/Lembayung071/status/172493...,Not Verified,12.2K,positif
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,2023-11-15 20:26:52+00:00,1724886707932397694,Menurut Kalian Gimana Gaes! Tinggalkan Komenta...,0,0,0,0,in,1369326817643991045,1724886707932397694,evavpos,https://twitter.com/evavpos/status/17248867079...,Not Verified,30,netral
92,2023-11-15 20:21:01+00:00,1724885233168945375,@dimaskerik @Uki23 tetap Jokowi dihati Prabow...,0,0,0,0,in,1689679655979323392,1724486254736412774,roy99siandong,https://twitter.com/roy99siandong/status/17248...,Not Verified,Follower Count Not Found,positif
93,2023-11-15 20:19:20+00:00,1724884813054914584,@Projo_Pusat @prabowo @gibran_tweet @RcyberPro...,0,0,0,0,in,1860445802,1724733186205442358,Simbah_merapi,https://twitter.com/Simbah_merapi/status/17248...,Not Verified,72,negatif
94,2023-11-15 20:15:23+00:00,1724883819374031042,@MichelAdam7__ Antum buat dan upload Video ini...,0,0,0,0,in,829083735412781057,1724609333798068252,avivink_jhi,https://twitter.com/avivink_jhi/status/1724883...,Not Verified,15,netral


In [17]:
data['sentiment_label'].value_counts()

sentiment_label
netral     40
negatif    39
positif    17
Name: count, dtype: int64
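
The counts above are easier to compare as shares of the total. A small worked example, using only the three counts printed above (the variable names are illustrative):

```python
# Label counts copied from the value_counts() output above.
counts = {"netral": 40, "negatif": 39, "positif": 17}

total = sum(counts.values())
# Percentage share of each label, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)
print(shares)
```

In pandas, `value_counts(normalize=True)` returns the same proportions as fractions rather than percentages.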

In [18]:
# Function to convert a sentiment label into a numeric value
def convert_sentiment_to_score(sentiment_label):
    if sentiment_label == 'positif':
        return 1
    elif sentiment_label == 'netral':
        return 0
    elif sentiment_label == 'negatif':
        return -1
    else:
        return None  # Handle any other sentiment values

# Add a new 'sentiment_score' column with values based on the sentiment label
data['sentiment_score'] = data['sentiment_label'].apply(convert_sentiment_to_score)

# Display the DataFrame with the new 'sentiment_score' column
data

Unnamed: 0,created_at,id_str,full_text,quote_count,reply_count,retweet_count,favorite_count,lang,user_id_str,conversation_id_str,username,tweet_url,VerifiedStatus,FollowersCount,sentiment_label,sentiment_score
0,2023-11-15 23:59:49+00:00,1724940299250827575,@xquitavee @prabowo @gibran_tweet @psi_id @jok...,0,0,0,0,in,1420228891994517507,1724591423444713876,AmirMah36541437,https://twitter.com/AmirMah36541437/status/172...,Not Verified,Follower Count Not Found,netral,0
1,2023-11-15 23:59:18+00:00,1724940167767785893,@yehovarapha_ Boleh gak mutualan sama pendukun...,0,0,0,0,in,781527721,1724770699737444761,numadayana,https://twitter.com/numadayana/status/17249401...,Not Verified,10.5K,negatif,-1
2,2023-11-15 23:59:00+00:00,1724940091058163837,Nurul Arifin Apresiasi Keputusan KPU Atas Pene...,0,0,49,11,in,721898354502930433,1724940091058163837,golkarpedia,https://twitter.com/golkarpedia/status/1724940...,Not Verified,2306,netral,0
3,2023-11-15 23:58:29+00:00,1724939961080991996,@uludagerdi @hariqosatria @prabowo @gibran_twe...,0,0,0,0,in,891872532,1723518365321568366,hagj12,https://twitter.com/hagj12/status/172493996108...,Not Verified,Follower Count Not Found,netral,0
4,2023-11-15 23:56:01+00:00,1724939341246697524,Saatnya Rakyat memilih untuk Indonesia....Prog...,0,2,9,22,in,1227499861580251136,1724939341246697524,Lembayung071,https://twitter.com/Lembayung071/status/172493...,Not Verified,12.2K,positif,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,2023-11-15 20:26:52+00:00,1724886707932397694,Menurut Kalian Gimana Gaes! Tinggalkan Komenta...,0,0,0,0,in,1369326817643991045,1724886707932397694,evavpos,https://twitter.com/evavpos/status/17248867079...,Not Verified,30,netral,0
92,2023-11-15 20:21:01+00:00,1724885233168945375,@dimaskerik @Uki23 tetap Jokowi dihati Prabow...,0,0,0,0,in,1689679655979323392,1724486254736412774,roy99siandong,https://twitter.com/roy99siandong/status/17248...,Not Verified,Follower Count Not Found,positif,1
93,2023-11-15 20:19:20+00:00,1724884813054914584,@Projo_Pusat @prabowo @gibran_tweet @RcyberPro...,0,0,0,0,in,1860445802,1724733186205442358,Simbah_merapi,https://twitter.com/Simbah_merapi/status/17248...,Not Verified,72,negatif,-1
94,2023-11-15 20:15:23+00:00,1724883819374031042,@MichelAdam7__ Antum buat dan upload Video ini...,0,0,0,0,in,829083735412781057,1724609333798068252,avivink_jhi,https://twitter.com/avivink_jhi/status/1724883...,Not Verified,15,netral,0
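
With labels mapped to +1, 0, and -1, the mean score gives a single net-sentiment figure for the sample. The sketch below works this out from the label counts reported earlier (40 netral, 39 negatif, 17 positif); `SCORE_BY_LABEL` is an illustrative name, and the dictionary is simply an equivalent way to express the mapping in `convert_sentiment_to_score`.

```python
# Same label-to-score mapping as convert_sentiment_to_score, written as a lookup table.
SCORE_BY_LABEL = {"positif": 1, "netral": 0, "negatif": -1}

# Label counts copied from the value_counts() output earlier in this section.
label_counts = {"netral": 40, "negatif": 39, "positif": 17}

total = sum(label_counts.values())
net = sum(SCORE_BY_LABEL[label] * n for label, n in label_counts.items())
mean_score = net / total  # ranges from -1 (all negative) to +1 (all positive)

print(net, round(mean_score, 3))
```

A negative mean indicates that negative tweets outnumber positive ones in this sample; neutral tweets pull the value toward zero.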


In [19]:
# Find and extract hashtags from the full_text column
data['hashtag'] = data['full_text'].str.findall(r'#\w+')

In [20]:
data

Unnamed: 0,created_at,id_str,full_text,quote_count,reply_count,retweet_count,favorite_count,lang,user_id_str,conversation_id_str,username,tweet_url,VerifiedStatus,FollowersCount,sentiment_label,sentiment_score,hashtag
0,2023-11-15 23:59:49+00:00,1724940299250827575,@xquitavee @prabowo @gibran_tweet @psi_id @jok...,0,0,0,0,in,1420228891994517507,1724591423444713876,AmirMah36541437,https://twitter.com/AmirMah36541437/status/172...,Not Verified,Follower Count Not Found,netral,0,[]
1,2023-11-15 23:59:18+00:00,1724940167767785893,@yehovarapha_ Boleh gak mutualan sama pendukun...,0,0,0,0,in,781527721,1724770699737444761,numadayana,https://twitter.com/numadayana/status/17249401...,Not Verified,10.5K,negatif,-1,[]
2,2023-11-15 23:59:00+00:00,1724940091058163837,Nurul Arifin Apresiasi Keputusan KPU Atas Pene...,0,0,49,11,in,721898354502930433,1724940091058163837,golkarpedia,https://twitter.com/golkarpedia/status/1724940...,Not Verified,2306,netral,0,"[#airlanggahartarto, #kuningkeren, #PrabowoGib..."
3,2023-11-15 23:58:29+00:00,1724939961080991996,@uludagerdi @hariqosatria @prabowo @gibran_twe...,0,0,0,0,in,891872532,1723518365321568366,hagj12,https://twitter.com/hagj12/status/172493996108...,Not Verified,Follower Count Not Found,netral,0,[]
4,2023-11-15 23:56:01+00:00,1724939341246697524,Saatnya Rakyat memilih untuk Indonesia....Prog...,0,2,9,22,in,1227499861580251136,1724939341246697524,Lembayung071,https://twitter.com/Lembayung071/status/172493...,Not Verified,12.2K,positif,1,[#PrabowoGibran2024]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,2023-11-15 20:26:52+00:00,1724886707932397694,Menurut Kalian Gimana Gaes! Tinggalkan Komenta...,0,0,0,0,in,1369326817643991045,1724886707932397694,evavpos,https://twitter.com/evavpos/status/17248867079...,Not Verified,30,netral,0,[]
92,2023-11-15 20:21:01+00:00,1724885233168945375,@dimaskerik @Uki23 tetap Jokowi dihati Prabow...,0,0,0,0,in,1689679655979323392,1724486254736412774,roy99siandong,https://twitter.com/roy99siandong/status/17248...,Not Verified,Follower Count Not Found,positif,1,[]
93,2023-11-15 20:19:20+00:00,1724884813054914584,@Projo_Pusat @prabowo @gibran_tweet @RcyberPro...,0,0,0,0,in,1860445802,1724733186205442358,Simbah_merapi,https://twitter.com/Simbah_merapi/status/17248...,Not Verified,72,negatif,-1,[]
94,2023-11-15 20:15:23+00:00,1724883819374031042,@MichelAdam7__ Antum buat dan upload Video ini...,0,0,0,0,in,829083735412781057,1724609333798068252,avivink_jhi,https://twitter.com/avivink_jhi/status/1724883...,Not Verified,15,netral,0,[#AM1NkanIndonesia]
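
Once hashtags are extracted per tweet, a natural next step is to count how often each one occurs. A minimal sketch using the same `#\w+` pattern follows; the sample texts are made up for illustration, `count_hashtags` is a hypothetical helper, and lowercasing before counting is an assumption so that differently capitalized spellings are treated as one tag.

```python
import re
from collections import Counter
from typing import Iterable

HASHTAG_PATTERN = re.compile(r"#\w+")  # same pattern as in the cell above

def count_hashtags(texts: Iterable[str]) -> Counter:
    """Count hashtags across all texts, ignoring capitalization."""
    counter: Counter = Counter()
    for text in texts:
        counter.update(tag.lower() for tag in HASHTAG_PATTERN.findall(text))
    return counter

# Made-up sample texts, used only to show the behavior.
sample_texts = [
    "Saatnya memilih #PrabowoGibran2024",
    "Dukung terus #prabowogibran2024 #kuningkeren",
    "Tanpa hashtag",
]
top = count_hashtags(sample_texts).most_common(2)
print(top)
```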


In [282]:
data['FollowersCount'].value_counts()

FollowersCount
Follower Count Not Found    36
2,306                        3
0                            2
9                            2
92                           2
41                           2
64                           2
8                            1
3,868                        1
5,110                        1
2,903                        1
2,412                        1
5                            1
788                          1
495                          1
24                           1
194                          1
2                            1
26                           1
108                          1
35                           1
1,115                        1
256                          1
4.7M                         1
1,678                        1
30                           1
72                           1
28                           1
742                          1
688                          1
577                          1
12.2K                   
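
The scraped `FollowersCount` values mix several text formats: plain digits, thousands separators (`2,306`), abbreviated values (`12.2K`, `4.7M`), and the placeholder `Follower Count Not Found`. Before any numeric analysis they need to be normalized. The helper below is a sketch under the assumption that `K` means thousands and `M` means millions; `parse_follower_count` is a hypothetical name, and abbreviated values can only be recovered approximately because the page already rounds them.

```python
from typing import Optional

# Assumed meaning of the abbreviations shown on profile pages.
_SUFFIX_MULTIPLIER = {"K": 1_000, "M": 1_000_000}

def parse_follower_count(raw: str) -> Optional[int]:
    """Convert a scraped follower string to an int, or None if it is not numeric."""
    text = raw.strip().replace(",", "")
    if not text:
        return None
    multiplier = _SUFFIX_MULTIPLIER.get(text[-1].upper(), 1)
    if multiplier != 1:
        text = text[:-1]  # drop the K/M suffix before converting
    try:
        return int(round(float(text) * multiplier))
    except (ValueError, OverflowError):
        return None  # e.g. "Follower Count Not Found"

for value in ["2,306", "12.2K", "4.7M", "0", "Follower Count Not Found"]:
    print(value, "->", parse_follower_count(value))
```

Returning `None` for the placeholder keeps missing values distinguishable from accounts that really have zero followers.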

In [283]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96 entries, 0 to 95
Data columns (total 17 columns):
 #   Column               Non-Null Count  Dtype              
---  ------               --------------  -----              
 0   created_at           96 non-null     datetime64[ns, UTC]
 1   id_str               96 non-null     int64              
 2   full_text            96 non-null     object             
 3   quote_count          96 non-null     int64              
 4   reply_count          96 non-null     int64              
 5   retweet_count        96 non-null     int64              
 6   favorite_count       96 non-null     int64              
 7   lang                 96 non-null     object             
 8   user_id_str          96 non-null     int64              
 9   conversation_id_str  96 non-null     int64              
 10  username             96 non-null     object             
 11  tweet_url            96 non-null     object             
 12  VerifiedStatus       96 

**Export Dataset**

In [272]:
# Export dataset into csv
data.to_csv('cleaned_15112023.csv', index=False)

## V. Conclusions

1. The sentiment prediction ran successfully and is ready to be applied to the case.
2. Advantages of scraping with tweet harvest include:
    - A faster and more efficient scraping process.
    - The ability to gather a larger volume of data in a shorter timeframe.
3. Disadvantages include:
    - It cannot retrieve account verification information, which could indicate user influence or a private user status.
    - It cannot retrieve follower counts. Getting them otherwise requires registering for Twitter Developer access, and obtaining consumer keys and authorization may take up to two weeks.