![Pictures/Twitter-Sentiment-Analysis.png](Pictures/Twitter-Sentiment-Analysis.png)

# <span style='color:dodgerBlue; font-weight:bold;'> Twitter Sentiment Analysis </span>

### <span style='color:aqua; font-weight:bold;'> Project Context: </span>

In this project, we aim to explore the sentiments and opinions of people in Gaza, as expressed on social media, specifically Twitter. The focus is to gather real-time tweets, process and clean the data, and perform sentiment analysis. The insights gained can help better understand the prevailing mood, attitudes, and general discourse around specific topics related to Gaza, including social, political, and humanitarian issues.

<b>The project involves the use of the following technologies:</b>

* **Twitter Scraping**: Selenium is used to scrape real-time tweets, targeting specific hashtags and topics, due to limitations with paid APIs.
* **Data Cleaning**: Python is used to clean the scraped data by removing URLs, mentions, hashtags, and other unwanted symbols.
* **Sentiment Analysis and LLM Integration**: Groq’s LLM, particularly the llama3-70b-8192 model, is integrated to conduct sentiment analysis and provide real-time question-answering capabilities on the collected tweets, enabling a deeper understanding of the data.

The goal is to provide meaningful insights into the feelings and reactions of individuals in Gaza based on public social media posts, which could support researchers, humanitarian organizations, and policymakers in understanding the sentiment landscape.

<b>Key Components:</b>

1. **Data Collection**: Tweets are scraped using Selenium, focusing on specific hashtags and topics.
2. **Data Cleaning**: Removing unwanted symbols and URLs and keeping only English text, so the data is clean and usable.
3. **Sentiment Analysis & LLM Integration**: Using Groq’s LLM to perform sentiment analysis and answer questions about the data simultaneously.
4. **Insights**: The resulting sentiment analysis provides critical insights into public opinions, which could support researchers and humanitarian organizations.


### <span style='color:aqua; font-weight:bold;'> Import Packages </span>

In [18]:
# Essentials
import numpy as np
import pandas as pd
import time

# Web Scraping
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Text Preprocessing
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# LLM
from groq import Groq

In [19]:
#nltk.download('stopwords')
#nltk.download('punkt')

# <span style='color:dodgerblue; font-weight:bold;'>Web Scraping</span>

<b>Why Selenium?</b>

Selenium is a powerful tool for web scraping dynamic content, especially when data is loaded through JavaScript and cannot be accessed through static HTML alone. In my project, Twitter’s dynamic page content requires interaction with a live webpage, making Selenium ideal for automating the browser to extract tweets directly. It allows for flexible handling of infinite scroll and JavaScript-rendered content, which other scraping tools might miss.

<b>Note:</b> Make sure your Google Chrome and ChromeDriver versions match to avoid errors.

### <span style='color:aqua; font-weight:bold;'> Set up WebDriver </span>

In [20]:
# Read credentials from a text file
with open('credentials.txt', 'r') as file:
    credentials = file.readlines()
    username = credentials[0].strip()
    password = credentials[1].strip()

In [21]:
# Specify the path to your ChromeDriver
driver_path = 'C:\\Users\\ppc\\Desktop\\chromedriver-win64\\chromedriver.exe'  # path to chromedriver
login_url = 'https://x.com/login'
# Choose a search term (English tweets only)
search_term = 'gaza'
encoded_search_term = search_term.replace(' ', '%20').replace('#', '%23')
search_url = f'https://x.com/search?f=top&q={encoded_search_term}%20({encoded_search_term})%20lang%3Aen&src=typed_query'

# Set up the WebDriver
service = Service(driver_path)
driver = webdriver.Chrome(service=service)

# Open the Twitter login page
driver.get(login_url)
time.sleep(5)  # Give it time to load

# Locate the username and password input fields
username_input = driver.find_element(By.NAME, "text")
username_input.send_keys(username)
username_input.send_keys(Keys.RETURN)  # Press Enter after typing the username

time.sleep(5)  # Wait for password page to load

# Locate the password input field
password_input = driver.find_element(By.NAME, "password")
password_input.send_keys(password)
password_input.send_keys(Keys.RETURN)  # Press Enter after typing the password

# Wait for login to complete
time.sleep(5)

# Navigate to the search page
driver.get(search_url)
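
The search URL above is assembled with two manual `replace` calls, which cover spaces and `#` only. A minimal sketch of the same idea using the standard library's `urllib.parse.quote` follows. The helper name `build_search_url` is hypothetical, and the query is simplified to one occurrence of the term plus the `lang:` filter, which is an assumption rather than the notebook's exact query.

```python
from urllib.parse import quote


def build_search_url(term: str, lang: str = "en") -> str:
    """Build a search URL with the whole query percent-encoded.

    Hypothetical helper: the `f=top` and `src=typed_query` parameters are
    copied from the notebook; the simplified query shape is an assumption.
    """
    query = f"{term} lang:{lang}"
    # safe="" encodes every reserved character, including '#', ':' and '/'
    return f"https://x.com/search?f=top&q={quote(query, safe='')}&src=typed_query"


print(build_search_url("#gaza ceasefire"))
# https://x.com/search?f=top&q=%23gaza%20ceasefire%20lang%3Aen&src=typed_query
```

Encoding the full query in one call also handles characters such as `&` or `?` in a search term, which the manual replacements would pass through unescaped.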

### <span style='color:aqua; font-weight:bold;'> Scraping tweets </span>

In [22]:
# Give the page time to load
time.sleep(20)

# Define a function to scrape tweets
def get_tweets():
    tweets = driver.find_elements(By.CSS_SELECTOR, 'div[data-testid="tweetText"]')
    return [tweet.text for tweet in tweets]

# Limit for the number of tweets to collect
tweet_limit = 25
tweets = []

# Keep track of attempts when no new tweets are loaded
max_stagnation_attempts = 5
stagnation_count = 0

In [23]:
# Infinite scrolling and scraping loop
while len(tweets) < tweet_limit and stagnation_count < max_stagnation_attempts:
    # Scroll to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    
    # Wait for new tweets to load
    time.sleep(3)  # You can increase this if the internet connection is slow

    # Get the new set of tweets after scrolling
    new_tweets = get_tweets()
    
    # Append only unique tweets to avoid duplicates
    initial_tweet_count = len(tweets)
    
    for tweet in new_tweets:
        if tweet not in tweets:
            tweets.append(tweet)
        if len(tweets) >= tweet_limit:
            break  # Exit the loop if we reach the tweet limit

    # Check if new tweets were added
    if len(tweets) > initial_tweet_count:
        stagnation_count = 0  # Reset stagnation count if new tweets are found
    else:
        stagnation_count += 1  # Increment stagnation count if no new tweets are loaded
    
    print(f"Collected {len(tweets)} tweets so far...")

# Print all gathered tweets
for tweet in tweets:
    print(tweet)

# Close the browser when done
driver.quit()

Collected 5 tweets so far...
Collected 5 tweets so far...
Collected 5 tweets so far...
Collected 5 tweets so far...
Collected 5 tweets so far...
Collected 8 tweets so far...
Collected 15 tweets so far...
Collected 20 tweets so far...
Collected 20 tweets so far...
Collected 20 tweets so far...
Collected 20 tweets so far...
Collected 20 tweets so far...
Collected 20 tweets so far...
#BREAKING An Israeli military helicopter crashed in Rafah, Gaza last night; reportedly resulting in the deaths of three Israeli personnel and injuries to eight others.
22-year-old Amit Levi was murdered by Hamas at the Nova Music Festival on October 7th.

Her twin sister Shani, who was by her side in her last moments, survived the attack.

Just a reminder in case people start forgetting why the IDF is in the Gaza Strip
Will followers of Allah appreciate this dance for Gaza?
They were peacefully sleeping in their tents, now their bodies disappeared into the sand. 

lsraeli warplanes target a tent camp to the w
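
The log above stops at 20 tweets, short of the limit of 25, because five consecutive scrolls produced no new tweets. The sketch below isolates that stopping rule so it can be tried without a browser. The names `collect_unique` and `fake_fetch` are hypothetical, and the fake page data stands in for what `get_tweets()` would return after each scroll.

```python
def collect_unique(fetch_visible, limit, max_stagnation=5):
    """Collect unique items until `limit` is reached or no new items
    appear for `max_stagnation` consecutive fetches."""
    collected = []
    seen = set()  # set lookup avoids rescanning the list for every item
    stagnation = 0
    while len(collected) < limit and stagnation < max_stagnation:
        before = len(collected)
        for text in fetch_visible():
            if text not in seen:
                seen.add(text)
                collected.append(text)
            if len(collected) >= limit:
                break
        # Reset the counter on progress, otherwise count one stalled pass
        stagnation = 0 if len(collected) > before else stagnation + 1
    return collected


# Stand-in for successive scrolls: the page stops yielding new items after "c"
pages = iter([["a", "b"], ["b", "c"]])


def fake_fetch():
    return next(pages, ["c"])


print(collect_unique(fake_fetch, limit=10, max_stagnation=2))
# ['a', 'b', 'c']
```

In the notebook the same rule runs with `driver.execute_script(...)` and `time.sleep(3)` before each fetch; those are left out here so the logic can be run on its own.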

# <span style='color:dodgerblue; font-weight:bold;'>Text Cleaning</span>

Process the tweets to remove unnecessary elements like URLs, mentions, hashtags, and other noise.

In [24]:
def clean_tweet(tweet):
    """
    Clean a tweet by removing URLs, mentions, hashtags, special characters, numbers,
    and punctuation. Converts text to lowercase, removes stop words, and applies stemming.
    """
    # Initialize stopwords and stemmer
    stop_words = set(stopwords.words('english'))
    stemmer = PorterStemmer()
    
    # Remove URLs
    tweet = re.sub(r'http\S+', '', tweet)
    # Remove mentions and hashtags
    tweet = re.sub(r'@\w+|#\w+', '', tweet)
    # Remove special characters, numbers, and punctuation
    tweet = re.sub(r'[^A-Za-z\s]+', '', tweet)
    # Convert to lowercase
    tweet = tweet.lower()
    # Remove stop words
    tweet_tokens = tweet.split()
    tweet = ' '.join([word for word in tweet_tokens if word not in stop_words])
    # Apply stemming
    tweet = ' '.join([stemmer.stem(word) for word in tweet.split()])
    # Remove extra spaces
    tweet = re.sub(r'\s+', ' ', tweet).strip()
    
    return tweet

Arabic Version of **clean_tweet** Function

**Note:** I'm not going to use this function in my case, because I scraped English tweets only.

In [25]:
"""
from nltk.stem.isri import ISRIStemmer

def clean_arabic_tweet(tweet):
    '''
    Clean an Arabic tweet by removing URLs, mentions, hashtags, non-Arabic characters,
    numbers, and punctuation. Removes Arabic stop words and applies stemming.
    '''
    # Initialize Arabic stopwords and stemmer
    arabic_stop_words = set(stopwords.words('arabic'))
    stemmer = ISRIStemmer()
    
    # Remove URLs
    tweet = re.sub(r'http\S+', '', tweet)
    # Remove mentions and hashtags
    tweet = re.sub(r'@\w+|#\w+', '', tweet)
    # Remove non-Arabic characters, numbers, and punctuation
    tweet = re.sub(r'[^\u0600-\u06FF\s]+', '', tweet)  # Keeps only characters in the Arabic Unicode block and whitespace
    # Remove extra spaces
    tweet = re.sub(r'\s+', ' ', tweet).strip()
    
    # Split the tweet into tokens
    tweet_tokens = tweet.split()
    
    # Remove Arabic stop words
    tweet = ' '.join([word for word in tweet_tokens if word not in arabic_stop_words])
    
    # Apply stemming
    tweet = ' '.join([stemmer.stem(word) for word in tweet.split()])
    
    return tweet
""";

In [26]:
# Clean your collected tweets
cleaned_tweets = [clean_tweet(tweet) for tweet in tweets]

# Print cleaned tweets
for tweet in cleaned_tweets:
    print(tweet)

isra militari helicopt crash rafah gaza last night reportedli result death three isra personnel injuri eight other
yearold amit levi murder hama nova music festiv octob th twin sister shani side last moment surviv attack remind case peopl start forget idf gaza strip
follow allah appreci danc gaza
peac sleep tent bodi disappear sand lsraeli warplan target tent camp west khan yuni five missil kill tear apart displac palestinian tent along
could parent sibl boyfriend girlfriend best friend isra held hostag gaza dont look away
america actresssing zendaya use massiv follow help rais money children gaza palestin children relief fund rais stagger humanitarian aid victim gaza
celebr birthday perfect place orphan camp perfect peopleel elna elak team team work daili tirelessli without hesit serv peopl gaza strip condit call action make sure visit link bio
children gaza vs children israel see differ
turkey expos turkey threaten invad israel stop attack gaza howev realiti turkey send oil israel mo
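
To see what the regex stages of `clean_tweet` do on their own, the sketch below applies only those stages, without NLTK stop words or stemming. The function name `strip_tweet_noise` and the sample tweet are made up for illustration.

```python
import re


def strip_tweet_noise(text: str) -> str:
    """Apply only the regex stages of the cleaning step
    (no stop-word removal, no stemming)."""
    text = re.sub(r'http\S+', '', text)        # URLs
    text = re.sub(r'@\w+|#\w+', '', text)      # mentions and hashtags, including the tag word
    text = re.sub(r'[^A-Za-z\s]+', '', text)   # digits, punctuation, apostrophes, emoji
    text = text.lower()
    return re.sub(r'\s+', ' ', text).strip()   # collapse leftover whitespace


sample = "#BREAKING @user Don't look away! 3 killed https://t.co/abc123"
print(strip_tweet_noise(sample))
# dont look away killed
```

Two effects are worth noting. The hashtag pattern drops the tag word itself, not just the `#`. Apostrophes are deleted before stop-word removal, so "don't" becomes "dont", which is probably why "dont" survives in the cleaned output above.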

### <span style='color:aqua; font-weight:bold;'> Save the DataFrame to a CSV file </span>

In [27]:
# Save cleaned tweets into a DataFrame
df = pd.DataFrame(cleaned_tweets, columns=["tweet"])

# Save the DataFrame to a CSV file
df.to_csv('cleaned_tweets.csv', index=False)

# <span style='color:dodgerblue; font-weight:bold;'>Sentiment Analysis</span>

We used the **GroqCloud Llama-3 LLM (Large Language Model)** to perform sentiment analysis on the tweets we scraped and cleaned in the previous steps. By leveraging the LLM, we aimed to gain deeper insights into how people feel about the situation in Gaza based on their tweets.

We passed the cleaned tweets to the model and asked it to analyze the emotional tone and sentiment of the text. The model then provided a breakdown of various emotions (e.g., fear, sadness, anger, hope) and categorized the tweets into **negative**, **neutral**, and **positive** sentiments.

This method helped us extract valuable insights into how individuals are reacting emotionally to the current situation, with the model summarizing overall sentiments and key themes.

**Note:** You can get your API Key from [Groq Website](https://groq.com/).
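
The notebook asks the model an open question and reads a free-text answer. If per-tweet counts of **negative**, **neutral**, and **positive** are wanted, one option is to request a machine-readable reply and tally it. The sketch below is an illustration under assumptions, not the method used here: the function names, the prompt wording, and the JSON reply format are all hypothetical, and the reply string is a hand-written stand-in rather than real model output.

```python
import json
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def build_labeling_prompt(tweets):
    """Number the tweets and ask for one label per tweet as a JSON list."""
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(tweets, start=1))
    return (
        "Classify each tweet below as negative, neutral, or positive.\n"
        "Reply with a JSON list of labels only, one per tweet, in order.\n\n"
        f"{numbered}"
    )


def tally_labels(raw_reply, expected_count):
    """Parse the reply and count labels, rejecting malformed replies."""
    labels = json.loads(raw_reply)
    if len(labels) != expected_count:
        raise ValueError(f"expected {expected_count} labels, got {len(labels)}")
    unknown = [label for label in labels if label not in LABELS]
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    counts = Counter(labels)
    return {label: counts.get(label, 0) for label in LABELS}


placeholder_tweets = ["first cleaned tweet", "second cleaned tweet", "third cleaned tweet"]
prompt = build_labeling_prompt(placeholder_tweets)

# Hand-written stand-in for a model reply (assumed format, not real output)
hypothetical_reply = '["negative", "neutral", "negative"]'
print(tally_labels(hypothetical_reply, len(placeholder_tweets)))
# {'negative': 2, 'neutral': 0 + 1, 'positive': 0} -> {'negative': 2, 'neutral': 1, 'positive': 0}
```

A model may not always follow the requested format, so the validation in `tally_labels` fails loudly instead of silently miscounting.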

In [28]:
# Initialize the client with an API key read from the environment
# (set GROQ_API_KEY beforehand; avoid hard-coding secrets in the notebook)
import os
client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Combine cleaned tweets into a single string
tweets_text = "\n".join(cleaned_tweets)

# Construct a prompt for the LLM
prompt = f"""Here are some recent tweets about Gaza:
{tweets_text}

Based on the tweets, how do the people of Gaza feel right now? Please perform a sentiment analysis and provide insights."""

# Send the prompt to the LLM for sentiment analysis
completion = client.chat.completions.create(
    model="llama3-70b-8192",  # I used the llama3-70b model; other models are available as well.
    messages=[
        {
            "role": "user",
            "content": prompt
        }
    ],
    temperature=1,
    max_tokens=1024,
    top_p=1,
    stream=True,
    stop=None,
)

# Print the output from the LLM
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")

After analyzing the tweets, I can provide some insights into the sentiment and feelings of the people of Gaza:

**Overwhelming sense of fear and concern**: Many tweets express fear, concern, and empathy for the people of Gaza, highlighting the devastating effects of war, displacement, and humanitarian crises.

**Anger and frustration**: There is a strong sense of anger and frustration towards the Israeli government and military, with many tweets accusing them of war crimes, human rights violations, and brutality.

**Sense of injustice and oppression**: Tweets express a sense of injustice and oppression, with many feeling that the international community is not doing enough to stop the violence and protect Palestinian civilians.

**Grief and mourning**: There are several tweets that express grief and mourning for the victims of violence, including children and civilians, and pay tribute to those who have lost their lives.

**Solidarity and support**: Many tweets offer support and solida

# <span style='color:dodgerblue; font-weight:bold;'>Project Overview</span>

The primary goal of this project is to collect and analyze tweets related to Gaza, using natural language processing (NLP) techniques to extract meaningful insights about the sentiment and emotions expressed by users. Given the limitations of paid APIs, I opted to use Selenium for web scraping, as it is a free and robust tool for collecting real-time tweets. Specifically, I collected 20 tweets related to Gaza for the sentiment analysis task.

After successfully scraping the tweets, the next step involved cleaning the data to ensure it was suitable for sentiment analysis. This was achieved by removing unwanted elements such as URLs, mentions, hashtags, and special characters, converting text to lowercase, removing stop words, and applying stemming. Additionally, I wrote a separate cleaning function for Arabic tweets; it is not used here because only English tweets were scraped.

Once the tweets were cleaned, they were stored in a CSV file for easy reuse in future analyses or related tasks. For sentiment analysis, I integrated the GroqCloud platform's Llama-3 model, a powerful large language model (LLM), to perform in-depth sentiment evaluation. This model was used to gain a deeper understanding of the emotional content of the tweets. Following the analysis, I posed a question to the LLM: "Based on the tweets, how do the people of Gaza feel right now?"

The LLM's response highlighted a complex sentiment landscape. The prevailing emotions among the people of Gaza and those concerned about the situation included fear, anger, and frustration due to the ongoing conflict. However, the analysis also revealed a sense of resilience, solidarity, and hope. The tweets frequently expressed a desire for peace, accountability for human rights violations, and a resolution to the conflict that honors the dignity and rights of all involved.
