<a href="https://colab.research.google.com/github/andrybrew/IHT-SEM1302-30Okt/blob/main/practice_material/002_sentiment_analysis_vader_english.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Sentiment Analysis: VADER (English)**

## **Importing Required Libraries**

In [None]:
# Import the necessary libraries
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import seaborn as sns

## **Importing Dataset**

In [None]:
# Fetching the dataset from GitHub
data_url = "https://raw.githubusercontent.com/andrybrew/IHT-SEM1302-30Okt/main/data/002_cbdc.csv"

# Using pandas read_csv function to load the data from the URL directly into a DataFrame
df_comment = pd.read_csv(data_url)

# Display the dataframe to check the imported data
df_comment

## **Data Preprocessing for Sentiment Analysis**

In [None]:
# Remove @mentions entirely (raw strings keep the regex backslashes intact)
df_comment['comment'] = df_comment['comment'].str.replace(r'@\S+', '', regex=True)

# Remove every character that is neither a word character nor whitespace
df_comment['comment'] = df_comment['comment'].str.replace(r'[^\w\s]', '', regex=True)

# Convert to lowercase
df_comment['comment'] = df_comment['comment'].str.lower()

# Trim leading and trailing whitespace, then collapse runs of whitespace into a single space
df_comment['comment'] = df_comment['comment'].str.strip().str.replace(r'\s+', ' ', regex=True)

# Show preprocessed 'comment' column of df_comment
df_comment[['comment']]
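
To make the effect of these four steps easier to see, here is a minimal sketch that applies the same regular expressions to a single string using only the standard `re` module. The helper name `clean_comment` and the sample text are made up for illustration and do not come from the dataset.

```python
import re

def clean_comment(text):
    """Apply the same four cleaning steps to one string (illustrative helper)."""
    text = re.sub(r'@\S+', '', text)          # remove @mentions
    text = re.sub(r'[^\w\s]', '', text)       # remove punctuation and symbols
    text = text.lower()                       # convert to lowercase
    text = re.sub(r'\s+', ' ', text.strip())  # trim and collapse whitespace
    return text

# Hypothetical sample comment, not taken from the dataset
sample = "@central_bank  CBDC is GREAT!!! Can't wait :)"
print(clean_comment(sample))  # cbdc is great cant wait
```

Note that VADER also draws on cues such as exclamation marks and all-caps words, so this cleaning is likely to remove some of the signal the analyzer would otherwise use. It may be worth comparing scores on raw and cleaned text.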

## **Performing Sentiment Analysis**



In [None]:
# Download the VADER lexicon required by SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

In [None]:
# Create a VADER sentiment analyzer
sia = SentimentIntensityAnalyzer()

# Select the preprocessed comments (a pandas Series, one comment per row)
sentences = df_comment['comment']

# Calculate the compound sentiment score of each sentence
scores = [sia.polarity_scores(sentence)['compound'] for sentence in sentences]
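
The compound score is a normalized value between -1 and 1. As a rough sketch of how that normalization appears to work in NLTK's VADER implementation, the summed word valences are squashed with `x / sqrt(x^2 + alpha)`, where `alpha` defaults to 15. The function below is a standalone re-creation written for illustration, not a call into NLTK, and the raw sums are invented inputs.

```python
import math

def normalize(raw_score, alpha=15):
    """Squash an unbounded valence sum into the range [-1, 1] (illustrative re-creation)."""
    norm = raw_score / math.sqrt(raw_score * raw_score + alpha)
    # Clip defensively so the result never leaves [-1, 1]
    return max(-1.0, min(1.0, norm))

# Hypothetical raw valence sums
for raw in (-4.0, 0.0, 1.0, 4.0):
    print(raw, round(normalize(raw), 4))
```

Because of this squashing, a compound score grows quickly for the first few sentiment-bearing words and then flattens out, which is why very long comments do not produce scores far beyond short, strongly worded ones.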

In [None]:
# Visualize the distribution of compound scores (histogram with a KDE overlay)
sns.histplot(x=scores, kde=True)

In [None]:
# Store the sentiment scores in a new column 'sentiment_score' in df_comment
df_comment['sentiment_score'] = scores

In [None]:
# Define a function to categorize sentiment based on the score
def categorize_sentiment(score):
    if score > 0.05:
        return 'Positive'
    elif score < -0.05:
        return 'Negative'
    else:
        return 'Neutral'

# Applying the function to the sentiment_score column
df_comment['sentiment'] = df_comment['sentiment_score'].apply(categorize_sentiment)
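
The boundary behaviour of these thresholds can be checked on a handful of values. The sketch below repeats the function so it runs on its own and tallies the labels with `collections.Counter`. The list `example_scores` is invented for illustration.

```python
from collections import Counter

def categorize_sentiment(score):
    """Map a compound score to a label using the +/-0.05 thresholds."""
    if score > 0.05:
        return 'Positive'
    elif score < -0.05:
        return 'Negative'
    else:
        return 'Neutral'

# Hypothetical compound scores, including the two boundary values
example_scores = [0.62, 0.05, -0.05, -0.48, 0.0, 0.051]
labels = [categorize_sentiment(s) for s in example_scores]

print(labels)
print(Counter(labels))
```

With strict inequalities, scores of exactly 0.05 and -0.05 are labelled Neutral. The VADER authors' suggested convention treats those boundary values as Positive and Negative respectively (`>= 0.05` and `<= -0.05`), so the comparison operators may need adjusting if results should match that convention.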

In [None]:
# Show the result
df_comment[['comment', 'sentiment_score', 'sentiment']]

In [None]:
# Visualize the sentiment distribution
sns.countplot(x='sentiment', data=df_comment)