# Task 2: Sentiment Analysis

---

## Objective
Score each cleaned review with **VADER** (NLTK's rule-based sentiment analyzer) and assign a Positive, Negative, or Neutral label based on its compound score.

**Steps:**
1. Load Clean Data
2. Apply VADER Sentiment Scoring
3. Classify each review as Positive (compound >= 0.05), Negative (compound <= -0.05), or Neutral (otherwise)
4. Save Results

**Output:** `data/processed/sentiment_results.csv`

---

In [1]:
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from pathlib import Path

# Download the VADER lexicon if it is not already available locally
nltk.download('vader_lexicon', quiet=True)

INPUT_FILE = Path("../data/clean/reviews_clean.csv")
OUTPUT_FILE = Path("../data/processed/sentiment_results.csv")
OUTPUT_FILE.parent.mkdir(parents=True, exist_ok=True)

df = pd.read_csv(INPUT_FILE)
print(f"Loaded {len(df)} reviews.")

Loaded 1500 reviews.


## VADER Analysis

In [2]:
sia = SentimentIntensityAnalyzer()

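def get_sentiment(text):
    """Return (compound score, label) for one review."""
    if not isinstance(text, str):
        # Missing or non-text values (e.g. NaN) are treated as Neutral
        return 0.0, "Neutral"
    score = sia.polarity_scores(text)['compound']

    # Cutoffs on the compound score, which ranges from -1 to 1
    if score >= 0.05:
        label = "Positive"
    elif score <= -0.05:
        label = "Negative"
    else:
        label = "Neutral"
    return score, label

# Apply to every review, then unpack the (score, label) tuples into two columns.
# Building the frame once keeps sentiment_score numeric and avoids a per-row pd.Series.
results = df['cleaned_text'].apply(get_sentiment)
scored = pd.DataFrame(
    results.tolist(),
    index=df.index,
    columns=['sentiment_score', 'sentiment_label'],
)
df[['sentiment_score', 'sentiment_label']] = scored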

print("Sentiment Analysis Complete.")

Sentiment Analysis Complete.
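
The labelling rule inside `get_sentiment` depends only on the compound score, so it can be pulled out into a small pure function and checked at its boundaries without loading the lexicon. The sketch below is illustrative only: `label_from_compound`, `POS_THRESHOLD`, and `NEG_THRESHOLD` are hypothetical names that do not appear in the notebook, and the cutoffs are copied from the cell above.

```python
# Cutoffs copied from get_sentiment; the names below are hypothetical.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label_from_compound(score: float) -> str:
    """Map a compound score in [-1, 1] to a sentiment label."""
    if score >= POS_THRESHOLD:
        return "Positive"
    if score <= NEG_THRESHOLD:
        return "Negative"
    return "Neutral"


# Both cutoffs are inclusive, so exactly +/-0.05 is not Neutral
for s in (-0.8279, -0.05, 0.0, 0.0499, 0.05, 0.1779):
    print(f"{s:+.4f} -> {label_from_compound(s)}")
```

Keeping the cutoffs in named constants makes it easier to experiment with a wider Neutral band if weakly positive scores turn out to be unreliable.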


## 📊 Preliminary Results

In [3]:
print(df['sentiment_label'].value_counts())

print("\nSample Data:")
print(df[['cleaned_text', 'sentiment_label', 'sentiment_score']].head())

sentiment_label
Positive    831
Negative    440
Neutral     229
Name: count, dtype: int64

Sample Data:
                                        cleaned_text sentiment_label  \
0  WHAT A USELESS APP! Transfers, wallet payments...        Negative   
1  Most of the time when I try to open the app, i...        Positive   
2  I use the Commercial Bank of Ethiopia mobile a...        Positive   
3  It is good app and really user friendly , but ...        Positive   
4  I love this app. really, but with some downsid...        Positive   

   sentiment_score  
0          -0.8279  
1           0.1779  
2           0.9637  
3           0.9296  
4           0.8945  
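
Raw counts are easier to compare as shares of the total. As a rough sketch, the snippet below recomputes percentages from the label counts printed above; the `counts` dictionary is typed in by hand from that output rather than read from `df`.

```python
# Counts copied by hand from the value_counts() output above
counts = {"Positive": 831, "Negative": 440, "Neutral": 229}
total = sum(counts.values())

# Share of each label as a percentage, rounded to one decimal place
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label:<9}{pct:>5.1f}%")
```

Inside the notebook, `df['sentiment_label'].value_counts(normalize=True)` should give the same proportions directly from the data.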


## 💾 Save Results

In [4]:
df.to_csv(OUTPUT_FILE, index=False)
print(f"Saved to {OUTPUT_FILE}")

Saved to ..\data\processed\sentiment_results.csv
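
A quick read-back can confirm that the saved file has the expected shape. The following is a minimal sketch using only the standard library; `check_results_csv`, `EXPECTED_COLUMNS`, and `VALID_LABELS` are hypothetical names, and the column list assumes the file contains at least the three columns used in this notebook.

```python
import csv
from pathlib import Path

# Assumed minimum set of columns; the real file may contain more.
EXPECTED_COLUMNS = {"cleaned_text", "sentiment_score", "sentiment_label"}
VALID_LABELS = {"Positive", "Negative", "Neutral"}


def check_results_csv(path):
    """Return the number of data rows after checking columns and values."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"Missing columns: {sorted(missing)}")
        n_rows = 0
        for row in reader:
            if row["sentiment_label"] not in VALID_LABELS:
                raise ValueError(f"Unexpected label: {row['sentiment_label']!r}")
            float(row["sentiment_score"])  # raises ValueError if not numeric
            n_rows += 1
    return n_rows
```

Calling `check_results_csv(OUTPUT_FILE)` after the save cell should return the same row count that was reported when the data was loaded, provided no rows were dropped in between.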
