This notebook scores each review sentence with VADER and labels its polarity as positive, negative, or neutral.

In [1]:
!pip install vaderSentiment






In [2]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import json, os

IN_FILE   = 'data/processed/cleaned.json'
OUT_FILE  = 'data/processed/phrases.json'
os.makedirs(os.path.dirname(OUT_FILE), exist_ok=True)

analyzer = SentimentIntensityAnalyzer()

In [3]:
# Load cleaned data
with open(IN_FILE, 'r', encoding='utf-8') as f:
    data = json.load(f)

# For each review, label every sentence by its VADER compound score
# (no sentence is dropped; weakly scored ones are labeled 'neutral')
for entry in data:
    phrases = []
    for sent in entry.get('sentences', []):
        # 'compound' is VADER's normalized overall score in [-1, 1]
        score = analyzer.polarity_scores(sent)['compound']
        if abs(score) > 0.4:            # threshold for “strong” sentiment
            polarity = 'positive' if score > 0 else 'negative'
        else:
            polarity = 'neutral'
        phrases.append({'text': sent, 'polarity': polarity})
    entry['phrases'] = phrases

# Write data to JSON file
with open(OUT_FILE, 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2)

print(f"Extracted phrases for {len(data)} reviews → {OUT_FILE}")
# Preview a few examples
print(data[0]['phrases'][:5])  


Extracted phrases for 398639 reviews → data/processed/phrases.json
[{'text': 'a friend gave me this book, and said you gotta read this.', 'polarity': 'positive'}, {'text': 'and you know what, im really happy she did, because i learned a lot from dispatches.', 'polarity': 'positive'}, {'text': 'harden does a great job of weaving different short stories, each with a unique slant and look at african life.', 'polarity': 'positive'}, {'text': 'both entertaining and educational, im now fascinated with africa and ready to read more!', 'polarity': 'positive'}]
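
The labeling rule used in the loop can be isolated as a small function, which makes the boundary behavior easier to check. The following is a minimal sketch, not part of the original notebook: the name `label_polarity` and its `threshold` parameter are hypothetical, and the sample scores are hand-picked for illustration rather than taken from the dataset. It takes a compound score directly, so it runs without VADER installed.

```python
from collections import Counter

def label_polarity(score, threshold=0.4):
    """Map a compound score in [-1, 1] to a polarity label.

    Hypothetical helper mirroring the notebook's rule: only scores whose
    magnitude is strictly greater than the threshold count as strong.
    """
    if abs(score) > threshold:
        return 'positive' if score > 0 else 'negative'
    return 'neutral'

# Hand-picked scores for illustration only, not real VADER output
sample_scores = [0.85, -0.62, 0.4, 0.1, -0.4]
labels = [label_polarity(s) for s in sample_scores]

# Tally how many sentences fall under each label
counts = Counter(labels)
print(labels)
print(counts)
```

Because the comparison is strict, a score of exactly 0.4 or -0.4 is labeled neutral. A similar tally over `entry['phrases']` would show how the labels are distributed across the reviews.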
