# Part A: Single-Program Architecture Sentiment Analyzer

This notebook implements keyword-based sentiment analysis using a **single-program architecture**.
One program reads the dataset line by line, classifies the sentiment of each line, and aggregates the counts in a single pass.

**Architectural characteristics:**
- Simple, monolithic design
- Sequential processing
- Direct data flow
- Single pass through data

## 1) Configuration and Setup

In [None]:
import csv
import re
from collections import defaultdict
import matplotlib.pyplot as plt

# Dataset path
DATA_PATH = 'data/sample_us_posts.txt'
KEYWORDS_PATH = 'data/keywords.csv'

print(f'Using dataset: {DATA_PATH}')
print(f'Using keywords: {KEYWORDS_PATH}')

## 2) Load Keywords from CSV

In [None]:
def load_keywords(keywords_path):
    """Load positive and negative keywords from CSV file."""
    positive_keywords = set()
    negative_keywords = set()
    
    with open(keywords_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for row in reader:
            # Strip stray whitespace around cells so " happy" still matches "happy"
            keyword = row['keyword'].strip().lower()
            sentiment = row['sentiment'].strip().lower()
            
            if sentiment == 'positive':
                positive_keywords.add(keyword)
            elif sentiment == 'negative':
                negative_keywords.add(keyword)
    
    return positive_keywords, negative_keywords

# Load keywords
POS_KEYWORDS, NEG_KEYWORDS = load_keywords(KEYWORDS_PATH)
print(f'Positive keywords: {POS_KEYWORDS}')
print(f'Negative keywords: {NEG_KEYWORDS}')
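`load_keywords` expects a CSV with a header row containing `keyword` and `sentiment` columns. Rows whose sentiment is neither `positive` nor `negative` are silently ignored. The following minimal sketch shows that layout and parses it from an in-memory string, so it runs without the real `data/keywords.csv`. The keyword rows are hypothetical placeholders, not the contents of the actual file:

```python
import csv
import io

# Hypothetical keyword file contents (illustrative only).
SAMPLE_CSV = """keyword,sentiment
happy,positive
love,positive
sad,negative
Upset , Negative
"""

def parse_keywords(text):
    """Mirror of load_keywords that reads from a string instead of a path."""
    positive, negative = set(), set()
    for row in csv.DictReader(io.StringIO(text)):
        # .strip() guards against spaces around cells, .lower() against case
        keyword = row["keyword"].strip().lower()
        sentiment = row["sentiment"].strip().lower()
        if sentiment == "positive":
            positive.add(keyword)
        elif sentiment == "negative":
            negative.add(keyword)
        # Any other sentiment label is ignored.
    return positive, negative

pos, neg = parse_keywords(SAMPLE_CSV)
print(sorted(pos))  # ['happy', 'love']
print(sorted(neg))  # ['sad', 'upset']
```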

## 3) Single-Program Sentiment Classifier

In [None]:
def tokenize(text):
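    """Simple tokenization: runs of letters and apostrophes, case-insensitive."""
    # Lowercase first so only a-z needs matching, then strip quote marks from
    # token ends so "'happy'" still matches the keyword "happy" ("i'm" is kept)
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t.strip("'") for t in tokens if t.strip("'")]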

def classify_sentiment(text, pos_keywords, neg_keywords):
    """Classify sentiment of a text line based on keyword presence."""
    words = set(tokenize(text))
    
    has_positive = bool(words & pos_keywords)
    has_negative = bool(words & neg_keywords)
    
    if has_positive and has_negative:
        return 'Mixed'
    elif has_positive:
        return 'Positive'
    elif has_negative:
        return 'Negative'
    else:
        return 'Neutral'

# Test the classifier
test_cases = [
    "I am so happy today!",
    "I feel sad and depressed.",
    "I love this but I'm also upset.",
    "The weather is nice."
]

print("Testing classifier:")
for test in test_cases:
    result = classify_sentiment(test, POS_KEYWORDS, NEG_KEYWORDS)
    print(f'  "{test}" -> {result}')
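The test cases become easier to interpret once you can see the tokens. This sketch uses a small hypothetical keyword set (the real sets come from `keywords.csv`) and a simplified copy of the tokenizer regex with the same four-way rule. The last case shows a known limitation of plain keyword matching: negation such as "not happy" is not detected.

```python
import re

def tokenize(text):
    # Runs of letters/apostrophes after lowercasing (simplified copy).
    return re.findall(r"[a-z']+", text.lower())

def classify(text, pos, neg):
    words = set(tokenize(text))
    has_pos, has_neg = bool(words & pos), bool(words & neg)
    if has_pos and has_neg:
        return "Mixed"
    if has_pos:
        return "Positive"
    if has_neg:
        return "Negative"
    return "Neutral"

POS = {"happy", "love"}  # hypothetical keywords
NEG = {"sad", "upset"}   # hypothetical keywords

print(tokenize("I love this but I'm also upset."))
# ['i', 'love', 'this', 'but', "i'm", 'also', 'upset']
print(classify("I love this but I'm also upset.", POS, NEG))  # Mixed
print(classify("The weather is nice.", POS, NEG))             # Neutral
print(classify("Not happy at all", POS, NEG))                 # Positive
```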

## 4) Main Processing Function

In [None]:
def analyze_sentiment_file(file_path, pos_keywords, neg_keywords):
    """Single-program architecture: read file, classify each line, aggregate results."""
    counts = defaultdict(int)
    total_lines = 0
    
    print(f"Processing {file_path}...")
    
    # Single pass through the data
    with open(file_path, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if line:  # Skip empty lines
                sentiment = classify_sentiment(line, pos_keywords, neg_keywords)
                counts[sentiment] += 1
                total_lines += 1

            # Progress indicator for large files (checked on every line, blank or not)
            if line_num % 1000 == 0:
                print(f"  Processed {line_num} lines...")
    
    print(f"Completed processing {total_lines} posts.")
    return dict(counts)

# Process the dataset
results = analyze_sentiment_file(DATA_PATH, POS_KEYWORDS, NEG_KEYWORDS)
print(f"\nResults: {results}")
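Because `analyze_sentiment_file` iterates over the file object, only the current line and the running counts are held in memory. The minimal sketch below shows the same single-pass pattern using `collections.Counter`. An `io.StringIO` object stands in for the dataset, and the tiny classifier is a hypothetical stand-in for `classify_sentiment`:

```python
import io
from collections import Counter

def classify(line):
    # Hypothetical stand-in classifier with one keyword per side.
    words = set(line.lower().split())
    pos, neg = bool(words & {"happy"}), bool(words & {"sad"})
    if pos and neg:
        return "Mixed"
    if pos:
        return "Positive"
    if neg:
        return "Negative"
    return "Neutral"

def count_sentiments(stream):
    """One pass: each non-blank line is classified, counted, then discarded."""
    counts = Counter()
    for line in stream:  # file-like objects yield one line at a time
        line = line.strip()
        if line:
            counts[classify(line)] += 1
    return counts

fake_file = io.StringIO("happy day\n\nsad news\nhappy and sad\nplain text\n")
print(dict(count_sentiments(fake_file)))
# {'Positive': 1, 'Negative': 1, 'Mixed': 1, 'Neutral': 1}
```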

## 5) Generate Output and Verdict

In [None]:
def generate_verdict(counts):
    """Generate verdict based on positive vs negative counts."""
    positive = counts.get('Positive', 0)
    negative = counts.get('Negative', 0)
    
    if positive > negative:
        return 'Happier'
    elif negative > positive:
        return 'Sadder'
    else:
        return 'Tied'

def print_results(counts):
    """Print results in the required format."""
    positive = counts.get('Positive', 0)
    negative = counts.get('Negative', 0)
    mixed = counts.get('Mixed', 0)
    neutral = counts.get('Neutral', 0)
    
    verdict = generate_verdict(counts)
    
    # Required output format
    print(f"Positive={positive} Negative={negative} Mixed={mixed} Neutral={neutral}")
    print(f"Verdict: {verdict}")
    
    return verdict

# Generate and print results
print("\n" + "="*50)
print("FINAL RESULTS")
print("="*50)
verdict = print_results(results)
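The verdict compares only the Positive and Negative counts. Mixed and Neutral posts appear in the output line but never affect the verdict, and an empty result is reported as `Tied`. The following worked sketch uses made-up counts to show this:

```python
def verdict(counts):
    # Only Positive vs Negative matter; Mixed/Neutral are ignored.
    pos, neg = counts.get("Positive", 0), counts.get("Negative", 0)
    if pos > neg:
        return "Happier"
    if neg > pos:
        return "Sadder"
    return "Tied"

def format_line(counts):
    # Same field order as print_results; missing categories print as 0.
    keys = ["Positive", "Negative", "Mixed", "Neutral"]
    return " ".join(f"{k}={counts.get(k, 0)}" for k in keys)

sample = {"Positive": 3, "Negative": 3, "Mixed": 50}  # made-up counts
print(format_line(sample))  # Positive=3 Negative=3 Mixed=50 Neutral=0
print(verdict(sample))      # Tied
print(verdict({}))          # Tied
```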

## 6) Optional: Visualization

In [None]:
def create_bar_chart(counts, title="Sentiment Analysis Results"):
    """Create a bar chart of sentiment counts."""
    labels = ['Positive', 'Negative', 'Mixed', 'Neutral']
    values = [counts.get(label, 0) for label in labels]
    colors = ['green', 'red', 'orange', 'gray']
    
    plt.figure(figsize=(10, 6))
    bars = plt.bar(labels, values, color=colors, alpha=0.7)
    
    # Add value labels on bars
    for bar, value in zip(bars, values):
        if value > 0:
            plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5,
                    str(value), ha='center', va='bottom', fontweight='bold')
    
    plt.title(title, fontsize=14, fontweight='bold')
    plt.xlabel('Sentiment Category', fontweight='bold')
    plt.ylabel('Number of Posts', fontweight='bold')
    plt.grid(axis='y', alpha=0.3)
    
    # Add total count
    total = sum(values)
    plt.text(0.02, 0.95, f'Total Posts: {total}', transform=plt.gca().transAxes,
             bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.5))
    
    plt.tight_layout()
    plt.show()

# Create visualization
create_bar_chart(results, f"Sentiment Analysis - {DATA_PATH} (Single-Program Architecture)")

## 7) Test with Different Datasets

In [None]:
# Test with mixed dataset if available
MIXED_DATA_PATH = 'data/sample_us_posts_mixed.txt'

try:
    print("\n" + "="*50)
    print(f"TESTING WITH: {MIXED_DATA_PATH}")
    print("="*50)
    
    mixed_results = analyze_sentiment_file(MIXED_DATA_PATH, POS_KEYWORDS, NEG_KEYWORDS)
    mixed_verdict = print_results(mixed_results)
    
    # Create chart for mixed dataset
    create_bar_chart(mixed_results, f"Sentiment Analysis - {MIXED_DATA_PATH} (Single-Program Architecture)")
    
except FileNotFoundError:
    print(f"Mixed dataset {MIXED_DATA_PATH} not found. Skipping...")
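Section 7 repeats the analysis steps for a second file. If more datasets are added, a loop keeps the missing-file handling in one place. This sketch assumes each dataset is a plain-text file with one post per line. It creates throwaway files with `tempfile` so it runs standalone, and `analyze` and `analyze_many` are hypothetical, simplified stand-ins for `analyze_sentiment_file`:

```python
import os
import tempfile
from collections import Counter

def label(words, pos, neg):
    p, n = bool(words & pos), bool(words & neg)
    if p and n:
        return "Mixed"
    if p:
        return "Positive"
    if n:
        return "Negative"
    return "Neutral"

def analyze(path, pos, neg):
    # Simplified stand-in for analyze_sentiment_file (hypothetical).
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            words = set(line.lower().split())
            if words:  # skip blank lines
                counts[label(words, pos, neg)] += 1
    return counts

def analyze_many(paths, pos, neg):
    """Return {path: counts}; a missing file maps to None instead of aborting."""
    results = {}
    for path in paths:
        try:
            results[path] = analyze(path, pos, neg)
        except FileNotFoundError:
            print(f"{path} not found. Skipping...")
            results[path] = None
    return results

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    with open(a, "w", encoding="utf-8") as f:
        f.write("happy\nsad\nhappy\n")
    missing = os.path.join(tmp, "missing.txt")
    out = analyze_many([a, missing], {"happy"}, {"sad"})
    print(dict(out[a]), out[missing])  # {'Positive': 2, 'Negative': 1} None
```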

## 8) Architecture Summary

**Single-Program Architecture Characteristics:**

### Structure & Responsibilities
- **Monolithic design**: All functionality in one program
- **Sequential processing**: One line at a time, in order
- **Linear pipeline**: Read → Classify → Aggregate → Output, all within one process
- **Direct data flow**: Input file → Processing → Results
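The Read → Classify → Aggregate → Output flow maps naturally onto chained Python generators. This preserves the single pass and the one-line-at-a-time memory profile while making each stage a separate function. The sketch below is one possible way to express that structure, not how the notebook above is written. The stage names and the toy classifier are hypothetical:

```python
import io
from collections import Counter

def read(stream):
    """Read: yield non-blank, stripped lines."""
    for line in stream:
        line = line.strip()
        if line:
            yield line

def classify(lines, pos, neg):
    """Classify: yield one label per line."""
    for line in lines:
        words = set(line.lower().split())
        p, n = bool(words & pos), bool(words & neg)
        if p and n:
            yield "Mixed"
        elif p:
            yield "Positive"
        elif n:
            yield "Negative"
        else:
            yield "Neutral"

def aggregate(labels):
    """Aggregate: consume the label stream into counts."""
    return Counter(labels)

def output(counts):
    """Output: the notebook's one-line summary format."""
    keys = ["Positive", "Negative", "Mixed", "Neutral"]
    return " ".join(f"{k}={counts.get(k, 0)}" for k in keys)

data = io.StringIO("good day\nbad day\n\ngood and bad\n")
print(output(aggregate(classify(read(data), {"good"}, {"bad"}))))
# Positive=1 Negative=1 Mixed=1 Neutral=0
```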

### Advantages
- **Simplicity**: Easy to understand and debug
- **Low overhead**: No coordination between components
- **Immediate results**: No intermediate storage needed
- **Memory efficient**: Processes one line at a time

### Limitations
- **No parallelism**: Cannot utilize multiple cores effectively
- **Scalability constraints**: Limited by single machine resources
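- **No fault tolerance**: The program is a single point of failure; an error partway through (for example, an undecodable line) aborts the run, and processing must restart from the beginning
- **Limited flexibility**: All steps are coupled inside one program, so no single step can be scaled, replaced, or redeployed independently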
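The single-point-of-failure risk is easy to hit in practice. `open(..., encoding='utf-8')` raises `UnicodeDecodeError` on an invalid byte, and because all state lives in one function call, the run ends without returning any counts. The minimal sketch below demonstrates that failure and one standard-library mitigation, `errors='replace'`. This option substitutes U+FFFD for undecodable bytes so the run can finish, at the cost of silently altering those lines. The helper `count_lines` is hypothetical:

```python
import os
import tempfile

def count_lines(path, errors="strict"):
    # Counts non-blank lines; `errors` is passed straight to open().
    with open(path, encoding="utf-8", errors=errors) as f:
        return sum(1 for line in f if line.strip())

with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"happy day\n\xff bad byte\nsad day\n")  # 0xFF is never valid UTF-8
    path = tmp.name

try:
    count_lines(path)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True  # the whole pass aborts, no partial counts

replaced = count_lines(path, errors="replace")
os.remove(path)
print(strict_failed, replaced)  # True 3
```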

### Use Cases
- Small to medium datasets (< 1GB)
- Development and prototyping
- Simple processing requirements
- Single-machine environments
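Whether a dataset counts as "small to medium" for this design is easiest to judge empirically. This rough sketch times the classification loop on synthetic posts with `time.perf_counter`. The synthetic text and keywords are illustrative, and treating runtime as linear in line count is an assumption:

```python
import io
import time

def classify(line, pos, neg):
    words = set(line.lower().split())
    p, n = bool(words & pos), bool(words & neg)
    if p and n:
        return "Mixed"
    if p:
        return "Positive"
    if n:
        return "Negative"
    return "Neutral"

def time_pass(n_lines, pos, neg):
    """Return (lines processed, seconds) for one pass over synthetic posts."""
    text = "i am happy today\ni feel sad\nthe weather is fine\n" * (n_lines // 3)
    processed = 0
    start = time.perf_counter()
    for line in io.StringIO(text):
        classify(line, pos, neg)
        processed += 1
    return processed, time.perf_counter() - start

n, secs = time_pass(30_000, {"happy"}, {"sad"})
rate = n / secs if secs > 0 else float("inf")
print(f"{n} lines in {secs:.3f}s (~{rate:,.0f} lines/s)")
```

Dividing a real dataset's line count by the measured rate gives a first-order runtime estimate on the current machine.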