## Sentiment & Tone Analyzer

## Problem Statement

<details>
<summary><strong>Problem Statement (click to expand)</strong></summary>

Organizations receive large volumes of unstructured textual feedback through sources such as customer reviews, surveys, support tickets, social media comments, and internal communications. Manually understanding the emotional intent behind this feedback is time-consuming, subjective, and does not scale.

Traditional keyword-based or rule-based sentiment analysis systems fail to capture:
- Subtle emotional cues and mixed sentiments
- Context-dependent tone such as urgency, formality, or dissatisfaction
- Negative intent expressed without explicit negative keywords

This project addresses these challenges by implementing a **Sentiment & Tone Analyzer powered by Large Language Models (LLMs)**. The system automatically classifies text into:
- **Overall sentiment** (Positive, Negative, Neutral)
- **Dominant tone** (e.g., Joyful, Angry, Formal, Urgent, Derogatory)

The solution processes feedback in bulk, produces structured and explainable results, and is designed as a **standalone, production-ready analytics component** that can be integrated into customer experience platforms, CRM systems, and monitoring dashboards.

By pairing an LLM with strict JSON output enforcement, the system produces consistent, machine-parseable results that can scale to real-world sentiment intelligence use cases.

</details>


## Architecture


<details>
<summary><strong>System Architecture & Workflow (click to expand)</strong></summary>

```text
┌───────────────────────────────────────────────┐
│                 USER INTERFACE                │
│         Streamlit Web Application UI          │
│   - Upload Excel file / Run Demo Sample       │
└───────────────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────┐
│               DATA INGESTION                  │
│   - Reads Excel (.xlsx) files                 │
│   - Requires 'review_text' column             │
│   - Supports batch text processing            │
└───────────────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────┐
│           PROMPT CONSTRUCTION LAYER           │
│   - Injects each review into prompt           │
│   - Defines sentiment & tone rules            │
│   - Enforces JSON-only output                 │
└───────────────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────┐
│        GEN AI ANALYSIS ENGINE (LLM)           │
│                                               │
│  - Model: Gemini 2.5 Pro via OpenRouter       │
│  - Performs sentiment classification          │
│  - Identifies dominant tone                   │
│  - Generates short explanation                │
│                                               │
│  Structured JSON enforced in response         │
└───────────────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────┐
│          RESPONSE PARSING & VALIDATION        │
│   - JSON parsing & error handling             │
│   - Fallback on invalid responses             │
│   - Safe extraction of fields                 │
└───────────────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────┐
│            RESULTS AGGREGATION                │
│   - Sentiment, Tone, Explanation              │
│   - Stored in Pandas DataFrame                │
│   - Analysis time tracked                     │
└───────────────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────┐
│            VISUALIZATION & OUTPUT             │
│   - Summary metrics                           │
│   - Sentiment & tone distributions            │
│   - Full results table                        │
│   - CSV download                              │
└───────────────────────────────────────────────┘
```

</details>



## Requirements


<details>
<summary><strong>System Requirements & Dependencies (click to expand)</strong></summary>

### 1️⃣ Model Requirements
- **Gemini 2.5 Pro (via OpenRouter)**  
  Used for advanced sentiment classification, tone detection, and generating short human-readable explanations.
- **JSON-only response format**  
  Enforced to ensure structured, machine-parseable output for reliable downstream processing.
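
The JSON-only contract can be sketched as a prompt builder plus a defensive parser. This is a minimal illustration rather than the repository's actual code: the prompt wording, the `FALLBACK` record, and the function names `build_prompt` and `parse_response` are all assumptions made for the example.

```python
import json
from typing import Any, Dict, Optional

ALLOWED_SENTIMENTS = {"Positive", "Negative", "Neutral"}

# Hypothetical default returned whenever the model reply cannot be trusted.
FALLBACK: Dict[str, str] = {
    "sentiment": "Neutral",
    "tone": "Unknown",
    "explanation": "Could not parse model response.",
}

# Hypothetical prompt wording; the real prompt may differ.
PROMPT_TEMPLATE = (
    "Classify the review below.\n"
    "Return ONLY a JSON object with the keys "
    '"sentiment" (Positive, Negative, or Neutral), "tone", and "explanation".\n'
    "Review: {review}"
)


def build_prompt(review: str) -> str:
    """Inject one review into the JSON-only prompt."""
    return PROMPT_TEMPLATE.format(review=review.strip())


def parse_response(raw: Optional[str]) -> Dict[str, Any]:
    """Parse the model reply, returning FALLBACK on any problem."""
    text = (raw or "").strip()
    # Models sometimes wrap JSON in a Markdown fence; keep only the outermost {...}.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return dict(FALLBACK)
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return dict(FALLBACK)
    # Reject labels outside the allowed set instead of passing them downstream.
    sentiment = str(data.get("sentiment") or "").strip().capitalize()
    if sentiment not in ALLOWED_SENTIMENTS:
        return dict(FALLBACK)
    return {
        "sentiment": sentiment,
        "tone": str(data.get("tone") or "Unknown").strip(),
        "explanation": str(data.get("explanation") or "").strip(),
    }
```

Checking the sentiment label against a fixed set is one simple way to keep unexpected labels out of the results table; tone is left open-ended here because the project lists its tones only as examples.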

---

### 2️⃣ Python Libraries & Purpose

- **pandas**  
  Used for tabular data handling, batch processing of text, and result aggregation.

- **streamlit**  
  Provides the interactive UI for file upload, demo execution, progress tracking, and result visualization.

- **openai (used as the OpenRouter client)**  
  Handles communication with the LLM through OpenRouter's OpenAI-compatible API for sentiment and tone analysis.

- **json**  
  Parses structured LLM responses safely.

- **time**  
  Tracks total and per-item analysis duration.

- **pathlib**  
  Handles file system paths for demo data loading.

- **typing (List, Dict, Any)**  
  Improves code clarity and maintainability.

- **os, httpx**  
  Used for environment configuration (`os`) and HTTP communication with the API (`httpx`).
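
As a rough sketch of how `time` and `typing` might fit together in the batch loop, the example below times each item and the whole run, and keeps going when a single review fails. The function name `analyze_batch`, the `seconds` field, and the stub analyzer are hypothetical; in the real app the callable would wrap the LLM request.

```python
import time
from typing import Any, Callable, Dict, List


def analyze_batch(
    reviews: List[str],
    analyze_one: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Run analyze_one over every review, timing each item and the whole batch."""
    results: List[Dict[str, Any]] = []
    batch_start = time.perf_counter()
    for review in reviews:
        item_start = time.perf_counter()
        try:
            outcome = analyze_one(review)
        except Exception as exc:
            # One failing row should not abort the rest of the batch.
            outcome = {
                "sentiment": "Neutral",
                "tone": "Unknown",
                "explanation": f"Analysis failed: {exc}",
            }
        elapsed = time.perf_counter() - item_start
        results.append({"review_text": review, **outcome, "seconds": elapsed})
    return {"results": results, "total_seconds": time.perf_counter() - batch_start}


def stub_analyzer(text: str) -> Dict[str, Any]:
    """Stand-in for the LLM call, used only to exercise the loop."""
    if not text:
        raise ValueError("empty review")
    return {"sentiment": "Positive", "tone": "Joyful", "explanation": "stub"}


report = analyze_batch(["Great service", ""], stub_analyzer)
```

The list in `report["results"]` can be passed straight to `pandas.DataFrame(...)` to obtain the results table.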

---

### 3️⃣ Infrastructure Requirements
- **Execution Environment**: Local system or cloud VM
- **GPU**: Not required (API-based inference)
- **Internet Access**: Required for OpenRouter API calls
- **Secure API Key Storage**:
  - Environment Variables (recommended)
  - Streamlit Secrets (for deployment)
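
One possible way to wire up the environment-variable option is sketched below. The variable name `OPENROUTER_API_KEY` and the helper names are assumptions for illustration; the base URL is OpenRouter's OpenAI-compatible endpoint.

```python
import os
from typing import Mapping, Optional

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
API_KEY_ENV_VAR = "OPENROUTER_API_KEY"  # hypothetical variable name


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the API key from the environment and fail loudly if it is missing."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {API_KEY_ENV_VAR} environment variable before starting the app."
        )
    return key


def build_client():
    """Create an OpenAI SDK client that points at OpenRouter."""
    # Imported lazily so the key check above works without the SDK installed.
    from openai import OpenAI

    return OpenAI(base_url=OPENROUTER_BASE_URL, api_key=load_api_key())
```

On a Streamlit deployment, the same key could instead be read from `st.secrets`, which keeps it out of the repository in the same way.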

---

### 4️⃣ Input & Output Specifications
- **Input File**: Excel (.xlsx)
- **Required Column**: `review_text`
- **Output Fields**:
  - Sentiment (Positive / Negative / Neutral)
  - Tone (dominant tone)
  - Explanation (short reasoning)
- **Export**: Downloadable CSV report
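
The input contract above could be checked along these lines before any API call is made. This is a sketch under assumptions: it works on plain row dictionaries (for example from `df.to_dict("records")` after `pandas.read_excel`), and the name `extract_reviews` is hypothetical.

```python
from typing import Any, Dict, List

REQUIRED_COLUMN = "review_text"


def extract_reviews(rows: List[Dict[str, Any]]) -> List[str]:
    """Validate the required column and return the non-empty review texts."""
    if not rows:
        raise ValueError("The uploaded file contains no rows.")
    if REQUIRED_COLUMN not in rows[0]:
        raise ValueError(f"Missing required column: '{REQUIRED_COLUMN}'")
    reviews: List[str] = []
    for row in rows:
        value = row.get(REQUIRED_COLUMN)
        # Empty Excel cells arrive as None or NaN; NaN is the only value
        # that is not equal to itself.
        if value is None or value != value:
            continue
        text = str(value).strip()
        if text:
            reviews.append(text)
    return reviews
```

Skipping blank cells up front avoids spending API calls on rows that cannot produce a meaningful classification.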

</details>


## Repository & Demo


<details>
<summary><strong>Repository & Live Demo (click to expand)</strong></summary>

### Source Code Repository
- GitHub: [Sentiment & Tone Analyzer – Source Code](<https://github.com/AARD-IT/sentiment_and_tone_analyzer-master>)

### Live Demo / Deployment
- Streamlit App: [Live Sentiment & Tone Analyzer](<https://sentimentandtoneanalyzer-master-3lymcywxgkv5ecwfzyxayc.streamlit.app/>)

### Notes
- Repository contains the complete sentiment and tone analysis pipeline
- Uses LLM-based classification with strict JSON output enforcement
- Supports batch analysis, summary statistics, and CSV export
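
For illustration, summary statistics and CSV export can be sketched with the standard library alone. The function names and the column order are assumptions; the repository itself aggregates results in a Pandas DataFrame, where `value_counts()` and `to_csv()` would play the same roles.

```python
import csv
import io
from collections import Counter
from typing import Any, Dict, List

CSV_COLUMNS = ["review_text", "sentiment", "tone", "explanation"]


def summarize(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count how often each sentiment and tone label appears."""
    return {
        "total": len(results),
        "sentiment_counts": dict(Counter(r["sentiment"] for r in results)),
        "tone_counts": dict(Counter(r["tone"] for r in results)),
    }


def to_csv(results: List[Dict[str, Any]]) -> str:
    """Serialize results to CSV text, ignoring any extra keys such as timings."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=CSV_COLUMNS,
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()
```

In a Streamlit app, the string returned by `to_csv` could be handed to `st.download_button` as its `data` argument to offer the report as a file.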

</details>


## Interview Questions – Technical


<details>
<summary><strong>Technical Interview Questions (click to expand)</strong></summary>

1. What is sentiment analysis and how does it differ from tone detection?
2. Why is an LLM-based approach better than traditional NLP for tone analysis?
3. How does enforcing JSON output improve system reliability?
4. Why was Gemini 2.5 Pro chosen for this task?
5. How does the prompt guide sentiment and tone classification?
6. What challenges exist in multi-class tone detection?
7. How does the system handle ambiguous or mixed sentiment text?
8. What role does prompt engineering play in accuracy?
9. How do you prevent hallucinated labels from the LLM?
10. How is batch processing handled efficiently?
11. What are common failure cases in sentiment analysis?
12. How does caching improve performance in Streamlit?
13. How do you handle invalid or empty responses from the LLM?
14. How would you extend this to multilingual sentiment analysis?
15. What are the limitations of LLM-based sentiment systems?

</details>


## Interview Questions – Use Case


<details>
<summary><strong>Use Case & Business Interview Questions (click to expand)</strong></summary>

1. How can this system be used for customer feedback analysis?
2. How does sentiment and tone analysis improve customer experience?
3. How can businesses use this for social media monitoring?
4. How would this help support teams prioritize urgent issues?
5. How can this be integrated into CRM platforms?
6. How does tone detection add value beyond sentiment?
7. How would HR teams use this for employee feedback?
8. How can this system help in brand reputation monitoring?
9. What business KPIs can be derived from sentiment trends?
10. How can this system be used for product review analysis?
11. How does automation reduce manual analysis effort?
12. How would marketing teams benefit from tone insights?
13. How can this system detect abusive or derogatory content?
14. How would you adapt this for real-time feedback streams?
15. What industries benefit most from sentiment intelligence?

</details>


## Interview Questions – Implementation


<details>
<summary><strong>Implementation Interview Questions (click to expand)</strong></summary>

1. How is the Excel input validated before processing?
2. Why is the `review_text` column mandatory?
3. How is the prompt dynamically generated per row?
4. How is progress tracked during batch processing?
5. How does the system handle partial failures?
6. How is response parsing made robust?
7. Why is JSON parsing wrapped in try/except blocks?
8. How is session state managed in Streamlit?
9. How is analysis time calculated and displayed?
10. How are results aggregated into a DataFrame?
11. How is the CSV export generated?
12. How would you refactor this into an API service?
13. How do you manage API rate limits?
14. How do you ensure reproducibility of results?
15. How would you modularize this code further?

</details>


## Interview Questions – Evaluation


<details>
<summary><strong>Evaluation & Quality Interview Questions (click to expand)</strong></summary>

1. How do you evaluate sentiment classification accuracy?
2. How do you validate tone predictions?
3. How would you create a labeled evaluation dataset?
4. How do you measure consistency across runs?
5. How do you handle subjective tone interpretation?
6. How would you benchmark this against traditional NLP models?
7. How do you detect systematic misclassification?
8. How would you perform error analysis?
9. How do you measure explanation quality?
10. How do you test edge cases like sarcasm?
11. How do you evaluate performance on short texts?
12. How do you evaluate long-form feedback?
13. How do you monitor quality drift over time?
14. How would you add human-in-the-loop validation?
15. How do you report evaluation metrics to stakeholders?

</details>


## Interview Questions – Deployment


<details>
<summary><strong>Deployment & Operations Interview Questions (click to expand)</strong></summary>

1. How is this application deployed using Streamlit?
2. How do you manage API keys securely in production?
3. How would you deploy this using Docker?
4. How do you handle concurrent users?
5. How do you log errors without exposing data?
6. How do you monitor application health?
7. How do you handle LLM downtime?
8. How would you add authentication?
9. How do you ensure data privacy compliance?
10. How would you deploy this on cloud platforms?
11. How do you handle file upload limits?
12. How do you version prompts safely?
13. How do you roll out updates without downtime?
14. How do you implement CI/CD for this app?
15. What deployment risks exist for LLM-based apps?

</details>


## Interview Questions – Scaling


<details>
<summary><strong>Scaling & Enterprise Interview Questions (click to expand)</strong></summary>

1. How would you scale this system to millions of reviews?
2. How would you introduce asynchronous processing?
3. How would you reduce LLM API costs?
4. How would you batch requests efficiently?
5. How would you cache results safely?
6. How would you support multi-tenant usage?
7. How would you add role-based access control?
8. How would you handle multilingual scaling?
9. How would you process streaming data?
10. How would you integrate message queues?
11. How would you ensure SLA guarantees?
12. How would you monitor system performance at scale?
13. How would you audit sentiment decisions?
14. How would you evolve this into a SaaS product?
15. How would you support enterprise reporting dashboards?

</details>
