A lightweight, fully-offline AI agent for X (Twitter) policy discourse analysis.
- Architecture Overview
- Component Breakdown
- Agent Explanation & Tool Orchestration
- Model Justifications
- Setup & Installation
- Usage
- Executive Report Sample
- Project Structure
┌─────────────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ Sentiment140 CSV ──► data/loader.py ──► Cleaned DataFrame │
│ (up to 10,000 keyword-filtered tweets) │
└───────────────────────────────┬─────────────────────────────────┘
│
┌────────────────────┼────────────────────┐
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐
│ SENTIMENT LAYER │ │ TOPIC LAYER │ │ EMBEDDING LAYER │
│ VADER │ │ LDA (sklearn) │ │ all-MiniLM-L6-v2 │
│ → pos/neg/neu │ │ → 3 topics + │ │ → 384-dim vectors │
│ → compound │ │ keywords │ │ → FAISS IndexFlatL2 │
│ score │ │ → examples │ │ (persisted) │
└────────┬─────────┘ └────────┬─────────┘ └──────────┬───────────┘
│ │ │
└──────────────┬──────┘ │
▼ ▼
┌──────────────────┐ ┌─────────────────────┐
│ AGENT LAYER │◄───────────│ RAG RETRIEVER │
│ PolicyAgent │ │ FAISS search + │
│ │ │ template synthesis │
│ Intent router │ └─────────────────────┘
│ ┌─────────────┐ │
│ │ Tool 1: │ │
│ │ sentiment │ │
│ │ _summary │ │
│ ├─────────────┤ │
│ │ Tool 2: │ │
│ │ topic_ │ │
│ │ exploration │ │
│ ├─────────────┤ │
│ │ Tool 3: │ │
│ │ rag_qa │ │
│ └─────────────┘ │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ REPORT LAYER │
│ ReportGenerator │
│ executive_ │
│ report.txt │
└──────────────────┘
- Reads the Sentiment140 CSV (1.6 M tweets, Latin-1 encoding, no header)
- Filters rows where `text` contains the chosen policy keyword (default: `healthcare`)
- Caps at `MAX_TWEETS` (default: 10,000)
- Cleaning pipeline: strip URLs → strip @mentions → keep hashtag words → remove punctuation → lowercase → collapse whitespace
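The cleaning steps above can be sketched as a chain of regex substitutions. This is an illustrative stand-in for `data/loader.py`, not the actual implementation; the function name `clean_tweet` is hypothetical.

```python
import re

def clean_tweet(text: str) -> str:
    """Illustrative sketch of the cleaning pipeline described above."""
    text = re.sub(r"http\S+|www\.\S+", " ", text)  # strip URLs
    text = re.sub(r"@\w+", " ", text)              # strip @mentions
    text = text.replace("#", " ")                  # keep hashtag words, drop '#'
    text = re.sub(r"[^\w\s]", " ", text)           # remove punctuation
    text = text.lower()                            # lowercase
    return re.sub(r"\s+", " ", text).strip()       # collapse whitespace

print(clean_tweet("Check http://t.co/x @user LOVES #healthcare reform!!"))
# → check loves healthcare reform
```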
- Model: VADER (`vaderSentiment`)
- Classification thresholds (per the original VADER paper):

  | Compound | Label |
  |---|---|
  | ≥ 0.05 | Positive |
  | ≤ -0.05 | Negative |
  | otherwise | Neutral |

- Outputs: `sentiment` column, `sentiment_score`, distribution dict, confidence score
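The threshold mapping is small enough to show directly. In the real analyzer the compound score comes from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`; the helper name below is hypothetical.

```python
def label_from_compound(compound: float) -> str:
    """Map a VADER compound score to a label using the paper's thresholds."""
    if compound >= 0.05:
        return "Positive"
    if compound <= -0.05:
        return "Negative"
    return "Neutral"

print(label_from_compound(0.62))   # → Positive
print(label_from_compound(-0.30))  # → Negative
print(label_from_compound(0.01))   # → Neutral
```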
- Model: LDA via `sklearn.decomposition.LatentDirichletAllocation`
- Vectorisation: `CountVectorizer` (max 5,000 features, English stop words)
- Extracts top-3 topics with 8 keywords each
- Assigns a dominant topic + confidence to every tweet
- Embeddings: all-MiniLM-L6-v2 (SentenceTransformers, 384 dims, ~80 MB)
- Index: FAISS `IndexFlatL2` — exact L2 nearest-neighbour, no approximation
- Persisted to `vectorstore/index.faiss` + `vectorstore/metadata.pkl`
- RAG loop: encode query → FAISS top-K search → aggregate sentiment/topic breakdown → template-synthesised answer (fully offline, no LLM API)
See next section.
Auto-generated plain-text report with:
- Sentiment distribution (visual bar chart)
- Confidence score
- Narrative summary with positive/negative tweet examples
- Top LDA topics + keywords
- Risk insight (thresholded: Low / Moderate / High)
- Methodology table
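The risk insight is a simple threshold on the negative share. The cut-offs below are assumptions chosen so that the sample report's 41.2% maps to MODERATE — the actual values live in `report/generator.py` and may differ.

```python
def risk_level(negative_pct: float) -> str:
    """Hypothetical thresholds on the negative-tweet percentage (0-100)."""
    if negative_pct >= 60.0:   # assumed HIGH cut-off
        return "HIGH"
    if negative_pct >= 30.0:   # assumed MODERATE cut-off
        return "MODERATE"
    return "LOW"

print(risk_level(41.2))  # → MODERATE
```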
PolicyAgent is a single-agent, tool-routing architecture — one agent that selects between three registered tools based on query intent.
| Tool Name | Trigger Keywords | Responsibility |
|---|---|---|
| `sentiment_summary` | sentiment, positive, negative, mood, opinion, distribution… | Reports overall sentiment distribution + confidence |
| `topic_exploration` | topic, theme, subject, discuss, keyword, issue, narrative… | LDA topic keywords + representative tweet examples |
| `rag_qa` | (default / fallback) | Semantic retrieval + grounded evidence-based answer |
query (str)
│
▼
_classify_intent()
│ → count overlap with SENTIMENT_KW set → s_score
│ → count overlap with TOPIC_KW set → t_score
│
├── s_score > t_score → sentiment_summary
├── t_score > s_score → topic_exploration
└── tie → rag_qa (most expressive tool, safe default)
Design choice: Keyword-scoring is intentionally transparent and auditable — critical in a policy context where stakeholders need to understand why a query went to a particular tool.
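The routing above can be sketched in a few lines. The keyword sets are abbreviated from the table, and the exact word-overlap match is a simplification — the real router may use substring or stemmed matching (e.g. so "themes" hits "theme"):

```python
SENTIMENT_KW = {"sentiment", "positive", "negative", "mood", "opinion", "distribution"}
TOPIC_KW = {"topic", "theme", "subject", "discuss", "keyword", "issue", "narrative"}

def classify_intent(query: str) -> str:
    """Score keyword overlap per tool; tie falls through to rag_qa."""
    words = set(query.lower().replace("?", " ").split())
    s_score = len(words & SENTIMENT_KW)
    t_score = len(words & TOPIC_KW)
    if s_score > t_score:
        return "sentiment_summary"
    if t_score > s_score:
        return "topic_exploration"
    return "rag_qa"  # tie → most expressive tool, safe default

print(classify_intent("What is the overall sentiment?"))        # → sentiment_summary
print(classify_intent("Why are people angry about insurance?")) # → rag_qa
```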
User query
→ PolicyAgent.run(query)
→ _classify_intent(query) # route
→ tool_fn() # execute
[sentiment_summary] → format distribution + confidence
[topic_exploration] → format LDA topics + examples
[rag_qa] → RAGRetriever.answer(query)
→ VectorStore.search(query, top_k)
→ aggregate + synthesise
→ return {query, intent, tool_description, output}
| Component | Model | Justification |
|---|---|---|
| Sentiment | VADER | Specifically built for social media; handles slang, caps, punctuation emphasis, emoticons. No training, no GPU, no API. Rule-based = fully auditable. |
| Topic Modelling | LDA (sklearn) | Probabilistic, interpretable per-word topic distributions. Lightweight (CPU, seconds). Keywords are directly meaningful to policy analysts. BERTopic would be richer but violates the "not overbuilt" principle. |
| Embeddings | all-MiniLM-L6-v2 | Distilled model (~80 MB). Excellent semantic similarity benchmark results. Runs on CPU in seconds per batch. No API key. |
| Vector DB | FAISS IndexFlatL2 | Exact k-NN (no approximation) — appropriate for 10k vectors. Zero infrastructure. Serialises to two files. |
| Agent Routing | Keyword scoring | Zero latency, zero dependencies, fully transparent. Explainability > complexity for a policy tool. |
- Python 3.10+
- Sentiment140 dataset from Kaggle
# 1. Clone the repo
git clone https://github.com/your-username/policy-intelligence-agent.git
cd policy-intelligence-agent
# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Download Sentiment140
# Place the CSV at: data/training.1600000.processed.noemoticon.csv
# (Kaggle → download → unzip → move file)
# 5. (Optional) Change keyword in config.py
# POLICY_KEYWORD = "healthcare"   ← edit this

# Full pipeline → generate report → interactive mode
python main.py --interactive
# Custom keyword
python main.py --keyword "climate" --interactive
# Single query and exit
python main.py --query "What do people feel about healthcare costs?"
# Skip re-embedding if vectorstore/ already exists (fast restart)
python main.py --load-index --interactive
# All options
python main.py --help

| Query | Tool Selected | Why |
|---|---|---|
| What is the overall sentiment? | `sentiment_summary` | "sentiment" in query |
| Show me what topics people discuss | `topic_exploration` | "topics" in query |
| Why are people angry about insurance? | `rag_qa` | No strong keyword match → RAG default |
| What are the main themes? | `topic_exploration` | "themes" ≈ topic keyword |
| How positive is the public mood? | `sentiment_summary` | "positive", "mood" |
╔══════════════════════════════════════════════════════════════════╗
║ POLICY INTELLIGENCE — EXECUTIVE REPORT ║
╚══════════════════════════════════════════════════════════════════╝
Generated : 2025-01-15 14:32
Keyword : "healthcare"
Dataset : Sentiment140 (offline X/Twitter proxy)
Tweets : 4,821
1. SENTIMENT DISTRIBUTION
──────────────────────────────────────────────────────────────────
Positive ████████████████ 49.8%
Negative █████████████ 41.2%
Neutral ██ 9.0%
Confidence Score : 0.312 / 1.000
2. NARRATIVE SUMMARY
...
4. RISK INSIGHT
Risk Level : 🟡 MODERATE
Detail : 41.2% of tweets are negative. Monitoring recommended.
policy_agent/
├── config.py # Central configuration
├── main.py # CLI entry point + pipeline orchestration
├── requirements.txt
├── README.md
│
├── data/
│ ├── __init__.py
│ └── loader.py # CSV load, keyword filter, text cleaning
│
├── sentiment/
│ ├── __init__.py
│ └── analyzer.py # VADER sentiment classification
│
├── topics/
│ ├── __init__.py
│ └── modeler.py # LDA topic modelling
│
├── embeddings/
│ ├── __init__.py
│ └── vectorstore.py # SentenceTransformer + FAISS index
│
├── rag/
│ ├── __init__.py
│ └── retriever.py # Semantic retrieval + answer synthesis
│
├── agent/
│ ├── __init__.py
│ └── orchestrator.py # PolicyAgent — intent routing + tool dispatch
│
└── report/
├── __init__.py
└── generator.py # Executive report auto-generation
- Modular — each concern in its own package; zero circular imports
- Offline-first — no paid APIs, no live scraping, no cloud dependencies
- Auditable — every routing and scoring decision is human-readable
- Not overbuilt — right-sized models for a 10k-tweet corpus; complexity added only where it earns its keep
- Logged — `logging.getLogger(__name__)` in every module; INFO-level by default