A multi-agent system that autonomously monitors business signals, detects billing anomalies (revenue leakage, duplicate refunds, tier mismatches), takes remediation actions, and learns from human feedback to improve over time — powered by LLM reasoning at every decision point.
Finance teams lose millions annually to billing anomalies — duplicate refunds, underbilling gaps, tier mismatches, manual credit abuse. These issues are caught late (if at all), investigated manually, and the same mistakes repeat because systems don't learn.
OpsIQ solves this with a closed-loop autonomous agent:
- Detect — Ingest signals, run 5 anomaly detectors, score and rank findings
- Act — Create remediation actions (cases, alerts, approval tasks) with audit trails
- Learn — Human feedback flows through an LLM-powered memory agent that adjusts detection thresholds, penalties, and confidence scoring
- Improve — Rerun triage with updated memory → fewer false positives, better calibration
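The closed loop above can be sketched in a few lines of Python. Everything here (`detect`, `act`, `learn`, the `memory` dict) is an illustrative stand-in, not OpsIQ's actual code:

```python
# Minimal sketch of the Detect → Act → Learn → Improve loop.

def detect(signals, memory):
    """Flag every signal whose score clears the current threshold."""
    return [s for s in signals if s["score"] >= memory["threshold"]]

def act(findings):
    """Create one remediation action per finding."""
    return [{"finding": f, "action": "open_case"} for f in findings]

def learn(memory, feedback):
    """Raise the threshold slightly for each false-positive label."""
    fp = sum(1 for label in feedback if label == "false_positive")
    memory["threshold"] += 0.05 * fp
    return memory

memory = {"threshold": 0.5}
signals = [{"score": 0.9}, {"score": 0.52}, {"score": 0.3}]

findings = detect(signals, memory)           # two findings flagged
actions = act(findings)
memory = learn(memory, ["false_positive"])   # threshold 0.50 → 0.55
improved = detect(signals, memory)           # borderline 0.52 no longer fires
```

In OpsIQ the "learn" step is an LLM deciding the adjustment, not a fixed 0.05 bump, but the loop shape is the same.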
- Signal ingestion from monitoring sources → LLM-powered investigation strategy
- 5 anomaly detectors: duplicate refunds, underbilling, tier mismatch, refund spikes, manual credits
- Severity/confidence/impact scoring with sentiment analysis on evidence text
- Remediation actions with workflow audit trails
- Feedback capture — approve, reject, false positive on each case
- LLM-powered memory — AI reasons about feedback to decide threshold adjustments
- LLM-powered evaluation — AI assesses run quality and generates calibration advice
- Visible improvement — rerun triage and see confidence downgrades, threshold adjustments, and impact penalties
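To give a flavor of what one detector might look like, here is a hypothetical duplicate-refund check (same customer, same amount, inside a time window). The real detectors live in `app/tools/`; the field names and window here are assumptions:

```python
from datetime import datetime, timedelta

def detect_duplicate_refunds(refunds, window=timedelta(hours=24)):
    """Flag refunds that repeat a (customer, amount) pair within the window."""
    flagged = []
    seen = {}  # (customer, amount) -> timestamp of the last matching refund
    for r in sorted(refunds, key=lambda r: r["ts"]):
        key = (r["customer"], r["amount"])
        if key in seen and r["ts"] - seen[key] <= window:
            flagged.append(r)
        seen[key] = r["ts"]
    return flagged

refunds = [
    {"customer": "c1", "amount": 50.0, "ts": datetime(2024, 1, 1, 9)},
    {"customer": "c1", "amount": 50.0, "ts": datetime(2024, 1, 1, 15)},  # duplicate
    {"customer": "c2", "amount": 50.0, "ts": datetime(2024, 1, 2, 9)},
]
dupes = detect_duplicate_refunds(refunds)  # flags only the second c1 refund
```

The learning loop then tunes parameters like `window` per anomaly type based on feedback.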
- Ask business questions in plain English
- Get answers with charts, SQL, confidence scores, and follow-up suggestions
- Revenue analysis, refund trends, underbilling by tier, regional breakdowns
- Orchestrator — analyzes signals, decides investigation strategy, synthesizes findings
- Memory Agent — reasons about feedback to generate learning updates
- Evaluator — assesses run quality and generates calibration advice
- All reasoning traces visible in the UI for full observability
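A reasoning trace can be as simple as an append-only list of structured steps. This sketch is illustrative; OpsIQ's actual trace schema may differ:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TraceStep:
    """One visible reasoning step from an agent (field names assumed)."""
    agent: str
    prompt_summary: str
    decision: str
    ts: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

trace: list[TraceStep] = []
trace.append(TraceStep("orchestrator", "analyze 12 signals", "investigate refund spike"))
trace.append(TraceStep("memory_agent", "1 false positive on duplicates", "raise duplicate threshold"))
```

Exposing a list like this per run is what makes the "full observability" claim concrete: every decision has an agent, an input summary, and an outcome.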
```
┌─────────────────────────────────────────────────────────┐
│                   Streamlit Frontend                    │
│    Mission Control │ Triage Cases │ Analyst │ QA Lab    │
└────────────────────────┬────────────────────────────────┘
                         │ HTTP
┌────────────────────────▼────────────────────────────────┐
│                     FastAPI Backend                     │
│                                                         │
│  ┌──────────┐   ┌───────────┐   ┌────────────┐          │
│  │ Monitor  │ → │  Triage   │ → │   Action   │          │
│  │  Agent   │   │   Agent   │   │   Engine   │          │
│  └────┬─────┘   └─────┬─────┘   └────┬───────┘          │
│       │               │              │                  │
│  ┌────▼─────┐     ┌───▼───┐ ┌──────┐ │                  │
│  │  Signal  │     │Anomaly│ │Senti-│ │                  │
│  │ Adapter  │     │+Score │ │ment  │ │                  │
│  └──────────┘     │Tools  │ │Engine│ │                  │
│                   └───────┘ └──────┘ │                  │
│  ┌──────────┐  ┌──────────┐    ┌─────▼────┐             │
│  │  Metric  │  │Evaluator │    │  Memory  │             │
│  │  Layer   │  │  Agent   │    │  Agent   │             │
│  └──────────┘  │ (+ LLM)  │    │ (+ LLM)  │             │
│                └──────────┘    └──────────┘             │
│                                                         │
│  ┌────────────────┐   ┌─────────────────────┐           │
│  │  Groq/OpenAI   │   │    Orchestrator     │           │
│  │  LLM Client    │   │  (LLM reasoning     │           │
│  └────────────────┘   │   at every step)    │           │
│                       └─────────────────────┘           │
│  ┌──────────────────────────────────────────┐           │
│  │  DuckDB (analytics)  │  SQLite (state)   │           │
│  └──────────────────────────────────────────┘           │
└─────────────────────────────────────────────────────────┘
```
- Python 3.11+ / FastAPI — backend API + agent orchestration
- Streamlit — frontend UI (4 pages)
- Groq (or OpenAI) — LLM reasoning (llama-3.3-70b-versatile, free tier)
- DuckDB — in-memory analytics engine (loaded from CSV seed data)
- SQLite — persistence for feedback, evals, memory, traces, cases
- Plotly — interactive charts
- Pydantic — data models and validation
```
opsiq/
├── app/
│   ├── main.py                # FastAPI application
│   ├── config.py              # Settings from .env (LLM keys, server)
│   ├── models/schemas.py      # Pydantic models
│   ├── api/                   # REST endpoints
│   ├── agents/
│   │   ├── orchestrator.py    # LLM-powered autonomous pipeline
│   │   ├── monitor_agent.py   # Signal ingestion
│   │   ├── triage_agent.py    # Anomaly detection → scoring → cases
│   │   ├── analyst_agent.py   # Business Q&A
│   │   ├── evaluator_agent.py # LLM-powered quality scoring
│   │   └── memory_agent.py    # LLM-powered feedback → memory updates
│   ├── tools/                 # Anomaly detectors, scoring, SQL, charts
│   ├── adapters/              # Signal, metric, action, sentiment, LLM
│   ├── services/              # DuckDB data loader
│   └── storage/               # SQLite persistence layer
├── frontend/
│   └── streamlit_app.py       # 4-page Streamlit UI
├── data/                      # Seed CSV data with planted anomalies
├── tests/                     # 133 tests (schemas, adapters, agents, API)
├── requirements.txt
├── .env.example
└── README.md
```
- Python 3.11+
```bash
cd opsiq
pip install -r requirements.txt
cp .env.example .env
```

Edit `.env`:
```bash
# LLM reasoning (free — recommended)
GROQ_API_KEY=your_groq_key      # https://console.groq.com → API Keys

# Or use OpenAI instead
OPENAI_API_KEY=your_key         # optional fallback
```

No LLM key? The system still works — agents fall back to deterministic rule-based logic. With an LLM key, the agents reason about signals, synthesize findings, and learn from feedback intelligently.
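The key-resolution order described above might be implemented roughly like this. This is a sketch; the real logic lives in `app/config.py` and the LLM adapter:

```python
import os

def resolve_llm_provider(env=None):
    """Prefer Groq, then OpenAI, then the rule-based fallback."""
    env = os.environ if env is None else env
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return "deterministic"  # no key: rule-based logic, still fully functional

resolve_llm_provider({"GROQ_API_KEY": "gsk_..."})  # → "groq"
resolve_llm_provider({})                           # → "deterministic"
```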
```bash
# seed the demo data
python data/seed_data.py
```

```bash
# start the backend
cd opsiq
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

```bash
# start the frontend (in a second terminal)
cd opsiq
python -m streamlit run frontend/streamlit_app.py --server.port 8501
```

```bash
# run the test suite
python -m pytest tests/ -v
```

- Frontend: http://localhost:8501
- API Docs: http://localhost:8000/docs
This is the simplest production setup for OpsIQ:
- Backend (FastAPI) on Render
- Frontend (Streamlit) on Streamlit Community Cloud
Create a new Web Service from your GitHub repo.
- Root Directory: `opsiq`
- Build Command: `pip install -r requirements.txt`
- Start Command: `uvicorn app.main:app --host 0.0.0.0 --port $PORT`

Set env vars in Render:

- `GROQ_API_KEY` (recommended)
- `OPENAI_API_KEY` (optional)
After deploy, copy your backend URL, e.g.:
https://opsiq-backend.onrender.com
Create a new Streamlit app pointing to this repo:
- Main file path: `opsiq/frontend/streamlit_app.py`

In Streamlit app settings, add secrets/environment variables:

```
BACKEND_URL="https://opsiq-backend.onrender.com"
```

This is read by the frontend at runtime via `BACKEND_URL`.
- Open the backend docs: `https://<your-backend>/docs`
- Open the Streamlit app
- Run a "Mission Control" investigation
- Confirm cases, traces, and analyst queries all work
- Render free tier may cold-start; first API call can take a few seconds.
- SQLite storage is ephemeral on many free hosts. For durable production state, move to a managed DB.
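Because of the cold start, scripts that hit the backend right after deploy may want to poll `/health` first. A small stdlib-only sketch (the retry counts and timeouts are arbitrary choices):

```python
import time
import urllib.error
import urllib.request

def wait_for_backend(base_url, attempts=5, delay=2.0):
    """Poll GET /health until the backend answers, with linear backoff."""
    for i in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, TimeoutError):
            time.sleep(delay * (i + 1))  # wait a bit longer each attempt
    return False
```

Call `wait_for_backend("https://opsiq-backend.onrender.com")` before the first real API call to absorb the free-tier spin-up delay.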
- Mission Control — Click "Run Autonomous Investigation"
  - LLM analyzes signals and decides investigation strategy
  - ~6 anomaly cases detected, ranked by impact
  - Remediation actions created (case, alert, approval task)
  - Full LLM reasoning trace visible at every step
- Triage Cases — Review cases, mark one as False Positive
  - See evidence, recommended action, sentiment risk score
  - Feedback triggers the self-improvement loop
- QA Lab — See the learning in action
  - Memory updated: LLM decides which thresholds to adjust and by how much
  - Evaluation: LLM analyzes calibration and suggests improvements
  - Full reasoning log for observability
- Triage Cases — Click "Rerun with Memory"
  - False-positive case now shows lower confidence (was high → medium)
  - Impact reduced by 15% penalty
  - System learned from one interaction
- Analyst — Ask "Why is revenue down this month?"
  - Get an answer with chart, SQL, confidence, follow-ups
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check |
| POST | `/monitor/run` | Run autonomous investigation |
| GET | `/monitor/signals` | Fetch all signals |
| GET | `/triage/cases` | List all cases (with sentiment scores) |
| POST | `/triage/rerun` | Rerun with updated memory |
| POST | `/analyst/query` | Ask a business question |
| POST | `/feedback` | Submit feedback |
| GET | `/feedback/improvement` | Self-improvement summary |
| GET | `/eval/latest` | Latest evaluation |
| GET | `/llm/status` | LLM provider status |
| GET | `/llm/reasoning` | Full LLM reasoning log |
| POST | `/sentiment/analyze` | Analyze text sentiment |
| GET | `/sentiment/log` | Sentiment analysis audit trail |
| GET | `/memory` | Current memory state |
| GET | `/traces/latest` | Latest run trace |
| POST | `/demo/reset` | Reset all state |
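A minimal stdlib client for the POST endpoints above might look like this. The `/feedback` payload fields (`case_id`, `label`) are assumptions; check `/docs` for the actual request schema:

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # local backend from the quick start

def build_request(path, payload=None):
    """Build a JSON POST request for one of the endpoints above."""
    return urllib.request.Request(
        f"{BASE}{path}",
        data=json.dumps(payload or {}).encode(),
        headers={"Content-Type": "application/json"},
    )

def call(req):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# Example flow (requires a running backend):
# call(build_request("/monitor/run"))                                   # investigate
# call(build_request("/feedback", {"case_id": "c-1", "label": "false_positive"}))
# call(build_request("/triage/rerun"))                                  # rerun with memory
```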
```
User Feedback → Memory Agent (LLM) → Memory Store → Triage Agent (rerun)
      ↓
Evaluator Agent (LLM) → Eval Store → QA Lab UI
```
What changes on rerun after feedback:
- LLM reasons about feedback → decides which thresholds to adjust and by how much
- `false_positive_penalty` increases → confidence downgraded for that anomaly type
- Type-specific thresholds adjust (e.g. duplicate window narrows, underbilling threshold rises)
- Scoring tool applies penalty → lower impact scores
- Evaluator recalculates correctness and generates calibration advice
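Mechanically, the penalty step might look like the following sketch. The 15% figure matches the demo walkthrough; the field names and one-step confidence downgrade are assumptions:

```python
def apply_feedback_penalty(case, memory):
    """Downgrade impact and confidence for anomaly types flagged as false positives."""
    penalty = memory.get("false_positive_penalty", {}).get(case["anomaly_type"], 0.0)
    confidence = case["confidence"]
    if penalty > 0 and confidence == "high":
        confidence = "medium"  # one-step confidence downgrade
    return {**case, "impact": case["impact"] * (1.0 - penalty), "confidence": confidence}

memory = {"false_positive_penalty": {"duplicate_refund": 0.15}}
case = {"anomaly_type": "duplicate_refund", "impact": 1200.0, "confidence": "high"}
rescored = apply_feedback_penalty(case, memory)  # impact 1020.0, confidence "medium"
```

Note the penalty is keyed by anomaly type, so feedback on one duplicate-refund case affects every case of that type on the next rerun.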
| Component | With API Key | Without API Key |
|---|---|---|
| LLM (Groq/OpenAI) | Real AI reasoning at every decision point | Deterministic fallback (rule-based) |
The system is fully functional without any API keys — every feature works with deterministic logic. With a Groq key (free), the agents become truly intelligent.
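The fallback pattern amounts to: try the LLM when one is configured, and otherwise (or on any provider error) run fixed rules behind the same interface. A sketch, with `llm_complete` standing in for a real Groq/OpenAI call:

```python
def classify_signal(text, llm_complete=None):
    """Classify a billing signal via LLM when available, else via rules."""
    if llm_complete is not None:
        try:
            return llm_complete(f"Classify this billing signal: {text}")
        except Exception:
            pass  # any provider error falls through to the rules below
    # Deterministic fallback: same interface, fixed heuristics.
    if "refund" in text.lower():
        return "refund_spike"
    return "unclassified"
```

Because callers never see which path ran, every feature keeps working without a key; the LLM path just makes the decisions richer.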
MIT