Built by DevOps ARG · powered by Claude
An AI-powered FinOps agent that analyzes AWS cloud costs and infrastructure using conversational AI. Ask questions in natural language; the agent reasons across Cost Explorer, infrastructure metrics, and AWS's native recommendation APIs (Cost Optimization Hub, Compute Optimizer, Rightsizing, Savings Plans) to answer them.
Read-only by design. The agent uses a dedicated IAM user with the AWS-managed `ReadOnlyAccess` policy. It can't create, modify, or delete anything in your account: it reads metrics and suggests changes that you apply yourself.
The main entry point. Users ask questions in plain English; the right-hand panel streams the reasoning: every tool call, every intermediate result, and the final synthesis. The sidebar on the left holds the 27 preset questions grouped by category.
A layered question ("how much traffic is flowing between AZs through NAT Gateway instead of staying within AZ?") produces a multi-round response: the agent queries Cost Explorer by usage type, correlates with the traffic pattern data, quantifies the optimization opportunity ($2,893/month), and recommends the concrete fix (AWS PrivateLink endpoints).
Landing page of the dashboard tab: last-7-days spend, monthly projection, savings-identified counter, active-services count, a 4-week trend line, a service-breakdown donut, and the latest cost anomalies detected by the AWS Cost Anomaly Detection integration.
Per-service grid view: every active AWS service with last-week / this-week / monthly projection, and a mini sparkline for the 4-week trend. Clicking any service drills into that service's cost detail on the chat tab.
EC2 / RDS / EKS / ElastiCache / OpenSearch / S3 cards, each showing monthly $ + health indicator (OK / warning / critical). Single-region by default; `region=all` fans out to every enabled region in parallel (~20s round-trip on an 18-region account). Warnings surface the exact issue: "Primary at 80% CPU, downsize candidate", "2 clusters missing system metrics on 2% data upload".
Cost Optimization Hub results ranked by monthly savings. Each card explains WHAT to change, the estimated monthly $ savings, a difficulty tag (easy / medium / hard), and an "Ask AI →" button that opens the chat with the recommendation as context, so you can ask "why this specific instance class?" or "what's the risk of this migration?" before executing.
- What it answers - the 27 preset questions and what tools they trigger
- Architecture - services, data flow, diagram
- Quick start - mock mode + live AWS mode
- Feature flags - all `.env` variables
- Endpoints - HTTP API reference
- The reasoning engine - multi-round loop, reflection, SSE events
- The read-only setup script - IAM provisioning + write-block verification
- Project structure
- Security posture
- Want help running it in production?
The sidebar ships with 27 high-value FinOps questions across 9 categories, all drawn from real DevOps ARG case studies. Pick one with a click, or ask your own in free-form text.
| Category | Example question | What it does under the hood |
|---|---|---|
| ⚡ Quick insights | "What's driving my AWS bill this month?" | `get_current_date` → `query_aws_costs` grouped by service → ranks top 10 by $ |
| 🌐 Networking & data transfer | "How much am I spending on NAT Gateway?" | Cost Explorer filter on AWS Data Transfer + NAT usage type → summarizes by AZ |
| 🔥 Compute optimization | "Which EC2 instances are oversized?" | Queries `get_rightsizing_recommendations` + `get_compute_optimizer_recommendations` → annotates with monthly savings |
| 💸 Commitments | "What's my Savings Plans coverage?" | `get_savings_plans_coverage` → compares covered vs on-demand, flags gaps |
| 💾 Storage & databases | "Do I have orphaned EBS volumes?" | `list_ebs_volumes` → filters unattached/available → sums monthly $ at gp2/gp3 rates |
| 📊 Observability | "How is my CloudWatch Logs cost trending?" | Cost Explorer filter on AWS CloudWatch service + Logs usage type, 4-week series |
| 🔌 Real-time workloads | "How many WebSocket connections am I running?" | `describe_load_balancers` + CloudWatch active connections metric |
| 📈 Predictive scaling | "What's my safe baseline with Spot?" | `get_spot_instance_price_history` + EC2 inventory → spot interruption risk per family |
| 🤖 AI Ops | "What's the ROI of reducing MTTR?" | Knowledge-base lookup for past incident ARR impact + recent cost burst patterns |
Every preset question in the sidebar has a hover tooltip explaining which tools it triggers - a nice teaching moment for anyone new to FinOps.
docker compose up --build (3 services)

```
┌───────────────────┐      ┌──────────────────────────────┐      ┌────────────────────┐
│   Frontend        │─────▶│   FastAPI Backend            │─────▶│   AWS APIs         │
│   nginx (:3000)   │◀─────│   (:8000)                    │◀─────│   (read-only)      │
│                   │ SSE  │                              │      │                    │
│   Chat + Dash     │      │  Reasoning Engine (4 rounds) │      │   Cost Explorer    │
│   Infrastructure  │      │  14 tool definitions         │      │   EC2/RDS/EKS/...  │
│   Optimizer       │      │  Claude Sonnet 4             │      │   Cost Opt Hub     │
└───────────────────┘      └──────────────────────────────┘      └────────────────────┘
```
Data flow for the chat: user message → reasoning engine picks tools → boto3 calls AWS (or returns mock) → LLM synthesizes answer → streamed back over SSE so the trace panel shows reasoning in real time.
Great for demos, screencasts, and playing with the UI.
```bash
cp .env.example .env
# Edit .env: ANTHROPIC_API_KEY=sk-ant-...
# Set USE_MOCK_DATA=true
docker compose up --build
# Open http://localhost:3000
```

The mock data is a fictional Series B LatAm fintech called "Ribbon" with ~$28K/mo AWS spend across 3 regions. All dates are relative to today, so the demo never looks stale.
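A sketch of how "dates relative to today" can work in mock data. The function name and flat cost shape are illustrative, not the project's actual `mock_data.py` API:

```python
from datetime import date, timedelta

def mock_daily_costs(days: int = 28, monthly_total: float = 28_000.0):
    """Return a daily cost series ending today, so the demo never looks stale.

    Flat spend for simplicity; real mock data would vary per day/service.
    """
    per_day = round(monthly_total / 30, 2)
    today = date.today()
    return [
        {"date": (today - timedelta(days=offset)).isoformat(), "cost_usd": per_day}
        for offset in range(days - 1, -1, -1)  # oldest first, today last
    ]
```

Because every date is computed at request time, a screenshot taken six months from now still shows a chart ending "today".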
For running against your real AWS account with safety guarantees.
```bash
# Step 1: create a dedicated read-only IAM user
./create-read-only.sh <your-admin-profile>
# This creates an IAM user named `finops-agent-readonly` with ReadOnlyAccess,
# generates keys, writes them to .env + ~/.aws/credentials as profile "finops",
# and verifies write attempts are blocked (tries s3 mb, expects 403).

# Step 2: start the stack
docker compose up --build
# Open http://localhost:3000
```

On startup the backend prints an identity check so you know which ARN is being used:
```
============================================================
AWS IDENTITY CHECK
  Account: <your-aws-account-id>
  ARN:     arn:aws:iam::<your-aws-account-id>:user/finops-agent-readonly
  UserId:  <IAM-user-unique-id>
============================================================
```
If the ARN doesn't contain "readonly" you'll get a WARNING in the logs, and the identity check fails startup if it can't validate live-mode credentials.
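The gate amounts to a string check on the caller ARN. A hypothetical sketch of the logic (the real check lives in the backend's startup path; names here are illustrative):

```python
def assert_readonly_identity(arn: str, live_mode: bool) -> None:
    """Fail fast when live-mode credentials don't look read-only.

    Illustrative only: the real backend obtains `arn` from
    sts get-caller-identity and logs the full identity block.
    """
    if not live_mode:
        return  # mock/LocalStack mode: nothing to validate
    if "readonly" not in arn.lower():
        raise RuntimeError(
            f"Startup aborted: {arn} does not look like the read-only user"
        )
```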
| Variable | Default | What it does |
|---|---|---|
| `AI_PROVIDER` | `anthropic` | `anthropic` or `openai` |
| `ANTHROPIC_API_KEY` | (none) | Required for `anthropic` |
| `ANTHROPIC_MODEL` | `claude-sonnet-4-20250514` | Model for reasoning |
| `USE_LOCALSTACK` | `false` | Force LocalStack demo (no real AWS) |
| `USE_MOCK_DATA` | `false` in live, `true` in localstack | Override: return mock data from `/api/report`, `/api/infrastructure`, `/api/optimize` even in live mode |
| `AWS_DEFAULT_REGION` | `us-east-1` | Default region for single-region infra scans |
| `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | (none) | Read-only keys from `create-read-only.sh` |
When USE_MOCK_DATA=true the backend returns the fictional "Ribbon" data. The
dashboard header shows a yellow MOCK DATA badge so users can't mistake it for
real numbers.
| Method | Path | Description |
|---|---|---|
| GET | `/api/health` | Status + mode + `USE_MOCK_DATA` flag |
| POST | `/api/chat/stream` | SSE chat with live reasoning trace |
| POST | `/api/chat` | Non-streaming chat (returns full response) |
| GET | `/api/report` | Weekly cost report (by service/account/region/env/team) |
| POST | `/api/report/refresh` | Regenerate from live data |
| GET | `/api/infrastructure?region=<name>` | EC2/RDS/EKS/... health. `region=all` scans every enabled region in parallel (~20s). Omitted → uses `AWS_DEFAULT_REGION`. |
| GET | `/api/optimize` | Recommendations from AWS Cost Optimization Hub |
`backend/reasoning/engine.py` runs a multi-round agentic loop, not a single prompt with a canned answer. This is what lets the agent adapt when the first query doesn't return enough data, or when a user asks a layered question (e.g. "compare this month vs last month and tell me what changed").
1. User query arrives over `/api/chat/stream`.
2. The LLM (Claude Sonnet 4 by default) sees:
   - `SYSTEM_PROMPT` - role, constraints, conversational tone
   - Full conversation history (for follow-ups)
   - 14 tool definitions - each with JSON schema and usage hints
3. Round 1 - the LLM calls one or more tools. Typical trajectory: `get_current_date` → `query_aws_costs` → maybe `get_cost_forecast`.
4. Reflection step - the engine injects a short system message asking "do you have enough data to answer? If not, what would you call next?". This is the hook that catches incomplete reasoning.
5. Rounds 2-4 - up to 3 additional tool-call rounds if the LLM decides it needs more. Hard cap at 4 total rounds to control cost.
6. Final synthesis - structured markdown with real numbers, formatted tables, and next-step recommendations. If a tool returned no data (empty account, no RIs, etc.), the agent states that explicitly rather than hallucinating.
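The loop above can be sketched as follows. The `llm` callable, the tool-registry shape, and the message dicts are simplified stand-ins, not the engine's actual API:

```python
# Minimal sketch of a capped multi-round tool loop with a reflection step.
MAX_ROUNDS = 4
REFLECTION = ("Do you have enough data to answer? "
              "If not, which tool would you call next?")

def run_turn(llm, tools, history, user_msg):
    """Run one conversation turn; returns (answer_text, rounds_used)."""
    messages = history + [{"role": "user", "content": user_msg}]
    for round_no in range(1, MAX_ROUNDS + 1):
        reply = llm(messages)  # returns {"tool_calls": [...]} or {"text": ...}
        calls = reply.get("tool_calls")
        if not calls:  # no tool requested -> this is the final synthesis
            return reply["text"], round_no
        for call in calls:  # execute each requested tool, feed results back
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
        # Reflection: nudge the model to check whether it can answer yet.
        messages.append({"role": "system", "content": REFLECTION})
    # Hard cap reached: force a final answer from the data gathered so far.
    return llm(messages)["text"], MAX_ROUNDS
```

The hard cap is the cost control: even a pathologically curious model stops after four rounds.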
| Event | When fired | Payload |
|---|---|---|
| `thinking` | LLM generates a `<thinking>` block | `{text}` |
| `tool_call` | LLM decides to call a tool | `{name, args}` |
| `tool_result` | Backend returns tool output | `{name, result}` |
| `answer` | LLM produces final markdown | `{text}` (streamed token-by-token) |
| `done` | Conversation turn ends | `{rounds, total_tokens}` |
| `error` | Any failure | `{message, retriable}` |
The frontend renders these in the Reasoning Trace panel on the right of the chat tab, colored by type, in real time. No spinner theater: if the agent is on round 3 of 4, the user sees exactly which tool is running.
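On the consumer side, dispatching on these events boils down to splitting the SSE stream on blank lines and reading the `event:`/`data:` fields. A minimal parser sketch (simplified; not the frontend's actual code, and it skips SSE edge cases like comments and `id:` fields):

```python
def parse_sse(stream_text: str):
    """Parse raw SSE text into a list of (event, data) tuples."""
    events = []
    event, data = "message", []  # SSE default event name is "message"
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":  # blank line terminates one event
            if data:
                events.append((event, "\n".join(data)))
            event, data = "message", []
    return events
```

In the real UI each `(event, data)` pair is routed to the trace panel (`thinking`, `tool_call`, `tool_result`) or appended to the answer bubble (`answer`).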
Most "chat with your data" demos use a single tool call and hope for the best. FinOps questions often need cross-referencing: cost by service, then instances in that service, then rightsizing recs for those instances. A multi-round loop with reflection lets the agent plan, execute, check, and re-plan, which is why the answers cite specific instance IDs and real dollar figures instead of vague strategies.
- Cost Explorer (`/api/report`) - calls `GetCostAndUsage` in `us-east-1` without a region filter, so you get all-region totals by default. Optionally groups by `REGION` for a per-region breakdown.
- Infrastructure (`/api/infrastructure`) - by default scans the region set in `AWS_DEFAULT_REGION`. Pass `?region=all` to parallel-scan every enabled region (18+ on typical accounts). The UI exposes a dropdown to switch.
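The `region=all` fan-out can be sketched with a thread pool: wall time is roughly the slowest region rather than the sum of all of them. Here `scan_one` is a stand-in for the real per-region boto3 describe calls:

```python
from concurrent.futures import ThreadPoolExecutor

def scan_all_regions(regions, scan_one, max_workers=8):
    """Run `scan_one(region)` across all regions in parallel.

    `scan_one` is a placeholder for the real boto3 describe calls;
    returns {region: result} keyed in the input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(scan_one, regions)  # preserves input order
    return dict(zip(regions, results))
```

With ~18 enabled regions and a few seconds of API calls each, this is the difference between a ~20s response and a multi-minute one.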
create-read-only.sh is the safety moat. You run it once with an admin AWS profile; it provisions everything the agent needs and proves the agent can't write:
1. Creates IAM user `finops-agent-readonly` using your admin profile
2. Attaches the AWS-managed `ReadOnlyAccess` policy (covers `ce:*`, `ec2:Describe*`, `rds:Describe*`, all the `*List*`/`*Get*` actions, plus `coh:*` for Cost Optimization Hub)
3. Generates access keys and writes them to:
   - `.env` (for the backend container - `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
   - `~/.aws/credentials` as profile `finops` (for your shell / debugging)
4. Verifies read works - runs `aws sts get-caller-identity` using the new keys
5. Verifies writes are BLOCKED - runs `aws s3 mb s3://devopsarg-finops-verify-readonly` with the new keys and expects HTTP 403 Forbidden. If the bucket gets created, the script aborts with a security warning and rolls back.
6. Prints the ARN for audit (redacted in public output - see the AWS IDENTITY CHECK block above)
Re-running the script rotates the access keys: it deletes the old ones, creates new ones, and rewrites `.env` and `~/.aws/credentials`. Safe to run on a schedule.
- Dev machines: every 30 days
- Shared demo boxes: every 7 days
- CI runners: per-run (the script takes ~15 seconds end to end)
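For the dev-machine cadence, a monthly crontab entry is one way to approximate "every 30 days". The path, admin profile name, and log file below are placeholders, not part of the project:

```cron
# Rotate the finops-agent read-only keys on the 1st of each month at 04:00.
# Adjust path/profile to your checkout; the script takes ~15s.
0 4 1 * *  /path/to/finops-agent/create-read-only.sh admin >> /var/log/finops-rotate.log 2>&1
```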
```
finops-agent/
├── backend/
│   ├── config/manager.py       - env-based config + feature flags
│   ├── llm/                    - Anthropic / OpenAI providers
│   ├── models/                 - Pydantic models
│   ├── tools/
│   │   ├── aws_costs.py        - 8 Cost Explorer tools
│   │   ├── aws_resources.py    - 7 infra inspection tools
│   │   ├── live_resources.py   - multi-region live AWS queries
│   │   ├── mock_data.py        - "Ribbon" fictional fintech data
│   │   ├── knowledge.py        - KB search tool
│   │   └── registry.py         - tool registry
│   ├── reasoning/engine.py     - 4-round loop with reflection
│   ├── reports/generator.py    - cost report builder
│   └── server/main.py          - FastAPI app, SSE streaming
├── frontend/
│   ├── index.html              - single-page UI with 27 preset questions
│   └── devopsarg-logo.png      - brand asset
├── scripts/
│   ├── seed_localstack.py      - LocalStack AWS resource seed
│   ├── setup.py                - knowledge base setup
│   └── test_connection.py      - AWS + LLM connectivity test
├── create-read-only.sh         - IAM read-only user provisioning
├── docker-compose.yml          - 3 services (localstack, backend, frontend)
├── nginx.conf                  - reverse proxy with SSE-safe buffering
└── .env.example                - all config documented
```
- Read-only IAM by construction (managed policy, not custom).
- No write paths in any tool - grep the `backend/tools/` directory for `create_*`, `delete_*`, `put_*`, `modify_*`, etc.; there aren't any.
- Identity check on startup logs the ARN and fails fast if invalid.
- Cost Optimization Hub recommendations are read-only queries; implementation is always human-in-the-loop (the agent suggests, you execute).
DevOps ARG builds this kind of tooling for Series A/B fintechs and scale-ups. If you want:
- A custom FinOps agent for your stack
- A managed deployment with auth + multi-tenant
- Actual execution on the recommendations (rightsizing, Savings Plans, Karpenter migrations)
Book a call at devopsarg.com or read our case studies:
- $237K/year AWS savings - concrete breakdown, real customer
- Karpenter + Spot + scale-to-zero - $392K/year
- FinOps dashboard with Grafana + Prometheus - $8K/mo waste found day 1
MIT
Built in Buenos Aires · devopsarg.com