
FinOps Intelligence Platform

Built by DevOps ARG · powered by Claude

An AI-powered FinOps agent for AWS cloud costs and infrastructure. Ask questions in natural language — the agent reasons across Cost Explorer, infrastructure metrics, and AWS's native recommendation APIs (Cost Optimization Hub, Compute Optimizer, Rightsizing, Savings Plans) to answer them.

Read-only by design. The agent uses a dedicated IAM user with the AWS-managed ReadOnlyAccess policy. It can't create, modify, or delete anything in your account — it reads metrics and suggests changes that you apply yourself.


🖼 Screenshots

1. Conversational chat with live reasoning trace

The main entry point. Users ask questions in plain English; the right-hand panel streams the reasoning — every tool call, every intermediate result, and the final synthesis. The sidebar on the left holds the 27 preset questions grouped by category.

_Screenshot: chat interface with reasoning trace — user asks 'What's our biggest cost driver this week?' and the agent responds with a structured breakdown._

2. Deep-dive chat example — NAT Gateway traffic analysis

A layered question ("how much traffic is flowing between AZs through NAT Gateway instead of staying within AZ?") produces a multi-round response: the agent queries Cost Explorer by usage type, correlates with the traffic pattern data, quantifies the optimization opportunity ($2,893/month), and recommends the concrete fix (AWS PrivateLink endpoints).

_Screenshot: chat showing cross-AZ and NAT Gateway cost analysis with savings recommendation._

3. Cost Overview dashboard

Landing page of the dashboard tab — last-7-days spend, monthly projection, savings-identified counter, active-services count, a 4-week trend line, service-breakdown donut, and the latest cost anomalies detected by the AWS Cost Anomaly Detection integration.

_Screenshot: Cost Overview dashboard — $6,500 last 7 days, $29.5k monthly projection, $4,470 savings identified, 13 active services, weekly trend chart, by-service donut, anomalies panel._

4. Services breakdown

Per-service grid view — every active AWS service with last-week / this-week / monthly projection, and a mini sparkline for the 4-week trend. Clicking any service drills into that service's cost detail on the chat tab.

_Screenshot: services breakdown grid showing EC2, RDS, ElastiCache, OpenSearch, S3, Data Transfer, EKS, CloudWatch, MSK, Lambda, Route 53, Secrets Manager, WAF with weekly costs and trend sparklines._

5. Live infrastructure health

EC2 / RDS / EKS / ElastiCache / OpenSearch / S3 cards, each showing monthly $ + a health indicator (OK / warning / critical). Single-region by default; region=all fans out to every enabled region in parallel (~20s round-trip on an 18-region account). Warnings surface the exact issue — "Primary at 80% CPU, downsize candidate", "2 clusters missing system metrics on 2% data upload".

_Screenshot: Infrastructure Health view with EC2, RDS, EKS, ElastiCache, OpenSearch, S3 cards — each with monthly cost, status, and warnings for CPU/memory/storage issues._

6. Optimization recommendations

Cost Optimization Hub results ranked by monthly savings. Each card explains what to change, the estimated monthly $ savings, a difficulty tag (easy / medium / hard), and an "Ask AI →" button that opens the chat with the recommendation as context — so you can ask "why this specific instance class?" or "what's the risk of this migration?" before executing.

_Screenshot: Cost Optimizer view — Savings Score 62/100, $4,470/month identified across 5 recommendations (RDS downsize -$520, VPC endpoints -$450, Savings Plans -$870, Graviton migration -$320, more), each with an Ask AI button._



What it answers

The sidebar ships with 27 high-value FinOps questions across 9 categories, all drawn from real DevOps ARG case studies. Pick one with a click, or ask your own in free-form text.

| Category | Example question | What it does under the hood |
|---|---|---|
| ⚡ Quick insights | "What's driving my AWS bill this month?" | `get_current_date` → `query_aws_costs` grouped by service → ranks top 10 by $ |
| 🌐 Networking & data transfer | "How much am I spending on NAT Gateway?" | Cost Explorer filter on AWS Data Transfer + NAT usage type → summarizes by AZ |
| 🖥 Compute optimization | "Which EC2 instances are oversized?" | Queries `get_rightsizing_recommendations` + `get_compute_optimizer_recommendations` → annotates with monthly savings |
| 💸 Commitments | "What's my Savings Plans coverage?" | `get_savings_plans_coverage` → compares covered vs on-demand, flags gaps |
| 💾 Storage & databases | "Do I have orphaned EBS volumes?" | `list_ebs_volumes` → filters unattached/available → sums monthly $ at gp2/gp3 rates |
| 📊 Observability | "How is my CloudWatch Logs cost trending?" | Cost Explorer filter on AWS CloudWatch service + Logs usage type, 4-week series |
| 🔄 Real-time workloads | "How many WebSocket connections am I running?" | `describe_load_balancers` + CloudWatch active-connections metric |
| 📈 Predictive scaling | "What's my safe baseline with Spot?" | `get_spot_instance_price_history` + EC2 inventory → Spot interruption risk per family |
| 🤖 AI Ops | "What's the ROI of reducing MTTR?" | Knowledge-base lookup for past incident ARR impact + recent cost-burst patterns |

Every preset question in the sidebar has a hover tooltip explaining which tools it triggers — a nice teaching moment for anyone new to FinOps.
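To make the "under the hood" column concrete, here's a hypothetical sketch (not the repo's actual code) of what a tool like `query_aws_costs` might do: build the parameter dict for Cost Explorer's `GetCostAndUsage` API, then rank services from the response. Only the request/response shape follows the real AWS API; the function names and structure are illustrative.

```python
from datetime import date, timedelta

def build_cost_query(days: int = 30) -> dict:
    """Parameters for Cost Explorer GetCostAndUsage, grouped by service."""
    end = date.today()
    start = end - timedelta(days=days)
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }

def top_services(response: dict, n: int = 10) -> list[tuple[str, float]]:
    """Rank services by total cost across all returned periods."""
    totals: dict[str, float] = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            service = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[service] = totals.get(service, 0.0) + amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Live mode would pass the params to boto3 (Cost Explorer lives in us-east-1):
#   ce = boto3.client("ce", region_name="us-east-1")
#   ranked = top_services(ce.get_cost_and_usage(**build_cost_query()))
```

Keeping the parsing pure (a dict in, a ranking out) is also what makes mock mode trivial: the same `top_services` runs unchanged against canned responses.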

Architecture

docker compose up --build    (3 services)

┌─────────────────┐     ┌─────────────────────────────┐     ┌──────────────────┐
│  Frontend       │────▶│  FastAPI Backend            │────▶│  AWS APIs        │
│  nginx (:3000)  │◀────│  (:8000)                    │◀────│  (read-only)     │
│                 │ SSE │                             │     │                  │
│  Chat + Dash    │     │  Reasoning Engine (4 rounds)│     │  Cost Explorer   │
│  Infrastructure │     │  14 tool definitions        │     │  EC2/RDS/EKS/... │
│  Optimizer      │     │  Claude Sonnet 4            │     │  Cost Opt Hub    │
└─────────────────┘     └─────────────────────────────┘     └──────────────────┘

Data flow for the chat: user message → reasoning engine picks tools → boto3 calls AWS (or returns mock data) → LLM synthesizes answer → streamed back over SSE so the trace panel shows reasoning in real time.

Quick start

1. Mock mode (no AWS account needed)

Great for demos, screencasts, and playing with the UI.

cp .env.example .env
# Edit .env: ANTHROPIC_API_KEY=sk-ant-...
# Set USE_MOCK_DATA=true
docker compose up --build
# Open http://localhost:3000

The mock data is a fictional Series B LatAm fintech called "Ribbon" with ~$28K/mo AWS spend across 3 regions. All dates are relative to today, so the demo never looks stale.

2. Live AWS mode (read-only)

For running against your real AWS account with safety guarantees.

# Step 1: create a dedicated read-only IAM user
./create-read-only.sh <your-admin-profile>

# This creates an IAM user named `finops-agent-readonly` with ReadOnlyAccess,
# generates keys, writes them to .env + ~/.aws/credentials as profile "finops",
# and verifies write attempts are blocked (tries s3 mb, expects 403).

# Step 2: start the stack
docker compose up --build
# Open http://localhost:3000

On startup the backend prints an identity check so you know which ARN is being used:

============================================================
AWS IDENTITY CHECK
  Account: <your-aws-account-id>
  ARN:     arn:aws:iam::<your-aws-account-id>:user/finops-agent-readonly
  UserId:  <IAM-user-unique-id>
============================================================

If the ARN doesn't contain "readonly" you'll get a WARNING in the logs — the identity check fails startup if it can't validate live-mode credentials.
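The check itself can be sketched in a few lines. This is illustrative, not the repo's actual code; it assumes it is handed the dict that `boto3.client("sts").get_caller_identity()` returns.

```python
def check_identity(identity: dict, require_readonly: bool = True) -> str:
    """Print the identity banner and fail fast if the ARN doesn't look read-only."""
    arn = identity["Arn"]
    print("=" * 60)
    print("AWS IDENTITY CHECK")
    print(f"  Account: {identity['Account']}")
    print(f"  ARN:     {arn}")
    print(f"  UserId:  {identity['UserId']}")
    print("=" * 60)
    if require_readonly and "readonly" not in arn.lower():
        raise RuntimeError(f"WARNING: {arn} does not look read-only; refusing to start")
    return arn

# Live mode would feed it real STS output:
#   check_identity(boto3.client("sts").get_caller_identity())
```

Raising instead of merely logging is what turns the banner into a guarantee: the container never serves traffic under the wrong credentials.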

Feature flags (.env)

| Variable | Default | What it does |
|---|---|---|
| `AI_PROVIDER` | `anthropic` | `anthropic` or `openai` |
| `ANTHROPIC_API_KEY` | — | Required for `anthropic` |
| `ANTHROPIC_MODEL` | `claude-sonnet-4-20250514` | Model for reasoning |
| `USE_LOCALSTACK` | `false` | Force LocalStack demo (no real AWS) |
| `USE_MOCK_DATA` | `false` in live, `true` in LocalStack | Override: return mock data from `/api/report`, `/api/infrastructure`, `/api/optimize` even in live mode |
| `AWS_DEFAULT_REGION` | `us-east-1` | Default region for single-region infra scans |
| `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | — | Read-only keys from `create-read-only.sh` |

When USE_MOCK_DATA=true the backend returns the fictional "Ribbon" data. The dashboard header shows a yellow MOCK DATA badge so users can't mistake it for real numbers.

Endpoints

| Method | Path | Description |
|---|---|---|
| GET | `/api/health` | Status + mode + `USE_MOCK_DATA` flag |
| POST | `/api/chat/stream` | SSE chat with live reasoning trace |
| POST | `/api/chat` | Non-streaming chat (returns full response) |
| GET | `/api/report` | Weekly cost report (by service/account/region/env/team) |
| POST | `/api/report/refresh` | Regenerate from live data |
| GET | `/api/infrastructure?region=<name>` | EC2/RDS/EKS/... health. `region=all` scans every enabled region in parallel (~20s); omitted → uses `AWS_DEFAULT_REGION` |
| GET | `/api/optimize` | Recommendations from AWS Cost Optimization Hub |

The reasoning engine

backend/reasoning/engine.py runs a multi-round agentic loop — not a single prompt with a canned answer. This is what lets the agent adapt when the first query doesn't return enough data, or when a user asks a layered question (e.g. "compare this month vs last month and tell me what changed").

Flow

  1. User query arrives over /api/chat/stream.
  2. LLM (Claude Sonnet 4 by default) sees:
    • SYSTEM_PROMPT — role, constraints, conversational tone
    • Full conversation history (for follow-ups)
    • 14 tool definitions — each with a JSON schema and usage hints
  3. Round 1 — the LLM calls one or more tools. Typical trajectory: get_current_date → query_aws_costs → maybe get_cost_forecast.
  4. Reflection step — the engine injects a short system message asking "do you have enough data to answer? If not, what would you call next?". This is the hook that catches incomplete reasoning.
  5. Rounds 2-4 — up to 3 additional tool-call rounds if the LLM decides it needs more. Hard cap at 4 total rounds to control cost.
  6. Final synthesis — structured markdown with real numbers, formatted tables, and next-step recommendations. If a tool returned no data (empty account, no RIs, etc.), the agent states that explicitly rather than hallucinating.
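The flow above boils down to a capped loop with one reflection hook. The following is a stripped-down, hypothetical sketch, not the actual `engine.py`; `llm` and `tools` are injected stand-ins, and the reply format (`tool_calls` vs `answer`) is an assumption for illustration.

```python
MAX_ROUNDS = 4
REFLECTION = ("Do you have enough data to answer? "
              "If not, which tool would you call next?")

def run_agent(query: str, llm, tools: dict) -> str:
    """Tool-use loop: up to MAX_ROUNDS rounds, reflection injected after round 1."""
    messages = [{"role": "user", "content": query}]
    for round_no in range(1, MAX_ROUNDS + 1):
        # llm() returns either {"tool_calls": [...]} or {"answer": "..."}
        reply = llm(messages)
        if "answer" in reply:
            return reply["answer"]
        for call in reply["tool_calls"]:
            # Execute each requested tool and feed the result back
            result = tools[call["name"]](**call.get("args", {}))
            messages.append({"role": "tool", "name": call["name"],
                             "content": result})
        if round_no == 1:
            # Reflection hook: nudge the model to check its own coverage
            messages.append({"role": "system", "content": REFLECTION})
    return "Round budget exhausted; answering with partial data."
```

The hard cap is the cost control: even a confused model can never burn more than four LLM calls per user turn.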

_Screenshot: reasoning trace panel showing round-by-round tool calls — Round 1 get_current_date, Round 2 query_aws_costs, Round 3 ..._

SSE event types (live streaming)

| Event | When fired | Payload |
|---|---|---|
| `thinking` | LLM generates a `<thinking>` block | `{text}` |
| `tool_call` | LLM decides to call a tool | `{name, args}` |
| `tool_result` | Backend returns tool output | `{name, result}` |
| `answer` | LLM produces final markdown | `{text}` (streamed token-by-token) |
| `done` | Conversation turn ends | `{rounds, total_tokens}` |
| `error` | Any failure | `{message, retriable}` |

The frontend renders these in the Reasoning Trace panel on the right of the chat tab, colored by type, in real time. No spinner theater — if the agent is on round 3 of 4, the user sees exactly which tool is running.
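Any SSE-capable client can consume the stream. The sketch below is a minimal hand-rolled parser (illustrative, not part of the repo) that turns raw `event:`/`data:` lines into `(event, payload)` pairs matching the table above:

```python
import json

def parse_sse(raw: str) -> list[tuple[str, dict]]:
    """Parse a raw SSE stream into (event, payload) pairs.

    SSE events are 'event:' + one or more 'data:' lines,
    terminated by a blank line.
    """
    events = []
    event, data = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and event:
            # Blank line closes the event; multi-line data joins with \n per spec
            events.append((event, json.loads("\n".join(data))))
            event, data = None, []
    return events
```

A live client would read the response incrementally (e.g. `requests.post(..., stream=True)` or the browser's `EventSource`) and dispatch on the event name to update the trace panel.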

Why this matters

Most "chat with your data" demos make a single tool call and hope for the best. FinOps questions often need cross-referencing: cost by service, then the instances within that service, then rightsizing recommendations for those instances. A multi-round loop with reflection lets the agent plan, execute, check, and re-plan — which is why its answers cite specific instance IDs and real dollar figures instead of vague strategies.

How multi-region scan works

  • Cost Explorer (/api/report) β€” calls GetCostAndUsage in us-east-1 without a region filter, so you get all-region totals by default. Optionally groups by REGION for per-region breakdown.
  • Infrastructure (/api/infrastructure) β€” by default scans the region set in AWS_DEFAULT_REGION. Pass ?region=all to parallel-scan every enabled region (18+ on typical accounts). The UI exposes a dropdown to switch.

The read-only setup script

create-read-only.sh is the safety moat. You run it once with an admin AWS profile; it provisions everything the agent needs and proves the agent can't write:

  1. Creates IAM user finops-agent-readonly using your admin profile
  2. Attaches the AWS-managed ReadOnlyAccess policy (covers ce:*, ec2:Describe*, rds:Describe*, all the *List* / *Get* — plus coh:* for Cost Optimization Hub)
  3. Generates access keys and writes them to:
    • .env (for the backend container — AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
    • ~/.aws/credentials as profile finops (for your shell / debugging)
  4. Verifies read works — runs aws sts get-caller-identity using the new keys
  5. Verifies writes are BLOCKED — runs aws s3 mb s3://devopsarg-finops-verify-readonly with the new keys; expects HTTP 403 Forbidden. If the bucket gets created, the script aborts with a security warning and rolls back.
  6. Prints the ARN for audit (redacted in public output — see AWS IDENTITY CHECK block above)

Re-running the script rotates the access keys — it deletes the old pair, creates a new one, and rewrites .env and ~/.aws/credentials. Safe to run on a schedule.

Recommended rotation cadence

  • Dev machines: every 30 days
  • Shared demo boxes: every 7 days
  • CI runners: per-run (the script takes ~15 seconds end to end)

Project structure

finops-agent/
├── backend/
│   ├── config/manager.py          — env-based config + feature flags
│   ├── llm/                       — Anthropic / OpenAI providers
│   ├── models/                    — Pydantic models
│   ├── tools/
│   │   ├── aws_costs.py           — 8 Cost Explorer tools
│   │   ├── aws_resources.py       — 7 infra inspection tools
│   │   ├── live_resources.py      — multi-region live AWS queries
│   │   ├── mock_data.py           — "Ribbon" fictional fintech data
│   │   ├── knowledge.py           — KB search tool
│   │   └── registry.py            — tool registry
│   ├── reasoning/engine.py        — 4-round loop with reflection
│   ├── reports/generator.py       — cost report builder
│   └── server/main.py             — FastAPI app, SSE streaming
├── frontend/
│   ├── index.html                 — single-page UI with 27 preset questions
│   └── devopsarg-logo.png         — brand asset
├── scripts/
│   ├── seed_localstack.py         — LocalStack AWS resource seed
│   ├── setup.py                   — knowledge base setup
│   └── test_connection.py         — AWS + LLM connectivity test
├── create-read-only.sh            — IAM read-only user provisioning
├── docker-compose.yml             — 3 services (localstack, backend, frontend)
├── nginx.conf                     — reverse proxy with SSE-safe buffering
└── .env.example                   — all config documented

Security posture

  • Read-only IAM by construction (managed policy, not custom).
  • No write paths in any tool β€” grep the backend/tools/ directory for create_*, delete_*, put_*, modify_* etc. β€” there aren't any.
  • Identity check on startup logs the ARN and fails-fast if invalid.
  • Cost Optimization Hub recommendations are read-only queries; implementation is always human-in-the-loop (the agent suggests, you execute).

Want help running it in production?

DevOps ARG builds this kind of tooling for Series A/B fintechs and scale-ups. If you want:

  • A custom FinOps agent for your stack
  • A managed deployment with auth + multi-tenant
  • Actual execution on the recommendations (rightsizing, Savings Plans, Karpenter migrations)

Book a call at devopsarg.com or read our case studies.

License

MIT


Built in Buenos Aires · devopsarg.com
