Comprehensive observability platform for monitoring multiple AI development tools with cost tracking, user attribution, and privacy-first design.
- Overview
- Key Features
- Quick Start
- Supported AI Tools
- Architecture
- Dashboards
- Configuration
- Cost Tracking
- Deployment Options
- Documentation
- Project Status
- Contributing
- License
- Acknowledgments
As AI-powered development tools become essential to modern software engineering, understanding their usage patterns, costs, and performance has become critical. AI Development Insight Stack provides comprehensive observability for your AI tooling ecosystem, giving you the visibility you need to optimize costs, improve performance, and ensure compliance.
Modern development teams use multiple AI tools (Claude Code, GitHub Copilot, OpenAI Codex, etc.), but lack visibility into:
- Cost attribution - Which teams, projects, and users are driving AI costs?
- Usage patterns - Who's using what, when, and how effectively?
- Performance metrics - Are API calls fast enough? Are we hitting rate limits?
- Error tracking - What's failing and why?
This platform provides a production-ready, privacy-first observability stack that:
- Collects telemetry from multiple AI tools using OpenTelemetry standards
- Tracks costs in real-time with accurate pricing across 15+ AI models
- Provides 8 specialized Grafana dashboards for comprehensive insights
- Supports three deployment options: Docker Compose (local), AWS ECS (cloud), and Kubernetes (enterprise)
- Respects privacy with GDPR-compliant user ID hashing and PII removal
- Engineering Leaders - Understand AI tool ROI and optimize team spending
- DevOps Teams - Monitor AI tool performance and infrastructure health
- Finance Teams - Track and forecast AI development costs
- Security Teams - Ensure compliance and audit AI tool usage
- Individual Developers - Understand personal usage patterns and optimize workflows
Monitor Claude Code, GitHub Copilot, OpenAI API, and any tool with OpenTelemetry integration - all in one unified platform.
Accurate cost calculation for 15+ AI models with automatic updates from pricing tables. Track costs by user, project, team, and organization.
Understand who's using what, when, and for which projects. Perfect for chargeback, capacity planning, and usage optimization.
Pre-built dashboards covering:
- Overview & health monitoring
- Performance analysis & optimization
- Error tracking & debugging
- Cost analysis & forecasting
- User activity & attribution
- Project analytics
- Multi-tool comparison
- Resource usage & rate limits
- GDPR Compliant - User IDs automatically hashed using SHA-256
- PII Removal - Usernames, email addresses, and user agents stripped
- Configurable Retention - Control how long data is stored
- Secure by Default - Best practices baked into all deployment options
- Docker Compose - 5-minute local setup for development
- AWS ECS - Production-ready cloud deployment with Terraform
- Kubernetes - Enterprise-grade orchestration with high availability
- Auto-scaling based on load
- High availability across all components
- Persistent storage with automatic backups
- Health checks and self-healing
- Comprehensive logging and monitoring
Built on industry-standard open-source tools:
- OpenTelemetry - Vendor-neutral telemetry collection
- Prometheus - Time-series metrics storage
- Loki - Log aggregation and querying
- Tempo - Distributed tracing
- Grafana - Visualization and alerting
Get up and running in 5 minutes with Docker Compose:
- Docker and Docker Compose installed (Get Docker)
- 4GB RAM available for Docker
- Ports available: 3001, 3100, 3200, 4317, 4318, 9090
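Before starting the stack, it can help to confirm that the ports listed above are actually free. The following is a minimal sketch using only the Python standard library; the helper names `port_is_free` and `busy_ports` are hypothetical and are not part of this repository.

```python
import socket

# Ports taken from the prerequisites list above.
REQUIRED_PORTS = [3001, 3100, 3200, 4317, 4318, 9090]


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports=REQUIRED_PORTS):
    """Return the subset of ports that already have a listener."""
    return [p for p in ports if not port_is_free(p)]


if __name__ == "__main__":
    busy = busy_ports()
    print("All ports free" if not busy else f"Ports in use: {busy}")
```

If any port is reported as in use, stop the conflicting service before starting the stack.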
# 1. Clone the repository
git clone git@github.com:PimpMyNines/AI-Development-Insight-Stack.git
cd AI-Development-Insight-Stack
# 2. Configure environment variables
cp .env.example .env
# Edit .env to set your Grafana password (IMPORTANT!)
# Default is admin/admin - CHANGE THIS for production!
# 3. Start the stack
docker-compose -f docker/docker-compose.yml up -d
# 4. Verify all services are running
docker-compose -f docker/docker-compose.yml ps
Expected output: All 5 services (otel-collector, prometheus, loki, tempo, grafana) should show "Up".
# Set environment variable
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Or add to Claude Code settings.json
{
  "telemetry": {
    "enabled": true,
    "endpoint": "http://localhost:4317",
    "protocol": "grpc"
  }
}
# Install the wrapper
pip install -r bridges/openai-wrapper/python/requirements.txt
# Use as drop-in replacement
import openai_wrapper as openai
client = openai.OpenAI(api_key="your-key")
# All API calls automatically tracked!
# Use our Python SDK
from otlp_client import OTLPClient
client = OTLPClient(
    endpoint="http://localhost:4317",
    tool_name="my-ai-tool",
    user_id="developer@company.com",
    project_name="my-project"
)
client.track_completion(
    model="gpt-4",
    input_tokens=100,
    output_tokens=50,
    latency_ms=1234
)
See bridges/generic-otlp/ for Node.js examples.
- Open your browser to http://localhost:3001
- Log in with the credentials from .env (default: admin/admin). ⚠️ IMPORTANT: Change the default password immediately!
- Navigate to Dashboards → AI Dev Tools folder
- Open any dashboard to see your telemetry
# Check OpenTelemetry Collector is receiving data
curl http://localhost:4318/v1/traces
# Check Prometheus has metrics
curl 'http://localhost:9090/api/v1/query?query=ai_dev_insight_request_total'
# View container logs
docker-compose -f docker/docker-compose.yml logs -f otel-collector
That's it! You now have a complete AI development observability platform running locally.
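The Prometheus check above can also be scripted. This is a minimal sketch against the standard Prometheus instant-query endpoint (`/api/v1/query`), using only the Python standard library. The function names are hypothetical, and the metric name is the one used in the curl example.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def series_count(response_body: str) -> int:
    """Count the series in a Prometheus query response (0 on error status)."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return 0
    return len(payload["data"]["result"])


def check_metric(base_url: str = "http://localhost:9090",
                 metric: str = "ai_dev_insight_request_total") -> int:
    """Query Prometheus and return how many series exist for the metric."""
    with urllib.request.urlopen(build_query_url(base_url, metric), timeout=5) as resp:
        return series_count(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(f"{check_metric()} series found")
    except OSError as exc:  # URLError is a subclass of OSError
        print(f"Prometheus not reachable: {exc}")
```

A count of zero usually means no telemetry has reached the collector yet, not that the stack is broken.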
| AI Tool | Status | Integration Method | Documentation |
|---|---|---|---|
| Claude Code | ✅ Production | Native OTLP support | Setup Guide |
| OpenAI API | ✅ Production | Python wrapper library | Integration Guide |
| Generic OTLP | ✅ Production | Python/Node.js SDK | SDK Examples |
| GitHub Copilot | 🚧 Planned | VS Code extension | Coming in Phase 2 |
| AWS CodeWhisperer | 🚧 Planned | IDE plugin | Coming in Phase 3 |
| Cursor AI | 🚧 Planned | Native integration | Coming in Phase 3 |
| Codeium | 🚧 Planned | SDK wrapper | Coming in Phase 3 |
Adding Your Own Tool? Use our Generic OTLP SDK to integrate any AI tool in minutes.
graph TB
subgraph AI Tools
CC[Claude Code]
OAI[OpenAI API]
GHC[GitHub Copilot]
CUST[Custom Tools]
end
subgraph Telemetry Collection
OTEL[OpenTelemetry Collector<br/>Port 4317/4318]
end
subgraph Storage Backends
PROM[Prometheus<br/>Metrics Storage<br/>15 days retention]
LOKI[Loki<br/>Log Storage<br/>31 days retention]
TEMPO[Tempo<br/>Trace Storage<br/>30 days retention]
end
subgraph Visualization
GRAF[Grafana Dashboards<br/>8 Pre-built Views]
end
CC -->|OTLP gRPC| OTEL
OAI -->|OTLP HTTP| OTEL
GHC -->|OTLP gRPC| OTEL
CUST -->|OTLP| OTEL
OTEL -->|Metrics Export| PROM
OTEL -->|Logs Export| LOKI
OTEL -->|Traces Export| TEMPO
PROM --> GRAF
LOKI --> GRAF
TEMPO --> GRAF
- Collection - AI tools send telemetry to the OpenTelemetry Collector via OTLP (gRPC port 4317 or HTTP port 4318)
- Processing - The Collector processes telemetry:
  - Hashes user IDs for privacy (SHA-256)
  - Removes PII (names, emails, user agents)
  - Calculates costs using the pricing table
  - Adds standard attributes for multi-tool support
  - Batches data for efficiency
- Storage - Processed data is exported to specialized backends:
  - Prometheus - Metrics (request rates, latency, costs)
  - Loki - Logs (errors, debug info, events)
  - Tempo - Traces (end-to-end request flows)
- Visualization - Grafana queries all backends to provide unified dashboards
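The privacy part of the processing step can be illustrated in a few lines. In the platform this work happens inside the Collector; the sketch below only mirrors the idea in Python. The attribute keys (`user.id`, `user.name`, `user.email`, `http.user_agent`) and the function names are assumptions for illustration, not the project's actual schema.

```python
import hashlib

# Hypothetical attribute keys treated as PII in this sketch.
PII_KEYS = {"user.name", "user.email", "http.user_agent"}


def hash_user_id(user_id: str) -> str:
    """Return the SHA-256 hex digest of a user ID."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


def scrub_attributes(attrs: dict) -> dict:
    """Drop PII keys and replace user.id with its SHA-256 hash."""
    cleaned = {k: v for k, v in attrs.items() if k not in PII_KEYS}
    if "user.id" in cleaned:
        cleaned["user.id"] = hash_user_id(str(cleaned["user.id"]))
    return cleaned


if __name__ == "__main__":
    raw = {
        "user.id": "developer@company.com",
        "user.email": "developer@company.com",
        "http.user_agent": "example-agent/1.0",
        "model": "gpt-4",
    }
    print(scrub_attributes(raw))
```

Because the hash is deterministic, the same user always maps to the same identifier, which keeps per-user attribution working without storing the raw ID.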
| Component | Role | Endpoint | Retention |
|---|---|---|---|
| OpenTelemetry Collector | Receive, process, and route telemetry | localhost:4317 (gRPC), localhost:4318 (HTTP) | N/A |
| Prometheus | Time-series metrics storage | localhost:9090 | 15 days |
| Loki | Log aggregation and storage | localhost:3100 | 31 days |
| Tempo | Distributed tracing backend | localhost:3200 | 30 days |
| Grafana | Dashboards and visualization | localhost:3001 | N/A |
Resource Usage (typical single user):
- CPU: ~0.5 cores
- Memory: ~2GB total
- Disk: ~10GB for 30 days of data
Purpose: High-level health monitoring and quick status check across all AI tools.
Key Metrics:
- Request rate (requests/second)
- Success rate (% of 2xx responses)
- P50/P95/P99 latency
- Error count by tool
- Active sessions
- Recent logs
Best For: Daily check-ins, incident detection, executive summaries
Purpose: Deep dive into how each AI tool is being used.
Key Metrics:
- Requests by tool over time
- Model usage distribution (GPT-4 vs Claude vs others)
- Request throughput by endpoint
- Average response time by method
- Concurrent requests per tool
Best For: Usage pattern analysis, tool comparison, capacity planning
Purpose: Understand who's using AI tools and how.
Key Metrics:
- Active users by tool
- Session duration analysis
- User activity heatmap
- Top users by request count
- User retention metrics
Best For: User adoption tracking, engagement analysis, chargeback
Purpose: Track and optimize AI development costs.
Key Metrics:
- Cost per hour/day/month
- Token usage (input vs output)
- Cost by user, project, and team
- Cost by model (which models are expensive?)
- Budget alerts and forecasting
Best For: Cost optimization, budget planning, financial reporting
💡 Pro Tip: Set budget alerts to notify when daily costs exceed thresholds!
Purpose: Optimize latency and throughput.
Key Metrics:
- Latency percentiles by endpoint
- Token processing speed
- Queue depth and wait times
- Rate limit headroom
Purpose: Identify and resolve issues quickly.
Key Metrics:
- Error rate trends
- HTTP status code distribution (4xx vs 5xx)
- Top error sources
- Error logs with context
- Failed request traces
Purpose: Compare performance across different AI tools.
Key Metrics:
- Cost per request by tool
- Latency comparison
- Success rate by tool
- Feature usage patterns
Purpose: Monitor infrastructure and avoid rate limits.
Key Metrics:
- Rate limit consumption
- Infrastructure CPU/memory
- Network bandwidth
- Storage usage
Screenshots coming soon - dashboards auto-populate once you start sending telemetry!
Key configuration options in .env:
# Grafana Credentials (CHANGE THESE!)
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=change-this-password
# Environment Label
ENVIRONMENT=development
# Data Retention
PROMETHEUS_RETENTION=15d # Metrics retention
LOKI_RETENTION_HOURS=744 # 31 days
TEMPO_RETENTION_HOURS=720 # 30 days
# Resource Limits
OTEL_MEMORY_LIMIT=512M
PROMETHEUS_MEMORY_LIMIT=2G
See .env.example for all available options.
Increase retention for compliance:
# Edit .env
PROMETHEUS_RETENTION=90d
LOKI_RETENTION_HOURS=2160 # 90 days
TEMPO_RETENTION_HOURS=2160 # 90 days
# Restart services
docker-compose -f docker/docker-compose.yml restart
Note: Longer retention requires more disk space (approximately 3GB per 30 days per user).
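Prometheus takes its retention in days, while the Loki and Tempo settings are expressed in hours, so it is easy to get the conversion wrong. The sketch below derives all three values from a single number of days and applies the rough disk estimate from the note above. The function names are hypothetical, and the 3GB figure is only the approximation quoted in this document.

```python
GB_PER_30_DAYS_PER_USER = 3.0  # rough figure quoted in the note above


def retention_env(days: int) -> dict:
    """Return .env retention values for a retention period given in days."""
    hours = days * 24
    return {
        "PROMETHEUS_RETENTION": f"{days}d",
        "LOKI_RETENTION_HOURS": str(hours),
        "TEMPO_RETENTION_HOURS": str(hours),
    }


def estimate_disk_gb(days: int, users: int = 1) -> float:
    """Rough disk estimate, scaling linearly with days and users."""
    return GB_PER_30_DAYS_PER_USER * days / 30 * users


if __name__ == "__main__":
    for key, value in retention_env(90).items():
        print(f"{key}={value}")
    print(f"Estimated disk: ~{estimate_disk_gb(90, users=5):.0f}GB")
```

Real disk usage depends on telemetry volume, so treat the estimate as a starting point for sizing.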
AI model pricing changes frequently. Update costs in configs/pricing/pricing-table.yaml:
claude_models:
  claude-opus-4:
    input_price_per_mtok: 15.00   # Update this
    output_price_per_mtok: 75.00  # And this
The OpenTelemetry Collector automatically reloads changes within 60 seconds.
- OpenTelemetry Collector: configs/otel/collector-config.yaml
- Prometheus: configs/prometheus/prometheus.yml
- Loki: configs/loki/local-config.yaml
- Tempo: configs/tempo/tempo.yaml
- Grafana: configs/grafana/
The platform calculates costs in real-time using a hybrid approach:
- API-Provided Costs (preferred) - If the AI tool's API includes cost data, use it directly
- Calculated Costs (fallback) - Calculate from token counts using the pricing table
Cost Calculation Formula:
Total Cost = (Input Tokens / 1,000,000 × Input Price) +
(Output Tokens / 1,000,000 × Output Price)
Precision: $0.0001 (4 decimal places)
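The hybrid approach and the formula can be sketched as follows. This is an illustration, not the Collector's implementation. The `PRICING` dictionary holds a few per-million-token prices listed in this document, the model keys and function name are assumptions, and rounding half-up to four decimal places is one possible reading of the stated precision.

```python
from decimal import Decimal, ROUND_HALF_UP

MTOK = Decimal(1_000_000)
PRECISION = Decimal("0.0001")  # 4 decimal places

# (input price, output price) in USD per 1M tokens; illustrative subset.
PRICING = {
    "claude-opus-4": (Decimal("15.00"), Decimal("75.00")),
    "gpt-4": (Decimal("30.00"), Decimal("60.00")),
    "gpt-3.5-turbo": (Decimal("0.50"), Decimal("1.50")),
}


def calculate_cost(model, input_tokens, output_tokens, api_cost=None):
    """Prefer an API-provided cost; otherwise compute from token counts."""
    if api_cost is not None:
        total = Decimal(str(api_cost))
    else:
        input_price, output_price = PRICING[model]
        total = (Decimal(input_tokens) / MTOK * input_price
                 + Decimal(output_tokens) / MTOK * output_price)
    return total.quantize(PRECISION, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 100 input + 50 output tokens on GPT-4: 0.003 + 0.003 USD
    print(calculate_cost("gpt-4", 100, 50))
```

Using `Decimal` rather than floats avoids accumulating binary rounding errors when many small per-request costs are summed.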
Current pricing as of January 2025:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Provider |
|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | Anthropic |
| Claude Sonnet 4 | $3.00 | $15.00 | Anthropic |
| Claude Haiku 3.5 | $0.80 | $4.00 | Anthropic |
| GPT-4 Turbo | $10.00 | $30.00 | OpenAI |
| GPT-4 | $30.00 | $60.00 | OpenAI |
| GPT-3.5 Turbo | $0.50 | $1.50 | OpenAI |
| Gemini Pro | $0.50 | $1.50 | Google |
Plus 9 more models - See configs/pricing/pricing-table.yaml for complete pricing.
Configure automatic alerts when costs exceed thresholds:
| Level | User (per day) | Project (per day) | Organization (per day) |
|---|---|---|---|
| Warning | $50 | $200 | $1,000 |
| Critical | $100 | $500 | $2,500 |
Alerts appear in Grafana dashboards and can trigger notifications (email, Slack, PagerDuty).
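The threshold table above maps directly onto a small lookup. The sketch below classifies a daily cost for a given scope. In the platform itself, alerting is handled through Grafana, so the function and scope names here are hypothetical and only illustrate the logic.

```python
# (warning, critical) daily thresholds in USD, from the table above.
THRESHOLDS = {
    "user": (50, 100),
    "project": (200, 500),
    "organization": (1000, 2500),
}


def alert_level(scope: str, daily_cost: float) -> str:
    """Return 'critical', 'warning', or 'ok' for a daily cost in USD."""
    warning, critical = THRESHOLDS[scope]
    # An alert fires only when the cost exceeds the threshold.
    if daily_cost > critical:
        return "critical"
    if daily_cost > warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(alert_level("user", 75))        # between the user thresholds
    print(alert_level("project", 650))    # above the project critical level
```

Whether a cost exactly equal to a threshold should trigger an alert is a policy choice; this sketch treats "exceed" as strictly greater than.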
The platform automatically suggests optimizations:
- Model downgrade - "Switch from GPT-4 to GPT-3.5 for simple tasks (10x cost reduction)"
- Wasteful patterns - "User retries failed requests 50+ times/day"
- Inefficient usage - "Project generates 90% output tokens in error responses"
Choose the deployment that fits your needs:
| Feature | Docker Compose | AWS ECS | Kubernetes |
|---|---|---|---|
| Setup Time | 5 minutes | 30 minutes | 45 minutes |
| Best For | Local dev, single user | Production, small-medium teams | Enterprise, large teams |
| High Availability | ❌ No | ✅ Yes | ✅ Yes |
| Auto-Scaling | ❌ No | ✅ Yes | ✅ Yes |
| Cost | Free (local) | ~$100-300/month | ~$150-500/month |
| Maintenance | Low | Medium | Medium-High |
| Multi-Region | ❌ No | ✅ Yes | |
| Load Balancing | ❌ No | ✅ Built-in | ✅ Built-in |
| Persistent Storage | Local volumes | EBS + S3 | PV + S3 |
| Backup/Recovery | Manual | Automated | Automated |
Pros:
- Fastest setup (5 minutes)
- No cloud costs
- Perfect for testing and development
- Easy to modify and experiment
Cons:
- No high availability
- Single machine only
- Manual backups
- Not suitable for production
Quick Start: See Quick Start section above.
Pros:
- Production-ready with high availability
- Auto-scaling based on load
- Managed infrastructure (less maintenance)
- AWS-native security and monitoring
- Cost-effective for small-medium teams
Cons:
- AWS-specific (vendor lock-in)
- Requires AWS account and permissions
- Monthly costs (~$100-300)
Quick Start:
# Coming in Phase 2!
cd infrastructure/ecs
terraform init
terraform apply
# Outputs: Load balancer URL and endpoints
Documentation: Coming soon in docs/deployment/aws-ecs.md
Pros:
- Cloud-agnostic (runs on AWS, GCP, Azure, on-prem)
- Best scalability and flexibility
- Advanced features (service mesh, CRDs, operators)
- Industry standard for large deployments
Cons:
- Most complex setup
- Requires Kubernetes expertise
- Higher infrastructure costs
- More maintenance overhead
Quick Start:
# Coming in Phase 3!
cd infrastructure/kubernetes
terraform init
terraform apply
# Or use Helm charts
helm install ai-insight ./helm-chart
Documentation: Coming soon in docs/deployment/kubernetes.md
Comprehensive documentation organized by use case:
- Quick Start Guide - Get running in 5 minutes
- Installation & Setup - Detailed installation instructions
- Architecture Overview - Understand the system design
- Claude Code Configuration - Configure Claude Code telemetry
- OpenAI Wrapper Setup - Integrate OpenAI API
- Generic OTLP SDK - Add any AI tool
- Environment Variables - All configuration options
- Docker Compose - Local development setup
- AWS ECS Guide - Coming in Phase 2
- Kubernetes Guide - Coming in Phase 3
- Dashboard Guide - Understand each dashboard
- Cost Tracking - Optimize AI spending
- Troubleshooting - Common issues and solutions
- Maintenance & Backups - Keep your system healthy
- Contributing Guide - Add features and fix bugs
- Architecture Deep Dive - Detailed system design
- API Reference - Integration APIs
- SDK Development - Build custom integrations
Current Version: 1.0.0 (Production Ready)
Last Updated: January 7, 2025
Overall Progress: 38% Complete
✅ Core Infrastructure
- Multi-tool OpenTelemetry collector configuration
- Prometheus, Loki, Tempo backends
- Grafana with auto-provisioned data sources
- Docker Compose deployment
✅ Cost Tracking
- Comprehensive pricing table (15+ AI models)
- Real-time cost calculation
- Budget alerts and thresholds
✅ Privacy & Security
- GDPR-compliant user ID hashing
- PII removal from telemetry
- Secure default configurations
✅ Integration SDKs
- Python OTLP SDK with examples
- Node.js OTLP SDK with TypeScript
- OpenAI API wrapper (drop-in replacement)
✅ Dashboards
- AI Development Overview
- Tool Activity & Usage
- Sessions & Users
- Token & Cost Analysis
🚧 Dashboard Enhancements
- Performance Deep Dive dashboard
- Error Tracking dashboard
- Multi-Tool Comparison dashboard
- Resource Usage dashboard
🚧 Documentation
- Deployment guides (AWS ECS, Kubernetes)
- Advanced configuration tutorials
- Video walkthroughs
Phase 2 (Q1 2025) - Advanced Dashboards & Cloud Deployments
- Complete remaining 4 dashboards
- AWS ECS Terraform deployment
- Kubernetes Helm charts
- Alert rule templates
Phase 3 (Q2 2025) - Extended Tool Support
- GitHub Copilot VS Code extension
- AWS CodeWhisperer integration
- Cursor AI native support
- Multi-region deployments
Phase 4 (Q3 2025) - Enterprise Features
- SSO/SAML authentication
- Multi-tenancy support
- Advanced RBAC
- Audit logging
Phase 5 (Q4 2025) - SaaS Platform
- Hosted offering (optional)
- Advanced analytics and ML insights
- Benchmark reports
- API rate limit management
See PROGRESS.md for detailed task tracking.
We welcome contributions! This is an open-source project under the MIT license.
- Fork the repository on GitHub, for example with the GitHub CLI:
  gh repo fork PimpMyNines/AI-Development-Insight-Stack --clone
- Create a feature branch
  git checkout -b feature/amazing-new-feature
- Make your changes
  - Add new AI tool integrations
  - Improve dashboards
  - Fix bugs
  - Enhance documentation
- Test thoroughly
  # Start the stack
  docker-compose -f docker/docker-compose.yml up -d
  # Run your tests
  # Verify dashboards still work
- Submit a pull request
  - Describe what you changed and why
  - Include screenshots for UI changes
  - Reference any related issues
- Tool Integrations - Add support for more AI tools
- Dashboard Improvements - Enhance existing dashboards or create new ones
- Cloud Deployments - Help with AWS ECS and Kubernetes configurations
- Documentation - Tutorials, guides, and video content
- Testing - Add automated tests and CI/CD pipelines
# Clone the repo
git clone git@github.com:PimpMyNines/AI-Development-Insight-Stack.git
cd AI-Development-Insight-Stack
# Start development environment
docker-compose -f docker/docker-compose.yml up -d
# Make changes to configs
vim configs/otel/collector-config.yaml
# Reload configuration
docker-compose -f docker/docker-compose.yml restart otel-collector
# Test your changes
# Send test telemetry, check dashboards
- Be respectful and inclusive
- Provide constructive feedback
- Focus on what's best for the project
- Help newcomers get started
This project is licensed under the MIT License - see the LICENSE file for details.
What this means:
- ✅ Commercial use allowed
- ✅ Modification allowed
- ✅ Distribution allowed
- ✅ Private use allowed
- ⚠️ Liability and warranty limitations
This project is built on the shoulders of giants. Special thanks to:
- OpenTelemetry - For creating the vendor-neutral observability standard
- Prometheus - For reliable time-series metrics storage
- Grafana Labs - For Grafana, Loki, and Tempo
- Anthropic - For Claude and native OTLP support
- The Open Source Community - For endless inspiration and support
- OpenTelemetry Collector
- Prometheus
- Grafana Loki
- Grafana Tempo
- Grafana
- Docker
- Terraform
- Python & Node.js
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: docs/
- Quick Start in 5 Minutes
- Troubleshooting Common Issues
- Dashboard Guide
- Contributing Guidelines
- Project Roadmap
Made with ❤️ by developers, for developers
Start monitoring your AI development tools in 5 minutes