Prometheus Setup
Time-series metrics database for monitoring and alerting
Prometheus collects metrics from your application at regular intervals and provides:
- Time-series metrics database
- Powerful query language (PromQL)
- Alerting based on metric thresholds
- Integration with Grafana for visualization
Metrics Endpoint: http://localhost:8000/metrics
- http_requests_total - Total HTTP requests by method, endpoint, status
- http_request_duration_seconds - Request duration histogram

- model_inference_requests_total - Inference requests by provider, model, status
- model_inference_duration_seconds - Inference duration
- tokens_used_total - Tokens consumed (input/output)
- credits_used_total - Credits consumed

- database_queries_total - Total queries by table and operation
- database_query_duration_seconds - Query duration

- cache_hits_total - Cache hits by cache name
- cache_misses_total - Cache misses
- cache_size_bytes - Cache size

- rate_limited_requests_total - Blocked requests
- current_rate_limit - Rate limit remaining

- provider_availability - Provider status (1=available, 0=unavailable)
- provider_error_rate - Error rate (0-1)
- provider_response_time_seconds - Response time

- api_key_usage_total - API key usage by key and status
- active_api_keys - Active/inactive counts

- user_credit_balance - User credit balance
- trial_active - Active trials by status
- subscription_count - Active subscriptions

- active_connections - Active connections (db/redis/provider)
- queue_size - Request queue size
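To inspect what the endpoint exposes, the text exposition format returned by /metrics can be parsed with a few lines of Python. The following is a minimal sketch using only the standard library; `parse_metrics` is a hypothetical helper, and the label names in the sample text (`method`, `endpoint`, `status`, `cache_name`) are assumptions based on the descriptions above, not confirmed output from the application.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return {metric_name: [(labels_dict, value), ...]} from exposition text."""
    samples = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples[name].append((labels, float(value)))
    return dict(samples)


# Hand-written sample; real output from the application may differ.
SAMPLE_TEXT = '''\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/health",status="200"} 42
http_requests_total{method="POST",endpoint="/v1/chat/completions",status="200"} 7
cache_size_bytes{cache_name="models"} 1048576
'''

parsed = parse_metrics(SAMPLE_TEXT)
for name, series in parsed.items():
    print(name, len(series))
```

Against a running instance, the same function could be fed the body of `curl http://localhost:8000/metrics` to list metric names and the number of series behind each.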
# Using Docker Compose
docker-compose -f docker-compose.prometheus.yml up -d

- Open Prometheus: http://localhost:9090
- Go to Status → Targets
- Verify gatewayz-api target is "UP"
- Go to Graph tab and query: http_requests_total
- Open Grafana: http://localhost:3000
- Login (default: admin/admin)
- Add Prometheus datasource:
- Navigate to Configuration → Data Sources
- Click "Add data source"
- Select "Prometheus"
- URL: http://prometheus:9090
- Click "Save & Test"
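The datasource can also be registered without the UI, through Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the request so it can be inspected before sending; `build_datasource_request` is a hypothetical helper, and the credentials and URLs are the defaults mentioned above, which should be changed outside local development.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, user, password, prometheus_url):
    """Build (but do not send) the POST request that registers the datasource."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend issues the queries
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_datasource_request(
    "http://localhost:3000", "admin", "admin", "http://prometheus:9090"
)
print(request.get_method(), request.full_url)
# To actually send it against a running Grafana:
# urllib.request.urlopen(request)
```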
- Use Railway template: https://railway.com/deploy/8TLSQD
- Note internal URLs for Prometheus and Grafana
- Deploy Gatewayz API to Railway
Update prometheus.yml in Railway:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'gatewayz-api'
    scrape_interval: 10s
    scrape_timeout: 5s
    static_configs:
      - targets: ['gatewayz-api:8000']  # Internal Railway URL
        labels:
          service: 'gatewayz-api'
          environment: 'production'
    metrics_path: '/metrics'

- Access Prometheus in Railway
- Check Status → Targets
- Verify gatewayz-api is "UP"
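The Status → Targets check can also be scripted against the Prometheus HTTP API, which exposes target health at `/api/v1/targets`. The following is a rough sketch: `unhealthy_targets` and `fetch_targets` are hypothetical helpers, and the sample payload is hand-written to mirror the shape of a real response (the hostnames in it are made up).

```python
import json
import urllib.request


def unhealthy_targets(payload, job):
    """Return scrape URLs of targets in `job` whose health is not "up"."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        t.get("scrapeUrl", "?")
        for t in targets
        if t.get("labels", {}).get("job") == job and t.get("health") != "up"
    ]


def fetch_targets(prometheus_url):
    """Fetch /api/v1/targets from a running Prometheus server."""
    url = f"{prometheus_url.rstrip('/')}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Hand-written sample shaped like an /api/v1/targets response.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "gatewayz-api"},
                "scrapeUrl": "http://gatewayz-api:8000/metrics",
                "health": "up",
            },
            {
                "labels": {"job": "gatewayz-api"},
                "scrapeUrl": "http://gatewayz-api-2:8000/metrics",
                "health": "down",
            },
        ]
    },
}

print(unhealthy_targets(sample, "gatewayz-api"))
# Against a live server: unhealthy_targets(fetch_targets("http://localhost:9090"), "gatewayz-api")
```

An empty list means every target of the job is reporting "up".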
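# Request rate (requests/sec)
rate(http_requests_total[5m])
# Error rate (fraction of requests returning 5xx)
sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
# Response time (95th percentile)
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
# Tokens used (last 24 hours)
increase(tokens_used_total[24h])
# Top 10 users by credits used (last 24 hours)
topk(10, sum by (user_id) (increase(credits_used_total[24h])))
# Cache hit rate
cache_hits_total / (cache_hits_total + cache_misses_total)
# Provider availability
provider_availability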
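It can help to see what `histogram_quantile` does with the `_bucket` series: it finds the bucket that contains the requested rank and interpolates linearly inside it. The sketch below reproduces that idea in plain Python under simplifying assumptions (non-negative bucket bounds, at least one finite bucket, and a final `+Inf` bucket); it is an illustration, not the server's implementation, and the bucket counts are invented.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    The bucket with the largest bound is expected to be +Inf.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, below = (0.0, 0) if i == 0 else buckets[i - 1]
    if count == below:
        return lower
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Invented cumulative counts for request durations in seconds.
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, buckets))
```

Because of the interpolation, the result is only as precise as the bucket boundaries: a p95 that lands in the 0.5 to 1.0 second bucket can be reported anywhere in that range.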
- Request Rate (requests/sec)
- Error Rate (%)
- Response Time (percentiles)
- Tokens Used (daily)
- Credits Used (daily)
- Cache Hit Rate (%)
- Provider Availability (gauge)
- Top Models by Usage
- Top Users by Requests
- Rate Limited Requests
- In Grafana, click "+" → Create → Dashboard
- Add Panel
- Use Prometheus queries above
- Configure visualization (graph, gauge, table)
- Save dashboard
from src.services.prometheus_metrics import (
    track_model_inference,
    record_tokens_used,
    record_credits_used,
    track_http_request,
)

# Track model inference (must run inside an async function because of the await)
with track_model_inference("openrouter", "gpt-4"):
    response = await openrouter_client.create_completion(...)

# Record token usage
record_tokens_used(
    provider="openrouter",
    model="gpt-4",
    input_tokens=100,
    output_tokens=50,
)

# Record credit usage
record_credits_used(
    provider="openrouter",
    model="gpt-4",
    user_id="user123",
    credits=1.5,
)

@app.post("/v1/chat/completions")
async def chat_completions(request: Request, api_key: str = Depends(get_api_key)):
    # Model names look like "provider/model"; the prefix is the provider label
    provider = request.model.split("/")[0]
    with track_http_request("POST", "/v1/chat/completions"):
        # Track model inference
        with track_model_inference(provider, request.model):
            response = await client.create_completion(request)

        # Record tokens
        record_tokens_used(
            provider=provider,
            model=request.model,
            input_tokens=response.usage.prompt_tokens,
            output_tokens=response.usage.completion_tokens,
        )

        # Record credits
        credits = calculate_credits(pricing, response.usage)
        record_credits_used(
            provider=provider,
            model=request.model,
            user_id=api_key,  # high-cardinality label; see the cardinality notes
            credits=credits,
        )
        return response

Symptom: Prometheus can't scrape application
Solution:
- Verify application is running
- Check URL is correct (localhost:8000 or gatewayz-api:8000)
- Test metrics endpoint: curl http://localhost:8000/metrics
- Check network connectivity
Symptom: Dashboard shows "No data"
Solution:
- Verify Prometheus has collected metrics
- Check dashboard queries (metric names)
- Ensure datasource is configured correctly
- Wait 1-2 minutes for data collection
Symptom: Prometheus consuming excessive memory
Solution:
- Reduce retention: --storage.tsdb.retention.time=7d
- Increase scrape interval
- Use recording rules to pre-aggregate
- Remove high-cardinality labels
Symptom: /metrics takes > 5 seconds
Solution:
- Check metric cardinality (unique series count)
- Remove high-cardinality labels (e.g., user_id)
- Use metric relabeling
- Consider remote storage
High cardinality causes performance issues. Be cautious with:
- credits_used_total (uses user_id label)
- api_key_usage_total (uses api_key_id label)
Recommendation: Aggregate by higher-level labels in production.
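To illustrate why a per-user label is costly, the sketch below sums a set of samples after removing one label, which is roughly what PromQL's `sum without (user_id)` or a recording rule would do. The sample data is invented (1,000 users, one provider, one model), and `aggregate_without` is a hypothetical helper.

```python
from collections import defaultdict


def aggregate_without(samples, drop_label):
    """Sum sample values after removing one label from every series."""
    totals = defaultdict(float)
    for labels, value in samples:
        # The remaining labels identify the aggregated series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != drop_label))
        totals[key] += value
    return dict(totals)


# Invented credits_used_total samples: one series per user.
samples = [
    ({"provider": "openrouter", "model": "gpt-4", "user_id": f"user{i}"}, 1.5)
    for i in range(1000)
]

aggregated = aggregate_without(samples, "user_id")
print(len(samples), "series before,", len(aggregated), "after")
```

Every distinct label combination is stored as its own time series, so the series count scales with the number of users until the label is dropped or aggregated away.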
Default scrape interval: 15 seconds
Adjust based on needs:
- Real-time monitoring: 5-10s
- Long-term trends: 30-60s
- Memory constrained: 30s+
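The trade-off can be estimated numerically. The Prometheus storage documentation gives a rough sizing rule: disk space is approximately retention time in seconds × ingested samples per second × bytes per sample, with 1 to 2 bytes per sample being typical. The sketch below applies that rule; the figure of 10,000 active series is an assumption chosen for illustration, and `estimate_disk_bytes` is a hypothetical helper.

```python
def estimate_disk_bytes(active_series, scrape_interval_s, retention_days,
                        bytes_per_sample=2.0):
    """Rough disk estimate: retention * samples per second * bytes per sample."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return samples_per_second * retention_seconds * bytes_per_sample


# Assumed workload: 10,000 active series kept for 15 days.
for interval in (5, 15, 30, 60):
    gigabytes = estimate_disk_bytes(10_000, interval, 15) / 1e9
    print(f"{interval:>2}s interval: ~{gigabytes:.2f} GB")
```

Halving the scrape interval doubles the ingested sample volume, so the shortest intervals are best reserved for the jobs that need near-real-time resolution.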
Default retention: 15 days
For longer retention:
command:
  - '--storage.tsdb.retention.time=90d'  # 90 days

- Configure persistent storage for Prometheus data
- Set up alerting rules for critical metrics
- Configure remote storage backup
- Set up Grafana authentication
- Create monitoring dashboards
- Test scraping and metrics
- Configure log aggregation (Loki)
- Set up alert notifications (email, Slack, PagerDuty)
- Document custom dashboards
- Set metric retention policy
Prometheus metrics complement OpenTelemetry:
- Prometheus: Simpler deployment, better aggregation
- OTel: Distributed tracing, event logging
Both can coexist and be used together.
Use both for comprehensive monitoring:
- Prometheus: Performance metrics
- Sentry: Error tracking and debugging
- Performance Monitoring - Detailed performance tracking
- Error Monitoring - Sentry integration
- Grafana Dashboards - Dashboard creation
Last Updated: December 2024
Status: Production Ready
- Monitoring-System — What gets monitored
- Performance-Monitoring — Latency and performance metrics
- Error-Monitoring — Error tracking and alerting