arminrad edited this page Mar 16, 2026 · 2 revisions

Prometheus Setup

Time-series metrics database for monitoring and alerting


Overview

Prometheus collects metrics from your application at regular intervals and provides:

  • Time-series metrics database
  • Powerful query language (PromQL)
  • Alerting based on metric thresholds
  • Integration with Grafana for visualization

Metrics Endpoint: http://localhost:8000/metrics


Metrics Categories

1. HTTP Request Metrics

  • http_requests_total - Total HTTP requests by method, endpoint, status
  • http_request_duration_seconds - Request duration histogram

2. Model Inference Metrics

  • model_inference_requests_total - Inference requests by provider, model, status
  • model_inference_duration_seconds - Inference duration
  • tokens_used_total - Tokens consumed (input/output)
  • credits_used_total - Credits consumed

3. Database Metrics

  • database_queries_total - Total queries by table and operation
  • database_query_duration_seconds - Query duration

4. Cache Metrics

  • cache_hits_total - Cache hits by cache name
  • cache_misses_total - Cache misses
  • cache_size_bytes - Cache size

5. Rate Limiting Metrics

  • rate_limited_requests_total - Blocked requests
  • current_rate_limit - Rate limit remaining

6. Provider Health Metrics

  • provider_availability - Provider status (1=available, 0=unavailable)
  • provider_error_rate - Error rate (0-1)
  • provider_response_time_seconds - Response time

7. API Key Metrics

  • api_key_usage_total - API key usage by key and status
  • active_api_keys - Active/inactive counts

8. Business Metrics

  • user_credit_balance - User credit balance
  • trial_active - Active trials by status
  • subscription_count - Active subscriptions

9. System Metrics

  • active_connections - Active connections (db/redis/provider)
  • queue_size - Request queue size
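
For orientation, the /metrics endpoint serves these metrics in the Prometheus text exposition format, one sample per line. The sketch below is a minimal parser for that format and is not part of the Gatewayz codebase: parse_sample and parse_metrics are hypothetical helper names, and the sample lines and their label names are illustrative assumptions. It does not unescape label values or cover every edge case of the format.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Illustrative scrape body; label names are assumptions, not the real ones.
SAMPLE_BODY = """\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/health",status="200"} 42
provider_availability{provider="openrouter"} 1
queue_size 0
"""


def parse_sample(line):
    """Parse one exposition-format sample line into (name, labels, value)."""
    match = _SAMPLE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, raw_value = match.groups()
    labels = dict(_LABEL.findall(raw_labels or ""))
    return name, labels, float(raw_value)


def parse_metrics(text):
    """Parse every sample in a scrape body, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        if line.strip() and not line.startswith("#"):
            samples.append(parse_sample(line))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE_BODY):
        print(name, labels, value)
```

Lines starting with "#" carry HELP and TYPE metadata; every other non-blank line is one time series sample.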

Quick Setup

Local Development

1. Start Prometheus Stack

# Using Docker Compose
docker-compose -f docker-compose.prometheus.yml up -d

2. Verify Metrics Collection

  1. Open Prometheus: http://localhost:9090
  2. Go to Status → Targets
  3. Verify gatewayz-api target is "UP"
  4. Go to Graph tab and query:
    http_requests_total
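
The same check can be scripted against the Prometheus HTTP API, which exposes instant queries at /api/v1/query. The following is a minimal sketch, assuming Prometheus is reachable at http://localhost:9090 as in the steps above; build_query_url and extract_vector are hypothetical helper names, and the response in the usage section is a hand-written example rather than real output.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def extract_vector(response_body):
    """Turn an instant-query JSON response into a list of (labels, value) pairs."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample value is [unix_timestamp, "value-as-string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


if __name__ == "__main__":
    print(build_query_url("http://localhost:9090", "http_requests_total"))
    # Hand-written response body for illustration only.
    body = json.dumps({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [{"metric": {"__name__": "http_requests_total"},
                        "value": [1700000000, "42"]}],
        },
    })
    print(extract_vector(body))
```

To query a live server, fetch the URL with urllib.request.urlopen and pass the response body to extract_vector. An empty result list means Prometheus has not scraped that metric yet.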
    

3. Configure Grafana

  1. Open Grafana: http://localhost:3000
  2. Login (default: admin/admin)
  3. Add Prometheus datasource:
    • Navigate to Configuration → Data Sources
    • Click "Add data source"
    • Select "Prometheus"
    • URL: http://prometheus:9090
    • Click "Save & Test"

Railway Deployment

Deploy Grafana Stack

  1. Use Railway template: https://railway.com/deploy/8TLSQD
  2. Note internal URLs for Prometheus and Grafana
  3. Deploy Gatewayz API to Railway

Configure Prometheus Scraping

Update prometheus.yml in Railway:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'gatewayz-api'
    scrape_interval: 10s
    scrape_timeout: 5s
    static_configs:
      - targets: ['gatewayz-api:8000']  # Internal Railway URL
        labels:
          service: 'gatewayz-api'
          environment: 'production'
    metrics_path: '/metrics'
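
One constraint worth checking before deploying this file: Prometheus rejects a job whose scrape_timeout is longer than its scrape_interval. The sketch below illustrates that check on a job written as a Python dict that mirrors the YAML above (the standard library has no YAML parser). parse_duration and check_scrape_job are hypothetical helper names, and the 10s fallback assumes the global scrape_timeout is left unset.

```python
import re

# Seconds per Prometheus duration unit (a year counts as 365 days).
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text):
    """Convert a Prometheus duration string such as '15s' or '1h30m' to seconds."""
    parts = _PART.findall(text)
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(number) * _UNITS[unit] for number, unit in parts)


def check_scrape_job(job, global_interval="15s"):
    """Return a list of problems found in one scrape job definition."""
    interval = parse_duration(job.get("scrape_interval", global_interval))
    # Assumption: no global scrape_timeout is set, so the fallback is 10s.
    timeout = parse_duration(job.get("scrape_timeout", "10s"))
    problems = []
    if timeout > interval:
        problems.append("scrape_timeout must not exceed scrape_interval")
    if not job.get("static_configs"):
        problems.append("no static_configs targets defined")
    return problems


if __name__ == "__main__":
    job = {
        "job_name": "gatewayz-api",
        "scrape_interval": "10s",
        "scrape_timeout": "5s",
        "static_configs": [{"targets": ["gatewayz-api:8000"]}],
        "metrics_path": "/metrics",
    }
    print(check_scrape_job(job))
```

If the promtool binary is available, promtool check config prometheus.yml performs a full validation of the real file.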

Verify Setup

  1. Access Prometheus in Railway
  2. Check Status → Targets
  3. Verify gatewayz-api is "UP"
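
Where the Prometheus UI is awkward to reach, target health can also be read from the /api/v1/targets endpoint of the HTTP API. This is a minimal sketch of interpreting that response; unhealthy_targets is a hypothetical helper name and the response body in the usage section is hand-written for illustration.

```python
import json


def unhealthy_targets(targets_body, job="gatewayz-api"):
    """Return (scrape URL, last error) for each active target of `job` that is not up."""
    payload = json.loads(targets_body)
    failing = []
    for target in payload["data"]["activeTargets"]:
        if target["labels"].get("job") != job:
            continue
        if target["health"] != "up":
            failing.append((target["scrapeUrl"], target.get("lastError", "")))
    return failing


if __name__ == "__main__":
    # Hand-written example of a /api/v1/targets response.
    body = json.dumps({
        "status": "success",
        "data": {"activeTargets": [{
            "labels": {"job": "gatewayz-api", "instance": "gatewayz-api:8000"},
            "scrapeUrl": "http://gatewayz-api:8000/metrics",
            "lastError": "connection refused",
            "health": "down",
        }]},
    })
    print(unhealthy_targets(body))
```

An empty list means every target of the job reported "up" at its last scrape.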

Prometheus Queries

Request Rate

rate(http_requests_total[5m])
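
To make the meaning of rate() concrete: it is the per-second increase of a counter over the window, with counter resets (for example after a restart) accounted for. The sketch below is a simplified model of that calculation. It omits the extrapolation to window boundaries that Prometheus applies, and counter_increase and simple_rate are hypothetical names.

```python
def counter_increase(samples):
    """Total increase across (timestamp, value) samples, treating a drop as a reset."""
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value is the increase.
        total += curr - prev if curr >= prev else curr
    return total


def simple_rate(samples):
    """Per-second rate over the span covered by the samples (no extrapolation)."""
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return counter_increase(samples) / elapsed


if __name__ == "__main__":
    # (seconds, counter value); the drop from 160 to 10 models a process restart.
    window = [(0, 100), (60, 160), (120, 10), (180, 70)]
    print(counter_increase(window))  # 60 + 10 + 60
    print(simple_rate(window))
```

Because resets are handled this way, rate() should be applied to the raw counter before any aggregation such as sum().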

Error Rate

sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

Response Time (p95)

histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
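
histogram_quantile() estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank, so its accuracy depends on the bucket boundaries. The sketch below is a simplified model of that estimate, not the Prometheus implementation; the bucket bounds and counts are made-up numbers, and it assumes 0 < q <= 1, at least one finite bucket, and a positive lowest bound.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only supports 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    for index, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The quantile falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = buckets[index - 1][0] if index > 0 else 0.0
    below = buckets[index - 1][1] if index > 0 else 0.0
    # Linear interpolation between the bucket's lower and upper bound.
    return lower + (upper - lower) * (rank - below) / (count - below)


if __name__ == "__main__":
    # Illustrative buckets: 50 requests <= 0.1s, 90 <= 0.5s, 100 <= 1.0s.
    buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
    print(histogram_quantile(0.95, buckets))
```

With these example buckets the 95th percentile lands halfway through the 0.5s to 1.0s bucket, which shows why coarse buckets produce coarse percentiles.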

Tokens Used (Last 24 Hours)

increase(tokens_used_total[24h])

Credits Used by User (Top 10)

topk(10, sum by (user_id) (increase(credits_used_total[24h])))

Cache Hit Rate

rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))

Provider Status

provider_availability

Grafana Dashboards

Pre-built Dashboard Panels

  1. Request Rate (requests/sec)
  2. Error Rate (%)
  3. Response Time (percentiles)
  4. Tokens Used (daily)
  5. Credits Used (daily)
  6. Cache Hit Rate (%)
  7. Provider Availability (gauge)
  8. Top Models by Usage
  9. Top Users by Requests
  10. Rate Limited Requests

Creating Custom Dashboards

  1. In Grafana, click "+" → Create → Dashboard
  2. Add Panel
  3. Use Prometheus queries above
  4. Configure visualization (graph, gauge, table)
  5. Save dashboard

Code Instrumentation

Adding Metrics to Code

from src.services.prometheus_metrics import (
    track_model_inference,
    record_tokens_used,
    record_credits_used,
    track_http_request,
)

# Track model inference (run this inside an async function, since it awaits)
with track_model_inference("openrouter", "gpt-4"):
    response = await openrouter_client.create_completion(...)

# Record token usage
record_tokens_used(
    provider="openrouter",
    model="gpt-4",
    input_tokens=100,
    output_tokens=50
)

# Record credit usage
record_credits_used(
    provider="openrouter",
    model="gpt-4",
    user_id="user123",
    credits=1.5
)
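
For readers curious what a tracking context manager of this kind might do internally, the following is a minimal standard-library sketch. It is not the implementation in src.services.prometheus_metrics: track_inference, inference_requests, and inference_durations are hypothetical names, and plain in-memory containers stand in for real Prometheus counter and histogram objects.

```python
import time
from collections import Counter, defaultdict
from contextlib import contextmanager

# Hypothetical in-memory stand-ins for a counter and a histogram's observations.
inference_requests = Counter()
inference_durations = defaultdict(list)


@contextmanager
def track_inference(provider, model):
    """Time the wrapped block and count it as a success or an error."""
    start = time.perf_counter()
    status = "success"
    try:
        yield
    except Exception:
        status = "error"
        raise  # never swallow the caller's exception
    finally:
        # Runs on both paths, so failed calls are counted and timed too.
        inference_requests[(provider, model, status)] += 1
        inference_durations[(provider, model)].append(time.perf_counter() - start)


if __name__ == "__main__":
    with track_inference("openrouter", "gpt-4"):
        time.sleep(0.01)  # placeholder for the real provider call
    print(dict(inference_requests))
```

Recording in the finally block is the key design choice: latency and error counts stay accurate even when the provider call raises.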

Complete Example

@app.post("/v1/chat/completions")
async def chat_completions(request: Request, api_key: str = Depends(get_api_key)):
    # app, Request, get_api_key, client, pricing, and calculate_credits are
    # defined elsewhere in the application.
    provider = request.model.split("/")[0]  # text before the first "/" in the model ID

    with track_http_request("POST", "/v1/chat/completions"):
        # Track model inference
        with track_model_inference(provider, request.model):
            response = await client.create_completion(request)

        # Record tokens
        record_tokens_used(
            provider=provider,
            model=request.model,
            input_tokens=response.usage.prompt_tokens,
            output_tokens=response.usage.completion_tokens,
        )

        # Record credits
        # NOTE: passing the raw API key as user_id exposes it as a label value;
        # prefer a non-secret user identifier where one is available.
        credits = calculate_credits(pricing, response.usage)
        record_credits_used(
            provider=provider,
            model=request.model,
            user_id=api_key,
            credits=credits,
        )

        return response

Troubleshooting

Target Shows as "DOWN"

Symptom: Prometheus can't scrape application

Solution:

  1. Verify application is running
  2. Check URL is correct (localhost:8000 or gatewayz-api:8000)
  3. Test metrics endpoint:
    curl http://localhost:8000/metrics
  4. Check network connectivity

No Data in Grafana

Symptom: Dashboard shows "No data"

Solution:

  1. Verify Prometheus has collected metrics
  2. Check dashboard queries (metric names)
  3. Ensure datasource is configured correctly
  4. Wait 1-2 minutes for data collection

High Memory Usage

Symptom: Prometheus consuming excessive memory

Solution:

  1. Reduce retention: --storage.tsdb.retention.time=7d
  2. Increase scrape interval
  3. Use recording rules to pre-aggregate
  4. Remove high-cardinality labels

Slow Metrics Endpoint

Symptom: /metrics takes longer than 5 seconds to respond, which exceeds the 5s scrape_timeout configured for the gatewayz-api job

Solution:

  1. Check metric cardinality (unique series count)
  2. Remove high-cardinality labels (e.g., user_id)
  3. Use metric relabeling
  4. Consider remote storage

Performance Considerations

Metric Cardinality

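Every unique combination of label values creates a separate time series, so high-cardinality labels increase memory use and slow down scrapes and queries. Be cautious with: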

  • credits_used_total (uses user_id label)
  • api_key_usage_total (uses api_key_id label)

Recommendation: Aggregate by higher-level labels in production.
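
A quick way to spot cardinality problems is to count how many series each metric name exposes in a single scrape. The sketch below does this on the text returned by /metrics; series_per_metric is a hypothetical helper name and the scrape body in the usage section is illustrative. Histogram components (_bucket, _sum, _count) are counted as separate names.

```python
from collections import Counter


def series_per_metric(metrics_text):
    """Count how many time series each metric name exposes in a scrape body."""
    counts = Counter()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (no labels).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


if __name__ == "__main__":
    # Illustrative scrape body: one series per user_id value.
    body = "\n".join([
        "# TYPE credits_used_total counter",
        'credits_used_total{user_id="u1"} 3',
        'credits_used_total{user_id="u2"} 7',
        'credits_used_total{user_id="u3"} 1',
        "queue_size 0",
    ])
    for name, count in series_per_metric(body).most_common(5):
        print(f"{count:6d}  {name}")
```

Metrics at the top of this list are the first candidates for dropping or aggregating labels.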

Scrape Interval

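Default in this setup: 15 seconds (the global scrape_interval in prometheus.yml); the gatewayz-api job overrides it to 10s.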

Adjust based on needs:

  • Real-time monitoring: 5-10s
  • Long-term trends: 30-60s
  • Memory constrained: 30s+

Storage Retention

Default: 15 days

For longer retention:

command:
  - '--storage.tsdb.retention.time=90d'  # 90 days
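
Retention and scrape interval together determine disk usage. The Prometheus storage documentation gives a rough sizing rule: disk space is approximately retention time in seconds times ingested samples per second times bytes per sample, with roughly 1 to 2 bytes per sample on average. The sketch below applies that rule; the function names are hypothetical and the series count of 10,000 is an illustrative assumption, not a measurement of this application.

```python
def samples_per_second(series_count, scrape_interval_seconds):
    """Ingestion rate when every series is scraped once per interval."""
    return series_count / scrape_interval_seconds


def estimate_disk_bytes(series_count, scrape_interval_seconds, retention_days,
                        bytes_per_sample=2.0):
    """Rough disk estimate: retention seconds * samples per second * bytes per sample."""
    retention_seconds = retention_days * 86400
    rate = samples_per_second(series_count, scrape_interval_seconds)
    return retention_seconds * rate * bytes_per_sample


if __name__ == "__main__":
    # Assumed workload: 10,000 active series scraped every 15 seconds.
    for days in (15, 90):
        size_gb = estimate_disk_bytes(10_000, 15, days) / 1e9
        print(f"{days} days retention: about {size_gb:.1f} GB")
```

Under this rule, halving the scrape interval doubles the estimate, and so does doubling the retention period.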

Production Checklist

  • Configure persistent storage for Prometheus data
  • Set up alerting rules for critical metrics
  • Configure remote storage backup
  • Set up Grafana authentication
  • Create monitoring dashboards
  • Test scraping and metrics
  • Configure log aggregation (Loki)
  • Set up alert notifications (email, Slack, PagerDuty)
  • Document custom dashboards
  • Set metric retention policy

Integration

With OpenTelemetry

Prometheus metrics complement OpenTelemetry:

  • Prometheus: Simpler deployment, better aggregation
  • OTel: Distributed tracing, event logging

The two can run side by side in the same deployment.

With Sentry

Use both for comprehensive monitoring:

  • Prometheus: Performance metrics
  • Sentry: Error tracking and debugging

Last Updated: December 2024
Status: Production Ready

