Prometheus Setup

Time-series metrics database for monitoring and alerting

Overview

Prometheus collects metrics from your application at regular intervals and provides:

Time-series metrics database
Powerful query language (PromQL)
Alerting based on metric thresholds
Integration with Grafana for visualization

Metrics Endpoint: http://localhost:8000/metrics
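
As a quick sanity check, the endpoint can be read from a script. The sketch below uses only the Python standard library and assumes the endpoint serves the standard Prometheus text exposition format; the helper names (`sample_names`, `missing_metrics`, `fetch_metrics`) are illustrative and not part of the project.

```python
import urllib.request

# Metric names taken from the "HTTP Request Metrics" category in this guide.
EXPECTED = ["http_requests_total", "http_request_duration_seconds"]


def sample_names(exposition_text: str) -> set[str]:
    """Collect the sample names found in Prometheus text exposition output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first "{" (labels) or the first space (no labels).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def missing_metrics(exposition_text: str, expected=EXPECTED) -> list[str]:
    """Return expected metrics with no matching sample.

    Histograms expose suffixed samples (_bucket, _sum, _count), so a
    sample whose name starts with "<metric>_" also counts as a match.
    """
    names = sample_names(exposition_text)
    return [
        metric
        for metric in expected
        if not any(n == metric or n.startswith(metric + "_") for n in names)
    ]


def fetch_metrics(url: str = "http://localhost:8000/metrics") -> str:
    """Download the raw exposition text from the metrics endpoint."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

With the application running, `missing_metrics(fetch_metrics())` should return an empty list once the service has handled at least one request.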

Metrics Categories

1. HTTP Request Metrics

http_requests_total - Total HTTP requests by method, endpoint, status
http_request_duration_seconds - Request duration histogram

2. Model Inference Metrics

model_inference_requests_total - Inference requests by provider, model, status
model_inference_duration_seconds - Inference duration
tokens_used_total - Tokens consumed (input/output)
credits_used_total - Credits consumed

3. Database Metrics

database_queries_total - Total queries by table and operation
database_query_duration_seconds - Query duration

4. Cache Metrics

cache_hits_total - Cache hits by cache name
cache_misses_total - Cache misses
cache_size_bytes - Cache size

5. Rate Limiting Metrics

rate_limited_requests_total - Blocked requests
current_rate_limit - Rate limit remaining

6. Provider Health Metrics

provider_availability - Provider status (1=available, 0=unavailable)
provider_error_rate - Error rate (0-1)
provider_response_time_seconds - Response time

7. API Key Metrics

api_key_usage_total - API key usage by key and status
active_api_keys - Active/inactive counts

8. Business Metrics

user_credit_balance - User credit balance
trial_active - Active trials by status
subscription_count - Active subscriptions

9. System Metrics

active_connections - Active connections (db/redis/provider)
queue_size - Request queue size

Quick Setup

Local Development

1. Start Prometheus Stack

# Using Docker Compose
docker-compose -f docker-compose.prometheus.yml up -d

2. Verify Metrics Collection

Open Prometheus: http://localhost:9090
Go to Status → Targets
Verify gatewayz-api target is "UP"
Go to Graph tab and query:
```
http_requests_total
```
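
The same check can be scripted against the Prometheus HTTP API. This is a minimal sketch using the standard library and the instant-query endpoint (`/api/v1/query`); the function names are illustrative, and the base URL assumes the local Docker Compose setup described above.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(expr: str, base_url: str = "http://localhost:9090") -> str:
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(payload: dict) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like {"metric": {labels}, "value": [timestamp, "value"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def instant_query(expr: str, base_url: str = "http://localhost:9090"):
    """Run a PromQL instant query and return (labels, value) pairs."""
    url = build_query_url(expr, base_url)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_vector(json.load(resp))
```

Calling `instant_query("http_requests_total")` returns one entry per label combination; an empty list usually means the target has not been scraped yet or has served no requests.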

3. Configure Grafana

Open Grafana: http://localhost:3000
Log in (default: admin/admin)
Add Prometheus datasource:
- Navigate to Configuration → Data Sources
- Click "Add data source"
- Select "Prometheus"
- URL: http://prometheus:9090
- Click "Save & Test"

Railway Deployment

Deploy Grafana Stack

Use Railway template: https://railway.com/deploy/8TLSQD
Note internal URLs for Prometheus and Grafana
Deploy Gatewayz API to Railway

Configure Prometheus Scraping

Update prometheus.yml in Railway:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'gatewayz-api'
    scrape_interval: 10s
    scrape_timeout: 5s
    static_configs:
      - targets: ['gatewayz-api:8000']  # Internal Railway URL
        labels:
          service: 'gatewayz-api'
          environment: 'production'
    metrics_path: '/metrics'

Verify Setup

Access Prometheus in Railway
Check Status → Targets
Verify gatewayz-api is "UP"
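
The target check can also be automated through the Prometheus HTTP API (`/api/v1/targets`). The sketch below assumes the Prometheus instance is reachable from where the script runs; the `base_url` default is a placeholder, and the helper names are illustrative.

```python
import json
import urllib.request


def target_health(payload: dict, job: str = "gatewayz-api") -> list[tuple[str, str, str]]:
    """Return (scrape_url, health, last_error) for each active target of a job."""
    rows = []
    for target in payload["data"]["activeTargets"]:
        if target["labels"].get("job") == job:
            rows.append(
                (target["scrapeUrl"], target["health"], target.get("lastError", ""))
            )
    return rows


def all_up(payload: dict, job: str = "gatewayz-api") -> bool:
    """True only if the job has at least one target and every target is up."""
    rows = target_health(payload, job)
    return bool(rows) and all(health == "up" for _, health, _ in rows)


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the target list; replace base_url with your Prometheus address."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

When a target is down, the `last_error` field typically carries the scrape error (for example a connection refusal), which narrows down whether the problem is the address, the port, or the application itself.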

Prometheus Queries

Request Rate

rate(http_requests_total[5m])
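
To make the semantics concrete: `rate()` returns the average per-second increase of a counter over the window, treating any drop in value as a counter reset. The following is a simplified sketch of that idea; unlike Prometheus it does not extrapolate to the window boundaries, so real results can differ slightly.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Approximate per-second rate from (timestamp_seconds, counter_value) samples.

    Samples must be ordered by timestamp. A value lower than its predecessor
    is treated as a counter reset (the process restarted from zero).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")

    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # Reset: everything counted since the restart is new increase.
            increase += cur
    return increase / elapsed


# Three scrapes, one minute apart: 120 requests over 120 seconds.
print(simple_rate([(0, 100), (60, 160), (120, 220)]))
```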

Error Rate

sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

Response Time (p95)

histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
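
`histogram_quantile` does not see individual request durations; it estimates the quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below illustrates that estimation in plain Python. It is a simplified model (the bucket boundaries in the example are made up), not the Prometheus implementation.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total),
    mirroring the `le` label of a Prometheus histogram.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus the +Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies beyond the last finite bucket.
                return buckets[-2][0]
            if count == prev_count:
                return bound
            # Linear interpolation within the matching bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


example = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, example))
```

Because of the interpolation, the result is only as precise as the bucket layout: a p95 that falls in a wide bucket can be off by a large fraction of that bucket's width.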

Tokens Used (Last 24 Hours)

increase(tokens_used_total[24h])

Credits Used by User (Top 10)

topk(10, sum by (user_id) (increase(credits_used_total[24h])))

Cache Hit Rate

rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))

Provider Status

provider_availability

Grafana Dashboards

Pre-built Dashboard Panels

Request Rate (requests/sec)
Error Rate (%)
Response Time (percentiles)
Tokens Used (daily)
Credits Used (daily)
Cache Hit Rate (%)
Provider Availability (gauge)
Top Models by Usage
Top Users by Requests
Rate Limited Requests

Creating Custom Dashboards

In Grafana, click "+" → Create → Dashboard
Add Panel
Use Prometheus queries above
Configure visualization (graph, gauge, table)
Save dashboard

Code Instrumentation

Adding Metrics to Code

from src.services.prometheus_metrics import (
    track_model_inference,
    record_tokens_used,
    record_credits_used,
    track_http_request,
)

# Track model inference (run inside an async function, since the call is awaited)
with track_model_inference("openrouter", "gpt-4"):
    response = await openrouter_client.create_completion(...)

# Record token usage
record_tokens_used(
    provider="openrouter",
    model="gpt-4",
    input_tokens=100,
    output_tokens=50
)

# Record credit usage
record_credits_used(
    provider="openrouter",
    model="gpt-4",
    user_id="user123",
    credits=1.5
)
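
For readers curious how a tracker such as `track_model_inference` could work, here is a minimal sketch of the context-manager pattern. It is not the project's implementation: the name `track_inference` is hypothetical, and plain in-memory dictionaries stand in for the Prometheus counter and histogram so the example runs with the standard library alone.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# In-memory stand-ins for Prometheus metrics (assumption for illustration only).
REQUESTS = defaultdict(int)    # (provider, model, status) -> request count
DURATIONS = defaultdict(list)  # (provider, model) -> observed durations in seconds


@contextmanager
def track_inference(provider: str, model: str):
    """Time the wrapped block and count it as a success or an error."""
    start = time.perf_counter()
    status = "success"
    try:
        yield
    except Exception:
        status = "error"
        raise  # never swallow the caller's exception
    finally:
        # Runs on both paths, so failed calls are timed and counted too.
        DURATIONS[(provider, model)].append(time.perf_counter() - start)
        REQUESTS[(provider, model, status)] += 1


with track_inference("openrouter", "gpt-4"):
    time.sleep(0.01)  # placeholder for the real provider call

print(REQUESTS[("openrouter", "gpt-4", "success")])
```

Recording in `finally` is the key design choice: durations and counts are captured even when the provider call raises, which is what makes an error-rate query meaningful.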

Complete Example

@app.post("/v1/chat/completions")
async def chat_completions(request: Request, api_key: str = Depends(get_api_key)):
    # Assumes model IDs of the form "provider/model"; derive the provider once.
    provider = request.model.split("/")[0]

    with track_http_request("POST", "/v1/chat/completions"):
        # Track model inference (duration and success/failure)
        with track_model_inference(provider, request.model):
            response = await client.create_completion(request)

        # Record tokens
        record_tokens_used(
            provider=provider,
            model=request.model,
            input_tokens=response.usage.prompt_tokens,
            output_tokens=response.usage.completion_tokens,
        )

        # Record credits
        # Note: user_id becomes a metric label, so it adds one series per
        # user (see Metric Cardinality).
        credits = calculate_credits(pricing, response.usage)
        record_credits_used(
            provider=provider,
            model=request.model,
            user_id=api_key,
            credits=credits,
        )

        return response

Troubleshooting

Target Shows as "DOWN"

Symptom: Prometheus can't scrape application

Solution:

Verify application is running
Check URL is correct (localhost:8000 or gatewayz-api:8000)
Test metrics endpoint:
```
curl http://localhost:8000/metrics
```
Check network connectivity

No Data in Grafana

Symptom: Dashboard shows "No data"

Solution:

Verify Prometheus has collected metrics
Check dashboard queries (metric names)
Ensure datasource is configured correctly
Wait 1-2 minutes for data collection

High Memory Usage

Symptom: Prometheus consuming excessive memory

Solution:

Reduce retention: --storage.tsdb.retention.time=7d
Increase scrape interval
Use recording rules to pre-aggregate
Remove high-cardinality labels

Slow Metrics Endpoint

Symptom: /metrics takes > 5 seconds

Solution:

Check metric cardinality (unique series count)
Remove high-cardinality labels (e.g., user_id)
Use metric relabeling
Consider remote storage
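
One way to check cardinality is to count how many series each metric name contributes to the `/metrics` output. The sketch below assumes the standard Prometheus text exposition format, where every non-comment line is one series; the function name is illustrative.

```python
from collections import Counter


def series_per_metric(exposition_text: str) -> Counter:
    """Count series (sample lines) per metric name in exposition text."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first "{" (labels) or the first space (no labels).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


sample = """\
# TYPE credits_used_total counter
credits_used_total{provider="openrouter",model="gpt-4",user_id="u1"} 1.5
credits_used_total{provider="openrouter",model="gpt-4",user_id="u2"} 3.0
credits_used_total{provider="openrouter",model="gpt-4",user_id="u3"} 0.5
# TYPE queue_size gauge
queue_size 4
"""

# Metrics with the most series are the first candidates for label cleanup.
for name, count in series_per_metric(sample).most_common():
    print(f"{count:>6}  {name}")
```

Feeding it the output of `curl http://localhost:8000/metrics` shows which metrics dominate the payload.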

Performance Considerations

Metric Cardinality

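Each unique combination of label values creates its own time series, so high-cardinality labels increase memory use and slow down both scrapes and queries. Be cautious with: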

credits_used_total (uses user_id label)
api_key_usage_total (uses api_key_id label)

Recommendation: In production, aggregate by lower-cardinality labels (such as provider or model) instead of per-user or per-key labels.
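
The effect of a per-user label is easy to estimate: the worst-case series count for one metric is the product of the number of distinct values of each label. The figures below are made-up illustrations, not measurements from this system.

```python
from math import prod


def estimated_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label counts for credits_used_total.
with_user = estimated_series({"provider": 5, "model": 40, "user_id": 10_000})
without_user = estimated_series({"provider": 5, "model": 40})

print(with_user)     # 2000000
print(without_user)  # 200
```

In practice not every combination occurs, so the real count is lower, but the multiplication shows why a single unbounded label dominates the total.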

Scrape Interval

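Default: 15 seconds (the global scrape_interval in the prometheus.yml above; the gatewayz-api job overrides it with 10s)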

Adjust based on needs:

Real-time monitoring: 5-10s
Long-term trends: 30-60s
Memory constrained: 30s+

Storage Retention

Default: 15 days

For longer retention:

command:
  - '--storage.tsdb.retention.time=90d'  # 90 days
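
Retention and scrape interval together determine disk usage. The Prometheus storage documentation gives a rough sizing formula: needed disk space = retention time in seconds × ingested samples per second × bytes per sample, with samples averaging about 1 to 2 bytes after compression. The sketch below applies that formula; the series count is a made-up input you would replace with your own.

```python
def ingested_samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Each active series yields one sample per scrape."""
    return active_series / scrape_interval_s


def needed_disk_bytes(
    retention_days: float,
    active_series: int,
    scrape_interval_s: float,
    bytes_per_sample: float = 2.0,  # conservative end of the 1-2 byte range
) -> float:
    """Rough disk estimate: retention seconds x samples/second x bytes/sample."""
    retention_seconds = retention_days * 86_400
    rate = ingested_samples_per_second(active_series, scrape_interval_s)
    return retention_seconds * rate * bytes_per_sample


# Hypothetical: 10,000 active series scraped every 10 seconds.
for days in (15, 90):
    gigabytes = needed_disk_bytes(days, 10_000, 10) / 1e9
    print(f"{days} days: about {gigabytes:.1f} GB")
```

Doubling the scrape interval halves the estimate, which is why a longer interval is one of the first levers when storage or memory is tight.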

Production Checklist

Integration

With OpenTelemetry

Prometheus metrics complement OpenTelemetry:

Prometheus: Simpler deployment, better aggregation
OTel: Distributed tracing, event logging

The two can be used together.

With Sentry

Use both for comprehensive monitoring:

Prometheus: Performance metrics
Sentry: Error tracking and debugging

Reference Links

Last Updated: December 2024
Status: Production Ready
