# Metrics & Observability - Interactive Tutorial
In this notebook, you'll learn about metrics and observability.

## 📚 Learning Objectives

By the end of this notebook, you will:
- Understand Prometheus metrics
- Implement counters, histograms, gauges
- Learn about OpenTelemetry tracing
- Track request latency
- Monitor error rates

## 🔧 Prerequisites

Ensure you have installed:
- prometheus-client
- opentelemetry-api
- fastapi (used by the metrics endpoint example in section 2)
- Python 3.11+

## 📦 Setup

Let's start by importing necessary libraries.


In [1]:
# Import required libraries
from prometheus_client import Counter, Histogram, Gauge, start_http_server, exposition
import time
from contextlib import contextmanager

# Print setup confirmation
print("✅ Libraries imported successfully!")
print(f"   - Prometheus client: {Counter.__module__}")


✅ Libraries imported successfully!
   - Prometheus client: prometheus_client.metrics


## 1. Prometheus Metrics

### 1.1 Counters

Counters track cumulative values (e.g., total requests).
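
Because a counter only ever increases, and starts again from zero when the process restarts, it is normally read as an increase or a rate over a time window rather than as a raw value. The following is a minimal plain-Python sketch of that idea, not the Prometheus implementation; `counter_increase` and the sample readings are hypothetical.

```python
def counter_increase(samples):
    """Total increase across successive readings of a counter.

    A drop between two readings is treated as a process restart,
    so the new reading itself counts as the increase since the reset.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # counter restarted from zero
    return total


# Hypothetical readings of a request counter, taken 15 seconds apart.
readings = [100, 130, 160, 20, 50]
window_seconds = 15 * (len(readings) - 1)

increase = counter_increase(readings)
print(f"increase: {increase:.0f} requests")
print(f"rate: {increase / window_seconds:.2f} requests/second")
```

The drop from 160 to 20 is counted as a restart instead of a negative change, which is why a counter should never be decremented by application code.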


In [2]:
# Define counters
REQUEST_COUNT = Counter(
    'api_requests_total',
    'API Total Requests',
    ['endpoint', 'method', 'status']
)

ERROR_COUNT = Counter(
    'api_errors_total',
    'API Total Errors',
    ['error_type']
)

# Increment counters
REQUEST_COUNT.labels(endpoint='graphql', method='POST', status='200').inc()
ERROR_COUNT.labels(error_type='validation').inc()

print("✅ Counters defined!")
print(f"   - api_requests_total: {REQUEST_COUNT}")
print(f"   - api_errors_total: {ERROR_COUNT}")



✅ Counters defined!
   - api_requests_total: counter:api_requests
   - api_errors_total: counter:api_errors
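
Each distinct combination of label values is tracked as its own time series. The sketch below uses a plain dictionary as a stand-in for a labelled counter to make that visible; `record` and the sample calls are hypothetical and do not touch the Prometheus client.

```python
from collections import defaultdict

# Stand-in for a labelled counter: one tally per (endpoint, method, status).
series = defaultdict(int)


def record(endpoint, method, status):
    """Increment the tally for one label combination."""
    series[(endpoint, method, status)] += 1


record("graphql", "POST", "200")
record("graphql", "POST", "200")
record("graphql", "POST", "500")
record("health", "GET", "200")

for labels, value in sorted(series.items()):
    print(labels, value)

# Aggregating away labels: total requests per status code.
by_status = defaultdict(int)
for (_endpoint, _method, status), value in series.items():
    by_status[status] += value
print(dict(by_status))
```

Four calls produce three series here, because two of them share the same label values. Labels with many possible values (user IDs, raw URLs) multiply the number of series in the same way, so they are usually kept to a small, fixed set.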


### 1.2 Histograms

Histograms track distributions of values (e.g., request latency).
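
A Prometheus histogram stores its distribution as cumulative buckets: each bucket counts the observations less than or equal to its upper bound, and a final +Inf bucket counts everything. The plain-Python sketch below illustrates that bookkeeping with the same bucket bounds used in this section; `cumulative_bucket_counts` and the sample observations are hypothetical.

```python
from bisect import bisect_left

BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 10.0]


def cumulative_bucket_counts(observations, bounds):
    """Return {upper_bound: number of observations <= upper_bound}."""
    per_bucket = [0] * (len(bounds) + 1)  # last slot is the +Inf bucket
    for value in observations:
        # Index of the first bound that is >= value.
        per_bucket[bisect_left(bounds, value)] += 1

    counts = {}
    running = 0
    for bound, count in zip(bounds + [float("inf")], per_bucket):
        running += count
        counts[bound] = running
    return counts


# Hypothetical request latencies in seconds.
latencies = [0.003, 0.04, 0.101, 0.101, 0.8, 12.0]
counts = cumulative_bucket_counts(latencies, BUCKETS)

for bound, count in counts.items():
    print(f"le={bound}: {count}")
```

Note that a latency of 0.101 s falls just above the 0.1 bound and is first counted in the 0.25 bucket, so bucket bounds are worth choosing around the latencies you care about.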


In [3]:
# Define histogram
REQUEST_LATENCY = Histogram(
    'api_request_duration_seconds',
    'API Request Duration',
    buckets=[0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 10.0],
)

# Observe latency
start = time.time()
time.sleep(0.1)
latency = time.time() - start
REQUEST_LATENCY.observe(latency)

print(f"✅ Latency observed: {latency:.3f}s")



✅ Latency observed: 0.101s
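
The manual start/stop timing can be wrapped in a context manager so that every code path, including one that raises, records a duration. This is a minimal sketch: `timed` is a hypothetical helper, and it appends to a plain list here so it runs on its own. In this notebook you could pass `REQUEST_LATENCY.observe` as the callback instead.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(observe):
    """Time the enclosed block and pass the elapsed seconds to `observe`."""
    start = time.perf_counter()  # monotonic clock, unaffected by wall-clock changes
    try:
        yield
    finally:
        # Runs on success and on exceptions, so failures are timed too.
        observe(time.perf_counter() - start)


durations = []

with timed(durations.append):
    time.sleep(0.05)

print(f"recorded {len(durations)} duration(s), first: {durations[0]:.3f}s")
```

The prometheus_client `Histogram` also offers a built-in `time()` helper that works as a context manager or decorator, which may be preferable to a hand-written one.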


### 1.3 Gauges

Gauges track current values (e.g., active connections, queue size).


In [4]:
# Define gauges
ACTIVE_WEBHOOKS = Gauge(
    'webhooks_active',
    'Active Webhooks',
)

CACHE_HIT_RATE = Gauge(
    'cache_hit_rate',
    'Cache Hit Rate',
)

# Set gauge values
ACTIVE_WEBHOOKS.set(5)
CACHE_HIT_RATE.set(0.75)

print("✅ Gauges defined and set!")
print(f"   - webhooks_active: {ACTIVE_WEBHOOKS}")
print(f"   - cache_hit_rate: {CACHE_HIT_RATE}")


✅ Gauges defined and set!
   - webhooks_active: gauge:webhooks_active
   - cache_hit_rate: gauge:cache_hit_rate
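
The hit rate above is set to a fixed number for illustration. In practice the value would be derived from observed hits and misses, roughly as in this sketch. `CacheStats` and the toy cache are hypothetical; the resulting value could then be passed to `CACHE_HIT_RATE.set(...)`.

```python
class CacheStats:
    """Tracks cache hits and misses and derives a hit rate between 0 and 1."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        """Record one lookup; `hit` is True when the key was found."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Toy cache with one entry, and four lookups against it.
cache = {"a": 1}
stats = CacheStats()
for key in ["a", "b", "a", "a"]:
    stats.record(key in cache)

print(f"hit rate: {stats.hit_rate:.2f}")
# In the notebook: CACHE_HIT_RATE.set(stats.hit_rate)
```

An alternative design is to export hits and misses as two counters and compute the ratio at query time, which keeps the rate meaningful over any time window.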


## 2. Exposing Metrics

### 2.1 Metrics Endpoint

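The application exposes a /metrics endpoint, and the Prometheus server scrapes it at a regular interval. Here the ASGI app from prometheus_client is mounted on a FastAPI application.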


In [5]:
from prometheus_client import make_asgi_app
from fastapi import FastAPI

# Create sample app
app = FastAPI()

# Create metrics app
metrics_app = make_asgi_app()

# Mount metrics endpoint
app.mount("/metrics", metrics_app)

print("✅ Metrics endpoint mounted at /metrics")
print("   Access at: http://localhost:8000/metrics")


✅ Metrics endpoint mounted at /metrics
   Access at: http://localhost:8000/metrics
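
To see what a scrape would return without starting a server, the text payload can be rendered directly. This sketch assumes prometheus-client is installed and uses a separate `CollectorRegistry` so it does not clash with the metrics already registered in this notebook; the `demo_requests_total` metric is hypothetical.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# A private registry keeps this demo apart from the default registry.
registry = CollectorRegistry()

demo_requests = Counter(
    "demo_requests_total",
    "Demo request count",
    ["method"],
    registry=registry,
)
demo_requests.labels(method="POST").inc()

# generate_latest returns the exposition text as bytes.
payload = generate_latest(registry).decode("utf-8")
print(payload)
```

The output contains `# HELP` and `# TYPE` comment lines followed by one sample line per label combination, which is the same format served at /metrics.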


## 3. Practice Exercise

### Task: Create custom metrics

Create metrics for:
1. Document retrieval time
2. Embedding cache hit rate
3. LLM token usage
4. Reranking latency
5. Webhook delivery success rate

Implement these metrics below.


In [6]:
# Implemented: custom metrics

# Custom metrics for RAG system
RETRIEVAL_TIME = Histogram(
    "document_retrieval_seconds",
    "Document retrieval latency in seconds",
    buckets=[0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0],
)

EMBEDDING_CACHE_HIT_RATE = Gauge(
    "embedding_cache_hit_rate",
    "Embedding cache hit rate (0-1)",
)

LLM_TOKEN_USAGE = Counter(
    "llm_tokens_total",
    "Total LLM tokens consumed",
    ["model", "type"],
)

RERANK_LATENCY = Histogram(
    "rerank_latency_seconds",
    "Reranking latency in seconds",
    buckets=[0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0],
)

WEBHOOK_DELIVERY_TOTAL = Counter(
    "webhook_delivery_total",
    "Webhook delivery attempts",
    ["status"],
)

# Example observations
RETRIEVAL_TIME.observe(0.12)
EMBEDDING_CACHE_HIT_RATE.set(0.82)
LLM_TOKEN_USAGE.labels(model="gpt-4.1", type="prompt").inc(512)
LLM_TOKEN_USAGE.labels(model="gpt-4.1", type="completion").inc(128)
RERANK_LATENCY.observe(0.03)
WEBHOOK_DELIVERY_TOTAL.labels(status="success").inc()

print("✅ Custom metrics defined successfully!")





✅ Custom metrics defined successfully!
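
The webhook metric is a counter labelled by status, not a stored rate, so the success rate is derived from the counts. The sketch below shows that arithmetic in plain Python. `success_rate` and the sample counts are hypothetical, and the "failure" label value is an assumption, since the cell above only records "success".

```python
def success_rate(counts_by_status):
    """Share of deliveries with status 'success', or None if there were none."""
    total = sum(counts_by_status.values())
    if total == 0:
        return None  # no deliveries yet, so the rate is undefined
    return counts_by_status.get("success", 0) / total


# Hypothetical delivery counts per status label.
deliveries = {"success": 47, "failure": 3}

rate = success_rate(deliveries)
print(f"webhook success rate: {rate:.2%}")
```

On the Prometheus side, the same ratio would typically be written as a query along the lines of `sum(rate(webhook_delivery_total{status="success"}[5m])) / sum(rate(webhook_delivery_total[5m]))`, with the 5-minute window chosen here only as an example.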


## 4. Summary

In this notebook, you learned:

1. **Prometheus Metrics** - Counters, histograms, gauges
2. **Metric Types** - Counters (cumulative), Histograms (distribution), Gauges (current)
3. **Metrics Endpoint** - /metrics for Prometheus scraping
4. **Labeling** - Use labels for filtering (endpoint, method, status)
5. **Best Practices** - Track latency, errors, resource usage

### 🎯 Key Takeaways

- Use counters for cumulative values (total requests)
- Use histograms for distributions (latency, response times)
- Use gauges for current values (active connections, queue size)
- Add labels for multi-dimensional metrics
- Expose /metrics endpoint for Prometheus
- Track business metrics (retrieval time, cache hit rate)

### 🚀 Next Steps

1. Implement custom metrics in production
2. Add Grafana dashboard configuration
3. Set up alerting rules
4. Monitor metrics in production

### 📚 Further Reading

- [Prometheus Best Practices](https://prometheus.io/docs/practices/)
- [OpenTelemetry](https://opentelemetry.io/)
- [Metrics Types](https://prometheus.io/docs/concepts/metric_types/)
