# Advanced Monitoring & Observability

This notebook shows how to integrate the Self-Critique pipeline with a production monitoring stack: structured JSON logging, Prometheus metrics, OpenTelemetry tracing, Grafana dashboards, and Prometheus alert rules.

## Learning Objectives

- **Metrics Export**: Expose custom application metrics for Prometheus scraping.
- **Structured Logging**: Implement JSON-formatted logs for easier parsing and querying.
- **Distributed Tracing**: Use OpenTelemetry to trace requests across the pipeline stages.
- **Dashboarding**: Visualize key metrics in Grafana.
- **Alerting**: Define alert rules for proactive issue detection.

---


## Section 1: Structured Logging

Structured logs (e.g., in JSON format) are machine-readable and allow for powerful querying in log aggregation systems like Elasticsearch or Loki.


In [None]:
import logging

from pythonjsonlogger import jsonlogger

# Get a logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Add a JSON formatter (only once, so re-running this cell doesn't duplicate log lines)
if not logger.handlers:
    log_handler = logging.StreamHandler()
    formatter = jsonlogger.JsonFormatter('%(asctime)s %(name)s %(levelname)s %(message)s')
    log_handler.setFormatter(formatter)
    logger.addHandler(log_handler)

# Example log
logger.info("Pipeline execution started", extra={
    'request_id': 'xyz-123',
    'model': 'claude-sonnet-4-20250514',
    'paper_length': 4500
})
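
To illustrate why JSON logs are easier to query, the sketch below writes a few records to an in-memory stream and filters them by `request_id`, much as a log aggregation system would. It uses a small stdlib-only formatter, `MinimalJsonFormatter`, which is a hypothetical stand-in for `jsonlogger.JsonFormatter` so the example runs without extra dependencies. The logger name, messages, and request IDs are illustrative assumptions.

```python
import io
import json
import logging


class MinimalJsonFormatter(logging.Formatter):
    """Hypothetical stdlib-only stand-in for jsonlogger.JsonFormatter."""

    # Attributes present on every LogRecord; anything else came from `extra=`.
    _STANDARD = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "levelname": record.levelname,
            "name": record.name,
            "message": record.getMessage(),
        }
        # Copy fields passed via `extra=` into the JSON object.
        payload.update(
            {k: v for k, v in vars(record).items() if k not in self._STANDARD}
        )
        return json.dumps(payload)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(MinimalJsonFormatter())

demo_logger = logging.getLogger("pipeline.demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
demo_logger.handlers = [handler]  # replace, so re-running doesn't duplicate output

demo_logger.info("Pipeline execution started", extra={"request_id": "xyz-123"})
demo_logger.info("Pipeline execution started", extra={"request_id": "abc-456"})
demo_logger.warning("XML parsing failed", extra={"request_id": "xyz-123"})

# "Query" the logs: every line is a JSON object, so filtering is a dict lookup.
records = [json.loads(line) for line in stream.getvalue().splitlines()]
matching = [r for r in records if r["request_id"] == "xyz-123"]
print([r["message"] for r in matching])
```

With plain-text logs the same filter would need a regular expression tuned to the message format; here the field is addressed by name.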


## Section 2: Prometheus Metrics Export

We'll use the `prometheus-fastapi-instrumentator` library to expose standard HTTP metrics automatically, and `prometheus_client` to define our own custom counters.


In [None]:
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator
from prometheus_client import Counter

# This would be in your main api.py file
app = FastAPI()

# Add custom metrics
pipeline_runs_total = Counter(
    "pipeline_runs_total", 
    "Total number of pipeline runs",
    ['model_name']
)
xml_parsing_errors = Counter(
    "xml_parsing_errors_total",
    "Total number of XML parsing errors"
)

@app.get("/")
def read_root():
    # Increment custom metric
    pipeline_runs_total.labels(model_name='claude-sonnet-4-20250514').inc()
    return {"Hello": "World"}

# Instrument the app
Instrumentator().instrument(app).expose(app, endpoint="/metrics")

print("✓ FastAPI app instrumented. Metrics available at /metrics")
# To run this: uvicorn your_module:app
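
The `xml_parsing_errors` counter is defined above but never incremented. As a minimal sketch of how it might be wired up, the example below counts failures inside a parsing helper. `parse_model_output` is a hypothetical function, and the counter is registered on a private `CollectorRegistry` (an assumption made here so the demo does not clash with metrics already registered on the default registry).

```python
import xml.etree.ElementTree as ET
from typing import Optional

from prometheus_client import CollectorRegistry, Counter

# A private registry avoids "duplicated timeseries" errors when a cell that
# registers metrics on the default registry is re-run.
demo_registry = CollectorRegistry()
demo_xml_parsing_errors = Counter(
    "xml_parsing_errors_total",
    "Total number of XML parsing errors",
    registry=demo_registry,
)


def parse_model_output(raw: str) -> Optional[ET.Element]:
    """Hypothetical helper: parse XML model output and count failures."""
    try:
        return ET.fromstring(raw)
    except ET.ParseError:
        demo_xml_parsing_errors.inc()
        return None


ok = parse_model_output("<critique><score>4</score></critique>")
bad = parse_model_output("<critique><score>4</critique>")  # mismatched tag

# Read the current counter value back from the registry.
error_count = demo_registry.get_sample_value("xml_parsing_errors_total")
print(error_count)
```

In the FastAPI app itself the counter would live on the default registry, so it would appear at `/metrics` alongside the instrumentator's standard metrics.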


## Section 3: OpenTelemetry Instrumentation (Tracing)

Distributed tracing allows us to follow a single request as it passes through different stages of our pipeline, helping us pinpoint latency issues.


In [None]:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Set up the tracer provider
trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

# Export traces to the console for demonstration
# In production, you'd use an exporter for Jaeger, Zipkin, or another backend
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter())
)

import time


def run_pipeline_stage(stage_name: str, paper_text: str) -> None:
    # Each stage gets its own span, nested under whichever span is current
    with tracer.start_as_current_span(f"pipeline.{stage_name}") as span:
        span.set_attribute("paper.length", len(paper_text))
        print(f"Executing stage: {stage_name}")
        # Simulate work
        time.sleep(0.5)
        span.set_attribute("output.tokens", 512)
        print(f"Stage {stage_name} complete.")

# Trace a full pipeline execution
with tracer.start_as_current_span("pipeline.run") as parent_span:
    paper = "Attention is all you need..."
    parent_span.set_attribute("model", "claude-sonnet-4-20250514")
    
    run_pipeline_stage("summary", paper)
    run_pipeline_stage("critique", paper)
    run_pipeline_stage("revision", paper)
    
    print("Pipeline execution traced.")


## Section 4: Grafana Dashboard Definitions

Grafana dashboards are defined as JSON. Below is a conceptual panel definition that visualizes P99 request latency, computed from the `http_request_duration_seconds` histogram.

```json
{
  "title": "P99 Latency (s)",
  "type": "timeseries",
  "targets": [
    {
      "expr": "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))",
      "legendFormat": "P99 Latency"
    }
  ],
  "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 }
}
```
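
The `histogram_quantile` call in the panel estimates a quantile from cumulative bucket counts rather than from raw observations. The sketch below is a simplified Python rendering of that interpolation, assuming non-negative bucket bounds and a final `+Inf` bucket; it is not Prometheus's actual implementation, and the bucket values are made up for illustration.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch of the interpolation behind PromQL's
    histogram_quantile(); assumes non-negative bounds and 0 < q <= 1.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Cumulative counts per `le` bound, in seconds (illustrative numbers).
example_buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (2.0, 100), (math.inf, 100)]
p99 = histogram_quantile(0.99, example_buckets)
p50 = histogram_quantile(0.50, example_buckets)
print(p99, p50)
```

Because the result is interpolated within a bucket, its accuracy depends on how finely the bucket bounds are spaced around the latencies of interest. In the panel query the inputs are per-second rates rather than raw counts, but the interpolation works the same way.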

**Key Metrics for Dashboarding:**

- **Latency**: P50, P95, P99 request latency.
- **Throughput**: Requests per second (RPS).
- **Error Rate**: Percentage of 5xx server errors.
- **Token Counts**: Input and output tokens per model.
- **Quality Scores**: Average `overall_quality` score over time.
- **Cost**: Estimated cost based on token counts.


## Section 5: Alert Rule Definitions

Prometheus alerting rules are defined in YAML rule files. The example below fires when more than 5% of API requests have returned 5xx errors continuously for 10 minutes.

```yaml
groups:
- name: api_alerts
  rules:
  - alert: HighErrorRate
    expr: (sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) > 0.05
    for: 10m
    labels:
      severity: page
    annotations:
      summary: "High API Error Rate"
      description: "The API is returning 5xx errors for more than 5% of requests over the last 10 minutes."
```
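
The interaction between the threshold in `expr` and the `for: 10m` clause can be sketched in Python. The function below is a simplified model, not Prometheus's implementation: it assumes one evaluation per sample, treats a zero request rate as "not breaching", and uses made-up rates. It returns the first evaluation time at which the alert would move from pending to firing.

```python
from typing import List, Optional, Tuple

THRESHOLD = 0.05       # matches `> 0.05` in the rule
FOR_SECONDS = 10 * 60  # matches `for: 10m`


def first_firing_time(
    samples: List[Tuple[int, float, float]],
    threshold: float = THRESHOLD,
    hold_seconds: int = FOR_SECONDS,
) -> Optional[int]:
    """Return the evaluation timestamp at which the alert starts firing.

    Each sample is (timestamp_seconds, error_rate_5xx, total_request_rate).
    """
    pending_since: Optional[int] = None
    for timestamp, error_rate, total_rate in samples:
        ratio = error_rate / total_rate if total_rate > 0 else 0.0
        if ratio > threshold:
            if pending_since is None:
                pending_since = timestamp  # condition just became true: pending
            if timestamp - pending_since >= hold_seconds:
                return timestamp  # condition held long enough: firing
        else:
            pending_since = None  # condition cleared: reset the timer
    return None


# One evaluation per minute for 15 minutes (illustrative rates).
spike = [(t, 10.0 if t <= 300 else 1.0, 100.0) for t in range(0, 901, 60)]
sustained = [(t, 8.0 if t >= 120 else 1.0, 100.0) for t in range(0, 901, 60)]

print(first_firing_time(spike), first_firing_time(sustained))
```

The five-minute spike never fires because the condition clears before the hold period elapses, which is the purpose of `for`: it suppresses pages for short-lived bursts.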
