Zero-glue observability for FastAPI.
fastapi-observer gives you structured JSON logs, request correlation, Prometheus metrics, OpenTelemetry tracing, security redaction presets, and runtime controls in one install step and one function call.
Supported Python versions: 3.10 to 3.14
| Component | Supported / Tested |
|---|---|
| Python | 3.10 to 3.14 (CI matrix) |
| FastAPI | >=0.129.0 |
| Starlette | >=0.52.1 |
| pydantic-settings | >=2.10.1 |
| Prometheus backend | prometheus-client>=0.24.1 (optional extra) |
| OpenTelemetry | opentelemetry-api/sdk/exporter>=1.39.1 (optional extra) |
| Loguru bridge | loguru>=0.7.2 (optional extra) |
Most FastAPI services eventually need the same observability plumbing:
- Structured JSON logging
- Request and trace correlation
- Metrics for dashboards and alerts
- OpenTelemetry setup
- Redaction/sanitization for sensitive data
- Runtime controls for incident response
Teams usually implement this as custom glue code in every service. That costs engineering time and creates drift between services.
fastapi-observer replaces this repeated wiring with a consistent, secure-by-default setup.
If this library saves you engineering time, you can support maintenance here:
After one call to install_observability():
| Capability | Included | Default |
|---|---|---|
| Structured JSON logs | Yes | Enabled |
| Request ID correlation | Yes | Enabled |
| Trace/span IDs in logs | Yes (with OTel) | Off until OTel enabled |
| Prometheus /metrics | Yes | Off until metrics_enabled=True |
| Sensitive-data redaction | Yes | Enabled |
| Security presets (strict, pci, gdpr) | Yes | Available |
| Runtime control endpoint | Yes | Off until enabled |
| Plugin hooks (enrichers, filters, metric hooks) | Yes | Available |
# Core (logging + metrics + security)
pip install fastapi-observer
# Prometheus metrics support
pip install "fastapi-observer[prometheus]"
# Loguru coexistence bridge support
pip install "fastapi-observer[loguru]"
# OpenTelemetry tracing/logs support
pip install "fastapi-observer[otel]"
# Everything
pip install "fastapi-observer[all]"Import path:
import fastapiobserverfrom fastapi import FastAPI
from fastapiobserver import ObservabilitySettings, install_observability
app = FastAPI()
settings = ObservabilitySettings(
app_name="orders-api",
service="orders",
environment="production",
version="0.1.0",
metrics_enabled=True,
)
install_observability(app, settings)
@app.get("/orders/{order_id}")
def get_order(order_id: int) -> dict[str, int]:
return {"order_id": order_id}Run:
uvicorn main:app --reloadNow you have:
- Structured request logs on every request
- Request ID propagation
- Sanitized event payloads
- Prometheus metrics at /metrics
| Protection | Default | Why |
|---|---|---|
| Body logging | OFF | Avoid leaking request/response secrets |
| Sensitive key masking | ON | Protect fields like password, token, secret |
| Sensitive header masking | ON | Protect authorization, cookie, x-api-key |
| Query string in logged path | Excluded | Prevent accidental token leakage |
| Request ID trust boundary | Trusted CIDRs only | Prevent spoofed correlation IDs |
from fastapiobserver import SecurityPolicy
# Strictest option: drop sensitive values and keep minimal safe headers
strict_policy = SecurityPolicy.from_preset("strict")
# PCI-focused redaction fields
pci_policy = SecurityPolicy.from_preset("pci")
# GDPR-focused hashed PII fields
gdpr_policy = SecurityPolicy.from_preset("gdpr")

Use a preset in installation:

install_observability(app, settings, security_policy=SecurityPolicy.from_preset("pci"))

If your compliance model is "log only approved fields", use allowlists:
from fastapiobserver import SecurityPolicy
policy = SecurityPolicy(
header_allowlist=("x-request-id", "content-type", "user-agent"),
event_key_allowlist=("method", "path", "status_code"),
)

To capture request bodies explicitly (off by default), opt in and restrict media types:

policy = SecurityPolicy(
log_request_body=True,
body_capture_media_types=("application/json",),
)

Use runtime controls when you need higher log verbosity or different trace sampling during an incident.
export OBSERVABILITY_CONTROL_TOKEN="replace-me"

from fastapiobserver import RuntimeControlSettings, install_observability
runtime_control = RuntimeControlSettings(enabled=True)
install_observability(app, settings, runtime_control_settings=runtime_control)

Inspect current runtime values:
curl -X GET http://localhost:8000/_observability/control \
-H "Authorization: Bearer replace-me"Update runtime values:
curl -X POST http://localhost:8000/_observability/control \
-H "Authorization: Bearer replace-me" \
-H "Content-Type: application/json" \
-d '{"log_level":"DEBUG","trace_sampling_ratio":0.25}'What changes immediately:
- Root logger level (and uvicorn loggers)
- Dynamic OTel trace sampling ratio
from fastapiobserver import (
OTelLogsSettings,
OTelMetricsSettings,
OTelSettings,
install_observability,
)
otel_settings = OTelSettings(
enabled=True,
service_name="orders-api",
service_version="2.0.0",
environment="production",
otlp_endpoint="http://localhost:4317",
protocol="grpc", # or "http/protobuf"
trace_sampling_ratio=1.0,
extra_resource_attributes={
"k8s.namespace": "prod",
"team": "backend",
},
)
otel_logs_settings = OTelLogsSettings(
enabled=True,
logs_mode="both", # "local_json", "otlp", or "both"
otlp_endpoint="http://localhost:4317",
protocol="grpc",
)
otel_metrics_settings = OTelMetricsSettings(
enabled=True,
otlp_endpoint="http://localhost:4317",
protocol="grpc", # or "http/protobuf"
export_interval_millis=60000,
)
install_observability(
app,
settings,
otel_settings=otel_settings,
otel_logs_settings=otel_logs_settings,
otel_metrics_settings=otel_metrics_settings,
)

Design details:
- Reuses an externally configured tracer provider if one already exists.
- Injects trace IDs into application logs for log-trace correlation.
- Supports runtime sampling updates through the control plane.
- Sends OTel logs in OTLP mode with the same sanitization policy.
- Supports optional OTLP metrics export for unified OTel backends.
- Registers graceful shutdown hooks to flush provider buffers on app exit.
inject_trace_headers() uses OpenTelemetry propagation, so it forwards
traceparent, tracestate, and baggage when baggage is present in the active context.
from opentelemetry import baggage
from opentelemetry.context import attach, detach
from fastapiobserver import inject_trace_headers
token = attach(baggage.set_baggage("tenant_id", "acme"))
try:
headers = inject_trace_headers({})
# headers["baggage"] == "tenant_id=acme"
finally:
    detach(token)

A single install_observability() call sets up:

- Structured logging pipeline (JSON formatter + bounded async queue handler).
- Metrics backend and /metrics endpoint when metrics are enabled.
- OTel tracing setup when OTel is enabled.
- Optional OTel logs/metrics setup when OTLP settings are enabled.
- Request logging middleware with sanitization and context cleanup.
- Runtime control endpoint when runtime control is enabled.
Request path lifecycle (high-level):
Request arrives
-> request ID / trace context resolved
-> app handler executes
-> response classified (ok/client_error/server_error/exception)
-> payload sanitized by policy
-> log emitted + metrics recorded
-> context cleared
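The flow above follows the shape of an ASGI middleware. Below is an illustrative sketch only, not the library's actual middleware (which adds sanitization policy, trusted-proxy handling, and metric hooks); all names in it are hypothetical.

# Illustrative lifecycle sketch, not fastapi-observer internals.
import logging
import time
import uuid

from starlette.types import ASGIApp, Message, Receive, Scope, Send

logger = logging.getLogger("lifecycle_sketch")


class LifecycleSketchMiddleware:
    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        request_id = str(uuid.uuid4())       # request ID / trace context resolved
        started = time.perf_counter()
        status = {"code": 500}

        async def send_wrapper(message: Message) -> None:
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)   # app handler executes
            outcome = "server_error" if status["code"] >= 500 else (
                "client_error" if status["code"] >= 400 else "ok")
        except Exception:
            outcome = "exception"
            raise
        finally:
            duration_ms = round((time.perf_counter() - started) * 1000, 3)
            # payload would be sanitized by policy here, then log emitted + metrics recorded
            logger.info("request.completed", extra={
                "request_id": request_id,
                "path": scope.get("path"),
                "status_code": status["code"],
                "duration_ms": duration_ms,
                "error_type": outcome,
            })
            # context cleared (contextvars reset in the real middleware)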
The project is now organized as focused subpackages instead of large monolithic modules:
- fastapiobserver/logging/: formatter, queueing, filters, setup lifecycle, sink circuit-breakers.
- fastapiobserver/middleware/: request logging orchestration, context, IP resolution, headers, body capture, metrics hooks.
- fastapiobserver/sinks/: sink protocol, registry/discovery, built-ins, factory wiring, Logtail + DLQ implementation.
- fastapiobserver/metrics/: backend contracts/registry/builder/endpoint, Prometheus integration subpackage.
- fastapiobserver/security/: policy/settings models, normalization helpers, redaction engine, trusted-proxy utilities.
- fastapiobserver/otel/: OTel settings/resource/tracing/logs/metrics/lifecycle helpers.
Public imports remain backward-compatible via package facades (__init__.py re-exports).
{
"timestamp": "2026-02-18T10:30:00.000000+00:00",
"level": "INFO",
"logger": "fastapiobserver.middleware",
"message": "request.completed",
"app_name": "orders-api",
"service": "orders",
"environment": "production",
"version": "0.1.0",
"log_schema_version": "1.0.0",
"library": "fastapiobserver",
"request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"trace_id": "0af7651916cd43dd8448eb211c80319c",
"span_id": "b7ad6b7169203331",
"event": {
"method": "GET",
"path": "/orders/42",
"status_code": 200,
"http.request.method": "GET",
"url.path": "/orders/42",
"http.response.status_code": 200,
"duration_ms": 3.456,
"client_ip": "10.0.0.1",
"error_type": "ok"
}
}

On exception logs, a structured error object is included for indexed queries. It carries a stable AST-based fingerprint hash that ignores transient memory addresses and exact line numbers, so you can alert on recurring errors directly in your search backend without extra dependencies.
{
"error": {
"type": "RuntimeError",
"message": "boom",
"stacktrace": "Traceback (most recent call last): ...",
"fingerprint": "a1b2c3d4e5f67890abcd12345678bbcc"
}
}

This section is deployment-first. A new engineer should be able to ship this stack without reading the source code.
flowchart LR
A["FastAPI services (fastapi-observer)"] --> C["OTel Collector"]
C --> D["Tempo (traces)"]
C --> E["Loki (logs)"]
A --> F["Prometheus (/metrics scrape)"]
F --> G["Grafana"]
D --> G
E --> G
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
memory_limiter:
limit_mib: 512
spike_limit_mib: 128
check_interval: 5s
batch:
send_batch_size: 512
timeout: 5s
exporters:
otlphttp/tempo:
endpoint: http://tempo:4318
otlphttp/loki:
endpoint: http://loki:3100/otlp
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [otlphttp/tempo]
logs:
receivers: [otlp]
processors: [memory_limiter, batch]
      exporters: [otlphttp/loki]

Suggested migration plan:

- Baseline current service SLOs before migration (latency, error rate, availability).
- Enable fastapi-observer in one service with conservative settings (no body capture); see the settings sketch after this list.
- Run a canary rollout (5-10% of traffic) and compare p95 latency, 5xx rate, and log/trace pipeline health.
- Expand rollout to all replicas/services after 24-48h stable canary.
- Enable advanced controls in phases: security presets, allowlists, runtime control plane, OTLP logs mode.
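A minimal sketch of such conservative canary settings, using only options shown elsewhere in this README (body capture stays off by default, metrics on, tracing sampled lightly); the endpoint and exact values are illustrative starting points, not recommendations:

from fastapi import FastAPI
from fastapiobserver import ObservabilitySettings, OTelSettings, install_observability

app = FastAPI()

settings = ObservabilitySettings(
    app_name="orders-api",
    service="orders",
    environment="production",
    version="1.0.0",
    metrics_enabled=True,        # dashboards and canary comparison from day one
)

# Keep tracing cheap during the canary; raise later via the runtime control plane.
otel_settings = OTelSettings(
    enabled=True,
    service_name="orders-api",
    otlp_endpoint="http://otel-collector:4317",  # assumed collector address
    trace_sampling_ratio=0.05,
)

# The default SecurityPolicy already keeps body logging off and masking on.
install_observability(app, settings, otel_settings=otel_settings)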
| Failure mode | Expected behavior | Immediate action |
|---|---|---|
| OTel Collector down | App still serves traffic; local logs still available if OTEL_LOGS_MODE=both | Fail over the Collector or temporarily switch to local_json mode |
| Tempo down | Traces unavailable; logs/metrics continue | Restore Tempo, keep incident correlation via logs |
| Loki down | Logs unavailable in Grafana; metrics/traces continue | Restore Loki, use app stdout logs temporarily |
| Prometheus down | No metrics/alerts; app traffic unaffected | Restore Prometheus and alertmanager path |
| High cardinality on paths | Prometheus pressure increases | Use route templates and exclude noisy paths |
| Spoofed forwarded headers | Incorrect client IP/request ID trust | Tighten OBS_TRUSTED_CIDRS and proxy chain config |
Recommended SLOs:
- Availability: >= 99.9% over 30 days
- p95 latency: < 500 ms for core APIs
- 5xx rate: < 1% per service
- Error-budget burn alerting: fast burn (1h), slow burn (6h) (see the burn-rate sketch after the starter queries below)
Starter alert queries:
# 5xx rate per service (5 minutes)
sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (service)
# p95 latency per service
histogram_quantile(
0.95,
sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
)
# Traffic drop detection
sum(rate(http_requests_total[5m])) by (service)
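For the error-budget burn alerts listed above, a common multiwindow burn-rate sketch, assuming a 99.9% availability SLO and the same http_requests_total metric used in the starter queries (thresholds follow the standard SRE multiwindow pattern and should be tuned to your SLO):

# Fast burn: >14.4x budget burn over 1h (page)
(
  sum(rate(http_requests_total{status_code=~"5.."}[1h])) by (service)
  /
  sum(rate(http_requests_total[1h])) by (service)
) > (14.4 * 0.001)

# Slow burn: >6x budget burn over 6h (ticket)
(
  sum(rate(http_requests_total{status_code=~"5.."}[6h])) by (service)
  /
  sum(rate(http_requests_total[6h])) by (service)
) > (6 * 0.001)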
- Confirm blast radius in Grafana: affected services, status codes, latency shifts, deployment changes.
- Increase signal quality without restart: use the runtime control plane to raise the log level and trace sampling ratio (see the curl example after this list).
- Identify dependency failures: check Collector, Loki, Tempo, Prometheus health and ingestion queues.
- Mitigate: roll back latest app change, scale affected service, or disable expensive capture options.
- Verify recovery: p95 + 5xx return to baseline, trace volume normalized, alert clears.
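For the second step, the same control endpoint shown earlier works during an incident; host, token, and values below are illustrative:

curl -X POST http://localhost:8000/_observability/control \
  -H "Authorization: Bearer $OBSERVABILITY_CONTROL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"log_level":"DEBUG","trace_sampling_ratio":1.0}'

Remember to lower both values again once the incident is resolved.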
Use the bundled manifests:
kubectl kustomize --load-restrictor=LoadRestrictionsNone examples/k8s | kubectl apply -f -
kubectl -n observability rollout status deployment/app-a
kubectl -n observability rollout status deployment/app-b
kubectl -n observability rollout status deployment/app-c
kubectl -n observability rollout status deployment/otel-collector
kubectl -n observability rollout status deployment/prometheus
kubectl -n observability rollout status deployment/loki
kubectl -n observability rollout status deployment/tempo
kubectl -n observability rollout status deployment/grafana
kubectl -n observability rollout status deployment/traffic-generator
kubectl -n observability port-forward svc/grafana 3000:3000

Open http://localhost:3000.
Full guide: kubernetes.md
fastapi-observer integrates natively with the core OpenTelemetry Python SDK, meaning you can aggressively tune its resource usage purely via standard environment variables without altering your application code.
For high-throughput services (e.g. 10k+ RPS), apply these exact variables to minimize the observer footprint:
Tracing 100% of requests is too expensive at scale. You should configure fastapi-observer to respect upstream trace flags, while only sampling a fraction of net-new requests:
# Keep the parent's sample decision if it exists, otherwise sample 5%
export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
export OTEL_TRACES_SAMPLER_ARG="0.05"

Do not waste cycles generating spans for health checks or static assets. fastapi-observer auto-derives metrics exclusions, but you can explicitly drop these paths from tracing at the instrumentation level:
export OTEL_PYTHON_FASTAPI_EXCLUDED_URLS="healthz,metrics,favicon.ico"

Prevent large, unmanageable spans from consuming excessive memory in the BatchSpanProcessor:
export OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT="128"
export OTEL_SPAN_EVENT_COUNT_LIMIT="128"
export OTEL_SPAN_LINK_COUNT_LIMIT="128"

The default OpenTelemetry batch limits are conservative for high-throughput ASGI microservices. Increase the queue and batch sizes so spikes are not dropped, and decrease the schedule delay so spans are flushed from process memory faster:
export OTEL_BSP_MAX_QUEUE_SIZE="10000"
export OTEL_BSP_MAX_EXPORT_BATCH_SIZE="5000"
export OTEL_BSP_SCHEDULE_DELAY="1000"

The examples/ directory contains runnable demos:
| Example | What it shows |
|---|---|
| basic_app.py | Minimal setup and request logging |
| security_presets_app.py | Preset-based security policy |
| allowlist_app.py | Allowlist-only sanitization |
| otel_app.py | OTel tracing and resource attributes |
| audit_app.py | Tamper-evident cryptographic log signatures |
| db_tracing_app.py | SQLCommenter and database trace injection |
| graphql_app.py | Native Strawberry GraphQL observability |
| benchmarks/ | Baseline vs observer benchmark harness |
| k8s/ | Kubernetes-native stack with Prometheus + Loki + Tempo + Grafana |
| full_stack/ | Docker Compose stack: 3 FastAPI services + Grafana + Prometheus + Loki + Tempo |
Run an example:
uvicorn examples.basic_app:app --reload

From examples/full_stack, these are real Grafana views generated by fastapi-observer telemetry:
Overview panels (latency heatmap, route throughput, errors, CPU/memory):
Percentiles, request rate, and structured JSON logs in Loki:
The library supports configuration from code and env vars. Below are the most relevant env vars by area.
| Variable | Default | Description |
|---|---|---|
| APP_NAME | app | Namespace for app-level identity |
| SERVICE_NAME | api | Service label for logs/metrics |
| ENVIRONMENT | development | Environment label |
| APP_VERSION | 0.0.0 | Service version |
| LOG_LEVEL | INFO | Root log level |
| LOG_DIR | - | Optional file log directory |
| LOG_QUEUE_MAX_SIZE | 10000 | Max in-memory records in core log queue |
| LOG_QUEUE_OVERFLOW_POLICY | drop_oldest | Queue overflow behavior: drop_oldest, drop_newest, block |
| LOG_QUEUE_BLOCK_TIMEOUT_SECONDS | 1.0 | Timeout used by block policy before dropping newest |
| LOG_SINK_CIRCUIT_BREAKER_ENABLED | true | Enable sink circuit-breaker protection |
| LOG_SINK_CIRCUIT_BREAKER_FAILURE_THRESHOLD | 5 | Consecutive sink failures before opening circuit |
| LOG_SINK_CIRCUIT_BREAKER_RECOVERY_TIMEOUT_SECONDS | 30.0 | Open-state cooldown before half-open probe |
| REQUEST_ID_HEADER | x-request-id | Incoming request ID header |
| RESPONSE_REQUEST_ID_HEADER | x-request-id | Response request ID header |
| Variable | Default | Description |
|---|---|---|
| METRICS_ENABLED | false | Enable metrics backend |
| METRICS_BACKEND | prometheus | Registered backend name used by install_observability() |
| METRICS_PATH | /metrics | Metrics endpoint path |
| METRICS_EXCLUDE_PATHS | /metrics,/health,/healthz,/docs,/openapi.json | Skip metrics for noisy endpoints |
| METRICS_EXEMPLARS_ENABLED | false | Enable exemplars where supported |
| METRICS_FORMAT | negotiate | prometheus, openmetrics, or negotiate |
Caution
The /metrics endpoint is unauthenticated by default. In production it should be restricted to internal networks (e.g. behind a Kubernetes NetworkPolicy, VPC security group, or ingress rule that only allows your Prometheus scraper). Exposing it publicly leaks service topology, error rates, and request patterns.
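One way to enforce this on Kubernetes is a NetworkPolicy sketch like the one below. It assumes app pods labeled app=orders-api, Prometheus running in a monitoring namespace, and the app serving /metrics on port 8000; all of these are assumptions you must adapt. Note that once a pod is selected by an ingress NetworkPolicy, all other ingress to it is denied, so you must also add rules allowing your normal traffic sources (e.g. the ingress controller).

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape   # illustrative name
spec:
  podSelector:
    matchLabels:
      app: orders-api             # hypothetical app label
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 8000              # app port that exposes /metrics
    # add further rules here for legitimate application traffic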
| Variable | Default | Description |
|---|---|---|
| OBS_REDACTION_PRESET | - | strict, pci, gdpr |
| OBS_REDACTED_FIELDS | built-in list | CSV keys to redact |
| OBS_REDACTED_HEADERS | built-in list | CSV headers to redact |
| OBS_REDACTION_MODE | mask | mask, hash, drop |
| OBS_MASK_TEXT | *** | Mask replacement text |
| OBS_LOG_REQUEST_BODY | false | Enable request body logging |
| OBS_LOG_RESPONSE_BODY | false | Enable response body logging |
| OBS_MAX_BODY_LENGTH | 256 | Max captured body bytes |
| OBS_HEADER_ALLOWLIST | - | CSV headers allowed in logs |
| OBS_EVENT_KEY_ALLOWLIST | - | CSV event keys allowed in logs |
| OBS_BODY_CAPTURE_MEDIA_TYPES | - | CSV allowed media types for body capture |
| OBS_TRUSTED_PROXY_ENABLED | true | Enable trusted-proxy policy |
| OBS_TRUSTED_CIDRS | RFC1918 + loopback | CSV trusted CIDRs |
| OBS_HONOR_FORWARDED_HEADERS | false | Trust forwarded headers |
Notes:
- OBS_HEADER_ALLOWLIST, OBS_EVENT_KEY_ALLOWLIST, and OBS_BODY_CAPTURE_MEDIA_TYPES accept none, null, or unset to clear values.
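For example, an environment-only configuration that applies the PCI preset and tightens the trust boundary might look like this (values are illustrative; variable names come from the table above):

export OBS_REDACTION_PRESET="pci"
export OBS_REDACTION_MODE="hash"
export OBS_LOG_REQUEST_BODY="false"
export OBS_TRUSTED_PROXY_ENABLED="true"
export OBS_TRUSTED_CIDRS="10.0.0.0/8,127.0.0.1/32"
export OBS_HONOR_FORWARDED_HEADERS="false"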
| Variable | Default | Description |
|---|---|---|
| OTEL_ENABLED | false | Enable tracing instrumentation |
| OTEL_SERVICE_NAME | SERVICE_NAME | OTel service name override |
| OTEL_SERVICE_VERSION | APP_VERSION | OTel service version override |
| OTEL_ENVIRONMENT | ENVIRONMENT | OTel environment override |
| OTEL_EXPORTER_OTLP_ENDPOINT | - | OTLP endpoint |
| OTEL_EXPORTER_OTLP_PROTOCOL | grpc | grpc or http/protobuf |
| OTEL_TRACE_SAMPLING_RATIO | 1.0 | Initial trace sampling ratio |
| OTEL_EXTRA_RESOURCE_ATTRIBUTES | - | CSV key=value pairs |
| OTEL_EXCLUDED_URLS | auto-derived | CSV excluded paths for tracing |
| OTEL_LOGS_ENABLED | false | Enable OTLP log export |
| OTEL_LOGS_MODE | local_json | local_json, otlp, both |
| OTEL_LOGS_ENDPOINT | - | OTLP logs endpoint |
| OTEL_LOGS_PROTOCOL | grpc | grpc or http/protobuf |
| OTEL_METRICS_ENABLED | false | Enable OTLP metrics export |
| OTEL_METRICS_ENDPOINT | - | OTLP metrics endpoint |
| OTEL_METRICS_PROTOCOL | grpc | grpc or http/protobuf |
| OTEL_METRICS_EXPORT_INTERVAL_MILLIS | 60000 | OTLP metrics export interval in milliseconds |
| Variable | Default | Description |
|---|---|---|
| OBS_RUNTIME_CONTROL_ENABLED | false | Enable runtime control endpoint |
| OBS_RUNTIME_CONTROL_PATH | /_observability/control | Control endpoint path |
| OBS_RUNTIME_CONTROL_TOKEN_ENV_VAR | OBSERVABILITY_CONTROL_TOKEN | Name of env var containing bearer token |
| OBSERVABILITY_CONTROL_TOKEN | - | Bearer token value used for auth |
| Variable | Default | Description |
|---|---|---|
| LOGTAIL_ENABLED | false | Enable Better Stack Logtail sink |
| LOGTAIL_SOURCE_TOKEN | - | Logtail source token |
| LOGTAIL_BATCH_SIZE | 50 | Batch size for shipping |
| LOGTAIL_FLUSH_INTERVAL | 2.0 | Flush interval (seconds) |
| LOGTAIL_DLQ_ENABLED | false | Enable resilient local disk fallback for dropped logs |
| LOGTAIL_DLQ_DIR | .dlq/logtail | Directory to archive dropped NDJSON messages |
| LOGTAIL_DLQ_MAX_BYTES | 52428800 | Max bytes per DLQ file before rotation (50 MB) |
| LOGTAIL_DLQ_COMPRESS | true | GZIP-compress rotated DLQ files |
Tip
The Logtail Dead Letter Queue (DLQ) provides best-effort local durability. If the internal memory queue overflows under sustained load (queue.Full), or a network outage exhausts the outbound HTTP retry backoff, the dropped log payloads are written to local NDJSON envelopes. You can replay these files to Better Stack later with the provided scripts/replay_dlq.py utility.
For regulated industries (Fintech, Healthcare, SOC 2) where you must prove logs were not altered, deleted, or reordered:
pip install "fastapi-observer[audit]"
export OBS_AUDIT_SECRET_KEY="your-signing-secret"
export OBS_AUDIT_LOGGING_ENABLED="true"

install_observability(app, settings)
# Every JSON log now contains _audit_seq and _audit_sig fields

Each log record is chained via HMAC-SHA256: record B's signature includes the signature of record A. Breaking any link (tamper, delete, reorder) invalidates the chain.
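Conceptually, the chain works like the sketch below. It is illustrative only: the field names _audit_seq and _audit_sig match the output above, but the library's exact record canonicalization is not shown here.

import hashlib
import hmac
import json

def sign_chain(records: list[dict], key: bytes) -> list[dict]:
    """Illustrative HMAC-SHA256 hash chain: each signature covers the record
    plus the previous signature, so any tamper, delete, or reorder breaks
    every later link."""
    prev_sig = ""
    signed = []
    for seq, record in enumerate(records):
        body = dict(record, _audit_seq=seq)
        message = prev_sig + json.dumps(body, sort_keys=True)
        sig = hmac.new(key, message.encode(), hashlib.sha256).hexdigest()
        signed.append(dict(body, _audit_sig=sig))
        prev_sig = sig
    return signed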
| Variable | Default | Description |
|---|---|---|
| OBS_AUDIT_LOGGING_ENABLED | false | Enable HMAC-SHA256 hash chain |
| OBS_AUDIT_KEY_ENV_VAR | OBS_AUDIT_SECRET_KEY | Name of env var containing the signing key |
| OBS_AUDIT_SECRET_KEY | - | The HMAC signing key (read by LocalHMACProvider) |
Custom key provider (e.g. KMS / Vault):
from fastapiobserver import AuditKeyProvider
class VaultKeyProvider:
def get_key(self) -> bytes:
return vault_client.get_secret("audit-signing-key").encode()
install_observability(app, settings, audit_key_provider=VaultKeyProvider())

Verify logs with the CLI:
export OBS_AUDIT_SECRET_KEY="your-signing-secret"
python scripts/verify_audit_chain.py exported_logs.ndjson
# PASS — 1042 records verified, chain intact.

SQLCommenter integration bridges the gap between FastAPI application traces and database performance monitoring. When enabled, the trace context is automatically injected into raw SQL queries as a comment:
SELECT * FROM users /*traceparent='00-abc123def456-01',route='/api/users'*/

This lets DBAs correlate slow Postgres/MySQL queries directly back to the originating HTTP request.
pip install "fastapi-observer[otel-sqlalchemy]"Automatic (via install_observability):
from sqlalchemy import create_engine
engine = create_engine("postgresql://...")
install_observability(app, settings, otel_settings=otel_settings, db_engine=engine)

Multiple engines (e.g. read/write replicas):
write_engine = create_engine("postgresql://primary/...")
read_engine = create_engine("postgresql://replica/...")
install_observability(
app, settings,
otel_settings=otel_settings,
db_engine=[write_engine, read_engine],
db_commenter_options={"opentelemetry_values": True, "route": False},
)

Manual (standalone):
from fastapiobserver import instrument_sqlalchemy, instrument_sqlalchemy_async
# Sync engine
instrument_sqlalchemy(engine)
# Async engine (extracts .sync_engine automatically)
instrument_sqlalchemy_async(async_engine)

Custom commenter options:
instrument_sqlalchemy(engine, commenter_options={
"opentelemetry_values": True, # traceparent (default: True)
"db_driver": True, # e.g. psycopg2 (default: True)
"route": True, # e.g. /api/users (default: True)
"db_framework": True, # e.g. sqlalchemy (default: False)
})

Caution
Database statement PII risk: The OpenTelemetry SQLAlchemy instrumentor captures raw SQL as the db.statement span attribute and exports it directly to your tracing backend (Jaeger, Tempo, Datadog). These span attributes are not scrubbed by fastapi-observer's SecurityPolicy. If your application has poorly parameterized queries (e.g. WHERE email = 'user@example.com'), PII will leak into your trace storage. Always use parameterized queries and review your db.statement exports in production.
If body capture is enabled, install observability before other middleware:
from fastapi.middleware.cors import CORSMiddleware
from fastapiobserver import SecurityPolicy, install_observability
install_observability(app, settings, security_policy=SecurityPolicy(log_request_body=True))
app.add_middleware(CORSMiddleware, allow_origins=["*"])

Warning
Gunicorn's --preload flag (the preload_app setting) is dangerous if observability is initialized at the module level.
When the app is preloaded, it is imported into the master process before the worker processes are forked. fastapi-observer uses a background QueueListener thread for non-blocking logging, and OpenTelemetry similarly spawns background export threads.
Threads do not survive a process fork. If you call install_observability() at the module level (e.g., right under app = FastAPI()), the background threads are created in the master process, and all workers silently drop logs because their logging threads never started.
How to fix: either remove --preload, or initialize observability inside the FastAPI lifespan context manager so it starts safely after the fork inside each worker:
from contextlib import asynccontextmanager
from fastapi import FastAPI
from fastapiobserver import ObservabilitySettings, install_observability
settings = ObservabilitySettings(service="api")
@asynccontextmanager
async def lifespan(app: FastAPI):
install_observability(app, settings) # Safely initializes per-worker
yield
# fastapiobserver automatically registers shutdown hooks so no teardown needed here
app = FastAPI(lifespan=lifespan)If you are using Prometheus with multiple Gunicorn workers, you must configure a shared metrics directory:
export PROMETHEUS_MULTIPROC_DIR=/tmp/prometheus-metrics
rm -rf "$PROMETHEUS_MULTIPROC_DIR"
mkdir -p "$PROMETHEUS_MULTIPROC_DIR"gunicorn.conf.py:
from fastapiobserver import mark_prometheus_process_dead
def child_exit(server, worker):
    mark_prometheus_process_dead(worker.pid)

Use queue controls to define behavior under sustained log pressure:
settings = ObservabilitySettings(
app_name="orders-api",
service="orders",
environment="production",
log_queue_max_size=20000,
log_queue_overflow_policy="drop_oldest", # or "drop_newest" / "block"
log_queue_block_timeout_seconds=0.5,
)

Queue pressure metrics exposed on /metrics (Prometheus mode):
- fastapiobserver_log_queue_size
- fastapiobserver_log_queue_capacity
- fastapiobserver_log_queue_enqueued_total
- fastapiobserver_log_queue_dropped_total{reason="drop_oldest|drop_newest"}
- fastapiobserver_log_queue_blocked_total
- fastapiobserver_log_queue_block_timeouts_total
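A starter alert on sustained log drops, using the counters above (the threshold and window are illustrative):

# Any dropped queued log records over the last 5 minutes
sum(rate(fastapiobserver_log_queue_dropped_total[5m])) by (reason) > 0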
Every output sink is wrapped with a circuit breaker so a failing sink does not
degrade request-path logging. This includes custom sinks registered via the
LogSink protocol.
The core package stays intentionally lean; provider-specific sinks can be added
as optional packages without changing install_observability().
settings = ObservabilitySettings(
app_name="orders-api",
service="orders",
environment="production",
sink_circuit_breaker_enabled=True,
sink_circuit_breaker_failure_threshold=5,
sink_circuit_breaker_recovery_timeout_seconds=30.0,
)

Breaker metrics exposed on /metrics:
- fastapiobserver_sink_circuit_breaker_state_info{sink,state}
- fastapiobserver_sink_circuit_breaker_failures_total{sink}
- fastapiobserver_sink_circuit_breaker_skipped_total{sink}
- fastapiobserver_sink_circuit_breaker_opens_total{sink}
- fastapiobserver_sink_circuit_breaker_half_open_total{sink}
- fastapiobserver_sink_circuit_breaker_closes_total{sink}
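A breaker that keeps opening usually indicates a failing sink; a starter query using the counters above (window is illustrative):

# Sink circuit breaker opened within the last 15 minutes
sum(increase(fastapiobserver_sink_circuit_breaker_opens_total[15m])) by (sink) > 0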
install_observability() now registers graceful logging teardown on FastAPI
shutdown and also uses an atexit fallback. This reduces lost log records
during process termination.
If you embed logging setup outside FastAPI lifecycle management, you can stop the queue pipeline explicitly:
from fastapiobserver import shutdown_logging
shutdown_logging()

If your service already uses loguru, forward those logs into
fastapi-observer instead of maintaining two independent pipelines.
from fastapiobserver import install_loguru_bridge
# loguru -> stdlib -> fastapi-observer queue/sinks
bridge_id = install_loguru_bridge()

Detailed migration/coexistence guide: loguru.md
If you use strawberry-graphql, routing all traffic through POST /graphql makes logs and traces nearly useless: every request looks identical.
fastapi-observer ships a native Strawberry extension (implemented via duck typing, so no extra pip dependency is pulled in) that automatically extracts GraphQL operation metadata.
import strawberry
from fastapiobserver.integrations.strawberry import StrawberryObservabilityExtension
@strawberry.type
class Query:
@strawberry.field
def hello(self) -> str:
return "world"
schema = strawberry.Schema(
query=Query,
extensions=[StrawberryObservabilityExtension], # Inject this!
)

With this extension, your logs automatically get a graphql context key containing the extracted operation_name:
{
"event": {
"method": "POST",
"path": "/graphql"
},
"user_context": {
"graphql": {
"operation_name": "GetUsersQuery"
}
}
}

If OpenTelemetry is enabled, your traces are dynamically renamed from POST /graphql to graphql.operation.GetUsersQuery.
Extend behavior without editing package internals:
from fastapiobserver import (
register_log_enricher,
register_log_filter,
register_metric_hook,
)
def add_git_sha(payload: dict) -> dict:
payload["git_sha"] = "abc123"
return payload
def drop_health_probe(record) -> bool:
return "health" not in record.getMessage().lower()
def track_slow_requests(request, response, duration):
if duration > 1.0:
print(f"slow request: {request.url.path} {duration:.2f}s")
register_log_enricher("git_sha", add_git_sha)
register_log_filter("drop_health_probe", drop_health_probe)
register_metric_hook("slow_requests", track_slow_requests)

Plugin failures are isolated and do not crash request handling.
Use register_metrics_backend() to plug in non-Prometheus backends without
modifying core code:
from fastapiobserver import register_metrics_backend
class MyBackend:
def observe(self, method, path, status_code, duration_seconds):
...
def mount_endpoint(self, app, *, path="/metrics", metrics_format="negotiate"):
# Optional: mount a backend-specific endpoint
...
def build_my_backend(*, service: str, environment: str, exemplars_enabled: bool):
return MyBackend()
register_metrics_backend("my_backend", build_my_backend)

StructuredJsonFormatter accepts injectable callables for enrichment and
sanitization, keeping defaults unchanged while improving testability:
formatter = StructuredJsonFormatter(
settings,
enrich_event=my_enricher,
sanitize_payload=my_sanitizer,
)

Repository integration tests include:

- tests/test_otel_log_correlation.py: verifies trace/span IDs in logs map to real spans.
- tests/test_otlp_export_integration.py: validates OTLP HTTP export with local collector fixtures.
Reproducible benchmark harness and methodology:
- Guide: benchmarks.md
- Apps: examples/benchmarks/app.py
- Runner: examples/benchmarks/harness.py
- 0.1.x: secure-by-default core
- 0.2.x: OTel interoperability, security presets, allowlists
- 0.3.x: GraphQL observability, error fingerprinting, and Logtail DLQ durability
- 0.4.x: package modularization, sink/registry hardening, and runtime control token rotation
- 1.0.x: first stable release contract for production deployments
- 1.2.0: tamper-evident audit logging and SQLAlchemy trace/commenter integration
Current release version: 1.2.0
Breaking changes must be listed under a Breaking Changes section in CHANGELOG.md.
Recommended release command (uses .env with PYPI_TOKEN):
scripts/deploy_pypi.sh --tag v1.2.0 --push-tag

Manual release steps:

python -m pip install --upgrade pip build
python -m build

python -m pip install --upgrade twine
python -m twine upload --repository testpypi dist/*

python -m pip install \
  --extra-index-url https://test.pypi.org/simple/ \
  fastapi-observer

python -m twine upload dist/*

Enable the repository git hooks:

git config core.hooksPath .githooks

The pre-push hook runs:

- uv run ruff check
- uv run mypy src
- uv run pytest -q
See NEXT_STEPS.md for the active roadmap and release checklist.

