# Part IV — Professional Django  
## 21. Logging, Monitoring, and Observability (Production‑Grade Visibility)

If you can’t see what your app is doing, you can’t operate it in production.

This chapter teaches industry-standard observability for Django apps:

- **Logging**: what happened (and why), with enough context to debug quickly
- **Monitoring/Metrics**: how often, how long, how many errors
- **Tracing** (conceptual + practical hooks): where time is spent across services
- **Error tracking**: capture exceptions with context (request id, user, endpoint)
- **Health checks**: tell load balancers and orchestrators if your app is alive/ready

We’ll implement a realistic baseline you can deploy and operate confidently.

---

## 21.0 Learning Outcomes

By the end you should be able to:

1. Configure Python/Django logging properly (not `print`).
2. Use structured, context-rich logs (request id, user id, path, method).
3. Add middleware that logs request/response summaries with timing.
4. Separate log levels:
   - DEBUG (dev)
   - INFO (normal ops)
   - WARNING/ERROR (problems)
5. Implement:
   - `/healthz` (liveness)
   - `/readyz` (readiness: DB/cache checks)
6. Understand metrics you should monitor:
   - request latency, error rate, throughput
   - DB query time, cache hit rate
7. Understand how error tracking tools work (Sentry concepts) and what to capture.
8. Avoid logging sensitive data (PII/secrets).

---

## 21.1 Observability: Definitions (So You Know What You’re Building)

### 21.1.1 Logging
A record of discrete events:
- “User 9 updated Task 123”
- “Request completed in 42ms”
- “Failed to send email: timeout”

### 21.1.2 Metrics
Numeric time-series measurements:
- request latency p95
- requests per second
- error rate (5xx)
- queue depth
- DB connections

### 21.1.3 Tracing
A “timeline of spans” across services:
- load balancer → Django → DB → external API → response

Tracing is often implemented with OpenTelemetry.

In this workbook we’ll focus on logging and readiness checks, and outline tracing.

---

## 21.2 Logging Principles (Industry Rules That Prevent Pain)

### 21.2.1 Log **events**, not code flow noise
Good:
- request started/finished (with duration)
- warnings (unexpected but handled)
- errors (exceptions)
- security-relevant events (permission denied, suspicious auth patterns)

Bad:
- logging every line, every loop iteration, every model save

### 21.2.2 Always include context
Minimum useful context in web apps:
- request id (correlate logs)
- method + path
- status code
- duration
- user id (if authenticated)
- org slug / resource id (if relevant)

### 21.2.3 Do not log sensitive data
Never log:
- passwords
- session cookies
- CSRF tokens
- Authorization headers
- full credit card details
- private keys

Be careful with:
- email addresses
- phone numbers
- full request bodies

Log “enough to debug,” not “everything.”

---

## 21.3 Configure Logging in Django (Clean, Practical Baseline)

Django logging is configured via the `LOGGING` setting, a dictionary in the schema of Python's standard `logging.config.dictConfig`.

### 21.3.1 A baseline logging config (console, structured-ish)
Add to `config/settings.py` (or prod settings module):

```python
import os

LOG_LEVEL = os.environ.get("DJANGO_LOG_LEVEL", "INFO")

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {
            "format": "%(levelname)s %(name)s %(message)s",
        },
        "verbose": {
            "format": (
                "%(asctime)s %(levelname)s %(name)s "
                "request_id=%(request_id)s "
                "%(message)s"
            ),
        },
    },
    "filters": {
        "request_id": {
            "()": "config.logging_filters.RequestIdLogFilter",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "verbose",
            "filters": ["request_id"],
        },
    },
    "root": {
        "handlers": ["console"],
        "level": LOG_LEVEL,
    },
}
```

#### What this does
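- Logs go to the console. `logging.StreamHandler` writes to stderr by default. Container runtimes and most hosting platforms capture both stdout and stderr, so console logging is the standard for containers and modern hosting.
- The formatter includes `request_id` (we’ll supply it via a logging filter).
- The root logger handles records from every logger that propagates to it (the default), unless a logger is configured with its own handlers and `propagate: False`.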
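The `verbose` format above is “structured-ish” key=value text. If your log platform ingests JSON, a small custom formatter can emit one JSON object per line instead. This is a minimal sketch. It assumes you want `request_id` (added by the filter from 21.4) plus any fields passed through `extra=`. The class name `JsonFormatter` is hypothetical.

```python
import json
import logging

# Attributes every LogRecord has; anything else came from `extra=` or a filter.
_STANDARD_ATTRS = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # applies %-style args
        }
        # Copy custom fields such as request_id, user_id, org.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


# Usage sketch: attach the formatter and log with extra context.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
demo_logger = logging.getLogger("demo.json")
demo_logger.addHandler(handler)
demo_logger.propagate = False
demo_logger.warning("Task exported", extra={"request_id": "abc123", "org": "acme"})
```

To use it from `LOGGING`, you could add a formatter entry such as `"json": {"()": "config.logging_formatters.JsonFormatter"}` (a hypothetical module path) and point the `console` handler's `"formatter"` at `"json"`.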

---

## 21.4 Inject Request ID into Logs (So Every Log Line is Correlated)

You already created a `RequestIdMiddleware` which sets `request.request_id`. Now we
need logs to include that id.

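### 21.4.1 Store the request id in thread-local storage
We need a way for logging to access request context anywhere, even deep in code,
without passing the request object around. A common approach is thread-local storage:
a WSGI worker thread handles one request at a time, so a per-thread slot can hold the
current request id.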

Create `config/request_context.py`:

```python
from __future__ import annotations

import threading
from typing import Optional

_local = threading.local()


def set_request_id(value: Optional[str]) -> None:
    _local.request_id = value


def get_request_id() -> str:
    return getattr(_local, "request_id", "-") or "-"
```
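Thread-locals assume one request per thread, which holds for WSGI workers. Under ASGI with async views, many requests interleave on one event-loop thread, so a thread-local id could be overwritten by another in-flight request. The standard library's `contextvars` module keeps a separate value per asyncio task (and per thread). This sketch is a drop-in variant of `config/request_context.py`. It keeps the names `set_request_id`/`get_request_id` so the middleware and filter stay unchanged. The `handle`/`main` demo at the bottom is illustration only and does not belong in the module.

```python
from __future__ import annotations

import asyncio
import contextvars
from typing import Optional

# One value per asyncio task *and* per thread, unlike threading.local().
_request_id: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "request_id", default=None
)


def set_request_id(value: Optional[str]) -> None:
    _request_id.set(value)


def get_request_id() -> str:
    return _request_id.get() or "-"


# Demo: two concurrent "requests" on one thread keep their own ids.
async def handle(request_id: str) -> str:
    set_request_id(request_id)
    await asyncio.sleep(0)  # let the other task run mid-request
    return get_request_id()


async def main() -> list[str]:
    # gather() wraps each coroutine in a Task with its own copy of the context.
    return await asyncio.gather(handle("req-a"), handle("req-b"))


print(asyncio.run(main()))  # ['req-a', 'req-b']
```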

### 21.4.2 Update RequestIdMiddleware to set thread-local
Edit `config/middleware.py`:

```python
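import re
import uuid

from django.http import HttpRequest, HttpResponse

from config.request_context import set_request_id

# Accept only short, safe ids from clients; anything else gets a fresh id.
_VALID_REQUEST_ID = re.compile(r"[A-Za-z0-9._-]{1,64}")


class RequestIdMiddleware:
    header_name = "X-Request-Id"

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request: HttpRequest) -> HttpResponse:
        incoming = request.headers.get(self.header_name, "")
        # Reuse an upstream id (e.g. from a proxy) only if it looks sane, so
        # clients can't inject arbitrary text into every log line.
        if _VALID_REQUEST_ID.fullmatch(incoming):
            request_id = incoming
        else:
            request_id = uuid.uuid4().hex

        request.request_id = request_id
        set_request_id(request_id)

        try:
            response = self.get_response(request)
        finally:
            # Clear after the response so the id can't leak into the next
            # request handled by this thread
            set_request_id(None)

        response[self.header_name] = request_id
        return response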
```

### 21.4.3 Create the log filter
Create `config/logging_filters.py`:

```python
import logging

from config.request_context import get_request_id


class RequestIdLogFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = get_request_id()
        return True
```

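Now every line emitted through the `console` handler includes `request_id=<id>`, or `request_id=-` when no request is active (for example, in management commands).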

#### Why this is valuable
If a user reports “I got an error with request id abc123” (the id is returned in the
`X-Request-Id` response header), you can grep logs for `request_id=abc123` and see the
full story.

---

## 21.5 Request/Response Logging Middleware (Operational Baseline)

We will log one line per request at INFO level with:
- method
- path
- status
- duration
- user id (if known)

Create `config/access_log_middleware.py`:

```python
import logging
import time

from django.http import HttpRequest, HttpResponse

logger = logging.getLogger("access")


class AccessLogMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request: HttpRequest) -> HttpResponse:
        start = time.perf_counter()
        response = self.get_response(request)
        duration_ms = (time.perf_counter() - start) * 1000

        user_id = getattr(getattr(request, "user", None), "id", None)
        logger.info(
            "method=%s path=%s status=%s duration_ms=%.2f user_id=%s",
            request.method,
            request.path,
            response.status_code,
            duration_ms,
            user_id,
        )
        return response
```

### 21.5.1 Register it
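Add it to `MIDDLEWARE` near the top, right after `RequestIdMiddleware`. Middleware listed later runs inside earlier middleware, so the access log line is written before `RequestIdMiddleware` clears the request id: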

```python
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "config.middleware.RequestIdMiddleware",
    "config.access_log_middleware.AccessLogMiddleware",
    # ...
]
```

### 21.5.2 Configure the `access` logger
In `LOGGING`, add a top-level `loggers` key:

```python
"loggers": {
    "access": {
        "handlers": ["console"],
        "level": "INFO",
        "propagate": False,
    },
}
```

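`propagate: False` stops access records from also bubbling up to the root logger, which uses the same `console` handler. Without it, every access line would be printed twice.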

---

## 21.6 Logging in Your App Code (Correct Patterns)

### 21.6.1 Use per-module loggers
In any module:

```python
import logging

logger = logging.getLogger(__name__)
```

Then:

```python
logger.info("Published article id=%s slug=%s", article.id, article.slug)
```

### 21.6.2 Avoid f-strings in logging for performance
Prefer:

```python
logger.info("task_id=%s status=%s", task.id, task.status)
```

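With `%s` placeholders, logging formats the message only when a handler actually emits the record, so calls below the active level skip formatting entirely. An f-string is rendered before `logger.info` even runs. A constant message template also makes it easier for log tools and error trackers to group identical events.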
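You can observe the deferral directly, because an argument's `__str__` runs only if the record is formatted. This is a stdlib-only demo. The `Expensive` class is a hypothetical probe that counts how often it is rendered.

```python
import logging


class Expensive:
    """Stand-in for a costly-to-render value; counts its own formatting."""

    calls = 0

    def __str__(self) -> str:
        Expensive.calls += 1
        return "expensive-value"


logger = logging.getLogger("demo.lazy")
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.INFO)
logger.propagate = False

value = Expensive()
logger.debug("value=%s", value)  # below INFO: record never created or formatted
print(Expensive.calls)           # 0
logger.debug(f"value={value}")   # f-string renders before the call is filtered
print(Expensive.calls)           # 1
```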

### 21.6.3 Log exceptions properly
Use `logger.exception(...)` inside `except` blocks:

```python
try:
    send_email()
except Exception:
    logger.exception("Failed to send email")
    raise
```

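`logger.exception` logs at ERROR level and appends the current traceback automatically. The bare `raise` keeps the failure visible to callers; drop it only if you genuinely handle the error.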
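Capturing the output in memory shows what `logger.exception` adds. This is a self-contained sketch. `send_email` here is a hypothetical stand-in that always times out, and the re-raise is omitted so the demo runs to completion.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

logger = logging.getLogger("demo.exceptions")
logger.addHandler(handler)
logger.propagate = False


def send_email() -> None:
    # Hypothetical stand-in for a real email call that times out.
    raise TimeoutError("SMTP server did not respond")


try:
    send_email()
except TimeoutError:
    logger.exception("Failed to send email")  # ERROR level + traceback

output = stream.getvalue()
print(output)
```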

---

## 21.7 Health Checks: `/healthz` vs `/readyz` (Industry Standard)

### 21.7.1 Liveness: `/healthz`
Liveness means:
- process is running and can respond to HTTP

It should be:
- fast
- not dependent on external services

Your existing `/healthz` is fine.

### 21.7.2 Readiness: `/readyz`
Readiness means:
- app is able to serve real traffic
- dependencies are ready (DB, cache, etc.)

This is used by:
- Kubernetes readiness probes
- load balancers deciding to route traffic to a node

### 21.7.3 Implement `/readyz` with DB check
Add to `pages/views.py`:

```python
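import logging

from django.db import connection
from django.http import JsonResponse
from django.views.decorators.http import require_GET

logger = logging.getLogger(__name__)


@require_GET
def readyz(request):
    try:
        # Cheapest possible round trip: proves we can connect and run a query.
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            cursor.fetchone()
    except Exception:
        # Record *why* we're not ready; the 503 body alone won't tell you.
        logger.warning("readyz: database check failed", exc_info=True)
        return JsonResponse({"status": "not_ready", "db": "down"}, status=503)

    return JsonResponse({"status": "ready", "db": "ok"})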
```

Add URL:

```python
path("readyz/", views.readyz, name="readyz"),
```

#### Why return 503
503 (Service Unavailable) is a standard signal to load balancers:
- don’t route traffic here yet

### 21.7.4 Optional readiness: cache check
If you use Redis cache in production, check it too:

```python
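from django.core.cache import cache


def cache_is_ready() -> bool:
    try:
        cache.set("readyz", "1", timeout=5)
        # A cache that silently drops writes also counts as "not ready".
        return cache.get("readyz") == "1"
    except Exception:
        return False

# Then, inside readyz() after the DB check:
#     if not cache_is_ready():
#         return JsonResponse({"status": "not_ready", "cache": "down"}, status=503)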
```

Don’t do expensive checks; readiness endpoints may be hit frequently.
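With several dependencies, checking them in a loop keeps `readyz` small and reports which one failed. This is a framework-free sketch. It assumes each check is a zero-argument callable that raises on failure, and `run_readiness_checks` is a hypothetical helper name. In the Django view you would wrap the result as `JsonResponse(body, status=status)`.

```python
from typing import Callable, Dict, Tuple

Check = Callable[[], None]


def run_readiness_checks(checks: Dict[str, Check]) -> Tuple[int, dict]:
    """Run every check; 200 if all pass, 503 if any raise."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception:
            results[name] = "down"
    ready = all(state == "ok" for state in results.values())
    body = {"status": "ready" if ready else "not_ready", **results}
    return (200 if ready else 503), body


def db_ok() -> None:
    pass  # e.g. run SELECT 1


def cache_down() -> None:
    raise ConnectionError("cache unreachable")


print(run_readiness_checks({"db": db_ok}))
# (200, {'status': 'ready', 'db': 'ok'})
print(run_readiness_checks({"db": db_ok, "cache": cache_down}))
# (503, {'status': 'not_ready', 'db': 'ok', 'cache': 'down'})
```

It runs all checks instead of stopping at the first failure, so the body lists every broken dependency. Each check still has to be cheap, since probes call this repeatedly.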

---

## 21.8 Metrics: What to Monitor (Even If You Don’t Implement a Metrics System Yet)

At minimum, in production you should monitor:

### Web layer
- request count (throughput)
- latency p50/p95/p99
- 4xx rate (especially 401/403/429)
- 5xx rate

### DB layer
- slow queries
- DB connections used
- lock waits (PostgreSQL)
- migration status (during deploys)

### Cache
- hit rate
- errors/timeouts

### Background jobs (later)
- queue depth
- job success/failure rate
- retry counts

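Even if you don’t have Prometheus/Grafana set up yet, these are the numbers to watch, and several of them (throughput, 5xx rate, latency percentiles) can already be derived from the access log lines you added in 21.5.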
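As a bridge until you have a metrics system, this rough sketch parses lines in the access-log `key=value` format and computes request count, 5xx rate, and nearest-rank p50/p95 latency. `summarize` and `percentile` are hypothetical helpers. Real monitoring computes these over sliding time windows rather than a fixed batch of lines.

```python
import math
import re
from typing import Iterable, List

_LINE = re.compile(r"status=(\d{3}) duration_ms=([\d.]+)")


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(lines: Iterable[str]) -> dict:
    statuses: List[int] = []
    durations: List[float] = []
    for line in lines:
        match = _LINE.search(line)
        if match:  # skip lines that aren't access-log lines
            statuses.append(int(match.group(1)))
            durations.append(float(match.group(2)))
    if not durations:
        return {"requests": 0}
    durations.sort()
    errors = sum(1 for status in statuses if status >= 500)
    return {
        "requests": len(durations),
        "error_rate_5xx": errors / len(statuses),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
    }


sample = [
    "INFO access method=GET path=/tasks/ status=200 duration_ms=12.50 user_id=9",
    "INFO access method=GET path=/tasks/ status=200 duration_ms=48.00 user_id=9",
    "INFO access method=POST path=/tasks/ status=500 duration_ms=230.10 user_id=9",
    "ERROR django.request Internal Server Error: /tasks/",
]
print(summarize(sample))
# {'requests': 3, 'error_rate_5xx': 0.3333333333333333, 'p50_ms': 48.0, 'p95_ms': 230.1}
```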

---

## 21.9 Error Tracking (Sentry Concepts) — What You Should Capture

An error tracking system collects:
- exception type + stack trace
- request path + method
- user id
- request id
- environment (prod/staging)
- release/version
- breadcrumbs (recent logs/events)

You can integrate later, but you should structure your logging and request IDs now
so integration is easy.

### 21.9.1 What to avoid sending to error tracking
- passwords
- tokens
- full request bodies
- sensitive PII unless your compliance allows it
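Sentry's Python SDK accepts a `before_send` callback in `sentry_sdk.init(...)`. It receives each event (a dict) plus a `hint`, and it can modify the event or return `None` to drop it. This is a hedged sketch of a scrubber. It assumes the event's `request` entry uses `headers`, `cookies`, and `data` keys. The function name, header list, and `[Filtered]` marker are our own choices. Because it is plain Python, you can unit-test it without the SDK installed.

```python
from typing import Any, Dict, Optional

SENSITIVE_HEADERS = {"authorization", "cookie", "x-csrftoken", "x-api-key"}
REDACTED = "[Filtered]"


def scrub_event(event: Dict[str, Any], hint: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Redact auth headers, cookies, and bodies before an event leaves the server."""
    request = event.get("request")
    if isinstance(request, dict):
        headers = request.get("headers")
        if isinstance(headers, dict):
            for name in list(headers):
                if name.lower() in SENSITIVE_HEADERS:
                    headers[name] = REDACTED
        if "cookies" in request:
            request["cookies"] = REDACTED
        if "data" in request:
            request["data"] = REDACTED  # never ship full request bodies
    return event  # returning None instead would drop the event entirely


# Wiring (requires the sentry-sdk package):
# sentry_sdk.init(dsn="...", before_send=scrub_event, send_default_pii=False)
```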

---

## 21.10 Log Levels and Environments (Dev vs Prod)

### Development
- more DEBUG logs are acceptable
- you can show debug pages

### Production
- INFO for access logs and business events
- WARNING for unusual events
- ERROR for failures
- DEBUG usually disabled (too noisy and may leak data)

Set log level via env var:

```bash
DJANGO_LOG_LEVEL=INFO
```
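Level names are case-sensitive. `dictConfig` rejects an unknown name such as `info` with a `ValueError` at startup. This sketch normalizes and validates the variable before `LOGGING` is built. `read_log_level` is a hypothetical helper, and falling back to INFO (rather than refusing to boot) is a design choice, not a rule.

```python
import logging
import os

VALID_LEVELS = {"CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"}


def read_log_level(default: str = "INFO") -> str:
    """Return an upper-cased, known level name from DJANGO_LOG_LEVEL."""
    raw = os.environ.get("DJANGO_LOG_LEVEL", default).strip().upper()
    if raw not in VALID_LEVELS:
        # Fail soft: keep the app booting, but make the typo visible.
        logging.getLogger(__name__).warning(
            "Unknown DJANGO_LOG_LEVEL %r; falling back to %s", raw, default
        )
        return default
    return raw


LOG_LEVEL = read_log_level()
```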

---

## 21.11 Testing Observability (Yes, You Can Test Logs and Health Endpoints)

### 21.11.1 Test readyz returns 200 when DB works
```python
from django.test import TestCase


class ReadyzTests(TestCase):
    def test_readyz_ok(self):
        response = self.client.get("/readyz/")
        self.assertEqual(response.status_code, 200)
        self.assertEqual(response.json()["status"], "ready")
```

### 21.11.2 Test access log middleware (optional)
Testing logs is possible but can be brittle. If you want:

- use `assertLogs` in unittest
- only assert presence of key fields

Example:

```python
from django.test import TestCase


class AccessLogTests(TestCase):
    def test_access_log_emits(self):
        with self.assertLogs("access", level="INFO") as cm:
            self.client.get("/healthz/")

        joined = "\n".join(cm.output)
        self.assertIn("path=/healthz/", joined)
        self.assertIn("status=200", joined)
```

---

## 21.12 Common Mistakes (And Fixes)

### Mistake A: “Logs don’t show request_id”
Cause:
- RequestId middleware not setting thread-local
- Log filter not applied

Fix:
- ensure `RequestIdMiddleware` runs early
- ensure handler includes filter
- ensure formatter includes `%(request_id)s`

### Mistake B: Logging secrets accidentally
Cause:
- logging `request.headers` or full `request.body`

Fix:
- whitelist what you log, don’t dump objects
- scrub sensitive keys if you must log payloads
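If you truly must log part of a payload, redact by key before it reaches the logger. This is a minimal recursive sketch. `scrub` and `SENSITIVE_KEYS` are hypothetical, so extend the set for your own fields. Whitelisting specific fields is still safer, since a denylist can't anticipate keys added later.

```python
from typing import Any

SENSITIVE_KEYS = {
    "password", "token", "secret", "authorization",
    "csrfmiddlewaretoken", "card_number",
}


def scrub(value: Any) -> Any:
    """Return a copy of value with sensitive keys redacted at any depth."""
    if isinstance(value, dict):
        return {
            key: "***" if str(key).lower() in SENSITIVE_KEYS else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, tuple):
        return tuple(scrub(item) for item in value)
    return value


payload = {"title": "Q3 report", "password": "hunter2", "items": [{"token": "abc"}]}
print(scrub(payload))
# {'title': 'Q3 report', 'password': '***', 'items': [{'token': '***'}]}
# In app code: logger.info("export params=%s", scrub(params))
```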

### Mistake C: Readyz endpoint is slow
Cause:
- heavy DB query or external calls

Fix:
- keep readiness checks minimal (`SELECT 1`)
- avoid network calls; check only essential dependencies

---

## 21.13 LAB: Implement Production‑Grade Observability Baseline

1. Request ID middleware (already have)
2. Thread-local request context
3. Logging filter that injects request_id
4. Access log middleware
5. `/readyz` readiness endpoint
6. One test for `/readyz`
7. One test for request id header exists
8. Confirm logs show request_id in terminal

---

## 21.14 Exercises (Do These Before Proceeding)

1. Add a log line when a task is exported:
   - include org slug, user id, filter params (but not sensitive info)
2. Add a log line when a PermissionDenied is raised for task edit:
   - include user id and task id
3. Add a `/metrics` placeholder endpoint (no real Prometheus yet) that returns JSON:
   - uptime seconds
   - total requests served (keep in memory for dev)
   - explain why in-memory counters reset on restart
4. Add tests ensuring `/healthz` and `/readyz` return JSON and correct status codes.

---

## 21.15 Chapter Summary

- Logging with request IDs is the foundation of production debugging.
- Use middleware to add request_id and access logs consistently.
- Keep logs context-rich but avoid sensitive data.
- Distinguish liveness (`/healthz`) from readiness (`/readyz`).
- Observability includes logs, metrics, tracing, and error tracking—start with logs +
  health checks and build from there.

---

Next chapter: **Part IV — 22. Architecture and Code Organization**  
We’ll refactor your growing codebase into maintainable patterns (app boundaries,
service layer, settings structure, dependency inversion, reusable apps), so your
project remains clean as features multiply.