
feat(personhog): extract x-caller-tag header in router #60361

Merged
nickbest-ph merged 3 commits into master from nick/personhog-caller-tag-router on May 27, 2026

@nickbest-ph
Contributor

Problem

Multiple services (Django, Node.js, Rust) consume the personhog gRPC API, but the only attribution today is x-client-name, which identifies the service (e.g., "posthog-django") but not which code path within that service made the call. Some code paths request too many persons, generating responses up to 65 MB that destabilize the service. Finding the offending callers is whack-a-mole without finer-grained attribution.

Changes

Add a new x-caller-tag gRPC metadata header to the personhog router for caller-path attribution. This is the consumer side — the router reads the header and uses it in metrics and logging. Client-side changes (Django, Node.js, property-defs-rs) will follow in separate PRs.

Specifically:

  • personhog-common/src/grpc.rs: Add CALLER_TAG tokio task-local alongside existing CLIENT_NAME, with extract_caller_tag() and current_caller_tag() helpers. Nest both task-local scopes in GrpcMetricsService::call().
  • personhog-router/src/proxy.rs: Add a caller_tag label to the personhog_router_response_size_bytes and personhog_router_backend_duration_ms histograms. Add configurable structured logging (tracing::warn!) when a response exceeds the size threshold, including caller_tag context.
  • personhog-router/src/config.rs: Add response_size_warn_bytes config (default 10 MB).
  • personhog-router/src/backend/replica.rs and leader.rs: Propagate x-caller-tag in retry_call! macro alongside x-client-name so the tag flows to replicas/leader.
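A minimal sketch of the oversized-response check, assuming the threshold is compared with a strict `>` and that "10 MB" means 10 * 1024 * 1024 bytes (the PR does not say which). `RouterConfig` and `oversized_response_warning` are hypothetical names; the router itself logs through `tracing::warn!` in proxy.rs, while this sketch returns a `String` so it runs without dependencies.

```rust
/// Hypothetical stand-in for the router config; only the field named in
/// this PR (`response_size_warn_bytes`) is modeled.
struct RouterConfig {
    response_size_warn_bytes: usize,
}

impl Default for RouterConfig {
    fn default() -> Self {
        // Assumption: "10 MB" is read as 10 MiB here.
        Self {
            response_size_warn_bytes: 10 * 1024 * 1024,
        }
    }
}

/// Returns a structured warning line when `size` exceeds the threshold,
/// carrying both attribution fields so the offending code path is visible.
fn oversized_response_warning(
    config: &RouterConfig,
    size: usize,
    client_name: &str,
    caller_tag: &str,
) -> Option<String> {
    if size > config.response_size_warn_bytes {
        Some(format!(
            "oversized response: size_bytes={} threshold_bytes={} client_name={} caller_tag={}",
            size, config.response_size_warn_bytes, client_name, caller_tag
        ))
    } else {
        None
    }
}
```

Returning an `Option` keeps the threshold decision easy to unit-test; in the router, the `Some` branch would correspond to emitting the `tracing::warn!` event with the same fields.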

The header defaults to "unknown" when absent, making this safe for incremental rollout — untagged traffic is visible but doesn't break anything. The caller_tag label is only added to 2 metrics (response size and backend duration) to keep cardinality reasonable (~200–500 additional series).
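The defaulting rule can be sketched as follows, using a `HashMap<String, String>` as a stand-in for the request headers the router actually reads (the PR's helpers take an `http::Request`). `caller_tag_or_unknown` is a hypothetical name; the behavior it shows, where a missing or empty header becomes "unknown", is the one the unit tests listed under testing describe.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Header name as described in this PR.
const CALLER_TAG_HEADER: &str = "x-caller-tag";

/// Hypothetical helper: read the caller tag from a header map stand-in,
/// treating both "absent" and "empty" as "unknown".
fn caller_tag_or_unknown(headers: &HashMap<String, String>) -> Arc<str> {
    headers
        .get(CALLER_TAG_HEADER)
        .map(String::as_str)
        .filter(|s| !s.is_empty())
        .unwrap_or("unknown")
        .into()
}
```

Because untagged callers land in a single "unknown" series rather than failing, clients can adopt the header one code path at a time.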

How did you test this code?

This PR was co-authored by an AI agent (Claude Code). Testing:

  • 4 unit tests added in personhog-common/src/grpc.rs: extract_caller_tag_from_headers, extract_caller_tag_defaults_to_unknown, extract_caller_tag_treats_empty_as_unknown, current_caller_tag_defaults_outside_scope
  • Verified compilation of all modified crates
  • Updated test call sites in personhog-router/tests/common/mod.rs for new RawProxyService::new() signature

Publish to changelog?

No

🤖 Agent context

Co-authored with Claude Code (Opus). This is commit 1 of a 7-commit series implementing x-caller-tag attribution across the personhog stack. The full series adds:

  1. Router extraction (this PR)
  2. Static tag in property-defs-rs
  3. callerTag config in Node.js client
  4. CallerTagInterceptor + ContextVar in Django client
  5. Django middleware auto-tagging from URL names
  6. Celery task auto-tagging from task names
  7. Manual personhog_caller_tag() wrappers at known-heavy call sites

Key design decisions:

  • Header-based (not a proto field): Avoids schema changes and lets the router extract attribution from headers without deserializing the request body, the same pattern as x-client-name and x-read-consistency.
  • Only 2 metrics get the new label: response_size_bytes and backend_duration_ms — these are the metrics that matter for identifying heavy queries. Adding to all metrics would cause cardinality explosion.
  • Task-local scoping: CALLER_TAG uses the same tokio task_local! + GrpcMetricsLayer pattern as CLIENT_NAME, so it propagates through both raw proxy and typed service paths automatically.
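To illustrate the scoping behavior (set a tag for the duration of a call, read it anywhere beneath that call, fall back to "unknown" outside any scope), here is a dependency-free sketch. It swaps tokio's `task_local!` for std's `thread_local!`, which is not equivalent across `.await` points on a multi-threaded runtime; `with_caller_tag` and `RestoreOnDrop` are hypothetical names, and only `current_caller_tag` echoes a helper named in this PR.

```rust
use std::cell::RefCell;
use std::sync::Arc;

thread_local! {
    // Stand-in for the PR's tokio task-local CALLER_TAG. A thread-local is
    // used only so this sketch runs without tokio.
    static CALLER_TAG: RefCell<Option<Arc<str>>> = RefCell::new(None);
}

/// Restores the previous tag when dropped, so scopes nest correctly and
/// the outer value comes back even if the inner closure panics.
struct RestoreOnDrop(Option<Arc<str>>);

impl Drop for RestoreOnDrop {
    fn drop(&mut self) {
        let previous = self.0.take();
        CALLER_TAG.with(|cell| *cell.borrow_mut() = previous);
    }
}

/// Run `f` with `tag` as the current caller tag (similar in shape to
/// tokio's `LocalKey::scope`, but synchronous).
fn with_caller_tag<R>(tag: Arc<str>, f: impl FnOnce() -> R) -> R {
    let previous = CALLER_TAG.with(|cell| cell.replace(Some(tag)));
    let _guard = RestoreOnDrop(previous);
    f()
}

/// Current tag, or "unknown" when called outside any scope.
fn current_caller_tag() -> Arc<str> {
    CALLER_TAG.with(|cell| {
        cell.borrow()
            .clone()
            .unwrap_or_else(|| Arc::from("unknown"))
    })
}
```

With tokio, the task-local variant follows the value across `.await` points within the task, which is what lets the tag reach the proxy code and the `retry_call!` macro without being threaded through every function signature.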

Add a new `x-caller-tag` gRPC metadata header for caller-path
attribution. The router extracts it alongside the existing
`x-client-name` header and uses it to dimension the
`personhog_router_response_size_bytes` and
`personhog_router_backend_duration_ms` metrics, enabling dashboards
that show which code paths within a service generate the heaviest
responses.

- Add CALLER_TAG task-local in personhog-common for async propagation
- Add caller_tag label to response_size_bytes and backend_duration_ms
- Add configurable oversized-response structured logging (default 10MB)
- Propagate x-caller-tag in retry_call! macro to replica and leader
- Default to "unknown" when the header is absent (safe for incremental rollout)
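Dimensioning a metric by `caller_tag` yields one series per distinct (client_name, caller_tag) pair, assuming client_name is already a label on these metrics; that multiplication is why the label is limited to two metrics. The toy aggregate below tracks only a count and a byte sum per pair (not real histogram buckets), and `LabeledSizeMetric` is a hypothetical type: it shows how the extra label surfaces the heaviest code path, not how the router's metrics library works.

```rust
use std::collections::HashMap;

/// Toy stand-in for a labeled metric such as
/// personhog_router_response_size_bytes: one entry per
/// (client_name, caller_tag) pair, storing (count, sum_bytes).
#[derive(Default)]
struct LabeledSizeMetric {
    series: HashMap<(String, String), (u64, u64)>,
}

impl LabeledSizeMetric {
    fn observe(&mut self, client_name: &str, caller_tag: &str, bytes: u64) {
        let entry = self
            .series
            .entry((client_name.to_string(), caller_tag.to_string()))
            .or_insert((0, 0));
        entry.0 += 1;
        entry.1 += bytes;
    }

    /// Number of distinct label combinations, i.e. series cardinality.
    fn series_count(&self) -> usize {
        self.series.len()
    }

    /// The (client_name, caller_tag) pair with the largest total bytes.
    fn heaviest(&self) -> Option<(&str, &str, u64)> {
        self.series
            .iter()
            .max_by_key(|(_, (_, sum))| *sum)
            .map(|((c, t), (_, sum))| (c.as_str(), t.as_str(), *sum))
    }
}
```

Without the caller_tag label, every Django call path would collapse into a single series keyed only by client_name, so the heaviest path could not be singled out.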
Comment thread rust/personhog-common/src/grpc.rs
@greptile-apps
Contributor

greptile-apps Bot commented May 27, 2026

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
rust/personhog-common/src/grpc.rs:57-77
`extract_caller_tag` and `extract_client_name` are identical implementations that differ only in the header constant. Per the OnceAndOnlyOnce rule, this logic can live in a single shared helper — if a third attribution header is added, the same pattern would have to be duplicated a third time.

```suggestion
/// Extract a named header from HTTP headers, defaulting to `"unknown"`.
fn extract_header_or_unknown<B>(request: &Request<B>, header: &str) -> Arc<str> {
    request
        .headers()
        .get(header)
        .and_then(|v| v.to_str().ok())
        .filter(|s| !s.is_empty())
        .unwrap_or("unknown")
        .into()
}

/// Extract the client name from HTTP headers, defaulting to `"unknown"`.
fn extract_client_name<B>(request: &Request<B>) -> Arc<str> {
    extract_header_or_unknown(request, CLIENT_NAME_HEADER)
}

/// Extract the caller tag from HTTP headers, defaulting to `"unknown"`.
fn extract_caller_tag<B>(request: &Request<B>) -> Arc<str> {
    extract_header_or_unknown(request, CALLER_TAG_HEADER)
}
```

Reviews (1): Last reviewed commit: "feat(personhog): extract x-caller-tag he..."

Comment thread rust/personhog-common/src/grpc.rs Outdated
Add length cap (128 chars) and character allow-list validation to both
extract_caller_tag and extract_client_name to prevent unbounded metric
cardinality from malformed headers. Non-matching values fall back to
"unknown". Also fixes rustfmt formatting issues.
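A sketch of the validation this commit describes: a 128-character cap plus a character allow-list, with anything that fails falling back to "unknown". The commit message does not list the allowed characters, so the ASCII alphanumerics plus `-`, `_`, `.`, `:` and `/` used here are an assumption, as are the names `sanitize_attribution_value` and `is_allowed_char`.

```rust
// Length cap taken from the commit message.
const MAX_ATTRIBUTION_LEN: usize = 128;

/// Hypothetical allow-list; the actual character set is not stated.
fn is_allowed_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.' | ':' | '/')
}

/// Accept a header value only if it is non-empty, within the length cap,
/// and made entirely of allowed characters; otherwise use "unknown".
fn sanitize_attribution_value(raw: Option<&str>) -> &str {
    match raw {
        Some(v)
            if !v.is_empty()
                && v.len() <= MAX_ATTRIBUTION_LEN
                && v.chars().all(is_allowed_char) =>
        {
            v
        }
        _ => "unknown",
    }
}
```

Because the assumed allow-list is ASCII-only, the byte length from `str::len` equals the character count for every value that passes, so the cap really is 128 characters.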
@nickbest-ph merged commit 7fbc6a2 into master on May 27, 2026
174 checks passed
@nickbest-ph deleted the nick/personhog-caller-tag-router branch on May 27, 2026 at 23:09
@deployment-status-posthog

deployment-status-posthog Bot commented May 27, 2026

Deploy status

| Environment | Status | Deployed At | Workflow |
|---|---|---|---|
| dev | ✅ Deployed | 2026-05-27 23:50 UTC | Run |
| prod-us | ✅ Deployed | 2026-05-28 00:02 UTC | Run |
| prod-eu | ✅ Deployed | 2026-05-28 00:04 UTC | Run |
