feat(personhog): extract x-caller-tag header in router #60361
Merged
Add a new `x-caller-tag` gRPC metadata header for caller-path attribution. The router extracts it alongside the existing `x-client-name` header and uses it to dimension the `personhog_router_response_size_bytes` and `personhog_router_backend_duration_ms` metrics, enabling dashboards that show which code paths within a service generate the heaviest responses.

- Add CALLER_TAG task-local in personhog-common for async propagation
- Add caller_tag label to response_size_bytes and backend_duration_ms
- Add configurable oversized-response structured logging (default 10MB)
- Propagate x-caller-tag in retry_call! macro to replica and leader
- Default to "unknown" when the header is absent (safe for incremental rollout)
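The oversized-response logging in the list above could look roughly like the following. This is a minimal sketch, not the router's actual code: only the `response_size_warn_bytes` name and the 10MB default come from this PR, while `RouterConfig`, `oversized_warning`, the message format, and reading "10MB" as 10 * 1024 * 1024 bytes are assumptions made for illustration. The real implementation emits a `tracing::warn!` event rather than returning a string.

```rust
/// Hypothetical config holder; only the field name comes from the PR.
#[derive(Debug, Clone)]
struct RouterConfig {
    /// Responses larger than this many bytes trigger a warning.
    response_size_warn_bytes: usize,
}

impl Default for RouterConfig {
    fn default() -> Self {
        // Assumption: "10MB" is interpreted as 10 * 1024 * 1024 bytes.
        Self {
            response_size_warn_bytes: 10 * 1024 * 1024,
        }
    }
}

/// Returns a warning line when the response exceeds the configured threshold,
/// or `None` when the response is within bounds.
/// Stand-in for a structured `tracing::warn!` call with the same fields.
fn oversized_warning(
    config: &RouterConfig,
    client_name: &str,
    caller_tag: &str,
    response_bytes: usize,
) -> Option<String> {
    if response_bytes <= config.response_size_warn_bytes {
        return None;
    }
    Some(format!(
        "oversized response: client_name={client_name} caller_tag={caller_tag} \
         response_bytes={response_bytes} threshold_bytes={}",
        config.response_size_warn_bytes
    ))
}
```

Including both `client_name` and `caller_tag` in the message means a single log line identifies the service and the code path, even when the tag is still `"unknown"` during rollout.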
### Issue 1 of 1
rust/personhog-common/src/grpc.rs:57-77
`extract_caller_tag` and `extract_client_name` are identical implementations that differ only in the header constant. Per the OnceAndOnlyOnce rule, this logic can live in a single shared helper — if a third attribution header is added, the same pattern would have to be duplicated a third time.
```suggestion
/// Extract a named header from HTTP headers, defaulting to `"unknown"`.
fn extract_header_or_unknown<B>(request: &Request<B>, header: &str) -> Arc<str> {
    request
        .headers()
        .get(header)
        .and_then(|v| v.to_str().ok())
        .filter(|s| !s.is_empty())
        .unwrap_or("unknown")
        .into()
}

/// Extract the client name from HTTP headers, defaulting to `"unknown"`.
fn extract_client_name<B>(request: &Request<B>) -> Arc<str> {
    extract_header_or_unknown(request, CLIENT_NAME_HEADER)
}

/// Extract the caller tag from HTTP headers, defaulting to `"unknown"`.
fn extract_caller_tag<B>(request: &Request<B>) -> Arc<str> {
    extract_header_or_unknown(request, CALLER_TAG_HEADER)
}
```
Add length cap (128 chars) and character allow-list validation to both extract_caller_tag and extract_client_name to prevent unbounded metric cardinality from malformed headers. Non-matching values fall back to "unknown". Also fixes rustfmt formatting issues.
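A rough sketch of the validation described in that commit, using only the standard library. The 128-character cap and the `"unknown"` fallback come from the commit message. The exact allow-list is not stated here, so the character set below is an assumption, as are the `sanitize_attribution` and `is_allowed_char` names and the use of a `HashMap` in place of real HTTP headers.

```rust
use std::collections::HashMap;

/// Length cap stated in the commit message.
const MAX_ATTRIBUTION_LEN: usize = 128;

/// Assumption: the allow-list is ASCII alphanumerics plus a few separators.
/// The actual character set used by the PR is not shown in this conversation.
fn is_allowed_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.' | '/' | ':')
}

/// Returns the header value if it is non-empty, within the length cap, and
/// made only of allowed characters; otherwise returns "unknown".
/// `headers` is a plain map standing in for the request's header collection.
fn sanitize_attribution(headers: &HashMap<String, String>, header: &str) -> String {
    headers
        .get(header)
        .map(String::as_str)
        // Byte length equals character count for any value that passes the
        // ASCII-only allow-list, so `len()` is sufficient for the cap.
        .filter(|v| !v.is_empty() && v.len() <= MAX_ATTRIBUTION_LEN)
        .filter(|v| v.chars().all(is_allowed_char))
        .unwrap_or("unknown")
        .to_string()
}
```

Collapsing every rejected value into `"unknown"` keeps the set of label values bounded, since a malformed or adversarial header cannot create a new metric series.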
This was referenced May 27, 2026
eli-r-ph
approved these changes
May 27, 2026
Problem
Multiple services (Django, Node.js, Rust) consume the personhog gRPC API, but the only attribution today is `x-client-name`, which identifies the service (e.g., "posthog-django") but not which code path within that service made the call. Some code paths request too many persons, generating responses up to 65 MB that destabilize the service. Finding the offending callers is whack-a-mole without finer-grained attribution.

Changes

Add a new `x-caller-tag` gRPC metadata header to the personhog router for caller-path attribution. This is the consumer side: the router reads the header and uses it in metrics and logging. Client-side changes (Django, Node.js, property-defs-rs) will follow in separate PRs.

Specifically:

- `personhog-common/src/grpc.rs`: Add `CALLER_TAG` tokio task-local alongside existing `CLIENT_NAME`, with `extract_caller_tag()` and `current_caller_tag()` helpers. Nest both task-local scopes in `GrpcMetricsService::call()`.
- `personhog-router/src/proxy.rs`: Add `caller_tag` label to `personhog_router_response_size_bytes` and `personhog_router_backend_duration_ms` histograms. Add configurable oversized-response structured logging (`tracing::warn!`) when response exceeds threshold, including caller_tag context.
- `personhog-router/src/config.rs`: Add `response_size_warn_bytes` config (default 10 MB).
- `personhog-router/src/backend/replica.rs` and `leader.rs`: Propagate `x-caller-tag` in `retry_call!` macro alongside `x-client-name` so the tag flows to replicas/leader.

The header defaults to `"unknown"` when absent, making this safe for incremental rollout: untagged traffic is visible but doesn't break anything. The `caller_tag` label is only added to 2 metrics (response size and backend duration) to keep cardinality reasonable (~200–500 additional series).

How did you test this code?
This PR was co-authored by an AI agent (Claude Code). Testing:
- `personhog-common/src/grpc.rs`: `extract_caller_tag_from_headers`, `extract_caller_tag_defaults_to_unknown`, `extract_caller_tag_treats_empty_as_unknown`, `current_caller_tag_defaults_outside_scope`
- `personhog-router/tests/common/mod.rs` for new `RawProxyService::new()` signature

Publish to changelog?
No
🤖 Agent context
Co-authored with Claude Code (Opus). This is commit 1 of a 7-commit series implementing `x-caller-tag` attribution across the personhog stack. The full series adds:

- `callerTag` config in Node.js client
- `CallerTagInterceptor` + ContextVar in Django client
- `personhog_caller_tag()` wrappers at known-heavy call sites

Key design decisions:

- `x-client-name` and `x-read-consistency`.
- `response_size_bytes` and `backend_duration_ms`: these are the metrics that matter for identifying heavy queries. Adding to all metrics would cause cardinality explosion.
- `CALLER_TAG` uses the same tokio `task_local!` + `GrpcMetricsLayer` pattern as `CLIENT_NAME`, so it propagates through both raw proxy and typed service paths automatically.
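To illustrate the scoped-value idea behind `CALLER_TAG` and `current_caller_tag()`, here is a simplified sketch that swaps in the standard library's `thread_local!` for the tokio `task_local!` the PR actually uses. The two are not equivalent: a tokio task-local follows a task across `.await` points and worker threads, while this stand-in is tied to one OS thread. The `with_caller_tag` helper is hypothetical, and the body of `current_caller_tag` is a guess at the behavior, not the PR's code.

```rust
use std::cell::RefCell;

thread_local! {
    // Stand-in for the tokio task-local; `None` means "outside any request scope".
    static CALLER_TAG: RefCell<Option<String>> = RefCell::new(None);
}

/// Hypothetical helper: runs `f` with the caller tag set, then restores the
/// previous value. Not panic-safe; a real implementation would use a guard
/// or, as in the PR, tokio's task-local scoping.
fn with_caller_tag<R>(tag: &str, f: impl FnOnce() -> R) -> R {
    let previous = CALLER_TAG.with(|slot| slot.replace(Some(tag.to_string())));
    let result = f();
    CALLER_TAG.with(|slot| slot.replace(previous));
    result
}

/// Returns the tag for the current scope, or "unknown" outside any scope.
fn current_caller_tag() -> String {
    CALLER_TAG.with(|slot| {
        slot.borrow()
            .clone()
            .unwrap_or_else(|| "unknown".to_string())
    })
}
```

Code deep in the call stack, such as a metrics recorder, can call `current_caller_tag()` without the tag being threaded through every function signature, and it still gets `"unknown"` when no scope is active.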