feat(django): split personhog rpcs into batched calls #60307
Merged
Contributor
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
posthog/models/person/test/test_batched_personhog_helpers.py:1-10
**Non-parameterized tests across the batch-count dimension**
Per the team's coding conventions, parameterized tests are preferred. Several test classes have nearly identical `test_single_batch` / `test_multiple_batches` (and similarly `test_deduplicates_same_person_across_distinct_ids` / `test_deduplicates_across_batches`) pairs whose only difference is the patched `PERSONHOG_BATCH_SIZE` and input count. Collapsing each pair into a single `@pytest.mark.parametrize`-decorated case (e.g. parametrize over `(batch_size, n_items, expected_call_count)` tuples) would remove the duplication while keeping the same coverage.
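As a rough illustration of the collapse suggested in Issue 1, the sketch below drives one check from a table of `(batch_size, n_items, expected_call_count)` tuples. `batched_call` and `check_batch_count` are hypothetical stand-ins, not the PR's helpers, and the batch size is passed as an argument here instead of patching `PERSONHOG_BATCH_SIZE`. A plain loop is used so the snippet runs with the standard library alone; the same tuple list could be handed to `@pytest.mark.parametrize`.

```python
from unittest import mock


def batched_call(rpc, items, batch_size):
    """Hypothetical stand-in for the batched helpers: one rpc call per chunk."""
    results = []
    for start in range(0, len(items), batch_size):
        results.extend(rpc(items[start:start + batch_size]))
    return results


# (batch_size, n_items, expected_call_count)
BATCH_CASES = [
    (500, 3, 1),  # everything fits in a single batch
    (2, 5, 3),    # uneven split: 2 + 2 + 1
    (2, 4, 2),    # exact multiple of the batch size
]


def check_batch_count(batch_size, n_items, expected_call_count):
    # The fake RPC echoes its chunk back, so results can be compared to inputs.
    rpc = mock.Mock(side_effect=lambda chunk: list(chunk))
    items = [f"id-{i}" for i in range(n_items)]

    result = batched_call(rpc, items, batch_size)

    assert rpc.call_count == expected_call_count
    assert result == items


# One body, many cases: each tuple replaces a separate
# test_single_batch / test_multiple_batches style method.
for case in BATCH_CASES:
    check_batch_count(*case)
```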
### Issue 2 of 2
posthog/models/person/util.py:605-606
**`_validate_uuids_via_personhog` silently starts filtering by `p.id` after refactor**
The original implementation (`valid = [p for p in resp.persons if p.team_id == team_id]`) did **not** apply the `p.id` guard. Via the shared `_batched_get_persons_by_uuids` helper, it now discards persons whose proto `id` field is zero/unset before checking `team_id`. A person with a matching `team_id` but `id == 0` would have been returned before and is now silently dropped. It also shifts how the `PERSONHOG_TEAM_MISMATCH_TOTAL` counter is incremented for that function: the original counted every team-mismatched person regardless of `id`; the new path counts only those that pass the `id` filter. If this change is intentional, it should be reflected in a test; if accidental, a clarifying docstring on `_batched_get_persons_by_uuids` would make the contract explicit.
```suggestion
# NOTE: _batched_get_persons_by_uuids also filters out persons with id == 0,
# which differs from the previous single-RPC implementation that only
# checked team_id. Persons returned by the server with a zero id are now
# excluded from validation results and from the team-mismatch metric.
valid_persons = _batched_get_persons_by_uuids(client, team_id, uuids, "validate_person_uuids_exist")
return [p.uuid for p in valid_persons]
```
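To make the behavioral difference in Issue 2 concrete, here is a minimal sketch of the two filters as the review describes them. The `Person` dataclass and both function names are hypothetical stand-ins for the proto message and the real code paths, and the mismatch count is a simplified proxy for how `PERSONHOG_TEAM_MISMATCH_TOTAL` would be incremented.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical stand-in for the personhog proto message."""
    id: int
    uuid: str
    team_id: int


def filter_team_only(persons, team_id):
    """Previous behavior as described: only team_id is checked."""
    valid = [p for p in persons if p.team_id == team_id]
    mismatches = len(persons) - len(valid)
    return valid, mismatches


def filter_id_then_team(persons, team_id):
    """Refactored behavior as described: drop zero/unset ids, then check team_id."""
    with_id = [p for p in persons if p.id]
    valid = [p for p in with_id if p.team_id == team_id]
    mismatches = len(with_id) - len(valid)
    return valid, mismatches


persons = [
    Person(id=1, uuid="a", team_id=7),
    Person(id=0, uuid="b", team_id=7),  # matching team, zero id
    Person(id=0, uuid="c", team_id=9),  # wrong team, zero id
    Person(id=2, uuid="d", team_id=9),  # wrong team, real id
]

old_valid, old_mismatches = filter_team_only(persons, team_id=7)
new_valid, new_mismatches = filter_id_then_team(persons, team_id=7)

print([p.uuid for p in old_valid], old_mismatches)
print([p.uuid for p in new_valid], new_mismatches)
```

In this toy input, person `b` is the case the review calls out: it is returned by the team-only filter and dropped by the id-then-team filter, and person `c` stops contributing to the mismatch count.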
Reviews (1): Last reviewed commit: "tests and fxies"
eli-r-ph
approved these changes
May 27, 2026
Problem
Django's personhog gRPC call sites send unbounded request payloads. If a caller passes thousands of UUIDs or distinct IDs, the entire list goes in a single RPC.
Changes
Adds shared helpers that chunk RPCs into batches of 500:
- GetPersonsByUuids
- GetPersonsByDistinctIdsInTeam
- GetDistinctIdsForPersons
Error Handling
If any batch RPC fails, the exception propagates out of the helper and we fall back to the ORM path.
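The batching and fallback described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `chunked`, `batched_get_persons`, `get_persons_with_fallback`, and the `rpc` / `orm_lookup` callables are hypothetical names, and only the batch size of 500 comes from the description.

```python
BATCH_SIZE = 500  # batch size stated in the PR description


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batched_get_persons(rpc, team_id, uuids, batch_size=BATCH_SIZE):
    """Issue one RPC per chunk; any exception propagates to the caller."""
    persons = []
    for chunk in chunked(uuids, batch_size):
        persons.extend(rpc(team_id, chunk))
    return persons


def get_persons_with_fallback(rpc, orm_lookup, team_id, uuids):
    """Try the batched RPC path first; on any failure, use the ORM path."""
    try:
        return batched_get_persons(rpc, team_id, uuids)
    except Exception:
        # Results from batches that already succeeded are discarded here,
        # so the caller never sees a mix of RPC and ORM rows.
        return orm_lookup(team_id, uuids)


if __name__ == "__main__":
    sizes = []

    def fake_rpc(team_id, chunk):
        sizes.append(len(chunk))
        return list(chunk)

    uuids = [f"uuid-{i}" for i in range(1200)]
    result = get_persons_with_fallback(fake_rpc, lambda t, u: [], 1, uuids)
    print(len(result), sizes)
```

In this sketch a failure in any batch discards the partial results and reruns the whole lookup through the fallback, which keeps the return value all-or-nothing per source.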
How did you test this code?
test_batched_personhog_helpers.py file