Skip to content

feat(django): split personhog rpcs into batched calls#60307

Merged
nickbest-ph merged 3 commits into
masterfrom
nick/person-fetch-batching
May 27, 2026
Merged

feat(django): split personhog rpcs into batched calls#60307
nickbest-ph merged 3 commits into
masterfrom
nick/person-fetch-batching

Conversation

@nickbest-ph
Copy link
Copy Markdown
Contributor

Problem

Django's personhog gRPC call sites send unbounded request payloads. If a caller passes thousands of UUIDs or distinct IDs, the entire list goes in a single RPC.

Changes

Adds shared helpers that chunks RPCs into batches of 500:

  • batches GetPersonsByUuuids
  • batches GetPersonsByDistinctIdsInTeam
  • batches GetDistinctIdsForPersons

Error Handling --

If any batch RPC failes, the exception propagates out to the healper and we fall back to ORM path

How did you test this code?

  • new unit tests in the test_batched_personhog_helpers.py file
  • existing unit tests pass

@nickbest-ph nickbest-ph changed the title Nick/person fetch batching feat(django): split personhog rpcs into batched calls May 27, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

🎭 Playwright didn't run on this PR — your changes touch code that could affect E2E behavior, but Playwright is opt-in via label now to keep CI cost down.

Add the run-playwright label if you want an E2E sweep before merging — CI will pick it up automatically.

Most PRs don't need this. Real regressions still get caught on master and fix-forward.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 27, 2026

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
posthog/models/person/test/test_batched_personhog_helpers.py:1-10
**Non-parameterized tests across the batch-count dimension**

Per the team's coding conventions, parameterized tests are preferred. Several test classes have nearly identical `test_single_batch` / `test_multiple_batches` (and similarly `test_deduplicates_same_person_across_distinct_ids` / `test_deduplicates_across_batches`) pairs whose only difference is the patched `PERSONHOG_BATCH_SIZE` and input count. Collapsing each pair into a single `@pytest.mark.parametrize`-decorated case (e.g. parametrize over `(batch_size, n_items, expected_call_count)` tuples) would remove the duplication while keeping the same coverage.

### Issue 2 of 2
posthog/models/person/util.py:605-606
**`_validate_uuids_via_personhog` silently starts filtering by `p.id` after refactor**

The original implementation (`valid = [p for p in resp.persons if p.team_id == team_id]`) did **not** apply the `p.id` guard. Via the shared `_batched_get_persons_by_uuids` helper, it now discards persons whose proto `id` field is zero/unset before checking `team_id`. A person with a matching `team_id` but `id == 0` would have been returned before and is now silently dropped. It also shifts how the `PERSONHOG_TEAM_MISMATCH_TOTAL` counter is incremented for that function: the original counted every team-mismatched person regardless of `id`; the new path counts only those that pass the `id` filter. If this change is intentional, it should be reflected in a test; if accidental, a clarifying docstring on `_batched_get_persons_by_uuids` would make the contract explicit.

```suggestion
    # NOTE: _batched_get_persons_by_uuids also filters out persons with id == 0,
    # which differs from the previous single-RPC implementation that only
    # checked team_id.  Persons returned by the server with a zero id are now
    # excluded from validation results and from the team-mismatch metric.
    valid_persons = _batched_get_persons_by_uuids(client, team_id, uuids, "validate_person_uuids_exist")
    return [p.uuid for p in valid_persons]
```

Reviews (1): Last reviewed commit: "tests and fxies" | Re-trigger Greptile

Comment thread posthog/models/person/test/test_batched_personhog_helpers.py
Comment thread posthog/models/person/util.py Outdated
@tests-posthog
Copy link
Copy Markdown
Contributor

tests-posthog Bot commented May 27, 2026

⏭️ Skipped snapshot commit because branch advanced to 2017e6a while workflow was testing 5d5934a.

The new commit will trigger its own snapshot update workflow.

If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:

  • Merge master into your branch, or
  • Push an empty commit: git commit --allow-empty -m 'trigger CI' && git push

@nickbest-ph nickbest-ph merged commit 31da26d into master May 27, 2026
220 checks passed
@nickbest-ph nickbest-ph deleted the nick/person-fetch-batching branch May 27, 2026 19:20
@deployment-status-posthog
Copy link
Copy Markdown

deployment-status-posthog Bot commented May 27, 2026

Deploy status

Environment Status Deployed At Workflow
dev ✅ Deployed 2026-05-27 19:59 UTC Run
prod-us ✅ Deployed 2026-05-27 20:16 UTC Run
prod-eu ✅ Deployed 2026-05-27 20:16 UTC Run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants