chore(retention): make first-time anchor entity-polymorphic#60110
Merged
thmsobrmlr merged 2 commits intoMay 27, 2026
Conversation
Generalize get_first_time_anchor_expr on RetentionBaseQueryBuilder (and its no-props sibling) to take a RetentionEntity, producing minIf(events.timestamp, ...) for events entities and minIf(<table>.<timestamp_field>, ...) for data warehouse entities (with property_to_expr or a truthy constant for the predicate). Pure refactor — zero behaviour change for events-only callers. This unblocks downstream work that needs to resolve first-ever anchors against a DWH entity from a single shared primitive. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
|
🎭 Playwright didn't run on this PR — your changes touch code that could affect E2E behavior, but Playwright is opt-in via label now to keep CI cost down. Add the Most PRs don't need this. Real regressions still get caught on master and fix-forward. |
Contributor
Prompt To Fix All With AIFix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
posthog/hogql_queries/insights/retention/retention_base_query_builder.py:71-74
`assert` for precondition checks in a method reached via user-configured schema data (`RetentionEntity.table_name` / `timestamp_field` are both `Optional`). When either field is `None` the assertion fires an unhelpful bare `AssertionError`, and under `python -O` the check is silently stripped, producing `ast.Field(chain=[None, None])` and a confusing downstream error instead.
```suggestion
def entity_timestamp_field(self, entity: RetentionEntity) -> ast.Expr:
if entity.type == EntityType.DATA_WAREHOUSE:
if not entity.table_name or not entity.timestamp_field:
raise ValueError(
f"DATA_WAREHOUSE RetentionEntity requires table_name and timestamp_field, "
f"got table_name={entity.table_name!r}, timestamp_field={entity.timestamp_field!r}"
)
return ast.Field(chain=[entity.table_name, entity.timestamp_field])
```
### Issue 2 of 3
posthog/hogql_queries/insights/retention/test/test_retention_first_time_anchor.py:13-62
**Duplicate AST traversal logic** — `_collect_field_chains`, `_collect_constants`, and `_collect_min_if_predicates` each contain identical recursive walk logic: visit a dataclass's fields, then recurse into lists. Only the leaf-node collection differs. A single generic walker (e.g. `_walk_ast(node, collect_fn)`) would express the idea once, making it easier to handle edge cases (e.g. dict-valued fields) consistently across all three helpers.
### Issue 3 of 3
posthog/hogql_queries/insights/retention/test/test_retention_first_time_anchor.py:84-167
**Non-parameterised tests** — all four methods exercise the same function (`get_first_time_anchor_expr`) with different entity fixtures and expected AST properties. The team prefers parameterised tests; a single `@pytest.mark.parametrize` case list with `(entity, is_first_ever, is_first_matching, expected_chains, unexpected_chains, expected_constants)` tuples would eliminate the repeated builder/assert boilerplate and make adding future shapes trivial.
Reviews (1): Last reviewed commit: "chore(retention): make first-time anchor..." | Re-trigger Greptile |
…anchor PR - Replace `assert entity.table_name and entity.timestamp_field` with an explicit `ValueError` carrying the offending field values. Bare `assert` is stripped under `python -O` and yields an unhelpful `AssertionError` on user-configured schema input. - Fix mypy CI failure: the `# type: ignore[union-attr]` at line 167 was the wrong code (`[attr-defined]` is what mypy actually emits). Use the standard `assert isinstance(pred, ast.Constant)` narrowing pattern after `self.assertIsInstance(...)` and drop the ignore. - Extract `_walk_ast(node, visit)` and rewrite the three collectors (`_collect_field_chains`, `_collect_constants`, `_collect_min_if_predicates`) on top of it — the recursive walk was identical across all three. - Parameterize the four anchor-expr tests with `parameterized.expand` (the existing `unittest.TestCase` base rules out `pytest.parametrize`). The DWH-no-properties case gates its unique min-if-truthy assertion on a per-row flag so the four shapes share one test method. Generated-By: PostHog Code Task-Id: 56490050-2f0a-4caa-9bdc-c7c1baee038f
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Refactor to make "first time" feature possible in follow-up work.
Problem
RetentionBaseQueryBuilder.get_first_time_anchor_expris hard-coded tominIf(events.timestamp, <events_entity_predicate>). That coupling blocks every downstream track that wants to resolve a first-ever anchor against a data warehouse entity from a single shared primitive — both the strict-calendar-date first-time parity work and the 24-hour-window DWH retrofit need to fan out from the same place.Changes
Make the first-time anchor primitive polymorphic over
RetentionEntity:get_first_time_anchor_expr(entity)now takes aRetentionEntityand routes byentity.type.EntityType.EVENTS, the produced expression is identical to before:minIf(events.timestamp, <events_entity_predicate>)with property-stripped and property-included variants for the first-ever case.EntityType.DATA_WAREHOUSE, the expression becomesminIf(<entity.table_name>.<entity.timestamp_field>, <predicate>), where the predicate isproperty_to_expr(entity.properties)if properties are configured andConstant(True)otherwise.retention_base_query_fixed.py,retention_base_query_rolling.py) passself.start_eventto the helper.start_entity_expr_no_propsbecomes a thin events-default wrapper around the new polymorphicentity_expr_no_props(entity)— the breakdownargMinIfcaller is unchanged because DWH breakdowns are out of scope for this slice.Pure refactor for events callers — zero behaviour change. No DWH call site uses the new branch yet; that wiring lands in subsequent slices.
How did you test this code?
I'm an agent (Claude). I only ran automated tests:
posthog/hogql_queries/insights/retention/test/test_retention_first_time_anchor.pyexercise the four interesting shapes — events first-ever, events first-time-matching, DWH with properties, DWH without properties. All four pass.test_retention_query_runner.pywith the new signature. CI will run them against ClickHouse.events.timestampreferences inside first-time anchor logic acrossposthog/hogql_queries/insights/retention/. Other hits (breakdownargMinIf, rolling-t0 non-first-time path, generic events-only paths) are deliberate and out of scope per the ticket.retention_base_query_variant_comparison_excluded_testsin both test classes is byte-identical — this refactor closes no parity gap on its own.Publish to changelog?
no
🤖 Agent context
Authored by Claude Code (Opus 4.7, 1M context) via the
/tddskill against a Notion ticket titled "Foundation: entity-polymorphic first-time anchor".Approach:
MagicMockrunner is enough — no Django Team, no ClickHouse. That bypassed a broken local CH setup without sacrificing coverage of the meaningful branches.parse_exprsubstitution shares nodes, so a tree walk counts the reusedminIfmore than twice for the first-ever shape. The DWH-no-properties test asserts on the property of every predicate (isinstance(..., Constant)and truthy) rather than an exact count.Decisions:
start_entity_expr_no_propsas acached_propertywrapper instead of deleting it. The breakdown call site still depends on it for events, DWH breakdown is downstream work, and keeping the wrapper means existing snapshot tests don't move.entity_timestamp_field,entity_expr_with_props, andentity_expr_no_propsas public methods on the builder rather than_privatehelpers — the next two tracks will compose them.RetentionBaseQueryVariantComparisonMixinexclusion lists. The ticket explicitly says no parity gap is closed by this slice.