
chore(retention): make first-time anchor entity-polymorphic#60110

Merged
thmsobrmlr merged 2 commits into master from posthog-code/retention-polymorphic-first-time-anchor
May 27, 2026

Conversation

@thmsobrmlr
Collaborator

@thmsobrmlr thmsobrmlr commented May 26, 2026

Refactor to make "first time" feature possible in follow-up work.

Problem

RetentionBaseQueryBuilder.get_first_time_anchor_expr is hard-coded to minIf(events.timestamp, <events_entity_predicate>). That coupling blocks every downstream track that wants to resolve a first-ever anchor against a data warehouse entity from a single shared primitive — both the strict-calendar-date first-time parity work and the 24-hour-window DWH retrofit need to fan out from the same place.

Changes

Make the first-time anchor primitive polymorphic over RetentionEntity:

  • get_first_time_anchor_expr(entity) now takes a RetentionEntity and routes by entity.type.
  • For EntityType.EVENTS, the produced expression is identical to before: minIf(events.timestamp, <events_entity_predicate>) with property-stripped and property-included variants for the first-ever case.
  • For EntityType.DATA_WAREHOUSE, the expression becomes minIf(<entity.table_name>.<entity.timestamp_field>, <predicate>), where the predicate is property_to_expr(entity.properties) if properties are configured and Constant(True) otherwise.
  • The two existing call sites (retention_base_query_fixed.py, retention_base_query_rolling.py) pass self.start_event to the helper.
  • start_entity_expr_no_props becomes a thin events-default wrapper around the new polymorphic entity_expr_no_props(entity) — the breakdown argMinIf caller is unchanged because DWH breakdowns are out of scope for this slice.

Pure refactor for events callers — zero behaviour change. No DWH call site uses the new branch yet; that wiring lands in subsequent slices.
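
The routing described above can be sketched in a few lines. This is only an illustration under stated assumptions: it builds plain strings instead of HogQL AST nodes, `SketchEntity` is a hypothetical pared-down stand-in for `RetentionEntity`, and `property_predicate` stands in for whatever `property_to_expr(entity.properties)` would produce. The table and column names are invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EntityType(Enum):
    EVENTS = "events"
    DATA_WAREHOUSE = "data_warehouse"


@dataclass
class SketchEntity:
    """Hypothetical, pared-down stand-in for RetentionEntity."""

    type: EntityType
    table_name: Optional[str] = None
    timestamp_field: Optional[str] = None
    # Stand-in for the output of property_to_expr(entity.properties).
    property_predicate: Optional[str] = None


def entity_timestamp_field(entity: SketchEntity) -> str:
    """Pick the timestamp column to aggregate over, based on entity type."""
    if entity.type == EntityType.DATA_WAREHOUSE:
        if not entity.table_name or not entity.timestamp_field:
            raise ValueError("data warehouse entity needs table_name and timestamp_field")
        return f"{entity.table_name}.{entity.timestamp_field}"
    return "events.timestamp"


def first_time_anchor(entity: SketchEntity, events_predicate: str = "event = '$pageview'") -> str:
    """Render minIf(<timestamp>, <predicate>) for either entity type."""
    if entity.type == EntityType.DATA_WAREHOUSE:
        # No configured properties: fall back to an always-true predicate.
        predicate = entity.property_predicate or "true"
    else:
        predicate = events_predicate
    return f"minIf({entity_timestamp_field(entity)}, {predicate})"


events = SketchEntity(EntityType.EVENTS)
dwh = SketchEntity(EntityType.DATA_WAREHOUSE, "stripe_charges", "created_at")
print(first_time_anchor(events))
print(first_time_anchor(dwh))
```

Events callers keep the `events.timestamp` form, while a data warehouse entity without properties gets the always-true predicate.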

How did you test this code?

I'm an agent (Claude). I only ran automated tests:

  • New focused unit tests in posthog/hogql_queries/insights/retention/test/test_retention_first_time_anchor.py exercise the four interesting shapes — events first-ever, events first-time-matching, DWH with properties, DWH without properties. All four pass.
  • Confirmed the existing first-time retention test suite still collects 23 tests in test_retention_query_runner.py with the new signature. CI will run them against ClickHouse.
  • Confirmed zero remaining events.timestamp references inside first-time anchor logic across posthog/hogql_queries/insights/retention/. Other hits (breakdown argMinIf, rolling-t0 non-first-time path, generic events-only paths) are deliberate and out of scope per the ticket.
  • Confirmed retention_base_query_variant_comparison_excluded_tests in both test classes is byte-identical — this refactor closes no parity gap on its own.

Publish to changelog?

no

🤖 Agent context

Authored by Claude Code (Opus 4.7, 1M context) via the /tdd skill against a Notion ticket titled "Foundation: entity-polymorphic first-time anchor".

Approach:

  • Followed the TDD skill's vertical-slice discipline: events-regression test → signature refactor; DWH-with-properties test → DWH branch; DWH-no-properties test → truthy-constant predicate.
  • Picked unit tests over a full snapshot diff because the helper's contract is structural (which AST nodes appear where) and a stub MagicMock runner is enough — no Django Team, no ClickHouse. That bypassed a broken local CH setup without sacrificing coverage of the meaningful branches.
  • One AST observation that surfaced via TDD: parse_expr substitution shares nodes, so a tree walk counts the reused minIf more than once for the first-ever shape. The DWH-no-properties test therefore asserts a property of every minIf predicate (isinstance(..., Constant) and truthy) rather than an exact count.
  • The work was originally proven on the main checkout but kept getting reset by an external process touching these specific files; final application happened inside an isolated git worktree.
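
The shared-node observation in the Approach list is easy to reproduce outside HogQL. The sketch below uses hypothetical `Call` and `Constant` dataclasses, not the real `ast` module, and places one `minIf` object at two positions in a tree, roughly as placeholder substitution might. A walk then sees it once per position, which is why asserting a property of every hit is more robust than asserting a count.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Constant:
    value: Any


@dataclass
class Call:
    name: str
    args: list


def iter_calls(node, name):
    """Yield every Call named `name`, once per position in the tree."""
    if isinstance(node, Call):
        if node.name == name:
            yield node
        for arg in node.args:
            yield from iter_calls(arg, name)


# One minIf object, referenced from two places in the same tree.
anchor = Call("minIf", [Constant("timestamp"), Constant(True)])
tree = Call("and", [Call("equals", [anchor, Constant(0)]), Call("notEmpty", [anchor])])

hits = list(iter_calls(tree, "minIf"))
positions = len(hits)                  # positions visited by the walk
distinct = len({id(h) for h in hits})  # distinct objects behind them

# Count-independent check: every minIf predicate is a truthy Constant.
all_truthy = all(
    isinstance(h.args[1], Constant) and bool(h.args[1].value) for h in hits
)
```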

Decisions:

  • Kept start_entity_expr_no_props as a cached_property wrapper instead of deleting it. The breakdown call site still depends on it for events, DWH breakdown is downstream work, and keeping the wrapper means existing snapshot tests don't move.
  • Added entity_timestamp_field, entity_expr_with_props, and entity_expr_no_props as public methods on the builder rather than _private helpers — the next two tracks will compose them.
  • Did not touch RetentionBaseQueryVariantComparisonMixin exclusion lists. The ticket explicitly says no parity gap is closed by this slice.

Generalize get_first_time_anchor_expr on RetentionBaseQueryBuilder
(and its no-props sibling) to take a RetentionEntity, producing
minIf(events.timestamp, ...) for events entities and
minIf(<table>.<timestamp_field>, ...) for data warehouse entities
(with property_to_expr or a truthy constant for the predicate).

Pure refactor — zero behaviour change for events-only callers.
This unblocks downstream work that needs to resolve first-ever
anchors against a DWH entity from a single shared primitive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented May 26, 2026

🎭 Playwright didn't run on this PR — your changes touch code that could affect E2E behavior, but Playwright is opt-in via label now to keep CI cost down.

Add the run-playwright label if you want an E2E sweep before merging — CI will pick it up automatically.

Most PRs don't need this. Real regressions still get caught on master and fix-forward.

@greptile-apps
Contributor

greptile-apps Bot commented May 26, 2026

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
posthog/hogql_queries/insights/retention/retention_base_query_builder.py:71-74
`assert` for precondition checks in a method reached via user-configured schema data (`RetentionEntity.table_name` / `timestamp_field` are both `Optional`). When either field is `None` the assertion fires an unhelpful bare `AssertionError`, and under `python -O` the check is silently stripped, producing `ast.Field(chain=[None, None])` and a confusing downstream error instead.

```suggestion
    def entity_timestamp_field(self, entity: RetentionEntity) -> ast.Expr:
        if entity.type == EntityType.DATA_WAREHOUSE:
            if not entity.table_name or not entity.timestamp_field:
                raise ValueError(
                    f"DATA_WAREHOUSE RetentionEntity requires table_name and timestamp_field, "
                    f"got table_name={entity.table_name!r}, timestamp_field={entity.timestamp_field!r}"
                )
            return ast.Field(chain=[entity.table_name, entity.timestamp_field])
```
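
The `python -O` behaviour cited in Issue 1 can be checked directly. The following is a small standalone demonstration, not part of the PR: it runs the same two-line snippet in a child interpreter with and without `-O` and compares the results.

```python
import subprocess
import sys

# A failing assert followed by a line that should be unreachable.
SNIPPET = "assert False, 'precondition failed'\nprint('reached')"


def run(*flags):
    """Run SNIPPET in a child interpreter and return (exit code, stdout)."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


normal = run()         # the assert fires and the process exits non-zero
optimized = run("-O")  # the assert is compiled away and execution continues
```

An explicit `raise ValueError(...)` is unaffected by `-O`, which is the reason the suggestion above prefers it for validating user-configured input.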

### Issue 2 of 3
posthog/hogql_queries/insights/retention/test/test_retention_first_time_anchor.py:13-62
**Duplicate AST traversal logic**: `_collect_field_chains`, `_collect_constants`, and `_collect_min_if_predicates` each contain identical recursive walk logic: visit a dataclass's fields, then recurse into lists. Only the leaf-node collection differs. A single generic walker (e.g. `_walk_ast(node, collect_fn)`) would express the idea once, making it easier to handle edge cases (e.g. dict-valued fields) consistently across all three helpers.

### Issue 3 of 3
posthog/hogql_queries/insights/retention/test/test_retention_first_time_anchor.py:84-167
**Non-parameterised tests** — all four methods exercise the same function (`get_first_time_anchor_expr`) with different entity fixtures and expected AST properties. The team prefers parameterised tests; a single `@pytest.mark.parametrize` case list with `(entity, is_first_ever, is_first_matching, expected_chains, unexpected_chains, expected_constants)` tuples would eliminate the repeated builder/assert boilerplate and make adding future shapes trivial.

Reviews (1): Last reviewed commit: "chore(retention): make first-time anchor..."

Comment thread posthog/hogql_queries/insights/retention/test/test_retention_first_time_anchor.py Outdated
…anchor PR

- Replace `assert entity.table_name and entity.timestamp_field` with an
  explicit `ValueError` carrying the offending field values. Bare `assert`
  is stripped under `python -O` and yields an unhelpful `AssertionError`
  on user-configured schema input.
- Fix mypy CI failure: the `# type: ignore[union-attr]` at line 167 was
  the wrong code (`[attr-defined]` is what mypy actually emits). Use the
  standard `assert isinstance(pred, ast.Constant)` narrowing pattern
  after `self.assertIsInstance(...)` and drop the ignore.
- Extract `_walk_ast(node, visit)` and rewrite the three collectors
  (`_collect_field_chains`, `_collect_constants`,
  `_collect_min_if_predicates`) on top of it — the recursive walk was
  identical across all three.
- Parameterize the four anchor-expr tests with `parameterized.expand`
  (the existing `unittest.TestCase` base rules out `pytest.mark.parametrize`).
  The DWH-no-properties case gates its unique min-if-truthy assertion on
  a per-row flag so the four shapes share one test method.

Generated-By: PostHog Code
Task-Id: 56490050-2f0a-4caa-9bdc-c7c1baee038f
@thmsobrmlr thmsobrmlr requested a review from a team May 26, 2026 20:49
Member

@gesh gesh left a comment


Looks good!

@thmsobrmlr thmsobrmlr merged commit fc4c498 into master May 27, 2026
222 checks passed
@thmsobrmlr thmsobrmlr deleted the posthog-code/retention-polymorphic-first-time-anchor branch May 27, 2026 10:14
@deployment-status-posthog

deployment-status-posthog Bot commented May 27, 2026

Deploy status

| Environment | Status | Deployed At | Workflow |
| --- | --- | --- | --- |
| dev | ✅ Deployed | 2026-05-27 10:36 UTC | Run |
| prod-us | ✅ Deployed | 2026-05-27 10:47 UTC | Run |
| prod-eu | ✅ Deployed | 2026-05-27 10:49 UTC | Run |

2 participants