Optimization for deferring row policy and PREWHERE#102884
Workflow [PR], commit [2f6e799] Summary: ❌
…filters-perf-regression
rienath left a comment
Could you expand the PR description? Right now it only mentions PREWHERE, but the PR also stops deferring row policies that are over the sorting key, which is arguably the bigger change.
It would also be great to walk through each case explicitly (row policy over SK with no PREWHERE, row policy over SK with PREWHERE, row policy on non-SK with PREWHERE over SK and the cases that don't change) and describe what the pipeline looked like before vs after for each one. A small table or list would make it much easier to follow than diffing two EXPLAIN outputs.
More importantly, the correctness argument should be written down somewhere (with details about the determinism caveat and guard handling). If something breaks around row policies / FINAL, this is the first place someone will look, and "we probably can avoid deferring" won't tell them whether the optimisation was sound or a guess.
LLVM Coverage Report
Changed lines: 100.00% (27/27) | lost baseline coverage: 1 line(s) · Uncovered code |
rienath left a comment
I love the fact that this also incidentally solved a row policy problem that we had before. If someone ran SET apply_row_policy_after_final = 1, apply_prewhere_after_final = 0;, the pipeline was PREWHERE → FINAL → ROW POLICY. PREWHERE was applied before the row policy, so a query could probe for rows it had no access to. You can see an example here. But this is solved now!
SELECT *
FROM employees
FINAL
PREWHERE throwIf(salary > 200000, 'leak') = 0

   ┌─dept_id─┬─name──┬─salary─┬─version─┐
1. │       1 │ Alice │  81000 │       2 │
   └─────────┴───────┴────────┴─────────┘
I think we should add this fiddle query as a test too so we don't regress. In any case, looks good!
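The ordering problem can be illustrated with a small toy model in Python. This is only a sketch, not ClickHouse code: the sorting key (dept_id, name), the Bob row, and the policy "dept_id = 1" are all hypothetical, chosen so that the visible result matches the one-row output of the query. Pipeline stages are plain functions over lists of dicts.

```python
# Toy model only: rows are dicts and pipeline stages are plain functions.
# The data, sorting key, and row policy below are hypothetical.
rows = [
    {"dept_id": 1, "name": "Alice", "salary": 80000, "version": 1},
    {"dept_id": 1, "name": "Alice", "salary": 81000, "version": 2},
    {"dept_id": 2, "name": "Bob", "salary": 250000, "version": 1},
]


def final(rows):
    """Keep the highest-version row per assumed sorting key (dept_id, name)."""
    latest = {}
    for r in rows:
        key = (r["dept_id"], r["name"])
        if key not in latest or r["version"] > latest[key]["version"]:
            latest[key] = r
    return list(latest.values())


def row_policy(rows):
    """Hypothetical policy: the user may only see dept_id = 1."""
    return [r for r in rows if r["dept_id"] == 1]


def probing_prewhere(rows):
    """Mimics PREWHERE throwIf(salary > 200000, 'leak') = 0."""
    for r in rows:
        if r["salary"] > 200000:
            raise RuntimeError("leak")
    return list(rows)


def run(stages, rows):
    """Apply the stages in order, like a linear query pipeline."""
    for stage in stages:
        rows = stage(rows)
    return rows


# Old order: the probe sees Bob's row before the policy can hide it.
try:
    run([probing_prewhere, final, row_policy], rows)
    leaked = False
except RuntimeError:
    leaked = True

# Any order in which the policy runs before the probe never exposes Bob.
safe_result = run([row_policy, probing_prewhere, final], rows)
```

In this model the old order raises on a row the user is not allowed to see, which is exactly the side channel: the exception itself reveals that such a row exists. With the policy applied first, the probe only ever evaluates permitted rows.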
Thanks! I will add this test in a follow-up PR.
Cherry pick #102884 to 26.3: Optimization for deferring row policy and PREWHERE
Cherry pick #102884 to 26.4: Optimization for deferring row policy and PREWHERE
Cherry pick #102884 to 26.2: Optimization for deferring row policy and PREWHERE
Backport #102884 to 26.2: Optimization for deferring row policy and PREWHERE
Backport #102884 to 26.3: Optimization for deferring row policy and PREWHERE
Backport #102884 to 26.4: Optimization for deferring row policy and PREWHERE
With FINAL queries and apply_row_policy_after_final = 1 (default), this PR narrows when the row policy and PREWHERE are deferred to after FINAL.
This is the explain without the patch: https://fiddle.clickhouse.com/6251834d-26e7-4a4e-8bc4-ba4723de4420
This is the explain after the patch: https://pastila.clickhouse.com/?00224304/0d1aaa7352c7f511fa80cd31f1ba8eef#snF8VfZFxcq4d+YOsS2qvA==GCM
For a FINAL read with a row policy (and optionally a PREWHERE):
PREWHERE is not deferred anymore in this case. This is the case where we apply PREWHERE on a sorting-key column, when we probably can avoid deferring.
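One way to see why a deterministic filter over sorting-key columns does not need to be deferred is a toy model in Python. This is a sketch under stated assumptions, not the ClickHouse implementation: FINAL is modelled as "keep the highest version per key", and the table, its sorting key (dept_id, name), and both predicates are hypothetical. A predicate that reads only key columns gives the same answer for every version of a key, so it keeps or drops all versions together and commutes with FINAL. A predicate on a non-key column can drop the newest version and let a stale one survive.

```python
# Toy model only: the schema, data, and predicates are hypothetical.
def final(rows, key_cols=("dept_id", "name")):
    """Keep the highest-version row per sorting key, ordered by key."""
    latest = {}
    for r in rows:
        k = tuple(r[c] for c in key_cols)
        if k not in latest or r["version"] > latest[k]["version"]:
            latest[k] = r
    return sorted(latest.values(), key=lambda r: (r["dept_id"], r["name"]))


def where(rows, pred):
    """Plain row filter, standing in for a row policy or PREWHERE."""
    return [r for r in rows if pred(r)]


rows = [
    {"dept_id": 1, "name": "Alice", "salary": 80000, "version": 1},
    {"dept_id": 1, "name": "Alice", "salary": 81000, "version": 2},
    {"dept_id": 2, "name": "Bob", "salary": 90000, "version": 1},
    {"dept_id": 2, "name": "Bob", "salary": 40000, "version": 2},
]


def on_key(r):
    # Reads only a sorting-key column.
    return r["dept_id"] == 1


def on_value(r):
    # Reads a non-key column.
    return r["salary"] > 50000


# Filter on a key column: same result before or after FINAL.
key_before = final(where(rows, on_key))
key_after = where(final(rows), on_key)

# Filter on a non-key column: filtering first resurrects Bob's stale version 1.
value_before = final(where(rows, on_value))
value_after = where(final(rows), on_value)
```

Here key_before and key_after are identical, while value_before contains Bob's outdated version 1 row that value_after correctly omits. The determinism requirement matters for the same reason: a non-deterministic predicate could answer differently for two versions of the same key even if it only reads key columns, which breaks the "all versions together" property.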
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Skip deferring the row policy after FINAL when it depends only on sorting-key columns and is deterministic, and skip the corresponding PREWHERE deferral that the row policy was forcing in that case.
Documentation entry for user-facing changes