Skip to content

Ila/pac derived#15

Merged
ila merged 22 commits intomainfrom
ila/pac-derived
Apr 22, 2026
Merged

Ila/pac derived#15
ila merged 22 commits intomainfrom
ila/pac-derived

Conversation

@ila
Copy link
Copy Markdown
Member

@ila ila commented Apr 22, 2026

No description provided.

ila and others added 20 commits March 23, 2026 09:42
…ac_ctas flag

- CTAS from PU/derived_pu tables propagates PAC_KEY and PROTECTED metadata
  to the new table (marked as derived_pu). Handles column renames and
  skips aggregate-consumed protected columns.
- INSERT with aggregates into derived_pu tables triggers PAC noise injection.
- New pac_ctas setting (default true) to control CTAS propagation.
- New pac_check setting guard to allow CTAS to project protected columns.
- DML wrapper detection: drills through INSERT/CTAS and CTE-wrapped DML
  to reach the SELECT plan for PAC rewriting.
- Replace std::cerr debug traces with PAC_DEBUG_PRINT.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Write path (working):
- ConvertDerivedPuToCounters: converts pac_noised_* to pac_* counter
  variants for CTAS targeting derived_pu tables
- Updates CTAS column types from scalar to LIST<FLOAT>
- derived_pu flag now only set when CTAS has aggregates

Read path (WIP - crashes on type mismatch):
- pac_finalize scalar function: LIST<FLOAT> -> FLOAT, reuses
  PacNoisySampleFrom64Counters with stable world picking
- PACDerivedReadRule post-optimizer: injects pac_finalize into
  projections reading from derived_pu tables
- PACDerivedTypePatcher ClientContextState: patches result->types
  after physical planning (not yet reached due to crash)
- Known issue: physical planner crashes because result->types
  (set at plan time) says FLOAT[] but optimized plan says FLOAT

New files:
- src/aggregates/pac_finalize.cpp
- src/query_processing/pac_derived_rewriter.cpp
- src/include/query_processing/pac_derived_rewriter.hpp
- test/sql/pac_derived.test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…port

- CanRequestRebind() override enables OnFinalizePrepare type patching
- Generic FixAllStaleRefs: walks entire plan after pac_finalize injection,
  builds binding→type map from child operators, updates all stale column refs
- Handles ORDER BY (orders field), WHERE (filters), subqueries
- Full test suite: 50 assertions covering write path, read path, noise,
  ORDER BY, WHERE, subqueries, non-aggregate CTAS, manual pac_finalize

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unused PacFinalizeBindData (pac_finalize reads settings from context)
- Fix RNG seeding: use entry.offset instead of row index for stability across batches
- Remove debug Printer::Print calls, use PAC_DEBUG_PRINT behind PAC_DEBUG flag
- Fix stale comment ("placeholder") on PACDerivedReadRule
- Early-exit in post-optimizer when no derived_pu GETs found (avoid unnecessary plan traversal)
- Add HasDerivedPuCounterGets() helper to avoid double traversal
- Restore in-line binding propagation in WrapCounterRefsWithFinalize (needed for ORDER BY)
- Add test: multiple aggregate columns (SUM + COUNT)
- Keep FixAllStaleRefs as safety net after ResolveOperatorTypes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace typeof checks with value-based checks (length(), actual values)
- Fix stale "placeholder" comment on PACDerivedReadRule
- Document CanRequestRebind overhead tradeoff
- Remove tests assuming zero-variance counters (counters vary by subsample)
- No mi > 0 value checks (noise is non-deterministic)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CTAS propagation now always sets derived_pu = true when the source is
PU or derived_pu, so IVM delta tables automatically get PAC metadata.
PACDerivedReadFunction skips pac_finalize injection for DML plans
(INSERT/UPDATE/DELETE, including CTE-wrapped), ensuring upsert merges
operate on raw counter data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mns, add stress tests

- Fix INSERT into derived_pu: call ResolveOperatorTypes for all DML (not just
  CTAS) and strip redundant FLOAT[]→FLOAT[] casts after counter conversion
- Fix WHERE on counter columns: make WrapExpr recursive so pac_finalize is
  injected into nested comparisons/filters, not just top-level projections
- Register implicit casts LIST<FLOAT>→FLOAT and LIST<FLOAT>→DOUBLE with a real
  pac_finalize implementation so the binder can resolve >, <, >=, BETWEEN, IN,
  SUM(), AVG() etc. on counter columns even when DuckDB pushes filters into scans
- Patch DuckDB comparison binder to check extension-registered implicit casts
  (one-hop LIST→child resolution) for both equality and inequality comparisons
- Fix CTAS from derived_pu: skip PAC compilation for non-aggregate reads (gate
  on PlanHasAggregate), skip pac_finalize in post-optimizer for CREATE_TABLE so
  counter lists are preserved in the copy
- Add 20 stress tests (Tests 6-25): BETWEEN, DISTINCT, GROUP BY, CASE WHEN,
  SUM/AVG over counters, JOIN two derived_pu tables, subqueries, UNION ALL,
  DELETE with counter filter, INSERT into plain table, IN, !=, NULLIF,
  ORDER BY counter, multiple INSERTs, multi-aggregate INSERT, CTAS chain

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…adata, implicit casts

Major overhaul of the post-optimizer pac_finalize injection:
- Use op->types (resolved output) instead of get.returned_types (stale after optimizer column reordering) to seed counter_bindings
- Propagate counter bindings through PROJECTION and AGGREGATE output
- Use LogicalOperatorVisitor::EnumerateExpressions to visit ALL operator-specific expression fields (DISTINCT targets, ORDER BY, TOP_N, etc.)
- Suppress pac_finalize wrapping inside AGGREGATE subtrees (first() in scalar subqueries needs raw FLOAT[] input)
- Fix CTE_REF chunk_types after pac_finalize changes CTE definition output types
- Skip already-wrapped pac_finalize expressions to prevent double-wrapping
- Immediately update PROJECTION types after wrapping so parent operators see correct types
- Add counter_columns field to PACTableMetadata + serialization
- Register implicit casts LIST<FLOAT>→FLOAT and LIST<FLOAT>→DOUBLE with real pac_finalize implementation
- Patch DuckDB comparison binder to check extension-registered implicit casts (one-hop LIST→child resolution)
- Gate derived_pu compilation on PlanHasAggregate; skip pac_finalize in post-optimizer for CTAS
- Add 20 stress tests (Tests 6-25)

Known limitation: scalar subquery equality on counter columns fails (CTE type propagation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Aggregates (e.g. first() in scalar subqueries) need raw FLOAT[] counter input.
Suppress pac_finalize injection in AGGREGATE subtrees while still propagating
counter bindings through the AGGREGATE output so parent PROJECTIONs can apply
pac_finalize correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fter pac_finalize

After pac_finalize is injected inside a CASE expression (e.g. CASE WHEN ... ELSE
pac_finalize(x) END), the CASE's return_type and any parent BoundSubqueryExpression
return_type remain stale (FLOAT[] instead of FLOAT). Fix by detecting pac_finalize
in projection expression subtrees and updating expression + operator types.

All 42 tests pass (3684 assertions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…res, consistent APIs

- Extract ExpressionContainsPacFinalize as named static function (was inline lambda)
- Extract CounterListType() to avoid recomputing LIST(PacFloatLogicalType()) everywhere
- Simplify DerivedPuGetInfo struct to just unordered_set<idx_t> (only table_index was used)
- Use EnumerateExpressions consistently (FixAllStaleRefs, StripRedundantCasts)
- Remove all debugging PAC_DEBUG_PRINT from read path
- Remove stale/duplicate comments, consolidate skip_wrapping logic
- Simplify FixCTERefTypes parameters

513 → 432 lines. All 42 tests pass (3684 assertions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…compiler style

- Add detailed file header explaining read/write paths, key challenges
- Add PAC_DEBUG_PRINT at key decision points (GET seeding, wrapping, entry)
- Short-circuit ExpressionContainsPacFinalize when found
- Remove unused logical_order.hpp include
- Add section markers (--- Helpers ---, --- Public API ---)
- Improve inline comments explaining WHY not WHAT

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…90 lines)

All post-wrapping type fixups are handled inline by WrapCounterRefsWithFinalize
(immediate projection type updates + ExpressionContainsPacFinalize) and
ResolveOperatorTypes. The separate FixAllStaleRefs, FixCTERefTypes, and subquery
return_type patching were never triggered by any of the 42 test cases.

513 → 380 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ExpressionContainsPacFinalize: take non-const Expression& (EnumerateChildren
  requires mutable reference; the function doesn't modify the expression)
- pac_optimizer.cpp: add GetMutableTableMetadata() to PACMetadataManager instead
  of const_casting the result of GetTableMetadata()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ting

Replace pac_finalize-based comparisons on derived_pu counter columns with
pac_filter_<cmp>(scalar, counters) calls that evaluate all 64 subsamples
via majority voting. This fixes incorrect lexicographic LIST comparison for
scan-level filters and gives more robust results for single-PU groups.

- Comparisons (>, >=, <, <=, =, !=) rewritten to pac_filter_<cmp> in WrapExpr
- BETWEEN decomposed to pac_filter_lte + pac_filter_gte AND conjunction
- Scan-level table_filters rewritten via RewriteCounterFiltersFromGets
- DELETE WHERE on counter columns now correctly uses pac_filter_<cmp>
- Non-comparison contexts (SELECT, ORDER BY) still use pac_finalize
- Removed dead counter_bindings tracking, extracted shared rewrite helpers
- Removed unused DecomposeBetweenOnCounterColumns declaration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ila
Copy link
Copy Markdown
Member Author

ila commented Apr 22, 2026

@copilot resolve the merge conflicts in this pull request

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 22, 2026

@copilot resolve the merge conflicts in this pull request

Resolved and pushed in commit 114cc3d. I merged latest origin/main into this branch and resolved the CMakeLists.txt conflict by keeping both source additions so the build source list is complete. No UI changes were involved, so there is no screenshot to attach.

Copilot finished work on behalf of ila April 22, 2026 09:03
@ila ila merged commit 477a4c7 into main Apr 22, 2026
11 checks passed
@ila ila deleted the ila/pac-derived branch April 22, 2026 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants