Draft
Conversation
|
Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone. PR title guidelines
Pre-merge checklist
|
antiguru
reviewed
Mar 13, 2026
Member
antiguru
left a comment
There was a problem hiding this comment.
I think this checks out! We should aim at having two different MFP implementations, and a runtime dispatch between them so we only have to pay the price of specializing the MFP once. Then there's a bunch of code that we could integrate into the sqlfunc macro, but no need to block on this.
3 tasks
Instead of converting MirScalarExpr to VectorScalarExpr on every evaluate_batch call, do the conversion once at operator construction. MfpEval enum dispatches between scalar (temporal) and vectorized (nontemporal) evaluation, storing only the needed variant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hand-written vectorized binary function dispatch with a declarative `vectorized = "..."` parameter on `#[sqlfunc]`. This keeps vectorized implementations in sync with function definitions. Introduces a `VectorizedBinaryFunc` trait with default no-op methods. The sqlfunc macro generates a specialized impl when `vectorized` is set, and `derive_binary\!` delegates through the trait on `BinaryFunc`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded discriminant constants with values derived from ColumnDatumKind, a companion enum generated by the enum_kinds crate. The macros now take only the variant name and derive the discriminant automatically, keeping them in sync with the enum definition order. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Break long line in derive_binary\! macro to stay within 100-char limit * Replace `as u8` enum casts with a `discriminant()` helper to satisfy the `as_conversions` clippy lint * Replace `len as u64` with `u64::cast_from(len)` * Add Cargo.lock update for enum-kinds dependency Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a criterion benchmark comparing vectorized vs scalar MFP evaluation on integer columns (100 to 100k rows), plus a rows_to_columns conversion benchmark. Add `enable_compute_vectorized_mfp` dyncfg (default false) to gate vectorized MFP evaluation. Thread the flag through persist_source, persist_source_core, and decode_and_mfp so the compute layer can enable it via worker config. Add `SafeMfpPlan::to_vectorized()` to expose VectorizedSafeMfpPlan construction without accessing private fields. Convert the oneshot source (render_decode_chunk) to use vectorized batch evaluation when the flag is enabled, with scalar fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a benchmark that reads from `Column<(Row, Timestamp, Diff)>`, evaluates an MFP (vectorized vs scalar), and encodes results back into a Column. This mirrors how persist_source processes batches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update insta snapshots for new VectorizedBinaryFunc impl generation. Fix as_conversions, disallowed zip, and redundant closures in vectorized evaluation code. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add the flag to FlipFlagsAction in parallel workload and to get_variable_system_parameters in mzcompose so CI exercises both true and false values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e715afa to
e66d1a8
Compare
The lockfile was regenerated during rebase conflict resolution, which pulled in AWS SDK versions requiring rustc 1.91.1. Restore from upstream/main and re-resolve only our dependency additions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix as_conversions and disallowed zip in persist_source.rs vectorized path. Apply rustfmt and fix broken doc link in vectorized.rs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
antiguru
pushed a commit
to antiguru/materialize
that referenced
this pull request
Mar 25, 2026
…0.1) Research findings on feasibility of column-of-datums arrangement spines: - Current DatumContainer stores rows as contiguous bytes with offset indexing - Columnar spines require schema propagation, new BatchContainer impls, modified merge/cursor logic, and vectorized eval (PR MaterializeInc#35464) as prereq - Recommended phased approach starting with vectorized MFP evaluation https://claude.ai/code/session_01JHo5sTCSGPW5NavNE2b49d
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is a POC written largely by Claude for vectorization of
MirScalarExpression, using standard array language / columnar idioms. The expressions are still interpreted, but act on batches of 1024 elements. It does quite a lot of printing as it does the work, and this should all be tidied up, but sharing for discussion.Here is a summary from Claude of the work, and the next steps: