feat(contrib): introduce contrib extension SPI #4339
Draft
schenksj wants to merge 27 commits into main
Conversation
First two pieces of the contrib extension SPI (PR1.1 + PR1.2 from
docs/contrib-delta-migration-plan.md). No consumers yet -- the
dispatcher arm, Scala SPI, and integration hooks land in subsequent
commits on this branch.
Proto envelope (operator.proto):
- New ContribOp { kind: string, payload: bytes } message added as
variant 117 on the OpStruct oneof. Contrib operators travel
through this envelope so core's proto stays stable when contribs
ship and evolve independently.
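For orientation, a hedged Rust-side sketch of roughly what prost would generate for an envelope of this shape; the field tags (1 and 2) are an assumption, not read from operator.proto.

```rust
// Hedged sketch only: roughly what prost-build emits for the envelope
// described above. Tags 1 and 2 are assumed, not taken from operator.proto.
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct ContribOp {
    /// Registry key the contrib's native planner is registered under.
    #[prost(string, tag = "1")]
    pub kind: String,
    /// Contrib-defined payload, decoded by the contrib's own proto types.
    #[prost(bytes = "vec", tag = "2")]
    pub payload: Vec<u8>,
}
```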
Rust SPI (planner/contrib.rs):
- register_contrib_planner(kind, planner: Arc<dyn OperatorBuilder>)
-- intended to be called from a contrib crate's #[ctor] at lib
init time. Last-write-wins on duplicate kinds (test re-registration
convenience; production contribs only register once).
- lookup_contrib_planner_by_kind(kind) -> Option<Arc<dyn OperatorBuilder>>
-- read path the dispatcher (PR1.3) will use.
- registered_contrib_kinds() -> Vec<String> -- diagnostics.
- 2 unit tests covering registration round-trip + duplicate-kind
replacement; both pass.
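A minimal sketch of the registration/lookup shape described above (not comet-contrib-spi's actual code): `PlannerFn` is a simplified stand-in for `Arc<dyn OperatorBuilder>`, and the lock choice is illustrative.

```rust
// Simplified sketch of the registry surface described above; PlannerFn
// stands in for Arc<dyn OperatorBuilder>.
use std::collections::HashMap;
use std::sync::RwLock;

type PlannerFn = fn(&[u8]) -> Result<String, String>;

static REGISTRY: RwLock<Option<HashMap<String, PlannerFn>>> = RwLock::new(None);

pub fn register_contrib_planner(kind: &str, planner: PlannerFn) {
    // Last-write-wins on duplicate kinds, as described above.
    REGISTRY
        .write()
        .unwrap()
        .get_or_insert_with(HashMap::new)
        .insert(kind.to_string(), planner);
}

pub fn lookup_contrib_planner_by_kind(kind: &str) -> Option<PlannerFn> {
    REGISTRY.read().unwrap().as_ref()?.get(kind).copied()
}

pub fn registered_contrib_kinds() -> Vec<String> {
    REGISTRY
        .read()
        .unwrap()
        .as_ref()
        .map(|m| m.keys().cloned().collect())
        .unwrap_or_default()
}
```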
Exhaustive-match accommodations (operator_registry.rs, jni_api.rs):
- operator_registry::get_operator_type returns None for ContribOp;
PR1.3 will add the dispatcher arm in planner.rs that bypasses
this lookup and goes through the contrib registry instead.
- jni_api::op_name returns "ContribOp" for the new variant
(informational tracing label).
The reused trait is core's existing OperatorBuilder rather than a new
ContribOperatorPlanner trait -- their signatures are identical and
duplicating would force contribs to maintain a parallel definition.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR1.3 from docs/contrib-delta-migration-plan.md. The Rust planner now dispatches OpStruct::ContribOp through the registry added in PR1.2 by calling lookup_contrib_planner_by_kind(kind) and delegating to the returned OperatorBuilder. When no planner is registered for the kind, the dispatcher surfaces a clear ExecutionError that names the missing Cargo feature -- this is the typical case when the contrib's JVM JAR is on the classpath but core was built without the matching `contrib-<name>` feature.
No behaviour change for any existing operator. Contribs activate once their rlib is linked into core's cdylib via the Cargo feature and their #[ctor] runs at lib-init time. PR1.7 (contrib/example/) will exercise this end-to-end with the first concrete contrib.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
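A hedged sketch of that dispatch arm, reusing the simplified lookup/`PlannerFn` stand-ins from the registry sketch above; the error and return types are stand-ins for core's ExecutionError and SparkPlan wrapper.

```rust
// Hedged sketch of the ContribOp dispatch described in the commit above,
// built on the simplified registry stand-ins from the earlier sketch.
fn plan_contrib_op(kind: &str, payload: &[u8]) -> Result<String, String> {
    match lookup_contrib_planner_by_kind(kind) {
        // Delegate to the contrib's registered planner.
        Some(planner) => planner(payload).map_err(|e| format!("contrib planner {kind:?}: {e}")),
        // Typical failure mode: the contrib's JVM JAR is on the classpath but
        // core was built without the matching `contrib-<name>` Cargo feature.
        None => Err(format!(
            "no contrib planner registered for kind {kind:?}; \
             was core built with the matching `contrib-<name>` feature?"
        )),
    }
}
```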
PR1.4 from docs/contrib-delta-migration-plan.md. Three new files under
spark/src/main/scala/org/apache/comet/spi/:
CometScanRuleExtension.scala
Trait that contrib JARs implement to intercept scan transformation
in CometScanRule. Exposes both matchesV1/transformV1 (FileSourceScanExec)
and matchesV2/transformV2 (BatchScanExec) overrides; both default to
"doesn't match", letting contribs claim only the scan flavour they
care about.
CometOperatorSerdeExtension.scala
Trait that contribs implement to contribute additional
SparkPlan-class-to-CometOperatorSerde mappings. CometExecRule (PR1.5)
merges these with its built-in allExecs.
CometExtensionRegistry.scala
ServiceLoader-backed process-wide singleton. Idempotent `load()` that
discovers contrib JARs on the classpath via standard
META-INF/services entries. Failures to instantiate individual
extensions are logged but never fatal -- one broken contrib JAR
doesn't take down the Spark session. Test-only resetForTesting() hook.
No callers yet; PR1.5 wires CometScanRule and CometExecRule to consult
the registry, and PR1.6 wires CometSparkSessionExtensions to call
load() during installation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR1.5 from docs/contrib-delta-migration-plan.md. Three integration hooks added; all are no-ops until contribs are present on the classpath and PR1.6 calls CometExtensionRegistry.load() during extension install.
CometScanRule.transformV1Scan
After the Spark 3.4 AQE-DPP gate, iterate CometExtensionRegistry.scanExtensions. First extension whose `matchesV1` returns true gets `transformV1` called. Some result replaces the scan; None falls through to core's existing file-format dispatch.
CometScanRule.transformV2Scan
Same shape for BatchScanExec via matchesV2/transformV2.
CometExecRule.transform.convertNode (operator dispatch)
When a non-shuffle, non-broadcast operator has all-native children, the lookup now consults `(allExecs ++ contribSerdes)` where `contribSerdes` is the union of every registered CometOperatorSerdeExtension's `serdes` map. Contrib operator classes (e.g. a future Delta-contrib's CometDeltaNativeScanExec) get matched here without core having to know about them.
Iteration order is registration order (i.e. ServiceLoader discovery order, which is classpath-stable per JVM run). Contribs that need priority should be the first META-INF/services entry on the classpath; in practice contribs claim disjoint scan types so the ordering rarely matters.
No regression risk: with no extensions loaded (the state on this branch and on main today), every hook short-circuits in O(1) and falls through to the existing code path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR1.6 from docs/contrib-delta-migration-plan.md. Adds a single call to CometExtensionRegistry.load() at the top of CometSparkSessionExtensions.apply, before any of Comet's rules are registered. Discovery happens once per JVM (idempotent), so subsequent SparkSession installs are no-ops. With no contrib JARs on the classpath the call discovers nothing and returns; with contribs present, their META-INF/services entries are enumerated and the registered extensions become visible to CometScanRule (PR1.5) and CometExecRule (PR1.5). Closes the JVM half of the contrib SPI: every PR1 piece for the JVM side is now in place. Remaining PR1 deliverables are the contrib/example/ minimum example (PR1.7) and the contributor guide (PR1.8). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Substantive piece of PR1.7. Two structural changes break what was
otherwise going to be a cyclic dependency between core and contribs:
1. New crate native/contrib-spi/ (`comet-contrib-spi`):
- Defines the `ContribOperatorPlanner` trait that contribs implement.
- Owns the process-wide registry (`register_contrib_planner`,
`lookup_contrib_planner_by_kind`, `registered_contrib_kinds`).
- Light-weight `ContribError` enum for SPI errors. Core converts to
its own `ExecutionError` at the dispatch site.
- 1 unit test covering registration round-trip.
- Only deps: `datafusion` + `log`. No deps on core, no deps from
core back. The SPI is the leaf both core and contribs depend on.
2. New crate contrib/example/native/ (`comet-contrib-example`):
- rlib (NOT cdylib) -- linked INTO core's libcomet via the
`contrib-example` Cargo feature flag on core.
- Registers a `NoOpPlanner` against kind `"example-no-op"` via
`#[ctor::ctor]`. The planner returns a sentinel error so tests
can verify the full JVM->JNI->native->contrib dispatch chain (sketched below).
- Depends on `comet-contrib-spi` (NOT on core).
- Real contribs follow the same shape: rlib + #[ctor] + thin
dependency on the SPI crate.
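A hedged sketch of that lib-init registration, assuming the `ctor` crate and reusing the simplified `PlannerFn`/registry stand-ins from earlier rather than the real planner trait:

```rust
// Hedged sketch of the example contrib's #[ctor] registration described
// above (simplified stand-ins; the real crate registers a planner trait
// object through comet-contrib-spi).
#[ctor::ctor]
fn register_example_contrib() {
    register_contrib_planner("example-no-op", no_op_planner);
}

fn no_op_planner(_payload: &[u8]) -> Result<String, String> {
    // Sentinel error so tests can verify the full JVM -> JNI -> native ->
    // contrib dispatch chain actually reached this planner.
    Err("example-no-op planner invoked (sentinel)".to_string())
}
```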
Core rewiring:
- native/core/Cargo.toml: `comet-contrib-spi` mandatory dep;
`comet-contrib-example` optional dep gated by feature
`contrib-example` (default-on so released builds ship the
example registered).
- native/core/src/lib.rs: `extern crate comet_contrib_example` gated
by the feature so #[ctor] runs at libcomet load.
- native/core/src/execution/planner/contrib.rs: now just re-exports
the SPI surface for backwards-compatible imports within core.
- native/core/src/execution/planner.rs: ContribOp dispatcher now
recursively builds native children, calls the SPI's `plan(payload,
children) -> Arc<dyn ExecutionPlan>`, wraps in `SparkPlan`. Maps
`ContribError` to `ExecutionError::GeneralError` with a clear
contrib-identified prefix.
Workspace wiring:
- native/Cargo.toml: adds `contrib-spi` to default-members so the
SPI crate is built/checked with the rest of the workspace.
- Adds `../contrib/example/native` to workspace members (NOT
default-members) so it shares the workspace lockfile and
dependency overrides but isn't compiled standalone.
Build state: `cargo check` on all three crates (core, contrib-spi,
contrib-example) is clean. SPI unit test passes. The Maven side of the
example contrib (pom.xml, Scala extension, ServiceLoader entry,
integration test) is NOT in this commit -- it lands in a follow-up
on the same branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR1.7 part 2 (completes PR1.7). The Rust half landed in d1553b5; this commit lands the Maven module + Scala extension + ServiceLoader entry so `mvn install` from the repo root produces a published comet-contrib-example-* JAR alongside core's comet-spark-*.
New files:
contrib/example/pom.xml
Maven module. Inherits the parent pom; depends on comet-spark (provided scope, transitive Spark/Scala). Disables the parent pom's BanDuplicateClasses enforcer for this contrib because the parent rule was tuned for core (comet-spark shades scala-collection-compat and Spark drags in the same classes unshaded -- a per-module override is cleaner than reshaping the parent rule for every future contrib).
contrib/example/src/main/scala/.../ExampleScanRuleExtension.scala
Trivial CometScanRuleExtension impl. matchesV1 keys on a test-only marker option so the SPI can be exercised deterministically; matchesV2 / transformV1 / transformV2 inherit trait defaults. Real contribs replace these with their own file-format probes + native dispatch.
contrib/example/src/main/resources/META-INF/services/org.apache.comet.spi.CometScanRuleExtension
ServiceLoader manifest entry. This is the single line that makes the contrib JVM-discoverable.
contrib/example/src/test/scala/.../ExampleScanRuleExtensionSuite.scala
Two tests:
1. ServiceLoader discovers ExampleScanRuleExtension via CometExtensionRegistry.load() with no other configuration.
2. matchesV1 honours the test marker option.
Root pom.xml:
Adds `<module>contrib/example</module>` to the modules list so `mvn install` from the repo root builds and installs the contrib alongside core.
Build state: `mvn install -DskipTests -Pspark-3.5` builds the new module successfully. Native-side Rust artifact (rlib linked into libcomet via Cargo feature `contrib-example`) was already committed in d1553b5. PR1.7 closed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR1.8 from docs/contrib-delta-migration-plan.md. New doc at
docs/source/contributor-guide/contrib-extensions.md walks future
contrib authors through:
* Architecture overview (JVM JAR + native rlib linked into libcomet
via Cargo feature flag, single cdylib at runtime, ContribOp proto
envelope as the dispatch hop).
* The SPI surface on both sides: traits, registry, error type.
* Required files for a new contrib, mirroring contrib/example/.
* The three existing-file edits needed (root pom, native workspace,
core Cargo features).
* End-to-end wire-format flow.
* Cargo feature gating semantics (--no-default-features for slim
builds; the JVM side is always classpath-driven).
* Testing recommendations modeled on contrib/example/'s suite.
* Cross-references to the migration plan and the SPI crate.
Closes PR1.8. With this commit, all eight PR1 deliverables from
docs/contrib-delta-migration-plan.md are in place on the
comet-contrib-spi branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three additions surfaced by porting Delta onto the SPI as a local
confidence check (port not committed; see PR1-delta-port-findings.md):
1. CometScanRuleExtension.preTransform tree-level hook
Default-identity method that runs once per plan in CometScanRule._apply
before per-scan dispatch. Lets contribs undo wrapper rewrites that their own
Catalyst strategies applied (Delta's PreprocessTableWithDVs is the
motivating case; its strategy wraps DV-bearing scans in
Project(Filter(...)) referencing a synthetic column Comet's reader
can't produce). Without this hook, Delta couldn't move into a contrib
at all without losing the unwrap step.
Shared state between preTransform and transformV1 is the contrib's
problem -- the recommended pattern (documented) is Spark's TreeNodeTag
mechanism, which the existing CometSpark34AqeDppFallbackRule already
uses.
2. Proto layer in contrib/example/
Each contrib now ships its own .proto schema, build.rs running
prost-build, and gitignored src/generated/. contrib/example/ carries
a trivial ExampleConstantScan { row_count } message; a new
ConstantScanPlanner registered under kind="example-constant-scan"
decodes the payload via prost::Message::decode and returns an
EmptyExec sized by the field. Three new tests:
* ctor registers both planners
* payload decode-and-build round-trip
* bad payload surfaces ContribError::BadPayload
This makes the worked reference complete -- future contrib authors
have a runnable proto setup to copy (a decode sketch follows this list).
3. Class-subclass convention documented
CometExecRule dispatches by op.getClass. Documented the convention
that contribs needing a custom executor should define their own
CometScanExec subclass (or similar) and register the serde keyed on
that class, rather than reusing a generic class with a stringly-typed
scanImpl tag (the legacy Delta pattern that has no analogue in the
class-based SPI dispatch).
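A hedged sketch of item 2's decode path: `ExampleConstantScan` mirrors the message named above (field tag assumed), while the error enum is a cut-down stand-in for the SPI's ContribError.

```rust
// Hedged sketch of the payload decode-and-validate step from item 2 above.
use prost::Message;

#[derive(Clone, PartialEq, ::prost::Message)]
pub struct ExampleConstantScan {
    // Field tag 1 is assumed for this sketch.
    #[prost(uint64, tag = "1")]
    pub row_count: u64,
}

// Cut-down stand-in for the SPI's ContribError.
#[derive(Debug)]
pub enum ContribError {
    BadPayload(String),
}

fn decode_constant_scan(payload: &[u8]) -> Result<ExampleConstantScan, ContribError> {
    // A malformed payload surfaces as BadPayload instead of a panic; the
    // planner would then size its empty-exec output from row_count.
    ExampleConstantScan::decode(payload)
        .map_err(|e| ContribError::BadPayload(format!("invalid ExampleConstantScan: {e}")))
}
```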
Files touched:
spark/src/main/scala/org/apache/comet/spi/CometScanRuleExtension.scala
spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala
contrib/example/native/{build.rs, Cargo.toml, src/lib.rs}
contrib/example/native/src/proto/example_op.proto
docs/source/contributor-guide/contrib-extensions.md
.gitignore + native/Cargo.lock
Build state: cargo check across core + contrib-spi + contrib-example
clean. cargo test -p comet-contrib-example: 3/3 pass. cargo test -p
comet-contrib-spi: 1/1 pass. mvn install all modules: clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
apache#4) Extends the contrib SPI so file-scan contribs can build a parquet scan through core without depending on core.
Adds to comet-contrib-spi:
* ContribPlannerContext trait -- contribs receive a &dyn impl in their plan() call. Methods: session_ctx, build_physical_expr (Catalyst Expr proto -> PhysicalExpr), convert_spark_schema, prepare_object_store, build_parquet_datasource_exec.
* ParquetDatasourceParams struct -- 15-field argument bundle mirroring core's init_datasource_exec one-to-one.
* ContribOperatorPlanner::plan now takes &dyn ContribPlannerContext as its first argument.
Core implements the trait via CorePlannerContext, a thin adapter that borrows &PhysicalPlanner. Dispatcher constructs one per ContribOp arm. Updates the example contrib to take and ignore the new ctx param; tests now use a TestCtx with unimplemented panics for unused trait methods.
Surfaced and validated by attempting to host the full Delta dispatcher (~150 lines from delta-kernel-phase-1's OpStruct::DeltaScan arm) on the SPI -- branch contrib-delta-port carries that work. The validation port compiled clean, linked into core's cdylib, and exercised every trait method end-to-end (column-mapping rewrites, DV filter wrapping, schema conversion, expression-planner round-trip, parquet exec construction).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
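A hedged sketch of the trait shape this commit describes; every type below is a simplified stand-in (the real methods return DataFusion PhysicalExpr / ExecutionPlan handles and take the 15-field params struct).

```rust
// Hedged sketch of the ContribPlannerContext surface; all aliases are
// stand-ins for the real DataFusion/Arrow types.
use std::sync::Arc;

type SparkExprProto = Vec<u8>; // stand-in for the Catalyst Expr proto bytes
type PhysicalExpr = String;    // stand-in for Arc<dyn PhysicalExpr>
type ArrowSchema = String;     // stand-in for an Arrow SchemaRef
type ExecutionPlan = String;   // stand-in for Arc<dyn ExecutionPlan>
type SessionCtx = String;      // stand-in for a SessionContext handle

pub struct ParquetDatasourceParams; // the 15-field bundle, elided here

pub trait ContribPlannerContext {
    fn session_ctx(&self) -> Arc<SessionCtx>;
    fn build_physical_expr(&self, expr: &SparkExprProto) -> Result<PhysicalExpr, String>;
    fn convert_spark_schema(&self, schema: &[u8]) -> Result<ArrowSchema, String>;
    fn prepare_object_store(&self, url: &str) -> Result<(String, String), String>;
    fn build_parquet_datasource_exec(
        &self,
        params: ParquetDatasourceParams,
    ) -> Result<ExecutionPlan, String>;
}

pub trait ContribOperatorPlanner: Send + Sync {
    // The context is now the first argument, as described above.
    fn plan(
        &self,
        ctx: &dyn ContribPlannerContext,
        payload: &[u8],
        children: Vec<ExecutionPlan>,
    ) -> Result<ExecutionPlan, String>;
}
```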
Addresses every finding from the first review:
Blockers
- B1: Test isolation via ScopedContribPlannerRegistration RAII guard +
_clear_for_test escape hatch (cfg-gated on test or "test-utils" feature).
Negative-lookup test added. A sketch of the guard follows this list.
- B2: preTransform documented as V1-only; transformV2 explicitly does not
receive a plan-tree reference. Trait + dispatcher docs aligned.
- B3/B4: #[non_exhaustive] on ParquetDatasourceParams and ContribError.
Constructor (`new`) + `with_*` setters on the params struct so contribs
don't use struct-literal syntax. WrongChildCount.expected switched from
&'static str to String.
- B5: CometScanRule preTransform corruption guard -- log warning when an
extension replaces a FileSourceScanExec whose relation it does not claim.
- B6: Example contrib's #[ctor] wrapped in catch_unwind. Contributor guide
documents panic semantics, the logger-not-ready issue (use eprintln!),
cross-platform ctor-order nondeterminism.
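A hedged sketch of B1's guard (referenced above), built on the simplified registry stand-ins from earlier in this thread; the real guard lives in comet-contrib-spi behind test / "test-utils" gating.

```rust
// Hedged sketch of the RAII registration guard from B1, on top of the
// simplified REGISTRY/PlannerFn stand-ins used earlier in this thread.
pub struct ScopedContribPlannerRegistration {
    kind: String,
}

impl ScopedContribPlannerRegistration {
    pub fn new(kind: &str, planner: PlannerFn) -> Self {
        register_contrib_planner(kind, planner);
        Self { kind: kind.to_string() }
    }
}

impl Drop for ScopedContribPlannerRegistration {
    fn drop(&mut self) {
        // Sketch-only cleanup: remove just this kind so the next test starts
        // clean (the real crate also exposes a broader _clear_for_test hatch).
        if let Some(map) = REGISTRY.write().unwrap().as_mut() {
            map.remove(&self.kind);
        }
    }
}
```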
Important
- I1: contrib-example removed from default features. Production cdylib has
empty registered_contrib_kinds(). Build docs updated.
- I2: CometExtensionRegistry.load() moved out of CometSparkSessionExtensions.apply
into a lazy call at the top of CometScanRule._apply (after isCometLoaded).
Sessions that never enable Comet pay zero ServiceLoader cost.
- I3: CometExtensionRegistry.mergedSerdes pre-computed at load() time;
CometExecRule now consults it via .get() instead of rebuilding the merged
map per operator transform. Duplicate-class detector logs a warning when
two contribs claim the same SparkPlan class.
- I4: Multi-extension dispatch now loops over every matching extension and
takes the first that returns Some; "matched but declined" continues to the
next extension before falling back to core. Trait docs updated.
- I5: Unit tests added: ParquetDatasourceParams constructor + setters with
distinguishable bool tuple, CorePlannerContext smoke test that builds a
DataSourceExec and verifies schema flow-through, session_ctx Arc identity,
empty-schema conversion. 7 tests total in contrib-spi, 3 in core.
- I6: prepare_object_store returns (ObjectStoreUrl, object_store::path::Path);
contribs no longer have to reimplement URL parsing for PartitionedFile.location.
- I7: preTransform fold gated on COMET_NATIVE_SCAN_ENABLED. Disabled-Comet +
Delta JAR on classpath no longer strips load-bearing Catalyst wrappers.
- I8: Display test for every ContribError variant verifies the dispatcher's
format!("contrib planner {kind:?}: {e}") preserves variant-discriminating info.
- I9: Dispatcher rejects ContribOp payloads larger than 16 MiB with a
descriptive error.
- I10: CometExtensionRegistry.load() logs a positive INFO message when no
extensions are discovered, so users get a signal in deploy modes where the
context classloader doesn't see the contrib JAR.
Nits
- N2: ConstantScanPlanner log moved from info! to debug!.
- N4: Dead doc link to docs/contrib-delta-migration-plan.md removed.
- N5: ExampleScanRuleExtensionSuite no longer calls SparkSession.stop()
in finally (that tears down the JVM-wide singleton).
- N8: Trimmed comet-contrib-example crate description.
- N9: operator_registry test asserts ContribOp returns None from get_operator_type.
- N10: row_count=0 covered by an additional unit test in the example contrib.
Open-question documentation (contributor guide)
- Send+Sync asymmetry between ContribOperatorPlanner and ContribPlannerContext.
- SPI is alpha-stable; #[non_exhaustive] markers make additive changes minor.
- &[u8] vs Bytes rationale.
- --no-default-features verification + CI matrix suggestion.
- Thin JAR convention + shading guidance.
- Registry-primitive note (may switch to ArcSwap; API unchanged).
- WrongChildCount.expected convention (free-form phrase).
Verified
- cargo check --no-default-features and cargo check (default features) green.
- cargo test -p comet-contrib-spi -p comet-contrib-example: 10 tests pass.
- cargo test -p datafusion-comet --lib (filtered): 4 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Regressions
- R1: CometExecRule._apply now calls CometExtensionRegistry.load() at the top
(after isCometLoaded). Previously only CometScanRule.load()ed; rule-injection
order changes or partial injection would have left mergedSerdes empty and
silently un-dispatched contribs.
- R2: CometExtensionRegistry class docstring updated to match the lazy-load
semantics introduced in the first-pass fix.
- R3: Three remaining dead references to docs/contrib-delta-migration-plan.md
rewritten to point at contrib-extensions.md (native/core/Cargo.toml,
root pom.xml, PR1-description.md).
- R4: ContribError::Display wildcard arm now emits {self:?} instead of
"unknown contrib error" so future variants under #[non_exhaustive] keep
their debug-repr message through the dispatcher's format chain.
- R5/R6: preTransform corruption guard rewritten to scan-identity check.
Snapshots every FileSourceScanExec the extension does NOT claim before the
pass, then verifies each one survives in the rewritten tree by reference
equality. Catches class-changing replacements (which the old check missed)
and is robust to plan-tree reordering (which the old zip-by-position was
not).
- R7: New unit test core_planner_context_encryption_flag_reaches_init_datasource_exec
uses the encryption_enabled asymmetry (true triggers a factory lookup that
fails when no factory is configured; false silently succeeds) to verify
that bool reaches the right positional slot in init_datasource_exec.
A swap with case_sensitive / use_field_id / etc. would now fail this test.
Surface tweak
- N-NEW-1: ParquetDatasourceParams::session_timezone switched from &'a str
to owned String. with_session_timezone now accepts `impl Into<String>` so
contribs can pass runtime-computed timezones (from a session config lookup)
without juggling lifetimes. ParquetDatasourceParams loses its lifetime
parameter entirely.
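A hedged sketch of the constructor-plus-setters shape implied by B3/B4 and N-NEW-1; apart from session_timezone, the field names below are illustrative.

```rust
// Hedged sketch of the #[non_exhaustive] params struct with builder-style
// setters; only session_timezone is taken from the text above.
#[non_exhaustive]
#[derive(Debug, Clone)]
pub struct ParquetDatasourceParams {
    pub session_timezone: String,
    pub case_sensitive: bool,
    // ...remaining fields of the 15-field bundle elided
}

impl ParquetDatasourceParams {
    // #[non_exhaustive] blocks struct-literal construction outside the
    // defining crate, so contribs come through new() + with_* setters.
    pub fn new() -> Self {
        Self { session_timezone: "UTC".to_string(), case_sensitive: false }
    }

    // Owned String + impl Into<String> lets contribs pass runtime-computed
    // timezones without juggling lifetimes (N-NEW-1).
    pub fn with_session_timezone(mut self, tz: impl Into<String>) -> Self {
        self.session_timezone = tz.into();
        self
    }

    pub fn with_case_sensitive(mut self, case_sensitive: bool) -> Self {
        self.case_sensitive = case_sensitive;
        self
    }
}
```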
Dispatcher
- N-NEW-2: Payload-size guard moved to AFTER the planner lookup. A bogus
kind now produces the "not registered" error rather than misleadingly
blaming an oversized payload.
CI / regression guard
- N-NEW-7: New unit test production_build_has_no_contrib_planners_registered,
gated on `#[cfg(not(feature = "contrib-example"))]`, asserts the default
cdylib carries zero contrib surface. Catches an accidental re-introduction
of a contrib into core's `default = [...]` feature set.
Verified
- cargo check (default features): green.
- cargo test -p datafusion-comet --lib --no-default-features: 135 tests pass
including the new production-canary.
- cargo test -p datafusion-comet --lib -- execution::planner::contrib:
4 tests pass including the encryption-flag witness.
- cargo test -p comet-contrib-spi -p comet-contrib-example: 10 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Regressions
- R-NEW-1: CometScanRule preTransform corruption guard switched from mutable.Set[FileSourceScanExec] (Spark case-class equality) to a Vector with `_ eq b` lookup. Two value-equal-but-reference-distinct scans (e.g., self-join after AQE dedup) no longer trigger a false-positive warning. Cost stays O(K * (P + S)).
- R-NEW-2: CometExtensionRegistry.load() now runs inside `synchronized` with explicit publication order (write @volatile fields, THEN flip `loaded`). The previous AtomicBoolean-only gate let Thread B observe `loaded=true` and read Seq.empty/Map.empty while Thread A was still loading. AQE concurrent rule application across sub-queries now sees consistent registry state.
Polish
- N1: Cost comment added to the preTransform guard fold.
- N2: Guard comment notes V2 BatchScanExec is out of scope by design.
- N3: ContribOp dispatcher now rejects empty `kind` with a dedicated error ("the JVM-side serde produced a malformed envelope") instead of the misleading "build core with `contrib-` feature" message.
- N4: Payload-size guard comment corrected -- prost has already decoded the payload by the time we get here; the guard fences the contrib's plan() body, not the original allocation.
- N5: Scope limitation documented on the encryption-asymmetry test -- catches swaps involving the encryption_enabled slot only; new bools must come with their own asymmetry witness.
- N6: Production canary cfg switched to `not(any(...))` form with a MAINTENANCE comment listing the contract for future contrib features.
- N7: resetForTesting visibility widened from `private[comet]` to public; docstring explains that contribs are not required to package under org.apache.comet.* and must still be able to reset between tests.
- N8: ContribError::Display wildcard comment clarified -- the wildcard defends downstream Display-as-source consumers; inside the defining crate the match must be exhaustive anyway.
Verified
- cargo check default features: green.
- cargo test -p datafusion-comet --lib -- execution::planner::contrib: 5 tests pass (added 1, was 4).
- cargo test -p comet-contrib-spi -p comet-contrib-example: 10 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- F1: CometScanRule preTransform corruption guard swaps ArrayBuffer + `_ eq` for java.util.IdentityHashMap, making survivor lookup O(1) and the documented O(K * (P + S)) cost accurate.
- F2: CometExtensionRegistry.resetForTesting() now `synchronized`. Without it a concurrent load() could observe torn state (loaded=false but the fields still populated, or vice versa), causing the next load() to short-circuit and miss re-discovery.
- F3: Trimmed overstated comment in load()'s no-extensions branch.
- F4: ContribOp dispatcher rejects whitespace-only `kind` (not just empty); displays the raw `kind` repr in the error message.
- F5: ContribOp proto reserves tags 3-9 for additive evolution (payload_format_version, compression, contrib_version, etc.) so evolving contribs can't accidentally reuse one.
- F6: Contributor guide documents the 16 MiB ContribOp.payload cap and notes contribs with a legitimate need for a higher ceiling should file an issue rather than work around it. Also adds a "MUST NOT call load() from a class's static initializer" note to the load() docstring (Scala monitors are reentrant so it wouldn't deadlock but would shadow the in-flight publication).
Verified: cargo check green, 21 core planner tests pass, 10 SPI + example tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
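Pulling the kind and payload checks from the last two review-fix commits together (N3, N-NEW-2, F4), a hedged sketch of the dispatcher's validation order, using the simplified stand-ins from the earlier sketches; only the 16 MiB cap and the ordering come from the text.

```rust
// Hedged sketch of the dispatcher's validation order: blank kind first,
// planner lookup second, payload-size guard last (so a bogus kind is
// reported as "not registered" rather than blamed on payload size).
const MAX_CONTRIB_PAYLOAD_BYTES: usize = 16 * 1024 * 1024;

fn dispatch_contrib_op(kind: &str, payload: &[u8]) -> Result<String, String> {
    if kind.trim().is_empty() {
        // A blank kind means the JVM-side serde produced a malformed
        // envelope, not that a Cargo feature is missing.
        return Err(format!("malformed ContribOp envelope: blank kind {kind:?}"));
    }
    let planner = lookup_contrib_planner_by_kind(kind)
        .ok_or_else(|| format!("no contrib planner registered for kind {kind:?}"))?;
    if payload.len() > MAX_CONTRIB_PAYLOAD_BYTES {
        return Err(format!(
            "ContribOp payload exceeds {MAX_CONTRIB_PAYLOAD_BYTES} bytes"
        ));
    }
    planner(payload)
}
```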
Addresses every gap surfaced by the doc-completeness validation pass:
Missing topics now covered
- JVM-side proto compilation (protoc-jar-maven-plugin block + shaded
protobuf-java rationale -- contribs MUST inherit the parent pom's
com.google.protobuf -> ${comet.shade.packageName}.protobuf relocation).
- Worked Scala snippet for building a ContribOp envelope from a serde,
including the Java setContribOp(...) name (vs Rust op_struct).
- CometOperatorSerde[T <: SparkPlan] trait shape: enabledConfig,
requiresNativeChildren, getSupportLevel, convert, createExec.
- Full walked-through plan() body exercising every ContribPlannerContext
method (convert_spark_schema, build_physical_expr, prepare_object_store,
build_parquet_datasource_exec) -- mirrors what Delta/Iceberg ports do.
- ServiceLoader diagnostics: the INFO "none discovered" line, the WARN
per-failed-entry line, which logger to enable for debugging.
- Classloader-order story (lazy load post-`--jars` so order doesn't matter).
- CometExtensionRegistry.load() MUST NOT be called from static initializers
(reentrancy shadows in-flight publication).
- Logging conventions (eprintln in #[ctor], log::* with target: elsewhere,
do NOT re-prefix errors with the contrib's kind).
- Error message convention (dispatcher already prefixes with kind).
- Version pinning for out-of-tree contribs (explicit Comet patch version,
not ${project.version}).
- Multi-Spark-version shimming: pick a spark.version.short profile, mirror
Comet's per-profile artifact ID pattern.
- End-to-end Rust+Scala round-trip test pattern with concrete recipe.
- Cargo feature canary maintenance note (when adding contrib-<name>,
extend the not(any(...)) cfg in production_build_has_no_contrib_planners_registered).
Reorganised so a new contrib author finds things in the right order:
- "Required files" + "Wiring into core" + Cargo feature gate moved BEFORE
the SPI deep-dive.
- Prerequisites + .gitignore + workspace-placement constraint called out
upfront.
Inaccuracies fixed
- Operator proto field name (op_struct in Rust, setContribOp on Java
Builder -- explained as a code-generator language difference).
- "open for inheritance" qualifier sharpened: additive default-implemented
methods are a minor bump; abstract-method additions are breaking.
- out_dir = "src/generated" pattern justified as a deliberate deviation
from idiomatic prost (stable include! path for editor tooling).
- contrib-example,contrib-delta example reworded so it doesn't reference
a feature that doesn't exist in-tree yet.
- PR1's CI -> Comet's CI.
- MAX_CONTRIB_PAYLOAD_BYTES named so readers can rg for it.
Nit cleanups
- _clear_for_test added to the SPI table with explicit "test escape hatch
only" caveat alongside ScopedContribPlannerRegistration.
- ContribError convention paragraph cross-linked from the SPI table row.
The result is a 758-line single-document reference that a contrib author
can follow end-to-end without reading core's source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses the validation findings against commit 91c40e0:
Blockers
- I4: JVM proto-shading recipe rewritten. The original claim that the contrib pom would inherit shading from the parent was wrong -- shading is configured in spark/pom.xml as a per-module execution, not via pluginManagement. A contrib generating its own Java proto without its own shade-plugin execution would NoSuchMethodError on setPayload() at runtime because ContribOp.Builder expects ${comet.shade.packageName}.protobuf.ByteString. New section gives the full pom snippet: protoc-jar-maven-plugin + maven-shade-plugin execution that relocates com.google.protobuf to the parent's shade prefix.
- R3: with_session_timezone(&scan.session_timezone) didn't compile (&String doesn't impl Into<String>). Fixed to use scan.session_timezone.as_str() with a brief inline comment explaining why.
- R2/M1: build_partitioned_files was hand-waved with no shape. Added a full sketched implementation that builds PartitionedFile per task, resolves URLs to object_store::path::Path, sets object_meta.location, and notes the common real-world variations (file-range splitting, partition_values, format-specific filter wrappers).
Other fixes
- I6: gitignore guidance corrected -- the entry lives in the repo-root .gitignore, not in contrib/example/. Verified the actual entry exists.
- I2: build.rs snippet now mirrors contrib/example/native/build.rs exactly (including the cargo:rerun-if-changed=src/proto/ line that prost-build needs to rebuild on schema changes).
- I5: op_struct vs contrib_op naming clarified -- op_struct is the oneof name (Rust pattern-match handle), contrib_op is the field name on that oneof (Java setter name). They are not "the same field" with different names; they're a oneof and one of its members.
- M3: ServiceLoader-diagnostics section now also covers detectDuplicateSerdeClasses (cross-contrib serde key collision) and register_contrib_planner's last-write-wins WARN on duplicate kinds.
- cfg(not(any(...))) placeholder example replaced with the literal current form, plus the explicit "add feature = "contrib-<name>" here" instruction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…model
The original SPI design had asymmetric distribution: the native rlib was compiled INTO libcomet via a Cargo feature, but the JVM half shipped as a separate Maven artifact discovered at runtime via ServiceLoader. This made the protobuf-shading recipe in the contributor guide load-bearing (~70 lines of XML to relocate `com.google.protobuf` per contrib) and it gave users a distribution model that didn't actually work -- the native side required a Comet rebuild regardless of how the JVM half shipped.
The fix mirrors the native side: contribs are now source directories under contrib/<name>/, NOT Maven modules. Activating `-Pcontrib-<name>` on spark/pom.xml folds the contrib's Scala + resources + proto into comet-spark.jar's normal compile + shade execution. `mvn install` produces a vanilla comet-spark.jar with zero contribs; `mvn install -Pcontrib-example` produces one with the example contrib's classes inside. Same shape as `cargo build --features contrib-example`.
Files touched
- spark/pom.xml: new contrib-example profile using build-helper-maven-plugin (source roots), maven-resources-plugin (META-INF/services), and an extra protoc-jar-maven-plugin execution (Java proto generation). The default shade execution gains a ServicesResourceTransformer so contrib service files merge cleanly.
- pom.xml: <module>contrib/example</module> removed; contribs aren't modules.
- contrib/example/pom.xml: deleted. The example is now Scala + resources + Cargo crate, no Maven pom.
- spark/.../spi/CometExtensionRegistry.scala: docstring rewritten to describe the bundled model; no logic change.
- docs/source/contributor-guide/contrib-extensions.md: rewritten "Architecture at a glance", "Required files", "Wiring into core", "Build matrix", "Proto, JVM side", and "Maven packaging" sections to reflect the new model. The protobuf-shading recipe is gone -- shading is handled by comet-spark's existing shade execution automatically.
What this fixes
- No Maven cycle (the previous separate-module design hit one and required a dedicated SPI module to break it; the source-injection model avoids the cycle entirely).
- One artifact installed: `comet-spark-with-<contribs>.jar` rather than a JAR + per-contrib JARs.
- ~70 lines of protobuf-shading boilerplate removed from the contributor guide. The new "Proto, JVM side" section is ~15 lines.
- Distribution model is honest: contribs are build-time options on Comet, JVM and native both.
Verified: spark/pom.xml parses; `-Pcontrib-example` profile activates cleanly with no Maven reactor errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Q1 (external deps): the previous source-injection-only refactor lost the encapsulation needed for contribs that pull in external Maven deps like delta-spark. Reintroduce a per-contrib pom.xml but as a deps-only artifact (`<packaging>pom</packaging>`, no code, no JAR). The contrib pom enumerates external deps; spark/pom.xml's `contrib-<name>` profile depends on it via `<type>pom</type>` to pull those deps transitively onto comet-spark's classpath. No reactor cycle: the deps pom has no `<dependency>` on comet-spark; it's a leaf list of external deps. contrib/example/pom.xml is the template -- its `<dependencies>` block is empty (the example has no external deps) but the file demonstrates the pattern that a real Delta contrib would use to pull in delta-spark.
Q2 (registry primitive): swap RwLock<HashMap> for ArcSwap<HashMap> in comet-contrib-spi's registry. Reads on the dispatch hot path drop from "acquire RwLock read guard + drop" to "atomic load + ref-count bump"; there was never any meaningful reason to make readers interact with a lock since writes happen only during library init (sequential, single-threaded #[ctor] calls). Public API unchanged; all 7 SPI tests still pass. ScopedContribPlannerRegistration and _clear_for_test reworked to use rcu / atomic store respectively.
Audit of other concurrency / perf hot spots: no other meaningful issues found. Per-dispatch Arc::clone is already optimal (single atomic refcount bump). CometExtensionRegistry's `synchronized` load() runs once. CometExecRule's mergedSerdes lookup is O(1). The preTransform corruption guard is O(K * (P + S)) per plan with K typically 1-3 -- microseconds, real safety value, keep.
Verified: cargo test -p comet-contrib-spi passes (7 tests); maven profile contrib-example activates cleanly without reactor cycle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
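A hedged sketch of the Q2 registry shape, assuming the arc_swap crate as a dependency; the value type reuses the simplified fn-pointer stand-in from the earlier sketches (the real registry stores Arc<dyn ContribOperatorPlanner>).

```rust
// Hedged sketch of an ArcSwap-backed registry: readers take an atomic
// snapshot load, writers copy-on-write during library init.
use std::collections::HashMap;

use arc_swap::ArcSwap;

type PlannerFn = fn(&[u8]) -> Result<String, String>;

pub struct ContribRegistry {
    map: ArcSwap<HashMap<String, PlannerFn>>,
}

impl ContribRegistry {
    pub fn new() -> Self {
        Self { map: ArcSwap::from_pointee(HashMap::new()) }
    }

    // Writes happen only from sequential #[ctor] calls at library init, so a
    // copy-on-write rcu() is cheap and never blocks readers.
    pub fn register(&self, kind: &str, planner: PlannerFn) {
        self.map.rcu(|current| {
            let mut next = HashMap::clone(current);
            next.insert(kind.to_string(), planner);
            next
        });
    }

    // Dispatch hot path: atomic load of the current snapshot (plus a
    // refcount bump on that snapshot Arc), then a plain HashMap lookup.
    pub fn lookup(&self, kind: &str) -> Option<PlannerFn> {
        self.map.load().get(kind).copied()
    }
}
```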
…ture
The PR description had drifted significantly from the actual branch:
- Said "9 commits"; the branch is now 18.
- Claimed JVM contribs ship as separate Maven JARs and users opt in via classpath; the symmetric-distribution refactor inverted that.
- Said the registry uses RwLock; it now uses ArcSwap.
- Missing all the post-`e018076d` work: ContribPlannerContext, four review-fix rounds, the contributor-guide completeness pass, the symmetric-distribution pivot, the deps-pom pattern.
- The architecture diagram still showed ServiceLoader-from-separate-JAR.
- The Delta-port confidence check section listed three SPI gaps closed; the fourth (Gap apache#4: ContribPlannerContext) was added later and was missing.
The new description accurately enumerates the 18 commits, walks through the current distribution model, and explains the design decisions (single-artifact bundling, ArcSwap registry, source-injection over JAR-shading, deps-only pom per contrib) that emerged across the iterations. It also documents the four review passes and the doc-completeness validation as part of the audit trail.
Not committed: this file is a working artifact, used when manually opening the PR. The previous version was a draft; the current is review-ready.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@parthchandra - appreciate your feedback on this proposed design for the contrib capability. @andygrove - FYI
PR1-description.md is a working artifact used when manually opening PR1 upstream; it should not be committed. It slipped in via commit a2ac715. This commit removes it from tracking (file stays on disk locally) and adds it to .gitignore so future `git add -A` doesn't pick it up again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed a47d978 to 071286a
Format-agnostic surface additions PR2's Delta port (and any future file-format contrib) needs without duplicating ~600 lines of CometScanRule's existing schema-check / encryption-gate / marker-dispatch helpers. No specific format is named anywhere in core; contribs register their own tags through the SPI.
1. CometScanRule.isSchemaSupported widened from private to private[comet] so contrib extensions under org.apache.comet.contrib.* can reuse the same schema-check + fallback-reason emission rather than duplicating the 25-line body. Zero behaviour change for in-tree callers.
2. CometOperatorSerdeExtension.matchOperator: new default-Some-or-None method for predicate-based serde dispatch. The class-keyed `serdes` map can't disambiguate a marker pattern like `CometScanExec(scanImpl="<contrib-specific-tag>")` (the class is shared with core's generic CometScanExec). Contribs using such markers override matchOperator. Backwards compatible: existing contribs that only populate `serdes` see no change. `serdes` now defaults to Map.empty so contribs that ONLY use matchOperator don't need to override both.
3. CometExecRule three-step dispatch: allExecs (core class map) -> mergedSerdes (contrib class map) -> matchOperator iteration (contrib predicate). First Some wins; multiple extensions' matchOperator results are tried in registration order.
4. CometOperatorSerdeExtension.nativeParquetScanImpls: new default-Set.empty method. Contribs that use the CometScanExec marker pattern AND go through Comet's tuned ParquetSource declare their scanImpl tag(s) here. CometScanExec.supportedDataFilters consults the merged set (via CometExtensionRegistry.nativeParquetScanImpls) to decide whether to apply native-parquet filter exclusions. Core no longer needs to hard-code any contrib's tag name.
5. CometExtensionRegistry.nativeParquetScanImpls publishes the merged tag set at load() time. Populated/reset alongside mergedSerdesCache under the same monitor.
Contributor guide updated with the matchOperator + nativeParquetScanImpls patterns and explicit guidance that contribs define their own scanImpl strings in their own code -- core's CometConf only carries SCAN_NATIVE_DATAFUSION / SCAN_NATIVE_ICEBERG_COMPAT for core's own variants.
Verified
- cargo check (default features): green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed 071286a to 525b980
…patch
Two related additions to the contrib SPI surface, both driven by gaps the
Delta regression exposed:
1. Generic per-partition metadata hook on `CometExecRDD` so contribs can
populate executor-side thread-locals (e.g. `InputFileBlockHolder` for
`input_file_name()` / `_metadata.file_path`) from their serialized
per-partition payloads BEFORE the native iterator starts producing
rows. Without this, Delta's UPDATE/DELETE/MERGE/CDC commands resolve
`_metadata.file_path` to empty and throw `DELTA_FILE_TO_OVERWRITE_NOT_FOUND`
for every touched file. Three pieces:
- `CometExecRDD.PartitionMetadataHandler` type alias +
`registerPartitionMetadataHandler` SPI. The signature takes the
`Map[String, Array[Byte]]` data shape (NOT the spark-internal
`CometExecPartition`), so contribs don't have to live under
`org.apache.spark.*` to use it.
- `CometExecRDD.compute()` invokes handlers after plan-data injection,
before instantiating the native iterator.
- `clearPartitionMetadataHandlers()` for test isolation; called from
`CometExtensionRegistry.resetForTesting`.
Lifecycle hook: `CometOperatorSerdeExtension.init(): Unit` (default
no-op), called once per JVM by `CometExtensionRegistry.load` after
discovery. Contribs override to register their handler. Failures in
one contrib's `init` are caught and logged so they don't take down
sibling contribs.
2. `matchOperator` dispatch now fires for `CometScanExec` markers tagged
with a registered contrib's `scanImpl` (anything in
`nativeParquetScanImpls`). Before this, the generic
`case op if isCometScan(op)` branch matched first and routed the
marker through `CometScanWrapper`, so the contrib's `serialize` (and
any format-specific concerns inside it -- Delta column-mapping
physical-name substitution, etc.) was never reached. The dispatch
order is now: `SCAN_NATIVE_DATAFUSION` -> contrib-marker via
`matchOperator` -> generic `isCometScan` catch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Revert the `case scan: CometScanExec if nativeParquetScanImpls.contains(...)` branch added in e417211. Unconditionally routing the contrib marker through the contrib's `matchOperator` -> serde regressed ~525 previously-passing Delta tests: the full conversion path (`CometDeltaNativeScan.serialize` -> `CometDeltaNativeScanExec`) returns 0 rows for the streaming-source `PreparedDeltaFileIndex` shape that the existing JVM-side `CometScanWrapper` path handles correctly. The motivation for the dispatch change (CM-name post-rename returning wrong values) still stands but needs a different fix: the marker-passthrough path has to apply Delta's logical->physical name substitution itself rather than forcing every Delta scan through the heavier kernel path. Left as a TODO. The partition-metadata SPI hook (also added in e417211) is unaffected by this revert and remains in place -- it's what addresses the broader UPDATE/DELETE/MERGE/CDC failures via `InputFileBlockHolder` population. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-instate the marker-dispatch case reverted in 35f1b3b, with a tighter charter: When a `CometScanExec` is tagged with a contrib's `scanImpl` (any string listed in the contrib's `nativeParquetScanImpls`), route it through that contrib's `matchOperator` serde rather than the generic `CometScanWrapper`. The contrib chooses per-scan whether to claim it (returning `Some`) or defer to the wrapper (returning `None`). If `matchOperator` returns `None`, OR if the chosen serde's `convertToComet` returns `None`, the marker falls back to the generic wrapper -- so the contrib retains full control over when the heavier native conversion fires. The previous revert was driven by a downstream bug: when this dispatch unconditionally claimed every marker, the Delta contrib's `CometDeltaNativeScan.serialize` returned 0 rows for simple streaming-source reads, regressing ~525 tests. With the new design that bug is gated behind the contrib's `matchOperator` -- contribs that aren't ready for the full conversion just return `None` until their `serialize` handles every case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`PlanDataInjector` previously had a hardcoded list of built-in injectors with a `// Future: DeltaPlanDataInjector, HudiPlanDataInjector, etc.` comment. That meant a Delta-contrib (PR2) could not actually plug its per-partition proto-injection logic into the execution path: tasks serialized via `perPartitionByKey` never got merged back into the operator tree at `CometExecRDD.compute` time, so the native side decoded `DeltaScan` with an empty `tasks` list and returned `EmptyExec` (0 rows) for any non-empty Delta scan that took the native conversion path. Promote the injector list to a built-in seq + a registerable contrib seq, and expose `registerInjector` / `clearContribInjectors` on the singleton. Same pattern as the `CometExecRDD.PartitionMetadataHandler` SPI added in e417211: contribs register their injector from `CometOperatorSerdeExtension.init`, `CometExtensionRegistry.resetForTesting` clears the registry alongside other contrib state for test isolation. Visibility of the `PlanDataInjector` object had to be widened from `private[comet]` to package-default (public) so the registry-reset call site in `org.apache.comet.spi.CometExtensionRegistry` can reach the `clearContribInjectors` method. The trait stays `private[comet]` (and so does the rest of the implementation) — contribs in `org.apache.comet.contrib.*` can still see it via the subpackage rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…put boundary `foreachUntilCometInput` enumerates known Comet input-class types (`CometNativeScanExec`, `CometScanExec`, etc.) and recurses past everything else. When a contrib-defined leaf native exec (e.g. the Delta contrib's `CometDeltaNativeScanExec`) appeared in the plan, it matched the generic `case _: CometPlan` arm and recursed into its empty children list without ever invoking `func`. The caller then saw an empty `sparkPlans` buffer and crashed on `firstNonBroadcastPlan.get` -- `None.get` at operators.scala:534. Add a case before the `_: CometPlan` recurse arm: any `CometNativeExec` with zero children is a Comet input. The explicit list above still wins for the known types (preserves existing behaviour exactly), and contribs' leaf scans now participate without needing core-side class enumeration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(contrib): introduce contrib extension SPI
Branch: comet-contrib-spi
Base: main
Commits: 18 (one architectural cluster + four review-fix rounds + a doc-completeness pass + the symmetric-distribution refactor)
Summary
Adds the infrastructure for contrib extensions — self-contained modules that ship as part of Comet's release but plug in via a stable SPI rather than being hard-wired into core. Core gains no functional behaviour change; with no contrib feature/profile enabled, the dispatch hooks are no-ops and `registered_contrib_kinds()` is empty.

The first concrete contrib (`contrib/example/`) is a worked reference: Scala extension classes, a `CometScanRuleExtension` implementation, a Rust rlib registering a `ContribOperatorPlanner`, a `META-INF/services/` entry, its own `.proto` schema with prost-build wiring, and unit tests covering registration, proto decode round-trip, and the error path. New contrib authors copy this directory layout.

This is the first of a two-PR sequence. PR2 (a follow-up off `main` after this lands) will port the existing draft PR for delta-kernel-rs integration onto this SPI and into `contrib/delta/`. The SPI shape has been validated end-to-end against a real Delta port on a separate branch — see the testing section below.
Distribution model
Both halves of a contrib are bundled into Comet's released artifacts at build time when their matching flags are enabled. Contribs are not independently distributable — they ship inside Comet's release.

- JVM half: `contrib/<name>/src/main/scala/`, compiled INTO `comet-spark.jar` by activating `-Pcontrib-<name>` on `spark/pom.xml`. The contrib's `META-INF/services/` entries go along for the ride; ServiceLoader at runtime discovers them from inside `comet-spark.jar` itself. The contrib has a tiny `<packaging>pom</packaging>` Maven pom that exists solely to enumerate external deps (e.g., a Delta contrib's pom would carry `<dependency>io.delta:delta-spark</dependency>`).
- Native half: an rlib crate (NOT cdylib) linked INTO `libcomet` via the matching `--features contrib-<name>` Cargo flag on `native/core`. The contrib's `#[ctor]` registers its operator planners during library load.
- `mvn install -Pcontrib-example && cargo build --features contrib-example` produces a Comet build that includes the example contrib in both `comet-spark.jar` and `libcomet`. A vanilla `mvn install && cargo build` produces a build with zero contrib surface.

The wire format between JVM and native uses a single generic envelope on the operator proto, `ContribOp { kind, payload }`. Core's planner dispatches by `kind`; the contrib's native crate registers planners against the same `kind` string the contrib's JVM code writes into the proto.
Architecture
What's in this PR (18 commits)
Foundational SPI (5 commits)
- 51eb0fff: `ContribOp { kind, payload }` proto envelope; Rust planner registry SPI
- f448693b: `OpStruct::ContribOp` dispatcher arm in `planner.rs`
- f23500df: `CometScanRuleExtension`, `CometOperatorSerdeExtension`, `CometExtensionRegistry`
- 42234b96: `CometScanRule` (V1 + V2) and `CometExecRule` consult the registry
- 8b694715: `CometSparkSessionExtensions.apply` hooks the registry

Worked-reference contrib (3 commits)
- d1553b55: `contrib/example/`: rlib crate, `#[ctor]` registration; introduced `native/contrib-spi/` leaf crate to break a cyclic dep
- 5cb7099a: `contrib/example/`: Scala extension, ServiceLoader entry, integration test
- 8508ec50

SPI shape refinements (2 commits)
- e018076d: `preTransform` tree-level hook on `CometScanRuleExtension`, proto layer in `contrib/example/`, class-keyed dispatch convention documented
- 14e49448: `ContribPlannerContext` trait + `ParquetDatasourceParams` argument bundle (SPI gap #4 — see "Delta-port confidence check" below)

Review-fix rounds (4 commits)
- 8930b698: `#[non_exhaustive]` markers, `preTransform` corruption guard, `#[ctor]` panic safety, drop `contrib-example` from default features, gate registry load lazily, cache mergedSerdes, multi-extension dispatch semantics, `prepare_object_store` returns `Path`, gate preTransform on `COMET_NATIVE_SCAN_ENABLED`, 16 MiB payload cap, "none discovered" diagnostic
- 68fff43f: `CometExecRule` self-loads, stale docstring, dead doc refs, `Display` wildcard fix, corruption-guard rewrite (identity check, class-changing replacements caught), `ContribOp` size guard ordering, encryption-asymmetry positional-arg test, owned `String` for session_timezone, production-canary `#[cfg]` test
- e4e6e6c6: `IdentityHashMap` survivors set, `synchronized` load() publication order, empty/whitespace kind rejection, doc accuracy, scope notes
- 6652963c: `resetForTesting` synchronized, doc trims, whitespace kind rejection, `#[cfg(not(any(...)))]` form, public `resetForTesting`, wildcard-arm comment, `Display` debug repr, `ContribOp` proto `reserved` block, payload cap doc

Contributor guide completeness (2 commits)
- 91c40e0a: `plan()` body walkthrough using every `ContribPlannerContext` method, `CometOperatorSerde[T]` contract, diagnostics story, multi-Spark-version, end-to-end testing recipe, Cargo-canary maintenance note
- 2c46552c: follow-up fixes from the guide-validation pass

Architectural pivot to symmetric distribution (2 commits)
- cf5253ed: contribs bundled into `comet-spark.jar` via Maven profile. Mirrors the native side's "Cargo feature pulls rlib into libcomet" model. No more separate contrib JARs. ~70 lines of protobuf-shading boilerplate deleted from the contributor guide (shading now handled automatically by `comet-spark`'s existing shade execution).
- c7656fcc: deps-only pom per contrib so external deps (e.g. `delta-spark`) flow in without recreating a Maven reactor cycle. Registry primitive: `RwLock<HashMap>` → `ArcSwap<HashMap>` for lock-free reads on the dispatch hot path.

Notable design decisions
Why bundle into one artifact?
Previously the contrib's JVM half shipped as a separate Maven JAR users would `--packages` or `--jars` onto their classpath. That asymmetry made no sense given the native side already requires a Comet rebuild (Cargo feature flag) for the contrib to work — pretending the JVM half was distributable independently was a fiction. The symmetric design has one artifact per side, both varying based on which contribs were enabled at Comet build time. This eliminated the protobuf-shading recipe that externally-published contrib JARs needed (which was the single biggest source of doc complexity).
Why a separate `comet-contrib-spi` Rust crate?
Cycle break. Core would need to depend on contribs (to link them); contribs need core's trait types (`ContribOperatorPlanner`). Solution: a leaf crate both depend on, with nothing depending back on core from a contrib.
Why `ContribPlannerContext`?
Surfaced by the Delta-port confidence check. A real file-scan contrib needs five core-side facilities: `init_datasource_exec`, `prepare_object_store_with_configs`, `convert_spark_types_to_arrow_schema`, expression-planning (`create_expr`), and a `SessionContext` handle. Exposing these as a trait core implements (and contribs use via `&dyn ContribPlannerContext`) avoids a back-dep on core while giving contribs everything they need to compose with Comet's tuned parquet path.
Why `ArcSwap` instead of `RwLock` for the registry?
Reads are on the dispatch hot path; writes happen exclusively during library init from sequential `#[ctor]`s. The init-once / read-many access pattern is what `ArcSwap` is designed for. The original `RwLock<HashMap>` would have introduced reader-writer contention with no actual concurrent-write workload to justify it.
Why bundling-via-source-injection rather than bundling-via-shaded-JAR?
A separate Maven module per contrib whose JAR gets shaded into `comet-spark.jar` would form a Maven reactor cycle (the contrib's pom depends on `comet-spark` for SPI types; `comet-spark`'s contrib profile depends on the contrib's JAR). Source-injection avoids the cycle: the contrib's Scala sources are compiled INSIDE `comet-spark`'s own compilation pass (via `build-helper-maven-plugin`'s `add-source` goal); no separate compile, no per-contrib JAR, no cycle. External Maven deps (`delta-spark`, etc.) flow through the contrib's separate `<packaging>pom</packaging>` artifact.

The Delta-port confidence check
Before opening this PR, I ported Comet's existing Delta integration (~3,200 lines on the `delta-kernel-phase-1` branch) onto this SPI as a confidence check. The port itself is not committed here — its purpose was to surface SPI gaps before review.

Four gaps were found, all addressed in this PR:

1. A tree-level pre-transform hook was missing → added `CometScanRuleExtension.preTransform`.
2. No worked proto example in `contrib/example/` → added a trivial `ExampleConstantScan` message, `build.rs`, prost-build wiring, and tests.
3. Class-keyed dispatch convention → documented in the contributor guide.
4. `ContribOperatorPlanner::plan` lacked access to core's parquet / expression machinery → introduced `ContribPlannerContext` trait + `ParquetDatasourceParams` bundle in commit 14e49448. The full ~150-line Delta dispatcher body compiled clean against the new SPI surface; every trait method was exercised end-to-end.

Full findings live in `PR1-delta-port-findings.md` (not committed; review-prep artifact).

Net conclusion: the SPI is the right shape for a real consumer. PR2's Delta port can proceed mechanically with no further SPI surprises expected.
Review iterations
The branch went through four independent clean-context code review passes (general-purpose
subagent reviews launched fresh on each iteration's HEAD). Each pass surfaced a different
class of issue:
- Review 1 (on 14e49448, fixed by 8930b698): 6 blockers + 10 important + nits. Test isolation, SemVer markers, V2 asymmetry, `#[ctor]` panic safety, default feature leakage, ServiceLoader gating, more.
- Review 2 (on 8930b698, fixed by 68fff43f): 3 regressions + 4 polish items + 10 new findings. Stale class docstring, dead doc refs, `Display` info-loss, corruption-guard class-change bypass.
- Review 3 (on 68fff43f, fixed by e4e6e6c6): 2 regressions + 8 polish. Survivors-set false positive, `load()` publication race.
- Review 4 (on e4e6e6c6, fixed by 6652963c): no blockers found; 6 polish. Cost-comment accuracy, identity-map upgrade, payload-guard message ordering.

A separate completeness validation of the contributor guide (the doc-only audit pass) identified ~13 gaps a real contrib author would hit. Closed in 91c40e0a and 2c46552c.

Build verification
- `cargo check` (default features): green.
- `cargo check --no-default-features`: green; zero contrib surface in the resulting libcomet.
- `cargo test -p comet-contrib-spi`: 7 tests pass (registry round-trip, scoped registration, kinds snapshot, params constructor/setters, `ContribError` `Display` preservation).
- `cargo test -p datafusion-comet --lib -- execution::planner::contrib`: 5 tests pass, including the production-canary that asserts the default cdylib has no registered contribs and the encryption-asymmetry test that catches positional-arg swaps in `init_datasource_exec`.
- `cargo test -p comet-contrib-example`: 4 tests pass (ctor registration, decode + build, zero rows, bad payload).
- `-Pcontrib-example` activates the profile cleanly; `comet-contrib-example-deps` builds first, then `comet-spark` with the contrib's sources injected. No reactor cycle.
What this PR is NOT
- A `CometOperatorSerdeExtension` from the example contrib (the example only demonstrates `CometScanRuleExtension`; the trait surface is documented and validated against the Delta-port confidence check).
- The `BanDuplicateClasses` enforcer is no longer overridden per-contrib (contribs are no longer separate Maven modules); the rule applies to `comet-spark` itself as before.
How to review
Suggested reading order:
1. `docs/source/contributor-guide/contrib-extensions.md` — author-facing guide; doubles as the architectural overview.
2. `native/proto/src/proto/operator.proto` — the `ContribOp` envelope (small, look for the new variant on `OpStruct`).
3. `native/contrib-spi/src/lib.rs` — the leaf SPI crate (~370 lines incl. tests).
4. `spark/src/main/scala/org/apache/comet/spi/` — three small files defining the JVM SPI.
5. `native/core/src/execution/planner.rs` — the `OpStruct::ContribOp` dispatcher arm (~lines 1960–2020).
6. `native/core/src/execution/planner/contrib.rs` — `CorePlannerContext` adapter that exposes core's parquet/expression infrastructure to contribs through the SPI trait.
7. `spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala` and `CometExecRule.scala` — the integration hooks. Each is a small insertion at the top of `_apply`; the `preTransform` fold runs once per plan.
8. `contrib/example/` — the worked reference (deps-pom + Scala source + Cargo crate).
9. `spark/pom.xml`'s `contrib-example` profile — the template for wiring a new contrib into the build.
Risks / follow-ups (tracked for PR2)
- `CometOperatorSerdeExtension` not yet exercised by a contrib. The example contrib only demonstrates `CometScanRuleExtension`. PR2's Delta port will exercise the operator serde path via `CometDeltaNativeScanExec`'s dedicated class.
- `libjvm` on the dyld path. Running `cargo test -p datafusion-comet --lib` on macOS requires `DYLD_LIBRARY_PATH=$JAVA_HOME/lib/server` (only relevant when the test binary transitively links against the JNI crate). Preexisting on `main` — not introduced by this PR — but worth documenting.
- CI matrix: add a `-Pcontrib-example` / `--features contrib-example` row so the bundling path is exercised in CI on every PR. Today only the slim build is in CI.
Implemented with Claude Code - Opus 4.7 [1M]