
feat(builder): expose ConfigOptions.set/get as setOption / setOptions / getOption#49

Merged
andygrove merged 1 commit into apache:main from LantaoJin:feat/session-set-option
May 14, 2026

@LantaoJin LantaoJin commented May 14, 2026

Which issue does this PR close?

Rationale for this change

SessionContextBuilder (introduced in #28) currently exposes typed setters for the six most-common session knobs: batchSize, targetPartitions, collectStatistics, informationSchema, memoryLimit, and tempDirectory. The Rust DataFusion configuration surface backing these is roughly 200 keys split across seven sections under SessionConfig (datafusion.catalog.*, datafusion.execution.*, datafusion.optimizer.*, datafusion.sql_parser.*, datafusion.explain.*, datafusion.format.*, plus user extensions). The Java builder reaches none of those except the six it names explicitly, and there is no Java surface to read any config value back at all.

DataFusion already exposes a string-keyed setter for these (ConfigOptions::set(key, value)). This PR mirrors that surface in Java rather than adding ~200 named setters one at a time.

What changes are included in this PR?

  • Two additions to the existing SessionOptions proto: a new repeated ConfigOption options = 7; field plus the ConfigOption { key, value } message. No existing field changes; wire format stays backward and forward compatible. repeated is used instead of map<string, string> because protobuf maps decode into HashMap on the Rust side, whose iteration order is randomized — that would silently break overlapping-key cases like datafusion.optimizer.enable_dynamic_filter_pushdown (which has side effects on the per-operator enable_*_dynamic_filter_pushdown flags).
  • Two new methods on SessionContextBuilder:
    • setOption(String key, String value) — single entry.
    • setOptions(Map<String, String> entries) — bulk apply.
  • One new method on SessionContext:
    • getOption(String key) → String — read the current value of any datafusion.* config key. Returns the value as a string, or null if the key is recognised but has no value set and no default. Throws RuntimeException on unknown keys (mirrors setOption's strictness for fast feedback on typos), and IllegalStateException when called on a closed context.
  • Native side: free-form options are applied via config.options_mut().set(k, v)? before the SessionContext is constructed, so the typed batchSize/targetPartitions/etc setters can be overridden by an explicit setOption for the same key. getOptionNative walks ctx.copied_config().options().entries().

The Rust calls use the ? form (rather than SessionConfig::set_str(...).unwrap()) so unknown keys or unparseable values surface as a RuntimeException on the Java side carrying DataFusion's error message, instead of panicking the JVM.
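The ordering argument for `repeated` over `map<string, string>` can be illustrated with a small toy model. This is a sketch only, not the PR's code: the class `OptionOrderSketch`, its `apply` method, the fully qualified topk key, and the side effect itself (writing the umbrella flag also overwrites the topk flag) are simplifications assumed from the description above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of ordered option application; not the real DataFusion config code. */
final class OptionOrderSketch {
    static final String UMBRELLA = "datafusion.optimizer.enable_dynamic_filter_pushdown";
    // Assumed fully qualified form of the per-operator flag named in the PR text.
    static final String TOPK = "datafusion.optimizer.enable_topk_dynamic_filter_pushdown";

    /** Applies entries in list order; writing the umbrella flag also overwrites the topk flag. */
    static Map<String, String> apply(List<Map.Entry<String, String>> entries) {
        Map<String, String> config = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : entries) {
            config.put(e.getKey(), e.getValue());
            if (e.getKey().equals(UMBRELLA)) {
                // Assumed side effect, modeled on the PR description.
                config.put(TOPK, e.getValue());
            }
        }
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> umbrellaFirst = apply(List.of(
                Map.entry(UMBRELLA, "true"), Map.entry(TOPK, "false")));
        Map<String, String> umbrellaLast = apply(List.of(
                Map.entry(TOPK, "false"), Map.entry(UMBRELLA, "true")));
        System.out.println(umbrellaFirst.get(TOPK)); // false: the explicit override survives
        System.out.println(umbrellaLast.get(TOPK));  // true: the umbrella clobbers it
    }
}
```

The same two entries produce different results depending on order, which is why a container with randomized iteration order would make the outcome unpredictable.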

What's intentionally out of scope: datafusion.runtime.* keys

datafusion.runtime.* keys (memory limit, temp directory, cache sizes, list-files-cache TTL) live on a separate RuntimeEnv config object. Round-tripping them through getOption/setOption has several upstream-shaped correctness pitfalls that don't apply to the session-config subtree (lazy default tempdir creation that materializes a path the user never set, per-session datafusion-Xxxxxx spill suffixes, OS-specific path separators, integer K/M/G truncation that loses fractional capacities, sub-1KB values formatted as bare bytes, the unlimited sentinel, and disk-manager state being clobbered when multiple runtime keys are routed through SET). Doing them right needs a per-context side-cache of the user's verbatim values.

This PR rejects datafusion.runtime.* in both setOption and getOption with a clear error pointing at the typed memoryLimit / tempDirectory setters. Adding round-trippable runtime support is tracked as a separate follow-up.
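A minimal sketch of what such a rejection guard might look like on the Java side. The class name `RuntimeKeyGuardSketch`, the method `requireSessionKey`, the exception type, and the message wording are all hypothetical; the PR only states that runtime keys are rejected with an error pointing at the typed setters.

```java
import java.util.Objects;

/** Hypothetical guard for the datafusion.runtime.* subtree; not the PR's actual code. */
final class RuntimeKeyGuardSketch {
    static final String RUNTIME_PREFIX = "datafusion.runtime.";

    /** Returns the key unchanged, or throws if it is null or belongs to the runtime subtree. */
    static String requireSessionKey(String key) {
        Objects.requireNonNull(key, "key must not be null");
        if (key.startsWith(RUNTIME_PREFIX)) {
            // Exception type and wording are assumptions for illustration.
            throw new IllegalArgumentException(
                    "'" + key + "' is a runtime key and is not supported by setOption/getOption; "
                            + "use the typed memoryLimit / tempDirectory setters instead");
        }
        return key;
    }

    public static void main(String[] args) {
        System.out.println(requireSessionKey("datafusion.execution.batch_size"));
        try {
            requireSessionKey("datafusion.runtime.memory_limit");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```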

Apply order for setOption

setOption entries are applied after the typed setters by design. If typed setters were applied last, a caller writing both batchSize(8192) and setOption("datafusion.execution.batch_size", "1024") would have their explicit setOption silently overwritten, which would make setOption strictly weaker than the typed setters and give callers no way to override them.

Among setOption calls themselves, entries are applied in caller insertion order (first call applied first). The Java side stores entries in a LinkedHashMap, the proto field is repeated (preserves order on the wire), and the Rust side iterates a Vec (preserves order on apply). Caller's last write wins for both same-key duplicates and overlapping side-effect keys.
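The apply order described in this section can be sketched with a toy builder. `ApplyOrderSketch` and its `resolve` method are hypothetical stand-ins rather than the real SessionContextBuilder; the sketch only models "typed setters first, then free-form entries in insertion order".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy builder modeling the apply order; not the real SessionContextBuilder. */
final class ApplyOrderSketch {
    static final String BATCH_SIZE_KEY = "datafusion.execution.batch_size";

    private Integer batchSize; // state of the typed setter, null if never called
    private final Map<String, String> options = new LinkedHashMap<>();

    ApplyOrderSketch batchSize(int n) {
        this.batchSize = n;
        return this;
    }

    ApplyOrderSketch setOption(String key, String value) {
        options.put(key, value); // same-key duplicates keep only the latest value
        return this;
    }

    /** Resolves the effective config: typed setters first, then free-form entries. */
    Map<String, String> resolve() {
        Map<String, String> config = new LinkedHashMap<>();
        if (batchSize != null) {
            config.put(BATCH_SIZE_KEY, Integer.toString(batchSize));
        }
        config.putAll(options); // iterates in insertion order and overrides typed values
        return config;
    }

    public static void main(String[] args) {
        // setOption wins even though batchSize(...) is called afterwards.
        Map<String, String> config = new ApplyOrderSketch()
                .setOption(BATCH_SIZE_KEY, "1024")
                .batchSize(8192)
                .resolve();
        System.out.println(config.get(BATCH_SIZE_KEY)); // 1024
    }
}
```

In this model the outcome does not depend on whether the typed setter or setOption is called first, only on the fixed order in which the two groups are applied.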

Why getOption lives on SessionContext, not the builder

The value getOption returns is "what DataFusion actually compiled". That's only knowable post-construction, so the read-side method belongs on the constructed SessionContext. A builder-side getter would only return what's pending in the local map, which is a strictly worse signal.
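The read-side contract described earlier (value as a string, null for a known key with no value, RuntimeException for an unknown key, IllegalStateException after close) might be modeled as follows. `GetOptionContractSketch` is a hypothetical in-memory stand-in; the keys and values it is seeded with are illustrative and do not claim to match DataFusion's actual defaults.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the getOption contract; not the real SessionContext. */
final class GetOptionContractSketch implements AutoCloseable {
    // Known key -> current value; a null value models "known but unset, no default".
    private final Map<String, String> entries = new HashMap<>();
    private boolean closed;

    GetOptionContractSketch() {
        entries.put("datafusion.execution.batch_size", "8192"); // illustrative value
        entries.put("datafusion.execution.time_zone", null);    // treated as unset for illustration
    }

    String getOption(String key) {
        if (closed) {
            throw new IllegalStateException("context is closed");
        }
        if (!entries.containsKey(key)) {
            throw new RuntimeException("unknown config key: " + key);
        }
        return entries.get(key); // may be null for a known-but-unset key
    }

    @Override
    public void close() {
        closed = true;
    }

    public static void main(String[] args) {
        GetOptionContractSketch ctx = new GetOptionContractSketch();
        System.out.println(ctx.getOption("datafusion.execution.batch_size"));
        System.out.println(ctx.getOption("datafusion.execution.time_zone"));
        ctx.close();
    }
}
```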

Are these changes tested?

Yes — 14 new tests in SessionContextBuilderTest, bringing the file from 5 to 19 tests total. All pass.

Are there any user-facing changes?

Yes — purely additive. New public API:

  • SessionContextBuilder.setOption(String, String)
  • SessionContextBuilder.setOptions(Map<String, String>)
  • SessionContext.getOption(String) → String

The new options field is also exposed on the existing org.apache.datafusion.protobuf.SessionOptions generated class (getOptionsList() etc., since the field is repeated rather than a map). No API removals, no deprecations, no behavior change for callers who don't call any of the new methods.

@LantaoJin LantaoJin marked this pull request as draft May 14, 2026 07:29
… / getOption

DataFusion's SessionConfig carries roughly 200 keys split across seven sections (datafusion.execution.*, datafusion.optimizer.*, etc). The Java SessionContextBuilder introduced in apache#28 covers six of them with named setters, and there is no Java surface to read any config value back at all. Rather than ship ~200 named get/set pairs one at a time, mirror DataFusion's existing string-keyed surface (ConfigOptions::set) as a generic escape hatch on the builder + context.

Makes two additions to session_options.proto: a `repeated ConfigOption options = 7;` field plus the matching ConfigOption message. `repeated` is used instead of `map<string,string>` because protobuf maps decode into a Rust HashMap whose iteration order is randomized -- that would silently break overlapping-key cases like `datafusion.optimizer.enable_dynamic_filter_pushdown` (whose setter has side effects on the per-operator `enable_*_dynamic_filter_pushdown` flags). The Java builder gains setOption(key, value) and setOptions(Map). Java-side storage is a LinkedHashMap so caller insertion order is preserved end to end.

On the native side, free-form options are applied via config.options_mut().set(k, v)? before SessionContext construction. Map entries are applied after the typed setters so an explicit setOption call wins over a typed setter for the same knob, and within the entry list the caller's last write wins -- both for same-key duplicates (LinkedHashMap dedups) and for overlapping side-effect keys. The ? form (rather than SessionConfig::set_str's .unwrap()) means unknown keys or unparseable values surface as a RuntimeException with DataFusion's error message instead of panicking the JVM.

Adds SessionContext.getOption(key) on the constructed context (not on the builder, since the value reflects "what DataFusion actually compiled" -- only knowable post-construction). The native side walks ctx.copied_config().options().entries() and returns ConfigEntry.value as a String, or null if the key is known but unset and has no default. Unknown keys throw RuntimeException to mirror setOption's strictness.

datafusion.runtime.* keys (memory limit, temp directory, cache sizes) live on a separate RuntimeEnv config object and have several upstream-shaped round-trip pitfalls that don't apply to the SessionConfig subtree (lazy default tempdir, per-session datafusion-Xxxxxx spill suffixes, OS-specific path separators, integer K/M/G truncation, sub-1KB byte formatting, the unlimited sentinel, multi-statement SET clobbering). Both setOption and getOption reject runtime keys with a clear error pointing at the typed memoryLimit() / tempDirectory() setters; round-trippable runtime support is tracked as a follow-up PR that needs a per-context side-cache.

Tests cover the proto round-trip with explicit on-the-wire ordering assertions, bulk setOptions, null rejection, override-typed-setter semantics (asserted by reading the value back via getOption), last-write-wins for repeated keys, an unknown-key error path on both set and get, default-fallback on get, a closed-context guard on get, the runtime-key rejection on both set and get with messages that point at the typed setters, and the order-preservation case where setting the umbrella `enable_dynamic_filter_pushdown` flag followed by an explicit `enable_topk_dynamic_filter_pushdown=false` correctly leaves topk disabled (the override winning over the umbrella's side effect).

Common knobs this unlocks include parquet pushdown_filters / bloom_filter_on_read, optimizer prefer_hash_join, execution time_zone, sql_parser dialect, and explain show_statistics -- previously inaccessible from Java without adding a named get/set per key.
@LantaoJin LantaoJin force-pushed the feat/session-set-option branch from 30595c0 to 304ef6e Compare May 14, 2026 08:56
@LantaoJin LantaoJin marked this pull request as ready for review May 14, 2026 09:43

@andygrove andygrove left a comment

LGTM. Thanks @LantaoJin

@andygrove andygrove merged commit 4a967e4 into apache:main May 14, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

feat: expose ConfigOptions.set/get as generic SessionContextBuilder.setOption / SessionContext.getOption
