feat(session): register object_store backends on SessionContextBuilder #73

Merged: andygrove merged 1 commit into apache:main from LantaoJin:feat/register-object-store on May 20, 2026

@LantaoJin
Contributor

Which issue does this PR close?

Rationale for this change

SessionContext.registerParquet(name, "s3://bucket/...") (and the read/register counterparts for CSV / NDJSON / Arrow / Avro) accepts arbitrary path strings, but until this PR there was no Java surface for attaching an object_store::ObjectStore to a URL scheme + bucket. As a result, s3://, gs://, and remote https:// paths fail with "No suitable object store found for ..." unless the embedder falls back on process-level environment variables. Even then, a multi-tenant JVM cannot give two contexts different credentials in the same process.

DataFusion's Rust RuntimeEnv::register_object_store(url, store) already solves this on the native side; the gap was purely in the Java surface above the JNI boundary. This PR closes it with a typed, per-context registration API:

SessionContext ctx = SessionContext.builder()
    .registerObjectStore(ObjectStoreOptions.s3()
        .bucket("my-bucket").region("us-east-1")
        .accessKeyId("...").secretAccessKey("...").build())
    .registerObjectStore(ObjectStoreOptions.s3()
        .bucket("other").endpoint("https://minio.internal:9000")
        .allowHttp(true).build())
    .build();

ctx.registerParquet("orders", "s3://my-bucket/orders/");
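
The lookup that per-context registration enables can be sketched with a small stand-alone model. This is only an illustration under stated assumptions, not DataFusion's implementation: the class StoreRegistrySketch, its methods, and the string labels standing in for configured stores are all hypothetical. It assumes a store is selected by the URL scheme plus bucket (authority) of the path, as the rationale above describes, and that each context owns its own registry.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of a per-context store registry; not the DataFusion API.
final class StoreRegistrySketch {
    // Maps "scheme://bucket" to a label standing in for a configured store.
    private final Map<String, String> stores = new HashMap<>();

    // Registers a store for one scheme + bucket pair, e.g. ("s3", "my-bucket").
    StoreRegistrySketch register(String scheme, String bucket, String storeLabel) {
        stores.put(key(scheme, bucket), storeLabel);
        return this;
    }

    // Resolves a path such as "s3://my-bucket/orders/" to its registered store.
    String resolve(String path) {
        URI uri = URI.create(path);
        String store = uri.getScheme() == null
                ? null
                : stores.get(key(uri.getScheme(), uri.getAuthority()));
        if (store == null) {
            throw new IllegalStateException("No suitable object store found for " + path);
        }
        return store;
    }

    private static String key(String scheme, String bucket) {
        return scheme.toLowerCase(Locale.ROOT) + "://" + bucket;
    }

    public static void main(String[] args) {
        // Two independent registries model two contexts in the same JVM.
        StoreRegistrySketch tenantA = new StoreRegistrySketch()
                .register("s3", "my-bucket", "tenant-a-credentials");
        StoreRegistrySketch tenantB = new StoreRegistrySketch()
                .register("s3", "my-bucket", "tenant-b-credentials");

        System.out.println(tenantA.resolve("s3://my-bucket/orders/"));
        System.out.println(tenantB.resolve("s3://my-bucket/orders/"));
    }
}
```

Because each registry instance is independent, the same bucket resolves to different credentials in each modelled context, which a single set of process-level environment variables cannot express.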

What changes are included in this PR?

  • Proto: new proto/object_store_options.proto
  • Java API: new org.apache.datafusion.ObjectStoreOptions
  • Native: new native/src/object_store.rs
  • Cargo features: object-store-{aws,gcp,http}, with default = ["object-store-aws", "object-store-gcp", "object-store-http"]
  • Build wiring: proto/object_store_options.proto

Are these changes tested?

Yes: 23 new tests across ObjectStoreOptionsTest and SessionContextObjectStoreTest.

Are there any user-facing changes?

Yes. The changes are additive only, with no breaking changes:

  • New public class org.apache.datafusion.ObjectStoreOptions (sealed) with three concrete subtypes S3 / Gcs / Http and per-backend builders. Static factories: ObjectStoreOptions.s3(), .gcs(), .http(listingUrl).
  • New public method SessionContextBuilder.registerObjectStore(ObjectStoreOptions).
  • Generated protobuf classes ObjectStoreRegistration, S3Options, GcsOptions, HttpOptions under org.apache.datafusion.protobuf (consistent with existing options bundles).
  • New repeated ObjectStoreRegistration object_stores = 8 field on SessionOptions (proto3 forward compatible: older builds simply ignore it).
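
The sealed shape described in the first bullet can be illustrated with a small stand-alone model. The type names below (StoreOptionsSketch, S3Sketch, GcsSketch, HttpSketch) and their fields are hypothetical and are not the classes this PR adds. The sketch assumes Java 17 or later for sealed interfaces and records, and only shows why a closed set of subtypes is convenient for code that must handle every backend.

```java
// Hypothetical stand-in for a sealed options type with three backends.
sealed interface StoreOptionsSketch permits S3Sketch, GcsSketch, HttpSketch {}

record S3Sketch(String bucket, String region) implements StoreOptionsSketch {}

record GcsSketch(String bucket) implements StoreOptionsSketch {}

record HttpSketch(String listingUrl) implements StoreOptionsSketch {}

final class SealedOptionsDemo {
    // Handles every permitted subtype; no other implementation can exist.
    static String describe(StoreOptionsSketch options) {
        if (options instanceof S3Sketch s3) {
            return "s3://" + s3.bucket() + " (" + s3.region() + ")";
        }
        if (options instanceof GcsSketch gcs) {
            return "gs://" + gcs.bucket();
        }
        if (options instanceof HttpSketch http) {
            return http.listingUrl();
        }
        // Unreachable while the permits list above stays in sync with the branches.
        throw new AssertionError("unknown backend: " + options);
    }

    public static void main(String[] args) {
        System.out.println(describe(new S3Sketch("my-bucket", "us-east-1")));
        System.out.println(describe(new GcsSketch("analytics")));
        System.out.println(describe(new HttpSketch("https://example.com/data/")));
    }
}
```

Sealing the hierarchy means a consumer such as a serializer can enumerate the backends it must support, and adding a fourth backend becomes a deliberate API change.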

Existing APIs are unchanged. The default make build enables the object-store-{aws,gcp,http} Cargo features; a downstream Rust build that disables one of them raises a clear runtime error if a caller registers the missing backend.

Member

@andygrove left a comment

This is awesome! Thanks @LantaoJin

@andygrove merged commit 7f49541 into apache:main on May 20, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

feat: expose registerObjectStore for S3 / GCS / HTTP backends