Skip to content

Bound CometS3CredentialDispatcher cache via refcounted handle lifecycle #4456

@mbutrovich

Description

@mbutrovich

Problem

CometS3CredentialDispatcher caches one provider instance per (FQCN, dispatchKey, catalogProperties) triple in KEY_TO_HANDLE and INSTANCES for the lifetime of the executor JVM. Today eviction only happens via the JVM shutdown hook.

Catalogs that vend per-table credentials inject those credentials into the table's FileIO property bag, which Comet captures into catalogProperties at plan time (see comment in IcebergReflection.scala getFileIOProperties). The Iceberg REST + STS AssumeRole pattern (e.g. Apache Gravitino's S3TokenCredential with default 1h TTL) is the canonical example. Each loadTable returns a fresh session token, so each unique table → a new InstanceKey → a new cached provider instance. On long-running JVMs (Spark Connect, Thrift Server) the maps grow without bound.

Discussion thread: #4309 (comment)

Why a plain LRU is not the fix

Native CometS3CredentialBridge instances hold the handle by value and are reused across scans. Time- or size-based eviction can invalidate a handle that a live bridge still references mid-job, surfacing as opaque credential failures from object_store or opendal.

Proposed approach

Refcounted handle lifecycle driven by the native side:

  • Java: add a JNI-callable releaseHandle(long) that decrements a refcount and, at zero, removes the entry from both maps and calls provider.close() (swallowing exceptions, matching the shutdown-hook precedent).
  • Native: wrap the handle in a struct whose Drop calls releaseHandle. The bridge itself is already shared via Arc, so the wrapping struct's Drop fires when the last Arc goes away.
  • Guard against re-entry during JVM shutdown (the existing shutdown hook already drains everything) and during panic unwind.

The ensureInitialized path stays as-is: computeIfAbsent returns the existing handle and increments the refcount; first-time insertion sets count to 1.

Acceptance criteria

  • KEY_TO_HANDLE.size() does not grow unboundedly under per-table-vended-credential workloads.
  • Steady-state path (same triple reused across many scans) keeps a single cached instance.
  • Concurrent ensureInitialized + releaseHandle against the same key does not double-free or resurrect a released entry.
  • Provider close() exceptions on release are swallowed and logged.
  • Existing CometS3CredentialBridgeSuite and dispatcher unit tests continue to pass.

Out of scope

  • Driver→executor refresh of catalogProperties for long-running scans. That side of the AssumeRole story is addressed by composing with Spark's HadoopDelegationTokenProvider, documented in docs/source/user-guide/latest/s3-credential-providers.md and docs/source/contributor-guide/s3-credential-provider-design.md.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions