Skip to content

bug: SessionContext.close() / DataFrame.close() race with concurrent JNI calls (use-after-free) #40

@andygrove

Description

@andygrove

Describe the bug

SessionContext and DataFrame hold their native pointer in a plain
long nativeHandle. Every public method follows the pattern:

if (nativeHandle == 0) throw new IllegalStateException(...);
someNativeCall(nativeHandle, ...);

…and close() does:

if (nativeHandle != 0) {
    closeSessionContext(nativeHandle);
    nativeHandle = 0;
}

If thread A is mid-method on a context and thread B calls close() on
the same context, the read in A and the write+free in B race:

  1. A reads nativeHandle (non-zero), passes it to JNI.
  2. B sets nativeHandle = 0 and the Rust side drops the Box.
  3. A's JNI call dereferences a freed *const SessionContext → UAF.

Even the nativeHandle == 0 guard is not safe — it's a TOCTOU. The
same shape applies to DataFrame (each method reads nativeHandle,
then calls JNI).

To Reproduce

No reproducer exists yet. A two-thread test running a tight loop of
ctx.sql(...).count() against ctx.close() on an ASan-instrumented
native build would surface it deterministically; can write one if the
maintainers want to see it first.

Expected behavior

One of:

  1. Document the UB explicitly. Today the Javadoc says contexts are
    "not thread-safe" and warns about concurrent sql / register* /
    close, but the consequence ("can produce a use-after-free") is
    already spelled out — so it's arguably already expected, and this
    issue is just a tracking marker so a future maintainer doesn't get
    surprised.
  2. Atomic handle + reference count. Use AtomicLong for the
    handle and reference-count on the Rust side so close() defers until
    in-flight calls drain. Closer to what JNA-style bindings do.
  3. Per-instance lock. Wrap every JNI call in a synchronized
    block. Simplest, but kills any potential concurrency on independent
    read-only operations.

Additional context

Not a regression — the documented contract already excludes concurrent
use. Filing for visibility ahead of the first multi-threaded user (a
server, a Flink/Spark integration, etc.) hitting it in production
rather than dev. Cross-references SessionContext.java and
DataFrame.java; same shape will need attention any time a new
long-lived handle is added.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions