java.lang.IllegalArgumentException: Accounted size went negative. #3101

@loserwang1024

Description

Search before asking

  • I searched in the issues and found nothing similar.

Fluss version

0.9.0 (latest release)

Please describe the bug 🐞

The error is:

2026-04-16 22:49:50,872 ERROR org.apache.fluss.client.write.Sender                         [] - Uncaught error in Fluss write sender I/O thread: 
java.lang.IllegalArgumentException: Accounted size went negative.
	at org.apache.fluss.shaded.arrow.org.apache.arrow.util.Preconditions.checkArgument(Preconditions.java:136) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.shaded.arrow.org.apache.arrow.memory.Accountant.releaseBytes(Accountant.java:219) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.shaded.arrow.org.apache.arrow.memory.RootAllocator.releaseBytes(RootAllocator.java:29) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.shaded.arrow.org.apache.arrow.memory.AllocationManager.release(AllocationManager.java:152) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.shaded.arrow.org.apache.arrow.memory.BufferLedger.decrement(BufferLedger.java:157) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.shaded.arrow.org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:124) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.shaded.arrow.org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:104) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.shaded.arrow.org.apache.arrow.vector.BaseValueVector.releaseBuffer(BaseValueVector.java:117) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.shaded.arrow.org.apache.arrow.vector.BaseVariableWidthVector.clear(BaseVariableWidthVector.java:270) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.shaded.arrow.org.apache.arrow.vector.VectorSchemaRoot.clear(VectorSchemaRoot.java:157) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.row.arrow.ArrowWriter.recycle(ArrowWriter.java:308) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.record.MemoryLogRecordsArrowBuilder.build(MemoryLogRecordsArrowBuilder.java:229) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.record.MemoryLogRecordsArrowBuilder.close(MemoryLogRecordsArrowBuilder.java:310) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.client.write.ArrowLogWriteBatch.close(ArrowLogWriteBatch.java:104) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.client.write.RecordAccumulator.drainBatchesForOneNode(RecordAccumulator.java:773) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.client.write.RecordAccumulator.drain(RecordAccumulator.java:288) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.client.write.Sender.sendWriteData(Sender.java:243) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.client.write.Sender.runOnce(Sender.java:194) ~[blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at org.apache.fluss.client.write.Sender.run(Sender.java:164) [blob_p-4eb0b65d03350a588b07f4bbbc14363610125495-2eda0538dee6bd4827a2612b21dd4301:0.10-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]

Solution

Root Cause Analysis

The error Accounted size went negative is caused by a shutdown race condition in WriterClient.close().

The Shutdown Sequence (Buggy)

In WriterClient.close():

Step 1: sender.initiateClose()     ← closes allocator + sets running=false
Step 2: ioThreadPool.shutdown()    ← signals thread pool to stop
Step 3: awaitTermination(timeout)  ← waits for sender thread
Step 4: sender.forceClose()        ← sets forceClose=true

The problem: Step 1 closes the Arrow BufferAllocator BEFORE the sender thread has stopped.
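The failure mode can be reproduced without Arrow at all. The ToyAccountant class below is a hypothetical stand-in for the shaded Arrow Accountant (not Fluss or Arrow code): once a close-like call resets the accounted size to zero, any late release from a still-draining batch drives the counter negative and trips the same precondition.

```java
// ToyAccountant: hypothetical mimic of Arrow's Accountant, showing why a
// release issued after close() fails with "Accounted size went negative."
import java.util.concurrent.atomic.AtomicLong;

class ToyAccountant {
    private final AtomicLong allocated = new AtomicLong();

    long allocateBytes(long size) {
        return allocated.addAndGet(size);
    }

    void releaseBytes(long size) {
        long remaining = allocated.addAndGet(-size);
        // Mirrors the checkArgument call in Accountant.releaseBytes
        if (remaining < 0) {
            throw new IllegalArgumentException("Accounted size went negative.");
        }
    }

    long getAllocatedMemory() {
        return allocated.get();
    }

    void close() {
        // What accumulator.close() effectively does: reset accounting to 0.
        releaseBytes(getAllocatedMemory());
    }
}

public class ShutdownRaceDemo {
    public static void main(String[] args) {
        ToyAccountant allocator = new ToyAccountant();
        allocator.allocateBytes(1024);    // an in-flight Arrow batch

        allocator.close();                // Step 1: initiateClose() resets accounting to 0

        try {
            allocator.releaseBytes(1024); // the sender's drain loop recycles the batch
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "Accounted size went negative."
        }
    }
}
```

The same 1024 bytes are released twice: once implicitly by close() zeroing the counter, once by the late recycle, which is exactly the double accounting the race produces.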

The Race

  1. sender.initiateClose() calls accumulator.close(), which:

    • Closes the arrowWriterPool
    • Calls bufferAllocator.releaseBytes(bufferAllocator.getAllocatedMemory()) — resets allocator to 0
    • Calls bufferAllocator.close() — closes the allocator
    • Sets running = false
  2. The sender thread is still executing. It sees running = false, exits the main loop, and enters the shutdown drain loop at line 162:

    while (!forceClose && accumulator.hasUnDrained()) {  // forceClose is still FALSE here!
        runOnce();  // tries to drain remaining batches
    }
  3. runOnce() → drain() → drainBatchesForOneNode() → batch.close() → build() → ArrowWriter.recycle() → root.clear() → tries to release bytes from the already-closed allocator (whose accounting is already 0) → "Accounted size went negative"

  4. The error is caught by the shutdown loop at line 166, logging exactly the message you see:

    "Uncaught error in Fluss write sender I/O thread: "
    

Fix

The fix is to separate the "stop accepting new appends" phase from the "release resources" phase. initiateClose() should only prevent new appends and signal the sender to stop, but NOT destroy the allocator. Resource cleanup should happen AFTER the sender thread has fully exited.
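A self-contained sketch of that two-phase ordering (the field and method names here are illustrative stand-ins, not the actual Fluss patch; forceClose handling is omitted for brevity):

```java
// Two-phase shutdown sketch: Phase 1 only signals the sender; Phase 2
// releases allocator resources after the sender thread has fully exited.
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

public class TwoPhaseCloseDemo {
    static final AtomicBoolean running = new AtomicBoolean(true);
    static final AtomicLong allocated = new AtomicLong(1024); // one in-flight batch
    static final AtomicBoolean allocatorClosed = new AtomicBoolean(false);

    static void releaseBytes(long size) {
        if (allocatorClosed.get() || allocated.addAndGet(-size) < 0) {
            throw new IllegalArgumentException("Accounted size went negative.");
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ioThreadPool = Executors.newSingleThreadExecutor();
        Future<?> sender = ioThreadPool.submit(() -> {
            while (running.get()) { /* main send loop */ }
            // Shutdown drain loop: recycle the remaining batch.
            // Safe, because the allocator is still open and still accounts 1024 bytes.
            releaseBytes(1024);
        });

        // Phase 1: stop accepting appends and signal the sender; do NOT
        // touch the allocator here.
        running.set(false);
        ioThreadPool.shutdown();
        ioThreadPool.awaitTermination(5, TimeUnit.SECONDS);
        sender.get(); // surfaces any error from the drain loop (none expected now)

        // Phase 2: the sender thread has exited; releasing resources is safe.
        allocated.set(0);
        allocatorClosed.set(true);
        System.out.println("closed cleanly");
    }
}
```

The key difference from the buggy sequence is that allocator state is untouched until awaitTermination returns, so the drain loop's releases balance against live accounting instead of an already-zeroed ledger.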

Are you willing to submit a PR?

  • I'm willing to submit a PR!
