
[Bug] BanyanDB 0.10.1 trace merge: "offset must be equal to bytesRead" panic in part_iter, crashes the process #13861

@Felix-wave

Description

Apache SkyWalking Component

BanyanDB

What happened

After upgrading from apache/skywalking-banyandb:0.9.0 to 0.10.1 (with OAP 10.4.0), the BanyanDB process crashes every ~7-8 minutes with:

panic: offset 1400877 must be equal to bytesRead 1400490

Unlike the timestamp-ordering panic in #13860 (which is recovered by grpc-middleware), this one fires from a background mergeLoop goroutine that is not wrapped by recovery, so the process exits and the pod restarts.
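To illustrate the failure mode, here is a minimal, hypothetical sketch (none of these names are from the BanyanDB codebase) of how a mergeLoop-style background goroutine could be shielded so a panic is logged rather than terminating the process:

```go
package main

import (
	"fmt"
	"log"
)

// runWithRecover is a hypothetical helper (not actual BanyanDB code) showing
// how a background loop could convert a panic into a logged error instead of
// letting it crash the whole process.
func runWithRecover(name string, fn func()) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("background goroutine %q panicked: %v", name, r)
		}
	}()
	fn()
}

func main() {
	done := make(chan struct{})
	go func() {
		defer close(done)
		runWithRecover("mergeLoop", func() {
			// Simulates the panic observed in part_iter.go.
			panic("offset 1400877 must be equal to bytesRead 1400490")
		})
	}()
	<-done
	fmt.Println("process still alive after merge panic")
}
```

With the current code there is no such wrapper on the merge path, so the panic propagates to the runtime and the pod restarts.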

Full stack

goroutine 3900 [running]:
github.com/apache/skywalking-banyandb/pkg/logger.Panicf(...)
github.com/apache/skywalking-banyandb/banyand/trace.(*partMergeIter).mustReadRaw(0xc001ac4000, 0xc002d716b8, 0xc001ac4118)
    /mnt/d/skywalking-banyandb/banyand/trace/part_iter.go:359 +0xf5
github.com/apache/skywalking-banyandb/banyand/trace.(*blockReader).mustReadRaw(...)
    /mnt/d/skywalking-banyandb/banyand/trace/block_reader.go:263
github.com/apache/skywalking-banyandb/banyand/trace.mergeBlocks(...)
    /mnt/d/skywalking-banyandb/banyand/trace/merger.go:421 +0x79e
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergeParts(...)
    /mnt/d/skywalking-banyandb/banyand/trace/merger.go:344 +0x42a
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergePartsThenSendIntroduction(...)
    /mnt/d/skywalking-banyandb/banyand/trace/merger.go:118 +0x145
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergeSnapshot(...)
    /mnt/d/skywalking-banyandb/banyand/trace/merger.go:104 +0x125
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergeLoop.func1(...)
    /mnt/d/skywalking-banyandb/banyand/trace/merger.go:78 +0x1f9
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergeLoop(...)
    /mnt/d/skywalking-banyandb/banyand/trace/merger.go:90 +0x271
created by github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).startLoop in goroutine 157
    /mnt/d/skywalking-banyandb/banyand/trace/tstable.go:130 +0x246

Source location (apache/skywalking-banyandb v0.10.1)

banyand/trace/part_iter.go:354-365:

func (pmi *partMergeIter) mustReadRaw(r *rawBlock, bm *blockMetadata) {
    r.bm = bm
    // spans
    if bm.spans != nil && bm.spans.size > 0 {
        // Validate the reader is aligned to the expected offset
        if bm.spans.offset != pmi.seqReaders.spans.bytesRead {
            logger.Panicf("offset %d must be equal to bytesRead %d", bm.spans.offset, pmi.seqReaders.spans.bytesRead)
        }
        ...
    }
    ...
}

So the merger reads spans sequentially from seqReaders.spans, and each block's bm.spans.offset is expected to match how far the seqReader has advanced (bytesRead). When they diverge — by 387 bytes in our sample — the merger panics. The same pattern (offset must be equal to bytesRead) appears at:

  • banyand/trace/block.go:196 (tag metadata)
  • banyand/trace/block.go:329 (span data)
  • banyand/internal/sidx/block.go
  • banyand/measure/block.go
  • banyand/stream/block.go

So the invariant is repeated across the new (0.10) trace storage engine.
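The invariant can be boiled down to the following simplified model (the type and function names here are illustrative, not the actual BanyanDB types; only the check itself mirrors part_iter.go, and it returns an error rather than panicking):

```go
package main

import "fmt"

// seqReader models, in simplified form, a sequential reader that tracks how
// many bytes it has consumed so far.
type seqReader struct {
	bytesRead uint64
}

// checkSpanOffset mirrors the invariant from part_iter.go: the offset
// declared in the block metadata must equal the reader's current position.
func checkSpanOffset(offset, bytesRead uint64) error {
	if offset != bytesRead {
		return fmt.Errorf("offset %d must be equal to bytesRead %d", offset, bytesRead)
	}
	return nil
}

func main() {
	r := seqReader{bytesRead: 1400490}
	// The divergence observed in our panic: 1400877 - 1400490 = 387 bytes.
	if err := checkSpanOffset(1400877, r.bytesRead); err != nil {
		fmt.Println("invariant violated:", err)
	}
}
```

Any code path that writes block metadata and span data through different bookkeeping can break this invariant, which is presumably why the same check is duplicated across the trace, sidx, measure, and stream engines.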

Cadence and impact

In our cluster, the BanyanDB pod restarted 126 times in 17 hours — roughly once every 8 minutes. Each time, OAP loses its connection to BanyanDB and crash-loops as well (~148 OAP restarts in the same window). Net effect: rolling unavailability — every ~8 minutes there is a 1-2 minute window in which ingestion and queries fail.

For comparison, on 0.9.0 the only panic we saw fired ~once every 28 minutes. 0.10.1 is significantly less stable on our workload, primarily because of this new panic in the merger.

What you expected to happen

The merger should not panic on what is clearly corrupted or out-of-sync block metadata. Reasonable options (maintainers know best):

  1. Skip the offending block with a warning instead of Panicf — at minimum, contain the blast radius to one block instead of restarting the whole DB.
  2. Reposition the seqReader to the offset declared in bm.spans.offset (or vice versa) when divergence is detected — assumes the metadata is the source of truth.
  3. Fail the merge of the affected part but keep the process running and let retention/cleanup eventually drop the corrupted part.
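Option 1 could look roughly like the following sketch. This is hypothetical code, not the actual merger: blockMeta is a stand-in for blockMetadata with only the fields needed here, and the resync-to-metadata choice is one possible policy, not a claim about what the fix should be:

```go
package main

import (
	"fmt"
	"log"
)

// blockMeta is a simplified stand-in for BanyanDB's blockMetadata.
type blockMeta struct {
	id     int
	offset uint64
	size   uint64
}

// mergeBlocks sketches option 1: on divergence, warn, skip the block, and
// resynchronize the reader position to the metadata's view so that later,
// well-aligned blocks can still be merged.
func mergeBlocks(blocks []blockMeta) (merged, skipped int) {
	var bytesRead uint64
	for _, bm := range blocks {
		if bm.offset != bytesRead {
			log.Printf("skipping block %d: offset %d != bytesRead %d", bm.id, bm.offset, bytesRead)
			bytesRead = bm.offset + bm.size // trust metadata, resync
			skipped++
			continue
		}
		bytesRead += bm.size
		merged++
	}
	return merged, skipped
}

func main() {
	blocks := []blockMeta{
		{id: 0, offset: 0, size: 100},
		{id: 1, offset: 487, size: 50}, // diverged by 387 bytes, like our sample
		{id: 2, offset: 537, size: 10},
	}
	m, s := mergeBlocks(blocks)
	fmt.Printf("merged=%d skipped=%d\n", m, s)
}
```

The trade-off is silent data loss for the skipped block versus a full process crash; options 2 and 3 make different choices on the same axis.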

How to reproduce

Steady-state SkyWalking deployment, OAP forwarding traces to standalone BanyanDB. We see this on:

  • BanyanDB: apache/skywalking-banyandb:0.10.1
  • SkyWalking OAP: apache/skywalking-oap-server:10.4.0
  • ~30+ Java services, apache-skywalking-java-agent 9.5.0, JDK 21
  • Standalone BanyanDB on Kubernetes (Aliyun ACK), --trace-root-path=/data/trace
  • 51 GB cumulative on disk after 17h of ingest (stream 38.5G + trace 12.9G + measure 26M)

The very first occurrence happens within ~30 minutes of starting fresh (after fully wiping /data and letting OAP recreate schemas). After that, panic cadence stabilizes at ~8 minutes.

Anything else

This bug is in the new trace storage engine introduced by #713 in 0.10.0; we did not see this panic on 0.9.0 (which uses the older trace path).

We have already reported the related — but distinct — timestamp-ordering panic in the write path as #13860 (recoverable, not crashing the process). Filing this one separately because the failure mode (background merge goroutine, no recovery, full process exit) is different and arguably more disruptive.

Happy to gather more samples (full stack traces over time, sample part dumps if a tool exists, sysrq dumps, anything) on request.

Are you willing to submit a pull request to fix on your own

  • Yes, I am willing to submit a pull request on my own!

Labels

bug (Something isn't working and you are sure it's a bug!), database (BanyanDB - SkyWalking native database)
