Search before asking
Apache SkyWalking Component
BanyanDB
What happened
After upgrading from apache/skywalking-banyandb:0.9.0 to 0.10.1 (with OAP 10.4.0), BanyanDB crashes the process every ~7-8 minutes with:
panic: offset 1400877 must be equal to bytesRead 1400490
Unlike the timestamp-ordering panic in #13860 (which is recovered by grpc-middleware), this one fires from a background mergeLoop goroutine that is not wrapped by recovery, so the process exits and the pod restarts.
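To make the failure mode concrete: a panic inside a request handler can be absorbed by a recovery interceptor, but a panic that unwinds to the top of a plain background goroutine always terminates the whole Go process. A minimal standalone sketch (not BanyanDB code, names are illustrative) of the difference:

```go
package main

import (
    "fmt"
    "time"
)

// recoverWrapped mimics a request path guarded by a recovery
// interceptor: the panic is caught and turned into a failed request.
func recoverWrapped(work func()) {
    defer func() {
        if r := recover(); r != nil {
            fmt.Println("recovered, request fails but process survives:", r)
        }
    }()
    work()
}

func main() {
    // Write-path style: panic is contained by the recovery wrapper.
    recoverWrapped(func() { panic("timestamp ordering violated") })

    // Merge-loop style: a background goroutine with no recover();
    // this panic unwinds to the top of the goroutine and terminates
    // the whole process (crashing the pod).
    go func() {
        panic("offset 1400877 must be equal to bytesRead 1400490")
    }()

    time.Sleep(time.Second)
    fmt.Println("never reached")
}
```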
Full stack
goroutine 3900 [running]:
github.com/apache/skywalking-banyandb/pkg/logger.Panicf(...)
github.com/apache/skywalking-banyandb/banyand/trace.(*partMergeIter).mustReadRaw(0xc001ac4000, 0xc002d716b8, 0xc001ac4118)
/mnt/d/skywalking-banyandb/banyand/trace/part_iter.go:359 +0xf5
github.com/apache/skywalking-banyandb/banyand/trace.(*blockReader).mustReadRaw(...)
/mnt/d/skywalking-banyandb/banyand/trace/block_reader.go:263
github.com/apache/skywalking-banyandb/banyand/trace.mergeBlocks(...)
/mnt/d/skywalking-banyandb/banyand/trace/merger.go:421 +0x79e
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergeParts(...)
/mnt/d/skywalking-banyandb/banyand/trace/merger.go:344 +0x42a
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergePartsThenSendIntroduction(...)
/mnt/d/skywalking-banyandb/banyand/trace/merger.go:118 +0x145
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergeSnapshot(...)
/mnt/d/skywalking-banyandb/banyand/trace/merger.go:104 +0x125
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergeLoop.func1(...)
/mnt/d/skywalking-banyandb/banyand/trace/merger.go:78 +0x1f9
github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).mergeLoop(...)
/mnt/d/skywalking-banyandb/banyand/trace/merger.go:90 +0x271
created by github.com/apache/skywalking-banyandb/banyand/trace.(*tsTable).startLoop in goroutine 157
/mnt/d/skywalking-banyandb/banyand/trace/tstable.go:130 +0x246
Source location (apache/skywalking-banyandb v0.10.1)
banyand/trace/part_iter.go:354-365:
func (pmi *partMergeIter) mustReadRaw(r *rawBlock, bm *blockMetadata) {
    r.bm = bm
    // spans
    if bm.spans != nil && bm.spans.size > 0 {
        // Validate the reader is aligned to the expected offset
        if bm.spans.offset != pmi.seqReaders.spans.bytesRead {
            logger.Panicf("offset %d must be equal to bytesRead %d", bm.spans.offset, pmi.seqReaders.spans.bytesRead)
        }
        ...
    }
    ...
}
So the merger reads spans sequentially from seqReaders.spans, and each block's bm.spans.offset is expected to match how far the seqReader has advanced (bytesRead); when they diverge, by 387 bytes in our sample, the merger panics. A minimal sketch of the invariant follows after the list below. The same pattern (offset must be equal to bytesRead) appears at:
- banyand/trace/block.go:196 (tag metadata)
- banyand/trace/block.go:329 (span data)
- banyand/internal/sidx/block.go
- banyand/measure/block.go
- banyand/stream/block.go
So the invariant is repeated across the new (0.10) trace storage engine.
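For reference, here is a self-contained sketch of what these checks enforce (the type and field names are illustrative, not the actual BanyanDB structs): a sequential reader counts every byte it has consumed, and each block's metadata records the offset at which its payload was written, so reading blocks strictly in write order keeps the two numbers equal, while any gap, reordering, or stale metadata makes them diverge and trips the panic.

```go
// Illustrative sketch of the offset/bytesRead invariant; not BanyanDB code.
package main

import (
    "bytes"
    "fmt"
    "io"
)

type seqReader struct {
    r         io.Reader
    bytesRead uint64 // total bytes consumed so far
}

func (s *seqReader) mustRead(dst []byte) {
    n, err := io.ReadFull(s.r, dst)
    if err != nil {
        panic(err)
    }
    s.bytesRead += uint64(n)
}

// blockMeta records where a block's payload sits in the file.
type blockMeta struct{ offset, size uint64 }

func readBlock(sr *seqReader, bm blockMeta) []byte {
    // The invariant the merger asserts: the next block's declared offset
    // must be exactly where the sequential reader currently stands.
    if bm.offset != sr.bytesRead {
        panic(fmt.Sprintf("offset %d must be equal to bytesRead %d", bm.offset, sr.bytesRead))
    }
    buf := make([]byte, bm.size)
    sr.mustRead(buf)
    return buf
}

func main() {
    data := []byte("aaaabbbbbbcc") // three blocks written back to back: 4, 6, 2 bytes
    sr := &seqReader{r: bytes.NewReader(data)}
    metas := []blockMeta{{0, 4}, {4, 6}, {10, 2}}
    for _, bm := range metas {
        fmt.Printf("block %+v -> %q\n", bm, readBlock(sr, bm))
    }
    // A stale or corrupted meta such as {offset: 12, size: 2} read out of
    // order would trip the same panic seen in the merger.
}
```

In the crash above the divergence is 387 bytes (1400877 vs 1400490): the metadata claims the spans payload starts 387 bytes further into the file than the reader has actually consumed.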
Cadence and impact
In our cluster, the BanyanDB pod restarted 126 times in 17 hours, roughly once every 8 minutes. Every time, OAP loses its connection to BanyanDB and goes into a crash loop as well (~148 OAP restarts in the same window). Net effect: rolling unavailability, with a 1-2 minute window every ~8 minutes during which ingestion and queries fail.
For comparison, on 0.9.0 the only panic we saw fired ~once every 28 minutes. 0.10.1 is significantly less stable on our workload, primarily because of this new panic in the merger.
What you expected to happen
The merger should not crash the whole process on what is clearly corrupted or out-of-sync block metadata. Reasonable options (maintainers know best):
- Skip the offending block with a warning instead of Panicf; at minimum, contain the blast radius to one block instead of restarting the whole DB (a rough sketch follows this list).
- Restart the seqReader at the offset declared in bm.spans.offset (or vice versa) when divergence is detected; this assumes the metadata is the source of truth.
- Fail the merge of the affected part but keep the process running and let retention/cleanup eventually drop the corrupted part.
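For the first option, a rough sketch of what the check could look like if it returned an error instead of calling logger.Panicf (purely illustrative: it mirrors the snippet above, elides the unchanged parts, and readRaw is a hypothetical name):

```go
// Hypothetical: same shape as mustReadRaw above, but the divergence is
// surfaced as an error instead of a process-killing panic.
func (pmi *partMergeIter) readRaw(r *rawBlock, bm *blockMetadata) error {
    r.bm = bm
    // spans
    if bm.spans != nil && bm.spans.size > 0 {
        if bm.spans.offset != pmi.seqReaders.spans.bytesRead {
            // Let the caller log a warning, abandon this merge, and keep
            // the process (and the other parts) alive.
            return fmt.Errorf("spans offset %d diverges from bytesRead %d",
                bm.spans.offset, pmi.seqReaders.spans.bytesRead)
        }
        ...
    }
    ...
    return nil
}
```

The callers (mergeBlocks / mergeParts) could then treat the error as a failed merge of that part rather than a fatal condition.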
How to reproduce
Steady-state SkyWalking deployment, OAP forwarding traces to standalone BanyanDB. We see this on:
- BanyanDB: apache/skywalking-banyandb:0.10.1
- SkyWalking OAP: apache/skywalking-oap-server:10.4.0
- ~30+ Java services, apache-skywalking-java-agent 9.5.0, JDK 21
- Standalone BanyanDB on Kubernetes (Aliyun ACK), --trace-root-path=/data/trace
- 51 GB cumulative on disk after 17h of ingest (stream 38.5G + trace 12.9G + measure 26M)
The very first occurrence happens within ~30 minutes of starting fresh (after fully wiping /data and letting OAP recreate schemas). After that, panic cadence stabilizes at ~8 minutes.
Anything else
This bug is in the new trace storage engine introduced by #713 in 0.10.0; we did not see this panic on 0.9.0 (which uses the older trace path).
We have already reported the related — but distinct — timestamp-ordering panic in the write path as #13860 (recoverable, not crashing the process). Filing this one separately because the failure mode (background merge goroutine, no recovery, full process exit) is different and arguably more disruptive.
Happy to gather more samples (full stack traces over time, sample part dumps if a tool exists, sysrq dumps, anything) on request.
Are you willing to submit a pull request to fix on your own
Code of Conduct