BanyanDB: bluge index analysisWorker pool not released on segment rotation causes goroutine drift #13874

@hanahmily

Description

Summary

During a 48-hour soak test of standalone BanyanDB, the goroutine count grew from 556 to 708 (+27%). The root cause is that bluge index writers spawn their analysis-worker pools on each segment rotation without releasing them when the writer goes idle.

Reproduction

  1. Run standalone BanyanDB with a measure group using SegmentInterval: 1 day.
  2. Drive sustained write traffic for ≥48 h (any continuous write workload that crosses a UTC midnight reproduces the step; in our setup we used SkyWalking OAP traffic plus a synthetic ~1000-row/day fixture).
  3. Sample /debug/pprof/goroutine?debug=1 every 30 min; a minimal sampler is sketched below.
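
For step 3, a small poller is enough. The sketch below is not part of BanyanDB, and the pprof address is an assumption, so point it at wherever your deployment exposes net/http/pprof. It logs the total from the first line of the debug=1 dump every 30 min:

```go
// goroutine_sampler.go: polls the pprof goroutine profile and logs the
// "total N" figure from the first line of the debug=1 output.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Assumed address; replace with your process's pprof listener.
	const url = "http://localhost:6060/debug/pprof/goroutine?debug=1"
	for {
		resp, err := http.Get(url)
		if err != nil {
			fmt.Println(time.Now().UTC().Format(time.RFC3339), "sample failed:", err)
		} else {
			// First line reads: "goroutine profile: total N"
			line, _ := bufio.NewReader(resp.Body).ReadString('\n')
			resp.Body.Close()
			fmt.Print(time.Now().UTC().Format(time.RFC3339), " ", line)
		}
		time.Sleep(30 * time.Minute)
	}
}
```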

Observed pattern

| Time (UTC) | Goroutines | Δ | Note |
| --- | --- | --- | --- |
| t0 | 556 | | baseline |
| t0 + ~21 h (first UTC midnight crossed) | 556 → 632 | +76 | first segment rotation event |
| t0 + ~45 h (second UTC midnight crossed) | 632 → 708 | +76 | second event, identical shape |

The two events are spaced exactly 24 h apart and add ~76 goroutines each. Between events, goroutine count is flat to within ±1 — no steady leak, only a daily step.

Stack analysis

Diffing the goroutine profile between start and end:

  • +108 in bluge/index.analysisWorker at github.com/blugelabs/bluge/index.OpenWriter.func1 (writer.go:77 → analysisWorker at writer.go:667).
  • The remaining ~44 are orchestration goroutines around the new writers (waiters, transmit loops).
  • Every other stack signature (e.g. pkg/flow.Transmit, grpc/internal/grpcsync.CallbackSerializer.run) is identical count start vs end.

The 108 breaks down as 2 rotation events × ~54 analysisWorkers per new writer. bluge sizes this pool from GOMAXPROCS (host had 32 CPUs, default pool size ~54), so per-event growth scales with the worker host's GOMAXPROCS.
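
For reference, the per-signature counts came from grouping stacks in the debug=1 dumps. A rough counter like the sketch below, which keys each stack group by its first frame's function, can be run on the start and end dumps and its outputs diffed to isolate the +108 analysisWorker delta:

```go
// stackdiff.go: tallies goroutine counts per top stack frame from a
// /debug/pprof/goroutine?debug=1 dump. Usage: go run stackdiff.go < dump.txt
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	counts := map[string]int{}
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1<<20), 1<<20)
	pending := 0 // count from the last "N @ ..." header, awaiting its first frame
	for sc.Scan() {
		line := sc.Text()
		if f := strings.Fields(line); len(f) > 1 && f[1] == "@" {
			pending, _ = strconv.Atoi(f[0]) // group header: "N @ 0x... 0x..."
			continue
		}
		if pending > 0 && strings.HasPrefix(line, "#") {
			// First frame line: "#\t0xADDR\tfunc+offset\tfile:line"
			if parts := strings.Split(line, "\t"); len(parts) >= 3 {
				counts[strings.SplitN(parts[2], "+", 2)[0]] += pending
			}
			pending = 0
		}
	}
	for fn, n := range counts {
		fmt.Printf("%6d %s\n", n, fn)
	}
}
```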

Hypothesis

When a tsTable rotates to a new daily segment, BanyanDB opens a new bluge index writer for that segment but does not close writers for older segments that are no longer being written to. Each leftover writer keeps its analysisWorker goroutine pool alive.
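
If this holds, the rotation path has roughly the following shape (a hypothetical sketch for illustration: segIndex and rotate are invented names, while bluge.OpenWriter and Writer.Close are real bluge APIs):

```go
package main

import "github.com/blugelabs/bluge"

// Hypothetical names for illustration only.
type segIndex struct {
	current *bluge.Writer // writer for the active daily segment
}

func (s *segIndex) rotate(segmentPath string) error {
	w, err := bluge.OpenWriter(bluge.DefaultConfig(segmentPath))
	if err != nil {
		return err
	}
	// Suspected bug shape: the old writer is dropped without Close(),
	// so its ~GOMAXPROCS analysisWorker goroutines stay parked forever.
	s.current = w
	return nil
}
```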

Over weeks or months the growth would be linear in segment count, eventually pressuring the Go scheduler and inflating the overall memory footprint.

Suggested fix direction

Close bluge index writers for segments outside the current write window. This may already be the intent of segment-lifecycle hooks in banyand/internal/storage/; the leak suggests a missing close path on the bluge writer specifically.
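
A minimal sketch of that direction, reusing the hypothetical shape from above (onSegmentExpire is an invented hook name; the real close path would belong with the segment-lifecycle code in banyand/internal/storage/):

```go
package main

import (
	"sync"

	"github.com/blugelabs/bluge"
)

// Hypothetical fix shape: one writer per segment, closed on expiry.
type segIndexFixed struct {
	mu      sync.Mutex
	writers map[string]*bluge.Writer // segment ID -> open writer
}

func (s *segIndexFixed) onSegmentExpire(segID string) error {
	s.mu.Lock()
	w, ok := s.writers[segID]
	delete(s.writers, segID)
	s.mu.Unlock()
	if !ok {
		return nil
	}
	// Close() tears down the writer's analysisWorker pool along with its
	// file handles; this is the close path the goroutine profile says is missing.
	return w.Close()
}
```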

Environment

  • Standalone mode, single-node deployment.
  • SegmentInterval: 1 day, default flush timeouts.
  • Host: 32-core Linux, no swap.
  • Sustained write rate ~1 req/s through gRPC.
