Skip to content

perf(table): Decrease memory pressure by releasing records#886

Merged
zeroshade merged 3 commits into
apache:mainfrom
abhirathod95:perf-fanout-streaming-writer
Apr 13, 2026
Merged

perf(table): Decrease memory pressure by releasing records#886
zeroshade merged 3 commits into
apache:mainfrom
abhirathod95:perf-fanout-streaming-writer

Conversation

@abhirathod95
Copy link
Copy Markdown
Contributor

@abhirathod95 abhirathod95 commented Apr 13, 2026

The fanout function currently defer's record.Release(). This isn't a big deal if the iterable it reads from contains a small amount of data. However, if it's a long lived streaming channel or contains a lot of data, the fanout function will hold on to **all ** the records until it processes all of them. This dramatically increases memory use, especially in a streaming use case.

This PR extracts out the code to process each record into it's own function, and calls defer record.Release() there. No functional modifications were made. This will release each record after successfully processing the data.

@abhirathod95 abhirathod95 requested a review from zeroshade as a code owner April 13, 2026 18:15
Copy link
Copy Markdown
Contributor

@laskoviymishka laskoviymishka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. The defer scoping issue is real and obvious from reading the code, but the PR claims a memory improvement without any measurement. A benchmark would prove the claim and prevent regression.

Something like:

func BenchmarkFanoutMemory(b *testing.B) {
    // Feed N large record batches through the fanout writer
    // Report b.ReportAllocs() and track peak memory via runtime.MemStats
}

Doesn't need to be fancy, even feeding 100 batches through the fanout and showing that peak HeapInuse stays flat (new) vs grows linearly (old) would be enough.

The code change is obviously correct from reading it, but a benchmark makes the improvement concrete and guards against regression.

@zeroshade
Copy link
Copy Markdown
Member

I agree with @laskoviymishka, let's add a benchmark or something to confirm and then this is good for me.

@abhirathod95
Copy link
Copy Markdown
Contributor Author

Yep, that makes sense. Let me add a benchmark

@abhirathod95
Copy link
Copy Markdown
Contributor Author

Added BenchmarkFanoutMemory in table/fanout_memory_bench_test.go to measure the improvement.

The benchmark uses a gated RecordReader that feeds 100 batches (50K rows each) one at a time through a single fanout writer, measuring HeapInuse between sends via runtime.GC() + runtime.ReadMemStats().
This directly observes whether previous batches are retained (old) or released promptly (new).

  benchstat (old vs new, n=6):

                  │   OLD (pre-fix)    │            NEW (with fix)            │
                  │ peak-heap-delta    │ peak-heap-delta  vs base             │
  FanoutMemory-14       2.111Gi ± 6%        1.593Gi ± 6%  -24.53% (p=0.002)

                  │     sec/op         │   sec/op         vs base             │
  FanoutMemory-14       1.555 ± 1%         1.550 ± 1%     ~ (p=0.699)

                  │     B/op           │     B/op         vs base             │
  FanoutMemory-14      19.48Gi ± 1%       19.59Gi ± 1%    ~ (p=0.485)

                  │   allocs/op        │   allocs/op      vs base             │
  FanoutMemory-14      36.31M ± 0%        36.31M ± 0%     ~ (p=0.240)

~25% reduction in peak live heap (p=0.002), no change in wall-clock time or total allocations

Copy link
Copy Markdown
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks great! Thanks!

BenchmarkFanoutMemory feeds 100 large record batches through a single
fanout writer using a gated reader that controls when each batch enters
the pipeline. Between sends, it forces GC and samples HeapInuse to
measure peak live memory.

This proves the processRecord refactor (moving defer record.Release()
from the function-scoped fanout loop into processRecord) reduces peak
heap by ~25% (p=0.002), confirming batches are released promptly
instead of accumulating until the fanout function returns.
@abhirathod95 abhirathod95 force-pushed the perf-fanout-streaming-writer branch from 64ef8a9 to f076760 Compare April 13, 2026 21:22
Copy link
Copy Markdown
Contributor

@laskoviymishka laskoviymishka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@zeroshade zeroshade merged commit a0827dc into apache:main Apr 13, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants