Benchmark spill machinery #791

Merged: frankmcsherry merged 1 commit into TimelyDataflow:master from frankmcsherry:spill_bench on Apr 30, 2026

frankmcsherry commented Apr 30, 2026

Introduces examples/spill_compare.rs, which compares spilling against non-spilling (most likely swapping) using a dataflow that sends a fixed amount of data without pausing to let the receiver drain it, essentially packing the channel buffer. The usage is

spill_compare --total-gb 50 --workers 4 [--with-spill]

where that last flag can be present or absent, and controls whether we spill the data to temp files, or have the OS figure out how to handle the memory.
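As a rough illustration of the command line above, here is a minimal sketch of parsing those three flags using only the standard library. It is not the parsing code in examples/spill_compare.rs; the `Options` struct and `parse_options` function are hypothetical names, and only the flag spellings come from the usage line.

```rust
use std::env;

/// Hypothetical options struct; the fields mirror the flags in the usage line.
#[derive(Debug, PartialEq)]
struct Options {
    total_gb: u64,
    workers: usize,
    with_spill: bool,
}

/// Parses `--total-gb N --workers N [--with-spill]` from an argument iterator.
fn parse_options<I: IntoIterator<Item = String>>(args: I) -> Result<Options, String> {
    let mut total_gb = None;
    let mut workers = None;
    let mut with_spill = false;
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "--total-gb" => {
                let value = args.next().ok_or("--total-gb needs a value")?;
                total_gb = Some(value.parse::<u64>().map_err(|e| format!("--total-gb: {e}"))?);
            }
            "--workers" => {
                let value = args.next().ok_or("--workers needs a value")?;
                workers = Some(value.parse::<usize>().map_err(|e| format!("--workers: {e}"))?);
            }
            // Presence of the flag alone selects spilling to temp files.
            "--with-spill" => with_spill = true,
            other => return Err(format!("unrecognized argument: {other}")),
        }
    }
    Ok(Options {
        total_gb: total_gb.ok_or("missing --total-gb")?,
        workers: workers.ok_or("missing --workers")?,
        with_spill,
    })
}

fn main() {
    // Skip the program name, then parse the remaining arguments.
    match parse_options(env::args().skip(1)) {
        Ok(options) => println!("{options:?}"),
        Err(error) => eprintln!("error: {error}"),
    }
}
```

Taking an iterator rather than reading `env::args()` directly keeps the parser easy to exercise from a test.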

On my laptop, the numbers look like

  50 GB total, incompressible payload, 256 KB chunks, threshold 256 MB, head reserve 64 MB.
  Apple M2, 24 GB RAM, internal SSD.

  | workers | mode     | total | production | drain | peak RSS | throughput |
  |---------|----------|-------|------------|-------|----------|------------|
  | 2       | spill    |  34 s |       17 s |  15 s |   1.5 GB |  1.46 GB/s |
  | 2       | no-spill | 175 s |       21 s | 152 s |   6 GB ¹ |   285 MB/s |
  | 4       | spill    |  32 s |       16 s |  15 s |   2.9 GB |  1.55 GB/s |
  | 4       | no-spill | 142 s |       27 s | 113 s |   6.3 GB |   360 MB/s |

They suggest that with spilling we are limited by disk throughput, and that adding workers increases the number of concurrent faults that can be served. A dd benchmark suggests that my SSD is in the ~1.3 GB/s combined read/write area.
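For reference, the throughput column appears to be the 50 GB total divided by the total elapsed time. The sketch below redoes that arithmetic from the rounded times in the table; the helper name `throughput_gb_per_s` is made up for illustration, and the assumption that throughput is simply total over elapsed time is mine. The results come out at roughly 1.47, 0.29, 1.56, and 0.35 GB/s, close to the reported figures, with the small differences presumably due to rounding of the elapsed times.

```rust
/// Average end-to-end throughput in GB/s for `total_gb` moved in `seconds`.
/// Hypothetical helper: assumes throughput is total data over total elapsed time.
fn throughput_gb_per_s(total_gb: f64, seconds: f64) -> f64 {
    total_gb / seconds
}

fn main() {
    // (workers, mode, total seconds) copied from the table above.
    let rows = [
        (2, "spill", 34.0),
        (2, "no-spill", 175.0),
        (4, "spill", 32.0),
        (4, "no-spill", 142.0),
    ];
    for (workers, mode, seconds) in rows {
        let rate = throughput_gb_per_s(50.0, seconds);
        println!("{workers} workers, {mode}: {rate:.2} GB/s");
    }
}
```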

Some nuance to the benchmark:

  1. Swizzle up the data to make sure it is not compressible (e.g. on macOS).
  2. Wrap the bytes in serde_bytes wrappers to perform bulk copies rather than per-byte serialization.
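The first point can be sketched as follows: one way to produce a payload that compression cannot shrink is to fill each chunk from a pseudo-random generator. This is an assumption about the general technique, not the code in the PR; `xorshift64` and `incompressible_chunk` are hypothetical names, and the 256 KB chunk size is taken from the benchmark configuration. The `serde_bytes` wrapping from the second point is not shown, since it depends on an external crate.

```rust
use std::collections::HashSet;

/// One step of a xorshift64 generator; `state` must be non-zero.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Builds a `len`-byte chunk of pseudo-random bytes (hypothetical helper).
fn incompressible_chunk(len: usize, seed: u64) -> Vec<u8> {
    // A zero state would make xorshift emit zeros forever, so clamp the seed.
    let mut state = seed.max(1);
    let mut buffer = Vec::with_capacity(len + 8);
    while buffer.len() < len {
        buffer.extend_from_slice(&xorshift64(&mut state).to_le_bytes());
    }
    // The loop appends 8 bytes at a time, so trim any overshoot.
    buffer.truncate(len);
    buffer
}

fn main() {
    let chunk = incompressible_chunk(256 * 1024, 0x9E37_79B9_7F4A_7C15);
    let distinct: HashSet<u8> = chunk.iter().copied().collect();
    println!("{} bytes, {} distinct byte values", chunk.len(), distinct.len());
}
```

Varying the seed per chunk avoids repeating the same 256 KB block, which a deduplicating or compressing layer could otherwise exploit.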

@frankmcsherry frankmcsherry merged commit d93ed19 into TimelyDataflow:master Apr 30, 2026
9 checks passed
@frankmcsherry frankmcsherry deleted the spill_bench branch April 30, 2026 16:45