
go: store/nbs: table_reader: getManyAtOffsetsWithReadFunc: Stop unbounded I/O parallelism in GetMany implementation. #91

Merged
merged 3 commits into master on Sep 26, 2019

Conversation

@reltuk (Contributor) commented Sep 26, 2019

When we do things like push, pull, or (soon-to-be) garbage collection, we have large sets of Chunk addresses that we pass into ChunkStore#GetMany and then go off and process. Clients largely try to control memory overhead and pipeline depth by passing in a buffered channel of an appropriate size. The expectation is that the GetMany implementation will keep an amount of data in flight at any given time that is in some reasonable way proportional to the channel size.
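For illustration, here is a minimal sketch of that contract from the caller's side. The names (Chunk, consume, the 128-chunk buffer) are made up for this example; the real store/nbs API differs in detail:

package example

import "context"

type Chunk struct{ data []byte }

// Illustrative stand-in for the real ChunkStore API.
type ChunkStore interface {
	// GetMany streams each found chunk into |found| as it is read.
	GetMany(ctx context.Context, addrs [][20]byte, found chan<- *Chunk) error
}

func consume(ctx context.Context, cs ChunkStore, addrs [][20]byte) error {
	// The buffer size is the caller's knob: the expectation is that
	// GetMany keeps roughly this many chunks in flight, no more.
	found := make(chan *Chunk, 128)

	errc := make(chan error, 1)
	go func() {
		defer close(found)
		errc <- cs.GetMany(ctx, addrs, found)
	}()

	for c := range found {
		_ = c // process the chunk; GetMany blocks while |found| is full
	}
	return <-errc
}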

In the current implementation, there is unbounded concurrency on the read destination allocations and the reads themselves, with one goroutine spawned for each byte range we want to read. This results in massive (virtual) heap utilization, unreasonable I/O parallelism, and context-switch thrashing in large repo push/pull situations.

This is a small PR that changes the concurrency paradigm inside getManyAtOffsetsWithReadFunc so that we only have 4 concurrent dispatched reads per table_reader instance at a time; a sketch of the pattern follows the list below.

This is still not the behavior we actually want.

  • I/O concurrency should be configurable at the ChunkStore layer (or eventually per-device backing a set of tableReaders), and not depend on the number of tableReaders which happen to back the chunk store.
  • Memory overhead is still not correctly bounded here, since read ahead batches are allowed to grow to arbitrary sizes. Reasonable bounds on memory overhead should be configurable at the ChunkStore layer.
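To make the new paradigm concrete, here is a self-contained sketch of the bounded-dispatch pattern this PR adopts (readBatch and dispatchReads are illustrative names, not the exact nbs internals):

package example

import "sync"

const ioParallelism = 4 // fixed per-table_reader bound introduced by this PR

type readBatch struct{ offset, length int64 }

// dispatchReads queues batches on a channel and lets a fixed pool of
// workers perform the reads, so at most ioParallelism reads are in
// flight at once instead of one goroutine per byte range.
func dispatchReads(batches []readBatch, read func(readBatch)) {
	work := make(chan readBatch)

	var wg sync.WaitGroup
	wg.Add(ioParallelism)
	for i := 0; i < ioParallelism; i++ {
		go func() {
			defer wg.Done()
			for b := range work {
				read(b)
			}
		}()
	}

	for _, b := range batches {
		work <- b
	}
	close(work)
	wg.Wait()
}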

I'm landing this as a big incremental improvement over the status quo. Below are some non-reproducible, one-shot results from a test program. The program walks the entire chunk graph, assembles every chunk address, and then calls GetManyCompressed on every chunk address, copying the contents to /dev/null. It was run against a ~10GB (compressed) data set.
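For context, the core of that harness looks roughly like the sketch below (the real test.go is not included in this PR; drain and its parameters are illustrative):

package example

import (
	"io"
	"os"
)

// drain copies every streamed chunk to /dev/null, mirroring the
// shape of the benchmark described above.
func drain(getMany func(found chan<- io.Reader), depth int) error {
	devNull, err := os.OpenFile(os.DevNull, os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer devNull.Close()

	found := make(chan io.Reader, depth)
	go func() {
		defer close(found)
		getMany(found)
	}()

	for cc := range found {
		if _, err := io.Copy(devNull, cc); err != nil {
			return err
		}
	}
	return nil
}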

Before:

$ /usr/bin/time -l -- go run test.go
...
MemStats: Sys: 16628128568
      161.29 real        67.29 user       456.38 sys
5106425856  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
  10805008  page reclaims
     23881  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         8  signals received
    652686  voluntary context switches
  21071339  involuntary context switches

After:

$ /usr/bin/time -l -- go run test.go
...
MemStats: Sys: 4590759160
       32.17 real        30.53 user        29.62 sys
4561879040  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
   1228770  page reclaims
     67100  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
        14  signals received
    456898  voluntary context switches
   2954503  involuntary context switches

On these runs, sys time, wall-clock time, VM page reclaims, and virtual memory usage are all improved pretty substantially.

Very open to feedback and discussion of potential performance regressions here, but I think this is an incremental win for now.

Aaron Son added 2 commits September 26, 2019 13:27
…nded I/O parallelism in GetMany implementation.
@reltuk reltuk requested a review from bheni September 26, 2019 22:38
@bheni (Contributor) left a comment

lgtm. One suggestion


wg.Add(ioParallelism)
for i := 0; i < ioParallelism; i++ {
	go readBatches()
}

I like to see my wg.Done() calls close to my wg.Add calls. Makes me feel like I'm not missing something. So I would do:

wg.Add(ioParallelism)
for i := 0; i < ioParallelism; i++ {
    go func() {
        defer wg.Done()
        readBatches()
    }()
}

Then remove the wg.Done from the top of your readBatches func.

…f channel close and wait group done calls.
@reltuk reltuk merged commit 9456d37 into master Sep 26, 2019
@Hydrocharged Hydrocharged deleted the aaron/nbs-io-parallelism-fixes branch November 11, 2019 21:30