
go/libraries/doltcore/remotestorage,go/store/datas/pull: Implement a pipelined chunk fetcher. #7824

Merged
36 commits merged into main on May 7, 2024

Conversation

@reltuk (Contributor) commented May 6, 2024

No description provided.

reltuk added 27 commits May 6, 2024 09:41
…mentation of end-to-end pipelining remotestorage ChunkFetcher.

TODO:
- NewRpcErrors
- Return empty CompressedChunks for things that are not found.
- Fix how we find the next download. Currently might be slow.
…rent ways of structuring the reliable grpc call state machine.
…ular. Start reliable package with a reliable GRPC call copying some of the state machine logic.
…vestigate other ways of structuring the GRPC reliable call state machine in order to compare them.
…t on a ranges tree for more efficiently picking next download.
…RepoToken storage back for StreamDownloadLocations pipeline.
…etecting missing chunks and adhering to the contract where they come back from Recv as a CompressedChunk with only a hash field set.
…fetch. Implement largest, smallest and random policy.
…m large (unhedged) responses back as chunks, instead of downloading everything before delivering it as chunks.
…aming, retries and throughput monitoring are in place.
… based on successes or failures of inflight requests.
…resh table file url rpc, grpc send, and grpc open for get download locations.
…e unused downThroughputCheck and iohelp import.
…ome comments regarding the implementation and usage.
@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000
version 1c6a4cf: result ok, total 5937457
version 1c6a4cf: total_tests 5937457
correctness_percentage: 100.0

… timeouts for getting HTTP response headers when downloading byte ranges.
@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000
version d900819: result ok, total 5937457
version d900819: total_tests 5937457
correctness_percentage: 100.0

@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000
version 5c4d05e: result ok, total 5937457
version 5c4d05e: total_tests 5937457
correctness_percentage: 100.0

@max-hoffman (Contributor) left a comment:

LGTM, a bit more concerned about the hand-rolled primitives than the orchestration. Realistically I'd need to do a bunch of testing to find whether the message queue has any edge cases where we'd drop a request or something. Same thing with the state machine: it all checks out AFAICT, but it seems like there are a lot of potential failure cases where we don't restart a streaming connection or something. But it seems pretty unlikely that we would return successfully from a fetch without detecting missed chunks? So the worst case is maybe a repo with a fetch bug that we need to patch before it can be cloned, which is fine?

}
for h := range hs {
	h := h
	addrs = append(addrs, h[:])
Contributor:

what's the reason to decouple (1) receive addrs, (2) form request, (3) send request?

Contributor:

seems like the thisResCh might be unnecessary also if you combined them

@reltuk (Contributor, Author):

Hmm, I'm not sure I understand. You mean why do we batch them up separate from the incoming HashSet?

There are a few reasons, AFAIK:

  1. The remote server is going to impose limits on incoming request size, and it wants incoming request messages to be below a certain threshold.
  2. The remote server is going to respond to incoming requests serially, and requests are responded to in aggregate / fully – a response message comes back with all the resolved locations, and that response body has to be fully read and processed before any locations can be pulled out of it, etc.
  3. The remote server is essentially running HasMany() or getReadReqs or whatever on the table file indexes of the loaded Dolt repository. The method is more efficient with batches of addresses than for single shot lookups.
  4. The response payload itself is more efficient for multi-lookups where the locations share the same table file. In that case, we only have to transmit the signed URL for a given table file once. Whereas the response payloads are stateless across the connection, so the smaller the batches, the more times the signed URLs are transmitted.

The end result is that you want to batch up chunk addresses, but you can't / don't want your batches to be too big.

Does that answer the question?

In reality, some of these constraints are self-imposed and could be approached in a different way, but I think you would still want batching here...
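To make the batching shape described above concrete, here is a minimal, self-contained Go sketch of the idea: accumulate chunk addresses and cut them into request-sized batches, so no single download-locations request grows past a server-imposed threshold while still amortizing index lookups and signed-URL payloads across many hashes. The batchSize value and batchAddrs helper are illustrative assumptions, not the actual remotestorage code.

package main

import "fmt"

// Assumed server-friendly upper bound on chunk hashes per request; the real
// limit is whatever the remote server imposes on request size.
const batchSize = 4096

// batchAddrs cuts the accumulated chunk addresses into slices of at most
// batchSize, so each request stays under the server's threshold while still
// resolving many addresses per round trip.
func batchAddrs(addrs [][]byte) [][][]byte {
	var batches [][][]byte
	for len(addrs) > 0 {
		n := batchSize
		if len(addrs) < n {
			n = len(addrs)
		}
		batches = append(batches, addrs[:n])
		addrs = addrs[n:]
	}
	return batches
}

func main() {
	// 10000 20-byte addresses split into 3 batches (4096 + 4096 + 1808).
	addrs := make([][]byte, 10000)
	for i := range addrs {
		addrs[i] = make([]byte, 20)
	}
	fmt.Println("batches:", len(batchAddrs(addrs)))
}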

Contributor:

Oh I see, I thought batching was working in the other direction for some reason -- accumulating hash sets, which the setup doesn't seem organized for. If we're sending out smaller batches, decoupling the receive/send makes sense

go/libraries/doltcore/remotestorage/chunk_fetcher.go (outdated comment, resolved)
reltuk and others added 5 commits May 7, 2024 12:01
…te_machine_struct.go

Co-authored-by: Maximilian Hoffman <max@dolthub.com>
Co-authored-by: Maximilian Hoffman <max@dolthub.com>
Co-authored-by: Maximilian Hoffman <max@dolthub.com>
…te_machine_struct.go

Co-authored-by: Maximilian Hoffman <max@dolthub.com>
@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000
version b449280: result ok, total 5937457
version b449280: total_tests 5937457
correctness_percentage: 100.0

@reltuk merged commit 1f9df06 into main on May 7, 2024
20 checks passed
// `Get()` and `Recv()` concurrently. Unless there is an error, for every
// single Hash passed to Get, a corresponding Recv() call will deliver the
// contents of the chunk. When a caller is done with a ChunkFetcher, they
// should call |CloseSend()|. After CloseSend, all requested hashes have been
Contributor:

Suggested change:
- // should call |CloseSend()|. After CloseSend, all requested hashes have been
+ // should call |CloseSend()|. After CloseSend, once all requested hashes have been
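For readers following the contract in the quoted doc comment, a rough usage sketch follows: Get and Recv run concurrently, CloseSend signals that no more hashes are coming, and the receiver drains until completion. The type names, method signatures, and the io.EOF completion signal below are assumptions inferred from the comment, not the exact Dolt API.

package main

import (
	"context"
	"errors"
	"io"

	"golang.org/x/sync/errgroup"
)

type Hash [20]byte
type HashSet map[Hash]struct{}
type CompressedChunk struct{ H Hash }

// Assumed shape of the fetcher, inferred from the doc comment above; the real
// interface lives in the Dolt codebase and may differ in detail.
type ChunkFetcher interface {
	Get(ctx context.Context, hashes HashSet) error
	CloseSend() error
	Recv(ctx context.Context) (CompressedChunk, error)
	Close() error
}

// fetchAll drives a ChunkFetcher per the contract: one goroutine sends hash
// batches and then calls CloseSend, while another drains Recv until the
// fetcher reports completion (io.EOF assumed here).
func fetchAll(ctx context.Context, f ChunkFetcher, batches []HashSet) error {
	eg, ctx := errgroup.WithContext(ctx)
	eg.Go(func() error {
		for _, hs := range batches {
			if err := f.Get(ctx, hs); err != nil {
				return err
			}
		}
		return f.CloseSend()
	})
	eg.Go(func() error {
		for {
			_, err := f.Recv(ctx)
			if errors.Is(err, io.EOF) {
				return nil
			}
			if err != nil {
				return err
			}
			// hand the received chunk to the caller here
		}
	})
	return eg.Wait()
}

func main() {}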

tc.Close()
assert.NoError(t, eg.Wait())
})
t.Run("SetTimeoutRespectsContext", func(t *testing.T) {
Contributor:

What's being tested in "SetTimeoutRespectsContext"? There aren't any asserts in "BeforeRun". Is calling SetTimeout after cancel supposed to have specific behavior? Should this be running and waiting on the controller to test that it actually gets cancelled?

// Reads HashSets from reqCh and batches all the received addresses
// into |GetDownloadLocsRequest| messages with up to |batchSize| chunk hashes
// in them. It delivers the batched messages to |resCh|.
func fetcherHashSetToGetDlLocsReqsThread(ctx context.Context, reqCh chan hash.HashSet, abortCh chan struct{}, resCh chan *remotesapi.GetDownloadLocsRequest, batchSize int, repoPath string, idFunc func() (*remotesapi.RepoId, string)) error {
Contributor:

The state machine in this function is pretty hard to follow. I'm not sure I understand it.

So thisResCh is assigned to resCh the first time there are any elements in addr. Prior to that, it's nil. Writing to a nil channel blocks forever. Is the idea that the select statement will never choose the thisResCh <- thisRes case until we assign thisResCh a non-nil value? Are we depending on the properties of writing to a nil channel for correctness here? Is this a good practice?

Maybe this is the idiomatic way to do this. But as someone less versed in go channels, the intended execution flow is very unclear.
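For context on the question above: a send or receive on a nil channel does block forever, so a select case whose channel is nil is simply never chosen, and assigning a real channel later "arms" that case. This is a standard, if under-documented, Go idiom for enabling and disabling select cases. A small self-contained illustration of the pattern, unrelated to the fetcher code itself:

package main

import "fmt"

func main() {
	in := make(chan int)
	results := make(chan int)

	go func() {
		var pending int
		havePending := false
		inOpen := true
		for inOpen || havePending {
			// A case on a nil channel is never selected, so we only make a
			// channel non-nil when that case should be eligible.
			var recvCh chan int
			if inOpen && !havePending {
				recvCh = in
			}
			var sendCh chan int
			if havePending {
				sendCh = results
			}
			select {
			case v, ok := <-recvCh:
				if !ok {
					inOpen = false
					continue
				}
				pending, havePending = v, true
			case sendCh <- pending:
				havePending = false
			}
		}
		close(results)
	}()

	go func() {
		for i := 1; i <= 3; i++ {
			in <- i * 10
		}
		close(in)
	}()

	for v := range results {
		fmt.Println(v) // prints 10, 20, 30
	}
}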

4 participants