-
Notifications
You must be signed in to change notification settings - Fork 26
feat: Implement batch sender. #1995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
😱 Looks like the solution doesn't contemplate that the Anyway, it shows promise but we might not need it if we choose to move to shuffle-queue. I can look into the special case, though. |
|
I think this can be mitigated with a |
erezrokah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
- Can we add some tests that cover the new code (if we don't have them already via the scheduler tests)? I mean sending both single items and slices to the result channel
- I think
batchSizeshould default to 5 which is oursingleResourceMaxConcurrency. There's not point in sending more than 5 items at a time as they will block onI thinkplugin-sdk/scheduler/scheduler_dfs.go
Line 151 in 32855ea
resourceSemVal, _ := s.scheduler.singleTableConcurrency.LoadOrStore(resourceConcurrencyKey, semaphore.NewWeighted(s.scheduler.singleResourceMaxConcurrency))
erezrokah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to add e2e test using our source and destination test plugins that will cover the changes in this PR
|
@erezrokah passing E2E: cloudquery/cloudquery#19875 |
|
The testing I did initially was flawed (upgraded SDK on CLI rather than on source plugin), but since then I did a separate test that passed as well: |
🤖 I have created a release *beep* *boop* --- ## [4.71.0](v4.70.2...v4.71.0) (2024-12-09) ### Features * Implement batch sender. ([#1995](#1995)) ([371b20f](371b20f)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Problem
The DFS scheduler implements concurrent resolution of resources, but only if resources are sent in batches to the top-level
reschannel. Thus, the same plugin syncs a lot quicker simply by batching resources before sending them through the channel.Solution this PR implements
Currently with a batch timeout of
100msand batch size of100, it manages a 5x improvement on a sync that reproduces the issue brought forward by the community (https://github.com/jeromewir/cq-source-concurrency-childtable-example):The PR tries to touch the least amount of scheduler code, and stays away from tricky language constructs as much as possible.