feat: Add shuffle scheduler #1218
Codecov Report
@@ Coverage Diff @@
## main #1218 +/- ##
==========================================
+ Coverage 48.54% 48.75% +0.20%
==========================================
Files 87 88 +1
Lines 8185 8240 +55
==========================================
+ Hits 3973 4017 +44
- Misses 3842 3850 +8
- Partials 370 373 +3
scheduler/scheduler_shuffle.go
func shuffle(tableClients []tableClient) {
	// use a fixed seed so that runs with the same tables and clients perform similarly across syncs
	r := rand.New(rand.NewSource(99))
Doesn't this mean we always randomize it the same way? 🙃 Ideally this would be configurable, but maybe we can use something from the spec, so that if the initial shuffle is bad users can modify it. Maybe hash on table/client IDs or something.
> Doesn't this mean we always randomize it the same way

First the obligatory https://xkcd.com/221/

> Maybe hash on table/client IDs or something

Yeah we can do that, so then changing the ordering of tables in your config will cause a different shuffle.
I don't think it was really necessary: with enough table-clients, the odds of getting a bad shuffle (at either end of the DFS->Round Robin spectrum) are astronomically small. Still, in 27c4e22 I added the ability to control it a little bit by changing the ordering of the tables in the config.
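For readers following the thread, here is a minimal sketch of the "hash on table/client IDs" idea. It is not necessarily what 27c4e22 does: the `tableClient` fields, `seedFrom`, and `shuffleSeeded` are hypothetical names chosen for illustration, and FNV-1a is just one convenient hash. The names are hashed in config order, so reordering the tables changes the seed and therefore the shuffle.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// tableClient is a hypothetical stand-in for the scheduler's real type.
type tableClient struct {
	table  string
	client string
}

// seedFrom hashes table and client names in the order given, so a
// different ordering of the same pairs yields a different seed.
func seedFrom(tableClients []tableClient) int64 {
	h := fnv.New64a()
	for _, tc := range tableClients {
		h.Write([]byte(tc.table))
		h.Write([]byte{0}) // separator so "ab"+"c" differs from "a"+"bc"
		h.Write([]byte(tc.client))
		h.Write([]byte{0})
	}
	return int64(h.Sum64())
}

// shuffleSeeded shuffles in place using a seed derived from the input
// itself: deterministic for a given config, but user-adjustable.
func shuffleSeeded(tableClients []tableClient) {
	r := rand.New(rand.NewSource(seedFrom(tableClients)))
	r.Shuffle(len(tableClients), func(i, j int) {
		tableClients[i], tableClients[j] = tableClients[j], tableClients[i]
	})
}

func main() {
	tcs := []tableClient{
		{"t1", "c1"}, {"t1", "c2"},
		{"t2", "c1"}, {"t2", "c2"},
	}
	shuffleSeeded(tcs)
	fmt.Println(tcs)
}
```

The same config always produces the same order, which keeps syncs comparable, while a user who hits an unlucky shuffle can change it by reordering tables.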
🤖 I have created a release *beep* *boop*
---

## [4.8.0](v4.7.1...v4.8.0) (2023-09-19)

### Features

* Add Checksums to package.json format ([#1217](#1217)) ([720baae](720baae))
* Add message to package command ([#1216](#1216)) ([44956d9](44956d9))
* Add shuffle scheduler ([#1218](#1218)) ([2b1ba30](2b1ba30))
* Update package command ([#1211](#1211)) ([39fc65e](39fc65e))

### Bug Fixes

* Add schema version to package.json ([#1212](#1212)) ([393c94d](393c94d))
* **deps:** Update github.com/cloudquery/arrow/go/v14 digest to 483f6b2 ([#1209](#1209)) ([179769a](179769a))
* **deps:** Update github.com/cloudquery/arrow/go/v14 digest to ffb7089 ([#1215](#1215)) ([70f20bb](70f20bb))
* Use -dir suffix for plugin package arguments ([#1213](#1213)) ([93f9398](93f9398))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Falling somewhere between DFS and Round Robin, the `shuffle` scheduler uses randomization with a fixed seed to spread the load across tables and clients more evenly.
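As a rough illustration of that description, the sketch below builds every table-client pair in table-by-table (DFS-like) order and then shuffles the list with the fixed seed 99 from the reviewed diff. The `pairs` helper, the `tableClient` fields, and the table and client names are assumptions made for this example, not the SDK's actual code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// tableClient is a hypothetical stand-in for the scheduler's real type.
type tableClient struct {
	table  string
	client string
}

// pairs lists every table-client combination, table by table. Syncing in
// this order would finish all clients of one table before the next.
func pairs(tables, clients []string) []tableClient {
	out := make([]tableClient, 0, len(tables)*len(clients))
	for _, t := range tables {
		for _, c := range clients {
			out = append(out, tableClient{table: t, client: c})
		}
	}
	return out
}

// shuffleFixed shuffles in place with a fixed seed, so the same tables
// and clients are visited in the same order on every sync.
func shuffleFixed(tcs []tableClient) {
	r := rand.New(rand.NewSource(99))
	r.Shuffle(len(tcs), func(i, j int) {
		tcs[i], tcs[j] = tcs[j], tcs[i]
	})
}

func main() {
	tcs := pairs(
		[]string{"users", "orders", "items"},
		[]string{"us-east", "eu-west"},
	)
	shuffleFixed(tcs)
	for _, tc := range tcs {
		fmt.Println(tc.table, tc.client)
	}
}
```

Because the seed never changes, two runs over the same inputs print the same order, which is the property the code comment in the diff relies on.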