Skip to content

Conversation

@hermanschaaf
Copy link
Member

Falling somewhere between DFS and Round Robin, the shuffle scheduler uses randomization with a fixed seed to spread the load across tables and clients more evenly.

@github-actions
Copy link

github-actions bot commented Sep 18, 2023

⏱️ Benchmark results

  • Glob-8 ns/op: 99.82

@codecov
Copy link

codecov bot commented Sep 18, 2023

Codecov Report

Patch coverage: 80.00% and project coverage change: +0.20% 🎉

Comparison is base (44956d9) 48.54% compared to head (27c4e22) 48.75%.

❗ Current head 27c4e22 differs from pull request most recent head 15f8bf2. Consider uploading reports for the commit 15f8bf2 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1218      +/-   ##
==========================================
+ Coverage   48.54%   48.75%   +0.20%     
==========================================
  Files          87       88       +1     
  Lines        8185     8240      +55     
==========================================
+ Hits         3973     4017      +44     
- Misses       3842     3850       +8     
- Partials      370      373       +3     
Files Changed Coverage Δ
scheduler/scheduler_shuffle.go 79.24% <79.24%> (ø)
scheduler/scheduler.go 54.47% <100.00%> (+0.37%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


func shuffle(tableClients []tableClient) {
// use a fixed seed so that runs with the same tables and clients perform similarly across syncs
r := rand.New(rand.NewSource(99))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean we always randomize it the same way 🙃 ? Ideally this would be configurable but maybe we can use something from the spec so if the initial shuffle is bad users can modify it. Maybe hash on table/client IDs or something

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean we always randomize it the same way

First the obligatory https://xkcd.com/221/

Maybe hash on table/client IDs or something

Yeah we can do that, so then changing the ordering of tables in your config will cause a different shuffle.

Copy link
Member Author

@hermanschaaf hermanschaaf Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it was really necessary, the odds of getting a bad shuffle (at either end of the DFS->Round Robin spectrum) if there are enough table-clients are astronomically small, but in 27c4e22 I added the ability to control it a little bit by changing the ordering of the tables in the config.

@kodiakhq kodiakhq bot merged commit 2b1ba30 into main Sep 19, 2023
@kodiakhq kodiakhq bot deleted the shuffle-scheduler branch September 19, 2023 08:09
kodiakhq bot pushed a commit that referenced this pull request Sep 19, 2023
🤖 I have created a release *beep* *boop*
---


## [4.8.0](v4.7.1...v4.8.0) (2023-09-19)


### Features

* Add Checksums to package.json format ([#1217](#1217)) ([720baae](720baae))
* Add message to package command ([#1216](#1216)) ([44956d9](44956d9))
* Add shuffle scheduler ([#1218](#1218)) ([2b1ba30](2b1ba30))
* Update package command ([#1211](#1211)) ([39fc65e](39fc65e))


### Bug Fixes

* Add schema version to package.json ([#1212](#1212)) ([393c94d](393c94d))
* **deps:** Update github.com/cloudquery/arrow/go/v14 digest to 483f6b2 ([#1209](#1209)) ([179769a](179769a))
* **deps:** Update github.com/cloudquery/arrow/go/v14 digest to ffb7089 ([#1215](#1215)) ([70f20bb](70f20bb))
* Use -dir suffix for plugin package arguments ([#1213](#1213)) ([93f9398](93f9398))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants