Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mute/unmute overhead needs reduction #3119

Open
slfritchie opened this issue Mar 11, 2020 · 0 comments
Open

Mute/unmute overhead needs reduction #3119

slfritchie opened this issue Mar 11, 2020 · 0 comments

Comments

@slfritchie
Copy link
Contributor

Is this a bug, feature request, or feedback?

Enhancement/performance improvement

What is the current behavior?

The collective muting & unmuting behavior of DataChannel actors needs to be reduced. The effective startup time of Wallaroo can be dramatically affected by small changes in pipeline structure, such as where a sub-pipeline is .merge()d into a main pipeline.

What is the expected behavior?

Less-than-exponential(-seeming) behavior as the # of Wallaroo workers grows.

What OS and version of Wallaroo are you using?

Ubuntu Bionic/18.04 LTS + Wallaroo @ commit 35d2038

Steps to reproduce?

See README.md in tarball at http://wallaroolabs-dev.s3.amazonaws.com/scott/count2.tar.gz. Instructions include options for building & running a demonstration test via a VM or Docker.

Test output such as https://gist.github.com/slfritchie/5d6cbcb243d8197f61c659445feb9454 shows a set of results when starting a 2 or 4 or 8 worker cluster whose application pipeline includes 3 sources and two .merge() operations on sub-pipelines. As soon as worker0 polls ready, then a single operation is sent to the first source, and the sink's output is shown with a timestamp.

Relevant times to examine (and consider that each test was run only once):

  1. Time from first worker logging |~~ INIT PHASE IV: Cluster is ready to work! ~~| to last worker's log.
  2. Time for last worker's ready log to the processing time of the 1 work item processed by Wallaroo.

In the source as-is, those times are:

  1. 2 worker: 0.020 sec, 4 worker: 1.105 sec, 8 worker: 2.354 sec
  2. 2 worker: 0.036 sec, 4 worker: 3.307 sec, 8 worker: 7.699 sec

If the source is modified as suggested in the README.md file, moving the .merge(other)call from line 57 to line 62, i.e., immediately before the .to_sink() call, then the times drop significantly.

  1. 2 worker: 0.001 sec, 4 worker: 0.050 sec, 8 worker: 0.012 sec
  2. 2 worker: 1.031 sec(*), 4 worker: 1.118 sec, 8 worker: 1.244 sec

The timing with the (*) label is most anomalous: it's too big compared to the typical latency. But these simple results show that a small change in pipeline definition can change the time of first processed item end-to-end from roughly 10 seconds down to 1.3 seconds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant