New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speculatively schedule operator sequences #226

Merged
merged 2 commits into from Feb 8, 2019

Conversation

Projects
None yet
1 participant
@frankmcsherry
Copy link
Member

frankmcsherry commented Feb 8, 2019

This PR causes a subgraph to schedule operators whose inputs are attached to operator outputs that have produced data, without awaiting positive confirmation from the channel that data exist. The intent is that there is a fair chance that there will be data, and awaiting the next worker.step() to discover this can substantially increase the critical path.

With the fix in place, the examples/event_driven.rs example, which sets up X dataflows each a sequence of Y map operators, has its times significantly reduced:

cargo run --release --example event_driven -- 100 100 record

Pre-fix:

3.376066625s    round 9997 complete in 102 steps
3.376272991s    round 9998 complete in 102 steps
3.376481583s    round 9999 complete in 102 steps
3.376684118s    round 10000 complete in 102 steps
3.376884082s    round 10001 complete in 102 steps
3.377099496s    round 10002 complete in 102 steps
3.377305554s    round 10003 complete in 102 steps

Post-fix:

2.490122678s    round 9997 complete in 2 steps
2.490269231s    round 9998 complete in 2 steps
2.490410658s    round 9999 complete in 2 steps
2.490558343s    round 10000 complete in 2 steps
2.490703365s    round 10001 complete in 2 steps
2.490847676s    round 10002 complete in 2 steps
2.490988816s    round 10003 complete in 2 steps

The potential downside is false positive scheduling, which is not harmful from a correctness point of view, but could be an efficiency hit in some cases (e.g. perhaps the examples/pingpong.rs example, which intentionally avoids sending data to the same worker). There is also more logging burden with additional scheduling events.

@frankmcsherry

This comment has been minimized.

Copy link
Member Author

frankmcsherry commented Feb 8, 2019

The next commit guards progress-based operator scheduling by the operator's .notify bit. Operators that opt out of notification (e.g. map, filter, concat) will not be scheduled on this basis. At the moment, we still do all of the other work on their behalf (e.g. maintain their frontier, etc). We could remove this, but it would provide a different experience to the operator, namely moving the distinction from "don't schedule me based on frontier progress" to "provide incorrect frontier information". For some operators this is fine, but for others it is probably less fine.

The improvements are to

cargo run --release --example event_driven -- 100 100

which is largely a chain of map operators.

Pre-fix:

1.56165669s     round 9997 complete in 2 steps
1.562012838s    round 9998 complete in 2 steps
1.56222355s     round 9999 complete in 2 steps
1.562436298s    round 10000 complete in 2 steps
1.562652235s    round 10001 complete in 2 steps
1.562978916s    round 10002 complete in 2 steps
1.563274899s    round 10003 complete in 2 steps

Post-fix:

1.114187328s    round 9997 complete in 2 steps
1.114280871s    round 9998 complete in 2 steps
1.114381808s    round 9999 complete in 2 steps
1.114482893s    round 10000 complete in 2 steps
1.114592772s    round 10001 complete in 2 steps
1.114701052s    round 10002 complete in 2 steps
1.114809146s    round 10003 complete in 2 steps

There is some new liveness risk, that operators might set notify() to false and nonetheless expect to be scheduled for progress information. That would be a bug on their part, but not sure how best to shake these out other than pushing the changes.

@frankmcsherry frankmcsherry merged commit 139992f into master Feb 8, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@frankmcsherry frankmcsherry deleted the event_sequence branch Feb 8, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment