
Differential 0.9 timeline #135

Open

ryzhyk opened this issue Feb 7, 2019 · 7 comments
Comments

@ryzhyk
Contributor

ryzhyk commented Feb 7, 2019

Just wondering if there is a timeline for releasing differential-dataflow 0.9, including event-driven patches. I am asking because this feature is critical for my users. If 0.9 is not happening soon, I will just create my own temporary release by forking differential and timely, but on the other hand if it's on its way, I'll just wait for @frankmcsherry to do all the work :)

Thanks!

@frankmcsherry
Member

frankmcsherry commented Feb 8, 2019

There is not yet a timeline, but I could start that. We've still been shaking out some liveness bugs in various dustier corners (recently, Scheduler::drop()).

Timely dataflow has two outstanding issues I'm aware of that I would like to fix. Both are performance, rather than correctness, issues.

  1. The scheduler has a performance regression: the worker does not notice newly produced messages until the next call to worker.step(). This means that in pipelines like

    A -> B -> C -> D -> ..
    

    it takes as many calls to worker.step() as the length of the longest directed path. Previously, when workers polled they would hit all of these operators in one worker.step().

    There is probably a natural fix: notice when an operator produces output and speculatively schedule any operators attached to that output port, even without receiving positive confirmation that the data are actually destined for this worker. We could further restrict this to apply only to pipeline edges.

  2. The worker currently has a memory leak: when a dataflow completes, the worker does not clean up the "channel id to operator address" map it uses to activate operators for message receipt. This should be pretty easy to fix, but I haven't sorted out the "best way" yet.
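As a toy illustration of the first issue (a hypothetical model, not timely's actual scheduler — `steps_to_drain` and its representation of operators are invented for this sketch): under polled-only scheduling, each call to step() only drains messages that were visible when the step began, so a chain of N operators needs N steps to push one update through, while eagerly scheduling the operator attached to the output port collapses that to a single step.

```rust
// Hypothetical toy model of the scheduling regression (NOT timely's code):
// each operator forwards a message to its successor, but under "polled-only"
// scheduling the worker only notices a newly produced message on the *next*
// call to step().

fn steps_to_drain(chain_len: usize, eager: bool) -> usize {
    // pending[i] == true means operator i has an unconsumed input message.
    let mut pending = vec![false; chain_len];
    pending[0] = true; // a message enters at the head of the pipeline A -> B -> ..
    let mut steps = 0;
    while pending.iter().any(|&p| p) {
        steps += 1; // one simulated worker.step()
        let mut i = 0;
        while i < chain_len {
            if pending[i] {
                pending[i] = false;
                if i + 1 < chain_len {
                    pending[i + 1] = true;
                    // Eager scheduling runs the downstream operator within the
                    // same step; otherwise it is skipped until the next step.
                    if !eager {
                        i += 1;
                    }
                }
            }
            i += 1;
        }
    }
    steps
}

fn main() {
    // A 5-stage pipeline: 5 steps without eager rescheduling, 1 step with it.
    println!("{} {}", steps_to_drain(5, false), steps_to_drain(5, true));
}
```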

I'd also like to land the progress tracking hack in here somewhere, which improves scaling by quite a lot at the potential cost of some additional minimum latency.
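As a toy sketch of the cleanup described in item 2 (hypothetical types — `WorkerState`, `drop_dataflow`, and the address encoding are invented here and do not mirror timely's actual data structures): the fix amounts to removing every map entry whose operator lives inside the completed dataflow.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the leak and its fix (NOT timely's code): the worker
// keeps a map from channel id to the address of the operator to activate when
// data arrives on that channel. If entries are never removed when a dataflow
// completes, the map grows with every dataflow the worker has ever run.

type ChannelId = usize;
type OperatorAddr = Vec<usize>; // an address path like [dataflow, scope, op]

struct WorkerState {
    channel_to_addr: HashMap<ChannelId, OperatorAddr>,
}

impl WorkerState {
    // On dataflow completion, drop every entry whose operator address lies
    // inside the finished dataflow (its path starts with that dataflow index).
    fn drop_dataflow(&mut self, dataflow_index: usize) {
        self.channel_to_addr
            .retain(|_, addr| addr.first() != Some(&dataflow_index));
    }
}

fn main() {
    let mut worker = WorkerState { channel_to_addr: HashMap::new() };
    worker.channel_to_addr.insert(0, vec![0, 1]); // channel in dataflow 0
    worker.channel_to_addr.insert(1, vec![0, 2]); // channel in dataflow 0
    worker.channel_to_addr.insert(2, vec![1, 1]); // channel in dataflow 1
    worker.drop_dataflow(0);
    println!("{}", worker.channel_to_addr.len()); // only dataflow 1's entry remains
}
```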

@frankmcsherry
Member

The first issue is addressed in TimelyDataflow/timely-dataflow#226.

@ryzhyk
Contributor Author

ryzhyk commented Feb 8, 2019

Thanks for the update!

@frankmcsherry
Member

The second issue should be addressed in TimelyDataflow/timely-dataflow#227.

@frankmcsherry
Member

Both of these have landed in timely master. It would be great to get a bit of testing on them, though. If you take them for a spin and confirm that they work for you, that would be very helpful, and it would be great to know whether things get any faster as well!

@ryzhyk
Contributor Author

ryzhyk commented Feb 9, 2019

Awesome, thanks! Will try to test it before the end of the weekend.

@ryzhyk
Contributor Author

ryzhyk commented Feb 10, 2019

I confirm that all my tests pass when using the latest master of timely and differential.

Performance-wise, I see things moving much, much faster on huge dataflow graphs. A computation that used to take 5 seconds to push a small update through the graph now takes a fraction of a second. Just to be clear, I had already observed this speedup when test-driving your event-driven branch a month or so ago.
