
Akka Streams: Potential onComplete deadlock #17507

Closed
janlisse opened this issue May 18, 2015 · 10 comments

@janlisse

Hello Akka team,

As proposed by Endre on the Akka mailing list, I'm creating this issue.
I'm currently building a scraper system on top of Akka Streams. I have written a Flow that is able to follow paginated sites and scrape them in a loop.
For this I use a feedback loop with Merge/Unzip. As Endre stated, the stream does not deadlock on elements/backpressure, but it does deadlock on the completion signal.
When all URLs are processed, the stream never completes and onComplete never gets invoked.
Is this the expected behaviour?
You can find a sample project with a Spec that demonstrates the problem here.
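(For illustration, a minimal sketch of the kind of cycle described here, with hypothetical names, written against the GraphDSL API of later Akka Streams releases rather than the 1.x FlowGraph API in use at the time. The feedback arc in this sketch drops every element, so nothing is ever fed back, yet the stream still never completes:)

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl._

object FeedbackCompletionDemo extends App {
  implicit val system = ActorSystem()
  implicit val mat    = ActorMaterializer()

  val done = RunnableGraph.fromGraph(GraphDSL.create(Sink.foreach[Int](println)) {
    implicit b => sink =>
      import GraphDSL.Implicits._

      val merge   = b.add(MergePreferred[Int](1))
      val scrape  = b.add(Flow[Int].map(identity))      // stands in for the scraping step
      val bcast   = b.add(Broadcast[Int](2))
      val dropAll = b.add(Flow[Int].filter(_ => false)) // feedback arc that feeds nothing back

      Source(1 to 3) ~> merge ~> scrape ~> bcast ~> sink
                                           bcast ~> dropAll ~> merge.preferred

      ClosedShape
  }).run()

  // All three elements reach the sink, but `done` never completes: the merge waits for
  // its preferred (feedback) port to finish, which in turn depends on the merge itself,
  // so the completion signal is stuck in the cycle.
}
```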

@drewhk (Member) commented May 18, 2015

This is the expected behavior, but it is not the desired behavior :) It is tricky to work around this.

drewhk added the "1 - triaged" label on May 18, 2015
drewhk added this to the streams-1.x milestone on May 18, 2015
@jrudolph (Member)

Original mailing-list topic: https://groups.google.com/forum/#!topic/akka-user/_ENOeHyeEF0

@jrudolph (Member)

In a naive scraping approach you would keep a queue of work items containing the pages you still need to scrape. The termination condition would then be that 1) no work is outstanding and 2) the queue is empty.

With this stream setup you seem to distribute the work queue into the buffers of the participating stream elements, and thus you cannot observe the above conditions at any single point. So if there's a solution, it would probably be to make that information locally accessible somehow, by having some kind of "conductor" at some point which has enough information to terminate processing.
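(A plain, non-stream sketch of that bookkeeping, with hypothetical names: a conductor that owns the pending queue and counts outstanding work, so the termination condition is observable in one place:)

```scala
import scala.collection.mutable

// Hypothetical "conductor": the only place that can decide when scraping is finished.
final class Conductor[Url](seed: Url) {
  private val pending     = mutable.Queue(seed) // pages still to be scraped
  private var outstanding = 0                   // pages handed out and not yet completed

  // Hand out the next page to scrape, if any.
  def next(): Option[Url] =
    if (pending.nonEmpty) { outstanding += 1; Some(pending.dequeue()) } else None

  // Report a scraped page together with the follow-up pages it linked to.
  def completed(followUps: Seq[Url]): Unit = {
    outstanding -= 1
    pending ++= followUps
  }

  // Termination condition: 1) no work outstanding and 2) the queue is empty.
  def done: Boolean = outstanding == 0 && pending.isEmpty
}
```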

@drewhk (Member) commented May 19, 2015

Another option would be to add a "poison pill" element that is sent as the last element in the original stream; once it is observed at the feedback point, it should close the stream. We don't have a built-in takeUntil stage right now, but it is not hard to construct one.
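(A sketch of such a hand-rolled stage, with hypothetical names; it is written against the GraphStage API that only arrived later, in Akka Streams 2.0, rather than the 1.x Stage API available when this was written. It passes elements through until it sees the poison pill and then completes the stage:)

```scala
import akka.stream.{Attributes, FlowShape, Inlet, Outlet}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}

// Pass elements downstream until `isPill` matches; then finish the stream
// without emitting the pill itself.
final class TakeUntil[T](isPill: T => Boolean) extends GraphStage[FlowShape[T, T]] {
  val in: Inlet[T]   = Inlet("TakeUntil.in")
  val out: Outlet[T] = Outlet("TakeUntil.out")
  override val shape: FlowShape[T, T] = FlowShape(in, out)

  override def createLogic(attrs: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) {
      setHandler(in, new InHandler {
        override def onPush(): Unit = {
          val elem = grab(in)
          if (isPill(elem)) completeStage() // close the stream instead of forwarding the pill
          else push(out, elem)
        }
      })
      setHandler(out, new OutHandler {
        override def onPull(): Unit = pull(in)
      })
    }
}
```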

@janlisse (Author)

How do I close the stream manually when the poison pill element is received?

@drewhk (Member) commented May 19, 2015

You only need to close the feedback loop, since the other port of the merge has already been closed. You can do this by adding a custom stage as the last stage of the feedback arc (before the preferred port) that just finishes the stream when it sees the poison pill.
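(Continuing the sketches above with the same assumptions: the TakeUntil stage goes on the feedback arc as the last stage before the merge's preferred port, and the poison pill is appended to the original stream. Once the pill comes around the loop, the feedback arc finishes, both merge inputs are complete, and completion finally reaches the sink:)

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl._

object PoisonPillDemo extends App {
  implicit val system = ActorSystem()
  implicit val mat    = ActorMaterializer()

  val PoisonPill = -1 // sentinel sent as the last element of the original stream
  val LastPage   = 3

  val done = RunnableGraph.fromGraph(GraphDSL.create(Sink.foreach[Int](println)) {
    implicit b => sink =>
      import GraphDSL.Implicits._

      val merge  = b.add(MergePreferred[Int](1))
      val scrape = b.add(Flow[Int].map(identity))             // stands in for the scraping step
      val follow = b.add(Flow[Int].collect {                  // "follow the next-page link"
        case PoisonPill        => PoisonPill                  // let the pill travel the loop
        case n if n < LastPage => n + 1
      })
      val close  = b.add(new TakeUntil[Int](_ == PoisonPill)) // last stage before the preferred port
      val bcast  = b.add(Broadcast[Int](2))

      Source(List(1, PoisonPill)) ~> merge ~> scrape ~> bcast ~> sink
                                              bcast ~> follow ~> close ~> merge.preferred

      ClosedShape
  }).run()

  // The stream now terminates: when `close` sees the pill, it finishes the feedback arc,
  // and with the source already done the merge (and everything downstream) completes.
  // Caveat: follow-ups generated after the pill has closed the loop are dropped, which is
  // exactly the concern raised in the next comments.
}
```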

@jrudolph (Member)

@drewhk in the simplest case the original stream may only contain one element (the "anchor" to start scraping). If the poison pill were the next element, the stream would be instantly closed.

@drewhk (Member) commented May 19, 2015

Ah, ok, it is adding elements recursively. Btw, if the depth limit of the recursion is static, then the best approach would be to unroll the loop. Iteration is hard to express with the current stages we have; I had some ideas around one year ago, but they were never implemented.

@drewhk (Member) commented May 19, 2015

Well, to be more precise, (nested) iteration is easy to express (it doesn't need cycles in the graph); unbounded-depth iteration is the hard one.
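(A small hypothetical sketch of that distinction, written like the other sketches against later Akka Streams APIs: iteration with a statically known nesting depth is just repeated flatMapConcat and needs no cycle, whereas "keep following links until there are none left" has no static depth and is what the cycle above tries to express:)

```scala
import akka.stream.scaladsl.Source

object NestedIteration {
  type Url = String

  // Hypothetical helper: the follow-up links found on a page.
  def followUps(url: Url): List[Url] = Nil

  val seed = Source.single("http://example.com/start")

  // Two statically nested levels: the seed's follow-ups, then their follow-ups.
  // No cycle is needed, so completion propagates as usual.
  val twoLevels =
    seed.flatMapConcat(u => Source(followUps(u)))
        .flatMapConcat(u => Source(followUps(u)))
}
```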

@drewhk (Member) commented Nov 27, 2015

This is not a bug per se, closing.

drewhk closed this as completed on Nov 27, 2015
drewhk modified the milestones: invalid, streams-2.0 on Nov 27, 2015