Concatenated Node 0.8 streams don't get proper backpressure #87
Comments
This seems to be a general problem. This case here has the same problem; I'm using EventStream here:

    _(['ab', 'c\nd', 'fg']).through(es.split()).through(es.join('-')).each(h.log)
    // -
    // dfg
The solution might be to wrap all arguments to functions which accept Node streams in the Highland stream constructor. Can you add a test case which demonstrates this problem? If anyone wants to go ahead and make this change, then I'll be happy to merge it :)
Wrapping them in the constructor was the first thing I tried. AFAICT, there either needs to be an option you can pass to the Stream constructor to eagerly pipe, or else eager piping needs to become the default. Is there ever a reason why the Stream constructor should not immediately start consuming a Node stream it's given?
Yes, there can be. @pjeby, could you put together a pull request with a test that demonstrates how you'd like this to work?
That's tricky. I know the problem happens with streams that are already flowing. If the stream has already been unpaused, it pushes data whether or not anyone is listening yet. So, basically, if you wrap an already-unpaused Node stream, you're gonna have a hard time. I don't know if there's any way to tell in advance that you have an unpaused stream, though, as my Stream-fu is lacking. ;-)
Ok, I've posted PR #89 for this. I also added code to make the test pass, because checking in code with a failing test just seems so wrong. ;-)

Although it's true that piping will resume a paused Node stream, it's also the case that Highland will immediately exert backpressure, so I'm not sure what trouble could actually be caused. In order for there to be a problem, you'd need an opened Node stream that was also somehow "lazy" and wasn't going to really open the file until you tried to read from it or something. I'm not aware of any Node streams that work that way, though: AFAIK they all do the opening when you open them, and begin producing data (or at least buffering it) unless paused. (That being said, my Stream-fu is yet weak, so perhaps I can learn something new here.)

I have also noticed that there are certain signs one can use to detect a stream that has definitely been piped to, or that is definitely in a state where it's already flowing. These are just heuristics, though: i.e., looking for attached 'data' listeners or similar internal state.

Despite the possible heuristics, however, I think that eager piping is probably the way to go in general, or else with time we'll see lots more issues like this one, for some new example of a stream that needs different heuristics for you to know that it's already flowing.
The way Node.js streams work depends on the version of Node you are using. Before 0.10 they just pushed events out at you; if you attached a 'data' listener too late, you were screwed. With 0.10 they come in paused mode, but switch into this 0.8 mode as soon as you listen for 'data'.

On the topic of eagerness: I'm not that sure about this idea. Highland streams are lazily evaluated, thanks to pull-based reading. Even Node.js streams are pull-based by default now (with 'readable' + .read() in paused mode), and that's a good idea.
Okay, so is there any reliable way to tell which of those various states a readable stream is in? Because it sounds like that's what's needed. Can we even reliably tell which protocol a stream is following, let alone what state it's in?
Okay, some more experimenting shows that the problem is indeed specific to 0.8 streams, or 0.10 streams which have had a 'data' event registered, putting them into flowing mode. Tentatively, it looks like there is state on the stream one can inspect to detect this. Does this make sense? Should I change the pull request accordingly?
You can still pause them. In 0.10 it's reliable. I wouldn't care much about 0.8, since those streams are pretty much obsolete.
@pjeby I've merged your PR, since as you say it immediately exerts back-pressure, and I also think few Node streams are expecting to be used in a strictly lazy fashion like Highland. That said, if we can reliably detect a situation where a stream is paused, we should do our best not to resume it. My only concern is that another listener might bind to the stream and then we'd lose data (although Node 0.10.x streams don't really work with multiple consumers).
They do, but only via 'data' pushing, since you can attach many listeners to the events. They don't work in pull mode, because stream.read() removes items from the buffer, AFAIK. 0.8 streams don't have read() anyway.
@greelgorke yes, that's what I mean: using them in non-flowing mode means you can't really have multiple consumers. Currently we're having to weigh the case for strictly lazy evaluation of streams against the possibility of losing data. Since I don't expect many Node streams to be very strict on lazy eval using pause/resume (e.g., not opening a file before first data is read), I'm currently thinking the eager approach is the safer trade-off.
I'll fiddle a bit with that. Probably we have a solution in the code already, but just don't use it everywhere.
Right. Basically there are only two possible cases where eager piping can actually wake up something that wasn't already moving:

1. A stream that has been explicitly paused, or
2. A stream that is strictly lazy, doing no real work until something reads from it.

AFAIK, Node's built-in streams never do number 2, and number 1 is hard to detect reliably. And as you say, you can still lose data in case number 1 if somebody attaches a pipe or data listener after it's been passed to Highland.

It kind of comes down to policy, as to whether Highland should prioritize laziness or not silently dropping data on the floor, in a situation where you can't reliably do both. Whichever way the policy goes, having a clear policy makes it easier for people to reason about its behavior: e.g., if the policy is "wrapping a Node stream consumes it eagerly," then you know to arrange for any laziness yourself.

Either way, guessing what needs to be done is probably a bad idea. As the Zen of Python says, explicit is better than implicit, and in the face of ambiguity, refuse the temptation to guess. ;-)
I'd be tempted to go with "Node streams aren't reliably lazy, so Highland shouldn't try to treat them as if they were."
Works for me. I took a quick look at @greelgorke's code, and it looks to me like the actual issue there is the way resume is invoked synchronously.

Basically, the issue is that if you resume a pipe synchronously, then your first item is always going to fall on the floor, unless the thing you're piping to is already chained to everything it needs to be chained to. (Which isn't going to happen, because of the order in which chains are built: you'd have to build them right-to-left in order for this to not be a problem.)

In general, calls to Highland stream methods should not result in callbacks being fired before the method has a chance to return. But in this case, that's what's happening. It seems the heart of this, at least in the current instance, is that resume can deliver data synchronously.

(This is one reason why people are down on jQuery promises vs. Promises/A+: because jQuery promises can fire off callbacks while you're still setting them up. In general, callbacks shouldn't fire while the pipeline is still being built.)

I don't know if there are any other parts of Highland besides resume that deliver data synchronously like this, but if so, they probably need the same treatment.

I think the correct fix is probably to change those tests to check their results asynchronously. To put it another way, I don't think that part of the behavior should be specified as happening synchronously.

If you like, I can take a look at what would be required to change resume to always run asynchronously.

(I think making resume always async will also simplify the code a bit, since there won't be any need to keep track of whether resume is recursing or how many times it was called; it'll simply be called async each time.)
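The idea of an always-asynchronous resume can be sketched with a toy source (hypothetical names, not Highland's API or implementation): by deferring delivery with setImmediate, no consumer callback can fire before the caller finishes wiring up the chain.

```javascript
// Toy pull source whose resume() always defers delivery to the next
// turn of the event loop, so chains built left-to-right are fully wired
// before the first item arrives.
function makeSource(items) {
  const consumers = [];
  const pending = items.slice();
  return {
    each(fn) {
      consumers.push(fn);
      this.resume();
      return this;
    },
    resume() {
      // Never deliver synchronously: schedule delivery instead.
      setImmediate(() => {
        while (pending.length) {
          const item = pending.shift();
          consumers.forEach(fn => fn(item));
        }
      });
    },
  };
}

const seen = [];
makeSource(['a', 'b']).each(x => seen.push(x));

console.log(seen.length); // 0: nothing was delivered synchronously
setImmediate(() => console.log(seen)); // [ 'a', 'b' ]
```

Because delivery is always deferred, there is also no need to track whether resume is re-entrant, which matches the simplification suggested above.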
BTW, the tests currently fail for paused Node streams concatenated in Node 0.11.x: it appears they get resumed, but in 0.10.x they don't.
The test is checking the wrong thing: the stream is resumed, yes, but it's also immediately paused again, so no data is actually lost.

In Node 0.10, pause and resume are advisory, and once a stream is in "flowing" mode, it stays that way. In 0.11, pause and resume actually pause and resume the flowing mode. Adding some console.logs, what I see that failing test doing on 0.11 is: the stream gets briefly resumed and then paused again before any data is emitted.
In other words, it actually works correctly. The test is overspecified by insisting that the stream never be resumed at all, rather than that no data be lost.

This is what I mean about the tests in general overspecifying what should happen in the current event-loop pass. It looks to me that Node 0.11 is actually doing it right in terms of its event timing now.
If you use _.concat to concatenate two event-driven Node streams (e.g. a pair of gulp.src() streams), you can lose items unless you explicitly .pipe() the streams to new Highland streams first. That is, concatenating the wrapped Node streams directly will not always output everything matched by both patterns, while piping each source into a fresh Highland stream before concatenating will.

The problem appears to be caused by Highland's lazy initialization of converted Node streams, causing some items to be dropped. In particular, the stream being concatenated isn't listened to until the entire first stream has been consumed. (Hence the workaround of explicitly piping.)