Trampoline-based promise chaining (with reduced nextTick usage) #102

Closed
wants to merge 19 commits into
from

3 participants

@Twisol

Continued from #87. Sorry about the monolithic commit; I started from an empty file and built up from there.

@Twisol Twisol Rewrite to utilize a trampline-based method of asynchronous resolution.
By utilizing a trampoline, promises are prevented from blowing the call stack.
This implementation also allows consecutive promises to be handled in the
same tick, improving performance.
d3bf3ca
@briancavalier
The Javascript Architectural Toolkit member

No worries. It's actually easier to review one diff like this sometimes.

@Twisol

I said this in the previous pull request, but I should probably say it again here: All tests are green except the poll.js tests. Those timeout, and I'm not yet sure why.

@Twisol

It bothers me a bit that when.reject() isn't actually shorthand for when.defer().reject(). The deferred itself has the nice property that, regardless of whether you're passing a value in at the start, or an intermediate value in the middle, the semantics are the same. The when.reject() function breaks that symmetry by evaluating reasons that are promises, when the actual reject function just passes along whatever it is given.

I think either behavior is valid, but it would be best to pick one and be consistent. Personally, I'm in favor of always evaluating reasons that are promises, since that also reinforces the symmetry between success and failure paths, as well as a similar symmetry between throw and return. Unfortunately, the Promises/A+ spec disagrees with my choice. I've changed my mind - it's probably best to just consistently pass the reason through verbatim.

This behavior occurs on both my modified branch and the original async-resolutions branch.

Example, using the fakeResolved from test/defer.js:

> when.defer().reject(fakeResolved(42)).then(undefined, function(x) {console.log(x)});
{ then: [Function] }

> when.reject(fakeResolved(42)).then(undefined, function(x) {console.log(x)});
42
The Javascript Architectural Toolkit member

This difference is known. In some ways, I like it: it's like flatMap or join that always produces a rejection whose reason is a non-promise. That said, the inconsistency does bother me.

@Twisol Twisol Optimize promise chaining.
If the necessary handler to process a promise value isn't provided, return the
same promise rather than unnecessarily creating a new one.

Example: .then(f).otherwise(g). If f does not fail, then the promise should not
be unecessarily unboxed and reboxed when skipping g.
ea1019d
@Twisol

I've made some progress toward solving the failing poll.js tests, but I'm still mystified as to what's actually happening. You can temporarily prevent the tests from timing out by modifying cancelable.js:

// Change this:
deferred.promise.then(delegate.resolve, delegate.reject, delegate.progress);

// To this:
deferred.promise.then(
  function(x) { delegate.resolve(x);  },
  function(x) { delegate.reject(x);   },
  function(x) { delegate.progress(x); });

A similar change can also be made in the first poll.js test, with delay(100).then(p.cancel).

The only change is that the return values are suppressed. Somehow, if a promise is returned by a callback in certain situations, it causes things to go haywire. It's obvious that this fix is a hacky workaround at best, and that there's a behavioral bug in my implementation. Problem is, I'm not sure how to write a test against this.

What's more, this seems to be related to my earlier problem with an asynchronous infinite loop. It appears my solution went too far in the other direction. (You can recreate this issue by adding returns to the callbacks in liveThen.)

@Twisol

Hahaha, I feel like a moron. I'm banging my head against this problem, and it turns out I accidentally solved it already with one of my refactorings above ("Normalize naming convention"). Apparently the issue was that _handlers was being pulled out from underneath the callback fire bounces back to the trampoline. Closing over _handlers fixed it, but don't ask me why.

With that, all tests are green!

@Twisol

Hey @briancavalier, I don't know if this is the best place to put this, but I have a concern about your approach to nextTick minimization. If a promise is resolved from within the handler chain of another promise, both promises will execute in an interleaved fashion, without any nextTick delay.

Example:

(function(old) {
  process.nextTick = function(f) {
    old(function() {
      console.log("Tick!");
      f();
    });
  };
})(process.nextTick);

var when = require("when");

var d1 = when.defer();
var d2 = when.defer();

d1.then(function(x) {
  console.log(x);
  d2.resolve(x * 8);
});

d2.then(function(x) {
  console.log(x);
});

d1.resolve(20);

In dev-200, the output is "Tick! 20 160" (with newlines as necessary). In my branch, the output is "Tick! 20 Tick! 160".

@briancavalier
The Javascript Architectural Toolkit member

Yeah, this is actually intentional :)

The primary concern re: interleaving hazards is: each handler must execute in its own, clean, stack to avoid data sharing hazards due to variables captured on the stack. The key is the word "clean". Handlers must also always run in the order they were registered.

So, the theory I'm basing this shared trampoline on is: While we're in not executing the trampoline, one way to ensure a function is called with a clean stack is to push it into next tick to be processed by the trampoline--we basically have to do this. Once we're in the trampoline, though, the stack has been cleared, and our trampoline effectively becomes the new basis for "clean". So we can call any/all handlers we want without fear of interference due to captured variables.

And, because it's all still strictly ordered in a single queue, I believe it preserves the handler ordering constraint.

Does that all sound right? It's def good to have you sanity check my thinking here :)

@Twisol

I'm not at home right now, so I'll provide example code later, but I wanted to reply. ;)

Your rationale mimics mine; I also have a single global execution queue. The additional thing I do is delay resolutions regardless of whether we're in the trampoline or not. What if you have something like this?

p1.then(f).then(g)
p2.then(h)
function f() { p2.resolve() }

I think you'll see an order of fhg rather than fgh. But again, I'll check when I get home.

@Twisol

Confirmed!

Yours: f h g
Mine: f g h

when = require("when");
var d1 = when.defer();
var d2 = when.defer();

function f() {
  console.log("f");
  d2.resolve();
}
function g() { console.log("g"); }
function h() { console.log("h"); }

d1.then(f).then(g);
d2.then(h);

d1.resolve();
@briancavalier
The Javascript Architectural Toolkit member

Very interesting test case, well done!

So, the question is: Is this a problem? Short answer is: I don't think so :) but I would love some sanity checking! Here we go ...

Earlier, I said this:

Handlers must also always run in the order they were registered.

This statement was based on Promises/A+ 3.2.5, which was specifically written to refer to the handlers registered with the same promise. I think my mentioning it in the discussion about the shared trampoling was confusing since it probably implied a global handler ordering! D'oh! Sorry about that.

Currently, there is no requirement for a global ordering of handlers. Interestingly enough, that means that fhg is acceptable. So, that made me want to try your test in some other promise impls to see what they do:

when.js dev-200: fhg
when.js 1.8.0: fhg
avow 1.0: fhg
Q: fhg

Interesting! Ok, so Promises/A+ doesn't specify global ordering, only per-promise ordering, and several implementations seem to agree it's ok. Hmmm ... still a little nagging tho!

So then, after a bit more thinking, I came up with this modified example:

var when, d1, d2, delay;

when = require("./when");
delay = require('./delay');

d1 = when.defer();
d2 = when.defer();

d1.promise.then(f).then(g);
d2.promise.then(h);

d1.resolve();

function f() {
  console.log("f");
  d2.resolve();
  // Delay next link in the promise chain, i.e. g
  return delay(1000);
}
function g() { console.log("g"); }
function h() { console.log("h"); }

This is identical except that f() returns a delay. I'm betting this will print fhg in your impl, or rather, I believe that it should. Even though the handlers are still registered in the order fgh, it doesn't make sense to me that h should be forced to wait 1 second to run after g.

Do you think that's a valid reasoning? I'm still noodling on it a bit, but it seems pretty convincing to me that global ordering based on cross-promise then order isn't something we need to worry about.

Thoughts? Glad we're discussing this before 2.0 has been released :)

@Twisol

Hmm... It bothers me on a fundamental level that the effects of multiple stimuli are allowed to interleave. One stimulus should be fully processed before another stimulus is handled.

What I think is happening here, is that interleaving stimuli creates a form of pre-emptive multitasking. Like threads, but only while control is being passed from one handler to the next. Data can be manipulated between two handlers without any hint of its source, making it difficult to track the behavior of any given stimulus. It basically creates code injection points between every promise... I'm not sure I'm explaining myself well, but the more I think about it, the grosser it feels.

I'll try to come up with another code snippet.

@Twisol

Another thing I notice is that your approach goes down the dependency graph breadth-first, while mine is depth-first. That plays into the same interleaving-of-handlers thing, but within a single stimulus. I have an easy example of that kind of issue here:

(function() {
  var when = require("when");

  var x = 42;
  var d1 = when.defer();

  // one user of this promise from somewhere
  d1.then(f).then(g);

  function f() { console.log('f: ', x); }
  function g() { console.log('g: ', x); }

  // another user elsewhere in the codebase
  d1.then(injection);

  function injection() { x = 101; console.log("inject"); }


  d1.resolve();
})();

Mine: f 42, g 42, inject
Yours: f 42, inject, g 101

In this example, a process can be influenced by a seemingly unrelated one that was thenned elsewhere in the program. These threads of execution should be immune to such injection attacks. (Even if it's not an "attack", it can easily cause confusion.)

@Twisol

Here's an example of how two separate stimuli can interfere with eachother. Interestingly, it's just a specific case of the above breadth-first problem... which makes sense with hindsight.

(function() {
  var when = require("when");

  var x = 42;
  var d1 = when.defer();

  // one user of this promise from somewhere
  d1.then(f).then(g).then(h);

  function f() { console.log('f: ', x); }
  function g() { console.log('g: ', x); }
  function h() { console.log('h: ', x); }

  // another user elsewhere in the codebase
  var d2 = when.defer();
  d2.then(injection2);
  d1.then(injection1);

  function injection2() { x = 101; console.log('injection2'); }
  function injection1() { d2.resolve(); console.log('injection1'); }


  d1.resolve();
})();

Notice how a third handler had to be put on the first user's chain so that the injection would have time to propagate through the queue.

Yours:

f:  42
injection1
g:  42
injection2
h:  101

Mine:

f:  42
g:  42
h:  42
injection1
injection2
@Twisol

A slightly more convincing version of the above, where the two stimuli are entirely unrelated except for when they're resolved:

(function() {
  var when = require("when");

  var x = 42;
  var d1 = when.defer();

  // one user of this promise from somewhere
  d1.then(f).then(g);

  function f() { console.log('f: ', x); }
  function g() { console.log('g: ', x); }

  // another user elsewhere in the codebase
  var d2 = when.defer();
  d2.then(injection);

  function injection() { x = 101; console.log('injection'); }


  d1.resolve();
  d2.resolve();
})();

I promise (ha ha) that this is my last example. At least, until you respond. 😉

@unscriptable
The Javascript Architectural Toolkit member

Very interesting, @Twisol! Any idea if there is a performance difference between the breadth-first or depth-first algorithms?

@Twisol

@unscriptable, I did ponder about that a little bit. I think in terms of speed it's negligible, because unlike a depth/breadth-first search, you have to traverse the whole graph either way. As for space, Brian's approach saves by pushing only one function onto the trampoline per promise. My approach is essentially required to push each one individually in order to preserve the depth-first ordering - I can't just run all of the handlers in the same "bounce".

@briancavalier
The Javascript Architectural Toolkit member

Performance: I did some quick, informal tests. dev-200 is ~10% faster for the cases (again, fairly limited) I tested. That's close enough to make me think that @Twisol's could be brought into the same ballpark. I bet the space is roughly equivalent, or could be made to be: dev-200 also basically puts 1 function per handler into some array somewhere. So, I'm betting they both could end up being O(n) for space where n = number of handlers.

Utimately, performance would come down to the total number of actual ticks are used to run all the pending handlers. Otherwise, it's just a matter of blasting through an array as fast as possible. Seems like the two algorithms minimize ticks similarly.

BFS vs. DFS: I'm not actually saying that DFS is wrong, or even that it's worse than BFS. Sorry if I came across that way!

What I'm getting at is that since DFS and BFS are both ordered node visitations, it's always possible to construct "injection attacks" (although not really attacks, as @Twisol noted) where an earlier-visited node (handler) modifies a variable shared with a later-visited node. A random traversal (although incorrect for other obvious reasons!) would have the same issue.

I don't think we've yet discovered criteria by which we can determine which traversal is "more correct", if one of them even is. For example, a DFS would visit z before foo:

p.then(a).then(b)....then(z);
p.then(foo);

It's not clear to me that visiting z before foo is correct--again, "correct" by what criteria? And to be completely fair, it's equally unclear to me whether a BFS visiting z before foo is "more correct" here (even though my current mental bias makes we want to think it is):

p.then(a).then(foo);
p.then(b);
...
p.then(z);

Here's how I think we should proceed:

  1. @Twisol, if you're willing to dive into working out some concrete criteria by we could determine "correctness", I'd certainly be willing to help (as best I can given time), review, etc.
  2. We know that dev-200 behaves like 1.8.0 and like other established promise libs. I am also quite happy with the simplicity and performance of the current dev-200 code. I'm inclined to stay with it for dev-200 until we have concrete evidence by which to evaluate BFS vs. DFS (or vs. some other as-yet-imagined approach that @Twisol might come up with :) )
@Twisol

dev-200 is ~10% faster for the cases (again, fairly limited) I tested.

I also enforce a nextTick when any deferred is resolved, which may account for some of that difference. I count this as less important than the breadth-versus-depth issue, but without enforcing a nextTick, you allow for event loop starvation that's much more outside the user's control. The starvation we previously discussed was easily solved by the user putting delays into their processes at convenient points, but this doesn't provide for any convenient points - the starvation is caused by the introduction of another stimulus, rather than anything internal to the process.

It also interacts poorly with the breadth-vs-depth issue, since supposedly-independent stimuli might actually interfere with each other.

I suspect that both are still much faster than the non-nextTick-reduced version, anyway.

What I'm getting at is that since DFS and BFS are both ordered node visitations, it's always possible to construct "injection attacks" (although not really attacks, as @Twisol noted) where an earlier-visited node (handler) modifies a variable shared with a later-visited node. A random traversal (although incorrect for other obvious reasons!) would have the same issue.

Indeed, both allow you to then onto a promise and inject yourself into the evaluation order. The good thing about DFS, which BFS doesn't provide, is that injection "attacks" can only occur at promises you actually pass outside your control. If you don't pass around a promise, you have a hard guarantee that no mutable state will change in-between two handlers; the only thing you have to worry about are the other handlers thenned onto the same promise before you. This property makes it easier to reason about stimuli, because you can statically analyze the dependency tree and its effects: problematic mutations are constrained to the forks in the tree.

@Twisol, if you're willing to dive into working out some concrete criteria by we could determine "correctness", I'd certainly be willing to help (as best I can given time), review, etc.

Well... I'm kind of biased towards my own approach, obviously. 😛 I hope it doesn't sound like I'm being defensive of my code, but I'm having a hard time coming up with reasons to prefer the dev-200 version. I can at least sum up the reasons to prefer the version in this pull request.

  1. The opportunities to preempt a process should be reduced to visible and identifiable points of the dependency tree.
  2. Unrelated stimuli should not be able to starve the event queue.
@briancavalier
The Javascript Architectural Toolkit member

I'm kind of biased towards my own approach

Understood, it's natural ... I'm fighting against the same :)

I hope it doesn't sound like I'm being defensive of my code

Not at all. I completely respect your approach.

We can solve event loop starvation in dev-200 easily. I have code for it, but removed it for now--I may still reinstate it before 2.0, but not sure yet.

@Twisol

We can solve event loop starvation in dev-200 easily. I have code for it, but removed it for now--I may still reinstate it before 2.0, but not sure yet.

That would be the machinery that counts how many handlers have been called, right? I rather dislike writing code that depends on arbitrary fudge factors like 10000, no disrespect meant; that's something that the user code knows far more about. It also introduces nondeterminacy in certain edge cases. I grant that they're unlikely edge cases, but that makes any resultant bugs all the more obscure. 😧

@Twisol

Hey @briancavalier,

I've been critiquing your approach to the problem, but you've been relatively silent about my code. Is there any behavior you think is incorrect, or some other problem you can think of? I think it would help me argue more effectively one way or the other.

I'd also like to hear other people weigh in, for the same reason. At present, I'm not sure what more to say that I haven't already said.

(I'll also hang out on #cujojs a bit more; thanks for the prod in the other issue.)

@Twisol

Not sure if this was the right file for these tests - I couldn't decide between test/promise.js and test/defer.js.

@briancavalier
The Javascript Architectural Toolkit member

I'm hoping it today .. all this (excellent) discussion left me without time to give it a proper review over the weekend.

@briancavalier
The Javascript Architectural Toolkit member

@Twisol, could you rebase this against the lastest async-resolutions branch? Sorry for the hassle.

@Twisol

No worries. I had to do a merge instead of a rebase, though, since my commits were already on GitHub.

@briancavalier
The Javascript Architectural Toolkit member

Thanks, merges cleanly for me locally now.

@briancavalier
The Javascript Architectural Toolkit member

FWIW, in every current VM we care about, this will be way faster than Function.prototype.bind anyway :)

@briancavalier
The Javascript Architectural Toolkit member

This is a nice optimization. It occurred to me a while back in some Promises/A+ discussions that this kind of thing should be possible. This is def +1 if you can confirm that we never expose the promises returned by fulfilled and rejected outside of when.js.

@briancavalier
The Javascript Architectural Toolkit member

Similarly ...

@briancavalier
The Javascript Architectural Toolkit member

I'll write more about some of the style/ordering changes in an overall comment ... it's tough with lots of small commits.

@briancavalier
The Javascript Architectural Toolkit member

In general, I really like this approach to progress--treating it like a promise chain. I'm still not a fan of the way progress is kind-of-specified in Promises/A, but treating it this way internally is nice.

@briancavalier
The Javascript Architectural Toolkit member

What's your reason for using pop here rather than a for/while and then clobbering/emptying the array? Typically, the latter will be a good bit faster, but I realize you may have a reason for preferring pop here.

The Javascript Architectural Toolkit member

After studying the code a bit more, I see why pop is necessary here. Hmmm, it'd be nice if there were a way to avoid it.

The trampoline stack is emulating an actual callstack, so each record has to be added and removed ASAP. The only difference is that I push all handlers of a promise on at once, rather than waiting to return to the level of that promise again.

One possible optimization would be to keep a "stack pointer". Instead of explicitly pushing and popping throughout the process, I could track the position of the current record, and simply overwrite whatever is above it (or push if sp == stack.length). Once everything is processed, then the list would be emptied. Kind of complicates the code, but perhaps not unreasonably so.

The Javascript Architectural Toolkit member

Right, after I stared at it more, it was clear. I think the extra complication would only be worth it if it led to a significant perf increase.

@briancavalier
The Javascript Architectural Toolkit member

Can you explain the purpose of swapping here? It doesn't change the publicly accessible deferred.promise, nor any reference to it that a caller might have already saved, so I'm wondering what the purpose/benefit is.

If you call then after you've resolved the deferred, we need to have access to this promise (or at least the value you resolved with) - see deadThen a few lines earlier.

The Javascript Architectural Toolkit member

Got it, thanks.

@briancavalier
The Javascript Architectural Toolkit member

It'd be nice if this loop were avoidable by using concat or Array.proto.push.apply instead.

The Javascript Architectural Toolkit member

In a couple quick tests, push(pop()) is basically as fast as anything else I tried in v8. Pretty interesting! Other VMs may not be comparable, tho, not sure.

@briancavalier
The Javascript Architectural Toolkit member

This falls into the "browsers are a PITA" category, but in some quick tests, the perf difference seems to be a good bit larger in Safari WebKit. In v8 (both chrome and node), dev-200 is faster by only 10%, but in Safari, it's faster by 4x. Note, though, that this is without a setImmediate polyfill, so Safari's setTimeout may just suck. Another possible candidate might be the push/pop loops ... IIRC, v8 has optimized the heck out of those, whereas some other VM haven't yet. Hard to say without more investigation, tho.

I haven't tried IE or FF yet.

@briancavalier
The Javascript Architectural Toolkit member

FF's setTimeout is all over the map, so performance varies wildly, but ballpark, dev-200 is approx 1.5-2x faster in FF.

@briancavalier
The Javascript Architectural Toolkit member

It's a little tricky to add review comments inline in the commits, since it started as a total rewrite and now has lots of subsequent smaller commits, but I tried to add some comments where I could. I'll add the rest here shortly.

@Twisol

I hope the use of unshift and apply doesn't outweigh the benefits of concat here.

The Javascript Architectural Toolkit member

Hard to say, honestly. Only way to know would be to setup some tests and try it.

@briancavalier
The Javascript Architectural Toolkit member

I'm having a hard time matching this up with my mental model that a promise's fate is sealed as soon as deferred.resolve()/reject() returns. Can you help me understand why _then is replaced inside invoke rather than outside?

Sure. When you resolve, you schedule the stimulus to occur, but until that tick comes along, it's still a future timeline. You can continue to use 'then' until the timeline begins executing because the semantics of 'then' - specifically, that handlers never execute in the same tick as when they were added - are preserved.

In other words, it doesn't matter when, within a tick, you resolve a deferred. Nothing actually happens until the stimulus begins to propagate.

The Javascript Architectural Toolkit member

Yep, got it ... similar concept as in dev-200. The machinery just looks different, so wanted to make sure I understood.

@briancavalier
The Javascript Architectural Toolkit member

Things I like a lot

  1. The return self optimization in fulfilled()/rejected(). I'm all for this as long as we can verify it's "safe" (you may have already verified).
  2. Progress handling. I really like this, and would love to get it into dev-200.

Things I am still unsure about

  1. The DFS trampoline.

Pros:

  1. May provide a more intuitive ordering for some cases (though the ordering in question is not specified by Promises/A+).
  2. Code is actually quite simple (after I got my head around it). In fairness, the dev-200 code is simple as well.

Cons:

  1. DFS ordering doesn't match 1.x ordering, or the ordering of other popular libraries. Opened Promises/A+ discussion here
  2. Slower performance than dev-200 (with and w/o setImmediate polyfill): 10% in v8, up to 4x in Safari Webkit, 1.5-2x in Mozilla. While performance definitely isn't the most important criteria, in the absense of other criteria it's a valid concern.

I still feel that real evidence is needed in order to justify changing the current ordering. I am absolutely willing to ditch the dev-200 approach and switch if there is evidence. To that end, I posed the question to the Promises/A+ group.

Style

We'll need to bring the code style in line with cujojs conventions before merging parts of it. The main things I noticed are:

  1. Single var statement per scope: var x, y, z not var x; var y; var z
  2. "More important" things before "less important" things. Examples: 1) public API at the top of the file, utility functions at the bottom, 2) in a function: mainline code followed by return, followed by supporting local functions.

Logistics

There are lots of changes unrelated to the main trampoline approach. I think the best strategy going forward is to separate out smaller, focused, atomic changes that we'd like to merge, and create new pull requests against dev-200 for them. Here are the 3 candidates I see:

  1. return self optimization
  2. Progress handling
  3. DFS trampoline
@unscriptable
The Javascript Architectural Toolkit member

Slightly OT: I'm thinking of refactoring poly.js's setImmediate polyfill to use the shotgun approach to ensure it's always as fast as possible. That will hopefully resolve some of the performance issues described here.

@Twisol

I think the best strategy going forward is to separate out smaller, focused, atomic changes that we'd like to merge, and create new pull requests against dev-200 for them

I'm actually not sure what the best procedure to do that is. I have almost no experience with rebase and cherry-pick, either or both of which I imagine would be involved. A nudge in the right direction would be much appreciated!

@briancavalier
The Javascript Architectural Toolkit member

Closing, as we've incorporated the ideas from this into 2.0: nicer progress infrastructure, return self optimization, and a trampoline (although a different trampoline than the one that is implemented here). Thanks @Twisol for the awesome contributions!

@Twisol

Glad I could help!

@Twisol Twisol referenced this pull request in fantasyland/fantasy-land Apr 12, 2013
Closed

Ordering of parallel `chain`s #12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment