How come it's so much faster than LoDash? #137
Comments
Hey @pozadi! Great questions :)
Thanks. They are micro-benchmarks, so the intent is only to provide a feel for how much overhead each lib is likely to introduce. The difference in real applications will vary. The tests try to give each lib the best chance to run as fast as possible. (if there's a better way to setup the tests for kefir, please let me know!)
I definitely don't recommend most as a replacement for lodash. Lodash is the way to go for synchronous collection processing. Most is intended for event-based reactive programming.
Most is designed to be extremely optimization and inlining friendly. That's probably the biggest factor, since optimized/inlined code can easily be 100x faster than non-optimized code. So, another way to think about this question is: what are other libs doing that prevents them from being as fast as they could be? For example, we've tried really hard to ensure that the internal combinator objects (e.g. Map, Filter, etc.) have properties that can always be accessed via fast offsets, never get turned into hashtables, and never change shape after construction, to avoid polymorphism penalties. We've kept methods and functions tiny for inlining, and tried to ensure they won't be deoptimized for type check failures, etc, etc.

If you run code like the following in io.js 2.0 with the VM's optimization and inlining tracing turned on:

```js
var most = require('most');

var n = 1000000;
var a = new Array(n);
for (var i = 0; i < a.length; ++i) {
  a[i] = i;
}

function even(x) { return x % 2 === 0; }
function add1(x) { return x + 1; }
function sum(x, y) { return x + y; }

function log(x) {
  console.log(x);
}

most.from(a).filter(even).map(add1).reduce(sum, 0)
  .then(log);
```

you'll see that nearly everything gets optimized and/or inlined, and then never gets deoptimized (there's probably an initial deopt, as v8 figures out some stuff).
Nearly everything got inlined and never had to be deoptimized before the final result was printed. If you try something similar with other libs, including lodash, you'll see that the VM either isn't able to inline as much, or ends up having to deopt a lot.

There are other factors as well. There is almost no scope chain capturing, which can be surprisingly expensive. Try/catch has been isolated to very tiny helper functions (many (all?) current generation VMs can't optimize try/catch, but future VMs will be able to, e.g. v8 Turbofan). The map and filter fusion helps somewhat (interestingly, it can also hurt performance in certain cases, but seems to be a win on average). We also optimize for the single observer case, since that's by far the most common.

Some of this stuff has proven to be completely unintuitive, and we've spent a good amount of time profiling code to find and fix performance issues, and to compare various approaches.
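The two techniques described above (fixed-shape combinator objects, and isolating try/catch in tiny helpers) can be sketched roughly like this. This is an illustrative sketch, not most's actual source; `Filter` and `tryEvent` are made-up names:

```js
// Sketch only: illustrates the patterns described above, not most's internals.

// 1. A combinator object with a fixed shape: every property is assigned in
//    the constructor and never added or deleted afterwards, so the VM can
//    keep one hidden class and use fast fixed-offset property access.
function Filter(predicate, source) {
  this.predicate = predicate; // always a function, set once
  this.source = source;       // never changes shape after construction
}

// 2. Try/catch isolated in a tiny helper, so the surrounding hot code
//    stays optimizable even on VMs that can't optimize try/catch.
function tryEvent(f, x) {
  try {
    return { ok: true, value: f(x) };
  } catch (e) {
    return { ok: false, value: e };
  }
}

// The hot path itself contains no try/catch and touches only fixed offsets.
Filter.prototype.event = function (x) {
  var r = tryEvent(this.predicate, x);
  return r.ok && r.value === true;
};

var f = new Filter(function (x) { return x % 2 === 0; }, null);
```

The key design point is that neither `Filter` instances nor the helper ever change shape or catch exceptions in hot code, so the VM has no reason to deoptimize them.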
Thanks, this is extremely helpful! I've never tried to use tools to detect deopt/inlining; I thought they were too complex for me (e.g. requiring knowledge of assembler or something), but it doesn't look like that's the case :)
Yes, I didn't mean to actually use most in place of lodash. I wanted to ask whether the tests show that there are opportunities for optimization in lodash. Or are the results expected because, for example, lodash builds an array as the final result, while most just calls the subscriber with each value, or something like that?
I think I've tried the fusion at some point, but it didn't gain much, so I decided not to add that complication to the code, if I remember right.
That's interesting. Could you tell me more about that, or give a link to the relevant piece of code? I've tried something like:

```js
if (subscribers.length === 1) {
  subscribers[0]();
} else {
  // the loop...
}
```

But it turned out that a loop over a single-element array works just as fast (which is pretty intuitive, anyway).
Hiya! For a bit more on the inlining topic from the lodash side, see this thread on jsblocks. The TL;DR is that the lodash chaining implementation of shortcut fusion relies on a loop which calls a list of callbacks. This prevents inlining unless the engine supports polymorphic function inlining. Chakra, the JS engine in Microsoft Edge, supports this kind of inlining up to something like 4 functions.

The win here though isn't really the function inlining, as that gets busted pretty easily when you go beyond trivial cases. The bigger win is shortcut fusion:

```js
_(large100kArray).map(add).filter(even).take(50).value();
// => 100 iterations performed instead of 200,000 iterations
```

The non-chained form of lodash will benefit from inlining but won't get the bigger win of shortcut fusion. lodash's chaining syntax has some baggage because it tracks normal chaining and lazy chaining, to be able to bail out to normal chaining for unsupported value types / callback signatures or mixed (lazy + regular) method sequences. Because lodash's chained execution is deferred until an implicit or explicit `value()` call, sequences can also be created, passed around, and reused:

```js
var seq = _([]).map(add).filter(even).take(50);

// then later, maybe in another function
return seq.plant(newArray).value();
```

This is the gist behind lodash's composed performance being a bit better. For a different take there's jsblocks, which uses method compilation to transform sequences in a way that benefits from inlining as well.

A side note: engines are starting to optimize the try-block of try-catches. I know FF and Chakra (in MS Edge) have optimizations to avoid de-opts until an error is thrown.
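The shortcut fusion idea can be sketched with generators. This is a hypothetical illustration, not lodash's implementation; `mapIter`, `filterIter`, `takeIter`, and the `pulls` counter are invented for this sketch:

```js
// Sketch of shortcut fusion: each element flows through the whole pipeline
// one at a time, so take(50) stops pulling from the source early instead of
// building intermediate 100k-element arrays at every step.
function* mapIter(f, it) { for (const x of it) yield f(x); }
function* filterIter(p, it) { for (const x of it) if (p(x)) yield x; }
function* takeIter(n, it) { for (const x of it) { if (n-- <= 0) return; yield x; } }

let pulls = 0; // counts how many elements the source actually produced
function* source(n) { for (let i = 0; i < n; i++) { pulls++; yield i; } }

const add = x => x + 1;
const even = x => x % 2 === 0;

const result = [...takeIter(50, filterIter(even, mapIter(add, source(100000))))];
// Only on the order of 100 source elements are pulled, not 100,000.
```

The fused pipeline stops as soon as 50 matching values have been produced, which is the "100 iterations instead of 200,000" effect described above.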
Hey @jdalton, very happy to see the King of Optimization is on the case! :)
Wow, thanks for all the info, @jdalton. That's the first time I've seen jsblocks' compilation. Very interesting stuff--I need to take a closer look. Yeah, I agree that the inlining won't always kick in to the degree it does in some of the perf tests. When it does, though, the benefit seems to be pretty good. When it doesn't, most still gets highly optimized and almost never deopts.
For sure. Pipelining provides a huge win over non-pipelined processing (e.g. Array). I think all of the libs are pipelining in all of the perf tests. The reactive libs basically have to work that way, since they deal with infinite asynchronous event sequences and can't wait for the last event before moving on to the next step, and they don't need to build large intermediate data structures (I don't know much about the internals of highland, but I'm guessing it pipelines, too). Since they're all pipelining, it's likely a confluence of other things that are contributing to most's perf, and optimization is probably a big one of those.
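To make the pipelining point concrete, here's a minimal comparison sketch (variable names are illustrative) between the non-pipelined Array style, which allocates an intermediate array at every step, and a fused loop that pushes each value through the whole chain with no intermediates:

```js
const n = 100000;
const data = Array.from({ length: n }, (_, i) => i);

// Non-pipelined: map and filter each allocate an intermediate array
// of up to n elements before reduce ever runs.
const viaArrays = data
  .map(x => x + 1)
  .filter(x => x % 2 === 0)
  .reduce((a, b) => a + b, 0);

// Pipelined: each value flows through map, filter, and reduce in one pass;
// no intermediate arrays are ever built.
let viaPipeline = 0;
for (let i = 0; i < data.length; i++) {
  const mapped = data[i] + 1;
  if (mapped % 2 === 0) viaPipeline += mapped;
}
```

Both compute the same result; the difference is purely in allocation and memory traffic, which is what pipelining buys even before any inlining happens.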
Oh cool, I didn't know about FF and Chakra (in Edge) having those optimizations already. TIL :)
I haven't looked closely at lodash's pipelining, and specifically at how
I'm also not completely convinced it's a worthwhile tradeoff, but the code isn't too bad, so I'm willing to leave it in for a while and see how it performs over time.
Basically, by default, most doesn't keep subscription lists at each node. Multicast subscriptions are opt-in (some stream types enable shared subscriptions by default).
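A rough sketch of that design choice (purely illustrative; `Node` and `Multicast` are invented names, not most's internals): the default node forwards each event to exactly one sink with a direct, monomorphic call, and a subscriber list only exists when you explicitly opt in to multicast.

```js
// Default case: exactly one observer, one direct call, no list to manage.
function Node() { this.sink = null; }
Node.prototype.event = function (x) { this.sink.event(x); };

// Opt-in multicast: only this wrapper pays for a subscriber list and a loop.
function Multicast(source) {
  this.sinks = [];
  source.sink = this; // hypothetical wiring for this sketch
}
Multicast.prototype.event = function (x) {
  for (var i = 0; i < this.sinks.length; i++) this.sinks[i].event(x);
};

var src = new Node();
var fanout = new Multicast(src);
var seen = [];
fanout.sinks.push({ event: function (x) { seen.push('a' + x); } });
fanout.sinks.push({ event: function (x) { seen.push('b' + x); } });
src.event(1);
```

Since the single-observer case is by far the most common, every stream that never needs multicast avoids the list bookkeeping entirely.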
lodash doesn't currently support pipelining for
Same here. It's the shortcut fusion that it enables that's the big win.
Thanks @briancavalier and @jdalton for all the interesting stuff you're posting; this will certainly help me and others improve performance in whatever we're working on :) To me it looks like the major reason for Most's great performance is that it was developed with performance as a goal from the beginning. I was trying to do something similar with Kefir, but at a lower level: only by looking at the benchmarks and applying my intuition about how v8 should work. But with Most you looked at how it actually works internally, which enabled more optimization opportunities.
This is misleading and just another reason microbenchmarks don't tell the whole story. In reality you would not have a case where those functions always get passed the same function, or even the same 4 functions. What gets inlined would be very different if you had fed realistic type feedback before running the benchmark (say, 5 different functions passed to each of these methods, although in v8 just 1 additional function is enough to ruin it, in case you're not running Chakra). The same mistake was made here, where optional properties put on the prototype look good in a microbenchmark because the type feedback is unrealistic.
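The type feedback point can be demonstrated with a tiny harness (all names here are illustrative): warming the same call site with several distinct callbacks before measuring gives the engine realistic, polymorphic feedback, instead of the artificially monomorphic feedback a naive microbenchmark produces.

```js
// A single hot call site that invokes a user-supplied callback.
function applyAll(f, arr) {
  let acc = 0;
  for (let i = 0; i < arr.length; i++) acc += f(arr[i]);
  return acc;
}

const data = [1, 2, 3, 4, 5];

// Realistic warm-up: five different function identities hit the same
// call site, the way a real application would use a library method.
const warmups = [x => x, x => x * 2, x => x + 1, x => -x, x => x * x];
for (const f of warmups) applyAll(f, data);

// Only now measure. The call site has seen many callbacks, so the engine
// will typically no longer inline any single one of them here.
const measured = applyAll(x => x + 1, data);
```

A benchmark that skips the warm-up loop measures a call site the engine believes is monomorphic, which is exactly the unrealistic feedback being criticized above.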
Yep, I agree. That's why I said exactly that at the very beginning of the thread, and both @jdalton and I have pointed out up-thread that the extreme inlining won't always happen. A while back, I ran most through Bluebird's doxbee test, since that test now includes other reactive libs. It does fairly well: it's faster than many promise implementations, and considerably faster and less memory intensive than the other reactive libs. The test isn't really a good use case for event streams, so I feel it's quite an encouraging result. I figure we can probably do better, though.
But my point is that it never happens, even with polymorphic inlining support.
Sure you could. Lodash at least punts to composition of its lower-level methods to create others. It's easy to imagine something like

```js
function thing(a) {
  return _(a).map(iteratee).filter(predicate).foo(yada).bar(zada);
}
```

being used.
So across an entire application you are always using the exact same predicate function and exact same mapper function? Seems absurdly unlikely.
That's method composition for ya and it happens a lot in JS land.
Ok, composition happens a lot, but what does that have to do with anything? Let's say I have an app that never uses filtering for anything other than getting the even numbers out of a stream or something (this alone is already a ridiculous proposition):

```js
function evens(stream) {
  return stream.filter(function(value) {
    return value % 2 === 0;
  });
}
```

So because this always passes a different function to `filter`, the callback never gets inlined. The only case where the callback is getting inlined is if it's hoisted out and no other code anywhere else in the application ever uses `filter`.
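The function-identity issue here is easy to demonstrate (`makePredicate` and `isEven` are illustrative names): every call that creates the callback inline produces a brand new function object, even though it is the same syntactic function, while a hoisted predicate has one stable identity.

```js
// Each call returns a NEW closure: same source text, different identity.
// A callback call site therefore sees many distinct functions over time.
function makePredicate() {
  return function (value) { return value % 2 === 0; };
}
const p1 = makePredicate();
const p2 = makePredicate();
// p1 !== p2, even though they behave identically.

// Hoisting the predicate out gives one identity that every caller shares,
// which is the only shape of code where callback inlining can stick.
const isEven = v => v % 2 === 0;
```

This is why wrapping the `filter` call in a helper like `evens` above defeats inlining: the engine tracks function identities, not source text.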
It provides an example of when such an optimization would kick in.
It only kicks in if
I can't speak to v8 off the top of my head, though judging by their scores on certain benchmarks I think they tackle it in some way, but Chakra should inline the iteratee in that case.

Speaking of hoisting, you may dig https://github.com/codemix/babel-plugin-closure-elimination.
Thanks, I didn't know about that plugin; I dig :) V8 only considers function identities, so different identities of the same syntactic function are considered entirely different in cases like this. But that isn't really the point: applications are going to use far more different kinds of predicates than just a predicate that determines whether a number is even.
Yap, @briancavalier and I have already covered that.
That's just confusing then, you just argued otherwise for 2 hours :P
Naw. You were trying to take the wind out of the sails of the inlining efforts of most.js.
Inlining of the user callbacks does not apply in any engine, because applications need more than 1 (or 4) different predicate functions. And it also has nothing to do with inlining effort, since it's the user who provides the callback to be inlined in the first place. So there is still a big misunderstanding :)
I've already given examples to the contrary.
I get where you're coming from, I've just tried to expand your view a bit.
@jdalton and @petkaantonov Thank you for the lively discussion :) There's a ton of great information in this thread. I learned a lot and I think anyone who reads it will, too. It seems we've covered the aspects of why most.js does well in the microbenchmarks, as well as why that may or may not translate directly into application performance on various engines. @pozadi Closing this, but if you feel there's a need to reopen, please do.
Hi!
I'm really impressed with the performance. The test results are just incredible! And I can fully accept that it's much faster than other RP libs. Being the author of one, I know how much stuff happens on each event dispatch, and that theoretically it can be boiled down to a single function call.
Still, I don't understand how it can be faster, and so much faster, than LoDash, which doesn't care about asynchrony, multiple subscribers, etc. Does Most do the same job as LoDash, i.e. can we use Most instead of LoDash?
Thanks for any insights!