Layout algorithm decoupling, Sizing Constraints & Perf Improvements #246

Merged
merged 46 commits into from
Nov 23, 2022

Conversation

Collaborator

@nicoburns nicoburns commented Nov 19, 2022

Objective

Benchmark Results

Pay attention to the units: the results below mix seconds, milliseconds, microseconds, and nanoseconds.

Warning

Note for anyone reading this in the future: the absolute values in these benchmarks turned out to be bunk due to a flaw in the measuring methodology. However, the relative improvement ended up being similar. See the benches folder for up-to-date benchmark results.

| Benchmark | main (w/ new benchmarks) | This Branch | % change |
| --- | --- | --- | --- |
| big trees/10_000 nodes (2-level hierarchy) | 44.472 µs | 45.272 µs | no change |
| big trees/100_000 nodes (2-level hierarchy) | 4.2695 ms | 4.5856 ms | no change |
| big trees/100_000 nodes (7-level hierarchy) | 2.8302 s | 4.2983 ms | -99.738% |
| big trees/4000 nodes (12-level hierarchy) | 10.988 s | 17.138 µs | -100.000% |
| big trees/10_000 nodes (14-level hierarchy) | 46.336 s | 3.3227 ms | -99.992% |
| big trees/100_000 nodes (17-level hierarchy) | gave up after 20 mins | 10.998 s | - |
| deep hierarchy/build | 701.74 ns | 696.26 ns | no change |
| deep hierarchy/single | 6.8114 µs | 6.3035 µs | no change |
| deep hierarchy/relayout | 4.1475 µs | 2.2235 µs | -46.389% |
| generated benchmarks | 206.40 µs | 208.63 µs | no change |
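
Because the rows mix units, comparing them by eye is error-prone. The following is a minimal sketch of how a "% change" value can be recomputed from two timings after converting both to seconds. The helper names (`toSeconds`, `percentChange`) are made up for this illustration and are not part of the benchmark suite. The percentages in the table were presumably reported by the benchmarking tool from its own measurements, so values recomputed from the rounded timings shown here may not match every row exactly.

```javascript
// Hypothetical helpers for illustration only; not part of the PR.
const UNIT_TO_SECONDS = { s: 1, ms: 1e-3, 'µs': 1e-6, ns: 1e-9 };

// Parse a string such as "4.1475 µs" into a number of seconds.
function toSeconds(text) {
  const [value, unit] = text.trim().split(/\s+/);
  if (!(unit in UNIT_TO_SECONDS)) {
    throw new Error(`Unknown unit: ${unit}`);
  }
  return parseFloat(value) * UNIT_TO_SECONDS[unit];
}

// Relative change from `before` to `after`, in percent (negative means faster).
function percentChange(before, after) {
  const b = toSeconds(before);
  const a = toSeconds(after);
  return ((a - b) / b) * 100;
}

// deep hierarchy/relayout row
console.log(percentChange('4.1475 µs', '2.2235 µs').toFixed(3)); // -46.389
// big trees/4000 nodes (12-level hierarchy) row
console.log(percentChange('10.988 s', '17.138 µs').toFixed(3)); // -100.000
```

The second call shows why a row can read -100.000%: the new time is not zero, but the change rounds to -100 at three decimal places.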

Context

Code is still WIP. Cleanup needed in a number of places. It is passing all tests though.

Feedback wanted

Nothing specific, but general feedback welcome.

@alice-i-cecile alice-i-cecile added code quality Make the code cleaner or prettier. performance Layout go brr labels Nov 19, 2022
@alice-i-cecile
Collaborator

Wow, I'm kind of at a loss for words with those benchmark results. I'm going to be prioritizing getting this reviewed and merged: let me know when you feel it's ready.

@nicoburns
Collaborator Author

Wow, I'm kind of at a loss for words with those benchmark results. I'm going to be prioritizing getting this reviewed and merged: let me know when you feel it's ready.

Same! The majority of the improvement came from a one-line change too! I'm busy tomorrow, but I'll see if I can find some time to get this into a mergeable state on Sunday :)

Collaborator

@alice-i-cecile alice-i-cecile left a comment


Stream of consciousness thoughts as I read:

  1. Jesus those perf numbers. Algorithmic complexity really does matter eh?
  2. AvailableSpace::Definite is really nice! Very clear, very explicit
  3. Size::NONE -> Size::MAX_CONTENT definitely needs a migration guide. More clear though!
  4. AvailableSpace instead of an Option<f32> is so much clearer.
  5. Ditto RunMode and SizingMode. Great docs too!
  6. The methods in debug.rs feel like they will be genuinely useful to users: I'd make them properly pub.
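
A toy illustration of item 1 (algorithmic complexity). The discussion here does not say what the one-line change was, so the following is purely hypothetical and is not taffy's algorithm: it assumes a layout pass in which every parent measures each child twice, and compares the number of measure computations with and without a per-node cache. All names (`buildTree`, `countMeasureCalls`) are invented for the sketch.

```javascript
// Hypothetical model, not taffy's implementation.
// Build a tree in which every non-leaf node has `childrenPerNode` children.
function buildTree(childrenPerNode, levels) {
  const node = { children: [] };
  if (levels > 0) {
    for (let i = 0; i < childrenPerNode; i++) {
      node.children.push(buildTree(childrenPerNode, levels - 1));
    }
  }
  return node;
}

// Count how many times a node's size is actually computed.
function countMeasureCalls(root, useCache) {
  let calls = 0;
  const cache = new Map();

  function measure(node) {
    if (useCache && cache.has(node)) {
      return cache.get(node);
    }
    calls++;
    let size = 1;
    for (const child of node.children) {
      measure(child);         // assumed first pass (e.g. sizing)
      size += measure(child); // assumed second pass (e.g. final layout)
    }
    if (useCache) {
      cache.set(node, size);
    }
    return size;
  }

  measure(root);
  return calls;
}

const tree = buildTree(2, 10); // 2047 nodes
console.log(countMeasureCalls(tree, false)); // 1398101
console.log(countMeasureCalls(tree, true));  // 2047
```

In this model the uncached cost is multiplied by a constant factor with every extra level, while the cached cost stays proportional to the node count, which is the kind of gap between the deep-hierarchy rows and the shallow rows of a benchmark table that a small change can close.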

Overall, this does a ton of the things that the team has wanted to do for this library: stronger types, better docs, dramatically better performance, foundations for multiple layout algorithms. I'm looking forward to merging this when it's ready!

Plenty to nitpick (missing doc links, commented out code), but I trust you'll get around to those :) Let me know when you're ready for a final review pass!

Collaborator

@Weibye Weibye left a comment


  • ❤️ RunMode, SizingMode and AvailableSpace. It really helps readability :)
  • The debug seems particularly useful! We should further build on that both for ourselves and end-users.

Really appreciate the work!

src/geometry.rs Show resolved Hide resolved
src/layout.rs Show resolved Hide resolved
src/layout.rs Outdated Show resolved Hide resolved
src/geometry.rs Show resolved Hide resolved
src/data.rs Show resolved Hide resolved
src/compute/flexbox.rs Outdated Show resolved Hide resolved
Collaborator

@TimJentzsch TimJentzsch left a comment


Amazing performance improvements!

src/compute/leaf.rs Outdated Show resolved Hide resolved
src/compute/leaf.rs Outdated Show resolved Hide resolved
src/debug.rs Outdated Show resolved Hide resolved
src/debug.rs Outdated Show resolved Hide resolved
src/layout.rs Outdated Show resolved Hide resolved
@alice-i-cecile alice-i-cecile added this to the 0.3 milestone Nov 20, 2022
@nicoburns
Collaborator Author

So I've done some rough benchmarking against yoga by:

  1. Recreating the "Huge nested layout" from https://github.com/facebook/yoga/blob/578d197dd6652225b46af090c0b46471dc887361/javascript/tests/Benchmarks/YGBenchmark.js in our benchmark suite.

  2. Recreating it using the Node.js bindings for yoga-layout (the yoga-layout-prebuilt package on NPM, because I couldn't get the yoga-layout package to build easily). This adds some overhead for calling into native code, but that is only a small constant overhead (a single function call) and, as the 10-node benchmark demonstrates, it can't be greater than 45µs, so I think the comparison is still pretty fair. I didn't use a benchmarking framework, but I ran it manually several times and the results were pretty similar each time.
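
The manual "run it several times" step in the list above could be automated with a small helper along these lines. This is only a sketch: `median` and `benchmarkRepeated` are hypothetical names, not part of the benchmark script in this comment, and the run count of 10 is an arbitrary assumption.

```javascript
const { performance } = require('perf_hooks');

// Median of a list of numbers (does not modify the input).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Run `cb` several times and summarise the wall-clock times in milliseconds.
function benchmarkRepeated(cb, runs = 10) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    cb();
    times.push(performance.now() - start);
  }
  return {
    min: Math.min(...times),
    median: median(times),
    max: Math.max(...times),
  };
}

// Example with a stand-in workload.
console.log(benchmarkRepeated(() => {
  let sum = 0;
  for (let i = 0; i < 1e5; i++) sum += i;
  return sum;
}));
```

If this were used for the layout benchmark, the tree would likely need to be rebuilt inside each run (outside the timed callback would require a small restructuring), since repeating a layout call on an unchanged tree may be served from cached results and would not measure a full layout.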

Results

| Benchmark | Yoga | Taffy |
| --- | --- | --- |
| big trees/10 nodes (1-level hierarchy) | 45.1670 µs | 34.110 ns |
| big trees/100 nodes (2-level hierarchy) | 134.1250 µs | 341.80 ns |
| big trees/1_000 nodes (3-level hierarchy) | 1.2221 ms | 3.8351 µs |
| big trees/10_000 nodes (4-level hierarchy) | 13.8672 ms | 37.551 µs |
| big trees/100_000 nodes (5-level hierarchy) | 141.5307 ms | 1.7385 ms |
| big trees/1_000_000 nodes (6-level hierarchy) | error* | 44.145 ms |

* But in fairness to yoga, the error is "please increase the memory limit" and the equivalent taffy benchmark was using much more memory (6 GB+) than yoga's limit of 134 MB. I'd like to run taffy without criterion, to get a better idea of how much memory it uses in real-world usage. Perhaps we could also try https://docs.rs/dhat/latest/dhat/
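
One way to read the results table is to normalise each timing by the node count. The sketch below copies the timings from the table (converted to nanoseconds) and prints the per-node cost and the Yoga/Taffy ratio. The `perNode` helper and the field names are invented for this illustration.

```javascript
// Timings copied from the results table, converted to nanoseconds.
const results = [
  { nodes: 10, yogaNs: 45.1670e3, taffyNs: 34.110 },
  { nodes: 100, yogaNs: 134.1250e3, taffyNs: 341.80 },
  { nodes: 1000, yogaNs: 1.2221e6, taffyNs: 3.8351e3 },
  { nodes: 10000, yogaNs: 13.8672e6, taffyNs: 37.551e3 },
  { nodes: 100000, yogaNs: 141.5307e6, taffyNs: 1.7385e6 },
];

// Per-node cost for each library and the ratio between the two timings.
function perNode(rows) {
  return rows.map((r) => ({
    nodes: r.nodes,
    yogaNsPerNode: r.yogaNs / r.nodes,
    taffyNsPerNode: r.taffyNs / r.nodes,
    ratio: r.yogaNs / r.taffyNs,
  }));
}

for (const row of perNode(results)) {
  console.log(
    `${row.nodes} nodes: yoga ${row.yogaNsPerNode.toFixed(1)} ns/node, ` +
    `taffy ${row.taffyNsPerNode.toFixed(1)} ns/node, ratio ${row.ratio.toFixed(1)}x`
  );
}
```

Read this way, the Yoga figure at 10 nodes is several times higher per node than at larger sizes, which is consistent with a fixed per-call overhead dominating the smallest run, and the Taffy per-node figure rises at 100,000 nodes. Both observations follow from the numbers in the table only and carry the same caveats as the benchmark itself.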

Conclusions

At least on this benchmark, we seem to be quite a bit faster than yoga, although I'm a little worried that it seems a bit too good to be true.

Code for yoga benchmarks

package.json

{
  "name": "layout-benchmark",
  "version": "1.0.0",
  "main": "index.js",
  "license": "MIT",
  "private": false,
  "dependencies": {
    "yoga-layout-prebuilt": "^1.10.0"
  }
}
index.js

const Yoga = require('yoga-layout-prebuilt');
// `performance` is a global in recent Node.js versions; the explicit import keeps older versions working.
const { performance } = require('perf_hooks');

// Recursively attach `nodesPerLevel` children to `parent`, `remainingLevels` levels deep.
function buildTreeLevel(parent, nodesPerLevel, remainingLevels) {
  for (let i = 0; i < nodesPerLevel; i++) {
    const child = Yoga.Node.create();
    child.setFlexGrow(1);
    child.setWidth(10);
    child.setHeight(10);
    parent.insertChild(child, 0);

    if (remainingLevels > 1) {
      buildTreeLevel(child, nodesPerLevel, remainingLevels - 1);
    }
  }
}

// Create a root node and build the full tree underneath it.
function createRoot(nodesPerLevel, levels) {
  const root = Yoga.Node.create();

  buildTreeLevel(root, nodesPerLevel, levels);

  return root;
}

// Return the wall-clock time of a single call to `cb`, in milliseconds.
function benchmark(cb) {
  const start = performance.now();

  cb();

  const end = performance.now();

  return end - start;
}

// Build a tree, time one layout pass, optionally print the result, then free the tree.
function deepTreeBench(nodesPerLevel, levels, print = true) {
  const root = createRoot(nodesPerLevel, levels);
  const time = benchmark(() => root.calculateLayout(Yoga.UNDEFINED, Yoga.UNDEFINED, Yoga.DIRECTION_LTR));
  if (print) {
    const nodes = Math.pow(nodesPerLevel, levels);
    if (time < 1) {
      console.log(`Nodes: ${nodes} ${(time * 1000).toFixed(4)} µs`);
    } else {
      console.log(`Nodes: ${nodes} ${time.toFixed(4)} ms`);
    }
  }
  root.freeRecursive();
}

// Initial run seems to have a fixed ~30ms overhead, so we run once and ignore the result.
deepTreeBench(10, 4, false);

// Benchmark at 10 through 1,000,000 nodes
deepTreeBench(10, 1);
deepTreeBench(10, 2);
deepTreeBench(10, 3);
deepTreeBench(10, 4);
deepTreeBench(10, 5);
deepTreeBench(10, 6);

@alice-i-cecile
Collaborator

Admittedly, yoga also implements parts of the flexbox spec we're ignoring. I'd avoid publicizing it widely until we close that gap.

@nicoburns
Collaborator Author

@alice-i-cecile I now consider this ready for review. I had planned to add a LayoutAlgorithm trait, however this has ended up being more involved than I expected (potentially having some tricky design tradeoffs), so I now think this would be best off in a separate PR (and tbh, may not be top of my list to do next).

P.S. You've added the 0.3 milestone to this PR, but I believe we are yet to release a 0.2 version so it probably ought to be that? On that note, perhaps it would make sense to start gearing up for a release (release notes need a bit of work I think!) once this lands? Seems to me that this, along with the cumulative changes already on main would be worth getting out to people...

@nicoburns
Collaborator Author

Admittedly, yoga also implements parts of the flexbox spec we're ignoring.

Interesting. Do you have a list in your head of things that they implement that we don't?

I'd avoid publicizing it widely until we close that gap.

I feel like it might be worth calling out in our release notes, but with the explicit caveat that we're not yet that confident in our benchmarks and would appreciate scrutiny from 3rd parties. So long as we don't come across as showing off or putting others down I think we should be alright sharing numbers?

@nicoburns nicoburns marked this pull request as ready for review November 22, 2022 02:10
@TimJentzsch
Collaborator

Honestly, I'm not sure how useful a direct comparison between yoga and taffy is, since they use completely different languages.
I would only see value in a comparison of yoga and JS bindings for taffy, then you could say "just replace yoga with WASM taffy and you'll get more performance". Otherwise I think it's more like comparing apples with oranges.

@alice-i-cecile
Collaborator

You've added the 0.3 milestone to this PR, but I believe we are yet to release a 0.2 version so it probably ought to be that? On that note, perhaps it would make sense to start gearing up for a release (release notes need a bit of work I think!) once this lands? Seems to me that this, along with the cumulative changes already on main would be worth getting out to people...

Agreed, let's ship it.

WRT missing features, I think gap was the big one? We've been tracking this in the issue tracker.

@nicoburns nicoburns changed the title WIP: Layout algorithm trait, Sizing Constraints & Perf Improvements Layout algorithm decoupling, Sizing Constraints & Perf Improvements Nov 22, 2022
src/debug.rs Outdated Show resolved Hide resolved
Collaborator

@alice-i-cecile alice-i-cecile left a comment


Really exceptional work. Two things to add to the release notes (see the comments), then I'll merge this in!

@nicoburns
Collaborator Author

Two things to add to the release notes (see the comments)

Added :)

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
@alice-i-cecile
Collaborator

Awesome work! Let's get gap implemented and then cut a release.

Labels
code quality Make the code cleaner or prettier. performance Layout go brr
Development

Successfully merging this pull request may close these issues.

Slow performance with deep hierachies
4 participants