
Perf Testing


Performance testing with flamegraphs is a feature that runs on all PRs opened against Fluent UI React and was introduced with PR #9550. This page provides an overview of this feature.

Sample Performance Test Results Table

The following sample perf test comment is linked from #9516, which made perf improvements to the "New" Button components in packages/experiments:

Component Perf Analysis:

| Scenario | Master Ticks * | PR Ticks * |
| --- | --- | --- |
| BaseButton | 883 | 895 |
| BaseButtonNew | 3734 | 2536 |
| DefaultButton | 1175 | 1175 |
| DefaultButtonNew | 3241 | 2039 |
| DetailsRow | 8409 | 8535 |
| DetailsRowNoStyles | 6357 | 6298 |
| DocumentCardTitle | 44342 | 44246 |
| MenuButton | 2068 | 2078 |
| MenuButtonNew | 6473 | 4910 |
| PrimaryButton | 1391 | 1361 |
| PrimaryButtonNew | 3658 | 2427 |
| SplitButton | 3845 | 3847 |
| SplitButtonNew | 14086 | 9225 |
| Toggle | 2037 | 2018 |
| ToggleNew | 2553 | 2485 |
| button | 81 | 70 |
* Sample counts can vary by up to 30% and shouldn't be used solely for determining regressions. Flamegraph links are provided to give a hint at the deltas introduced by PRs and at potential bottlenecks.

Flamegraph Anatomy

Each sample number entry links to a flamegraph, which looks like the following:

Perf Test Results Interpretation

Since this PR improves perf for the new Button components, we expect to see improvements in the corresponding scenarios:

Component Perf Analysis:

| Scenario | Master Ticks * | PR Ticks * |
| --- | --- | --- |
| BaseButtonNew | 3734 | 2536 |
| DefaultButtonNew | 3241 | 2039 |
| MenuButtonNew | 6473 | 4910 |
| PrimaryButtonNew | 3658 | 2427 |
| SplitButtonNew | 14086 | 9225 |

We can see a lower tick count for each listed component in the PR, implying an improvement. This feature currently does not assess regressions automatically, due to the variance that can occur, particularly in server CI environments (per the table's footnote).

If we look at Master Ticks for SplitButtonNew, we find the following:

StackView is consuming nearly 25% of render time. The PR actually removes usage of Stack from the Button components, so we expect to see the resulting perf numbers go down. If we review the PR Ticks for SplitButtonNew:

Running Tests Locally

The perf test app and some of its dependencies may not get built with the default build commands. Make sure you build to perf-test before running any perf tests.

After building perf-test and its dependencies, you can run perf-test from the apps/perf-test directory:

  • yarn just perf-test: Builds perf-test package and runs perf-test scenarios.
  • yarn just run-perf-test: Only runs perf-test scenarios. Assumes perf-test has been built previously.

⚠️ If you modify Fluent UI React source, you must do another build to perf-test to pick up those changes.

⚠️ If you are adding or modifying scenarios, you must use yarn just perf-test to build and pick up scenario changes.

Arguments

The perf test script also supports the following optional arguments:

| Argument | Description |
| --- | --- |
| --scenarios | Comma-separated list of scenario names to test. |
| --iterations | Number of iterations to run for each scenario. |

Here is an example of their use:

yarn just perf-test -- --scenarios SplitButton,SplitButtonNew --iterations 1000

Questions

What are flamegraphs?

Flamegraphs are representations of call hierarchies that show time consumed by nested function calls. Perf-test has been set up to repeatedly render a given scenario, generating a flamegraph just for that scenario, making bottlenecks and perf issues easier to see.

What should I do if it looks like my PR has a perf regression?

If you don't expect any perf changes, it's possible that the regression is due to sample variance. Even if overall sample counts change, the call hierarchy and the percentage of time each call consumes in the flamegraph should be consistent. Reviewing the generated flamegraphs for consistency can help reveal any significant changes in behavior.

Variance will also tend to be higher in a server environment. You may get more consistent results locally; you can run the perf test as described above to confirm your results.

What is variance and what causes it?

Variance is fundamentally caused by the sample-based approach that V8 (the JavaScript engine used by Chrome and Puppeteer) takes when generating profiles. Since ticks are taken at a fixed time interval, they are subject to aliasing, CPU load, and other factors that can produce different results for the same code under test. Some of this variance has been mitigated by disabling optimizations in V8.

Why are results listed in ticks instead of time units?

Perf Test renders many iterations of a scenario to get more depth in the graphs. Additionally, Puppeteer has been configured to disable optimizations to reduce variance, which impacts overall execution time. As a result, showing results in time units would lead to confusion. Instead, results are displayed in ticks to give a qualitative feel for perf deltas on PRs.

Why so many iterations per scenario? How do flamegraphs show more levels than Chrome profiler?

Perf testing generates flamegraphs using a rollup strategy, rolling together the call hierarchies of all iterations of a given scenario. This gives visibility into low-level calls that wouldn't be visible when using the profiler in Chrome. Function calls that typically run in less than a sample period still end up getting hit across many iterations. With 5,000 iterations, lower-level function calls will get fewer than 50 ticks, which means only about 1 in 100 function calls is sampled. These ticks would have been missed with a single iteration and would not have shown up in the flamegraph.
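As a back-of-the-envelope illustration of that rollup effect, here is a minimal sketch of the arithmetic; the sample interval and per-call cost below are assumed for illustration, not measured from perf-test:

```ts
// Illustrative numbers only: the profiler sample interval and per-call cost
// vary by machine and are not taken from perf-test itself.
const iterations = 5000;        // default perf-test iterations per scenario
const sampleIntervalUs = 1000;  // assumed profiler sample interval (~1 ms)
const perCallUs = 10;           // hypothetical cost of one low-level call

// One iteration: a 10 us call is far below the 1 ms sample interval, so it
// usually records zero ticks and never appears in the flamegraph.
// Rolled up: 5000 calls * 10 us = 50,000 us => roughly 50 ticks at 1 ms each,
// i.e. about 1 in 100 calls lands on a sample.
const expectedTicks = (iterations * perCallUs) / sampleIntervalUs;
console.log(`~${expectedTicks} ticks expected across ${iterations} iterations`);
```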

How do I add a scenario test?

You can add a scenario to perf-test, modeled on the existing scenario files, and it will automatically be picked up. Optionally, you can also add your scenario to scenarioNames.ts to give it a more readable name.
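As a rough sketch, a scenario is a small module that renders the component under test. The file name, import path, and export shape below are assumptions; copy an existing scenario file under apps/perf-test for the exact convention:

```tsx
// apps/perf-test/src/scenarios/MyComponent.tsx  (hypothetical file name)
// Minimal sketch of a scenario module; mirror an existing scenario file for
// the exact import path and export shape used by the app.
import * as React from 'react';
import { PrimaryButton } from '@fluentui/react'; // assumed import path

// The scenario is rendered thousands of times (5,000 iterations by default),
// so it should be a single, self-contained render of the component under test.
const Scenario = () => <PrimaryButton text="I am a button" />;

export default Scenario;
```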

Please note that each scenario will add 5-60 seconds to build time (assuming the current 5,000-iteration default holds). In the future, scenarios may be more selectively filtered for CI integration in order to keep build time manageable.

⚠️ When you add or modify a scenario, you must rebuild perf-test (e.g. by running yarn just perf-test) in order to pick up the new scenario for testing locally.

Future improvements

  • Improve flamegraph readability.
  • Expand scenario testing to measure dynamic performance, such as scrolling and resizing the browser window.
  • Find methods or alternatives for mitigating or filtering out variance for automated regression analysis.
  • Modularize performance test for use in projects outside of Fabric.
