
Perf Testing


Performance testing with flamegraphs is a feature that runs on all PRs opened against Fluent UI React and was introduced with PR #9550. This page provides an overview of this feature.

Sample Performance Test Results Table

The following sample perf test comment is linked from #9516, which made perf improvements to the "New" Button components in packages/experiments:

Component Perf Analysis:

| Scenario | Master Ticks * | PR Ticks * |
| --- | --- | --- |
| BaseButton | 883 | 895 |
| BaseButtonNew | 3734 | 2536 |
| DefaultButton | 1175 | 1175 |
| DefaultButtonNew | 3241 | 2039 |
| DetailsRow | 8409 | 8535 |
| DetailsRowNoStyles | 6357 | 6298 |
| DocumentCardTitle | 44342 | 44246 |
| MenuButton | 2068 | 2078 |
| MenuButtonNew | 6473 | 4910 |
| PrimaryButton | 1391 | 1361 |
| PrimaryButtonNew | 3658 | 2427 |
| SplitButton | 3845 | 3847 |
| SplitButtonNew | 14086 | 9225 |
| Toggle | 2037 | 2018 |
| ToggleNew | 2553 | 2485 |
| button | 81 | 70 |
* Sample counts can vary by up to 30% and shouldn't be used solely for determining regressions. Flamegraph links are provided to give a hint at the deltas introduced by PRs and at potential bottlenecks.

Flamegraph Anatomy

Each sample number entry links to a flamegraph, which looks like the following:

Perf Test Results Interpretation

Since this PR improves perf for the new Button components, we expect to see improvements in the corresponding scenarios:

Component Perf Analysis:

| Scenario | Master Ticks * | PR Ticks * |
| --- | --- | --- |
| BaseButtonNew | 3734 | 2536 |
| DefaultButtonNew | 3241 | 2039 |
| MenuButtonNew | 6473 | 4910 |
| PrimaryButtonNew | 3658 | 2427 |
| SplitButtonNew | 14086 | 9225 |

We can see a lower tick count for each listed component in the PR, implying an improvement. This feature currently does not assess regressions automatically, due to the variance that can occur, particularly in server CI environments (per the table's footnote).

If we look at Master Ticks for SplitButtonNew, we find the following:

StackView is consuming nearly 25% of render time. The PR actually removes usage of Stack from the Button components, so we expect to see the resulting perf numbers go down. If we review the PR Ticks for SplitButtonNew:

Running Tests Locally

The perf test app and some of its dependencies may not get built with the default build commands. Make sure you build to perf-test before running any perf tests.

After building perf-test and its dependencies, you can run perf-test from the apps/perf-test directory:

  • yarn just perf-test: Builds perf-test package and runs perf-test scenarios.
  • yarn just run-perf-test: Only runs perf-test scenarios. Assumes perf-test has been built previously.

⚠️ If you modify Fluent UI React source, you must do another build to perf-test to pick up those changes.

⚠️ If you are adding or modifying scenarios, you must use yarn just perf-test to build and pick up scenario changes.

Arguments

The perf test script also supports the following optional arguments:

| Argument | Description |
| --- | --- |
| --scenarios | Comma-separated list of scenario names to test. |
| --iterations | Number of iterations to run for each scenario. |

Here is an example of their use:

yarn just perf-test -- --scenarios SplitButton,SplitButtonNew --iterations 1000

Questions

What are flamegraphs?

Flamegraphs are representations of call hierarchies that show time consumed by nested function calls. Perf-test has been set up to repeatedly render a given scenario, generating a flamegraph just for that scenario, making bottlenecks and perf issues easier to see.

What should I do if it looks like my PR has a perf regression?

If you don't expect any perf changes, it's possible that the regression is due to sample variance. Even if overall sample counts change, the call hierarchy and the percentage of time each call consumes in the flamegraph should be consistent. Reviewing the generated flamegraphs for consistency can help reveal any significant changes in behavior.

Variance will also tend to be higher in a server environment. You may get more consistent results locally; you can run the perf test as described above to confirm your results.

What is variance and what causes it?

Variance is fundamentally caused by the sample-based approach that V8 (the JavaScript engine used by Chrome and Puppeteer) takes when generating profiles. Since ticks are taken at a fixed time interval, they are subject to aliasing, CPU load, and other factors that can produce different results for the same code under test. Some of this variance has been mitigated by disabling optimizations in V8.

Why are results listed in ticks instead of time units?

Perf Test renders many iterations of a scenario to get more depth in the graphs. Additionally, Puppeteer has been configured to disable optimizations to reduce variance, which impacts overall execution time. As a result, showing results in time units would lead to confusion. Instead, results are displayed in ticks to give a qualitative feel for perf deltas on PRs.

Why so many iterations per scenario? How do flamegraphs show more levels than Chrome profiler?

Perf testing generates flamegraphs using a rollup strategy, rolling together the call hierarchies of all iterations of a given scenario. This gives visibility into low-level calls that wouldn't be visible when using the profiler in Chrome. Function calls that typically run in less than a sample period still end up getting hit across many iterations. With 5,000 iterations, lower-level function calls will get fewer than 50 ticks, which means only about 1 in 100 function calls is sampled. These ticks would have been missed with a single iteration and would not have shown up in the flamegraph.
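As a back-of-the-envelope illustration of that rollup effect, here is a minimal sketch of the arithmetic; the sample interval and per-call cost below are assumed for illustration, not measured from perf-test:

```ts
// Illustrative numbers only: the profiler sample interval and per-call cost
// vary by machine and are not taken from perf-test itself.
const iterations = 5000;        // default perf-test iterations per scenario
const sampleIntervalUs = 1000;  // assumed profiler sample interval (~1 ms)
const perCallUs = 10;           // hypothetical cost of one low-level call

// One iteration: a 10 us call is far below the 1 ms sample interval, so it
// usually records zero ticks and never appears in the flamegraph.
// Rolled up: 5000 calls * 10 us = 50,000 us => roughly 50 ticks at 1 ms each,
// i.e. about 1 in 100 calls lands on a sample.
const expectedTicks = (iterations * perCallUs) / sampleIntervalUs;
console.log(`~${expectedTicks} ticks expected across ${iterations} iterations`);
```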

How do I add a scenario test?

You can add a scenario to perf-test, modeled on the existing scenario files, and it will automatically be picked up. Optionally, you can also add your scenario to scenarioNames.ts to give it a more readable name.
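As a rough sketch, a scenario is a small module that renders the component under test. The file name, import path, and export shape below are assumptions; copy an existing scenario file under apps/perf-test for the exact convention:

```tsx
// apps/perf-test/src/scenarios/MyComponent.tsx  (hypothetical file name)
// Minimal sketch of a scenario module; mirror an existing scenario file for
// the exact import path and export shape used by the app.
import * as React from 'react';
import { PrimaryButton } from '@fluentui/react'; // assumed import path

// The scenario is rendered thousands of times (5,000 iterations by default),
// so it should be a single, self-contained render of the component under test.
const Scenario = () => <PrimaryButton text="I am a button" />;

export default Scenario;
```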

Please note that each scenario will add 5-60 seconds to build time (assuming the current 5,000-iteration default holds). In the future, scenarios may be more selectively filtered for CI integration in order to keep build time manageable.

⚠️ When you add or modify a scenario, you must rebuild perf-test (e.g. by running yarn just perf-test) in order to pick up the new scenario for testing locally.

Future improvements

  • Improve flamegraph readability.
  • Expand scenario testing to measure dynamic performance, such as scrolling and resizing the browser window.
  • Find methods or alternatives for mitigating or filtering out variance for automated regression analysis.
  • Modularize performance test for use in projects outside of Fabric.
