Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The document-policy: js-profiling header adds 3% overhead #79

Open
paulirish opened this issue Sep 20, 2023 · 4 comments
Open

The document-policy: js-profiling header adds 3% overhead #79

paulirish opened this issue Sep 20, 2023 · 4 comments

Comments

@paulirish
Copy link

paulirish commented Sep 20, 2023

(Admittedly, this is implementation-specific, but.. this implementation quirk is the reason for this spec'd header..)

I heard some concerns about the overhead of when document-policy: js-profiling is set. (Before Profiler.start() has been called). I confirmed ~3% overhead during the Speedometer benchmark.

Background

The CL that adds the header in Chromium says:

[when the] document policy is alive, pre-warm the V8 profiler. This significantly reduces profiler startup time by ensuring that compiled code is built with the appropriate metadata.

This pre-warming flips on V8 EagerLogging mode which tracks execution contexts, scripts, initializes V8 profiler instances, and quite a few things. (Later, once theProfiler.start() is called, these V8 profiler instances call CPUProfiler::startProfiling, and properly collect samples.

I do know that a fresh non-warmed CpuProfiler::StartProfiling cost can be kinda big. (DevTools now shows this as 'Profiling Overhead', but 'V8 Sampling Profiler Startup' would be a more accurate name.) The cost is ~50ms on a loaded theverge.com page. Meanwhile if it's started at the beginning of pageload, it's <1ms.

I'm assuming that pre-warming with EagerLogging was selected to avoid the risk of this long task occurring at a sensitive time.

Overhead Measurement

After some quick benchmarking environment setup, I grabbed some numbers..
I ran Speedometer 2.1 a couple times, in 3 different setups. One runs' results look like this:

image (1)

Baseline (no policy):
Arithmetic Mean: 127.8 ‡ 1.1 (0.83%)
Arithmetic Mean: 128 ‡ 1.7 (1.3%)

Policy enabled (yeah, just the response header):
Arithmetic Mean: 124 ‡ 1.5 (1.2%)
Arithmetic Mean: 125 ‡ 1.5 (1.2%)

Policy enabled + Profiler started:
Arithmetic Mean: 114 ‡ 1.2 (1.1%)
Arithmetic Mean: 117 ‡ 2.3 (2.0%)

This looks like 3% overhead with EagerLogging. And 10% overhead with profiler started.
(Of course I'd love to see if someone's run similar tests with more rigor)

Developer usecases

I see four styles of use of the Profiler:

  1. A sampled audience capture from load. 1% of sessions get extra telemetry collected, and this header is set for them. Profiler is invoked at the start of page load.
    • If profiler is started early, there's not much point of this EagerLogging pre-warm.
  2. A sampled audience capture of specific interactions. (Presumably the facebook.com usecase?). 1% of sessions get the header. Profiler isn't invoked at pageload, but will be invoked later to capture other SPA navigations, interactions of interest...
    • Here, the EagerLogging mode trades off some overhead for a smaller task at start(). Makes sense.
  3. A 'report slow stuff' button. This whole investigation was motivated by a partner who wants to offer a feature where users can report sluggishness and hit a button to 'record' the problematic scenario. So, we'd want the Profiler available 100% of the time, even if it is only invoked <1%.
    • The EagerLogging overhead is prohibitive to this approach. A long task at profiler startup is fine, though.
  4. Sampled audience capture of specific interactions (but very basic). Developers/teams with less sophisticated setup may not have the infra to coordinate conditional response headers. So they'd prefer to just keep the header on for everyone and make the Profiling choice completely clientside.
    • The overhead here will also be disruptive, depending on how important the overhead/observability tradeoff is to them..
  5. random/idle tracing, with the goal of obtaining an aggregate view of overall page execution. (As suggested by @acomminos below)

EagerLogging pre-warm is moot for 1, a win for 2, and a loss for 3, 4.


Do folks have thoughts on how we can help the 3 & 4 developers?

@yoavweiss floated a possibility.. we have two modes: the eager prewarm mode and a lazy on-demand one. Seems attractive. Though I also wonder if there's another resolution to case 2 above..

@JonasBa
Copy link

JonasBa commented Sep 22, 2023

@paulirish It would be helpful if you could share how the overhead was measured so that others can attempt to replicate it and share their findings.

Afaik the overhead varies between applications as it depends on the amount of heap objects allocated in the isolate at that point in time (@nornagon optimized), but having more measurements here would be nice.

A 'report slow stuff' button. This whole investigation was motivated by a partner who wants to offer a feature where users can report sluggishness and hit a button to 'record' the problematic scenario. So, we'd want the Profiler available 100% of the time, even if it is only invoked <1%.
The EagerLogging overhead is prohibitive to this approach. A long task at profiler startup is fine, though.

There was a similar idea (cant recall by who) to enable profiling based on a longtask occurrence heuristic.

@yoavweiss
Copy link
Collaborator

Having a dual mode makes sense to me.

^^ @acomminos - I'd love your thoughts.

@acomminos
Copy link
Collaborator

Dual mode is an interesting idea, and I like the framings you've provided @paulirish!

While interactivity is a big concern with enabling on-demand profiler usage, I'm also a bit worried about this potentially being a side channel for reading the overall size of the heap.

If we could perform the warmup work asynchronously (and make it awaitable), I think that would make a dent in some of the cases where we might've otherwise incurred wasteful overhead through conditional invocation, while still granting us levers for side channel mitigation. UAs might also opt to perform that task at a time when the event loop is likely to be idle, helping interactivity as well.

Curious about folks' thoughts on an API with these semantics (maybe not necessarily this shape):

await Profiler.warmup(); // Engine kicks off EagerLogging on a (potentially lower priority) task.
const profiler = new Profiler(); // Does not block, we've warmed up! \o/
  • For case 3, the "Report slow" button would simply await on the prewarming action to finish before enabling the user to go ahead.
  • For case 4, we might not have the ability to intercept interactions in a timely fashion that occurred early in the page lifecycle- but we get the ability to conditionally incur the profiling overhead at runtime, rather than in the HTTP response (likely reducing the performance burden of the API for the developers you're describing).

This would also be helpful for a 5th case that we've heard has been useful: random/idle tracing, with the goal of obtaining an aggregate view of overall page execution. This seems like it has similar considerations to cases 3 and 4 (in that the long task execution is less of a concern), but is a little less latency sensitive (i.e. we care even less if the profiler is slow to start up, since we don't have a specific interaction to report on).

Thoughts?

@paulirish
Copy link
Author

paulirish commented Sep 28, 2023

@paulirish It would be helpful if you could share how the overhead was measured so that others can attempt to replicate it and share their findings.

Sure.

I tried replicating my speedometer results and now can't measure a significant difference between baseline and with the header set. However, I also realize that the actual work in Speedometer happens within an iframe, so it wasn't a great test case anyhow.

I repeated things with the Octane 2.0 benchmark. (Admittedly I can't expect a case that would demonstrate overhead more, given that it's full throttle, non-stop JS)

Repro:

  • git clone https://github.com/chromium/octane/
  • start an http server
  • run chromium/chrome with a fresh profile.
    • I used these flags: --disable-features=Translate,OptimizationHints,MediaRouter --disable-extensions --disable-component-extensions-with-background-pages --disable-background-networking --disable-component-update --disable-client-side-phishing-detection --disable-sync --metrics-recording-only --disable-default-apps --mute-audio --no-default-browser-check --no-first-run --disable-backgrounding-occluded-windows --disable-renderer-backgrounding --disable-background-timer-throttling --disable-ipc-flooding-protection --password-store=basic --use-mock-keychain --force-fieldtrials="*BackgroundTracing/default/" --user-data-dir=/tmp/TMPDIR http://localhost:PORT
  • click Start Octane.
  • relaunch new invocation with fresh profile for each run. collect multiple runs
  • Then, add the document-policy: js-profiling response header. Repeat.

Results

octane results (5 runs) Geo mean Median
baseline 44953 45507 45302 45600 45242 45320 45302
w/ doc-policy: jsprof header 43839 44554 44056 44056 44251 44151 44056
Difference 2.58% 2.75%

Afaik the overhead varies between applications as it depends on the amount of heap objects allocated in the isolate at that point in time

Yes that matches up with what I've seen. I'd expect both the EagerLogging ongoing overhead and (cold) profiler startup cost to scale with heap size.


(@nornagon optimized) [the profiler startup time]

From the CL description:

For some pages (I tested the Slack app), this can improve profiler startup time by ~4x (~3.5 sec -> ~0.9 sec).

Huh. This landed well after the header for EagerLogging was introduced.

I wonder if this work mitigates the need for it.
Perhaps we need some more measurements of the profiler startup cost?


EDIT (Jan 2024): There's a PR up to remove the response header requirement: #80

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants