Skip to content

πŸ” [REPRO – DO NOT MERGE] reproduce early-resource-timings flake on CI#4630

Closed
thomas-lebeau wants to merge 4 commits into
mainfrom
thomas.lebeau/repro-empty-css-flake
Closed

πŸ” [REPRO – DO NOT MERGE] reproduce early-resource-timings flake on CI#4630
thomas-lebeau wants to merge 4 commits into
mainfrom
thomas.lebeau/repro-empty-css-flake

Conversation

@thomas-lebeau
Copy link
Copy Markdown
Collaborator

Motivation

The E2E test rum resources β€Ί retrieve early requests timings (async / npm / bundle variants at test/e2e/scenario/rum/resources.scenario.ts:61) is one of the top flaky tests this week on webkit-pinned β€” 28 / 24 / 21 occurrences over the last 7 days across many branches.

Working theory: WebKit clamps PerformanceResourceTiming timestamps and performance.now() to 1ms resolution. When the entire /empty.css request (a zero-byte response) fits inside one 1ms tick, every relevant timestamp clamps to the same value. The SDK's existing Safari fallback in computeResourceEntryDuration (packages/rum-core/src/domain/resource/resourceUtils.ts:76) is guarded by startTime < responseEnd, which fails when they're equal, so the SDK truthfully emits duration: 0 β€” and the test asserts strictly > 0.

Locally on macOS, 50/50 repeats passed but every captured entry confirmed 1ms clamping. The failure presumably needs CI's faster Linux loopback to make collapse probable. This branch verifies that.

Changes

⚠️ Temporary repro branch β€” must NOT be merged. Closing once we have data.

  • Instrument the early-timings test (test/e2e/scenario/rum/resources.scenario.ts) to log:
    • The raw PerformanceResourceTiming for empty.css (all 12 timestamps + size fields).
    • Four consecutive performance.now() samples to expose clock granularity.
    • The SDK-emitted resource.duration, download, first_byte, size, transfer_size.
  • Narrow .gitlab-ci.yml e2e job to:
    • only webkit-pinned (skip chromium/firefox/etc.)
    • only the retrieve early requests timings test via -g
    • --repeat-each=50 Γ— 3 sub-tests = 150 attempts
    • --retries=0 so each repeat is an independent observation

Test instructions

  1. Wait for the GitLab e2e: [webkit-pinned] job to finish.
  2. In the job log, look for [EARLY-TIMINGS-DEBUG webkit] lines.
  3. Expected confirmation of the hypothesis:
    • nowSamples shows 4 identical integer-ms values (clock clamped to 1ms).
    • On failing attempts: startTime === responseStart === responseEnd in raw= and the corresponding sdk= shows duration: 0.
    • On passing attempts: at least one timestamp differs by β‰₯1ms; SDK shows duration: β‰₯1ms.
  4. Failure count (compared to the local 0/50) tells us whether CI's faster loopback is the differentiator.

Checklist

  • Tested locally (50/50 passed β€” that's the point of the CI run)
  • Tested on staging
  • Added unit tests for this change.
  • Added e2e/integration tests for this change.
  • Updated documentation and/or relevant AGENTS.md file

Temporary repro branch β€” DO NOT MERGE.

- Add raw-timing + clock-resolution debug logging to the `retrieve early
  requests timings` test, to confirm WebKit's 1ms clamping is responsible
  for the duration=0 / download.start=0 failures.
- Narrow GitLab CI e2e job to only `webkit-pinned`, only the relevant
  `-g "retrieve early requests timings"` filter, `--repeat-each=50`,
  and `--retries=0` so each repeat is an independent attempt.

The hypothesis is that WebKit clamps PerformanceResourceTiming
timestamps and performance.now() to 1ms. When the entire /empty.css
request fits in a single tick (more likely on CI's faster Linux
loopback than on local macOS), startTime === responseEnd and the SDK
falls through `(duration === 0 && startTime < responseEnd)` to return 0.
@datadog-prod-us1-6
Copy link
Copy Markdown

datadog-prod-us1-6 Bot commented May 15, 2026

Tests

πŸŽ‰ All green!

❄️ No new flaky tests detected
πŸ§ͺ All tests passed

🎯 Code Coverage (details)
β€’ Patch Coverage: 100.00%
β€’ Overall Coverage: 76.96% (+0.00%)

This comment will be updated automatically if new data arrives.
πŸ”— Commit SHA: 65c7e29 | Docs | Datadog PR Page | Give us feedback!

@cit-pr-commenter-54b7da
Copy link
Copy Markdown

cit-pr-commenter-54b7da Bot commented May 15, 2026

Bundles Sizes Evolution

πŸ“¦ Bundle Name Base Size Local Size 𝚫 𝚫% Status
Rum 169.51 KiB 169.51 KiB 0 B 0.00% βœ…
Rum Profiler 5.97 KiB 5.97 KiB 0 B 0.00% βœ…
Rum Recorder 21.23 KiB 21.23 KiB 0 B 0.00% βœ…
Logs 54.70 KiB 54.70 KiB 0 B 0.00% βœ…
Rum Slim 127.85 KiB 127.85 KiB 0 B 0.00% βœ…
Worker 22.99 KiB 22.99 KiB 0 B 0.00% βœ…
πŸš€ CPU Performance
Action Name Base CPU Time (ms) Local CPU Time (ms) 𝚫%
RUM - add global context 0.0019 0.0018 -5.26%
RUM - add action 0.0104 0.0099 -4.81%
RUM - add error 0.0126 0.0092 -26.98%
RUM - add timing 0.0007 0.0004 -42.86%
RUM - start view 0.0109 0.0095 -12.84%
RUM - start/stop session replay recording 0.0008 0.0007 -12.50%
Logs - log message 0.0176 0.0139 -21.02%
🧠 Memory Performance
Action Name Base Memory Consumption Local Memory Consumption 𝚫
RUM - add global context 42.69 KiB 38.52 KiB -4.17 KiB
RUM - add action 67.95 KiB 67.82 KiB -138 B
RUM - add timing 37.49 KiB 36.91 KiB -598 B
RUM - add error 73.15 KiB 72.20 KiB -974 B
RUM - start/stop session replay recording 41.06 KiB 46.22 KiB +5.16 KiB
RUM - start view 492.59 KiB 488.16 KiB -4.43 KiB
Logs - log message 55.59 KiB 56.66 KiB +1.07 KiB

πŸ”— RealWorld

- Run the early-timings test with --workers=1 so stdout isn't interleaved.
- Reduce --repeat-each to 30 (3 sub-tests Γ— 30 = 90 attempts) to keep CI
  wall-clock reasonable now that the run is single-threaded.
- Tag each debug line with the title + repeat index so raw=/sdk= pairs are
  always matchable in the log.
Add chromium to the matrix to compare timestamp clamping with webkit-pinned.
Expectation: chromium will not flake (no 1ms-tick collapse) because it
exposes higher-precision timestamps for same-origin resources.
@thomas-lebeau
Copy link
Copy Markdown
Collaborator Author

Repro confirmed β€” see analysis in PR description. Closing without merging; fix follows in a separate PR.

@github-actions github-actions Bot locked and limited conversation to collaborators May 15, 2026
WebKit clamps `PerformanceResourceTiming` timestamps and `performance.now()`
to 1ms (privacy/Spectre mitigation). For the zero-byte `/empty.css` served
over fast (Linux CI) loopback, the entire request occasionally completes in
a single 1ms tick, collapsing every timestamp to the same value. The SDK
then truthfully emits `duration: 0` and `download.start: 0`, and the strict
`> 0` assertion in `expectToHaveValidTimings` fails β€” top flaky E2E this
week (28/24/21 occurrences for the async/npm/bundle variants).

Adding a 50ms server-side delay guarantees `startTime < responseEnd` even
under 1ms clamping. Keeps `expectToHaveValidTimings` strict, so a real
regression that zeroed duration would still be caught on every browser.
@thomas-lebeau
Copy link
Copy Markdown
Collaborator Author

Re-running the same repro setup (webkit-pinned + chromium, repeat-each=30, retries=0) with the 50ms /empty.css delay applied to verify the fix lands 0 failures.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant