[test optimization] Use duration buckets for playwright EFD retries#8289
Overall package size
Self size: 5.79 MB

Dependency sizes
| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.0.1 | 82.56 kB | 817.39 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe
🎉 All green!
❄️ No new flaky tests detected
🔗 Commit SHA: 16661a1
Benchmarks
Benchmark execution time: 2026-05-08 12:14:00
Comparing candidate commit 34cca90 in PR branch.
Found 0 performance improvements and 0 performance regressions! Performance is the same for 1737 metrics, 107 unstable metrics.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 16661a1654
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if (isEfdManagedTest && !test._ddIsEfdRetry && !efdRetryCountByTestFqn.has(testFqn)) {
      const testResult = results.at(-1)
      const duration = testResult?.duration > 0 ? testResult.duration : performance.now() - test._ddStartTime
      const retryCount = getEfdRetryCount(duration, getTestEfdSlowTestRetries(test))
      efdRetryCountByTestFqn.set(testFqn, retryCount)
Include project in EFD duration keys
When the same Playwright test is run in multiple projects/browsers, testFqn is only the file plus title, so the first project's duration bucket is stored here and every later project skips recomputing its own retry count. A slow browser can therefore suppress retries for a fast browser (or a fast browser can force extra retries for a slow one), because browserName/project id is not part of the key used by efdRetryCountByTestFqn and efdSlowAbortedTests.
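One way to address this is to make the project part of the key. The following is a minimal sketch, not the PR's actual fix: `efdKey`, `recordRetryCount`, and the `{ projectName, file, title }` test shape are hypothetical names introduced for illustration only.

```javascript
// Sketch only: the test shape ({ projectName, file, title }) and both helper
// names are assumptions, not identifiers from the instrumentation.
const retryCountByKey = new Map()

// Build a key that distinguishes the same test across projects/browsers.
// JSON.stringify of an array avoids collisions that a plain separator could cause.
function efdKey ({ projectName, file, title }) {
  return JSON.stringify([projectName ?? '', file, title])
}

// Store the retry count for the first execution seen under each key and
// return whatever count is stored for that key.
function recordRetryCount (test, retryCount) {
  const key = efdKey(test)
  if (!retryCountByKey.has(key)) {
    retryCountByKey.set(key, retryCount)
  }
  return retryCountByKey.get(key)
}
```

With a key like this, a slow first execution in one project would no longer decide the retry count for the same test in another project.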
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3cb844685a
    if (test.repeatEachIndex > 1) {
      return test.repeatEachIndex - 1
    }
Do not infer EFD retries from native repeatEachIndex
When a user runs Playwright with native repeats and EFD is enabled for a new/impacted test, any repeated execution whose repeatEachIndex is greater than 1 is treated as a Datadog EFD retry even if it was created by Playwright's repeatEach/--repeat-each rather than by this instrumentation. Since the EFD key does not include the native repeat index, those user-requested repeats wait for the first execution's duration bucket and can then be marked/skipped by shouldSkipEfdRetry(), so scenarios like --repeat-each=3 no longer execute all requested repeats correctly.
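A possible alternative is to rely on an explicit marker set when the instrumentation creates a clone, instead of inferring it from `repeatEachIndex`. The sketch below assumes that approach; `_ddIsEfdRetry` appears in the diff above, while `markEfdClone`, `getEfdRetryIndex`, and `_ddEfdRetryIndex` are hypothetical names used only for illustration.

```javascript
// Sketch only: markEfdClone, getEfdRetryIndex and _ddEfdRetryIndex are
// hypothetical; plain objects stand in for Playwright test cases.

// Tag a clone at creation time so later code never has to guess its origin.
function markEfdClone (original, retryIndex) {
  return { ...original, _ddIsEfdRetry: true, _ddEfdRetryIndex: retryIndex }
}

// Return the EFD retry index, or null when the execution was not created by
// the instrumentation (for example a native --repeat-each execution).
function getEfdRetryIndex (test) {
  return test._ddIsEfdRetry === true ? test._ddEfdRetryIndex : null
}
```

Under this assumption, an execution produced by `--repeat-each=3` carries a `repeatEachIndex` but no marker, so it would not be treated as an EFD retry.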
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 19926dea34
@codex review
Codex Review: Didn't find any major issues. Already looking forward to the next diff.
What does this PR do?
Updates Playwright Early Flake Detection clone execution to use the `slow_test_retries` duration buckets returned by the settings API.
Propagates the retry thresholds to Playwright workers, and marks tests with `test.early_flake.abort_reason=slow` when the selected bucket aborts additional retries.
Motivation
Jest, Mocha, Cucumber, and Vitest already limit EFD retry attempts based on the duration of the first test execution. Playwright should apply the same API-driven retry bucket behavior so slow tests do not keep scheduling unnecessary EFD retries.
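The bucket selection can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the threshold labels and counts in `exampleSlowTestRetries` are made-up values, `pickRetryCount` and `parseThresholdMs` are hypothetical helpers, and treating each threshold as an exclusive upper bound is also an assumption.

```javascript
// Sketch only: example thresholds, not values taken from the settings API.
const exampleSlowTestRetries = { '5s': 10, '10s': 5, '30s': 3, '5m': 2 }

const UNIT_TO_MS = { s: 1000, m: 60 * 1000 }

// Convert a label such as '30s' or '5m' to milliseconds; null if unrecognized.
function parseThresholdMs (label) {
  const match = /^(\d+)(s|m)$/.exec(label)
  return match ? Number(match[1]) * UNIT_TO_MS[match[2]] : null
}

// Pick the retry count of the smallest bucket whose threshold exceeds the
// first execution's duration. Returning 0 means no bucket matched, i.e. the
// test is too slow for additional retries.
function pickRetryCount (durationMs, buckets) {
  const sorted = Object.entries(buckets)
    .map(([label, retries]) => ({ limitMs: parseThresholdMs(label), retries }))
    .filter(bucket => bucket.limitMs !== null)
    .sort((a, b) => a.limitMs - b.limitMs)

  for (const bucket of sorted) {
    if (durationMs < bucket.limitMs) return bucket.retries
  }
  return 0
}
```

In this sketch a return value of 0 corresponds to the case where no further retries are scheduled, which is where a tag such as `test.early_flake.abort_reason=slow` would apply.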
Additional Notes
Stack: 4/5. This is now the base PR for the remaining Cypress PR after #8288 merged.