Fix wait / sleep which is causing many PRs to fail performance tests in CI for reasons unrelated to the PR#77726
Fix wait / sleep which is causing many PRs to fail performance tests in CI for reasons unrelated to the PR#77726danluu wants to merge 3 commits intoWordPress:trunkfrom
Conversation
|
👋 Thanks for your first Pull Request and for helping build the future of Gutenberg and WordPress, @danluu! In case you missed it, we'd love to have you join us in our Slack community. If you want to learn more about WordPress development in general, check out the Core Handbook full of helpful information. |
0cf755b to
e51d754
Compare
|
Warning: Type of PR label mismatch To merge this PR, it requires exactly 1 label indicating the type of PR. Other labels are optional and not being checked here.
Read more about Type labels in Gutenberg. Don't worry if you don't have the required permissions to add labels; the PR reviewer should be able to help with the task. |
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
e51d754 to
72498c6
Compare
dmsnell
left a comment
There was a problem hiding this comment.
these two tests both define isVisibleElement and isReadyElement and they look identical. let’s extract them? I think that will make the diff look much smaller, especially if we define these helpers somewhere in the module global scope so that they aren’t right there in the waitForFunction lines.
This PR was originally created to stop every PR from having a failing CI run due to performance tests timing out at 60 minutes, which was blocking every PR for the #77716 fuzzing project. There was as race with #77725 and that issue solved the problem. However, the approach is different and this PR eliminates the 3s wait that #77725 has in successful cases. This seems to speed up the entire Performance CI run by approximately 2.5 minutes (no doubt there's a lot of noise and the true speedup might be 2 minutes or 3 minutes and not 2.5 minutes).
I originally closed this as a dup, but on seeing that this version has some advantages over the other one, I'm gong to re-open it.
BEGIN AI GENERATED TEXT
The performance job is not timing out because the Site Editor test suite or the
Site Editor itself became 20+ minutes slower. It is timing out because the
visitSiteEditor()Playwright helper can wait a full 60 seconds for a canvasloader that has already appeared and disappeared.
The direct trigger was #77443. That PR did not change Site Editor loading code,
the performance spec, or the
visitSiteEditor()helper. It changed runtimetiming enough to expose an existing race in the helper.
The latent helper bug came from the sequence of earlier helper changes, most
importantly #68667. #68667 made a missed loader a successful fallback path, but
only after spending the full 60 second timeout waiting for the loader to become
visible.
Once #77443 was on trunk, PR performance jobs compared two branches that both
contained the timing trigger. The Site Editor loading test calls
visitSiteEditor()11 times per branch, so one artificial 60 second wait periteration adds about 22 minutes to a full PR performance run.
Relevant PRs and commits
d301cf7d5152: added a Site Editor canvas loader wait to avoidresolving
visitSiteEditor()before large canvas content finished loading.7e24ae4f91ee: limited that wait to Site Editor routes that areexpected to load the canvas. The PR description says the previous behavior
added about 2 minutes to each affected performance test.
ad4da873ce4: movedsetPreferences()before the loader wait. Thismade the helper start waiting for the loader later in the page lifecycle.
4085555f8ac: added the important latent bug. The helper tries towait for
.edit-site-canvas-loader, .edit-site-canvas-spinnerto becomevisible with a 60 second timeout, catches the timeout, and then waits for the
editor content region. If the loader was already gone, the helper succeeds
after wasting 60 seconds.
309dfc2e885: changedcatch ( error )tocatch {}. This wasmechanical and did not change behavior.
0f99b8a8044: rebuilt the Experiments screen. This is the PR thatchanged timing enough for performance CI to hit the existing helper race.
What the helper does
On trunk before the fix,
visitSiteEditor()contains this sequence:The catch block comment says "If the canvas loader is already disappeared, skip
the waiting", but the implementation cannot know that until after
canvasLoader.waitFor( { state: 'visible', timeout: 60_000 } )has alreadytimed out.
The Site Editor performance loading test hits this path 11 times per branch:
The test then records
results.firstBlockfrommetrics.getLoadingDurations().timeSinceResponseEnd, which isperformance.now() - responseEnd. That means the helper's 60 second wait isincluded in the
firstBlockmetric even if the first block was already present.Evidence from CI
The #77443 performance run compared the PR merge commit
cf45676490ae3900632a512f6deaa9ab259e2d54against trunk and still finished,but it was close to the 60 minute workflow limit. Its Site Editor summary showed
a signature 60 second
firstBlockvalue on the PR side while server-side workwas normal:
That rules out a PHP/server/DB slowdown as the main cause of the 60 second
number.
After #77443 landed, later PRs compared PR branches against trunk where both
sides had the timing trigger. For example, the #77675 performance run was
canceled at the 60 minute job timeout while running the second Site Editor
branch:
That is consistent with roughly 11 minutes of artificial loader-wait delay per
branch, plus normal setup/build/test time.
Local reproduction and double-check
I reproduced the behavior with the actual performance test path, not a reduced
direct
page.goto()probe. With instrumentation added around the oldvisitSiteEditor()helper on #77443's PR merge commit, the page was alreadyloaded before the 60 second wait started:
This answers the important question directly: yes, the content was present
before 60 seconds. The iframe already contained 1437 blocks before the helper
started the 60 second loader-visible wait.
I then ran the same instrumented path on the pre-#77443 base commit
90e45267257fc4a4da0095933ecc24fa3bae776d. There the loader was still presentwhen the helper started waiting, so the 60 second timeout was not hit:
This isolates the behavior change to #77443's merge comparison: the helper code
was the same, but the loader was missed after #77443 and not missed before it.
Why #77443 is the trigger but not the actual slowdown
The #77443 diff changes the Experiments screen and related settings plumbing:
It does not change these Site Editor or performance-test files:
So the correct interpretation is:
visitSiteEditor(): a missed loader costs 60 secondsand then still succeeds.
loader was already gone by the time
visitSiteEditor()started waiting forit to become visible.
firstBlock.the job past the workflow's 60 minute timeout.
Fix direction
The helper should not require the loader to appear. It should wait for either:
Then it should verify the loader is absent and the editor content/canvas is
ready. That preserves the intent of #61629 and #61816, while avoiding the
60-second penalty introduced by the fallback behavior from #68667.
END AI GENERATED TEXT
Note that this PR was AI generated (GPT-5.5)