Description
On Wednesday, all our Safari tests suddenly started failing. We use the macos-15-large runner because we need x86: on ARM our tests fail on video assertions. We run our tests with Playwright, and the Safari tests simply stop partway through, mainly at points where we need to do something in the browser. No logs, no errors, nothing. The tests just stop and time out.
I noticed that sometimes, in about 1 out of 50 runs, the tests pass like they did before. Since nothing changed on our side, I started investigating what was different in those passing runs and found one difference that is consistent with the failures.
Here are two images from the workflow run:


In the first image, you can see that the "Runner image provisioner" section contains only one line, while in the second image it has multiple lines of information. Whenever we get the first provisioner, our tests pass. Whenever we get the second one, they fail.
Can someone help me understand the difference between the two so that I can fix our tests?
Platforms affected
- Azure DevOps
- GitHub Actions - Standard Runners
- GitHub Actions - Larger Runners
Runner images affected
- Ubuntu 22.04
- Ubuntu 24.04
- macOS 13
- macOS 13 Arm64
- macOS 14
- macOS 14 Arm64
- macOS 15
- macOS 15 Arm64
- Windows Server 2019
- Windows Server 2022
- Windows Server 2025
Image version and build link
Image: macos-15
Version: 20250623.1531
Is it regression?
yes
Expected behavior
Tests pass like they did before.
Actual behavior
Tests just stop and time out. The machine itself seems to be working and has not hung.
Repro steps
No steps to list here.