We're testing an integration of our JS widget with a website we're working with to detect possible regressions. We know that this is considered an anti-pattern and we understand the caveats of doing this. However, this is what led us into this issue, and a crash should not be caused by any sort of user error.
We have seen a number of similar issues, but we did not find any solution:
As indicated above, we are running the official Docker image (cypress/included:13.6.1); the host runs Ubuntu 20.04 (the issue also happened with 22.04 as the host) as part of our self-hosted GitLab runner infrastructure on AWS (us-east-1). We're using m6i.xlarge instances, but we also tested with AMD-based instances and it did not seem to change anything. The issue happens with the default electron browser. We are recording runs to Cypress Cloud. It does not seem to be an out-of-memory condition. It is also not specific to a particular CI runner instance – we had subsequent test runs crash and succeed on the same host VM.
We've been investigating this for some time now, and the issue seems to be related to the website's JS (React?) router changing the URL. In some instances this produces a "waiting for new page to load" step, and execution enters a vicious cycle of "Network.loadingFailed" messages as described below, until it crashes with SIGILL. We were able to adjust our tests so that the URL change does not happen, which reduced the number of runs that crashed.
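For illustration, the workaround we applied amounts to keeping the router from changing the URL during the test. A hypothetical sketch of one way to do that is stubbing `history.pushState` on the application's window (in a real Cypress spec this would be installed via `cy.visit(url, { onBeforeLoad: stubPushState })`; whether it prevents the "waiting for new page to load" step depends on how the router actually navigates). Here the stub is applied to a plain fake window object so the idea is self-contained; all names are illustrative, not our actual test code:

```javascript
// Hypothetical sketch: record URLs the router tried to push instead of
// letting it change the address, so no page-load transition is triggered.
function stubPushState(win) {
  const recorded = [];
  win.history.pushState = (state, title, url) => {
    recorded.push(url); // remember the attempted navigation for assertions
  };
  return recorded;
}

// Fake window standing in for the application under test.
const fakeWin = { history: { pushState: () => {} } };
const recorded = stubPushState(fakeWin);
fakeWin.history.pushState({}, '', '/widget/step-2');
console.log(recorded); // [ '/widget/step-2' ]
```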
The issue is intermittent in nature and the logs are the only thing we have. It is somewhat reproducible (10–20% of runs suffer from it). We run a number of similar tests (differing only in some numerical parameters) in a single spec – if it happens, it happens in the first spec & test case. Overall it looks like a timing-dependent issue. The log messages from dmesg attribute the SIGILL either to the Cypress or the Chrome_IOThread process.
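For reference, the kernel log lines can be pulled out with a simple filter. The sample line below is illustrative (PID and addresses made up, format approximating typical `traps:` dmesg output); on a real host you would pipe `dmesg` into the same grep:

```shell
# Illustrative dmesg-style line resembling what we see on crashing hosts.
sample='traps: Chrome_IOThread[4242] trap invalid opcode ip:55f0c0ffee00 sp:7ffdeadbeef0 error:0'
# The filter itself: match SIGILL traps attributed to Cypress or Chrome_IOThread.
printf '%s\n' "$sample" | grep -E 'traps: (Cypress|Chrome_IOThread).*invalid opcode'
```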
My guess is this could actually be two issues – the first causing the test runner(?) to enter the Network.loadingFailed cycle, leading to the second – some low-level buffer overflow that causes the SIGILL and the subsequent crash.
Current behavior
Test runner crashes with SIGILL (invalid opcode). This is an intermittent failure – it does not happen on every run.
Desired behavior
Test runner does not crash and produces a proper test result (either success or failure).
Test runner emits enough logs to understand what happened.
Test code to reproduce
https://github.com/kfigiela/cypress-sigill-issue-repro/tree/main
This happens regularly in our CI environment for some percentage of runs. I was also able to reproduce it on an EC2 machine running Ubuntu 22.04 with the following command:
Cypress Version
13.6.1
Node version
v20.9.0
Operating System
Docker (cypress/included:13.6.1), Ubuntu 20.04 (Host), self-hosted GitLab runner @ AWS/us-east-1 (m6i.xlarge, also tested with AMD-based instances)
Debug Logs
Full (verbose) logs:
Starting at line 122808, it registers "waiting for new page to load"; "Network.loadingFailed" and related messages are then repeated until it finally crashes.
This repeats (with different request ids):
End of log, where the crash happens. The network errors seem to be the result of the crash, not its cause:
dmesg – the two errors are related to separate run attempts:
Other