
fix(vite-dev-server): Retry dynamic importing of support file and spec file when they fail during CT run with Vite #26202

Closed · wants to merge 7 commits

Conversation

astone123
Contributor

Additional details

A couple of things to note:

  • This will only affect imports of the component support file and spec file
  • This is only for component tests that are configured to use Vite

When Vite CT tests run, we dynamically import the component support file and spec file modules. Sometimes, seemingly when tests run on under-resourced machines (like in CI, specifically GitHub Actions), these modules aren't ready yet by the time we go to import them.

To make test runs more stable, this PR introduces a strategy to retry the dynamic imports of the component support file and the spec file up to 3 times (with 2 seconds between attempts) before eventually failing if the module can't be imported.

This is based on the assumption that these import errors are caused by network issues and/or slow machines, so a retried import should succeed on one of the subsequent attempts, allowing the test to continue. If the module genuinely does not exist, we throw the error after the last attempt.
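As a rough sketch, the strategy amounts to wrapping the dynamic import in a retry loop. This is illustrative only, not the exact code from this PR; the helper name `importWithRetry` matches the one in the diff, and the retry count and delay match the values described above.

```javascript
// Illustrative sketch of the retry strategy described above: retry a
// failed dynamic import up to 3 times, 2 seconds apart, then rethrow
// the last error so a genuinely missing module still fails the run.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function importWithRetry (importFn, attempts = 3, delayMs = 2000) {
  let lastError

  for (let i = 0; i < attempts; i++) {
    try {
      // e.g. importFn = () => import(`${devServerPublicPathRoute}${specPath}`)
      return await importFn()
    } catch (error) {
      lastError = error

      if (i < attempts - 1) {
        await delay(delayMs)
      }
    }
  }

  // All attempts failed: surface the original error
  throw lastError
}
```

The real implementation in `initCypressTests.js` additionally cache-busts the module URL between attempts, which this sketch omits.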

I got the idea for this approach from this article that was linked in this comment on the Vite issue related to this problem.

It's definitely not the best solution, but I wasn't able to figure out exactly what was causing these imports to fail; according to the debug logs I looked at, these modules should have been available at the time of import. To me, this suggests a complex race condition caused by a lack of machine resources.

Hopefully we can add this as a temporary solution to stabilize users' CI runs until we find a better one.

Steps to test

This is very difficult to test because it doesn't always happen, and when it does, it's always in a CI run. I added a test to vite-dev-server.cy.ts that simulates a module not being available on the first try by using cy.intercept to mock a 404 response to the first attempt to import the module. On the second attempt, the module imports successfully and the test passes.

How has the user experience changed?

This should not affect the user's experience except that their CI runs should be more stable, even on machines with lower resources.

PR Tasks

@astone123 astone123 self-assigned this Mar 23, 2023
@astone123 astone123 changed the title fix: Retry dynamic importing of support file and spec file when they fail during CT run with Vite fix(vite-dev-server): Retry dynamic importing of support file and spec file when they fail during CT run with Vite Mar 23, 2023
@cypress

cypress bot commented Mar 23, 2023

28 flaky tests on run #44943

Failures 0 · Passed 26891 · Pending 1281 · Skipped 0 · Flaky 28

Details:

Commit message: add comment and changelog entry
Project: cypress · Commit: 243502ae98
Status: Passed · Duration: 20:04
Started: Mar 24, 2023 4:33 PM · Ended: Mar 24, 2023 4:53 PM
Flakiness:

  • create-from-component.cy.ts • 2 flaky tests • app-e2e
      ... > runs generated spec
      ... > runs generated spec
  • runner/runner.ui.cy.ts • 1 flaky test • app-e2e
      src/cypress/runner > reporter interaction > user can stop test execution
  • specs_list_latest_runs.cy.ts • 1 flaky test • app-e2e
      App/Cloud Integration - Latest runs and Average duration > when no runs are recorded > shows placeholders for all visible specs
  • cypress-origin-communicator.cy.ts • 1 flaky test • app-e2e
      Cypress In Cypress Origin Communicator > cy.origin passivity with app interactions > passes upon test reload mid test execution
  • commands/net_stubbing.cy.ts • 1 flaky test • 5x-driver-firefox
      network stubbing > intercepting request > can delay and throttle a StaticResponse

The first 5 flaky specs are shown; see all 15 specs in Cypress Cloud.

This comment has been generated by cypress-bot as a result of this project's GitHub integration settings.

Review thread on npm/vite-dev-server/client/initCypressTests.js (outdated, resolved)
@@ -115,6 +115,21 @@ describe('Config options', () => {
expect(verifyFile).to.eq('OK')
})
})

Contributor

Great to see a test. Didn't we decide this would be a run-mode-only thing, though? (This test is for open mode; you'd need to write it in system-tests for a run-mode test.)

Contributor Author

Right. While I was struggling to write a system test, I started to question why this should be run mode only. Maybe the reason is that we don't want to modify the dynamic import behavior in open mode? We could make it run mode only, but then I'll have to come up with a testing strategy that doesn't use cy.intercept.

@AlonMiz

AlonMiz commented Mar 24, 2023

Hi, great to see that my article was of use.

Just want to point out that the solution is partial: it will retry the first file in the lazy-import tree, but the import might fail on the next one in the chain.
E.g. file1 imports file2, which imports file3.
If file1 succeeds but file2 fails, we might not be able to catch the URL of file2 with the same mechanism, and the cache will hold the failed request.
So overall it does increase the success rate, but it doesn't resolve the issue entirely. 🙏

@@ -25,7 +61,7 @@ if (supportFile) {
// We need a slash before /cypress/supportFile.js, this happens by default
// with the current string replacement logic.
importsToLoad.push({
load: () => import(`${devServerPublicPathRoute}${supportRelativeToProjectRoot}`),
load: () => importWithRetry(() => import(`${devServerPublicPathRoute}${supportRelativeToProjectRoot}`)),
Contributor

Could we refactor importWithRetry to accept the string path we want to import and just do the import(..) in there and repeat as needed? Then we wouldn't have to parse out the URL from the error message in that function
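A sketch of that suggested refactor might look like the following. Everything here is hypothetical illustration, not code from this PR: the function name, the injectable `loader` parameter, and the hardcoded localhost base URL exist only so the sketch is self-contained.

```javascript
// Hypothetical sketch of the suggested refactor: take the module path
// up front so the retry loop owns the URL (and its cache-busting param)
// rather than parsing the URL back out of the error message afterwards.
// The injectable `loader` and the hardcoded base URL exist only so this
// sketch can run without a real dev server.
async function importPathWithRetry (modulePath, {
  attempts = 3,
  delayMs = 2000,
  loader = (href) => import(/* @vite-ignore */ href),
} = {}) {
  const url = new URL(modulePath, 'http://localhost:5173')
  let lastError

  for (let i = 0; i < attempts; i++) {
    try {
      return await loader(url.href)
    } catch (error) {
      lastError = error

      if (i < attempts - 1) {
        // force a fresh request next time instead of replaying the
        // cached failure
        url.searchParams.set('t', `${Date.now()}`)
        await new Promise((resolve) => setTimeout(resolve, delayMs))
      }
    }
  }

  throw lastError
}
```

Because the function builds the URL itself, no error-message parsing is needed; in the real client the base would come from the page's own origin rather than a hardcoded host.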


return await import(url.href)
} catch (e) {
console.error(`retrying import of ${url?.href}`)
Contributor

Nitpick: this will print a message that we're going to retry even on the last attempt, when no further retry will happen.

Contributor

Also, maybe more of a warning than an error if we're potentially going to recover from it with a retry. A total failure constitutes an error, at least in my mind

console.error(`retrying import of ${url?.href}`)

// add a timestamp to the end of the URL to force re-load the module instead of using the cache
url.searchParams.set('t', `${+new Date()}`)
Contributor

Suggested change
url.searchParams.set('t', `${+new Date()}`)
url.searchParams.set('t', `${Date.now()}`)

Comment on lines +19 to +23
url = new URL(
error.message
.replace('Failed to fetch dynamically imported module: ', '')
.trim(),
)
Contributor

Just out of curiosity, what sort of details do we get from this error message? Can we tell the difference between a module not being ready and the dev server being totally hosed (misconfigured, etc) so we don't unnecessarily hang the test on retries if it can't possibly succeed?
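For reference, the extraction quoted above works roughly like this standalone sketch. The helper name is hypothetical; the "Failed to fetch dynamically imported module:" prefix is the wording Chromium uses for this failure.

```javascript
// Standalone sketch of the URL-extraction step quoted above: pull the
// module URL out of Chromium's import-failure message, then cache-bust
// it for the next attempt.
function extractFailedImportUrl (error) {
  return new URL(
    error.message
      .replace('Failed to fetch dynamically imported module: ', '')
      .trim(),
  )
}

const error = new Error(
  'Failed to fetch dynamically imported module: http://localhost:5173/__cypress/src/cypress/support/component.ts',
)

const url = extractFailedImportUrl(error)

// Add a timestamp so the browser re-requests the module instead of
// replaying the cached failure (Date.now(), per the suggestion above)
url.searchParams.set('t', `${Date.now()}`)
```

As far as I can tell, the message carries only the failing URL, so on its own it can't distinguish a module that isn't ready yet from a dev server that is misconfigured, which is what the question above is getting at.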

@lmiller1990
Contributor

Hi, great to see that my article was of use.

just want to point out that the solution is partial, as it will retry the first file in the lazy import tree, it might fail on the next one in the chain. eg. file1 -> imports file2 -> import file3 if file1 succeeds, but file2 fails, we might not be able to catch the URL of file2 in the same mechanism, and the cache will hold the failed request. so overall, it does increase the success rate overall, but not resolve the issue entirely. 🙏

I had this concern, too. From what I've seen, we always seem to fail on a "top-level" import (either a spec or a support file), never on the subsequent requests. One concern I have is that maybe we are failing on subsequent requests, but the error always manifests the same way.

Once this workaround goes out, we will see if the issue continues to manifest. If it does, we can revert the workaround (since it was not successful).

Did you try the pre-release by any chance? Should be available here: 243502a#comments

@astone123
Contributor Author

We have an example of this not working in the Cypress.io repo with the binary from this branch
https://github.com/cypress-io/cypress.io/actions/runs/4522253983/jobs/7964605507#step:6:5043

It retries importing the support file multiple times and fails every time.

@lmiller1990
Contributor

lmiller1990 commented Mar 28, 2023

https://github.com/cypress-io/cypress.io/actions/runs/4522253983/jobs/7964605507#step:6:5043

I see it here

2023-03-26T02:04:17.4614965Z cypress:launcher:browsers chrome stderr: [0326/020417.460358:INFO:CONSOLE(36)] "retrying import of http://localhost:5173/__cypress/src/cypress/support/component.ts", source: http://localhost:5173/__cypress/iframes//home/runner/work/cypress.io/cypress.io/src/components/Vue/InputRange.cy.tsx (36) +2s
2023-03-26T02:04:17.4628919Z cypress:server:browsers:chrome continueRequest: { requestId: 'interception-job-337.0' } +2s
2023-03-26T02:04:17.4647471Z cypress:server:routes-ct proxying to /__cypress/src, originalUrl /__cypress/src/cypress/support/component.ts?t=1679796257459 +2s
2023-03-26T02:04:17.4796381Z cypress:launcher:browsers chrome stderr: [0326/020417.478929:INFO:CONSOLE(43)] "retrying import of http://localhost:5173/__cypress/src/cypress/support/component.ts?t=1679796257459", source: http://localhost:5173/__cypress/iframes//home/runner/work/cypress.io/cypress.io/src/components/Vue/InputRange.cy.tsx (43) +19ms
2023-03-26T02:04:21.4776439Z cypress:launcher:browsers chrome stderr: [0326/020421.476538:INFO:CONSOLE(36)] "retrying import of http://localhost:5173/__cypress/src/cypress/support/component.ts", source: http://localhost:5173/__cypress/iframes//home/runner/work/cypress.io/cypress.io/src/components/Vue/InputRange.cy.tsx (36) +4s
2023-03-26T02:04:21.4787912Z cypress:server:browsers:chrome continueRequest: { requestId: 'interception-job-338.0' } +4s
2023-03-26T02:04:21.4808988Z cypress:server:routes-ct proxying to /__cypress/src, originalUrl /__cypress/src/cypress/support/component.ts?t=1679796261475 +4s
2023-03-26T02:04:21.4912694Z cypress:launcher:browsers chrome stderr: [0326/020421.489808:INFO:CONSOLE(43)] "retrying import of http://localhost:5173/__cypress/src/cypress/support/component.ts?t=1679796261475", source: http://localhost:5173/__cypress/iframes//home/runner/work/cypress.io/cypress.io/src/components/Vue/InputRange.cy.tsx (43) +14ms

Looks like this isn't the fix? I wonder what's happening here; it looks like the file really doesn't exist, not even on the filesystem? I cannot imagine why else this would fail 3 times. It's not hitting the cache, right, since you added the ?t=<timestamp> param?

Also weird that it's the support file. I wonder if that's just because it's the first thing that's imported, so it fails every time (e.g. if we do supportFile: false, what happens? Do we see another import fail?)

Labels: none yet · Projects: none yet · 4 participants