fix: adds GITHUB_RUN_ATTEMPT counter for tracking retries in the Cypress Cloud #23445

Merged: 3 commits, Aug 23, 2022

Conversation

Contributor

@jaimefps jaimefps commented Aug 18, 2022

As described in the parent Epic, Cypress currently returns PASS in the following scenario:

  a. CI runs a set of tests for a single build in multiple parallel processes

  b. One or more tests FAIL

  c. CI re-runs a slice of the parallelized build, but Cypress does not run any tests in this re-run

  d. Cypress returns a PASS for the slice, even though the corresponding slice has not run successfully

In this scenario, customer builds that CI should have blocked from acceptance are 
mistakenly accepted, which likely results in bad code being advanced. This is exactly 
the opposite of the value that we promise to deliver into the CI/CD pipeline.

A/C:

- return a FAIL status if the requested slice has not successfully passed

User facing changelog

Includes the GITHUB_RUN_ATTEMPT env variable when claiming specs from Cypress Cloud. This allows the Cypress Cloud dashboard to distinguish between valid and invalid GH Action requests. This only fixes the problem described above for users of Cypress version 10 and above. Users on Cypress 9 and below will continue to face this issue. We also assume that the customer is generating custom build ids as described here.

See GH Actions API docs here
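Specifically the portion contrasting RUN_ID with RUN_ATTEMPT (screenshot omitted): GITHUB_RUN_ID does not change when a workflow run is re-run, while GITHUB_RUN_ATTEMPT begins at 1 and increments with each re-run.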
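
To illustrate the changelog entry above, here is a minimal sketch of forwarding GITHUB_RUN_ATTEMPT alongside GITHUB_RUN_ID when a machine asks for specs. It is not the code from this PR: `pickGithubCiParams`, `buildClaimRequest`, the `ciParams` field, and the `machineId` parameter are hypothetical names chosen for the example. Only the two environment variable names come from GitHub Actions.

```typescript
// Hypothetical sketch, not the actual Cypress implementation.
type Env = Record<string, string | undefined>

// The two GitHub Actions variables discussed in this PR.
const GITHUB_KEYS = ['GITHUB_RUN_ID', 'GITHUB_RUN_ATTEMPT'] as const

type GithubCiParams = Partial<Record<(typeof GITHUB_KEYS)[number], string>>

// Copy only the variables that are actually set, so that other CI
// providers (where these keys are absent) yield an empty object.
function pickGithubCiParams(env: Env): GithubCiParams {
  const params: GithubCiParams = {}
  for (const key of GITHUB_KEYS) {
    const value = env[key]
    if (value !== undefined && value !== '') {
      params[key] = value
    }
  }
  return params
}

// Hypothetical request body sent when a machine claims its next spec.
function buildClaimRequest(env: Env, machineId: string) {
  return { machineId, ciParams: pickGithubCiParams(env) }
}
```

In a Node process the first argument would typically be `process.env`; it is passed explicitly here so the sketch stays self-contained.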

Additional details

Issue: Customers have reported that a GH Action job automatically passes when a failed machine is retried. The retried machine is marked as passed without even attempting to rerun Cypress specs.

Root cause: Cypress Cloud failed to distinguish between a GH Action machine that was "retrying" a task and a machine that was simply requesting "unclaimed" specs in a run. When the Cypress run queue for specs was empty, we would tell the retrying machine that everything was fine and finished, because we did not know that the request came from a retry.

Temporary solution: We still do not support test retries for partial re-runs of GH Action jobs. But we will correctly exit with an error message in those cases, so that customers don't leave with the impression that the GH Action retry completed successfully.
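
The root cause and the temporary solution can be modeled roughly as follows. This is a hypothetical sketch of the server-side decision, written only to make the reasoning concrete: the Cypress Cloud server code is not part of this PR, and `RunState`, `ClaimResponse`, `respondToClaim`, and `createdOnAttempt` are invented names. The error text follows the wording quoted in "Steps to test".

```typescript
// Hypothetical model of the decision; all names are illustrative only.
interface RunState {
  createdOnAttempt: number // attempt number seen when the run was created
  unclaimedSpecs: string[] // specs that no machine has claimed yet
}

type ClaimResponse =
  | { kind: 'spec'; spec: string }
  | { kind: 'done' }
  | { kind: 'error'; message: string }

function respondToClaim(run: RunState, requestAttempt: number): ClaimResponse {
  // A higher attempt number than the one that created the run means the
  // request comes from a re-run job. Without this check, a retry that finds
  // an empty queue would fall through to "done" and be reported as passed.
  if (requestAttempt > run.createdOnAttempt) {
    return {
      kind: 'error',
      message: 'GH Action partial re-runs not supported at this time. Please re-run all jobs',
    }
  }

  // Hand out the next unclaimed spec, or report that nothing is left.
  const spec = run.unclaimedSpecs.shift()
  if (spec === undefined) {
    return { kind: 'done' }
  }
  return { kind: 'spec', spec }
}
```

The point of the sketch is that the attempt number is the only signal separating "queue already drained by the first attempt" from "nothing left to do", which is why the client has to send it.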

Steps to test

This requires updates on the Cypress Cloud dashboard. Once those updates are made, the expected behavior is for retries to always throw an error along the lines of "GH Action partial re-runs not supported at this time. Please re-run all jobs". Reach out to Cypress Cloud team to confirm the updates have been included in the Cloud server as well.

How has the user experience changed?

Users will now receive an accurate response from the Cypress Cloud server explaining that we do not support partial retries at this time. For now it is important that customers do not accidentally leave with the impression that failed tests are passing, and that it is safe to merge code after a partial GH Action retry that we incorrectly mark as passed.

This only applies to users of Cypress 10 and above.

PR Tasks

  • Have tests been added/updated?
  • Has the original issue (or this PR, if no issue exists) been tagged with a release in ZenHub? (user-facing changes only)
  • Has a PR for user-facing changes been opened in cypress-documentation?
  • Have API changes been updated in the type definitions?

@cypress-bot
Contributor

cypress-bot bot commented Aug 18, 2022

Thanks for taking the time to open a PR!

@CLAassistant

CLAassistant commented Aug 18, 2022

CLA assistant check
All committers have signed the CLA.

@jaimefps jaimefps changed the title from "adds github attempt counter for tracking retries in the Cypress Cloud" to "[CLOUD-784]: adds GITHUB_RUN_ATTEMPT counter for tracking retries in the Cypress Cloud" Aug 18, 2022
@jaimefps jaimefps changed the title from "[CLOUD-784]: adds GITHUB_RUN_ATTEMPT counter for tracking retries in the Cypress Cloud" to "fix: adds GITHUB_RUN_ATTEMPT counter for tracking retries in the Cypress Cloud" Aug 18, 2022
@jaimefps jaimefps force-pushed the 784-adds-github-retry-number branch from 55a0843 to 2ec1909 Compare August 18, 2022 21:48
@AtofStryker AtofStryker self-requested a review August 18, 2022 22:08
@cypress

cypress bot commented Aug 18, 2022



Test summary

4648 0 375 0 Flakiness 0


Run details

Project cypress
Status Passed
Commit 4386cc1
Started Aug 23, 2022 3:43 PM
Ended Aug 23, 2022 3:57 PM
Duration 13:46 💡
OS Linux Debian - 11.3
Browser Firefox 99

View run in Cypress Dashboard ➡️


This comment has been generated by cypress-bot as a result of this project's GitHub integration settings. You can manage this integration in this project's settings in the Cypress Dashboard

@jaimefps jaimefps force-pushed the 784-adds-github-retry-number branch from 2ec1909 to 47d8655 Compare August 18, 2022 22:35
@flotwig
Contributor

flotwig commented Aug 19, 2022

Users will now receive an accurate response from the Cypress Cloud server explaining that we do not support retries at this time.

@jaimefps Let me check my understanding: the issue here is with the re-used job ID, correct? Because I re-run CircleCI tests all the time. But each job in Circle has a unique job number, so it does not hit this bug. GH Actions re-uses job numbers for retries, causing this issue, correct?

If that's the case, then why do we need to fail user tests at all? It sounds like we could concatenate RUN_ID + RUN_ATTEMPT and use that as the run ID?
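
A minimal sketch of the concatenation idea, assuming the build id is derived on the client from the two GitHub variables. `githubBuildId` is a hypothetical helper, and the `-` separator and the fallback to attempt `1` are choices made for this example, not behavior confirmed by the PR.

```typescript
type EnvVars = Record<string, string | undefined>

// Hypothetical helper: combine run id and attempt so that every re-run
// of a workflow gets its own build id instead of reusing the original one.
function githubBuildId(env: EnvVars): string | undefined {
  const runId = env['GITHUB_RUN_ID']
  if (!runId) {
    return undefined // not running under GitHub Actions
  }
  // GITHUB_RUN_ATTEMPT starts at 1; fall back to 1 if it is not set.
  const attempt = env['GITHUB_RUN_ATTEMPT'] || '1'
  return `${runId}-${attempt}`
}
```

A value like this could be passed to `cypress run` through `--ci-build-id`, so a re-run would start a fresh run in Cypress Cloud rather than joining the finished one.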

@jaimefps
Contributor Author

Alejandro and I are reviewing if we can avoid throwing errors.
I'll update the description of this PR if we figure it out.
It will likely be something like what you recommend here.

@jaimefps
Contributor Author

jaimefps commented Aug 23, 2022

@flotwig This is the solution we've come up with at this time.
The PR is still in progress, but we are more comfortable with the approach being taken.
I'll update the PR description here as well, to reflect our new understanding.

https://github.com/cypress-io/cypress-services/pull/4739

@jaimefps jaimefps merged commit 7d71ffe into develop Aug 23, 2022
@jaimefps jaimefps deleted the 784-adds-github-retry-number branch August 23, 2022 21:01
@cypress-bot
Contributor

cypress-bot bot commented Aug 30, 2022

Released in 10.7.0.

This comment thread has been locked. If you are still experiencing this issue after upgrading to
Cypress v10.7.0, please open a new issue.

@cypress-bot cypress-bot bot locked as resolved and limited conversation to collaborators Aug 30, 2022