Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: improve replay upload resiliency #29174

Merged
merged 39 commits into from Mar 29, 2024

Conversation

cacieprins
Copy link
Contributor

@cacieprins cacieprins commented Mar 20, 2024

Additional details

Major behavioral changes:

  • Protocol uploads will only retry on network-level errors and HTTP errors that should be retried (408, 429, 502, 503, and 504)
  • Protocol uploads will terminate without retrying if the upload stalls. Stall behavior is detected when the upload begins to send data, but pauses in sending data for more than 5 seconds.

To accomplish this:

  • The retry logic from protocol.ts was removed in favor of using fetch-retry-ts, which adds retry logic on top of any provided fetch implementation. This is important, as cross-fetch provides custom network agent support.
  • Implemented an HttpError class, to better handle HTTP errors going forward.
  • The primary Test Replay upload logic was moved from ProtocolManager to cloud/api/putProtocolArtifact.ts and cloud/upload/uploadStream
  • implements a StreamActivityMonitor that produces a Transform stream that monitors a given Readable, and an AbortController that will be signalled if the stream is inactive for too long. This is used in uploadStream.ts to detect stalled uploads, by piping the Test Replay archive's fs.ReadStream through the StreamActivityMonitor's Transform.
  • moves api.ts to cloud/api/api.ts to keep it colocated with putProtocolArtifact.ts, which is too specific of an implementation for the parent cloud/ directory.

There are some slight messaging changes:

  • For accuracy, when a Test Replay tarfile is larger than the allowable size (currently: 5GB): the message, Spec recording too large: db is $SIZE bytes, limit is $LIMIT bytes has changed to Spec recording too large: artifact is $SIZE bytes, limit is $LIMIT bytes.

Steps to test

  • Record and upload a spec with Test Replay enabled

How has the user experience changed?

PR Tasks

@cacieprins cacieprins force-pushed the cacie/perf/improve-replay-upload-resiliency branch 4 times, most recently from a03d09b to 766051c Compare March 26, 2024 15:06
@cacieprins cacieprins force-pushed the cacie/perf/improve-replay-upload-resiliency branch from 766051c to 967e876 Compare March 26, 2024 15:07
@cacieprins cacieprins marked this pull request as ready for review March 26, 2024 15:15
Copy link

cypress bot commented Mar 26, 2024

1 flaky test on run #54695 ↗︎

0 6082 86 0 Flakiness 1

Details:

fix comment style, remove confusingly inapplicable comment about whatwg streams
Project: cypress Commit: 3988cb8f4f
Status: Passed Duration: 16:27 💡
Started: Mar 29, 2024 3:01 PM Ended: Mar 29, 2024 3:18 PM
Flakiness  cypress/e2e/commands/waiting.cy.js • 1 flaky test • 5x-driver-chrome:beta

View Output

Test Artifacts
... > errors > throws when waiting for 2nd response to route Test Replay

Review all test suite changes for PR #29174 ↗︎

@AtofStryker AtofStryker self-requested a review March 26, 2024 18:49
@cacieprins cacieprins self-assigned this Mar 26, 2024
Copy link
Contributor

@AtofStryker AtofStryker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still trying to grok a lot of this, but the refactor / enhancements make a lot of sense and really appreciate the thorough test coverage!

packages/server/lib/cloud/api/HttpError.ts Outdated Show resolved Hide resolved
packages/server/lib/cloud/api/putProtocolArtifact.ts Outdated Show resolved Hide resolved
packages/server/lib/cloud/api/putProtocolArtifact.ts Outdated Show resolved Hide resolved
packages/server/lib/cloud/upload/uploadStream.ts Outdated Show resolved Hide resolved
packages/server/lib/cloud/upload/uploadStream.ts Outdated Show resolved Hide resolved
packages/server/lib/cloud/upload/StreamActivityMonitor.ts Outdated Show resolved Hide resolved
packages/server/lib/cloud/upload/uploadStream.ts Outdated Show resolved Hide resolved
@@ -3,6 +3,10 @@

_Released 4/2/2024 (PENDING)_

**Performance:**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really performance? Or more stability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little of A, a little of B - but the source issue is labeled performance

@@ -9,19 +9,19 @@ const RequestErrors = require('@cypress/request-promise/errors')
const { agent } = require('@packages/network')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but maybe rename this to index.ts so that we don't have to do api/api?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 58c8daa

Copy link
Collaborator

@ryanthemanuel ryanthemanuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look great. I love the encapsulation and it's really well documented and tested.

@cacieprins cacieprins merged commit 3a739f3 into develop Mar 29, 2024
80 of 82 checks passed
@cacieprins cacieprins deleted the cacie/perf/improve-replay-upload-resiliency branch March 29, 2024 16:22
@cacieprins cacieprins restored the cacie/perf/improve-replay-upload-resiliency branch April 1, 2024 14:09
@cacieprins cacieprins deleted the cacie/perf/improve-replay-upload-resiliency branch April 1, 2024 14:32
@cypress-bot
Copy link
Contributor

cypress-bot bot commented Apr 2, 2024

Released in 13.7.2.

This comment thread has been locked. If you are still experiencing this issue after upgrading to
Cypress v13.7.2, please open a new issue.

@cypress-bot cypress-bot bot locked as resolved and limited conversation to collaborators Apr 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test Replay: [Upload] Update timeout logic to fail faster on catastrophic timeouts
3 participants