Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XHR requests are cancelled and re-issued if they don't return in 30 seconds #6426

Closed
alexvy86 opened this issue Feb 12, 2020 · 23 comments · Fixed by #7787
Closed

XHR requests are cancelled and re-issued if they don't return in 30 seconds #6426

alexvy86 opened this issue Feb 12, 2020 · 23 comments · Fixed by #7787
Labels
existing workaround type: unexpected behavior User expected result, but got another

Comments

@alexvy86
Copy link

alexvy86 commented Feb 12, 2020

Current behavior:

When an XHR request takes more than 30 seconds to complete, Cypress cancels the request and issues it again. Depending on what the server does with those requests, this can cause all sorts of problems (e.g. creating duplicate things, or directly resulting in failed tests when the server replies with a 5xx error that it wouldn't have issued if the request hadn't been sent twice).

Desired behavior:

Cypress should never retry an XHR request on its own, because it cannot know what the effects of a duplicate request could be.

Test code to reproduce

cypress-bug.zip (also contains the video generated by Cypress for the failing test).

I used the code in that zip file to generate the screen recording below. The idea is to set up a web server with an API call that takes 34 seconds to reply, then run a Cypress test that visits the home of the web server (which just issues an XHR call to the API method) and waits for the API response, with a timeout longer than the time it takes for the API to respond. You'll see on the logs of the webserver that the API method gets called twice, the second one pretty much exactly 30 seconds after the first one.

To work with the code:

  • run npm i on both the webserver and tester folders
  • run node index.js from the webserver folder
  • once it's running, npm run cypress from the tester folder

You'll see that the webserver logs one call to /, one to /api, and ~30 seconds later another call to /api. Cypress then reports a failed test.

Versions

The issue happens starting with Cypress 3.8.0 (tested with 3.6.1, 3.7.0, 3.8.0, 4.0.1)
Windows 10
Headless tests with Electron

Additional information

I originally discovered this with the Cypress Docker images. In the project where I found it I could even see Nginx reporting that the client (Cypress running tests) had closed the connection, right before receiving the duplicate request.

Screen recording of the code above showing the issue.

cypress retries request

@cypress-bot cypress-bot bot added the stage: needs investigating Someone from Cypress needs to look at this label Apr 10, 2020
@georgeclearnomics
Copy link

Have experienced this behavior as well on version 4.4.0 (likely older versions as well but was not caught until recent update) at both timeouts of 60 seconds and 25 seconds (after seeing this open issue). Requests are made exactly 60 or 25 seconds apart, respectively, after the timeout occurs when the test with the related XHR request fails.

@jennifer-shehane
Copy link
Member

jennifer-shehane commented May 6, 2020

I can recreate this behavior from the repo provided.

I believe this is because of how the client behaves on aborted requests. Gonna have to check on this though.

If Cypress does not see the request within the given timeout, we abort the XHR request (so that the tests can continue to the next assertion/test). Clients are meant to retry requests if the connection closes before receiving a status from the server.

if the client sees the connection close before receiving any status from the server, the client SHOULD retry the request.
https://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.2.4

@alexvy86
Copy link
Author

alexvy86 commented May 6, 2020

@jennifer-shehane I think the part right before where you started the quote is also very relevant (emphasis mine):

and if the client is not directly connected to an HTTP/1.1 origin server, and if the client sees the connection close before receiving any status from the server, the client SHOULD retry the request

In this case the client is already "directly connected to an HTTP/1.1 origin server", right? It sent the request and is only waiting for the server to reply. I can see why if the client has not yet established a connection to the origin server, and it sees a connection close, then it should retry the request, because it's guaranteed that the server has not received it before. But once it established a connection with the server, I'd imagine that it should not retry a request that it already sent, no matter the kind of failure it sees (including a closed connection), otherwise the spec would be calling for potentially issuing POST requests to the server several times, which I'm not sure makes sense...

Besides that, note that the timeout I specified for Cypress in the repro was 35 seconds, so Cypress should not be aborting these requests at the 30 second mark, correct?

@jennifer-shehane
Copy link
Member

So, I discussed this with the team and my assumptions before were incorrect.

The Cypress proxy layer expects some sort of response within 30 seconds or else it will retry under the hood - this is controlled by the request timeout. This attempts to mimic the way the browser retries, but it seems in this case that this is not mimicking the behavior of the Chrome browser. cc @flotwig

@jennifer-shehane jennifer-shehane added the type: unexpected behavior User expected result, but got another label May 7, 2020
@flotwig
Copy link
Contributor

flotwig commented May 7, 2020

The Cypress proxy layer expects some sort of response within 30 seconds or else it will retry under the hood - this is controlled by the request timeout. This attempts to mimic the way the browser retries, but it seems in this case that this is not mimicking the behavior of the Chrome browser. cc @flotwig

The proxy layer will retry certain failures, but timeouts are not one of them, precisely because it does not match the browser's behavior:

const NETWORK_ERRORS = 'ECONNREFUSED ECONNRESET EPIPE EHOSTUNREACH EAI_AGAIN ENOTFOUND'.split(' ')

However, I tested in Chrome and Chrome does not issue the second request, so it is most likely a bug in Cypress. DEBUG=cypress:* would most likely shed some light on what is issuing the second request.

@alexvy86
Copy link
Author

alexvy86 commented May 7, 2020

I wonder if the browser would retry on an ECONNRESET... since that one could happen after the client has finished sending the HTTP request to the server, and the request might thus be getting processed, it seems to me like an ECONNRESET caused by an intermediate proxy closing its connection should not be cause for the client to retry. If any of the others happen (except EPIPE, I'm not entirely sure on that one), then the client has a guarantee that the server did not receive the request, and thus it's safe to retry, but ECONNRESET (and potentially EPIPE) seem to me like they don't fit that pattern.

@alexagranov7
Copy link

I'm also seeing this issue if my API takes ~30sec or longer. As soon as I adjusted my API to 25 seconds, it worked. Something is in Cypress that doesn't like waiting longer than 28sec or so.

@alexagranov7
Copy link

My solution was to significantly increase the global timeout in the cypress.json configuration.

@Rey-Wang
Copy link

Rey-Wang commented Jun 17, 2020

@alexvy86 Not sure its the same issue in my side, It got net: ERR_EMPTY_RESPONSE once the response is too slow.
image-20200611-072527

@flotwig
Copy link
Contributor

flotwig commented Jun 17, 2020

By default, backend requests are killed after responseTimeout milliseconds. By default, this is 30 seconds: https://docs.cypress.io/guides/references/configuration.html#Timeouts

this._request = Request({ timeout: config.responseTimeout })

Probably the fix is to remove the timeout on the backend (or change the timeout to be equal to Chrome's, which I believe is 120000), since it should not actually be coupled to responseTimeout.

@alexvy86 Workaround: Increase responseTimeout in your cypress.json to 120000, or whatever you need.

@cypress-bot cypress-bot bot added stage: ready for work The issue is reproducible and in scope and removed stage: needs investigating Someone from Cypress needs to look at this labels Jun 17, 2020
@alexvy86
Copy link
Author

That's the effect that this is supposed to have, right? Per the documentation here, responseTimeout for a cy.wait() call should default to timeout, in this case 35 seconds? But we still see Cypress cancelling the request at the 30 second mark.

image

I'll try to make some to test by changing responseTimeout at the "root" level and see if that works, but even then, it sounds like there's still a bug when passing timeout to cy.wait(), no?

Probably the fix is to remove the timeout on the backend

By the backend you mean the server that gets called? That one has no explicit timeout in the test code I uploaded, nor in the environment where we first encountered this issue. I'm certain it's not the server terminating the connection that causes this behavior.

@alexvy86
Copy link
Author

Adding "responseTimeout": 60000 to my cypress.json file did help as a workaround. So I guess this report can be narrowed down to pointing out a bug in how cy.wait() handles timeouts when they are passed in, or an issue with the documentation. If the global responseTimeout is supposed to be a hard limit, it's probably worth pointing that timeouts passed in as parameters to other methods can't go past that hard limit.

@Rey-Wang
Copy link

Rey-Wang commented Jun 18, 2020

@flotwig Also I found same scenario with cy.click, Have a button and use cy.click to trigger click event and send a request to the server-side, it would happen twice once the request is slow. is it also related to the cy.wait in the deep implementation of cy.click?

How I solve this problem

I change to use cy.trigger to click this button, and it would not happen

@flotwig
Copy link
Contributor

flotwig commented Jun 18, 2020

That's the effect that this is supposed to have, right? Per the documentation here, responseTimeout for a cy.wait() call should default to timeout, in this case 35 seconds? But we still see Cypress cancelling the request at the 30 second mark.

@alexvy86 The responseTimeout is used in wait to determine how long it should wait for a response. It is also used in the Cypress proxy to terminate all requests after the timeout is reached, and I believe that is the problem, the Cypress proxy should not be terminating all requests after responseTimeout ms.

@Rey-Wang The problem should apply to any network request, so that sounds like this issue

@alexvy86
Copy link
Author

Just to be clear, the timeout used by the proxy to terminate requests (the one in cypress.json), is called responseTimeout but is not the same as the timeout/responseTimeout we pass to cy.wait(), correct?

I got a bit confused when you said

The responseTimeout is used in wait to determine how long it should wait for a response. It is also used in the Cypress proxy to terminate all requests

I took it to mean they were the same thing, that the timeout we pass to wait should be the one the proxy uses to terminate requests. But after re-reading the whole thread I get the feeling that they're separate, and that's why you said earlier that the proxy timeout should probably be 120000, regardless of what we pass to wait... is that right?

@flotwig
Copy link
Contributor

flotwig commented Jun 18, 2020

Just to be clear, the timeout used by the proxy to terminate requests (the one in cypress.json), is called responseTimeout but is not the same as the timeout/responseTimeout we pass to cy.wait(), correct?

It is the same responseTimeout. It is only set once, when you start the project, so it's unaffected by passing timeout options after that point.

The responseTimeout in the proxy is is the same responseTimeout that cy.wait defaults to.

So you can see how this is a problem if you have a request you're waiting for that lasts longer than responseTimeout. Assuming you have the default responseTimeout: Even if you do cy.wait('@foo', { responseTimeout: 60000 }) to override the 30 second default timeout on cy.wait, the responseTimeout on the proxy layer is still going to be hit at the default of 30 seconds and kill the request at the network level. This will cause the browser to retry the outgoing XHR, which will fail again after responseTimeout, until the timeout on cy.wait is hit and the test fails.

@alexvy86
Copy link
Author

alexvy86 commented Jun 18, 2020

Got it, got it. So... what the documentation for wait() says here, about the responseTimeout passed in to the wait() call overriding the global responseTimeout for that particular request, needs further clarification, right? It probably lets us tell wait to use a shorter timeout than the global responseTimeout (and this doesn't affect the actual request at all, it's just the call to wait that times out earlier), but not a longer one, because the proxy layer will terminate the request when it hits the global responseTimeout. After re-reading your response I guess even if the request is terminated by the proxy layer and retried, if that retry finishes before wait times out, then maybe wait will return successfully once it sees the correct response of that retry, so in a sense that could be considered as "allowing a longer timeout than the global responseTimeout". In any case, now I understand exactly what the issue is, thanks for the explanations @flotwig .

And after thinking about it, it actually makes sense that wait() in particular cannot affect the timeout for the proxy layer because the actual request is not initiated by wait(), it was triggered by something else, and thus it can only see the global responseTimeout.

@flotwig
Copy link
Contributor

flotwig commented Jun 18, 2020

And after thinking about it, it actually makes sense that wait() in particular cannot affect the timeout for the proxy layer because the actual request is not initiated by wait(), it was triggered by something else, and thus it can only see the global responseTimeout.

💯 yes exactly

@cypress-bot cypress-bot bot added stage: work in progress There is an open PR for this issue [WIP] and removed stage: ready for work The issue is reproducible and in scope labels Jun 23, 2020
@cypress-bot cypress-bot bot added stage: needs review The PR code is done & tested, needs review stage: work in progress There is an open PR for this issue [WIP] and removed stage: work in progress There is an open PR for this issue [WIP] labels Jun 30, 2020
@cypress-bot cypress-bot bot removed the stage: needs review The PR code is done & tested, needs review label Jul 2, 2020
@cypress-bot cypress-bot bot added stage: pending release There is a closed PR for this issue and removed stage: work in progress There is an open PR for this issue [WIP] labels Jul 6, 2020
@cypress-bot
Copy link
Contributor

cypress-bot bot commented Jul 6, 2020

The code for this is done in cypress-io/cypress#7787, but has yet to be released.
We'll update this issue and reference the changelog when it's released.

@alexvy86
Copy link
Author

alexvy86 commented Jul 6, 2020

@flotwig one last thing about this topic... I think the documentation for wait still needs clarification, right? It says that passing responseTimeout will override the global responseTimeout, but based on our conversation, this way of providing an override will only work for a value that is shorter than the global responseTimeout, correct?

@alexvy86
Copy link
Author

alexvy86 commented Jul 6, 2020

Actually, if I understand the PR correctly, you made it so the global timeout isn't applied to all requests... and that'll let them respect the responseTimeout provided in wait?

@cypress-bot
Copy link
Contributor

cypress-bot bot commented Jul 7, 2020

Released in 4.10.0.

This comment thread has been locked. If you are still experiencing this issue after upgrading to
Cypress v4.10.0, please open a new issue.

@cypress-bot cypress-bot bot removed the stage: pending release There is a closed PR for this issue label Jul 7, 2020
@cypress-bot cypress-bot bot locked as resolved and limited conversation to collaborators Jul 7, 2020
@flotwig
Copy link
Contributor

flotwig commented Jul 7, 2020

@alexvy86 yep, now the documentation's claims about responseTimeout should hold true.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
existing workaround type: unexpected behavior User expected result, but got another
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants