
CancelErrors on Chrome/Android #95

Closed
kitsonk opened this issue Dec 7, 2015 · 14 comments
@kitsonk
Member

kitsonk commented Dec 7, 2015

There are several tests that appear to fail predictably on CI for Chrome/Android:

× chrome on Windows 10 - unit tests - request - browser - .get (5.009s)
CancelError: Timeout reached on chrome on Windows 10 - unit tests - request - browser - .get
  at <__intern/lib/Test.js:162:20>
× chrome on Windows 10 - unit tests - request - browser - JSON filter (5.007s)
CancelError: Timeout reached on chrome on Windows 10 - unit tests - request - browser - JSON filter
  at <__intern/lib/Test.js:162:20>
× android 4.4 on Linux - unit tests - request - browser - .get (5.027s)
CancelError: Timeout reached on android 4.4 on Linux - unit tests - request - browser - .get
Error: Timeout reached on android 4.4 on Linux - unit tests - request - browser - .get
  at <__intern/lib/Test.js:162:20>
× android 4.4 on Linux - unit tests - request - browser - JSON filter (5.017s)
CancelError: Timeout reached on android 4.4 on Linux - unit tests - request - browser - JSON filter
Error: Timeout reached on android 4.4 on Linux - unit tests - request - browser - JSON filter
  at <__intern/lib/Test.js:162:20>
@kitsonk
Member Author

kitsonk commented Jan 13, 2016

I suspect this is related to dojo/loader#39, though it needs more digging.

@kitsonk kitsonk assigned tomdye and msssk and unassigned mwistrand and tomdye Feb 11, 2016
@kitsonk kitsonk added this to the alpha.3 milestone Feb 29, 2016
@kitsonk kitsonk assigned vansimke and unassigned msssk Mar 11, 2016
@kitsonk
Member Author

kitsonk commented Mar 11, 2016

Based on what @vansimke said in dojo/loader#39, I don't think this issue is related to that one, but what he described there is probably related to this.

@vansimke
Contributor

Added comment that was mistakenly added to dojo/loader#39:

I've played around with this and I think this may be related to a performance issue in CI. If I set the timeout for the .get() test to 60 seconds and the others to 30 seconds, then all the tests pass in Chrome and Android. I'm not sure that is an acceptable solution, though, since it means these tests are going to be prone to failing at seemingly random intervals.

@kitsonk
Member Author

kitsonk commented Mar 11, 2016

Wow, 60 seconds. That seems crazy, especially when I think the other browsers complete it just fine. I haven't looked at the timing on the other browsers, but I bet it is a lot less than 30-60 seconds.

It seems like something else is causing a problem, right?

@vansimke
Contributor

I would expect so. Chrome is fine with 30s and 10s (I didn't play around to see how low I could go, but 10s failed with .get()). I went from 30s to 60s for Android on .get() as a last gasp to see if I could get them all to pass.

I'm going to dig into the guts of the tests next and see if there is something odd going on in there. I can't imagine what though.

@vansimke
Contributor

I wonder if we're clogging the browser's request pipeline? Maybe we're making too many HTTP requests as part of running the suite and it is just taking time for them to clean themselves up.

@msssk
Contributor

msssk commented Mar 11, 2016

The resources allocated for cloud testing can be extremely limited, making timing-sensitive tests very unreliable. I think our best bet is to give tests very generous timeouts, but have them resolve themselves asap when the test conditions are met.
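
(For reference, a minimal sketch of that pattern in an Intern 3 object suite is below: the value passed to this.async() is only a ceiling, and the test finishes as soon as the deferred resolves. The module path, echo URL and assertion are illustrative, not this repo's actual test code.)

```ts
import * as registerSuite from 'intern!object';
import * as assert from 'intern/chai!assert';
import request from 'src/request';

registerSuite({
	name: 'request timeout pattern (illustrative)',

	'.get()'() {
		// Generous 60s ceiling for slow cloud/CI environments...
		const dfd = this.async(60000);

		// ...but the test resolves the moment the response arrives.
		request.get('/__echo/default.json').then(
			dfd.callback((response: any) => {
				assert.isNotNull(response);
			}),
			dfd.reject.bind(dfd)
		);
	}
});
```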

@vansimke
Contributor

I think that I might be on to something. When I run the full suite, the Android browser takes 45+ seconds to return, but running only the tests/unit/request.ts tests brings it down to less than 2s. I'm pretty sure the tests are fighting with Intern for network resources.

@kitsonk
Member Author

kitsonk commented Mar 12, 2016

@jason0x43 any thoughts on this? I know we had challenges with IE9 and reordered the way we return the results, but now it seems like on other large test suites we are having something similar and have reliably triggered it now on both Chrome and Android. Maybe something @jacobroufa and @vansimke could work on together?

@jason0x43
Member

@vansimke is likely correct that the browser's request pipeline is being saturated. The WebDriver reporter in intern master significantly improves the situation by buffering messages and sending them using a single active connection at a time. This appeared to improve the stability of the core tests in several runs I made this weekend. At least, I got significantly fewer failures with intern master than with 3.0.6.
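
(Conceptually, that reporter change amounts to buffering messages and keeping at most one request in flight, something like the sketch below. This is only an illustration of the idea described above, not Intern's actual code; the send function is a stand-in for whatever transport the reporter uses.)

```ts
// Conceptual sketch: queue lifecycle messages and flush them in batches,
// with at most one send in flight at any time.
class MessageBuffer {
	private pending: any[] = [];
	private active: Promise<void> = Promise.resolve();

	constructor(private send: (batch: any[]) => Promise<void>) {}

	push(message: any): Promise<void> {
		this.pending.push(message);
		// Chain onto the previous send so batches go out one at a time
		// instead of opening a connection per message.
		this.active = this.active.then(() => {
			if (!this.pending.length) {
				return;
			}
			const batch = this.pending.splice(0);
			return this.send(batch);
		});
		return this.active;
	}
}
```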

In 3.0.6 you can set waitForRunner to true. This can significantly slow down the testing process, but by causing the testing system to send lifecycle messages synchronously with the testing process (rather than batching them and sending them asynchronously), it removes the network contention. This also appeared to improve the stability of the request tests.
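
(A minimal sketch of that flag, assuming the usual Intern 3 style of a config module that exports settings; only waitForRunner itself comes from the comment above, the surrounding keys are placeholders.)

```ts
// tests/intern.ts (sketch)
export const suites = [ 'tests/unit/all' ];
// Send lifecycle messages synchronously with the test run (Intern 3.0.6).
// Slower overall, but it removes the contention with the request tests
// for the network.
export const waitForRunner = true;
```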

Assuming network contention really is the issue, it may be useful for Intern to provide a method allowing network-sensitive tests to clear out the lifecycle message queue before the test starts.

Even though the above two changes (using intern master and using waitForRunner in 3.0.6) did improve stability, I still did get occasional failures. As @msssk points out, using cloud resources can make relying on timing, particularly network timing, a bit of a nightmare. Sometimes a request to the echo server would simply take longer than the timeout (which I left set at 5000ms).

@vansimke
Contributor

Setting waitForRunner to true did, in fact, stabilize the tests without having to change the timeout, but the entire suite took much longer to run (about 6x). With it set, the .get() test ran in only 0.281s (vs. 10s+) on Win10/Chrome and 0.272s (vs. ~45s) on Android.

I'm going to keep working with this to see if we can improve the stability without compromising the execution time, but for now, I would recommend setting the waitForRunner flag to true so that results can be relied upon.

@kitsonk
Member Author

kitsonk commented Mar 14, 2016

@vansimke I wouldn't be opposed either to trying out "intern": "theintern/intern" as the development dependency to utilise what is in master. If that also stabilises things, then it obviously puts pressure on us to get the next release of Intern out.

@vansimke
Contributor

@kitsonk That is what I'm experimenting with now. I'll update this issue when I get results.

vansimke added a commit to vansimke/core that referenced this issue Mar 14, 2016
kitsonk pushed a commit that referenced this issue Mar 15, 2016
Temporary measure to help improve test stability

Closes #123
Refs #95
@kitsonk kitsonk added this to the 2.0.0 milestone Mar 17, 2016
@kitsonk kitsonk removed this from the alpha.3 milestone Mar 17, 2016
@kitsonk kitsonk modified the milestones: 2016.06, 2.0.0 Apr 8, 2016
@kitsonk kitsonk modified the milestones: 2016.06, 2016.07 Jul 4, 2016
@kitsonk kitsonk modified the milestones: 2016.07, 2016.08 Aug 1, 2016
@kitsonk
Member Author

kitsonk commented Oct 4, 2016

Sinon does not do well in certain async situations, which was the root cause of this.
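
(For anyone who lands here with the same symptom: one common shape of this problem is a sinon fake, for example a fake XMLHttpRequest, that is installed while asynchronous requests from other tests are still in flight, so those requests never complete and the tests die with CancelError timeouts. A hypothetical illustration, not the actual test code in this repo:)

```ts
import * as sinon from 'sinon';
import * as registerSuite from 'intern!object';

let fakeXhr: any;

registerSuite({
	name: 'sinon and async requests (hypothetical)',

	beforeEach() {
		// While this fake is installed, real requests never hit the network;
		// any async test still waiting on one will hang until its timeout.
		fakeXhr = sinon.useFakeXMLHttpRequest();
	},

	afterEach() {
		// Restore immediately after each test so later (real) request tests
		// are not starved by the fake.
		fakeXhr.restore();
	},

	'stubbed request'() { /* ... */ }
});
```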

@kitsonk kitsonk closed this as completed Oct 4, 2016