
CancelErrors on Chrome/Android #95

Closed
kitsonk opened this issue Dec 7, 2015 · 14 comments
@kitsonk
Member

kitsonk commented Dec 7, 2015

There are several tests that appear to fail predictably on CI for Chrome/Android:

× chrome on Windows 10 - unit tests - request - browser - .get (5.009s)
CancelError: Timeout reached on chrome on Windows 10 - unit tests - request - browser - .get
  at <__intern/lib/Test.js:162:20>
× chrome on Windows 10 - unit tests - request - browser - JSON filter (5.007s)
CancelError: Timeout reached on chrome on Windows 10 - unit tests - request - browser - JSON filter
  at <__intern/lib/Test.js:162:20>
× android 4.4 on Linux - unit tests - request - browser - .get (5.027s)
CancelError: Timeout reached on android 4.4 on Linux - unit tests - request - browser - .get
Error: Timeout reached on android 4.4 on Linux - unit tests - request - browser - .get
  at <__intern/lib/Test.js:162:20>
× android 4.4 on Linux - unit tests - request - browser - JSON filter (5.017s)
CancelError: Timeout reached on android 4.4 on Linux - unit tests - request - browser - JSON filter
Error: Timeout reached on android 4.4 on Linux - unit tests - request - browser - JSON filter
  at <__intern/lib/Test.js:162:20>
@kitsonk
Member Author

kitsonk commented Jan 13, 2016

I suspect this is related to dojo/loader#39, though it needs more digging.

@kitsonk kitsonk assigned tomdye and msssk and unassigned mwistrand and tomdye Feb 11, 2016
@kitsonk kitsonk added this to the alpha.3 milestone Feb 29, 2016
@kitsonk kitsonk assigned vansimke and unassigned msssk Mar 11, 2016
@kitsonk
Member Author

kitsonk commented Mar 11, 2016

Based on what @vansimke said in dojo/loader#39, I don't think this issue is related to that one, but what he described there is probably related to this.

@vansimke
Contributor

Added comment that was mistakenly added to dojo/loader#39:

I've played around with this and I think this may be related to a performance issue in CI. If I set the timeout for the .get() test to 60 seconds and the others to 30 seconds, then all the tests pass in Chrome and Android. I'm not sure that is an acceptable solution, though, since it means these tests are going to be prone to failing at seemingly random intervals.

@kitsonk
Member Author

kitsonk commented Mar 11, 2016

Wow, 60 seconds. That seems crazy, especially when I think the other browsers complete it just fine. I haven't looked at the timing on the other browsers, but I bet it is a lot less than 30-60 seconds.

It seems like something else is causing a problem, right?

@vansimke
Contributor

I would expect so. Chrome is fine with 30s and 10s (I didn't play around to see how low I could go, but 10s failed with .get()). I went from 30s to 60s for Android on .get() as a last gasp to see if I could get them all to pass.

I'm going to dig into the guts of the tests next and see if there is something odd going on in there. I can't imagine what though.

@vansimke
Contributor

I wonder if we're clogging the browser's request pipeline? Maybe we're making too many HTTP requests as part of running the suite and it is just taking time for them to clean themselves up.

@msssk
Contributor

msssk commented Mar 11, 2016

The resources allocated for cloud testing can be extremely limited, making timing-sensitive tests very unreliable. I think our best bet is to give tests very generous timeouts, but have them resolve themselves asap when the test conditions are met.
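
(For reference, a minimal sketch of that pattern in an Intern 3 object suite is below: the value passed to this.async() is only a ceiling, and the test finishes as soon as the deferred resolves. The module path, echo URL and assertion are illustrative, not this repo's actual test code.)

```ts
import * as registerSuite from 'intern!object';
import * as assert from 'intern/chai!assert';
import request from 'src/request';

registerSuite({
	name: 'request timeout pattern (illustrative)',

	'.get()'() {
		// Generous 60s ceiling for slow cloud/CI environments...
		const dfd = this.async(60000);

		// ...but the test resolves the moment the response arrives.
		request.get('/__echo/default.json').then(
			dfd.callback((response: any) => {
				assert.isNotNull(response);
			}),
			dfd.reject.bind(dfd)
		);
	}
});
```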

@vansimke
Contributor

I think that I might be on to something. When I run the full suite, the Android browser takes 45+ seconds to return, but running only the tests/unit/request.ts tests brings it down to less than 2s. I'm pretty sure the tests are fighting with Intern for network resources.

@kitsonk
Member Author

kitsonk commented Mar 12, 2016

@jason0x43 any thoughts on this? I know we had challenges with IE9 and reordered the way we return the results, but now it seems like on other large test suites we are having something similar and have reliably triggered it now on both Chrome and Android. Maybe something @jacobroufa and @vansimke could work on together?

@jason0x43
Member

@vansimke is likely correct that the browser's request pipeline is being saturated. The WebDriver reporter in intern master significantly improves the situation by buffering messages and sending them using a single active connection at a time. This appeared to improve the stability of the core tests in several runs I made this weekend. At least, I got significantly fewer failures with intern master than with 3.0.6.
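
(Conceptually, that reporter change amounts to buffering messages and keeping at most one request in flight, something like the sketch below. This is only an illustration of the idea described above, not Intern's actual code; the send function is a stand-in for whatever transport the reporter uses.)

```ts
// Conceptual sketch: queue lifecycle messages and flush them in batches,
// with at most one send in flight at any time.
class MessageBuffer {
	private pending: any[] = [];
	private active: Promise<void> = Promise.resolve();

	constructor(private send: (batch: any[]) => Promise<void>) {}

	push(message: any): Promise<void> {
		this.pending.push(message);
		// Chain onto the previous send so batches go out one at a time
		// instead of opening a connection per message.
		this.active = this.active.then(() => {
			if (!this.pending.length) {
				return;
			}
			const batch = this.pending.splice(0);
			return this.send(batch);
		});
		return this.active;
	}
}
```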

In 3.0.6 you can set waitForRunner to true. This can significantly slow down the testing process, but by causing the testing system to send lifecycle messages synchronously with the testing process (rather than batching them and sending them asynchronously), it removes the network contention. This also appeared to improve the stability of the request tests.
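
(A minimal sketch of that flag, assuming the usual Intern 3 style of a config module that exports settings; only waitForRunner itself comes from the comment above, the surrounding keys are placeholders.)

```ts
// tests/intern.ts (sketch)
export const suites = [ 'tests/unit/all' ];
// Send lifecycle messages synchronously with the test run (Intern 3.0.6).
// Slower overall, but it removes the contention with the request tests
// for the network.
export const waitForRunner = true;
```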

Assuming network contention really is the issue, it may be useful for Intern to provide a method allowing network-sensitive tests to clear out the lifecycle message queue before the test starts.

Even though the above two changes (using intern master and using waitForRunner in 3.0.6) did improve stability, I still did get occasional failures. As @msssk points out, using cloud resources can make relying on timing, particularly network timing, a bit of a nightmare. Sometimes a request to the echo server would simply take longer than the timeout (which I left set at 5000ms).

@vansimke
Contributor

Setting waitForRunner to true did, in fact, stabilize the tests without having to change the timeout, but the entire suite took much longer to run (about 6x). With it set, the .get() test ran in only 0.281s (vs. 10s+) on Win10/Chrome and 0.272s (vs. ~45s) on Android.

I'm going to keep working with this to see if we can improve the stability without compromising the execution time, but for now, I would recommend setting the waitForRunner flag to true so that results can be relied upon.

@kitsonk
Member Author

kitsonk commented Mar 14, 2016

@vansimke I wouldn't be opposed either to trying out "intern": "theintern/intern" as the development dependency to utilise what is in master. If that also stabilises things, then it obviously puts pressure on us to get the next release of Intern out.

@vansimke
Contributor

@kitsonk That is what I'm experimenting with now. I'll update this issue when I get results.

vansimke added a commit to vansimke/core that referenced this issue Mar 14, 2016
kitsonk pushed a commit that referenced this issue Mar 15, 2016
Temporary measure to help improve test stability

Closes #123
Refs #95
@kitsonk kitsonk added this to the 2.0.0 milestone Mar 17, 2016
@kitsonk kitsonk removed this from the alpha.3 milestone Mar 17, 2016
@kitsonk kitsonk modified the milestones: 2016.06, 2.0.0 Apr 8, 2016
@kitsonk kitsonk modified the milestones: 2016.06, 2016.07 Jul 4, 2016
@kitsonk kitsonk modified the milestones: 2016.07, 2016.08 Aug 1, 2016
@kitsonk
Member Author

kitsonk commented Oct 4, 2016

Sinon does not do well in certain async situations, which was the root cause of this.
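
(For anyone who lands here with the same symptom: one common shape of this problem is a sinon fake, for example a fake XMLHttpRequest, that is installed while asynchronous requests from other tests are still in flight, so those requests never complete and the tests die with CancelError timeouts. A hypothetical illustration, not the actual test code in this repo:)

```ts
import * as sinon from 'sinon';
import * as registerSuite from 'intern!object';

let fakeXhr: any;

registerSuite({
	name: 'sinon and async requests (hypothetical)',

	beforeEach() {
		// While this fake is installed, real requests never hit the network;
		// any async test still waiting on one will hang until its timeout.
		fakeXhr = sinon.useFakeXMLHttpRequest();
	},

	afterEach() {
		// Restore immediately after each test so later (real) request tests
		// are not starved by the fake.
		fakeXhr.restore();
	},

	'stubbed request'() { /* ... */ }
});
```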

@kitsonk kitsonk closed this as completed Oct 4, 2016