
[hold] Bug: Randomly failing E2E tests #376

Closed
wishfulthinkerme opened this issue Nov 22, 2018 · 14 comments
Assignees
Labels
bug Something isn't working CI Tasks related to the CI, pipeline, and releasing e2e-tests team/asterix
Milestone

Comments

@wishfulthinkerme
Contributor

Expected Results

Stop E2E tests from failing randomly.

Observed Results

Random tests fail intermittently, both locally and in the pipeline. There is no known case in which this can be reliably reproduced: sometimes a run passes, sometimes it doesn't. We have to check whether this is caused by our E2E tests, connection issues, or backend problems.

@wishfulthinkerme wishfulthinkerme added the bug Something isn't working label Nov 22, 2018
@Xymmer Xymmer added CI Tasks related to the CI, pipeline, and releasing e2e-tests labels Nov 22, 2018
@wishfulthinkerme wishfulthinkerme self-assigned this Nov 23, 2018
@Xymmer Xymmer added this to the ALPHA-0 milestone Nov 27, 2018
@Xymmer
Contributor

Xymmer commented Dec 5, 2018

Moving this back in; Gil is saying this still happens sometimes with the server.
Marcin, I can't remember why we moved this to Done. Do you?

@wishfulthinkerme
Contributor Author

I couldn't reproduce the random failures, so we decided to move it to Done for now.

@Xymmer Xymmer added this to To Do in Asterix SPRINT via automation Dec 6, 2018
@wishfulthinkerme wishfulthinkerme removed their assignment Dec 7, 2018
@marlass marlass moved this from To Do to In Progress in Asterix SPRINT Dec 11, 2018
@marlass marlass moved this from In Progress to To Do in Asterix SPRINT Dec 11, 2018
@marlass
Contributor

marlass commented Dec 11, 2018

First we need to resolve the issue with the wrong backend payment URL.

@Xymmer
Contributor

Xymmer commented Dec 11, 2018

Hey Marcin, E2E is fixed; can you resume the investigation?

@marlass marlass moved this from To Do to In Progress in Asterix SPRINT Dec 14, 2018
@marlass
Contributor

marlass commented Dec 18, 2018

I looked at it for a few hours and couldn't find the root cause. Failures are completely random: one time some test fails, then it works for a few runs; in the meantime another one fails and the story repeats.

I will look more into this issue in the next weeks and might check some alternatives to Protractor that are more stable and easier to debug and work with. The current solution slows down the whole development too much: we lose way too much time triggering pipelines, and the process of writing tests is also inefficient. We need to check our work line by line, and writing a whole test that passes on the first try is almost impossible. After finding a better replacement I will introduce it to a wider audience and we will decide the future of E2E tests in the project.

@marlass marlass moved this from In Progress to To Do in Asterix SPRINT Dec 18, 2018
@dunqan
Contributor

dunqan commented Dec 18, 2018

@marlass Please take into account that Protractor may not be to blame (or at least not the only culprit); the underlying Selenium may be. So if you decide on an alternative library that also uses Selenium, the same issues will probably resurface over time as the number of tests grows, especially in our app, where almost every part of the page is created dynamically (we need to make a backend call first).

What I wanted to point out is that there may be no easy or obvious solution to this, so please be cautious about any miraculous alternatives.

And yup, writing them is hard (and it would be good to make it easier), but take into account that testing all this stuff manually at the same rate isn't just hard, it's impossible.

Also, this is a good read:
https://sqa.stackexchange.com/questions/32542/how-to-make-selenium-tests-more-stable

Even Google themselves struggle with flaky tests.

@marlass
Contributor

marlass commented Dec 18, 2018

Yeah. I first want to check Cypress, which does not use Selenium under the hood. In a previous project we didn't have a failure rate anywhere close to the current situation, and one thing that it improved dramatically was the ease of debugging and writing tests. I will first try to move the happy path to it, and if that brings enough improvement, we will discuss it and decide whether we want to migrate.

@hackergil
Contributor

hackergil commented Dec 18, 2018

According to the link above that @dunqan provided (and a couple of others, if you do some research), flakiness in Selenium-based tests is normal and expected. Rewriting the tests in another framework is not in scope for this ticket, nor something we should consider at this point.

I'd propose to consider the following:

  • First, we can get some stats on the success/failure ratio of E2E tests vs. daily builds (Travis has APIs for this)
  • Once we know this, maybe we can do some analysis on the sample data to see whether it's the same test or tests that fail, and go from there
  • Also, we could consider retrying the failing E2E tests using protractor-retry or something similar: https://www.npmjs.com/package/protractor-retry
  • Additionally, we could even generate reports of the success/failure ratio
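For the retry idea, wiring protractor-retry into a Protractor config looks roughly like this (a minimal sketch based on the package's documented hooks; the spec path and the retry count of 2 are placeholder choices, not values from this project):

```javascript
// protractor.conf.js — sketch of protractor-retry integration.
// The three hooks below are the ones the package documents.
const retry = require('protractor-retry').retry;

exports.config = {
  specs: ['./e2e/**/*.e2e-spec.js'], // placeholder path

  onPrepare: () => {
    // Registers a reporter that records which specs failed.
    retry.onPrepare();
  },

  onCleanUp: (results) => {
    // Persists the failed spec files so afterLaunch can re-run them.
    retry.onCleanUp(results);
  },

  afterLaunch: () => {
    // Re-runs only the failed specs, up to 2 extra attempts.
    return retry.afterLaunch(2);
  },
};
```

This retries only the specs that failed rather than the whole suite, so a genuinely broken test still fails the build after the retries are exhausted.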

@dunqan
Contributor

dunqan commented Dec 18, 2018

I agree with @hackergil that changing the testing framework is not in the scope of this ticket, but...
if we plan to write many E2E tests, maybe it's not a bad idea to create another ticket, just to give Cypress a quick try and compare the results?
If we can live with the main drawback, which is the lack of support for Firefox/IE/Edge/Safari (we probably can), the potentially better debugging and easier test writing make it worth at least trying (not to mention better stability, thanks to not using Selenium).

@Xymmer Xymmer moved this from To Do to In Progress in Asterix SPRINT Dec 19, 2018
@Xymmer Xymmer moved this from In Progress to To Do in Asterix SPRINT Dec 19, 2018
@marlass marlass moved this from To Do to In Progress in Asterix SPRINT Dec 28, 2018
@marlass
Contributor

marlass commented Dec 28, 2018

I checked our build statistics with the Travis API and got the following results (the last 3,000 builds of 3,725 total):

[screenshot: Travis build success/failure statistics]

However, this doesn't give the whole picture. The Travis API doesn't return any information about job/build restarts, so we only see the final result (which might sometimes be the outcome of 3-4 restarts).

An almost 50% failure rate doesn't look good. Most failures happen at the 'Unit tests' stage, which probably points to the flaky E2E tests.

Source code: https://github.com/marlass/travis-build-stats

We can inspect the tests further, introduce some sort of auto-retry, or try some new solutions, because this really has a great impact on everyone's work. Sometimes waiting 30 minutes to merge, then finding out that your branch is again not up to date and having to retry the whole process, is extremely frustrating. In my opinion this one thing is the biggest drag on our current development speed.

Let me know what you think we should do next.
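The linked script isn't reproduced here, but the core of such an analysis is just tallying the `state` field of builds fetched from the Travis API v3 `/repo/{slug}/builds` endpoint (with the `Travis-API-Version: 3` header). A minimal sketch of that tallying step, with hypothetical function names not taken from the linked repo:

```javascript
// Tally final build states as returned by the Travis API v3.
// Each build object carries a `state` field such as
// "passed", "failed", or "errored".
function tallyBuildStates(builds) {
  const counts = {};
  for (const build of builds) {
    counts[build.state] = (counts[build.state] || 0) + 1;
  }
  return counts;
}

// Failure ratio over the builds that actually finished.
function failureRate(counts) {
  const failed = (counts.failed || 0) + (counts.errored || 0);
  const finished = failed + (counts.passed || 0);
  return finished === 0 ? 0 : failed / finished;
}
```

Note that this only sees each build's final state, so (as mentioned above) a build that passed after several manual restarts is still counted as a success.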

@dunqan
Contributor

dunqan commented Dec 31, 2018

As we discussed: because Travis runs for each commit (and even runs twice if there is a PR), those stats include failures from every commit on work-in-progress branches, which may not yet have adjusted unit (or E2E) tests.
And if those stats include only final results (without restarts), then they effectively exclude failures caused by flaky tests on the most representative (ready-to-merge) commits.

So I'd vote for implementing a more robust logging mechanism that takes into account the Protractor output, job ID, branch, and commit; then we could use that info to find the most frequently failing tests, the flakiest ones (those that finally pass after some restarts), etc.

And of course, test/implement a retry mechanism for Protractor as soon as possible, to ease developers' lives. It's part of this ticket (#580), but IMO it deserves its own ticket.
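As a sketch of what one record in such a logging mechanism might contain, combining a reporter's spec result with the metadata Travis exposes via environment variables (`TRAVIS_JOB_ID`, `TRAVIS_BRANCH`, and `TRAVIS_COMMIT` are real Travis CI variables; the function name and record shape are hypothetical):

```javascript
// Build one log record per spec result, enriched with CI metadata.
// `specName` and `status` would come from a Protractor/Jasmine
// reporter; `env` is typically `process.env` on the CI worker.
function buildLogRecord(specName, status, env) {
  return {
    spec: specName,
    status: status, // e.g. 'passed' or 'failed'
    jobId: env.TRAVIS_JOB_ID || null,
    branch: env.TRAVIS_BRANCH || null,
    commit: env.TRAVIS_COMMIT || null,
    timestamp: new Date().toISOString(),
  };
}
```

Aggregating these records by `spec` and `commit` would show which tests fail most often and which ones eventually pass after restarts.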

@marlass
Contributor

marlass commented Dec 31, 2018

After a quick search, I found that implementing a better logging mechanism is pretty easy.
Here is a PR that gathers all our Protractor results on S3, which we can use for advanced and reliable analysis: #790

@marlass
Contributor

marlass commented Jan 10, 2019

Status of the issue: I will review the stats next week and prepare a script for the future.

@marlass marlass moved this from In Progress to To Do in Asterix SPRINT Jan 10, 2019
@marlass marlass moved this from To Do to In Progress in Asterix SPRINT Jan 21, 2019
@marlass marlass moved this from In Progress to To Do in Asterix SPRINT Jan 22, 2019
@kacperknapik kacperknapik changed the title Bug: Randomly failing E2E tests [hold] Bug: Randomly failing E2E tests Jan 22, 2019
@kacperknapik kacperknapik removed this from To Do in Asterix SPRINT Jan 22, 2019
@Xymmer Xymmer modified the milestones: ALPHA-0, BETA-0 Mar 11, 2019
@Xymmer Xymmer modified the milestones: 1.0 Beta-0, 1.0 RC-0 Apr 25, 2019
@Xymmer Xymmer modified the milestones: 1.0 RC-0, 1.? Milestone TBD, graveyard May 22, 2019
@Xymmer
Contributor

Xymmer commented May 24, 2019

No longer needed, as we moved from Protractor to Cypress.

@Xymmer Xymmer closed this as completed May 24, 2019
@Xymmer Xymmer added this to the before-5.0 milestone Jun 5, 2022