Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in CI with HttpStream/FakeReportServer #1617

Merged
merged 21 commits into from
Apr 12, 2022

Conversation

aurelien-reeves
Copy link
Contributor

Tests are not stable enough on the CI
It seems to be due to a race condition around http_stream and/or the fakeReportServer used for the tests.

@aurelien-reeves aurelien-reeves changed the title Add a test which reproduce the error Error in CI with HttpStream/FakeReportServer Mar 19, 2021
@aurelien-reeves aurelien-reeves added the 🔧 build Related to build / release process label Apr 26, 2021
@mattwynne
Copy link
Member

@aurelien-reeves is this still a live issue?

@aurelien-reeves
Copy link
Contributor Author

Yes, the issue is still live
But it happens far less often that it used to.

We may close that one until it raises often again

@mattwynne
Copy link
Member

@coveralls
Copy link

coveralls commented Feb 16, 2022

Coverage Status

Coverage remained the same at 98.235% when pulling 37073ad on http-stream-tests-race-condition into 9e9ebd2 on main.

mattwynne added a commit that referenced this pull request Feb 23, 2022
This hopefully resolves the intermittend failures we've had in CI.

Ref #1617
aurelien-reeves and others added 15 commits February 23, 2022 13:39
It looks like the problem occurs when the FakeReportServer starts up -
sometimes although it says it's ready and listening, the server isn't
really available, however long you wait for it.

We'll need to clean up after this commit, go back and make a more
minimal change, but I wanted to check it in for the record.

Here's what I tried:

- added HttpTerminator for managing the server shutdown (in case it
  wasn't shutting down cleanly)
- added waitOn to poll for the server being up after calling `start()`
- keep app in an instance variable
- use the same port for each server instance

In the end, it looks like the problem was caused by letting the http
server choose its own port (by passing in 0 for the port). Sometimes
this just doesn't seem to give us a working server. If we hard-code
a fixed port, at least while we're also using `waitOn`, it seems to
work consistently.

I'm going to revert this commit next and try to pare down these changes to
isolate the minimal change to fix the problem.
This hopefully resolves the intermittend failures we've had in CI.

Ref #1617
No need for all that socket management stuff
This reverts commit 03a5d89.

Turns out it was being used after all!
@mattwynne mattwynne force-pushed the http-stream-tests-race-condition branch from 70665af to 9467a1f Compare February 23, 2022 21:39
@mattwynne mattwynne marked this pull request as ready for review February 23, 2022 21:39
@mattwynne
Copy link
Member

OK I think we have identified the culprit and have a fix.

Pairing with @aurelien-reeves we isolated the problem down to starting the FakeReportServer: when the problem occurs, we get a listening event back from the HTTP server, but in fact the server isn't responding to connection attempts.

On a hunch, I tried using a fixed port instead of this trick of passing a 0 port to server.listen (which tells the server you want it to find a free port) and it started working.

In order to avoid hard-coding a port, I've used a library called portfinder to look up a free port to use, instead of letting the server do that. It seems to be more reliable.

Once I had it working I dismantled some complexity in the FakeReportServer around managing/closing sockets, which doesn't seem to be necessary.

@mattwynne
Copy link
Member

One thing that occurs to me is that we could move all of the cucumber-reports integration stuff out into some kind of plugin. It might be a useful exercise to create the extension points, and it would also mean we could leave these tests (and code) out of our main codebase / CI run.

This reverts commit 5c9599a.

I'd like to see this run once on the pull request before we remove it
for good
@aslakhellesoy
Copy link
Contributor

Nice!!!!

This reverts commit 59782d4.

We don't want these tests in the main branch, and I don't think we need
them anymore now we've fixed the race condition.
@mattwynne
Copy link
Member

Well, I guess I spoke too soon! https://github.com/cucumber/cucumber-js/runs/5310937749?check_suite_focus=true

Maybe we need a new test to isolate this issue. That previous one definitely seems to be passing OK now.

@aurelien-reeves
Copy link
Contributor Author

Actually that "ERR_PREMATURE_CLOSE" error is the one we originally had and tried to reproduce and fix.

@mattwynne
Copy link
Member

Actually that "ERR_PREMATURE_CLOSE" error is the one we originally had and tried to reproduce and fix.

Yeah, it's the same one I witnessed here, too.

Interesting that it happened on Windows both times. I wonder if that's relevant.

@mattwynne
Copy link
Member

I would like us to still merge the changes in this PR as they're an improvement, even if they don't fix all the flickers in this area of the test suite.

@mattwynne mattwynne enabled auto-merge (squash) April 12, 2022 14:48
@mattwynne mattwynne merged commit ade14f7 into main Apr 12, 2022
@mattwynne mattwynne deleted the http-stream-tests-race-condition branch April 12, 2022 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔧 build Related to build / release process
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants