Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maintenance/mitigate flaky tests #139

Merged

Conversation

ahmedneilhussain
Copy link
Contributor

This would probably be better entitled 'mitigate noisy tests' - I found there was an awful lot of noise from the teardown code having race conditions with some of the lazily server startup/document connection. I think this was mostly benign but made it very hard to try and see what the problem might be when investigating test failures. Although it turns out benign exceptions that just get logged are not so benign if NoErrorLoggedRule is being used! The logs are a lot quieter now, hopefully, and I've found and tried to fix a few definite cases of flakiness.

The changes can be summarised:

  • Add some hooks to allow the test teardown logic to wait for some of the startup/connect logic to complete. It does make the tests run a bit slower but as a full run is only about 100s on my machine that's pretty respectable for UI tests that require the whole IDE to start up. An alternative might have been to allow the setup logic to detect when the server was already being shut down and not bother continuing, but I think it's safer and better to make test code accommodate production code than vice versa.
  • Added some hooks so that files created on-the-fly are tracked and removed by some common code in TestUtils and AllClearRule so that where possible files are not deleted while document connection might still be pending in a Job queue or worker pool. Changed a lot of existing tests to use this common framework rather than doing it with duplicated logic (minus the synchronisation)
  • Changed the test stream connection provider to only close the outbound streams in both directions, and relying on the EOF to close the corresponding paired stream. This got rid of a lot of the AsynchronousCloseExceptions
  • Changed the tests that look for annotations of a specific colour to use a Euclidian distance and a fudge factor rather than an exact comparison, as an exact match doesn't play well with platform anti-aliasing.

I think b8340c3 is an outright bug fix rather than a testing improvement, because the stop() and disconnect() could both call one another resulting in multiple 'shutdown' jobs being scheduled. I nearly introduced a bug because I forgot that language servers can be restarted once stopped but I hope I've got this right. Would be very grateful if someone else can take a look.

@mickaelistria
Copy link
Contributor

Thanks. I'll try to cherry-pick and apply the most trivial patches first and will then come back to discuss the less simple ones.

@mickaelistria
Copy link
Contributor

Can you please tro to rebase it?

@ahmedneilhussain ahmedneilhussain force-pushed the maintenance/mitigate-flaky-tests branch from ae94504 to 64c3700 Compare June 15, 2022 13:24
@ahmedneilhussain
Copy link
Contributor Author

Can you please tro to rebase it?

HI @mickaelistria I've got it consistent with master now (I think) and all of the changes you cherry-picked excluded.

Make the test teardown a bit cleaner so that some of the console noise is reduced.
Makes it easier to ascertain which tests are causing noise and have clean-up sync issues.
Could possibly have made this an eclipse configuration item (as is currently the case
for lsp4e wire logging) but a system property is easier to control via maven-based CI
…age server

Stop() calls disconnect() on any disconnected documents, but disconnect() also calls stop...
Multiple calls to stop() allowed (server can be restarted), just not multiple *simultaneous* calls to stop
Can't quite shut up testProjectClose() - async LSPFoldingReconciler still running in flight sometimes when files torn down
Not sure what to do about that
@ahmedneilhussain ahmedneilhussain force-pushed the maintenance/mitigate-flaky-tests branch from 64c3700 to 5d289a0 Compare July 11, 2022 11:12
@sebthom
Copy link
Member

sebthom commented Jul 11, 2022

@ahmedneilhussain can you please bump the bundle version of org.eclipse.lsp4e.tests.mock. this solves the build error:

[ERROR] Failed to execute goal org.eclipse.tycho.extras:tycho-p2-extras-plugin:2.7.3:compare-version-with-baselines (compare-version-with-baseline) on project org.eclipse.lsp4e.tests.mock: 
Only qualifier changed for (org.eclipse.lsp4e.tests.mock/0.14.4.202207111108). Expected to have bigger x.y.z than what is available in baseline (0.14.4.202205311633) -> [Help 1]


// Put back so shutdown doesn't time out
MockLanguageServer.INSTANCE.setTimeToProceedQueries(0);
assertTrue(command.isEnabled() && command.isHandled());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please keep the asserts separate:

assertTrue(command.isEnabled());
assertTrue(command.isHandled());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - the email notification of your comment only arrived after I had independently fixed the build errors and @mickaelistria had merged... Please feel free to revert to two separate asserts.

@@ -59,6 +59,8 @@ public class TestUtils {
private TestUtils() {
// this class shouldn't be instantiated
}

private static List<File> tempFiles = new ArrayList<>();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds a second static field with the same name tempFiles to the class.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, was a merge clash caused by this part of the original PR being cherry picked out and integrated independently a few weeks ago. Fixed anyway!

@@ -53,14 +59,17 @@ protected void finished(Description description) {
}

private void clear() {
PlatformUI.getWorkbench().getActiveWorkbenchWindow().getActivePage().closeAllEditors(false);
// Give the platform three attempts to shut down windows
for (int i = 3; i > 0 && !PlatformUI.getWorkbench().getActiveWorkbenchWindow().getActivePage().closeAllEditors(false); i--) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of doing this in a loop without a sleep or something in between?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fairly long-running call so I didn't feel the need to stick a sleep in between, but I take your point!

@mickaelistria mickaelistria merged commit 8775fa8 into eclipse:master Jul 12, 2022
@mickaelistria
Copy link
Contributor

Thanks!

@mickaelistria
Copy link
Contributor

I've merged it already so the PR doesn't remain as a zombie for a too long time until it becomes harder to merge it. Further PRs to improve it are welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants