Disable flaky DNS cancellation tests during jitstress #70089
Tagging subscribers to this area: @dotnet/ncl

Issue Details

Related to #69993. The original fix (#70044), while removing the flakiness, made the test effectively useless (see #70044 (comment)). Based on the discussion, we decided that it is better to disable the test during jitstress. This PR also reverts #70009, as the implementation of async name resolution on *nix platforms was reverted.
LGTM, thanks for your patience cleaning this up!
This test does not only fail in jitstress runs. As I have tried to point out in the original issue, jitstress just runs a lot of configurations and therefore runs tests a lot of times, so if they are flaky it is often exposed under jitstress.
Agreed with @jakobbotsch. CC @karelz.
Those failures are on Linux. I mistakenly thought that async name resolution had been implemented on Linux in #34633, so I enabled the test on UNIX platforms (it was disabled before). I did not notice that the PR was later reverted (the issue mentioned in the ActiveIssue attribute was still marked as closed). This PR again disables the tests on UNIX, so those failures will not repeat. There are no recent failures of this test on Windows.
The test failure seems unrelated.
Got it, but this still just seems like a short-term fix. Any future change, test mode, or even just regular background processes can affect timings in the test execution and have the potential to hit the same problem. For example, if we wanted to run libraries tests under GCStress (which has been discussed before), that would also affect timings significantly. With that said, I am more assured now given that the Linux failures were something else.
@jakobbotsch unfortunately there are many networking features which cannot be tested at all in a manner that is not dependent on timing. We can choose not to test them at all, or to respect the fact that they need better isolation in the test infra. I'm very much for the latter, since I believe we should not leave features untested, even if it comes with a price.
Then we will have to exclude timing-dependent networking tests from A long term solution could be to create some sort of marker attribute so these tests can be auto-excluded from runs where isolation is not possible.
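To make the marker idea concrete, here is a minimal, language-neutral sketch in Python rather than the repository's C#. The names `timing_sensitive` and `select_tests` are hypothetical and do not correspond to any existing attribute or runner in the test infrastructure; the sketch only illustrates tagging tests and filtering them when a run cannot guarantee isolation.

```python
def timing_sensitive(func):
    """Hypothetical marker: tag a test as dependent on wall-clock timing."""
    func.timing_sensitive = True
    return func


def select_tests(tests, isolated):
    """Keep every test in isolated runs; drop marked tests otherwise."""
    return [
        t for t in tests
        if isolated or not getattr(t, "timing_sensitive", False)
    ]


@timing_sensitive
def test_dns_cancellation():
    """Placeholder for a timing-dependent networking test."""


def test_parse_address():
    """Placeholder for a test with no timing dependency."""


ALL_TESTS = [test_dns_cancellation, test_parse_address]
```

With this shape, a stress run would call `select_tests(ALL_TESTS, isolated=False)` and execute only the unmarked tests, while a regular run keeps the full list.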
These do not sound like the only possibilities to me. Other possibilities could be test stubs or a test DNS server that the test controls, where timings can be deterministically controlled (I realize this is a lot of work).
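As a rough illustration of the stub approach, the following Python sketch (not the repository's C#, and not its actual test code) replaces the real resolver with a hypothetical `ControlledResolver` whose lookup stays pending until the test releases it. The cancellation is then exercised without relying on any timeout or delay.

```python
import asyncio


class ControlledResolver:
    """Hypothetical stub: a lookup that finishes only when the test allows it."""

    def __init__(self):
        self._release = asyncio.Event()
        self.started = asyncio.Event()

    async def resolve(self, host):
        self.started.set()          # signal that the lookup is in flight
        await self._release.wait()  # stay pending until released, no wall clock
        return ["127.0.0.1"]

    def release(self):
        self._release.set()


async def cancel_pending_lookup():
    resolver = ControlledResolver()
    task = asyncio.ensure_future(resolver.resolve("example.test"))
    await resolver.started.wait()   # the lookup is known to be pending here
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "completed"


def run_cancel_pending_lookup():
    return asyncio.run(cancel_pending_lookup())
```

Because the stub never completes on its own, the outcome does not depend on how slowly the test process runs, which is the property a stress configuration needs.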
We should control the Helix environment. Part of the problem is that some networking tests are time dependent, e.g. they test that something happens within a given time frame. While there is always potential for improvement, it is not trivial to make them 100% reliable. It is also possible that the failures in jitstress are real bugs instead of just timing differences. But the investigation is not easy and there are more important tasks for the release.
This can solve the DNS issues but not other cases where networking code may become racy.
This is definitely a lot of work for DNS or Sockets, not convinced it's worth it. Having a mindful approach to Helix way seems easier to me.
Related to #69993.
The original fix (#70044), while removing the flakiness, made the test effectively useless (see #70044 (comment)). Based on the discussion, we decided that it is better to disable the test during jitstress.
This PR also reverts #70009, as the implementation of async name resolution on *nix platforms was reverted in #48666.