-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase some of the DnsEndPointTest timeouts #58129
Conversation
Tagging subscribers to this area: @dotnet/ncl Issue DetailsLong story short: I recommend to bump the timeout values for now, and see if it helps with the issue. If not, we may invest into another round of investigation. (hopefully) fixes #57929
|
/azp run runtime-libraries-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
@@ -272,7 +272,7 @@ public void Socket_ConnectAsyncDnsEndPoint_HostNotFound() | |||
bool willRaiseEvent = sock.ConnectAsync(args); | |||
if (willRaiseEvent) | |||
{ | |||
Assert.True(complete.WaitOne(TestSettings.PassingTestTimeout), "Timed out while waiting for connection"); | |||
Assert.True(complete.WaitOne(TimeSpan.FromSeconds(30)), "Timed out while waiting for connection"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do it as TestSettings.PassingTestTimeout * 3
e.g. factor of baseline instead of hardcoded value.
(or add new constant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can rename the 10 sec PassingTestTimeout
to PassingTestShortTimeout
, and define another constant PassingTestLongTimeout = 30 sec
?
But maybe better to do it in a separate PR, to keep the changes in this one minimal? (It's about to be backported.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be actually better to do the rename in the release/*
branches too to keep the code in sync, avoiding conflicts in future PR-s. I'm undecided.
Did you get VM from the Helix gallery @antonfirsov ? |
@wfurt also reacting to #57929 (comment): Note the most problematic tests ( runtime/src/libraries/System.Net.Sockets/tests/FunctionalTests/DnsEndPointTest.cs Lines 56 to 69 in d9a1302
I believe this means that the OS level name resolution works fine, and the problem is more likely some sort of threadpool execution timeout, since async name resolution is actually is async-over-sync on Unix.
It isn't but the rest of the socket test suite is using threadpool threads in parallel quite heavily, and we've seen timeouts because of that. The only difference between |
Do we have any stats how long does it take to run the sync test comparing to other platforms? |
I did some microbenchmarks for the failing connect, numbers were something like: Windows (surface laptop): 280 ms, Ubuntu 100ms, SLES: 5ms. So weirdly SLES is actually the fastest (at least on the Helix DTL machine I got). I think if there is a problem it's not with DNS but rather with the threadpool or our async-over-sync resolution code. (Or the way it works with DnsConnectAsync.) |
Would it make sense to move it to the non-parallel bucket? |
I think it would make the error go away with higher chance, but also hide potential issues with the async-over-sync code, plus the usual penalty in test run time on all platforms, so I prefer not to do it in this particular case. |
Reacting to #58129 (comment), I pushed a change that renames existing |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@@ -8,9 +8,10 @@ namespace System.Net.Sockets.Tests | |||
public static class TestSettings | |||
{ | |||
// Timeout values in milliseconds. | |||
public const int PassingTestTimeout = 10000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep this untouched to be consistent with other tests? e.g. PassingTestTimeout
is normal/default timeout. If we need anything different for come case we would make it with specific name - like PassingTestLongTimeout
(that part looks ok to me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted this renaming to reduce the diff size.
I think more verbosity in the names wouldn't hurt here. 10 sec is quite a short time in comparison to Http.Tests's default PassingTestTimeout=30sec
. We use hardcoded 30 seconds at a bunch of places in socket tests, which we will eventually replace with PassingTestLongTimeout
, meaning that the "default" PassingTestTimeout=10sec
will become much less common.
Test run reported networking test failures 4x this specific test:
That may be just some networking hiccup I guess ... cc @wfurt Rerunning the failed legs. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Test failures:
Rerunning tests. |
Yes, |
Same System.Runtime.Tests unrelated failures on Debian.10 and Debian.11 - tracked by #58616:
|
/backport to release/6.0 |
Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/1197645068 |
Long story short:
I have spent several hours trying to get a test run on SLES in order to repro #57929, no success so far, and it may take days to make progress, since I'm unfamiliar with SLES.
I recommend to bump the timeout values for now, and see if it helps with the issue. If not, we may invest into another round of investigation.
(hopefully) fixes #57929