TcpReceiveSendGetsCanceledByDispose tests fail on Fedora 38 #91543

Closed
tmds opened this issue Sep 4, 2023 · 15 comments · Fixed by #93198
Assignees
Labels
area-System.Net.Sockets os-linux Linux OS (any supported distro) test-run-core Test failures in .NET Core test runs
Milestone

Comments

@tmds
Member

tmds commented Sep 4, 2023

The following tests fail on Fedora 38:

System.Net.Sockets.Tests.SendReceive_SpanSyncForceNonBlocking.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: False, dualModeClient: False, owning: True)
System.Net.Sockets.Tests.SendReceive_SpanSyncForceNonBlocking.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: False, dualModeClient: True, owning: True)
System.Net.Sockets.Tests.SendReceive_SpanSyncForceNonBlocking.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: True, dualModeClient: False, owning: True)
System.Net.Sockets.Tests.SendReceive_Sync.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: False, dualModeClient: False, owning: True)
System.Net.Sockets.Tests.SendReceive_Sync.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: False, dualModeClient: True, owning: True)
System.Net.Sockets.Tests.SendReceive_Sync.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: True, dualModeClient: False, owning: True)
System.Net.Sockets.Tests.SendReceive_SyncForceNonBlocking.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: False, dualModeClient: False, owning: True)
System.Net.Sockets.Tests.SendReceive_SyncForceNonBlocking.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: False, dualModeClient: True, owning: True)
System.Net.Sockets.Tests.SendReceive_SyncForceNonBlocking.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: True, dualModeClient: False, owning: True)
System.Net.Sockets.Tests.SendReceive_SpanSync.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: False, dualModeClient: False, owning: True)
System.Net.Sockets.Tests.SendReceive_SpanSync.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: False, dualModeClient: True, owning: True)
System.Net.Sockets.Tests.SendReceive_SpanSync.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: True, dualModeClient: False, owning: True)

The exceptions look like:

System.AggregateException: System.AggregateException : One or more errors occurred. (Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)) (Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)) (Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)) (Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)) (Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)) (Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)) (Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)) (Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null))\n---- Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)\n---- Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)\n---- Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)\n---- Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)\n---- Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)\n---- Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)\n---- Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)\n---- Assert.Equal() Failure\nExpected: ConnectionReset\nActual:   (null)
        at System.RetryHelper.ExecuteAsync(Func`1 test, Int32 maxAttempts, Func`2 backoffFunc, Predicate`1 retryWhen, String testName) in /home/tester/runtime/src/libraries/Common/tests/TestUtilities/System/RetryHelper.cs:line 102
at System.Net.Sockets.Tests.SendReceive`1.TcpReceiveSendGetsCanceledByDispose(Boolean receiveOrSend, Boolean ipv6Server, Boolean dualModeClient, Boolean owning) in /home/tester/runtime/src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs:line 1032
--- End of stack trace from previous location ---
----- Inner Stack Trace #1 (Xunit.Sdk.EqualException) -----
at System.Net.Sockets.Tests.SendReceive`1.<>c__DisplayClass21_0.<<TcpReceiveSendGetsCanceledByDispose>b__0>d.MoveNext() in /home/tester/runtime/src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs:line 1120
--- End of stack trace from previous location ---
at System.RetryHelper.ExecuteAsync(Func`1 test, Int32 maxAttempts, Func`2 backoffFunc, Predicate`1 retryWhen, String testName) in /home/tester/runtime/src/libraries/Common/tests/TestUtilities/System/RetryHelper.cs:line 94
----- Inner Stack Trace #2 (Xunit.Sdk.EqualException) -----
at System.Net.Sockets.Tests.SendReceive`1.<>c__DisplayClass21_0.<<TcpReceiveSendGetsCanceledByDispose>b__0>d.MoveNext() in /home/tester/runtime/src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs:line 1120
--- End of stack trace from previous location ---
at System.RetryHelper.ExecuteAsync(Func`1 test, Int32 maxAttempts, Func`2 backoffFunc, Predicate`1 retryWhen, String testName) in /home/tester/runtime/src/libraries/Common/tests/TestUtilities/System/RetryHelper.cs:line 94
----- Inner Stack Trace #3 (Xunit.Sdk.EqualException) -----
at System.Net.Sockets.Tests.SendReceive`1.<>c__DisplayClass21_0.<<TcpReceiveSendGetsCanceledByDispose>b__0>d.MoveNext() in /home/tester/runtime/src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs:line 1120
--- End of stack trace from previous location ---
at System.RetryHelper.ExecuteAsync(Func`1 test, Int32 maxAttempts, Func`2 backoffFunc, Predicate`1 retryWhen, String testName) in /home/tester/runtime/src/libraries/Common/tests/TestUtilities/System/RetryHelper.cs:line 94
----- Inner Stack Trace #4 (Xunit.Sdk.EqualException) -----
at System.Net.Sockets.Tests.SendReceive`1.<>c__DisplayClass21_0.<<TcpReceiveSendGetsCanceledByDispose>b__0>d.MoveNext() in /home/tester/runtime/src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs:line 1120
--- End of stack trace from previous location ---
at System.RetryHelper.ExecuteAsync(Func`1 test, Int32 maxAttempts, Func`2 backoffFunc, Predicate`1 retryWhen, String testName) in /home/tester/runtime/src/libraries/Common/tests/TestUtilities/System/RetryHelper.cs:line 94
----- Inner Stack Trace #5 (Xunit.Sdk.EqualException) -----
at System.Net.Sockets.Tests.SendReceive`1.<>c__DisplayClass21_0.<<TcpReceiveSendGetsCanceledByDispose>b__0>d.MoveNext() in /home/tester/runtime/src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs:line 1120
--- End of stack trace from previous location ---
at System.RetryHelper.ExecuteAsync(Func`1 test, Int32 maxAttempts, Func`2 backoffFunc, Predicate`1 retryWhen, String testName) in /home/tester/runtime/src/libraries/Common/tests/TestUtilities/System/RetryHelper.cs:line 94
----- Inner Stack Trace #6 (Xunit.Sdk.EqualException) -----
at System.Net.Sockets.Tests.SendReceive`1.<>c__DisplayClass21_0.<<TcpReceiveSendGetsCanceledByDispose>b__0>d.MoveNext() in /home/tester/runtime/src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs:line 1120
--- End of stack trace from previous location ---
at System.RetryHelper.ExecuteAsync(Func`1 test, Int32 maxAttempts, Func`2 backoffFunc, Predicate`1 retryWhen, String testName) in /home/tester/runtime/src/libraries/Common/tests/TestUtilities/System/RetryHelper.cs:line 94
----- Inner Stack Trace #7 (Xunit.Sdk.EqualException) -----
at System.Net.Sockets.Tests.SendReceive`1.<>c__DisplayClass21_0.<<TcpReceiveSendGetsCanceledByDispose>b__0>d.MoveNext() in /home/tester/runtime/src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs:line 1120
--- End of stack trace from previous location ---
at System.RetryHelper.ExecuteAsync(Func`1 test, Int32 maxAttempts, Func`2 backoffFunc, Predicate`1 retryWhen, String testName) in /home/tester/runtime/src/libraries/Common/tests/TestUtilities/System/RetryHelper.cs:line 94
----- Inner Stack Trace #8 (Xunit.Sdk.EqualException) -----
at System.Net.Sockets.Tests.SendReceive`1.<>c__DisplayClass21_0.<<TcpReceiveSendGetsCanceledByDispose>b__0>d.MoveNext() in /home/tester/runtime/src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs:line 1120
--- End of stack trace from previous location ---
at System.RetryHelper.ExecuteAsync(Func`1 test, Int32 maxAttempts, Func`2 backoffFunc, Predicate`1 retryWhen, String testName) in /home/tester/runtime/src/libraries/Common/tests/TestUtilities/System/RetryHelper.cs:line 94

These tests don't fail on Fedora 37, so the failures are possibly triggered by a change in kernel behavior.

I will investigate further when I have some time.

cc @omajid

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 4, 2023
@ghost

ghost commented Sep 4, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

@tmds tmds removed the untriaged New issue has not been triaged by the area owner label Sep 4, 2023
@karelz
Member

karelz commented Sep 8, 2023

@tmds please do not remove untriaged label. We do it only when we set milestone. Otherwise issues might slip through :(

@karelz karelz added the untriaged New issue has not been triaged by the area owner label Sep 8, 2023
@tmds
Member Author

tmds commented Sep 8, 2023

@tmds please do not remove untriaged label. We do it only when we set milestone. Otherwise issues might slip through :(

Ok, I haven't had this feedback before. I'll stop removing the label.

@karelz karelz added this to the Future milestone Sep 14, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Sep 14, 2023
@karelz karelz added os-linux Linux OS (any supported distro) test-run-core Test failures in .NET Core test runs labels Sep 14, 2023
@directhex
Member

@tmds this is failing for you on x64 with Core, right?

These exact failures are plaguing our partners at IBM, for their configurations (Mono on PPC64 little endian, Mono on s390x). At the very least that seems to imply it's not a runtime-specific issue.

e.g. https://dev.azure.com/dnceng-public/public/_build/results?buildId=421005&view=results

@tmds
Member Author

tmds commented Sep 28, 2023

These exact failures are plaguing our partners at IBM, for their configurations (Mono on PPC64 little endian, Mono on s390x). At the very least that seems to imply it's not a runtime-specific issue.

Right. As mentioned in the initial comment, I think it's in the kernel.

Note that these tests don't fail on our CI when they run on ppc64le with RHEL 8.
And, RHEL8 has a different kernel than what CI is using here.

@directhex
Member

The IBM failures are on Ubuntu 20.04, with a 5.4 kernel. Fedora 38 is 6.2.9. That seems too wide a range for one kernel change to explain, especially since Fedora 37 was OK with kernel 6.0.7.

@tmds
Member Author

tmds commented Sep 28, 2023

It could be different kernel bugs ...

Has this test ever passed on the public CI setup? If so, do you know when it started to fail?

@directhex
Member

directhex commented Sep 28, 2023

We only have the resources to run the main branch community jobs twice a day. It looks like the problem appeared between 9c4b135..7f191ad, a range of about 20 commits.

@tmds
Member Author

tmds commented Sep 28, 2023

We only have the resources to run the main branch community jobs twice a day, it looks like the problem appeared between https://github.com/dotnet/runtime/commit/9c4b135ae2b1ffb5adfae8b76486cddc92995ec5..https://github.com/dotnet/runtime/commit/7f191adb868d8665ceada6f5ac5ad9150884d490 - about a 20 commit range

Nothing in the commit range stands out.
Did the VM stay exactly the same over this period?

I will investigate further when I upgrade to Fedora 38, which will be next month or so.

@directhex
Member

Interesting question. According to the dpkg logs, OS updates happened on August 23rd, and September 13th - there's nothing on the OS/VM side which changed in the problem range.

@alhad-deshpande
Contributor

I tried on Ubuntu 20.04 with kernel version 5.4.0-162-generic and it worked there.

Below is the console log:
root@dotnet1:/dotnet-ppc64le/runtime/artifacts/bin/System.Net.Sockets.Tests/Release/net7.0-unix# ../../../testhost/net7.0-Linux-Release-ppc64le/dotnet ../../../testhost/net7.0-Linux-Release-ppc64le/dotnet exec --runtimeconfig System.Net.Sockets.Tests.runtimeconfig.json --depsfile System.Net.Sockets.Tests.deps.json xunit.console.dll System.Net.Sockets.Tests.dll -xml testResults.xml -nologo -notrait category=OuterLoop -notrait category=failing^C
root@dotnet1:/dotnet-ppc64le/runtime/artifacts/bin/System.Net.Sockets.Tests/Release/net7.0-unix# ../../../testhost/net7.0-Linux-Release-ppc64le/dotnet exec --runtimeconfig System.Net.Sockets.Tests.runtimeconfig.json --depsfile System.Net.Sockets.Tests.deps.json xunit.console.dll System.Net.Sockets.Tests.dll -xml testResults.xml -nologo -notrait category=OuterLoop -notrait category=failing
Discovering: System.Net.Sockets.Tests (method display = ClassAndMethod, method display options = None)
Discovered: System.Net.Sockets.Tests (found 1258 of 1713 test cases)
Starting: System.Net.Sockets.Tests (parallel test collections = on, max threads = 8)
System.Net.Sockets.Tests.DualModeConnectToHostString.DualModeConnect_LoopbackDnsToHost_Helper [SKIP]
Condition(s) not met: "LocalhostIsBothIPv4AndIPv6"
System.Net.Sockets.Tests.DualModeConnectToDnsEndPoint.DualModeConnect_DnsEndPointToHost_Helper [SKIP]
Condition(s) not met: "LocalhostIsBothIPv4AndIPv6"
System.Net.Sockets.Tests.DualModeConnectAsync.DualModeConnectAsync_DnsEndPointToHost_Helper [SKIP]
Condition(s) not met: "LocalhostIsBothIPv4AndIPv6"
System.Net.Sockets.Tests.UdpClientTest.Finalize_NoExceptionsThrown [SKIP]
Condition(s) not met: "IsPreciseGcSupported"
System.Net.Sockets.Tests.CreateSocket.Ctor_Raw_NotSupported_ExpectedError [SKIP]
Condition(s) not met: "NotSupportsRawSockets"

@tmds
Member Author

tmds commented Oct 5, 2023

Seems I don't need to upgrade to Fedora 38 to investigate, as the issue is now also reproducing on my Fedora 37 system.

These are my findings.

The runtime's native socket code unexpectedly gets EBUSY when it calls connect here, which causes it to call shutdown as a fallback:

err = connect(fd, &addr, sizeof(addr));
if (err != 0)
{
    // On some older kernels connect(AF_UNSPEC) may fail. Fall back to shutdown in these cases:
    err = shutdown(fd, SHUT_RDWR);
}
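
As a rough, self-contained sketch of the fallback pattern in the snippet above (not the runtime's actual function): `abort_connection` and its return codes are hypothetical names introduced here for illustration. It tries `connect` with an `AF_UNSPEC` address first and falls back to `shutdown(SHUT_RDWR)` only if that fails, reporting which path was taken so a caller can tell an abort from a graceful shutdown.

```c
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper for illustration; not the runtime's real code.
 * Returns  0 if connect(AF_UNSPEC) succeeded (connection aborted),
 *          1 if it failed and shutdown(SHUT_RDWR) succeeded instead,
 *         -1 if both calls failed (errno is left from shutdown). */
int abort_connection(int fd)
{
    struct sockaddr addr;

    /* An all-zero address with sa_family = AF_UNSPEC asks the kernel
     * to dissolve the socket's association. */
    memset(&addr, 0, sizeof(addr));
    addr.sa_family = AF_UNSPEC;

    if (connect(fd, &addr, sizeof(addr)) == 0)
        return 0;

    /* connect(AF_UNSPEC) can fail, for example with the EBUSY reported
     * in this issue. Fall back to closing both directions. */
    if (shutdown(fd, SHUT_RDWR) == 0)
        return 1;

    return -1;
}
```

A caller that needs reset semantics could treat a return value of 1 as "the peer may see a graceful close instead of a reset", which is the situation this issue describes.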

This is covered by the mayShutdownGraceful case of the test:

try
{
    Assert.Equal(SocketError.ConnectionReset, peerSocketError);
}
catch when (mayShutdownGraceful)
{
    Assert.Null(peerSocketError);
}
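
To make the two outcomes concrete at the socket level, here is a minimal C sketch of what the peer can observe and how the test's tolerance maps onto it. The names `peer_outcome`, `classify_peer` and `outcome_is_accepted` are hypothetical and exist only for this illustration. The assumption is that a `recv` returning 0 (graceful end of stream) corresponds to the `(null)` error in the failure output, while `ECONNRESET` corresponds to `ConnectionReset`.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

enum peer_outcome {
    PEER_GRACEFUL_EOF,  /* recv returned 0: no socket error is reported */
    PEER_RESET,         /* recv failed with ECONNRESET */
    PEER_DATA,          /* recv returned payload bytes */
    PEER_OTHER_ERROR    /* any other failure */
};

/* Hypothetical helper: classify what the peer sees on its next receive. */
enum peer_outcome classify_peer(int fd)
{
    char buf[16];
    ssize_t n;

    /* Retry if the call is interrupted by a signal. */
    do {
        n = recv(fd, buf, sizeof(buf), 0);
    } while (n < 0 && errno == EINTR);

    if (n == 0)
        return PEER_GRACEFUL_EOF;
    if (n > 0)
        return PEER_DATA;
    return errno == ECONNRESET ? PEER_RESET : PEER_OTHER_ERROR;
}

/* Mirrors the tolerance in the test snippet: a reset is always accepted,
 * a graceful EOF only when may_shutdown_graceful is non-zero. */
int outcome_is_accepted(enum peer_outcome outcome, int may_shutdown_graceful)
{
    if (outcome == PEER_RESET)
        return 1;
    return may_shutdown_graceful && outcome == PEER_GRACEFUL_EOF;
}
```

With `may_shutdown_graceful` set to 0, a graceful EOF is rejected, which matches the "Expected: ConnectionReset, Actual: (null)" failures listed at the top of this issue.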

This reproduces on my Fedora 37 system with the 6.4.15-100.fc37.x86_64 kernel, but it does not reproduce on our Fedora 37 CI system, which has an older kernel.

I will investigate further with a kernel engineer what is causing the change in behavior from the kernel side.

To unblock the CI, you can add the failing cases to this:

// RHEL7 kernel has a bug preventing close(AF_UNKNOWN) to succeed with IPv6 sockets.
// In this case Dispose will trigger a graceful shutdown, which means that receive will succeed on socket2.
// This bug is fixed in kernel 3.10.0-1160.25+.
// TODO: Remove this, once CI machines are updated to a newer kernel.
bool mayShutdownGraceful = UsesSync && PlatformDetection.IsRedHatFamily7 && receiveOrSend && (ipv6Server || dualModeClient);

@tmds
Member Author

tmds commented Oct 5, 2023

unexpectedly gets EBUSY when it calls connect here

This is caused by this change: torvalds/linux@4faeee0.

It explicitly prevents connect from working while operations are ongoing, whereas we use it precisely to abort operations that are ongoing.

We prefer connect(AF_UNSPEC) over shutdown(SHUT_RDWR) because the observable effects were the same as on Windows.

We'll see the test fail on newer kernels, and kernels that back-port this change.

We should set mayShutdownGraceful to true on any Linux.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 9, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 13, 2023
@karelz
Member

karelz commented Jun 24, 2024

Fixed in main (9.0) in PR #93198 and in 8.0.x in PR #93502 and in 7.0.x in PR #93505 and in 6.0.x in PR #93554.

@karelz karelz modified the milestones: 9.0.0, 6.0.x Jun 24, 2024