Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TcpReceiveSendGetsCanceledByDispose: update test for change in Linux kernel. #93198

Merged
merged 1 commit into from
Oct 13, 2023

Conversation

tmds
Copy link
Member

@tmds tmds commented Oct 9, 2023

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Oct 9, 2023
@ghost
Copy link

ghost commented Oct 9, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #91543.

@antonfirsov @directhex ptal.

Author: tmds
Assignees: -
Labels:

area-System.Net.Sockets, community-contribution

Milestone: -

@rzikm
Copy link
Member

rzikm commented Oct 9, 2023

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm
Copy link
Member

rzikm commented Oct 9, 2023

oops, nevermind, wrong PR :D

@tmds
Copy link
Member Author

tmds commented Oct 9, 2023

May be also useful here, as the ppc64le/s390x jobs this was observed on are very likely not part of the regular run.

@rzikm
Copy link
Member

rzikm commented Oct 9, 2023

/azp list

@azure-pipelines
Copy link

CI/CD Pipelines for this repository:

@rzikm
Copy link
Member

rzikm commented Oct 9, 2023

/azp run runtime-community

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines +1034 to +1036
// .NET uses connect(AF_UNSPEC) to abort on-going operations on Linux.
// Linux 6.4+ introduced a change (4faeee0cf8a5d88d63cdbc3bab124fb0e6aed08c) which disallows
// this operation while operations are on-going.
Copy link
Member

@antonfirsov antonfirsov Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that distros will eventually update to 6.4+, does this mean that we should stop using connect(AF_UNSPEC)? Is there any alternative?

Copy link
Member Author

@tmds tmds Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the time being we should keep connect(AF_UNSPEC) as there are many kernels still that provide this, and the resulting behavior matches closer to what Windows does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, though, this means our Linux implementation will not behave the way we want it to, right? Is there something better we can / should be doing instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're doing the best we can.

I've been investigating the issue with a kernel dev. He wrote this patch which fixes the regression.
We'll see if it gets included. 🤞

@antonfirsov
Copy link
Member

antonfirsov commented Oct 12, 2023

So far it looks like this fixes the recent spike in TcpReceiveSendGetsCanceledByDispose failures in CI, so this should be merged as soon as the run on the PR completes.

@dotnet/dnceng were there any recent changes in the linux queues where the failures started to happen? (I suspect a kernel update.)

https://dev.azure.com/dnceng-public/public/_build/results?buildId=436660&view=ms.vss-test-web.build-test-results-tab

@directhex
Copy link
Contributor

directhex commented Oct 13, 2023

@dotnet/dnceng were there any recent changes in the linux queues where the failures started to happen? (I suspect a kernel update.)

Ubuntu kernel 5.15.0-81.90 included a backport of the breaking change in early august - but we were probably hit by a more recent kernel package which was uploaded to the security repo rather than the updates repo - 5.15.0-82.91 in early September, 5.15.0-83.92 in mid-September, or 5.15.0-86.96 in early October.

@antonfirsov
Copy link
Member

The CI failures are unrelated, TcpReceiveSendGetsCanceledByDispose is passing everywhere.

@antonfirsov
Copy link
Member

/backport to release/8.0

@github-actions
Copy link
Contributor

Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/6514772847

@antonfirsov
Copy link
Member

/backport to release/7.0

@github-actions
Copy link
Contributor

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/6514775056

@antonfirsov
Copy link
Member

/backport to release/6.0

@github-actions
Copy link
Contributor

Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/6514776759

@github-actions
Copy link
Contributor

@antonfirsov backporting to release/6.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: TcpReceiveSendGetsCanceledByDispose: update test for change in Linux kernel.
Using index info to reconstruct a base tree...
M	src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs
CONFLICT (content): Merge conflict in src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 TcpReceiveSendGetsCanceledByDispose: update test for change in Linux kernel.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@github-actions
Copy link
Contributor

@antonfirsov an error occurred while backporting to release/6.0, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

@antonfirsov
Copy link
Member

/backport to release/7.0-staging

@github-actions
Copy link
Contributor

Started backporting to release/7.0-staging: https://github.com/dotnet/runtime/actions/runs/6514878161

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net.Sockets community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TcpReceiveSendGetsCanceledByDispose tests fail on Fedora 38
7 participants