Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

basic support for TCP fast open #99490

Merged
merged 24 commits into from May 23, 2024
Merged

basic support for TCP fast open #99490

merged 24 commits into from May 23, 2024

Conversation

wfurt
Copy link
Member

@wfurt wfurt commented Mar 10, 2024

There are three somewhat separated parts here:

  1. This adds API approved in TCP Fast Open implementation? #1476. It is just simple enum so we can do all platforms right.
  2. bring platform parity for patten described in Socket connect with payload via SocketAsyncEventArgs fails #79654 (and TCP Fast Open implementation? #1476 as well)
using var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
var saea = new SocketAsyncEventArgs();
saea.RemoteEndPoint = new IPEndPoint(IPAddress.Loopback, 42);
saea.SetBuffer(new byte[42]); // <== the presence of the buffer is what triggers the failure

Tis currently works on Windows using connectex. With this PR it will light up on macOS using connectx and on Linux using combination of Connect & Send. Test added in #79669 is now portable across platforms.

  1. With the work above, we will attempt TFO if combined. On Windows connectex can already do that so as connectx on macOS. To make it work consistently across platforms I used TCP_FASTOPEN_CONNECT. That is available since Kernel 4.11 and that should cover all supported releases AFAIK. This is still best effort as the actual behavior will change based on /proc setting as well as TCP cookie availability. If either one is missing we will simply fallback to behavior described in 2)

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@wfurt wfurt self-assigned this Mar 12, 2024
@wfurt wfurt requested review from stephentoub and a team March 12, 2024 04:33
@wfurt wfurt added this to the 9.0.0 milestone Mar 12, 2024
@wfurt wfurt marked this pull request as ready for review March 12, 2024 04:34
Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you validated this with packet captures on all 3 platforms?

I don't know how it works on Mac, but note that this does not add support for server-side TFO on Linux therefore it's not a full implementation of #1476. We should either document this limitation or implement (& validate) server-side TFO.

@@ -137,6 +137,7 @@ public enum SocketOptionName
BsdUrgent = 2,
Expedited = 2,
TcpKeepAliveRetryCount = 16,
FastOpen = 15,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should document the concerns (=fully controlled datapath needed) and the limitations around this flag.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I have no problem with good description. Even the man page for Linux talks about differences based on cookie availability.

@wfurt
Copy link
Member Author

wfurt commented Mar 12, 2024

I don't know how it works on Mac, but note that this does not add support for server-side TFO on Linux

what do you think is missing?

@wfurt
Copy link
Member Author

wfurt commented Mar 12, 2024

Have you validated this with packet captures on all 3 platforms?

I did macOS & Linux since Windows was reported working.

@antonfirsov
Copy link
Member

what do you think is missing?

With server side sockets (listener/accept) TCP_FASTOPEN_CONNECT will not make it set a TFO cookie (and it's only being set on connect). The server side needs TCP_FASTOPEN enabled to operate. (This is Linux, didn't research how it works on Mac.)

@wfurt
Copy link
Member Author

wfurt commented Mar 12, 2024

what do you think is missing?

With server side sockets (listener/accept) TCP_FASTOPEN_CONNECT will not make it set a TFO cookie (and it's only being set on connect). The server side needs TCP_FASTOPEN enabled to operate. (This is Linux, didn't research how it works on Mac.)

yes, This is what the part 1) does e.g. the new SocketOption

case SocketOptionName_SO_TCP_FASTOPEN:
           *optName = TCP_FASTOPEN;
            return true;

@antonfirsov
Copy link
Member

case SocketOptionName_SO_TCP_FASTOPEN:

That logic is only triggered for mapping SocketOptionName.FastOpen to TCP_FASTOPEN in SystemNative_GetSockOpt. Maybe I'm missing something, but I don't see a logic enabling TCP_FASTOPEN for listener sockets after listen.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.FastOpen, 1) calls.

@wfurt
Copy link
Member Author

wfurt commented Mar 12, 2024

That logic is only triggered for mapping SocketOptionName.FastOpen to TCP_FASTOPEN in SystemNative_GetSockOpt. Maybe I'm missing something, but I don't see a logic enabling TCP_FASTOPEN for listener sockets after listen.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.FastOpen, 1) calls.

This part I don't understand. The rest works just like any other option. We map the .NET value to platform value and call setsockopt. No additional code is needed besides the mapping.

@antonfirsov
Copy link
Member

antonfirsov commented Mar 12, 2024

No additional code is needed besides the mapping.

Ok, it should be good (I missed part of the overall option setting logic). We still need the option roundtrip tests though.

Expedited = 2,
TcpKeepAliveRetryCount = 16,
/// <summary>
/// This enables TCP Fast Open as defined in RFC-7413. The actual observed behavior depend on OS configuration and state of kernel TCP cookie cache.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly believe that we need to document the risks of TFO in an easily discoverable way, including the fact that it's only recommended for fully controlled datapaths. I think this summary - or if it's valid for enum members - a remarks section would be a good place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What dog you suggest? Remarks probably should be edited directly in doc repo...?
I was thinking about it more and perps we can add something like

This enables TCP Fast Open as defined in RFC-7413. The actual observed behavior depend on OS configuration and state of kernel TCP cookie cache. Enabling TFO can impact interoperability and casue connectivity issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enables TCP Fast Open as defined in RFC-7413. The actual observed behavior depend on OS configuration and state of kernel TCP cookie cache. Enabling TFO can impact interoperability and casue connectivity issues.

I would be ok with changing the summary to the text above & opening an issue against https://github.com/dotnet/dotnet-api-docs/ to track adding more detailed information to an appropriate place, SocketOptionName enum remarks maybe.

@wfurt
Copy link
Member Author

wfurt commented May 14, 2024

dotnet/dotnet-api-docs#9910 created to track adding remarks.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming the test failure will be addressed.

@@ -943,4 +1088,41 @@ public async Task SendTo_DifferentEP_Success(bool ipv4)
Assert.Equal(sendBuffer.Length, result.ReceivedBytes);
}
}

internal static class ConnectExtensions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it might be worth to move this into a separate file.

@antonfirsov
Copy link
Member

/azp run runtime-libraries-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@wfurt
Copy link
Member Author

wfurt commented May 23, 2024

all the outer loop tests failures seems unrelated.

@wfurt wfurt merged commit e377f16 into dotnet:main May 23, 2024
159 of 167 checks passed
@wfurt wfurt deleted the tfo branch May 23, 2024 14:58
@antonfirsov
Copy link
Member

@wfurt the failure I mentioned is not in outerloop. It's still there in the latest runtime run, and there were no commits addressing it: https://github.com/dotnet/runtime/pull/99490/checks?check_run_id=25312010880

@wfurt
Copy link
Member Author

wfurt commented May 24, 2024

it was clean on the last run - but I guess that was lock. it should be fixed by #102668 e.g. #102650

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants