Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QUIC] Update to msquic 2 #67383

Merged
merged 7 commits into from
Apr 1, 2022
Merged

[QUIC] Update to msquic 2 #67383

merged 7 commits into from
Apr 1, 2022

Conversation

ManickaP
Copy link
Member

@ManickaP ManickaP commented Mar 31, 2022

Fixes #64554, #53090

Please review, but don't merge it yet, I need to update our testing docker images first.

@ManickaP ManickaP added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 31, 2022
@ghost ghost assigned ManickaP Mar 31, 2022
@ghost
Copy link

ghost commented Mar 31, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #64554

Please review, but don't merge it yet, I need to update our testing docker images first.

Author: ManickaP
Assignees: -
Labels:

NO-MERGE, area-System.Net.Quic

Milestone: -

@@ -694,7 +694,6 @@ internal override ValueTask ConnectAsync(CancellationToken cancellationToken = d
}
catch
{
_state.StateGCHandle.Free();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still get SHUTDOWN_COMPLETE if the connection start fails. If we free the context object here, the callback gets random memory and blows up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please leave this as a comment in the code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense now since we only free the GC handle in failed ctors and in SHUTDOWN_COMPLETE, which makes sense.
Before this change, the call to free in failed connect was rather out-of-place.

@@ -386,6 +386,7 @@ public async Task ConnectWithClientCertificate(bool sendCerttificate)
}

[Fact]
[ActiveIssue("https://github.com/dotnet/runtime/issues/67302")]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to wait for stream start callback event in all cases.

@ManickaP
Copy link
Member Author

ManickaP commented Mar 31, 2022

@wfurt looking at this, we should already be consuming latest and greatest msquic:
https://github.com/dotnet/dotnet-buildtools-prereqs-docker/blob/dcc36f08baa367d98f9c6cfcdde28fc0d5f6c6d9/src/fedora/34/helix/amd64/Dockerfile#L25-L34

How come it didn't broke the tests? Or is it because we need to update the image tag in our test pipeline and we haven't done it yet?

EDIT: I'm looking at our registry here https://mcr.microsoft.com/v2/dotnet-buildtools/prereqs/tags/list and the newest fedora-34 image (fedora-34-helix-20211214164220-4f64125) seems to contain only msquic 1.9, so I guess we need to somehow force build of this image.

EDIT2: filed an issue: dotnet/dotnet-buildtools-prereqs-docker#593

Copy link
Member

@rzikm rzikm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just few questions

@@ -176,7 +178,7 @@ internal MsQuicStream(MsQuicConnection.State connectionState, QUIC_STREAM_OPEN_F
QuicExceptionHelpers.ThrowIfFailed(status, "Failed to open stream to peer.");

Debug.Assert(!Monitor.IsEntered(_state), "!Monitor.IsEntered(_state)");
status = MsQuicApi.Api.StreamStartDelegate(_state.Handle, QUIC_STREAM_START_FLAGS.FAIL_BLOCKED);
status = MsQuicApi.Api.StreamStartDelegate(_state.Handle, QUIC_STREAM_START_FLAGS.FAIL_BLOCKED | QUIC_STREAM_START_FLAGS.SHUTDOWN_ON_FAIL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the stream start is asynchronous now, do we need to do anything to prevent stream operations before we get the StartComplete callback?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most possibly, but I believe it will be part of #67302 (I also mentioned it in additional thoughts there)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

msquic can handle stream operation before the stream is started, so we don't have to prevent anything. But tests and behavior depending on max stream limit is broken now. #67302 should be of very high priority now thanks to this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

behavior depending on max stream limit is broken now

only throwing on reaching the limit is broken, right? the wait loop implemented in H3 should be fine? @ManickaP

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my understanding.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, it might be worth having a conversation about the FAIL_BLOCKED usage here. MsQuic allows you to start a stream and send on it before starting a connection (practically a requirement for 0-RTT scenarios), and without this flag it allows you to queue safely. But with this flag, it would just fail because you are blocked on starting the connection. It might make sense to use this in your HTTP/3 scenarios, but not universally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do plan to rework stream starting in #67302. We have more issues around this than just 0-RTT 😄

@@ -40,6 +40,8 @@ private sealed class State
public MsQuicConnection.State ConnectionState = null!; // set in ctor.
public string TraceId = null!; // set in ctor.

public uint StartStatus = MsQuicStatusCodes.Success;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that it will be "success" even before the event arrives?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we check it only in shutdown complete to unblock pending operations with proper exception. There's no point in distinguishing between start not called yet and called and succeeded.
I also assume this will get reworked with #67302.

Copy link
Member

@CarnaViire CarnaViire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@ManickaP
Copy link
Member Author

ManickaP commented Apr 1, 2022

/azp list

@azure-pipelines
Copy link

@ManickaP
Copy link
Member Author

ManickaP commented Apr 1, 2022

/azp run runtime-libraries stress-http

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ManickaP
Copy link
Member Author

ManickaP commented Apr 1, 2022

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ManickaP
Copy link
Member Author

ManickaP commented Apr 1, 2022

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -69,10 +69,10 @@ internal enum QUIC_STREAM_OPEN_FLAGS : uint
internal enum QUIC_STREAM_START_FLAGS : uint
{
NONE = 0x0000,
FAIL_BLOCKED = 0x0001, // Only opens the stream if flow control allows.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should look into using MsQuic's generated interop layer instead of doing all this manually in .NET. Several of these enums are out of date, and don't include all the latest values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, at first we've planned to do both of the changes together, but later decided to split to a separate issue #67377

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

Comment on lines +162 to +163
// TODO: solve listener stopping in better way now that it receives STOP_COMPLETED event.
StopAsync().GetAwaiter().GetResult();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We added this async stop functionality at the request of .NET. 😄 We were told it would make things easier for you!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have it in my TODO list and plan to address. This is just a hack to get it working and be able to move forward, I'm aware it's ugly as heck 😃

@ManickaP ManickaP removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Apr 1, 2022
@ManickaP
Copy link
Member Author

ManickaP commented Apr 1, 2022

Outerloop failure are badssl.com is down: #63224
H/3 stress didn't run due to docker issues me and @rzikm observed locally, we should investigate more: #67442

@@ -253,17 +253,6 @@ public async Task ShutdownAsync()
// So nothing to do.
return;
}
catch (OperationCanceledException)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should bring this back? #58078 is not fixed yet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll bring it back, but interestingly I haven't seen it in any of my local runs nor pipelines...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a race that is not so easy to catch I guess 😅 but it definitely exists

Copy link
Member

@CarnaViire CarnaViire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@ManickaP
Copy link
Member Author

ManickaP commented Apr 1, 2022

Failing tests seem like fallout from #67037, I'll revert the change around EnableReceive and we'll see if it helps.
My theory is that at that time of #67037 we already had updated msquic in the transport package for Windows so the CI didn't run the tests there (IsSupported would fail due to version incompatibility) and this is actually the first time they are run.

If my theory proves true, @rzikm you might want to revisit this.

BTW, it seems to work on Linux, in CI and locally.

Theory was wrong, change reverted in #67481, ReceiveComplete became void. For some reason, the last runtime pipeline run on this PR didn't fail for me.

@ManickaP ManickaP merged commit d5b1a7f into dotnet:main Apr 1, 2022
@ManickaP ManickaP deleted the mapichov/msquic_2 branch April 1, 2022 21:26
@karelz karelz added this to the 7.0.0 milestone Apr 8, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[QUIC] Update to msquic 2.0
5 participants