Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP/3 & QUIC: fix abort read on dispose and cancellation #55724

Merged
merged 12 commits into from
Jul 20, 2021

Conversation

CarnaViire
Copy link
Member

@CarnaViire CarnaViire commented Jul 15, 2021

This is a new take on #48624 instead of my previous PR #54334 because further investigation shown this has opened a can of worms.

Fixed behaviors:

HTTP/3:

  1. Dispose on Http3RequestStream doesn't clear the link to QuicStream anymore, so no NRE anymore; a race between Dispose and CancellationToken may still end up in ObjectDisposedException -- I hope that is expected? cc @JamesNK

  2. Don't call Abort on QuicOperationAbortedException - it means stream was already aborted by user

  3. HandleReadResponseContentException should abort Read side of the stream, not Write (was possibly a copy-paste bug)

QUIC:

  1. Dispose should abort read on the wire not only in non-final states, but also in Aborted state that was entered in CancellationToken callback (because we only change state there). There is an issue Cancellation of QuicStream.ReadAsync/WriteAsync should either abort the stream or allow subsequent reads/writes #55485 about aborting on the wire right away in CT callback, but I decided not to do it that way yet, because there is no clear way how to pass correct HTTP/3 error code there.

  2. Dispose should Complete an outstanding Read operation with an appropriate exception

  3. Also a copy-paste bug (or naming confusion) for extracting error code from event HandleEventPeerRecvAborted

Other changes:

  1. Added tests on aborting response content stream by CancellationToken, Dispose, and CancellationToken&Dispose at the same time

  2. GetAsync_LargeHeader_Success was failing quite a few times while I was investigating, so I've put it under respective ActiveIssue [HTTP/3] Test bug (flaky) - premature connection close - need some synchro on app level #55508

NOTES:

I've spent a lot of time digging why server didn't get StreamAbortedException on Write even after client clearly called AbortRead and it was synchronized by semaphores. I found out that it is possible that PEER_RECEIVE_ABORTED event will arrive with a significant delay after peer calls AbortRead. In that case even with synchronization via semaphores, first writes after peer aborting may "succeed" (get SEND_COMPLETE event). I ended up asserting that PEER_RECEIVE_ABORTED would arrive eventually.


Fixes #48624

@ghost
Copy link

ghost commented Jul 15, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

This is a new take on #48624 instead of my previous PR #54334 because further investigation shown this has opened a can of worms.

Fixed behaviors:

HTTP/3:

  1. Dispose on Http3RequestStream doesn't clear the link to QuicStream anymore, so no NRE anymore; a race between Dispose and CancellationToken may still end up in ObjectDisposedException -- I hope that is expected? cc @JamesNK

  2. Don't call Abort on QuicOperationAbortedException - it means stream was already aborted by user

  3. HandleReadResponseContentException should abort Read side of the stream, not Write (was possibly a copy-paste bug)

QUIC:

  1. Dispose should abort read on the wire not only in non-final states, but also in Aborted state that was entered in CancellationToken callback (because we only change state there). There is an issue Cancellation of QuicStream.ReadAsync/WriteAsync should either abort the stream or allow subsequent reads/writes #55485 about aborting on the wire right away in CT callback, but I decided not to do it that way yet, because there is no clear way how to pass correct HTTP/3 error code there.

  2. Dispose should Complete an outstanding Read operation with an appropriate exception

  3. Also a copy-paste bug (or naming confusion) for extracting error code from event HandleEventPeerRecvAborted

Other changes:

  1. Added tests on aborting response content stream by CancellationToken, Dispose, and CancellationToken&Dispose at the same time

  2. GetAsync_LargeHeader_Success was failing quite a few times while I was investigating, so I've put it under respective ActiveIssue [HTTP/3] Test bug (flaky) - premature connection close - need some synchro on app level #55508

NOTES:

I've spent a lot of time digging why server didn't get StreamAbortedException on Write even after client clearly called AbortRead and it was synchronized by semaphores. I found out that it is possible that PEER_RECEIVE_ABORTED event will arrive with a significant delay after peer calls AbortRead. In that case even with synchronization via semaphores, first writes after peer aborting may "succeed" (get SEND_COMPLETE event). I ended up asserting that PEER_RECEIVE_ABORTED would arrive eventually.

Author: CarnaViire
Assignees: -
Labels:

area-System.Net.Http

Milestone: -

@ghost
Copy link

ghost commented Jul 15, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

This is a new take on #48624 instead of my previous PR #54334 because further investigation shown this has opened a can of worms.

Fixed behaviors:

HTTP/3:

  1. Dispose on Http3RequestStream doesn't clear the link to QuicStream anymore, so no NRE anymore; a race between Dispose and CancellationToken may still end up in ObjectDisposedException -- I hope that is expected? cc @JamesNK

  2. Don't call Abort on QuicOperationAbortedException - it means stream was already aborted by user

  3. HandleReadResponseContentException should abort Read side of the stream, not Write (was possibly a copy-paste bug)

QUIC:

  1. Dispose should abort read on the wire not only in non-final states, but also in Aborted state that was entered in CancellationToken callback (because we only change state there). There is an issue Cancellation of QuicStream.ReadAsync/WriteAsync should either abort the stream or allow subsequent reads/writes #55485 about aborting on the wire right away in CT callback, but I decided not to do it that way yet, because there is no clear way how to pass correct HTTP/3 error code there.

  2. Dispose should Complete an outstanding Read operation with an appropriate exception

  3. Also a copy-paste bug (or naming confusion) for extracting error code from event HandleEventPeerRecvAborted

Other changes:

  1. Added tests on aborting response content stream by CancellationToken, Dispose, and CancellationToken&Dispose at the same time

  2. GetAsync_LargeHeader_Success was failing quite a few times while I was investigating, so I've put it under respective ActiveIssue [HTTP/3] Test bug (flaky) - premature connection close - need some synchro on app level #55508

NOTES:

I've spent a lot of time digging why server didn't get StreamAbortedException on Write even after client clearly called AbortRead and it was synchronized by semaphores. I found out that it is possible that PEER_RECEIVE_ABORTED event will arrive with a significant delay after peer calls AbortRead. In that case even with synchronization via semaphores, first writes after peer aborting may "succeed" (get SEND_COMPLETE event). I ended up asserting that PEER_RECEIVE_ABORTED would arrive eventually.

Author: CarnaViire
Assignees: -
Labels:

area-System.Net.Http, area-System.Net.Quic

Milestone: -

@CarnaViire
Copy link
Member Author

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nits, looks good, thanks.
BTW, should this close #48624 and should we close #54334 or is there anything else to salvage?

@CarnaViire
Copy link
Member Author

CarnaViire commented Jul 15, 2021

I observed a failure in ReadAbortedWithoutReading_WriteThrows which looks like this:

Assert.Equal() Failure
Expected: 1234
Actual:   4294967295

The test called AbortRead(1234) and then the stream got disposed, and that triggered AbortRead(0xffffffff) (0xffffffff=4294967295) again per my change.
It seems like sometimes msquic does not treat a second call to AbortRead as a no-op and effectively overwrites the error code. Is that expected? @nibanks

Meanwhile, I guess I will have to add either a new state or a new flag to distinguish cancellation and user abort 🙁

@nibanks
Copy link

nibanks commented Jul 15, 2021

Please open an issue on MsQuic and we'll fix.

@CarnaViire
Copy link
Member Author

@nibanks thanks!
@wfurt FYI - if you were going to grab msquic updates to create new package, could you please also wait for microsoft/msquic#1822 so we won't need to do this twice?

@CarnaViire
Copy link
Member Author

Issue for updating msquic package: #55746

@CarnaViire
Copy link
Member Author

I see a timeout in an added test in CI.

System.Net.Http.Functional.Tests.SocketsHttpHandlerTest_Http3_MsQuic.ResponseCancellation_BothCancellationTokenAndDispose_Success [FAIL]
   System.TimeoutException : The operation has timed out.
   Stack Trace:
     /_/src/libraries/Common/tests/System/Threading/Tasks/TaskTimeoutExtensions.cs(49,0): at System.Threading.Tasks.TaskTimeoutExtensions.WhenAllOrAnyFailed(Task[] tasks, Int32 millisecondsTimeout)
     /_/src/libraries/System.Net.Http/tests/FunctionalTests/HttpClientHandlerTest.Http3.cs(613,0): at System.Net.Http.Functional.Tests.HttpClientHandlerTest_Http3.ResponseCancellation_BothCancellationTokenAndDispose_Success()
     --- End of stack trace from previous location ---

I will be looking into that.

@CarnaViire CarnaViire marked this pull request as draft July 15, 2021 18:24
@CarnaViire
Copy link
Member Author

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@CarnaViire
Copy link
Member Author

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@CarnaViire
Copy link
Member Author

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@CarnaViire CarnaViire marked this pull request as ready for review July 20, 2021 21:01
@CarnaViire
Copy link
Member Author

After increasing the timeouts in ResponseCancellation_BothCancellationTokenAndDispose_Success, I haven't seen it timeouting anymore, so I guess initial timeouts were just too small. If it will happen again, I'll create a separate issue.

As after the approval there were no changes except for NITs and aforementioned timeouts, and CI is green - merging.

@CarnaViire CarnaViire merged commit 978b0db into dotnet:main Jul 20, 2021
@CarnaViire CarnaViire deleted the quic-fix-abort-on-cancellation branch July 20, 2021 21:09
@karelz karelz added this to the 6.0.0 milestone Jul 22, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 21, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[HTTP/3] NullReferenceException on cancellation
4 participants