
Change Receive methods so that failures return null packets #350

Merged
1 commit merged into dotnet:master from Wraith2:bug-packetreuse on Dec 12, 2019

Conversation

@Wraith2 (Contributor) commented Dec 12, 2019

While working on the managed networking implementation in corefx, a seemingly random timeout kept occurring and caused problems with tests. Separately, in a clean branch, I did a lot of work to add packet caching and lifetime tracking, and that work found the bug that was causing the initial timeout. The fix was not added to the original PR (which was merged) because the bug was being investigated by someone on the Microsoft SqlClient team, but I did leave a description of my findings. This PR is a follow-up to that description, fixing the bad behaviour.

When using managed networking, the real (not MARS wrapper) handles have their receive methods called with ref or out parameters. While tracing individual packet lifetimes I found that some packets were being disposed of twice, and after investigation it became clear that it was possible for the packet contents to be processed twice if an error occurred in some situations. This PR changes the implementations of Receive and ReceiveAsync so that any path other than success nulls out the ref/out packet parameter, so that the caller can no longer reach the released packet.
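For illustration, here is a minimal sketch of that failure-path convention. The names (DemoPacket, DemoHandle) are hypothetical stand-ins, not the actual SNI handle classes; the point is only the pattern described above: the out parameter is published on success and forced to null on every other path, so a released packet is unreachable from the caller.

```csharp
using System;

// Hypothetical stand-in for an SNI packet; real packets return pooled
// buffers to the pool when released.
public sealed class DemoPacket : IDisposable
{
    public bool Released { get; private set; }
    public void Dispose() => Released = true; // release underlying buffers
}

// Hypothetical stand-in for a physical (non-MARS) handle.
public sealed class DemoHandle
{
    private const uint Success = 0;
    private const uint Error = 1;

    // The pattern this PR applies: every path other than success nulls the
    // out parameter, so the caller can never observe (or double-release)
    // a packet that has already been released.
    public uint Receive(out DemoPacket packet, int timeoutInMilliseconds)
    {
        packet = null; // timeout parameter unused in this sketch
        DemoPacket candidate = new DemoPacket();
        try
        {
            // ... read from the network into candidate here ...
            packet = candidate;  // publish only once the read has succeeded
            return Success;
        }
        catch (Exception)
        {
            candidate.Dispose(); // release exactly once, on the failure path
            packet = null;       // failure: caller sees null, not a freed packet
            return Error;
        }
    }
}
```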

I can't add a test for this because it's extremely difficult to reproduce the packet double release without extensive changes to the managed implementation. Logically, a packet should not be accessed after it has been released, so I thought it safe to try this small change as an independent fix because it contributes to reliability in hard-to-reproduce situations. A caller-side sketch of the resulting contract follows below.
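Continuing the hypothetical sketch above, a caller that honours this contract simply checks for null on failure rather than risking a touch of a released packet:

```csharp
// Hypothetical caller, using the DemoHandle sketch above: on any failure
// the out parameter is guaranteed to be null, so there is no code path
// that can reach (or double-release) a freed packet.
DemoHandle handle = new DemoHandle();
if (handle.Receive(out DemoPacket packet, timeoutInMilliseconds: 15000) == 0 && packet != null)
{
    // process the packet contents exactly once
    packet.Dispose();
}
else
{
    // error path: packet is null by contract, so there is nothing to release
}
```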

@David-Engel (Contributor) left a comment

LGTM

@cheenamalhotra (Member) commented

Hi @Wraith2

Apparently one of the pipelines (CI-Ub16-SQL17L-Core30, #19346.01) failed with a timeout error when performing Bulk Copy. Is there an additional code path that needs a fix?

@cheenamalhotra (Member) commented

Also, it seems the original issue was discovered in EF Core tests. I'll try to set up a build that runs the EF Core tests in CI to capture potential problems that we could introduce accidentally. They have always been run manually until now, but it would be good to run them in CI against proposed PRs.

@Wraith2 (Contributor, Author) commented Dec 12, 2019

I'm not aware of any other changes that were required; bulk copy still uses the same Receive methods on the underlying handles. I ran the test suite with forced managed mode to ensure that everything was still working, and it was.

Any chance you could just re-run that leg in case it's a real timeout?

@cheenamalhotra (Member) commented

It was a random one, as expected, which confirms that random timeouts are still possible, yet hard to reproduce.

The change doesn't look very complex to me, and it should bring improvements, but I just wanted to note that it doesn't solve all random timeout problems; the reason for some timeout errors may be something else.

@cheenamalhotra cheenamalhotra added this to In progress in SqlClient v2.0.0 via automation Dec 12, 2019
@cheenamalhotra cheenamalhotra added this to the 2.0.0-preview1 milestone Dec 12, 2019
@cheenamalhotra cheenamalhotra moved this from In progress to Reviewer approved in SqlClient v2.0.0 Dec 12, 2019
@Wraith2 (Contributor, Author) commented Dec 12, 2019

It was a random one, as expected, which confirms that random timeouts are still possible, yet hard to reproduce.

The timeout situation that this fixes isn't currently evident (as far as I know), but that's down to timing and the delicate ordering of operations. If you start moving things around in the managed implementation, it turns up with varying frequency. I made drastic changes to the managed implementation which made it a deterministic error, and then spent a couple of weeks tracing individual packet lifetimes to isolate it. The faster you're able to run the managed implementation, the more likely it seems to hit the problem; I'd like to be able to make the managed version faster than native if possible.

We're currently in a stable position on a slightly unstable foundation and this change firms up the foundation a little. I'm certain there's still more to be done.

@cheenamalhotra cheenamalhotra merged commit c95b0b3 into dotnet:master Dec 12, 2019
SqlClient v2.0.0 automation moved this from Reviewer approved to Done Dec 12, 2019
@Wraith2 Wraith2 deleted the bug-packetreuse branch December 13, 2019 09:20