Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SqlClient ManualTest MARSSyncTimeoutTest fails in managed mode #108

Closed
Wraith2 opened this issue Dec 8, 2018 · 10 comments · Fixed by dotnet/corefx#37068, dotnet/corefx#38266 or #249
Closed
Assignees

Comments

@Wraith2
Copy link
Contributor

Wraith2 commented Dec 8, 2018

When using the managed sni implementation in SqlClient the MARS related test MARSSyncTimeoutTest fails with an assertion in debug mode. This may not cause a problem in release builds because there are sufficient checks in the method body to prevent errors occurring or bubbling up to the user. I suspect that the assertion is incorrect but it needs scrutiny from someone more familiar with the library.

To reproduce on windows. Checkout master, change UseManagedSNI to true. Build and run manual tests, or run the specific named test If you know the commandline incantation required.

  Assertion Failed
  AsyncResult null on callback

     at System.Data.SqlClient.TdsParserStateObject.ReadAsyncCallback[T](IntPtr key, T packet, UInt32 error) in E:\Programming\csharp7\corefx\src\System.Data.SqlClient\src\System\Data\SqlClient\TdsParserStateObject.cs:line 2736
     at System.Data.SqlClient.TdsParserStateObject.ReadAsyncCallback[T](T packet, UInt32 error) in E:\Programming\csharp7\corefx\src\System.Data.SqlClient\src\System\Data\SqlClient\TdsParserStateObject.cs:line 2688
     at System.Data.SqlClient.SNI.SNIMarsHandle.HandleReceiveError(SNIPacket packet) in E:\Programming\csharp7\corefx\src\System.Data.SqlClient\src\System\Data\SqlClient\SNI\SNIMarsHandle.cs:line 321
     at System.Data.SqlClient.SNI.SNIMarsConnection.HandleReceiveError(SNIPacket packet) in E:\Programming\csharp7\corefx\src\System.Data.SqlClient\src\System\Data\SqlClient\SNI\SNIMarsConnection.cs:line 136
     at System.Data.SqlClient.SNI.SNIMarsConnection.HandleReceiveComplete(SNIPacket packet, UInt32 sniErrorCode) in E:\Programming\csharp7\corefx\src\System.Data.SqlClient\src\System\Data\SqlClient\SNI\SNIMarsConnection.cs:line 165
     at System.Data.SqlClient.SNI.SNIPacket.<>c__DisplayClass32_0.<ReadFromStreamAsync>b__0(Task`1 t) in E:\Programming\csharp7\corefx\src\System.Data.SqlClient\src\System\Data\SqlClient\SNI\SNIPacket.cs:line 276
     at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state) in E:\A\_work\104\s\src\System.Private.CoreLib\shared\System\Threading\ExecutionContext.cs:line 321
     at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread) in E:\A\_work\104\s\src\System.Private.CoreLib\src\System\Threading\Tasks\Task.cs:line 2448
     at System.Threading.ThreadPoolWorkQueue.Dispatch() in E:\A\_work\104\s\src\System.Private.CoreLib\src\System\Threading\ThreadPool.cs:line 621
    Finished:    System.Data.SqlClient.ManualTesting.Tests

It is happens when the socket for the connection is closed by the timeout in the test, in this case there is no data array in the SNIPacket and that means that the assertion on managed CheckPacket returns false indicating the packet is invalid by asking the packet if it's invalid, which checks the data. The native CheckPacket doesn't check the contents of the packet, only that it is a pointer to a valid packet structure which is a slightly different check to the managed version, the managed version requires that the packet be non-null and that it contain a valid array which is the cause of the problem.

/cc @afsanehr @saurabh500 @keeratsingh

@Wraith2
Copy link
Contributor Author

Wraith2 commented Dec 9, 2018

To be clear, I'm happy to fix this and run it through the required testing then PR if we can agree what the fix should be.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Dec 10, 2018

This looked simple but quickly went down a rabbit hole filled with angry badgers. If the assert is removed the call to DecrementPendingCallbacks will assert because the callback count reaches -2, it's allowed to go to -1 but no further. Changing that assertion to allow any negative number causes a later assert in ProcessSNIError because the state object SNIContext is undefined.
The worrying part is that all of these are asserts. In a release build it's all just going to keep working and I can't see quite what effect that might have. If the pending callback count goes negative the object gchandle should be disposed I think in which case it's safe, otherwise I've no idea.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Dec 12, 2018

It looks like any timeout in managed mode will fail at this assert. It isn't limited to mars. Once the timeout occurs the socket closing causes the stack trace seen above which causes the failure mode. I think this needs to be handled before it bubbles up to the HandleReceiveError layer because at that point the error code is lost and it becomes just an empty packet, the possible association with the closing of the socket is lost.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Dec 25, 2018

The manual tests affected by this are:

ConnectionPoolTest.ConnectionPool_NonMars
ConnectionPoolTest.ConnectionPool_Mars
DataStreamTest.RunAllTestsForSingleServer_TCP.TimeoutDuringReadAsyncWithClosedReaderTest
DataStreamTest.RunAllTestsForSingleServer_NP.TimeoutDuringReadAsyncWithClosedReaderTest
SqlCredentialTest.CreateSqlConnectionWithCredential
SqlCredentialTest.SqlConnectionChangePasswordPlaintext
SqlCredentialTest.SqlConnectionChangePasswordSecureString
PoolBlockPeriodTest.TestAzureBlockingPeriod
PoolBlockPeriodTest.TestNonAzureBlockingPeriod
PoolBlockPeriodTest.TestSetPolicyWithVariations
MARSTest.MARSAsyncTimeoutTest
MARSTest.MARSSyncTimeoutTest

@AfsanehR-zz
Copy link

@Wraith2 we are not sure what the potential fix for this issue is right now. If you have time to investigate, please feel free to do so and let us know what any potential options are, otherwise we will come back to this issue in the future.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Feb 27, 2019

I think the best thing might be to remove the check which is asserting. If you run the tests in release mode everything behaves as expected and the logic for the assertion is of dubious quality in my opinion,
I can see why it was added when the state object was split into managed and native but I can't see any purpose for the more stringent behaviour in the managed case. My only reservation is that I might be seeing something which has a historic purpose that I can't see.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Mar 22, 2019

I've been looking at this again and I've changed my thinking.
The problem is essentially that in the case of a connection closing unexpectedly the code attempts to process the empty packet and hits an assertion that the packet is not empty. I was wrong to think that removing that assertion would be the best idea, it's telling us there's a real problem.

I think the problem is that in the SNIMarsHandle when it receives an error it attempts to process the packet indiscriminately by passing it to the callback. In the case where there is no packet contents this is the wrong thing to do.

I currently think that the right thing to do is to add in a uint error parameter to SNIMarsConnection.HandleReceiveError and SNIMarsConnection.HandleReceiveError and pass the error code through from the SNIPacket.ReceiveFromStreamAsync function. This function can then be augmented to return SNI_WSAECONNRESET (which is defined but not used) instead of always simply using SNI_ERROR. In SniMarsConnection.HandleRecieveError we can then check for that specific error case and use the context object to call BreakConnection() and at that point the connection will be doomed and everything should unravel safely.

The code in this are is complex and filled with async. While I can try this and may not see any problems I'd like some feedback on the logic and feeling of it from people with experience in the networking side before I try.

/cc tagging new people @tarikulsabbir @Gary-Zh

@divega
Copy link

divega commented May 16, 2019

Reopening as @Wraith2 indicated that the issue has not been solved by the PRs.

@divega divega reopened this May 16, 2019
@divega divega transferred this issue from dotnet/corefx May 16, 2019
@David-Engel David-Engel added this to the 1.0.0 milestone May 17, 2019
@David-Engel
Copy link
Contributor

@Wraith2 If you have time to look at this, great. Otherwise we will work on it when we have bandwidth. Thanks!

@Wraith2
Copy link
Contributor Author

Wraith2 commented May 17, 2019

I posted a couple of solutions and discussed it at some length with @saurabh500 in their discussions. This is something we've discussed offline and it's waiting for him to get chance to investigate a fix which fits better with the architecture.

dotnet/corefx#37068 (comment)

Let me do a bit more digging around this area to provide more pointers and have a concrete explanation. Don't burn your cycles on this PR.

This is the main blocker on continued improvements to the managed implementation because it prevents many important tests from running.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment