Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Fix SqlClient diagnostic test failures #16349

Merged
merged 3 commits into from
Feb 22, 2017

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented Feb 21, 2017

Two issues:

  • Tests from the DiagnosticTests class might run concurrently with tests from other test classes. This could cause interference between the tests.
  • The CollectStatisticsDiagnosticsAsync method was void returning, which meant the test harness would not wait for its completion before processing other tests. This could both cause interference between the tests (including other diagnostics tests) and cause unhandled exceptions to tear down the test process.

Fixes:

  • Use RemoteInvoke to launch each diagnostic test in its own process so that it doesn't experience or cause interference with other tests.
  • Ensure that all operations are appropriately waited for.

Fixes https://github.com/dotnet/corefx/issues/15383
cc: @saurabh500, @danmosemsft

(I suggest reviewing while ignoring whitespace: https://github.com/dotnet/corefx/pull/16349/files?w=1)

@saurabh500
Copy link
Contributor

Is there a way to create a Single Server for this whole test instead of spinning multiple servers?
There were multiple server being created before this change as well, as the constructor was being called before each test. Looking for ideas about how the number of TCP ports being used in a test can be minimized

@stephentoub
Copy link
Member Author

Is there a way to create a Single Server for this whole test instead of spinning multiple servers?

It could be done by implementing an xunit class fixture. xunit can then create an instance of that and reuse it across all test classes it instantiates.

You could certainly do that (we have a couple of cases of this in the repo I believe), but I'll leave it to you to do subsequently. My goal right now is unblocking CI legs, which are failing due to this Guid check very frequently.

@saurabh500
Copy link
Contributor

Also before you pushed in the 2nd commit adding the check for conditional connection string, the failure at https://ci.dot.net/job/dotnet_corefx/job/master/job/ubuntu14.04_debug_prtest/3668/testReport/junit/System.Data.SqlClient.Tests/DiagnosticTest/ExecuteXmlReaderAsyncErrorTest/ didn't show the test assertion, but it showed an assertion for the success code of the remote process being a mismatch with what came out of the test. Is there a way to surface the test assertion when a failure happens in the process?

@saurabh500
Copy link
Contributor

You could certainly do that (we have a couple of cases of this in the repo I believe), but I'll leave it to you to do subsequently. My goal right now is unblocking CI legs, which are failing due to this Guid check very frequently.

Definitely, that's not a blocker for this PR.

@stephentoub
Copy link
Member Author

stephentoub commented Feb 21, 2017

didn't show the test assertion, but it showed an assertion for the success code of the remote process being a mismatch with what came out of the test

It does have it, it just wasn't correlated in the log. If you look just above it in the log here:
https://ci.dot.net/job/dotnet_corefx/job/master/job/ubuntu14.04_debug_prtest/3668/consoleText
you'll see:

Exception from RemoteExecutorConsoleApp(System.Data.SqlClient.Tests, Version=4.1.2.0, Culture=neutral, PublicKeyToken=9d77cc7ad39b68eb, System.Data.SqlClient.Tests.DiagnosticTest+<>c, <ExecuteXmlReaderAsyncErrorTest>b__19_0):
  Assembly: System.Data.SqlClient.Tests, Version=4.1.2.0, Culture=neutral, PublicKeyToken=9d77cc7ad39b68eb
  Type: System.Data.SqlClient.Tests.DiagnosticTest+<>c
  Method: Int32 <ExecuteXmlReaderAsyncErrorTest>b__19_0(System.String)
  Exception: System.Reflection.TargetParameterCountException: Parameter count mismatch.
     at System.Reflection.RuntimeMethodInfo.InvokeArgumentsCheck(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
     at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
     at RemoteExecutorConsoleApp.Program.Main(String[] args) in /mnt/j/workspace/dotnet_corefx/master/ubuntu14.04_debug_prtest/src/Common/tests/System/Diagnostics/RemoteExecutorConsoleApp/RemoteExecutorConsoleApp.cs:line 48
  
  Unhandled Exception: System.Reflection.TargetParameterCountException: Parameter count mismatch.
     at RemoteExecutorConsoleApp.Program.Main(String[] args) in /mnt/j/workspace/dotnet_corefx/master/ubuntu14.04_debug_prtest/src/Common/tests/System/Diagnostics/RemoteExecutorConsoleApp/RemoteExecutorConsoleApp.cs:line 60
     System.IO.Tests.FileStream_ctor_str_fm_fa_fs_buffer_fo.ValidFileOptions [SKIP]
        Condition(s) not met: \"IsNotWindowsNanoServer\"
     System.IO.Tests.FileStream_ctor_str_fm_fa_fs_buffer_fo.ValidFileOptions_Encrypted [SKIP]
        Condition(s) not met: \"IsNotWindowsNanoServer\"
     System.Data.SqlClient.Tests.DiagnosticTest.ExecuteXmlReaderAsyncErrorTest [FAIL]
        Assert.Equal() Failure
        Expected: 42
        Actual:   134
        Stack Trace:
           /mnt/j/workspace/dotnet_corefx/master/ubuntu14.04_debug_prtest/src/Common/tests/System/Diagnostics/RemoteExecutorTestBase.cs(177,0): at System.Diagnostics.RemoteExecutorTestBase.RemoteInvokeHandle.Dispose()
           /mnt/j/workspace/dotnet_corefx/master/ubuntu14.04_debug_prtest/src/System.Data.SqlClient/tests/FunctionalTests/DiagnosticTest.cs(389,0): at System.Data.SqlClient.Tests.DiagnosticTest.ExecuteXmlReaderAsyncErrorTest()

This happens because right now the only way the RemoteInvoke mechanism has to provide results to the caller is via the exit code, so the remote process app prints out details of the failure to stderr and then returns an error code that's not the "known success" value of 42. In this case the remote app crashed with an unhandled exception (hence the return error code of 134, which means SIGABRT, which happens when there's an unhandled exception), and you can see the details of the unhandled exception printed out. All of this is in the log, but Jenkins doesn't know that this text correlates with that specific test failure, so it doesn't show up in that summary page... which is why I never look at that test summary page and instead always just look at the log :) (cc: @danmosemsft) We could fix this with some effort, by using a pipe or reading from stderr or some such thing in the RemoteInvoke call... we've just never taken the time to do so.

@saurabh500
Copy link
Contributor

Perfect. Thanks for the explanation @stephentoub

I am still wondering why my PR with Remote Executor was leading to hangs in the tests. I see that you have initialized the server inside the Remote Executor code. Probably that's the difference. Though I still don't know why that's an issue because the server and client would communicate over TCP. That would be an action item for me to debug.

Thanks for taking care of this.

@stephentoub
Copy link
Member Author

Though I still don't know why that's an issue because the server and client would communicate over TCP.

My guess: the server gets torn down when the test exits, and many of the tests weren't blocking waiting for the async void functions to complete, so the SqlConnections were trying to connect to a server that was very likely already or in the process of getting torn down, and they ended up timing out trying to connect.

Two issues:
- Tests from the DiagnosticTests class might run concurrently with tests from other test classes.  This could cause interference between the tests.
- The CollectStatisticsDiagnosticsAsync method was void returning, which meant the test harness would not wait for its completion before processing other tests.  This could both cause interference between the tests (including other diagnostics tests) and cause unhandled exceptions to tear down the test process.

Fixes:
- Use RemoteInvoke to launch each diagnostic test in its own process so that it doesn't experience or cause interference with other tests.
- Ensure that all operations are appropriately waited for.
Increase fail-wait timeout in case a machine is bogged down, and add a better assert message
@stephentoub
Copy link
Member Author

@dotnet-bot test this please (all legs passed, but running them again before merging)

@stephentoub stephentoub merged commit bb6482c into dotnet:master Feb 22, 2017
@stephentoub stephentoub deleted the diagnostictests branch February 22, 2017 12:36
@karelz karelz modified the milestone: 2.0.0 Feb 22, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Fix SqlClient diagnostic test failures

Commit migrated from dotnet/corefx@bb6482c
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants