Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add XUnit infrastructure to handle unsupported statements and test runner crashes #152

Merged
merged 2 commits into from
Oct 15, 2023

Conversation

lauxjpn
Copy link
Member

@lauxjpn lauxjpn commented Oct 13, 2023

The PR adds additional XUnit infrastructure.


If specific exceptions are thrown that we explicitly throw if Jet is unable to support a feature, then the test runner will skip the test after it has already failed.
This frees us from explicitly annotating tests with Skip = properties if those tests will never succeed with Jet.

We currently do this now for translation errors due to the following unsupported expressions:

  • Cross Apply
  • Outer Apply
  • Row Number

If other expressions are encountered, they are automatically added to the ./test/EFCore.Jet.FunctionalTests/bin/Debug/net8.0-windows7.0/UnsupportedTranslations.txt file. No further processing is done with this file, but it can be used by someone to decide, whether additional expressions should be added to the aforementioned list of unsupported expressions.


The test framework now also contains logic to automatically handle crashes of the test runner.

Unfortunately, it is not uncommon for Jet or the underlying providers that we use (ODBC and OLE DB) to crash in specific scenarios (e.g. when using certain UNION queries, see #43).

When running a test, we now output a temporary file for that test case in ./test/EFCore.Jet.FunctionalTests/bin/Debug/net8.0-windows7.0/TestRunnerCrashCache.
If the test executes without crashing the test runner, that temporary file is deleted again.
If the test executes and crashes the test runner, that temporary file remains.

On the next test run, if the environment variable EFCoreJet_DetectCrashesOfPreviousRuns is explicitly set to true, all remaining temporary files will be merged into ./test/EFCore.Jet.FunctionalTests/TestsKnownToCrashTestRunner.txt. (So by default, crashed tests will not be merged. This is because manually aborting the test runner on the command line with Ctrl+C would leave the temporary file as well and could lead to a false positive this way).

The test runner will automatically skip all tests it finds in ./test/EFCore.Jet.FunctionalTests/TestsKnownToCrashTestRunner.txt, if the environment variable EFCoreJet_AutoSkipTestRunnerCrashingTests is not explicitly set to false. (So by default, previously crashed tests will automatically be skipped.)

To prime the TestsKnownToCrashTestRunner.txt file, something similar to the following PowerShell script can be used:

Get-ChildItem '.\test\EFCore.Jet.FunctionalTests\bin\Debug\net8.0-windows7.0\TestRunnerCrashCache' | Remove-Item -Verbose
$env:EFCoreJet_DetectCrashesOfPreviousRuns = 'true'
try {
    do {
        .\.dotnet_x86\dotnet.exe test -v:n -p:ParallelizeTestCollections=false -p:FixedTestOrder=true .\test\EFCore.Jet.FunctionalTests\EFCore.Jet.FunctionalTests.csproj --logger "trx;LogFileName=TestResult.xml" --results-directory .\TestResults --logger "console;verbosity=minimal" --blame-hang-timeout 3m
        Start-Sleep -Seconds 10
    } while ($LASTEXITCODE -ne 0)
} finally {
    $env:EFCoreJet_DetectCrashesOfPreviousRuns = ''
}

Once the file has been primed, it can be committed so the CI and others can profit from it as well.

Future improvements could limit skipping of test runner crashing tests to certain configurations (e.g x64 and OLDE DB).

There is now also an attribute TestRunnerCrashAttribute, that can be used to manually decorate a test method or class with it and thereby skip this test method or class.


/cc @ChrisJollyAU

@lauxjpn lauxjpn added this to the 8.0.0-alpha.2 milestone Oct 13, 2023
@lauxjpn lauxjpn self-assigned this Oct 13, 2023
@ChrisJollyAU
Copy link
Member

Do we actually still get test runner crashes? I know I haven'tseen any when running in VS although I haven't looked closely at the x64 test results in AZDO where it might be seen. If we aren't we probably don't need to bother about it

Also UNION queries have had a couple of fixes. Queries like the following (from #43 (comment) )

SELECT null FROM mytable
UNION
SELECT mycolumn FROM mytable

have a full fix and work 100% now

The other translation error we would consider is Skip. While I have managed to get Skip...Take to work (but only if it is ordered), Skip on its own isn't working. I do have an idea for that though

@lauxjpn
Copy link
Member Author

lauxjpn commented Oct 13, 2023

Also UNION queries have had a couple of fixes. Queries like the following (from #43 (comment) )

SELECT null FROM mytable
UNION
SELECT mycolumn FROM mytable

have a full fix and work 100% now

Sounds interesting. Can you point me to the fix?


Do we actually still get test runner crashes? I know I haven'tseen any when running in VS although I haven't looked closely at the x64 test results in AZDO where it might be seen. If we aren't we probably don't need to bother about it

Unfortunately, the test runner crashes or hangs all the time using ACE 16 (any architecture, any underlying provider).

Take a look at any of the 4 logs of the ...2016... jobs of the GitHub Action run on my fork. (Just search for crashed in the raw logs).

Feel free to locally pull my lauxjpn:ci/github_actions feature branch (it is the one from #148) and then push it onto your own repo fork on GitHub to play around with it. (Or just use docker to run the tests on different environment variations.)

BTW, the 2010, x64 runs crashed as well. Only the 2010, x86 runs completed without crash.

@ChrisJollyAU
Copy link
Member

Also UNION queries have had a couple of fixes. Queries like the following (from #43 (comment) )

SELECT null FROM mytable
UNION
SELECT mycolumn FROM mytable

have a full fix and work 100% now

Sounds interesting. Can you point me to the fix?

First fix was 5b591b7 . This fixed the data type mismatch when trying to use it in a JOIN expression in the ON clause. We detect this and add a convert in to its correct type.
2nd fix: 08718b8 .Convert NULL to a variant in the projections. This works better when UNIONing a NULL column against another table with a numeric type with the same name

BTW, the 2010, x64 runs crashed as well. Only the 2010, x86 runs completed without crash.

My VS is only running against the 2010,x86 engine. I have the click-to-run office x64 installed so the 2016 installer doesn't work

@lauxjpn
Copy link
Member Author

lauxjpn commented Oct 13, 2023

My VS is only running against the 2010,x86 engine. I have the click-to-run office x64 installed so the 2016 installer doesn't work

In that case:

Feel free to locally pull my lauxjpn:ci/github_actions feature branch (it is the one from #148) and then push it onto your own repo fork on GitHub to play around with it. (Or just use docker to run the tests on different environment variations.)


Convert NULL to a variant in the projections. This works better when UNIONing a NULL column against another table with a numeric type with the same name

Hmm, then this does not fully cover the issue. It is not about NULL in particular but about any scalar value in general. For the exact conditions, see #43 (comment), where I outlined them in detail back then.


Anyway, I will tackle the crashes.

@ChrisJollyAU
Copy link
Member

Hmm, then this does not fully cover the issue. It is not about NULL in particular but about any scalar value in general. For the exact conditions, see #43 (comment), where I outlined them in detail back then.

Will take a closer look. It's definitely progress and did fix all the cases I had in the test suite

BTW just double check. It looks like we actually get OdbcException in the logs WHERE we are meant to be on OleDb

@lauxjpn
Copy link
Member Author

lauxjpn commented Oct 13, 2023

BTW just double check. It looks like we actually get OdbcException in the logs WHERE we are meant to be on OleDb

Yeah, it is entirely possible that the CI tests run in a configuration that they shouldn't. I'll look at it later.

@ChrisJollyAU
Copy link
Member

Hmm, then this does not fully cover the issue. It is not about NULL in particular but about any scalar value in general. For the exact conditions, see #43 (comment), where I outlined them in detail back then.

That's a different problem then. The comment I linked to the problem was more with the wrong data type being determined with UNION queries

For the Access Violations, have you tried against the click-to-run install version in x64. Managed to get it going on my pc (realised I had the full Access x64 installed so automatically had OleDb Jet x64). Ran the JetProviderExceptionTests and configured it to run the NorthwindTestJetCommand, NorthwindTestOleDbCommand and IceCreamTest and no crashes. JetCommand and OleDbCommand completed all the way through and I cancelled IceCreamTest because I wanted to move on

@lauxjpn
Copy link
Member Author

lauxjpn commented Oct 13, 2023

That's a different problem then. The comment I linked to the problem was more with the wrong data type being determined with UNION queries

I see, it is independent of the Access Violation issues then.


For the Access Violations, have you tried against the click-to-run install version in x64. Managed to get it going on my pc (realised I had the full Access x64 installed so automatically had OleDb Jet x64). Ran the JetProviderExceptionTests and configured it to run the NorthwindTestJetCommand, NorthwindTestOleDbCommand and IceCreamTest and no crashes. JetCommand and OleDbCommand completed all the way through and I cancelled IceCreamTest because I wanted to move on

You can take a look at the logs from the run I referenced earlier (or run it yourself). The Access Violation issue is generally non-deterministic and will appear later in the run (not necessarily directly after the inciting test). On local runs using x86/OLE DB using the following command, it appeared especially after some end-to-end tests (e.g. EntityFrameworkCore.Jet.FunctionalTests.CompositeKeyEndToEndJetTest.Can_use_two_non_generated_integers_as_composite_key_end_to_end or EntityFrameworkCore.Jet.FunctionalTests.JetEndToEndTest.Can_round_trip_changes_with_changed_only_notification_entities to name just two):

.\.dotnet_x86\dotnet.exe test -v:n -p:ParallelizeTestCollections=false -p:FixedTestOrder=true .\test\EFCore.Jet.FunctionalTests\EFCore.Jet.FunctionalTests.csproj --logger "trx;LogFileName=TestResult.xml" --results-directory .\TestResults --logger "console;verbosity=minimal" --blame-hang-timeout 3m

This runs the whole EFCore.Jet.FunctionalTests test suite in a deterministic order, one test after another.

You should get similar results for x64 runs.
As said earlier, you can just run the GitHub Action on your own fork or create a couple of Windows based docker files so you can run all the different variations locally on your machine (similar to what the CI scripts are doing).


Non-deterministic crashes due to memory violations of a piece of software from a third party are tricky. As I said above, I will tackle this. So feel free to continue with the work you were doing before I brought up this issue, if you want to. 👍

@lauxjpn
Copy link
Member Author

lauxjpn commented Oct 15, 2023

[...] Or just use docker to run the tests on different environment variations. [...]

We just added docker files in #153.

@lauxjpn
Copy link
Member Author

lauxjpn commented Oct 15, 2023

BTW just double check. It looks like we actually get OdbcException in the logs WHERE we are meant to be on OleDb

Yeah, it is entirely possible that the CI tests run in a configuration that they shouldn't. I'll look at it later.

There were a couple of edge case tests that used hard coded exceptions or provider libraries. Should now be fixed by 73b5c06.

@lauxjpn
Copy link
Member Author

lauxjpn commented Oct 15, 2023

I opened separate issues for discussed but not yet fixed/implemented points.

I'll go on and merge this PR then. Once the test suite is stable, we can decide to either remove the crash support code again or to keep it (it shouldn't do any harm).

@lauxjpn lauxjpn merged commit 73f91b2 into CirrusRedOrg:master Oct 15, 2023
5 checks passed
@lauxjpn lauxjpn deleted the test/xunit_extensions branch October 15, 2023 21:30
@ChrisJollyAU
Copy link
Member

This is still skipping tests it shouldnt

Trying to run Where_simple_closure_via_query_cache_nullable_type in x64 mode comes up with this error

Command parameter[0] '' data value could not be converted for reasons other than sign mismatch or data overflow.

As the default in SkipFailedTet is true, anything not handled by your if statements is automatically skipped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants