[mono?] Cannot run host tests due to xunit version mismatch #60550
Comments
/cc @akoeplinger
Can you try if you see the issue on main too or just release/6.0?
On main, I don't get that far: here the problem is that it tries to build the tests as net7.0 code, but using the host SDK, which doesn't even support net7.0 yet ... Not sure what's going on there.
@uweigand can you run with
To the best of our ability we've tried to match Mono's loader to the assembly binder behavior in CoreCLR with respect to versions
It's possible there's some non-determinism that causes Mono to load the older xunit assembly and CoreCLR to load the newer one. I'm not sure whether xunit is using ALCs here - it's also possible we're either looking in the wrong ALC or doing some unintended sharing.
Well, it rejects it because of the version:
It also seems to be the default ALC according to those logs. As to:
As far as I can see, at execution time there is just one assembly present. During restore, files from the two dependent packages are copied over, and given that each of those two packages contains a copy of that assembly (just at different versions), at the end of the restore process, one copy actually made it into the application directory. And that is the one the Mono loader finds (I don't think it even could find any other version at this point), but it has the wrong version.
Yeah, it doesn't seem like the loader has a lot of options here. It would be nice to confirm that the CoreCLR loader isn't being less strict in this case for some reason, but it seems like an issue with the build.
I did confirm that in my Intel build (using CoreCLR), where the tests succeed, we also have the older version of
Given that the error is triggered by
If that's the case this is probably a hard problem.
Huh, that's interesting. While the copy in the application directory is indeed the same, on Intel I see (just using strace) the test execution directly use the copy in the .nuget package directory - which is of course the correct version:
versus on s390x:
I'm not sure how this happens -- is there a way to create traces for the CoreCLR loader as well?
https://docs.microsoft.com/en-us/dotnet/core/dependency-loading/collect-details
It turns out that the CoreCLR loader doesn't find the
Unfortunately, when running under Mono, we never even get to installing that Resolving event handler. This is because the runtime aborts already during compilation of that very
(This is the
Talked with some CoreCLR JIT folks. Apparently they always generate essentially a call to a helper and pass it the class token:
So for reference types they might not have to do any class initialization (or at least instance size calculation) at JIT time. To do the same on Mono we would need some variant initialization path that does enough work to initialize the methods (or at least the .ctor being called) but not compute the field layout. I'm not sure how feasible that is. The initialization order is pretty fragile.
I've tried to work around the problem, following the initial assumption, by avoiding installing the
In fact, it turns out my initial assumption (that this is due to a conflict installing two copies of the same file originating from different packages) was actually wrong. Only
It also turns out that with CoreCLR, the
It seems this behavior of the Mono loader (scanning the "Application Base" directory; this may be a relic of Mono's mostly removed appdomain handling?) is itself an incompatibility with CoreCLR, and that actually causes a couple of failures in the host.tests suite where the test relies on an assembly not being loaded if it isn't mentioned in deps.json. It's just that while the versions happened to match, these two incompatibilities "canceled" each other and the test execution happened to work ...
Looking in more detail at the Mono loader implementation of
I'm quite pessimistic that we can make it work. One thing we could try: instead of always calling
But actually, looking at the
I think this is worth a separate issue.
I had an idea of how we could refine this a little. We add some flags that the loader can set to tell the JIT "do something different with this class". And we define a single "something different" right now: always delay constructor calls to a JIT icall.
@uweigand I'm going to try to make a prototype with this approach. It should at least be enough to see if we need to do something about
@lambdageek thanks for working on this! I'm wondering if we can avoid the hard-coded list by just detecting the failure during allocation - always try the current approach first, and if it fails to initialize the class, then instead of throwing an exception switch to emitting the icall and trying again there.
Agreed, I'll open one with more details from the host.tests failures I was seeing.
I don't think that will work. When we call
Thinking about the right long term fix might be one of:
Thanks. Unfortunately, it still doesn't work - it looks like you were right about the
Line 7270 in the patched sources corresponds to this:
Also, I realized that even if it did work, it still wouldn't help to just apply the patch to the runtime sources - the xunit tests are run using the host SDK, so we'd have to get that fix in there as well. Maybe it would be easier after all to put a workaround for this particular issue in xunit, i.e. replace this:
with something along the lines of:
and then do both the allocation and the actual call in the helper. (Unfortunately I wasn't able to test this so far - it doesn't seem to be (easily?) possible to even build xunit v2 on Linux at all ...)
On platforms where Mono is the dotnet SDK runtime (Linux s390x, currently), work around a limitation of the Mono JIT by moving the instantiation of ConsoleRunner to a separate function from the call to SubscribeResolveForAssembly.
Unlike CoreCLR, when the Mono JIT is compiling Main and sees a call to new ConsoleRunner, it needs to load the types of the fields of ConsoleRunner. Some of those fields come from assemblies that can only be found using the resolve helper. But the helper is not installed until the JIT has finished compiling Main and starts executing it.
This is a fundamental limitation of the Mono JIT and is unlikely to be fixed in the short term.
The workaround is to move the code that depends on the assembly resolve to a separate non-inlinable function. As a result, Mono will JIT and invoke Main without trying to load and initialize ConsoleRunner until after the assembly resolve event handler is installed.
Fixes dotnet/runtime#60550
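The ordering problem described in this commit message can be sketched with a small toy model in C. This is only an illustration of the sequence of events, not Mono or xunit code: every `toy_` name is hypothetical, and the "JIT-time" and "run-time" phases are simulated by the order of statements. In the real C# workaround the helper would presumably have to be marked non-inlinable (for example with `MethodImplOptions.NoInlining`), which is an assumption here since the actual patch is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an assembly resolve handler (hypothetical type). */
typedef bool (*toy_resolver)(const char *assembly_name);

/* NULL until the simulated Main has started executing and installed it. */
toy_resolver toy_installed_resolver = NULL;

/* A handler that can locate every assembly it is asked for. */
bool toy_resolve_always(const char *assembly_name)
{
    assert(assembly_name != NULL);
    return true;
}

/* Loading the field types of the runner class only works once a
 * resolve handler is installed (assembly name is illustrative). */
bool toy_load_runner_field_types(void)
{
    return toy_installed_resolver != NULL &&
           toy_installed_resolver("xunit.runner.utility.netcoreapp10");
}

/* Original layout: Main installs the handler and also contains the
 * allocation. A JIT that loads field types while compiling Main does
 * that work before any statement of Main has run. */
bool toy_run_main_original(void)
{
    toy_installed_resolver = NULL;                /* fresh process */
    if (!toy_load_runner_field_types())           /* JIT-time work for Main */
        return false;                             /* runtime aborts here */
    toy_installed_resolver = toy_resolve_always;  /* never reached */
    return true;
}

/* Workaround layout: the allocation lives in a separate helper that is
 * compiled only when it is first called, after the handler exists. */
bool toy_run_main_with_helper(void)
{
    toy_installed_resolver = NULL;                /* fresh process */
    /* JIT-time work for Main: nothing to load for the runner class. */
    toy_installed_resolver = toy_resolve_always;  /* Main runs: handler installed */
    return toy_load_runner_field_types();         /* helper compiled and run now */
}
```

In this model `toy_run_main_original()` fails and `toy_run_main_with_helper()` succeeds, which mirrors why moving the allocation out of Main avoids the abort.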
@uweigand I updated #60987 to also address the
I think this might be less of a hack than I initially thought - in particular I wonder if this might have a benefit for app startup. I wonder if it's possible to more broadly avoid class initialization until an instance is created, at least in cases where nothing difficult is going on (i.e. non-generic reference types without static constructors).
And removing
Questions/Next steps:
I don't think the installer tests do anything special, they just use the "normal" arcade logic for test projects, which always uses the upstream console runner. For the library tests, in contrast, we have
and that file
I really wonder if we need a package reference to
Unfortunately that doesn't help, see #60550 (comment)
Add a rarely used MonoClass property for "special jit flags" (`m_class_has_special_jit_flags` and `mono_class_get_special_jit_flags`) that is set at class loading time on some classes known to the runtime.
Add an `enum MonoSpecialJitClassFlags` with the various values. Currently there's a single value `MONO_SPECIAL_JIT_USE_ICALL_NEWOBJ`.
When loading `Xunit.ConsoleClient.ConsoleRunner` from `xunit.console.dll`, set the `USE_ICALL_NEWOBJ` flag.
The JIT handling of this flag is not yet implemented. The `USE_ICALL_NEWOBJ` flag should tell the JIT to avoid initializing the class at JIT-time, and to delegate new instance creation to an icall that will initialize the class, allocate the memory and invoke the constructor.
This is part of a workaround for dotnet#60550
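As a rough illustration of the flag mechanism this commit message describes, here is a minimal sketch in C. It is not the Mono implementation: `ToyClass`, the `toy_` functions and the `MONO_SPECIAL_JIT_NONE` value are hypothetical, and only the enum name, the `MONO_SPECIAL_JIT_USE_ICALL_NEWOBJ` value and the class and image names are taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Flag values. Only USE_ICALL_NEWOBJ is named in the commit message;
 * the NONE value is an assumption added for readability. */
typedef enum {
    MONO_SPECIAL_JIT_NONE = 0,
    MONO_SPECIAL_JIT_USE_ICALL_NEWOBJ = 1 << 0
} MonoSpecialJitClassFlags;

/* Hypothetical, heavily simplified stand-in for a loaded class. */
typedef struct {
    const char *image_name;      /* file name of the containing assembly */
    const char *full_name;       /* namespace-qualified type name */
    unsigned special_jit_flags;  /* bitwise OR of MonoSpecialJitClassFlags */
} ToyClass;

/* Called once at (simulated) class loading time: mark the classes that
 * the runtime knows need special treatment from the JIT. */
void toy_class_set_special_jit_flags(ToyClass *klass)
{
    assert(klass != NULL && klass->image_name != NULL && klass->full_name != NULL);
    klass->special_jit_flags = MONO_SPECIAL_JIT_NONE;
    if (strcmp(klass->image_name, "xunit.console.dll") == 0 &&
        strcmp(klass->full_name, "Xunit.ConsoleClient.ConsoleRunner") == 0)
        klass->special_jit_flags |= MONO_SPECIAL_JIT_USE_ICALL_NEWOBJ;
}

/* Query a JIT could use to decide whether to defer class initialization
 * and object creation to a runtime helper instead of doing it inline. */
bool toy_class_use_icall_newobj(const ToyClass *klass)
{
    assert(klass != NULL);
    return (klass->special_jit_flags & MONO_SPECIAL_JIT_USE_ICALL_NEWOBJ) != 0;
}
```

Keeping the flags in a bitmask leaves room for further "do something different with this class" values later without changing the query interface.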
Avoid calling `mono_class_init_internal` for classes marked with `MONO_SPECIAL_JIT_USE_ICALL_NEWOBJ` until runtime, rather than JIT-time. Related to dotnet#60550
Description
Attempting to run the host tests on a Mono-based linux-s390x runtime fails with:
Reproduction Steps
Run `./build host.tests --test` on the current `release/6.0` branch on linux-s390x.
Expected behavior
Test execution starts.
Actual behavior
Test fails to start with the error message shown above.
Regression?
Yes. This worked before 491ed9a.
Known Workarounds
Revert `XUnitVersion` back to 2.4.1. (This also requires moving to back-level versions of `MicrosoftDotNetXUnitExtensionsVersion` and `MicrosoftDotNetXUnitConsoleRunnerVersion`.)
Configuration
Runtime from current .NET 6 branch built on linux-s390x.
Other information
The root cause of the issue appears to be that the host tests (e.g. `Microsoft.NET.HostModel.AppHost.Tests`) depend on both
and
Both of these packages bring their own copy of `xunit.runner.utility.netcoreapp10.dll`, but in two different versions. `xunit.runner.console` version 2.4.2-pre.9 brings version 2.4.2 of that DLL, while `xunit.runner.visualstudio` version 2.4.2 brings version 2.4.1 of that DLL.
During restore, it looks like one of those copies "wins", and this happens to be the older one (2.4.1). This causes the Mono loader error while loading dependencies of xunit.runner.console, which expects the newer version (2.4.2).
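To make the version mismatch concrete, here is a minimal sketch in C of the kind of check a loader might apply. It assumes the rule that a candidate assembly is accepted only when its version is at least the requested one. The `AsmVersion` type and both function names are hypothetical, and the fourth version component of the xunit DLLs is not given above, so treating it as 0 is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical four-part assembly version: major.minor.build.revision. */
typedef struct {
    int major, minor, build, revision;
} AsmVersion;

/* Returns a negative value, zero, or a positive value when a is older
 * than, equal to, or newer than b. */
int asm_version_compare(const AsmVersion *a, const AsmVersion *b)
{
    assert(a != NULL && b != NULL);
    if (a->major != b->major)
        return a->major < b->major ? -1 : 1;
    if (a->minor != b->minor)
        return a->minor < b->minor ? -1 : 1;
    if (a->build != b->build)
        return a->build < b->build ? -1 : 1;
    if (a->revision != b->revision)
        return a->revision < b->revision ? -1 : 1;
    return 0;
}

/* Assumed rule: the assembly found on disk satisfies a reference only
 * if it is not older than the version the reference asks for. */
bool asm_version_satisfies(const AsmVersion *found, const AsmVersion *requested)
{
    return asm_version_compare(found, requested) >= 0;
}
```

Under that assumed rule, a 2.4.1 copy in the application directory cannot satisfy a reference to 2.4.2, while a 2.4.2 copy would satisfy a reference to either version.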
I tried to re-create the issue on Intel, but there I'm not seeing the error, even though the overall situation looks identical. So part of the root cause may also involve the Mono loader: maybe it is stricter in checking version matches than the CoreCLR loader for some reason?
CC @steveisok @directhex @maryamariyan