Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing compressed singlefile scenario on osx-arm64 #79894

Merged
merged 5 commits into from
Dec 23, 2022
Merged

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Dec 22, 2022

Re: #79267

Also fixes host tests so they can restore/run on osx-arm64
Fixes: #79470

@ghost
Copy link

ghost commented Dec 22, 2022

Tagging subscribers to this area: @agocke, @vitek-karas, @VSadov
See info in area-owners.md if you want to be subscribed.

Issue Details

Re: #79267

Also fixes host tests so they can restore/run on osx-arm64

Author: VSadov
Assignees: VSadov
Labels:

area-Single-File

Milestone: -

@@ -775,10 +775,15 @@ void* FlatImageLayout::LoadImageByCopyingParts(SIZE_T* m_imageParts) const
}
#endif // FEATURE_ENABLE_NO_ADDRESS_SPACE_RANDOMIZATION

DWORD allocationType = MEM_RESERVE | MEM_COMMIT;
#if defined(__APPLE__) && defined(HOST_ARM64)
allocationType |= MEM_RESERVE_EXECUTABLE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do this for HOST_UNIX in general like we do in the ExecutableAllocator::Reserve.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would cause that memory to be taken from the pre-reserved block in PAL that is close to the libcoreclr.so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it help R2R code in the same way as JITed code (making it closer to coreclr and avoiding stubs) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not see you've answered this already.

@VSadov
Copy link
Member Author

VSadov commented Dec 22, 2022

I do not think by default CI will run installer test leg with osx-arm64.
I am not sure if/how that can be forced.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@am11
Copy link
Member

am11 commented Dec 22, 2022

I do not think by default CI will run installer test leg with osx-arm64.

I think currently we don't run installer tests on any arm64, which we can (should?) change by adding arm64 here:

not(in(parameters.archType, 'x64', 'x86')),

@agocke, @elinor-fung, should we just remove this archType condition instead?

@VSadov
Copy link
Member Author

VSadov commented Dec 23, 2022

Thanks!!

@VSadov VSadov merged commit f3e2260 into dotnet:main Dec 23, 2022
@VSadov VSadov deleted the osxComp2 branch December 23, 2022 21:31
@elinor-fung
Copy link
Member

should we just remove this archType condition instead?

We probably could, but unfortunately, I don't think that would actually let the CI run installer tests on arm64. The installer tests currently run directly on the build machine, so they are disabled when cross-building. I think we'd need to get them running with Helix (and also wouldn't be surprised if they have issues on win-arm64 right now) - related: #77800.

@lewing
Copy link
Member

lewing commented Jan 5, 2023

is this a candidate for servicing?

@VSadov
Copy link
Member Author

VSadov commented Jan 6, 2023

@lewing - yes, but we may want to fix tests. How would we validate this in 7.0 if tests do not run?

@VSadov
Copy link
Member Author

VSadov commented Jan 6, 2023

@elinor-fung - I actually tried to enable installer tests on osx-arm64 tests in #80198

The tests did run, but there are failures. More failures that I would expect.
I think we should look into failures and enable the tests.

@elinor-fung
Copy link
Member

I think that is because the tests are running directly on the build machine. I believe the build machine is x64, so trying to run the arm64 host fails. Looking at some of the failures, they fail trying to start the process with Bad CPU type in executable.

@VSadov
Copy link
Member Author

VSadov commented Jan 6, 2023

Oh, that would explain the issues. I did not realize we build on x64.

We really need to fix the test issue, but it looks harder than I thought.

@am11
Copy link
Member

am11 commented Jan 6, 2023

With today's daily build, the uncompressed single-file is also failing to load embedded libSystem.Native.dylib and trying to locate the library on disk with single-file app on M1. (which was working with December 2022 build #79229 (reply in thread))

$ uname -m
arm64

$ dotnet8 --version
8.0.100-alpha.1.23055.16

$ dotnet8 publish -o publish -p:PublishSingleFile=true -c Release
MSBuild version 17.5.0-preview-23054-02+762ae6c6b for .NET
  Determining projects to restore...
  Restored /Users/am11/projects/helloworldapp/helloworldapp.csproj (in 51 ms).
/Users/am11/.dotnet8/sdk/8.0.100-alpha.1.23055.16/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.RuntimeIdentifierInference.targets(287,5): message NETSDK1057: You are using a preview version of .NET. See: https://aka.ms/dotnet-support-policy [/Users/am11/projects/helloworldapp/helloworldapp.csproj]
  helloworldapp -> /Users/am11/projects/helloworldapp/bin/Release/net8.0/osx-arm64/helloworldapp.dll
  helloworldapp -> /Users/am11/projects/helloworldapp/publish/

$ strings publish/helloworldapp | grep '@(#)'
@(#)Version 8.0.23.5418 @Commit: a2243718922e85370f137bdfa3e30a8eedebe954

$ publish/helloworldapp 
Error message: Unable to load shared library 'libSystem.Native' or one of its dependencies. In order to help diagnose loading problems, consider setting the DYLD_PRINT_LIBRARIES environment variable: 
dlopen(/Users/am11/projects/helloworldapp/publish/libSystem.Native.dylib, 0x0001): tried: '/Users/am11/projects/helloworldapp/publish/libSystem.Native.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/am11/projects/helloworldapp/publish/libSystem.Native.dylib' (no such file), '/Users/am11/projects/helloworldapp/publish/libSystem.Native.dylib' (no such file)
dlopen(libSystem.Native.dylib, 0x0001): tried: 'libSystem.Native.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibSystem.Native.dylib' (no such file), '/usr/lib/libSystem.Native.dylib' (no such file, not in dyld cache), 'libSystem.Native.dylib' (no such file), '/usr/local/lib/libSystem.Native.dylib' (no such file), '/usr/lib/libSystem.Native.dylib' (no such file, not in dyld cache)
dlopen(/Users/am11/projects/helloworldapp
Failed to create CoreCLR, HRESULT: 0x80131524
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00000001001e2ef4 helloworldapp`(anonymous namespace)::LoadLibErrorTracker::Throw(SString&)
    frame #1: 0x00000001001e315c helloworldapp`NativeLibrary::LoadLibraryByName(char16_t const*, Assembly*, int, unsigned int, int) + 288
    frame #2: 0x000000010008b5b0 helloworldapp`CorHost2::CreateAppDomainWithManager(char16_t const*, unsigned int, char16_t const*, char16_t const*, int, char16_t const**, char16_t const**, unsigned int*) + 1344
    frame #3: 0x000000010056d3f8 helloworldapp`coreclr_initialize + 872
    frame #4: 0x0000000100021528 helloworldapp`coreclr_t::create(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*, char const*, coreclr_property_bag_t const&, std::__1::unique_ptr<coreclr_t, std::__1::default_delete<coreclr_t> >&) + 420
    frame #5: 0x0000000100038210 helloworldapp`(anonymous namespace)::create_coreclr() + 432
    frame #6: 0x0000000100037bc4 helloworldapp`corehost_main + 160
    frame #7: 0x000000010000f2fc helloworldapp`fx_muxer_t::handle_exec_host_command(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, host_startup_info_t const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::unordered_map<known_options, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, known_options_hash, std::__1::equal_to<known_options>, std::__1::allocator<std::__1::pair<known_options const, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > > > const&, int, char const**, int, host_mode_t, bool, char*, int, int*) + 1328
    frame #8: 0x000000010000e3f0 helloworldapp`fx_muxer_t::execute(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, char const**, host_startup_info_t const&, char*, int, int*) + 860
    frame #9: 0x000000010000ae94 helloworldapp`hostfxr_main_bundle_startupinfo + 196
    frame #10: 0x000000010004fe04 helloworldapp`exe_start(int, char const**) + 1132
    frame #11: 0x00000001000500e8 helloworldapp`main + 152
    frame #12: 0x00000001968b3e50 dyld`start + 2544

Update: same error on linux-x64

@janvorli
Copy link
Member

janvorli commented Jan 6, 2023

@am11 this sounds like a problem with my coreclr initialization failure logging additions. The change added attempt to load libSystem.Native during coreclr initialization to make sure it is present and valid. It is compiled in only for the case when CORECLR_EMBEDDED is not defined. So it seems that either the symbol is not defined when it should be.

Thank you for reporting this issue just in time, I was just about to backport it to .NET 7 as there were no errors reported by any of our tests during the last month.

@elinor-fung it seems we are missing host tests for this kind of scenario.

@VSadov
Copy link
Member Author

VSadov commented Jan 6, 2023

/backport to release/7.0

@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2023

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3851903021

@am11
Copy link
Member

am11 commented Jan 6, 2023

it seems we are missing host tests for this kind of scenario.

Yup. Single-file tests were added and then removed for cost saving: #67062 (comment). We should revert that change as at least a few "basic" single-file tests should be available in this repro. The end-to-end test in installer repo does not catch anything related to single-file and we have always found breakages with manual testing or consumer reports.

@janvorli
Copy link
Member

janvorli commented Jan 6, 2023

@am11 even enabling the test didn't repro the issue. The problem is that the test build still copies the libSystem.Native.so into the test binary directory.
Anyways, I've found the problem and I am going to create a PR with a fix in a minute.

@am11
Copy link
Member

am11 commented Jan 6, 2023

test build still copies the libSystem.Native.so into the test binary directory

@janvorli, this defeats the purpose of PublishSingleFile sandbox test. We should fix it by passing -p:PublishDir=<always a new path> to avoid mixing things.

BTW, this is exactly what I am currently working on in #80154. crossgen2.exe is currently the only thing in runtime repo which is published as SingleFile or AOT (depending on the platform) and we are not using the "published" binary (which we ship in publicly available nuget package), but instead invoking its assembly with <dotnet.exe or corerun.exe> crossgen2.dll.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 5, 2023
@VSadov
Copy link
Member Author

VSadov commented May 17, 2023

/backport to release/6.0-staging

@github-actions github-actions bot unlocked this conversation May 17, 2023
@github-actions
Copy link
Contributor

Started backporting to release/6.0-staging: https://github.com/dotnet/runtime/actions/runs/4998296122

@github-actions
Copy link
Contributor

@VSadov backporting to release/6.0-staging failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: force 6.0 instead of 3.0 on osx-arm64
Applying: disable legacy test on osx-arm64
Applying: fix osx-arm64 case
Using index info to reconstruct a base tree...
M	src/coreclr/vm/peimagelayout.cpp
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/vm/peimagelayout.cpp
CONFLICT (content): Merge conflict in src/coreclr/vm/peimagelayout.cpp
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0003 fix osx-arm64 case
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@github-actions
Copy link
Contributor

@VSadov an error occurred while backporting to release/6.0-staging, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Host tests fail to restore on osx-arm64
5 participants