Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT] xcode15+ linker issue #97745

Open
VSadov opened this issue Jan 31, 2024 · 23 comments
Open

[NativeAOT] xcode15+ linker issue #97745

VSadov opened this issue Jan 31, 2024 · 23 comments
Labels
area-NativeAOT-coreclr os-mac-os-x macOS aka OSX tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly
Milestone

Comments

@VSadov
Copy link
Member

VSadov commented Jan 31, 2024

There is a chance that this is something with my machine, but it looks like I cannot build run tests with 8.0
It could be related to xcode 15+

I have 15.2 and tried downgrade to 15.0 with the same result. It does not seem possible to downgrade further as older versions claim to be incompatible with the new OS.

I am using the following command:

/build.sh clr.alljits+clr.tools+clr.nativeaotlibs+clr.nativeaotruntime+libs -rc Release -lc Release ; src/tests/build.sh nativeaot Release 'tree nativeaot' ; src/tests/run.sh --runnativeaottests Release

That works fine on main branch (product and tests build and tests pass).
But on release/8.0 branch I get a bunch of errors like:

/Users/vs/Hosting01/runtime/artifacts/bin/coreclr/osx.arm64.Release/build/Microsoft.NETCore.Native.targets(308,5): error MSB3073: The command ""/Users/vs/Hosting01/runtime/artifacts/bin/coreclr/osx.arm64.Release/ilc-published/ilc" @"/Users/vs/Hosting01/runtime/artifacts/tests/coreclr/obj/osx.arm64.Release/Managed/nativeaot/SmokeTests/StackTraceMetadata/StackTraceMetadata_Stripped/native/StackTraceMetadata_Stripped.ilc.rsp"" exited with code 139.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jan 31, 2024
@ghost
Copy link

ghost commented Jan 31, 2024

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

There is a chance that this is something with my machine, but it looks like I cannot build run tests with 8.0
It could be related to xcode 15+

I have 15.2 and tried downgrade to 15.0 with the same result. It does not seem possible to downgrade further as older versions claim to be incompatible with the new OS.

I am using the following command:

/build.sh clr.alljits+clr.tools+clr.nativeaotlibs+clr.nativeaotruntime+libs -rc Release -lc Release ; src/tests/build.sh nativeaot Release 'tree nativeaot' ; src/tests/run.sh --runnativeaottests Release

That works fine on main branch (product and tests build and tests pass).
But on release/8.0 branch I get a bunch of errors like:

/Users/vs/Hosting01/runtime/artifacts/bin/coreclr/osx.arm64.Release/build/Microsoft.NETCore.Native.targets(308,5): error MSB3073: The command ""/Users/vs/Hosting01/runtime/artifacts/bin/coreclr/osx.arm64.Release/ilc-published/ilc" @"/Users/vs/Hosting01/runtime/artifacts/tests/coreclr/obj/osx.arm64.Release/Managed/nativeaot/SmokeTests/StackTraceMetadata/StackTraceMetadata_Stripped/native/StackTraceMetadata_Stripped.ilc.rsp"" exited with code 139.

Author: VSadov
Assignees: -
Labels:

untriaged, area-NativeAOT-coreclr

Milestone: -

@VSadov
Copy link
Member Author

VSadov commented Jan 31, 2024

@filipnavara - could this be another linker issue similar to #92520?
Or maybe even the same issue, just showing up in a different way?

@VSadov
Copy link
Member Author

VSadov commented Jan 31, 2024

I think the lab is still on xcode 14 and that could be the reason this does not happen in CI

@MichalStrehovsky
Copy link
Member

The command "".../ilc" @"..,/StackTraceMetadata_Stripped.ilc.rsp"" exited with code 139.

This is ILC crashing. One option to troubleshoot would be just to re-run this line under a debugger.

@VSadov
Copy link
Member Author

VSadov commented Feb 1, 2024

Passing -ld64 to ILC build and to the Microsoft.NETCore.Native.Unix.targets makes the problem go away - the ILC is functional can build tests and tests pass*

This does seem to be some kind of incompatibility with the new linker.

  • all tests pass - except Reflection_FromUsage. Possibly something unrelated, or just something was left from previous tries to build. I will try a clean rebuild.
  • on a clean rebuild all smoke tests pass, but we do need -ld64 for both ILC and the tests.

@filipnavara
Copy link
Member

could this be another linker issue similar to #92520?

Generally speaking, yes. I won’t have time to look into it until next week, unfortunately.

@VSadov
Copy link
Member Author

VSadov commented Feb 1, 2024

I think nothing is blocked on this yet, so this is mostly a heads up.
Since the lab uses xcode14, it is not affected. Also I am unsure if xcode15+ is a supported end user configuration for 8.0. Since the lab is still on 14, probably not yet.

@jkotas
Copy link
Member

jkotas commented Feb 1, 2024

I am unsure if xcode15+ is a supported end user configuration for 8.0.

Yes, it is supported. Prerequisites in our documentation do not ask for a specific old XCode version.

@VSadov
Copy link
Member Author

VSadov commented Feb 1, 2024

Yes, it is supported. Prerequisites in our documentation do not ask for a specific old XCode version.

That is desirable, but practically it is hard to guarantee support for something newer than what the lab runs.
That said, moving to new xcode has been mostly uneventful in the past. It is just v15 brings a new linker with a bunch of incompatibilities. I would not rule out that some of those incompatibilities are unintentional and basically bugs that may eventually be fixed.

Maybe the right course of action for 8.0 is to use -ld_classic until ld_prime becomes more stable?

@agocke agocke added this to the 9.0.0 milestone Feb 1, 2024
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Feb 1, 2024
@jkotas
Copy link
Member

jkotas commented Feb 1, 2024

it is hard to guarantee support for something newer than what the lab runs

Yes, we have same problem with support of new OS versions. We take fixes for these types of issues in servicing.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 2, 2024
@filipnavara
Copy link
Member

Shouldn't the milestone be set to 8.0.x instead of 9.0.0 since this apparently doesn't happen on main?

@filipnavara
Copy link
Member

filipnavara commented Feb 3, 2024

Nevermind, there is definitely something broken even on main with the new linker. I updated to Xcode 15.2 and I get crash in nativeaot/SmokeTests/FrameworkStrings/Baseline test at the end. Relinking with -ld_classic and no other changes makes it go away. So does linking against Debug build of libRuntime.WorkstationGC.a. I'll investigate next week.

I did some debug build earlier and I started seeing this, which may be related:

        --------------------------------------------------
        Debug Assertion Violation
        
        Expression: 'm_pInstance->IsManaged(m_ControlPC) && "unwind from throw site stub failed"'
        
        File: /Users/filipnavara/Projects/runtime/src/coreclr/nativeaot/Runtime/StackFrameIterator.cpp, Line: 1289
        --------------------------------------------------

@filipnavara
Copy link
Member

The issue I get in the Baseline test is trashed SP after a caught exception (before: 0x000000016fdff1a0; after: 0x00000001814790e0). The unwinding information may be corrupted.

@filipnavara
Copy link
Member

So far I traced it to UnwindHelpers::GetUnwindProcInfo returning bogus info belonging to a different method than asked for. The unwinding data itself in the executable is likely fine. I did a spot check on it, and lldb unwinds the same stack trace just fine.

@filipnavara
Copy link
Member

filipnavara commented Feb 4, 2024

So, the linker indeed produces garbage unwind tables (as verified by objdump -u):

    Second level index[6]: offset in section=0x0000a3f8, base function offset=0x000f55f0
      [0]: function offset=0x000f55f0, encoding=0x44000000
      [1]: function offset=0x000f5640, encoding=0x03011500
      [2]: function offset=0x000f56f0, encoding=0x03011528
      [3]: function offset=0x000f5740, encoding=0x0301154c
...
      [106]: function offset=0x000f92d0, encoding=0x03011e00
      [107]: function offset=0x000f9320, encoding=0x03011e24
      [108]: function offset=0x0018a5c8, encoding=0x00000000
      [109]: function offset=0x000f93e0, encoding=0x03011e4c
      [110]: function offset=0x0018a608, encoding=0x00000000
      [111]: function offset=0x000f9440, encoding=0x03011e70
      [112]: function offset=0x0018a668, encoding=0x00000000
      [113]: function offset=0x000f94c0, encoding=0x03011e94
      [114]: function offset=0x0018a6e8, encoding=0x00000000
      [115]: function offset=0x000f9580, encoding=0x03011eb8
      [116]: function offset=0x0018a7a8, encoding=0x00000000
      [117]: function offset=0x000f9620, encoding=0x03011ee0
      [118]: function offset=0x0018a848, encoding=0x00000000
      [119]: function offset=0x000f96b0, encoding=0x44000000
      [120]: function offset=0x0018a8d8, encoding=0x00000000
      [121]: function offset=0x000f96e0, encoding=0x44000000
      [122]: function offset=0x0018a908, encoding=0x00000000
      [123]: function offset=0x000f9710, encoding=0x44000000
      [124]: function offset=0x0018a938, encoding=0x00000000
      [125]: function offset=0x000f9740, encoding=0x44000000
      [126]: function offset=0x0018a968, encoding=0x00000000
      [127]: function offset=0x000f9790, encoding=0x44000000
      [128]: function offset=0x0018a9b8, encoding=0x00000000
      [129]: function offset=0x000f97b0, encoding=0x03011f04
      [130]: function offset=0x0018a9d8, encoding=0x00000000
      [131]: function offset=0x000f9960, encoding=0x03011f30
      [132]: function offset=0x0018ab88, encoding=0x00000000
…

The function offsets are supposed to be sorted. I'll collect the build artifacts and submit a report to Apple.

@filipnavara
Copy link
Member

filipnavara commented Feb 4, 2024

Feedback sent to Apple (FB13584275). For reference, here's the repro with files and a build.sh script:
ld64-repro.zip

Broken linker will produce unsorted/corrupted unwind tables, which can be verified with objdump -u Baseline.

@filipnavara
Copy link
Member

filipnavara commented Feb 5, 2024

The bogus unwinding information comes from functions in the __unbox section. We currently don't generate any unwind information for that so presumably the linker generates its own "dummy" one and fails to sort it properly. I'll investigate whether generating it ourselves makes a difference.

Update: It does not. :/

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Feb 12, 2024
@filipnavara
Copy link
Member

Response from Apple:

Thank you for filing the feedback report. This is a bug in our linker.

You can use the following workaround until a fix become available:
Please use the -Wl,-Id_classic linker option

@MichalStrehovsky MichalStrehovsky added the tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly label Feb 20, 2024
@jkotas jkotas changed the title [NativeAOT][8.0] Possible problems with building tests [NativeAOT] xcode15+ linker issue Feb 20, 2024
@ivanpovazan
Copy link
Member

Regarding:

Maybe the right course of action for 8.0 is to use -ld_classic until ld_prime becomes more stable?

Apple announced that they are removing this linker option in https://developer.apple.com/documentation/xcode-release-notes/xcode-16-release-notes so this problem becomes more pressing.

@filipnavara
Copy link
Member

Apple updated my Feedback tickets with request to try Xcode 16 and report back.

@filipnavara
Copy link
Member

filipnavara commented Jun 12, 2024

The original problem with corrupted unwind tables seems to be fixed [with Xcode 16 Beta 1] on the few small samples that I tried. We need to do a full run to make sure that there are no other issues.

@am11
Copy link
Member

am11 commented Sep 23, 2024

FWIW, with Xcode 16 (released last Monday), we get: ld: warning: -ld_classic is deprecated and will be removed in a future release with PublishAot=true.

@filipnavara
Copy link
Member

filipnavara commented Sep 23, 2024

FWIW, with Xcode 16 (released last Monday), we get: ld: warning: -ld_classic is deprecated and will be removed in a future release with PublishAot=true.

I am aware of this. I run the smoke tests with ld-prime from Xcode 16 release and it seems to be ok. Likewise, the macOS/iOS/tvOS workloads were updated last week to use ld-prime on Xcode 16+: xamarin/xamarin-macios#21231.

Technically we are still relying on some unwritten guarantees about code/data layout. I have two issues open with Apple to further investigate the usage of .alt_entry to mitigate that. The gist is that ld-prime has a bug with multiple symbols defined at the same address which could lead to incorrect dead-stripping (workaroundable with the N_NO_DEAD_STRIP flag on ILC output). Unfortunately, ld64 has a different bug which limits the number of consecutive .alt_entry symbols due to recursive algorithm running out of stack space.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-NativeAOT-coreclr os-mac-os-x macOS aka OSX tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly
Projects
Status: No status
Development

No branches or pull requests

7 participants