Use MSBuild task instead of Exec for PublishDotnetAot#6576
Conversation
The Exec-based child `dotnet publish` introduced via flow of dotnet/sdk#54222 caused the spawned process to rebuild dotnet-aot.csproj's ProjectReferences (Microsoft.DotNet.Cli.Utils, Cli.CoreUtils, NativeWrapper) without inheriting the outer build's MSBuild global properties (DotNetBuild=True, DotNetBuildFromVMR=True, Arcade/source-build flags, DebugType, signing, version overrides). The child's rebuilds clobbered PDBs the outer build had produced, breaking the outer Copy/Pack steps with MSB3030 / NU5026. This regressed dotnet-unified-build verticals on dnceng/internal (build 2972531, after flow PR dotnet#6524). Verified from the failing build's sdk binlog that: - The centralized sdk.slnx restore IS RID-aware (RuntimeIdentifier=$(TargetRid) flows from the dotnet-aot.csproj declaration added in dotnet/sdk#54222). - ProcessFrameworkReferences on dotnet-aot.csproj during the SolutionRestore evaluation logs "Added PackageDownload for Microsoft.NETCore.App.Runtime.NativeAOT.win-x64@11.0.0-preview.5.26261.113". - The pack lands on disk at artifacts/.packages/microsoft.netcore.app.runtime.nativeaot.win-x64/. So no separate Restore is required for runtime packs; the previous "Restore is NOT skipped here" rationale was based on a misdiagnosis (the actual failure was the PDB clobber, not missing runtime packs). Removing the Exec also removes the need for the BuildManager-interference workaround. With a single in-process <MSBuild Targets="Publish"> call, MSBuild reuses the outer build's already-built ProjectReferences from its BuildManager cache instead of re-running CoreCompile on them. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the SDK layout build to publish the dotnet-aot NativeAOT shared library using an in-process <MSBuild Targets="Publish" ...> invocation instead of spawning a separate dotnet publish process via <Exec>. This aligns the publish with the outer build’s BuildManager session (and its global properties), addressing the reported internal unified-build regression where a child publish rebuilt project references under different settings and clobbered previously produced PDBs.
Changes:
- Replace the
dotnet publish<Exec>inPublishDotnetAotwith an in-proc<MSBuild Targets="Publish" ...>call. - Update the surrounding comment to document why in-proc publish is required and why a separate restore isn’t needed in this context.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Will this target/MSBuild call ever race with another build of this project? This shares many of the same directories for the configuration, but's being built with different global properties, which break MSBuild's project build cache. It might be OK if it never races with the actual project build. |
…ties @ericstj noted that passing RuntimeIdentifier/PublishDir as global properties to <MSBuild Targets="Publish"> creates a separate BuildManager project instance from the outer build's evaluation of dotnet-aot.csproj. Because redist.csproj has no ProjectReference to dotnet-aot.csproj, the outer Build and this PublishDotnetAot path can in principle run on different parallel build nodes and race over the shared obj\ and bin\ directories. Drop the global properties so this call shares the outer build's project instance (BuildManager cache reuse, no race). The csproj's conditional <RuntimeIdentifier>=$(TargetRid) declaration sets the RID as a non-global property during evaluation, which is enough. Also call the PublishItemsOutputGroup target and capture its TargetOutputs to discover where Publish wrote the native library, instead of hard-coding the publish path. The published item's OutputPath metadata is the absolute path of the published file. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
I've fixed this so that it no longer sets any global properties on the call to Publish. I think this will probably fix the issues. /cc @NikolaMilosavljevic @ViktorHofer @MichaelSimons @JeremyKuhne |
|
Running the |
dotnet#6576 changed PublishDotnetAot to call <MSBuild Targets="Publish;PublishItemsOutputGroup"> and filter the published item list by '%(Filename)%(Extension)' to find the NativeAOT shared library. On macOS the dSYM bundle's inner Mach-O lives at libdotnet-aot.dylib.dSYM/Contents/Resources/DWARF/libdotnet-aot.dylib, so it has the same %(Filename)%(Extension) as the real dylib. Both items pass the filter, both get copied to $(OutputPath) via DestinationFolder, and the dSYM Mach-O (filetype MH_DSYM = 10) ends up overwriting the real dylib in the SDK layout. Any 'dotnet' invocation that hits the muxer's AOT fast path then fails dlopen with 'unloadable mach-o file type 10'. Match on %(TargetPath) (the file's relative path under the publish output) instead; the dSYM inner Mach-O has TargetPath 'libdotnet-aot.dylib.dSYM/Contents/Resources/DWARF/libdotnet-aot.dylib', which no longer matches '$(_DotnetAotNativeLibName)'. The real dylib's TargetPath is just 'libdotnet-aot.dylib'. Verified by inspecting the upstream osx-arm64 SDK tarball: before: sdk/<version>/libdotnet-aot.dylib is 'Mach-O 64-bit dSYM companion file arm64' after: expected to be 'Mach-O 64-bit dynamically linked shared library arm64' Fixes dotnet/sdk#54296 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per @MichalStrehovsky on #6576, running the Publish target directly without also setting _IsPublishing=true has been a source of subtle issues in the past, because just running Publish is not enough to get all the Publish-time behaviors. _IsPublishing is the property the SDK uses to gate Publish-aware logic in Microsoft.NET.RuntimeIdentifierInference.targets and Microsoft.NET.Publish.targets (e.g. suppressing apphost generation when PublishAot is true, AOT-specific error conditions, etc.). This addresses the AOT NETSdkError condition guarded by '$(PublishAot)' == 'true' and '$(_IsPublishing)' != 'true' and ... Note: passing _IsPublishing=true as a global property means BuildManager treats this call as a different project instance from the outer solution build's evaluation of dotnet-aot.csproj. The previous form (no global properties) deliberately shared the instance to avoid that race. With this change, dotnet-aot.csproj is now evaluated three times (SolutionRestore, outer Build, separate Publish-with-_IsPublishing), and the separate Publish call's Build dependency could in principle race with the outer Build over the shared bin\ and obj\ paths. In practice the surface is narrow because dotnet-aot is OutputType=Library + NativeLib=Shared (no apphost, no exe-specific behaviors that differ between the two evaluations) and the outer Build's outputs are idempotent across the two evaluations, but the structural race remains. If this race needs to be eliminated (and not just narrowed), the next step is to add a ProjectReference from redist.csproj to dotnet-aot.csproj with ReferenceOutputAssembly=false / SkipGetTargetFrameworkProperties=true so the outer Build is guaranteed to complete before redist's GenerateSdkLayout target runs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Replace the
<Exec>ofdotnet publishinPublishDotnetAotwith an in-process<MSBuild Targets="Publish" ...>task call. The Exec was the root cause of the dotnet-unified-build internal regression on dnceng/internal (build 2972531) — see error details below.Root cause
The Exec-based child
dotnet publishintroduced bydotnet/sdk#54222and adjusted by73010df(manual VMR fix removing--no-restore) spawned a fresh MSBuild process to publishdotnet-aot.csproj. The child process:DotNetBuild=True,DotNetBuildFromVMR=True, Arcade / source-build flags, DebugType, signing, version overrides, etc.).dotnet-aot.csproj'sProjectReferences —Microsoft.DotNet.Cli.Utils,Cli.CoreUtils,NativeWrapper— from scratch with different evaluated properties.The outer build's subsequent Copy and Pack steps then failed:
This regression doesn't surface in dotnet/sdk public CI because public CI doesn't run the source-build Copy/Pack steps that consume those PDBs — the clobber still happens, it's just invisible.
Why an in-process
<MSBuild>task fixes itAn in-process
<MSBuild Targets="Publish" ...>call shares the outer build's BuildManager. When that call needsdotnet-aot.csproj'sProjectReferences, the BuildManager sees they were already built earlier in the same session (matching Configuration and global properties) and returns the cached results instead of rebuilding. The PDBs from the outer build stay intact.Why this does NOT bring back NETSDK1112 (Marc's stated reason for removing
--no-restore)Verified directly from the failing build's sdk binlog (
Windows_x64_BuildLogs_Attempt1/artifacts/log/Release/sdk/Build.binlogfrom build 2972531):The centralized
sdk.slnxrestore in the VMR is RID-aware. The SolutionRestore evaluation ofdotnet-aot.csprojhasRuntimeIdentifier=win-x64(flowing from the conditional declaration in dotnet-aot.csproj added in Fix PublishDotnetAot non-deterministic NETSDK1047 with RestoreForce sdk#54222).ProcessFrameworkReferencesondotnet-aot.csprojduring the SolutionRestore evaluation logs:The pack lands on disk at
artifacts/.packages/microsoft.netcore.app.runtime.nativeaot.win-x64/11.0.0-preview.5.26261.113.So the NativeAOT runtime pack is reliably present after the centralized restore — no separate Restore call is needed. The previous "Restore is NOT skipped here because runtime packs are only downloaded during a RID-aware restore" rationale was based on a misdiagnosis: the actual VMR failure was the PDB clobber, not missing runtime packs.
Validation
build.cmd -ci -c Release: 0 errors, AOT shared library (dotnet-aot.dll) produced and placed in the SDK layout as expected.cc @marcpopMSFT (who authored the previous VMR manual fix
73010df— the NU5026/MSB3030 in build 2972531 is what motivated finding the real root cause)