Use MSBuild task instead of Exec for PublishDotnetAot #6576

Merged
dsplaisted merged 2 commits into dotnet:main from dsplaisted:fix-publishdotnetaot-exec
May 13, 2026

@dsplaisted dsplaisted commented May 12, 2026

Summary

Replace the <Exec> of dotnet publish in PublishDotnetAot with an in-process <MSBuild Targets="Publish" ...> task call. The Exec was the root cause of the dotnet-unified-build internal regression on dnceng/internal (build 2972531) — see error details below.

Root cause

The Exec-based child dotnet publish introduced by dotnet/sdk#54222 and adjusted by 73010df (manual VMR fix removing --no-restore) spawned a fresh MSBuild process to publish dotnet-aot.csproj. The child process:

  1. Did not inherit the outer build's MSBuild global properties (DotNetBuild=True, DotNetBuildFromVMR=True, Arcade / source-build flags, DebugType, signing, version overrides, etc.).
  2. Rebuilt dotnet-aot.csproj's ProjectReferences (Microsoft.DotNet.Cli.Utils, Cli.CoreUtils, NativeWrapper) from scratch with different evaluated properties.
  3. Clobbered the PDBs the outer build had produced, writing them with different DebugType/symbol settings (or not producing them at all).

The outer build's subsequent Copy and Pack steps then failed:

MSB3030: Could not copy the file
  "...\artifacts\bin\Microsoft.DotNet.Cli.CoreUtils\Release\net11.0\Microsoft.DotNet.Cli.CoreUtils.pdb"
because it was not found.

NU5026: The file
  "...\artifacts\bin\Microsoft.DotNet.Cli.Utils\Release\net11.0\Microsoft.DotNet.Cli.Utils.pdb"
to be packed was not found on disk.

This regression doesn't surface in dotnet/sdk public CI because public CI doesn't run the source-build Copy/Pack steps that consume those PDBs — the clobber still happens, it's just invisible.

Why an in-process <MSBuild> task fixes it

An in-process <MSBuild Targets="Publish" ...> call shares the outer build's BuildManager. When that call needs dotnet-aot.csproj's ProjectReferences, the BuildManager sees they were already built earlier in the same session (matching Configuration and global properties) and returns the cached results instead of rebuilding. The PDBs from the outer build stay intact.
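The difference between the two approaches can be sketched as follows. This is an illustrative fragment, not the actual target from this PR: the target names, the `DotnetAotProject` property, its placeholder path, and the exact `dotnet publish` arguments are assumptions made for the example.

```xml
<Project>
  <PropertyGroup>
    <!-- Placeholder path for this sketch, not the repository's real layout. -->
    <DotnetAotProject>path\to\dotnet-aot.csproj</DotnetAotProject>
  </PropertyGroup>

  <!-- Before (sketch): Exec starts a separate dotnet process. That process has its own
       global properties and its own result cache, so it rebuilds the ProjectReferences. -->
  <Target Name="PublishDotnetAotViaExecSketch">
    <Exec Command="dotnet publish &quot;$(DotnetAotProject)&quot; -c $(Configuration)" />
  </Target>

  <!-- After (sketch): the MSBuild task issues the Publish request inside the current
       build session, where results for already-built projects can be reused. -->
  <Target Name="PublishDotnetAotInProcSketch">
    <MSBuild Projects="$(DotnetAotProject)" Targets="Publish" />
  </Target>
</Project>
```

The cache reuse described above depends on the in-process request matching the global properties of the earlier build of each referenced project, which is why the choice of properties passed to the `MSBuild` task matters.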

Why this does NOT bring back NETSDK1112 (Marc's stated reason for removing --no-restore)

Verified directly from the failing build's sdk binlog (Windows_x64_BuildLogs_Attempt1/artifacts/log/Release/sdk/Build.binlog from build 2972531):

  • The centralized sdk.slnx restore in the VMR is RID-aware. The SolutionRestore evaluation of dotnet-aot.csproj has RuntimeIdentifier=win-x64 (flowing from the conditional declaration in dotnet-aot.csproj added in dotnet/sdk#54222, "Fix PublishDotnetAot non-deterministic NETSDK1047 with RestoreForce").

  • ProcessFrameworkReferences on dotnet-aot.csproj during the SolutionRestore evaluation logs:

    Adding tool pack ILCompiler for runtime 11.0
    ...
    Added ILCompiler runtime pack 'runtime.win-x64.Microsoft.DotNet.ILCompiler@11.0.0-preview.5.26261.113'
    Checking for cross-targeting compilation packs for win-x64
    Added Microsoft.NETCore.App.Runtime.NativeAOT.win-x64@11.0.0-preview.5.26261.113 for cross-targeting compilation for win-x64
    Added PackageDownload for Microsoft.NETCore.App.Runtime.NativeAOT.win-x64@11.0.0-preview.5.26261.113 for cross-targeting compilation for win-x64
    
  • The pack lands on disk at artifacts/.packages/microsoft.netcore.app.runtime.nativeaot.win-x64/11.0.0-preview.5.26261.113.

So the NativeAOT runtime pack is reliably present after the centralized restore — no separate Restore call is needed. The previous "Restore is NOT skipped here because runtime packs are only downloaded during a RID-aware restore" rationale was based on a misdiagnosis: the actual VMR failure was the PDB clobber, not missing runtime packs.
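For readers unfamiliar with the pattern, a project-level conditional RID declaration of the kind referred to above might look roughly like the fragment below. The condition shown is a placeholder chosen for illustration; the real condition in dotnet-aot.csproj is not quoted in this thread.

```xml
<Project>
  <PropertyGroup>
    <!-- Assumed condition for this sketch: set the RID only when TargetRid is known.
         Because this is an ordinary (non-global) property, it is visible to every
         evaluation of the project, including the one done by the solution restore. -->
    <RuntimeIdentifier Condition="'$(TargetRid)' != ''">$(TargetRid)</RuntimeIdentifier>
  </PropertyGroup>
</Project>
```

With a declaration like this, restore sees a concrete RuntimeIdentifier without any caller having to pass one on the command line, which is consistent with the binlog output quoted above.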

Validation

  • Locally validated the equivalent fix on dotnet/sdk main with a clean build.cmd -ci -c Release: 0 errors, AOT shared library (dotnet-aot.dll) produced and placed in the SDK layout as expected.
  • Full VMR build not run locally (impractical). A private dotnet-unified-build run from this PR is the right validation; the binlog evidence above makes the fix high-confidence.

cc @marcpopMSFT (who authored the previous VMR manual fix 73010df — the NU5026/MSB3030 in build 2972531 is what motivated finding the real root cause)

The Exec-based child `dotnet publish` introduced via flow of dotnet/sdk#54222
caused the spawned process to rebuild dotnet-aot.csproj's ProjectReferences
(Microsoft.DotNet.Cli.Utils, Cli.CoreUtils, NativeWrapper) without inheriting
the outer build's MSBuild global properties (DotNetBuild=True,
DotNetBuildFromVMR=True, Arcade/source-build flags, DebugType, signing, version
overrides). The child's rebuilds clobbered PDBs the outer build had produced,
breaking the outer Copy/Pack steps with MSB3030 / NU5026. This regressed
dotnet-unified-build verticals on dnceng/internal (build 2972531, after flow
PR dotnet#6524).

Verified from the failing build's sdk binlog that:
  - The centralized sdk.slnx restore IS RID-aware
    (RuntimeIdentifier=$(TargetRid) flows from the dotnet-aot.csproj declaration
    added in dotnet/sdk#54222).
  - ProcessFrameworkReferences on dotnet-aot.csproj during the SolutionRestore
    evaluation logs "Added PackageDownload for
    Microsoft.NETCore.App.Runtime.NativeAOT.win-x64@11.0.0-preview.5.26261.113".
  - The pack lands on disk at
    artifacts/.packages/microsoft.netcore.app.runtime.nativeaot.win-x64/.
So no separate Restore is required for runtime packs; the previous "Restore is
NOT skipped here" rationale was based on a misdiagnosis (the actual failure was
the PDB clobber, not missing runtime packs).

Removing the Exec also removes the need for the BuildManager-interference
workaround. With a single in-process <MSBuild Targets="Publish"> call, MSBuild
reuses the outer build's already-built ProjectReferences from its BuildManager
cache instead of re-running CoreCompile on them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 12, 2026 20:52
@dotnet-policy-service dotnet-policy-service Bot requested a review from a team May 12, 2026 20:52

Copilot AI left a comment


Pull request overview

This PR updates the SDK layout build to publish the dotnet-aot NativeAOT shared library using an in-process <MSBuild Targets="Publish" ...> invocation instead of spawning a separate dotnet publish process via <Exec>. This aligns the publish with the outer build’s BuildManager session (and its global properties), addressing the reported internal unified-build regression where a child publish rebuilt project references under different settings and clobbered previously produced PDBs.

Changes:

  • Replace the dotnet publish <Exec> in PublishDotnetAot with an in-proc <MSBuild Targets="Publish" ...> call.
  • Update the surrounding comment to document why in-proc publish is required and why a separate restore isn’t needed in this context.


ericstj commented May 12, 2026

Will this target/MSBuild call ever race with another build of this project? It shares many of the same directories for the configuration, but it's being built with different global properties, which breaks MSBuild's project build cache. It might be OK if it never races with the actual project build.

…ties

@ericstj noted that passing RuntimeIdentifier/PublishDir as global properties
to <MSBuild Targets="Publish"> creates a separate BuildManager project instance
from the outer build's evaluation of dotnet-aot.csproj. Because redist.csproj
has no ProjectReference to dotnet-aot.csproj, the outer Build and this
PublishDotnetAot path can in principle run on different parallel build nodes
and race over the shared obj\ and bin\ directories.

Drop the global properties so this call shares the outer build's project
instance (BuildManager cache reuse, no race). The csproj's conditional
<RuntimeIdentifier>=$(TargetRid) declaration sets the RID as a non-global
property during evaluation, which is enough.

Also call the PublishItemsOutputGroup target and capture its TargetOutputs to
discover where Publish wrote the native library, instead of hard-coding the
publish path. The published item's OutputPath metadata is the absolute path of
the published file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
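The shape of the call described in the commit message above (no global properties, published items captured from `PublishItemsOutputGroup`) can be sketched as follows. The target name, the `DotnetAotProject` property with its placeholder path, and the `_DotnetAotPublishedItem` item name are hypothetical names for this example, not taken from the PR.

```xml
<Project>
  <PropertyGroup>
    <!-- Placeholder path for this sketch. -->
    <DotnetAotProject>path\to\dotnet-aot.csproj</DotnetAotProject>
  </PropertyGroup>

  <Target Name="PublishDotnetAotSketch">
    <!-- No Properties attribute, so this call adds no global properties of its own. -->
    <MSBuild Projects="$(DotnetAotProject)" Targets="Publish;PublishItemsOutputGroup">
      <!-- Collect whatever the requested targets return into a (hypothetical) item list. -->
      <Output TaskParameter="TargetOutputs" ItemName="_DotnetAotPublishedItem" />
    </MSBuild>

    <!-- Per the commit message, OutputPath metadata holds the absolute published path. -->
    <Message Importance="high" Text="Published file: %(_DotnetAotPublishedItem.OutputPath)" />
  </Target>
</Project>
```

Reading the location from the returned items avoids hard-coding the publish directory in the calling project, at the cost of having to filter the list down to the one file of interest.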
@dsplaisted
Member Author

> Will this target/MSBuild call ever race with another build of this project? It shares many of the same directories for the configuration, but it's being built with different global properties, which breaks MSBuild's project build cache. It might be OK if it never races with the actual project build.

I've fixed this so that it no longer sets any global properties on the call to Publish. I think this will probably fix the issues.

/cc @NikolaMilosavljevic @ViktorHofer @MichaelSimons @JeremyKuhne


@ericstj ericstj left a comment


This looks better.

@dsplaisted dsplaisted enabled auto-merge May 13, 2026 01:28
@dsplaisted dsplaisted merged commit dab6d9c into dotnet:main May 13, 2026
11 checks passed
@MichalStrehovsky
Member

Running the Publish target directly without also setting the _IsPublishing property to true has been a source of subtle issues in the past because just running the Publish target is not enough to get all the behaviors.
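One way a calling target could address this is to pass `_IsPublishing=true` through the `MSBuild` task's `Properties` attribute, as in the sketch below. The target name, the `DotnetAotProject` property, and its placeholder path are assumptions for illustration.

```xml
<Project>
  <PropertyGroup>
    <!-- Placeholder path for this sketch. -->
    <DotnetAotProject>path\to\dotnet-aot.csproj</DotnetAotProject>
  </PropertyGroup>

  <Target Name="PublishDotnetAotWithIsPublishingSketch">
    <!-- Properties become global properties of the requested project, so SDK logic
         gated on _IsPublishing sees the value during evaluation. -->
    <MSBuild Projects="$(DotnetAotProject)"
             Targets="Publish"
             Properties="_IsPublishing=true" />
  </Target>
</Project>
```

The trade-off is that any value passed through `Properties` gives the request a different set of global properties from the regular build of the same project, so MSBuild treats it as a separate project instance that can overlap with that build in the shared bin\ and obj\ directories.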

kotlarmilos pushed a commit to kotlarmilos/dotnet that referenced this pull request May 14, 2026
dotnet#6576 changed PublishDotnetAot to call <MSBuild Targets="Publish;PublishItemsOutputGroup">
and filter the published item list by '%(Filename)%(Extension)' to find the
NativeAOT shared library. On macOS the dSYM bundle's inner Mach-O lives at
libdotnet-aot.dylib.dSYM/Contents/Resources/DWARF/libdotnet-aot.dylib, so it
has the same %(Filename)%(Extension) as the real dylib. Both items pass the
filter, both get copied to $(OutputPath) via DestinationFolder, and the dSYM
Mach-O (filetype MH_DSYM = 10) ends up overwriting the real dylib in the SDK
layout. Any 'dotnet' invocation that hits the muxer's AOT fast path then
fails dlopen with 'unloadable mach-o file type 10'.

Match on %(TargetPath) (the file's relative path under the publish output)
instead; the dSYM inner Mach-O has TargetPath
'libdotnet-aot.dylib.dSYM/Contents/Resources/DWARF/libdotnet-aot.dylib',
which no longer matches '$(_DotnetAotNativeLibName)'. The real dylib's
TargetPath is just 'libdotnet-aot.dylib'.

Verified by inspecting the upstream osx-arm64 SDK tarball:
  before: sdk/<version>/libdotnet-aot.dylib is 'Mach-O 64-bit dSYM companion file arm64'
  after:  expected to be 'Mach-O 64-bit dynamically linked shared library arm64'

Fixes dotnet/sdk#54296

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
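The difference between the two filters in the commit message above can be reproduced with a small standalone project. The item list below is hand-written to mimic the two published items; the `_Published`, `_ByFileName`, and `_ByTargetPath` item names, the `publish/` prefix, and the target name are hypothetical.

```xml
<Project>
  <PropertyGroup>
    <_DotnetAotNativeLibName>libdotnet-aot.dylib</_DotnetAotNativeLibName>
  </PropertyGroup>

  <Target Name="FilterSketch">
    <ItemGroup>
      <!-- Stand-ins for the published items: the real dylib and the dSYM inner Mach-O. -->
      <_Published Include="publish/libdotnet-aot.dylib"
                  TargetPath="libdotnet-aot.dylib" />
      <_Published Include="publish/libdotnet-aot.dylib.dSYM/Contents/Resources/DWARF/libdotnet-aot.dylib"
                  TargetPath="libdotnet-aot.dylib.dSYM/Contents/Resources/DWARF/libdotnet-aot.dylib" />

      <!-- File-name match: both items end in the same file name. -->
      <_ByFileName Include="@(_Published)"
                   Condition="'%(_Published.Filename)%(_Published.Extension)' == '$(_DotnetAotNativeLibName)'" />

      <!-- TargetPath match: compares the full relative path under the publish output. -->
      <_ByTargetPath Include="@(_Published)"
                     Condition="'%(_Published.TargetPath)' == '$(_DotnetAotNativeLibName)'" />
    </ItemGroup>

    <Message Importance="high" Text="By file name: @(_ByFileName)" />
    <Message Importance="high" Text="By TargetPath: @(_ByTargetPath)" />
  </Target>
</Project>
```

Run with `dotnet msbuild` against the file and the `FilterSketch` target; the file-name list should contain both items while the TargetPath list should contain only the root-level dylib, which is the behavior the fix relies on.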
dsplaisted added a commit that referenced this pull request May 15, 2026
Per @MichalStrehovsky on #6576, running the Publish target
directly without also setting _IsPublishing=true has been a source of subtle
issues in the past, because just running Publish is not enough to get all
the Publish-time behaviors. _IsPublishing is the property the SDK uses to
gate Publish-aware logic in Microsoft.NET.RuntimeIdentifierInference.targets
and Microsoft.NET.Publish.targets (e.g. suppressing apphost generation when
PublishAot is true, AOT-specific error conditions, etc.).

This addresses the AOT NETSdkError condition guarded by
'$(PublishAot)' == 'true' and '$(_IsPublishing)' != 'true' and ...

Note: passing _IsPublishing=true as a global property means BuildManager
treats this call as a different project instance from the outer solution
build's evaluation of dotnet-aot.csproj. The previous form (no global
properties) deliberately shared the instance to avoid that race. With this
change, dotnet-aot.csproj is now evaluated three times (SolutionRestore,
outer Build, separate Publish-with-_IsPublishing), and the separate
Publish call's Build dependency could in principle race with the outer
Build over the shared bin\ and obj\ paths. In practice the surface is
narrow because dotnet-aot is OutputType=Library + NativeLib=Shared
(no apphost, no exe-specific behaviors that differ between the two
evaluations) and the outer Build's outputs are idempotent across the
two evaluations, but the structural race remains.

If this race needs to be eliminated (and not just narrowed), the next step
is to add a ProjectReference from redist.csproj to dotnet-aot.csproj with
ReferenceOutputAssembly=false / SkipGetTargetFrameworkProperties=true so
the outer Build is guaranteed to complete before redist's GenerateSdkLayout
target runs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
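The ordering-only ProjectReference suggested in the commit message above might look roughly like this in redist.csproj. The relative path is a placeholder; only the two metadata names come from the commit message.

```xml
<Project>
  <ItemGroup>
    <!-- Placeholder path. ReferenceOutputAssembly=false keeps the output assembly out of
         redist's compile references, so the reference mainly serves as a build-order
         dependency. SkipGetTargetFrameworkProperties=true skips the target framework
         negotiation with the referenced project. -->
    <ProjectReference Include="..\path\to\dotnet-aot.csproj"
                      ReferenceOutputAssembly="false"
                      SkipGetTargetFrameworkProperties="true" />
  </ItemGroup>
</Project>
```

This is the usual way to express "build that project first" without consuming its output as a reference, and per the commit message it is what would guarantee that the regular build of dotnet-aot.csproj finishes before the layout target issues its separate Publish request.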