Fix PublishDotnetAot non-deterministic NETSDK1047 with RestoreForce #54222

Merged: NikolaMilosavljevic merged 3 commits into dotnet:main from NikolaMilosavljevic:aot.nondeterministic.fix on May 8, 2026

@NikolaMilosavljevic
Member

@NikolaMilosavljevic NikolaMilosavljevic commented May 7, 2026

Summary

Fixes the non-deterministic NETSDK1047 error in the PublishDotnetAot target that was introduced by #54175.

Root Cause

Arcade's centralized NuGet restore writes project.assets.json without RuntimeIdentifier, so the assets file lacks the RID-specific target (e.g. net11.0/win-x64) that the NativeAOT Publish step requires.

Previous attempts to fix this with nested <MSBuild> task calls for Restore+Publish failed non-deterministically in CI:

  • RuntimeIdentifiers (plural) does not generate RID-specific targets in the lock file — only RuntimeIdentifier (singular) does
  • Even with RuntimeIdentifier (singular) + RestoreForce=true, the nested <MSBuild Targets="Restore"> followed by <MSBuild Targets="Publish"> fails in CI's multi-node parallel build, despite working correctly locally — likely due to BuildManager project caching/scheduling interference between the two separate evaluations

Fix

Replace the two nested <MSBuild> calls (Restore + Publish) with a single <Exec> that runs dotnet publish in a separate process:

<Exec Command="&quot;$(DotNetTool)&quot; publish &quot;...dotnet-aot.csproj&quot; -c $(Configuration) -r $(TargetRid) -o &quot;$(_DotnetAotPublishDir)&quot;" />

This ensures:

  1. Implicit restore generates the RID-specific target in the assets file
  2. Publish immediately uses it in the same process invocation — no gap for interference
  3. Complete isolation from the outer parallel build's BuildManager caching/scheduling
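
For contrast, the replaced pattern would have looked roughly like the sketch below. It is reconstructed from the description above rather than copied from the diff: `_DotnetAotProject` is a hypothetical property standing in for the project path, and the `Properties` lists are assumptions.

```xml
<!-- Sketch only: two in-process evaluations of the same project inside the outer build. -->
<Target Name="PublishDotnetAot">
  <!-- Step 1: forced, RID-specific restore (RuntimeIdentifier singular + RestoreForce). -->
  <MSBuild Projects="$(_DotnetAotProject)"
           Targets="Restore"
           Properties="RuntimeIdentifier=$(TargetRid);RestoreForce=true" />

  <!-- Step 2: publish, which expects the RID-specific target in project.assets.json. -->
  <MSBuild Projects="$(_DotnetAotProject)"
           Targets="Publish"
           Properties="RuntimeIdentifier=$(TargetRid);PublishDir=$(_DotnetAotPublishDir)" />
</Target>
```

Both calls are scheduled by the outer build's BuildManager, which is where the suspected interference would come from. The `<Exec>` version sidesteps that by moving restore and publish into a separate process.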

Verified locally: starting from a non-RID assets file (simulating centralized restore), dotnet publish -r win-x64 correctly restores, builds, and produces the native library.

Previous CI Results

| Build | Approach | Result |
| --- | --- | --- |
| 1409344 | RuntimeIdentifiers (plural) without RestoreForce | Pass (lucky) |
| 1411015 | Same code on main | NETSDK1047 |
| 1412377 | RuntimeIdentifiers (plural) + RestoreForce | NETSDK1047 (plural doesn't create RID targets) |
| 1412531 | RuntimeIdentifier (singular) + RestoreForce | NETSDK1047 (BuildManager interference) |

Copilot AI review requested due to automatic review settings May 7, 2026 15:36
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a non-deterministic NETSDK1047 failure in the PublishDotnetAot target during layout generation by ensuring the RID-specific restore always updates project.assets.json, even when a prior centralized restore has already produced an assets file.

Changes:

  • Forces the dotnet-aot.csproj restore invoked from PublishDotnetAot by setting RestoreForce=true to bypass NuGet’s no-op restore optimization.
  • Expands the in-target comment to document why RestoreForce is needed and why RuntimeIdentifiers (plural) is intentionally used.

@baronfel
Member

baronfel commented May 7, 2026

Is there an issue we can log to track

> NuGet's no-op optimization non-deterministically skips updating the assets file when it detects the project was already restored

Determinism seems like a useful thing here?

@NikolaMilosavljevic
Member Author

> Is there an issue we can log to track
>
> > NuGet's no-op optimization non-deterministically skips updating the assets file when it detects the project was already restored
>
> Determinism seems like a useful thing here?

Absolutely - will research and create a nuget issue if there isn't one already.

@NikolaMilosavljevic NikolaMilosavljevic force-pushed the aot.nondeterministic.fix branch from 0f0ff75 to 025f1e2 May 7, 2026 17:23
Add RestoreForce=true to the PublishDotnetAot Restore invocation to
bypass NuGet's no-op optimization, which non-deterministically skips
updating project.assets.json when it was already written by Arcade's
centralized restore without RuntimeIdentifier.

Without RestoreForce, the Restore call with RuntimeIdentifiers (plural)
sometimes adds the RID-specific target to the assets file and sometimes
doesn't, causing intermittent NETSDK1047 errors on CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@NikolaMilosavljevic
Member Author

Failures are unrelated.

@ViktorHofer
Member

ViktorHofer commented May 8, 2026

I don't think that's the recommended approach here. @baronfel would you mind chiming in here? Why isn't the initial restore sufficient / why can't the initial restore declare all RIDs + dependencies?

@baronfel
Member

baronfel commented May 8, 2026

We'd need to see a binlog to be super-sure, but yes, the ideal is that as long as the project(s) are correctly specified (Publish* properties set, RuntimeIdentifier(s) set), the top-level restore should bring down all of the required assets without needing any other restore invocations to allow other publish operations to occur.

@NikolaMilosavljevic
Member Author

@ViktorHofer @baronfel you are both right - this should be solved in the project by setting RuntimeIdentifier. Testing this out.

…sufficient

Add RuntimeIdentifier=$(TargetRid) to dotnet-aot.csproj (conditioned on
supported platforms) so that centralized NuGet restore via sdk.slnx generates
the RID-specific target in project.assets.json. This eliminates the need for
a separate restore during publish.

Add --no-restore to the dotnet publish Exec command since the initial restore
now produces the correct assets file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@NikolaMilosavljevic
Member Author

Updated based on feedback from @baronfel and @ViktorHofer:

  • Added <RuntimeIdentifier>$(TargetRid)</RuntimeIdentifier> to dotnet-aot.csproj (conditioned on win/linux/osx) so that centralized NuGet restore generates the RID-specific target in project.assets.json
  • Added --no-restore to the dotnet publish command since the initial restore is now sufficient
  • Kept <Exec> (separate process) to avoid BuildManager scheduling interference in CI's multi-node parallel builds

Verified locally: restore produces both net11.0 and net11.0/win-x64 targets, publish with --no-restore succeeds, and ProjectReferences' assets files are unaffected.
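
A minimal sketch of the two pieces described above. The `Condition` is an assumption, since the comment only says the property is conditioned on win/linux/osx. The `...` in the project path is elided here exactly as in the original description.

```xml
<!-- dotnet-aot.csproj (sketch): declare the RID so the centralized restore
     writes the RID-specific target into project.assets.json.
     The Condition shown is an assumed form of "conditioned on win/linux/osx". -->
<PropertyGroup Condition="'$(TargetRid)' != '' and ($(TargetRid.StartsWith('win')) or $(TargetRid.StartsWith('linux')) or $(TargetRid.StartsWith('osx')))">
  <RuntimeIdentifier>$(TargetRid)</RuntimeIdentifier>
</PropertyGroup>

<!-- PublishDotnetAot target (sketch): publish in a separate process and
     skip the implicit restore, relying on the centralized one. -->
<Exec Command="&quot;$(DotNetTool)&quot; publish &quot;...dotnet-aot.csproj&quot; -c $(Configuration) -r $(TargetRid) --no-restore -o &quot;$(_DotnetAotPublishDir)&quot;" />
```

With `--no-restore`, a missing RID-specific target fails the publish immediately, so a regression in the centralized restore would surface as NETSDK1047 again rather than being hidden by a second restore.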

@NikolaMilosavljevic
Member Author

NikolaMilosavljevic commented May 8, 2026

CI Results — Build 1413817

NETSDK1047 is fully resolved — zero occurrences across all platforms (was failing on main in 1413497).

4 unrelated failures:

| Leg | Error | Also on main? |
| --- | --- | --- |
| TestBuild: linux (x64) | IOException file lock on TemplateEngine.Core.dll | No (transient race) |
| AoT: macOS (x64) | ld_classic deprecated | Masked by NETSDK1047 |
| TestBuild: macOS (x64) | ld_classic deprecated | Masked by NETSDK1047 |
| TestBuild: windows (x64) | dotnet-watch.Tests flaky test | Yes (canceled/failed) |

The macOS ld_classic errors are a pre-existing Apple toolchain issue that was hidden on main because builds failed earlier with NETSDK1047 before reaching the native linker.

This needs to be merged on red to unblock SDK forward-flow dotnet/dotnet#6524

@marcpopMSFT

@MichaelSimons
Member

Is there an issue tracking "The macOS ld_classic errors"? If this is merged won't those build errors still block the vmr flow?

@NikolaMilosavljevic
Member Author

> Is there an issue tracking "The macOS ld_classic errors"? If this is merged won't those build errors still block the vmr flow?

I'm not sure and official builds did not see this, so it's very puzzling - perhaps a difference in build environment. Will dig deeper.

The Exec task re-parses linker output into MSBuild canonical error format,
turning the macOS linker deprecation warning 'ld: warning: -ld_classic is
deprecated' into 'ld(0,0): error :' which fails the build even though the
native library is successfully produced. IgnoreStandardErrorWarningFormat
prevents this re-parsing; actual failures are still caught by the non-zero
exit code and the subsequent file existence check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@NikolaMilosavljevic
Member Author

Added IgnoreStandardErrorWarningFormat='true' to the Exec task.

The macOS ld_classic failures were caused by our switch from <MSBuild> to <Exec>. The native linker emits ld: warning: -ld_classic is deprecated — harmless, and the native library is produced successfully. However, the Exec task re-parses process output using MSBuild's canonical error format, turning this into ld(0,0): error : which fails the build.

Evidence: main build 1413497 AoT macOS leg succeeded — same linker warning appears as ld: warning: (not re-parsed, since it uses <MSBuild> internally). Our PR build had the identical warning re-formatted into an error by the Exec task.

IgnoreStandardErrorWarningFormat prevents this re-parsing. Actual build failures are still caught by the non-zero exit code and the subsequent file existence check.
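
Roughly, the resulting shape is an `<Exec>` that does not re-parse output, followed by an explicit output check. This is a sketch, not the merged target: `_DotnetAotNativeLibPath` is a hypothetical property for the expected native library path, the error text is illustrative, and the command is abbreviated with the same elided `...` path as above.

```xml
<!-- Sketch: do not let Exec promote "ld: warning: ..." lines to MSBuild errors.
     A non-zero exit code still fails the task (IgnoreExitCode defaults to false). -->
<Exec Command="&quot;$(DotNetTool)&quot; publish &quot;...dotnet-aot.csproj&quot; -c $(Configuration) -r $(TargetRid) --no-restore -o &quot;$(_DotnetAotPublishDir)&quot;"
      IgnoreStandardErrorWarningFormat="true" />

<!-- Second safety net: fail if the native library was not produced.
     _DotnetAotNativeLibPath is a hypothetical name used for illustration. -->
<Error Condition="!Exists('$(_DotnetAotNativeLibPath)')"
       Text="dotnet-aot publish did not produce the expected native library: $(_DotnetAotNativeLibPath)" />
```

The trade-off is that genuine diagnostics from the child process are no longer surfaced as structured MSBuild errors. They appear only as plain log text, with the exit code and the file check deciding success.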

@NikolaMilosavljevic
Member Author

NikolaMilosavljevic commented May 8, 2026

Note: the IgnoreStandardErrorWarningFormat pattern follows the same approach used in Microsoft.NET.CrossGen.targets (lines 669-695), where it is set conditionally based on detecting Xcode 16 (which warns on -ld_classic). In our case we set it unconditionally because the -ld_classic flag originates from dotnet/runtime's native build (configurecompiler.cmake), invoked inside the dotnet publish subprocess — we have no visibility into which linker flags will be used. Actual build failures are still caught by the non-zero exit code and the subsequent file existence check.

@NikolaMilosavljevic
Member Author

NikolaMilosavljevic commented May 8, 2026

The remaining failure (linux x64 TestBuild) is in RunFileTests_BuildCommands (Build_Exe_MultiTarget, Pack_CustomPath) — tests for the file-based run feature (dotnet build file.cs, dotnet pack file.cs), unrelated to this PR. The same tests are not failing on main, so this may be a flaky test.

Rerunning the failed job just in case.

@NikolaMilosavljevic
Member Author

Linux test leg succeeded on retry - merging.

@NikolaMilosavljevic NikolaMilosavljevic merged commit 9a363f2 into dotnet:main May 8, 2026
24 checks passed
dsplaisted added a commit to dotnet/dotnet that referenced this pull request May 13, 2026
The Exec-based child `dotnet publish` introduced via flow of dotnet/sdk#54222
caused the spawned process to rebuild dotnet-aot.csproj's ProjectReferences
(Microsoft.DotNet.Cli.Utils, Cli.CoreUtils, NativeWrapper) without inheriting
the outer build's MSBuild global properties (DotNetBuild=True,
DotNetBuildFromVMR=True, Arcade/source-build flags, DebugType, signing, version
overrides). The child's rebuilds clobbered PDBs the outer build had produced,
breaking the outer Copy/Pack steps with MSB3030 / NU5026. This regressed
dotnet-unified-build verticals on dnceng/internal (build 2972531, after flow
PR #6524).

Verified from the failing build's sdk binlog that:
  - The centralized sdk.slnx restore IS RID-aware
    (RuntimeIdentifier=$(TargetRid) flows from the dotnet-aot.csproj declaration
    added in dotnet/sdk#54222).
  - ProcessFrameworkReferences on dotnet-aot.csproj during the SolutionRestore
    evaluation logs "Added PackageDownload for
    Microsoft.NETCore.App.Runtime.NativeAOT.win-x64@11.0.0-preview.5.26261.113".
  - The pack lands on disk at
    artifacts/.packages/microsoft.netcore.app.runtime.nativeaot.win-x64/.
So no separate Restore is required for runtime packs; the previous "Restore is
NOT skipped here" rationale was based on a misdiagnosis (the actual failure was
the PDB clobber, not missing runtime packs).

Removing the Exec also removes the need for the BuildManager-interference
workaround. With a single in-process <MSBuild Targets="Publish"> call, MSBuild
reuses the outer build's already-built ProjectReferences from its BuildManager
cache instead of re-running CoreCompile on them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
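
The in-process call described in this commit message might look like the sketch below. `_DotnetAotProject` is a hypothetical property standing in for the project path, and the `Properties` value is an assumption. The actual change may pass different properties, or none.

```xml
<!-- Sketch: a single in-process publish. No Restore call and no child process,
     so the outer build's global properties and already-built ProjectReferences apply. -->
<MSBuild Projects="$(_DotnetAotProject)"
         Targets="Publish"
         Properties="PublishDir=$(_DotnetAotPublishDir)" />
```

This relies on `RuntimeIdentifier` being declared in dotnet-aot.csproj itself, so the centralized restore has already produced the RID-specific target before the publish runs.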