Fix PublishDotnetAot non-deterministic NETSDK1047 with RestoreForce #54222
Pull request overview
This PR fixes a non-deterministic NETSDK1047 failure in the PublishDotnetAot target during layout generation by ensuring the RID-specific restore always updates project.assets.json, even when a prior centralized restore has already produced an assets file.
Changes:
- Forces the `dotnet-aot.csproj` restore invoked from `PublishDotnetAot` by setting `RestoreForce=true` to bypass NuGet's no-op restore optimization.
- Expands the in-target comment to document why `RestoreForce` is needed and why `RuntimeIdentifiers` (plural) is intentionally used.
|
Is there an issue we can log to track
Determinism seems like a useful thing here? |
Absolutely - will research and create a NuGet issue if there isn't one already. |
0f0ff75 to
025f1e2
Add RestoreForce=true to the PublishDotnetAot Restore invocation to bypass NuGet's no-op optimization, which non-deterministically skips updating project.assets.json when it was already written by Arcade's centralized restore without RuntimeIdentifier. Without RestoreForce, the Restore call with RuntimeIdentifiers (plural) sometimes adds the RID-specific target to the assets file and sometimes doesn't, causing intermittent NETSDK1047 errors on CI. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
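For readers unfamiliar with the pattern, a minimal sketch of what a forced restore inside the target might look like. The `$(AotProjectPath)` property is a hypothetical placeholder, and the actual target in the repository may pass additional properties; only `RestoreForce=true`, `RuntimeIdentifiers` (plural), and `$(TargetRid)` come from this conversation.

```xml
<Target Name="PublishDotnetAot">
  <!-- AotProjectPath is a hypothetical property pointing at dotnet-aot.csproj. -->
  <!-- RestoreForce=true asks NuGet to skip its no-op check and rewrite project.assets.json. -->
  <MSBuild Projects="$(AotProjectPath)"
           Targets="Restore"
           Properties="RuntimeIdentifiers=$(TargetRid);RestoreForce=true" />
</Target>
```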
025f1e2 to
76d6a9e
|
Failures are unrelated. |
|
I don't think that's the recommended approach here. @baronfel would you mind chiming in here? Why isn't the initial restore sufficient / why can't the initial restore declare all RIDs + dependencies? |
|
We'd need to see a binlog to be super-sure, but yes, the ideal is that as long as the project(s) are correctly specified (Publish* properties set, RuntimeIdentifier(s) set), the top-level restore should bring down all of the required assets without needing any other restore invocations to allow other publish operations to occur. |
|
@ViktorHofer @baronfel you are both right - this should be solved in the project by setting |
…sufficient Add RuntimeIdentifier= to dotnet-aot.csproj (conditioned on supported platforms) so that centralized NuGet restore via sdk.slnx generates the RID-specific target in project.assets.json. This eliminates the need for a separate restore during publish. Add --no-restore to the dotnet publish Exec command since the initial restore now produces the correct assets file. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
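A rough sketch of the project-level approach described in this commit, under stated assumptions: the condition shown here is illustrative only (the real change conditions on supported platforms, and that condition is not quoted in this conversation), and `$(AotProjectPath)` is a hypothetical property.

```xml
<!-- In dotnet-aot.csproj: declare the RID so the centralized restore emits the RID-specific target. -->
<!-- The Condition below is a placeholder, not the actual supported-platform check. -->
<PropertyGroup Condition="'$(TargetRid)' != ''">
  <RuntimeIdentifier>$(TargetRid)</RuntimeIdentifier>
</PropertyGroup>

<!-- In the publish target: the assets file is already correct, so publish can skip restore. -->
<Exec Command="dotnet publish &quot;$(AotProjectPath)&quot; --no-restore" />
```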
|
Updated based on feedback from @baronfel and @ViktorHofer:
Verified locally: restore produces both |
|
CI Results: Build 1413817 ✅
NETSDK1047 is fully resolved: zero occurrences across all platforms (was failing on main in 1413497).
4 unrelated failures:
The macOS This needs to be merged on red to unblock SDK forward-flow dotnet/dotnet#6524 |
|
Is there an issue tracking "The macOS ld_classic errors"? If this is merged won't those build errors still block the vmr flow? |
I'm not sure and official builds did not see this, so it's very puzzling - perhaps a difference in build environment. Will dig deeper. |
The Exec task re-parses linker output into MSBuild canonical error format, turning the macOS linker deprecation warning 'ld: warning: -ld_classic is deprecated' into 'ld(0,0): error :' which fails the build even though the native library is successfully produced. IgnoreStandardErrorWarningFormat prevents this re-parsing; actual failures are still caught by the non-zero exit code and the subsequent file existence check. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
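As an illustration of the change described above, a minimal sketch of an `Exec` call with canonical-format parsing disabled, followed by a file existence check. `$(AotProjectPath)` and `$(AotNativeLibraryPath)` are hypothetical property names, not taken from the repository.

```xml
<Target Name="PublishDotnetAot">
  <!-- IgnoreStandardErrorWarningFormat stops Exec from turning tool output such as -->
  <!-- "ld: warning: ..." into MSBuild errors. A non-zero exit code still fails the task. -->
  <Exec Command="dotnet publish &quot;$(AotProjectPath)&quot; --no-restore"
        IgnoreStandardErrorWarningFormat="true" />

  <!-- Second line of defense: fail if the expected output is missing. -->
  <Error Condition="!Exists('$(AotNativeLibraryPath)')"
         Text="Expected native library was not produced: $(AotNativeLibraryPath)" />
</Target>
```

`Exec` fails on a non-zero exit code by default (`IgnoreExitCode` is `false`), which is why the two checks together still catch real failures.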
|
Added The macOS Evidence: main build 1413497 AoT macOS leg succeeded — same linker warning appears as
|
|
Note: the |
|
The remaining failure (linux x64 TestBuild) is in Rerunning the failed job just in case. |
|
Linux test leg succeeded on retry - merging. |
The Exec-based child `dotnet publish` introduced via flow of dotnet/sdk#54222 caused the spawned process to rebuild dotnet-aot.csproj's ProjectReferences (Microsoft.DotNet.Cli.Utils, Cli.CoreUtils, NativeWrapper) without inheriting the outer build's MSBuild global properties (DotNetBuild=True, DotNetBuildFromVMR=True, Arcade/source-build flags, DebugType, signing, version overrides). The child's rebuilds clobbered PDBs the outer build had produced, breaking the outer Copy/Pack steps with MSB3030 / NU5026. This regressed dotnet-unified-build verticals on dnceng/internal (build 2972531, after flow PR #6524).

Verified from the failing build's sdk binlog that:
- The centralized sdk.slnx restore IS RID-aware (RuntimeIdentifier=$(TargetRid) flows from the dotnet-aot.csproj declaration added in dotnet/sdk#54222).
- ProcessFrameworkReferences on dotnet-aot.csproj during the SolutionRestore evaluation logs "Added PackageDownload for Microsoft.NETCore.App.Runtime.NativeAOT.win-x64@11.0.0-preview.5.26261.113".
- The pack lands on disk at artifacts/.packages/microsoft.netcore.app.runtime.nativeaot.win-x64/.

So no separate Restore is required for runtime packs; the previous "Restore is NOT skipped here" rationale was based on a misdiagnosis (the actual failure was the PDB clobber, not missing runtime packs).

Removing the Exec also removes the need for the BuildManager-interference workaround. With a single in-process <MSBuild Targets="Publish"> call, MSBuild reuses the outer build's already-built ProjectReferences from its BuildManager cache instead of re-running CoreCompile on them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
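A minimal sketch of the single in-process call this commit describes, assuming a hypothetical `$(AotProjectPath)` property. No extra `Properties` are passed here because the RID is declared in the project itself; the actual target may differ.

```xml
<Target Name="PublishDotnetAot">
  <!-- Runs Publish inside the current build, so the outer build's global properties -->
  <!-- still apply and no child process is spawned to rebuild the ProjectReferences. -->
  <MSBuild Projects="$(AotProjectPath)" Targets="Publish" />
</Target>
```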
Summary
Fixes the non-deterministic NETSDK1047 error in the `PublishDotnetAot` target that was introduced by #54175.

Root Cause
Arcade's centralized NuGet restore writes `project.assets.json` without `RuntimeIdentifier`, so the assets file lacks the RID-specific target (e.g. `net11.0/win-x64`) that the NativeAOT Publish step requires.

Previous attempts to fix this with nested `<MSBuild>` task calls for Restore+Publish failed non-deterministically in CI:
- `RuntimeIdentifiers` (plural) does not generate RID-specific targets in the lock file; only `RuntimeIdentifier` (singular) does
- With `RuntimeIdentifier` (singular) + `RestoreForce=true`, the nested `<MSBuild Targets="Restore">` followed by `<MSBuild Targets="Publish">` fails in CI's multi-node parallel build, despite working correctly locally, likely due to BuildManager project caching/scheduling interference between the two separate evaluations

Fix
Replace the two nested `<MSBuild>` calls (Restore + Publish) with a single `<Exec>` that runs `dotnet publish` in a separate process:
This ensures:
Verified locally: starting from a non-RID assets file (simulating centralized restore), `dotnet publish -r win-x64` correctly restores, builds, and produces the native library.

Previous CI Results