DWARF .debug_str "producer" string in shipped artifacts retaining debug info is non-deterministic #128159

@mthalman

Summary

Shipping artifacts that retain DWARF debug info (notably the static archive libnethost.a, and any separated .dbg companion files) embed the compiler version in the DWARF "producer" string stored in .debug_str. Two builds of the same source taken across a clang point-release bump produce different bytes in .debug_str solely because of the version digit in that string, even when no other DWARF content has changed.

Why this matters

A reproducibility validator that wants byte equality for shipped binaries today has only one option: pin the toolchain (clang/LLD) version that produced the original build and use the exact same version for the rebuild. That works, but it's a workaround with a structural failure mode: whenever the .NET runtime's toolchain rotates, there is a synchronization window during which the validator's pinned toolchain doesn't match what new builds use, and validation breaks until it catches up.

Unlike .comment in stripped binaries (tracked separately as a sibling issue; see F in our investigation), .debug_str lives inside debug info that downstream consumers may legitimately depend on. Debuggers and symbolizers read the producer string, so any normalization or rewrite of this section is visible to those consumers, which makes this the harder of the two ELF metadata leaks to address.

Background

DWARF .debug_str is the deduplicated string pool referenced from the debug info in .debug_info. Among other strings (file paths, type names, symbol names) it contains the compile unit's "producer" string (the DW_AT_producer attribute), which Clang sets to its own version banner (e.g. clang version 22.1.4). Each .o produced by Clang carries this in its .debug_str; when the objects ship unstripped (in .a archives or .dbg companion files), the string ends up in the released bits.

The .debug_str size and overall layout are stable across toolchain point releases; only the version digit in the producer string changes.
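As a rough illustration of the layout described above, the sketch below splits raw .debug_str bytes into their NUL-terminated strings and picks out the entries that look like a Clang producer banner. The function names and the tiny sample pool are hypothetical, and matching on the "clang version " prefix is a shortcut: a real tool would locate the producer through the compile unit's DW_AT_producer attribute rather than by guessing from the string text.

```python
def split_debug_str(section: bytes) -> list[str]:
    """Split raw .debug_str bytes into the strings they contain."""
    if not section:
        return []
    parts = section.split(b"\x00")
    # Every entry ends with a NUL, so the split leaves one empty trailing piece.
    if parts[-1] == b"":
        parts.pop()
    return [p.decode("utf-8", errors="replace") for p in parts]


def find_producers(strings: list[str]) -> list[str]:
    """Heuristic (an assumption): treat 'clang version ...' entries as producers."""
    return [s for s in strings if s.startswith("clang version ")]


# Hypothetical sample pool; real bytes would come from the .debug_str section
# of an unstripped .o file.
sample_pool = b"clang version 22.1.4\x00sccsid\x00char\x00"

strings = split_debug_str(sample_pool)
print(strings)                  # ['clang version 22.1.4', 'sccsid', 'char']
print(find_producers(strings))  # ['clang version 22.1.4']
```

The raw section bytes would have to be extracted from the object file first, for example with a tool such as objcopy --dump-section; that step is outside this sketch.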

Observed behavior

libnethost.a (shipped as part of Microsoft.NETCore.App.Host.linux-x64) is an ar archive of unstripped .o files. All 7 of its .o members differ between two builds of the same dotnet/dotnet VMR commit, and the .debug_str delta is confined to the producer string.

Example: the 169-byte .debug_str of _version.c.o:

Build A: ["clang version 22.1.3", "/__w/1/s/src/runtime/artifacts/obj/_version.c", "/crossrootfs/x64",
          "/__w/1/s/src/runtime/artifacts/obj/linux-x64.Release", "sccsid", "char", "__ARRAY_SIZE_TYPE__"]
Build B: ["clang version 22.1.4", "/__w/1/s/src/runtime/artifacts/obj/_version.c", "/crossrootfs/x64",
          "/__w/1/s/src/runtime/artifacts/obj/linux-x64.Release", "sccsid", "char", "__ARRAY_SIZE_TYPE__"]

Every other byte (paths, type names, sccsid, etc.) is identical. The only delta is the producer-string version digit.
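The comparison can be reproduced with a small Python sketch. It rebuilds the two string pools from the strings listed above, assuming they are stored in the listed order and each terminated by a single NUL byte (an assumption about the layout, not something read from the real object files), then compares them string by string and byte by byte. The helper names are hypothetical.

```python
from itertools import zip_longest

# Strings shared by both builds, copied from the example above.
COMMON = [
    "/__w/1/s/src/runtime/artifacts/obj/_version.c",
    "/crossrootfs/x64",
    "/__w/1/s/src/runtime/artifacts/obj/linux-x64.Release",
    "sccsid",
    "char",
    "__ARRAY_SIZE_TYPE__",
]


def build_pool(strings):
    """Concatenate strings as NUL-terminated entries (assumed layout)."""
    return b"".join(s.encode("utf-8") + b"\x00" for s in strings)


def diff_strings(a: bytes, b: bytes):
    """Return (index, old, new) for every positionally differing string."""
    sa = a.decode("utf-8", errors="replace").split("\x00")
    sb = b.decode("utf-8", errors="replace").split("\x00")
    return [(i, x, y) for i, (x, y) in enumerate(zip_longest(sa, sb)) if x != y]


def diff_offsets(a: bytes, b: bytes):
    """Return byte offsets that differ; only meaningful for equal-length pools."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


pool_a = build_pool(["clang version 22.1.3"] + COMMON)
pool_b = build_pool(["clang version 22.1.4"] + COMMON)

print(len(pool_a), len(pool_b))      # 169 169
print(diff_strings(pool_a, pool_b))  # [(0, 'clang version 22.1.3', 'clang version 22.1.4')]
print(diff_offsets(pool_a, pool_b))  # [19]
```

Under these assumptions the rebuilt pools are 169 bytes each, matching the size quoted above, and the single differing byte is the last character of the producer string.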

Affected files

  • packs/Microsoft.NETCore.App.Host.linux-x64/<ver>/runtimes/linux-x64/native/libnethost.a — all 7 .o members.
  • Any other shipping .a that retains debug info.
  • Any separated .dbg companion files shipped alongside stripped runtime libraries. These would carry the same .debug_str producer-string drift, but we have not confirmed them directly in the diff because our reproducibility comparison ran on the SDK tarball contents.
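To narrow down which members of an archive such as libnethost.a differ between two builds, a validator could walk both archives member by member. The following is a minimal sketch of a reader for the common ar layout (8-byte magic, 60-byte member headers, 2-byte alignment). It does not resolve GNU long-name tables or interpret the symbol table, and the function names, the other.o member, and the in-memory demo archives are hypothetical, not the real libnethost.a contents.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def iter_ar_members(data: bytes):
    """Yield (raw_name, payload) for each member of an ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        name = header[0:16].decode("ascii").rstrip()
        size = int(header[48:58].decode("ascii").strip())
        start = pos + HEADER_SIZE
        yield name, data[start:start + size]
        pos = start + size + (size & 1)  # members start on even offsets


def differing_members(old: bytes, new: bytes):
    """Return raw names of members whose payload differs, compared by position.

    zip() stops at the shorter archive, so a member-count mismatch would need
    a separate check.
    """
    pairs = zip(iter_ar_members(old), iter_ar_members(new))
    return [name for (name, body_a), (_, body_b) in pairs if body_a != body_b]


def build_ar(members):
    """Build a tiny archive in memory so the demo needs no real files."""
    out = bytearray(AR_MAGIC)
    for name, body in members:
        fields = f"{name:<16}{0:<12}{0:<6}{0:<6}{'644':<8}{len(body):<10}"
        out += fields.encode("ascii") + b"`\n" + body
        if len(body) % 2:
            out += b"\n"  # pad odd-sized payloads to an even offset
    return bytes(out)


# Stand-in payloads: real members would be ELF objects, not bare strings.
old = build_ar([("_version.c.o/", b"clang version 22.1.3\x00"), ("other.o/", b"same")])
new = build_ar([("_version.c.o/", b"clang version 22.1.4\x00"), ("other.o/", b"same")])

print(differing_members(old, new))  # ['_version.c.o/']
```

A real check would then extract .debug_str from each differing member and confirm that the producer string is the only delta, as in the _version.c.o example.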

Context

Found while building the SDK reproducibility validation test for the dotnet/dotnet VMR (dotnet/source-build#5486). Resolution of this issue is important to meet the goal of reproducible builds: dotnet/source-build#4963
