Skip to content
/ cecil Public
forked from jbevain/cecil
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deterministic MVID and add PdbChecksum #31

Merged
merged 5 commits into from
Nov 11, 2021

Conversation

vitek-karas
Copy link
Member

@vitek-karas vitek-karas commented Nov 8, 2021

Make deterministic MVID unique even when the only change is in PE headers. Implement deterministic PDB ID generation independent of MVID and implement writing of PdbCheckum debug header entry.

Observable differences (other than the above):

  • ISymbolWriter now writes the PDB in GetDebugHeader (well except for MPDB writer which I didn't change as it's really tricky). Callers should not notice much of a difference if they assumed only Dispose would actually write.
  • Module.Write(stream now requires the stream to be read/write. It already did so for cases where strong name was written, now it requires it also when deterministic MVID is turned on.

Deterministic MVID
So far the deterministic MVID was computed as a hash of the IL metadata. It didn't include PE headers.
This means that if there's an assembly which only differs in the PE architecture it would have the same MVID.
This can happen relatively easily for facades (assemblies with only type forwarders) as those are typically
identical across platforms, possibly except for the architecture.
The correct way to calculate MVID is to hash the content of the entire file with the MVID itself and
the strong name signature (if presenet) zeroed out.
This change modified the MVID computation to work that way: Write the entire image with MVID all zeroes,
compute the hash, write the MVID and then compute the strong name.

PdbChecksum
In order for the produced PDB to be fully verifiable the assembly must contain a PdbChecksum
debug header entry which has a cryptographic hash of the associated PDB. The wayt to calculate the checksum
is prescribed as other programs must be able to validate it. See https://github.com/dotnet/runtime/blob/main/docs/design/specs/PE-COFF.md#pdb-checksum-debug-directory-entry-type-19.
This change implements that behavior. Symbols are now written explicitly once all metadata is processed
but before the final assembly image is written. For portable PDBs the full file is written with PDB ID
set to all zeroes. Then the crypto hash is calculated (using SHA256).
The hash is then written into the PdbChecksum debug header entry of the assembly image.

Deterministic Portable PDB ID
So far the PDB ID was set to the same value as MVID. With the fix for MVID described above, this doesn't
work anymore. In case of an embedded PDB, it's not possible to calculate MVID using the hash
with the PDB embedded, as the PDB would now contain the MVID (cycle).
The recommended way to calculate PDB ID is to use the same mechanism as for calculating PDB checksum
and use the first 20 bytes of the has as the PDB ID.
This change implements this by using the first 20 bytes of the hash and sets the PDB ID as the last
change of the portable PDB.
The calculated PDB ID is also then set into the CodeView debug header entry of the assembly.

Added tests for reading and writing both portable and embedded portable PDBs with checksums.
Added test to validate stability of PDB ID across multiple writes.
Added test to validate stability of MVID across multiple writes and the fact that it changes
when just the PE architecture is changed.

This is a fix for dotnet/linker#2203.

Deterministic MVID
So far the deterministic MVID was computed as a hash of the IL metadata. It didn't include PE headers.
This means that if there's an assembly which only differs in the PE architecture it would have the same MVID.
This can happen relatively easily for facades (assemblies with only type forwarders) as those are typically
identical across platforms, possibly except for the architecture.
The correct way to calculate MVID is to hash the content of the entire file with the MVID itself and
the strong name signature (if presenet) zeroed out.
This change modified the MVID computation to work that way: Write the entire image with MVID all zeroes,
compute the hash, write the MVID and then compute the strong name.

PdbChecksum
In order for the produced PDB to be fully verifiable the assembly must contain a PdbChecksum
debug header entry which has a cryptographic hash of the associated PDB. The wayt to calculate the checksum
is prescribed as other programs must be able to validate it. See https://github.com/dotnet/runtime/blob/main/docs/design/specs/PE-COFF.md#pdb-checksum-debug-directory-entry-type-19.
This change implements that behavior. Symbols are now written explicitly once all metadata is processed
but before the final assembly image is written. For portable PDBs the full file is written with PDB ID
set to all zeroes. Then the crypto hash is calculated (using SHA256).
The hash is then written into the PdbChecksum debug header entry of the assembly image.

Deterministic Portable PDB ID
So far the PDB ID was set to the same value as MVID. With the fix for MVID described above, this doesn't
work anymore. In case of an embedded PDB, it's not possible to calculate MVID using the hash
with the PDB embedded, as the PDB would now contain the MVID (cycle).
The recommended way to calculate PDB ID is to use the same mechanism as for calculating PDB checksum
and use the first 20 bytes of the has as the PDB ID.
This change implements this by using the first 20 bytes of the hash and sets the PDB ID as the last
change of the portable PDB.
The calculated PDB ID is also then set into the CodeView debug header entry of the assembly.

Added tests for reading and writing both portable and embedded portable PDBs with checksums.
Added test to validate stability of PDB ID across multiple writes.
Added test to validate stability of MVID across multiple writes and the fact that it changes
when just the PE architecture is changed.
@@ -171,7 +171,7 @@ public ISymbolReader GetSymbolReader (ModuleDefinition module, string fileName)
throw new InvalidOperationException ();

return new EmbeddedPortablePdbReader (
(PortablePdbReader) new PortablePdbReaderProvider ().GetSymbolReader (module, GetPortablePdbStream (entry)));
(PortablePdbReader)new PortablePdbReaderProvider ().GetSymbolReader (module, GetPortablePdbStream (entry)));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please undo any line damage or formating changes, they make upstream merges pain

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently I hit something in VS which reformatted the entire file. I'm so glad we have lint in dotnet/linker 😉.
Anyway - hopefully I was able to revert all of these now.

@vitek-karas vitek-karas changed the title [WIP] Fix deterministic MVID and add PdbChecksum Fix deterministic MVID and add PdbChecksum Nov 10, 2021
@vitek-karas
Copy link
Member Author

This should be fully ready now.

Copy link

@marek-safar marek-safar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to make upstream PR after you merge it here

@marek-safar marek-safar merged commit ff616bf into dotnet:main Nov 11, 2021
@vitek-karas vitek-karas deleted the MVIDArchSensitive branch November 11, 2021 17:54
@vitek-karas
Copy link
Member Author

Ported this to jbevain/cecil as jbevain#810.

vitek-karas added a commit to vitek-karas/linker that referenced this pull request Nov 18, 2021
The new Cecil fixes the deterministic MVID and PDB checksum.
Details in dotnet/cecil#31.
vitek-karas added a commit to dotnet/linker that referenced this pull request Nov 18, 2021
The new Cecil fixes the deterministic MVID and PDB checksum.
Details in dotnet/cecil#31.
marek-safar pushed a commit that referenced this pull request Jan 20, 2022
* Fix deterministic MVID and add PdbChecksum (#31)

* Fix how pdb path is calculated in the tests

* Fix portable PDB stamp in CodeView header (#32)

* Introduce ISymbolWriter.Write

This mostly cleans up the code to make it easier to understand. `ISymbolWriter.GetDebugHeader` no longer actually writes the symbols, there's a new `Write` method for just that.

The assembly writer calls `Write` first and then the image writer calls `GetDebugHeader` when it's needed.

This is partially taken from jbevain#617.
marek-safar added a commit that referenced this pull request Jan 20, 2022
agocke pushed a commit to dotnet/runtime that referenced this pull request Nov 16, 2022
…otnet/linker#2384)

The new Cecil fixes the deterministic MVID and PDB checksum.
Details in dotnet/cecil#31.

Commit migrated from dotnet/linker@4f4a670
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants