Merge pull request #408 from Washi1337/development
5.1.0
Washi1337 committed Jan 29, 2023
2 parents 72f9b45 + 21630a9 commit d70c022
Showing 73 changed files with 1,874 additions and 248 deletions.
2 changes: 1 addition & 1 deletion Directory.Build.props
@@ -7,7 +7,7 @@
<RepositoryUrl>https://github.com/Washi1337/AsmResolver</RepositoryUrl>
<RepositoryType>git</RepositoryType>
<LangVersion>10</LangVersion>
<Version>5.0.0</Version>
<Version>5.1.0</Version>
</PropertyGroup>

</Project>
4 changes: 2 additions & 2 deletions appveyor.yml
@@ -4,7 +4,7 @@
- master

image: Visual Studio 2022
version: 5.0.0-master-build.{build}
version: 5.1.0-master-build.{build}
configuration: Release

skip_commits:
@@ -33,7 +33,7 @@
- development

image: Visual Studio 2022
version: 5.0.0-dev-build.{build}
version: 5.1.0-dev-build.{build}
configuration: Release

skip_commits:
250 changes: 250 additions & 0 deletions docs/core/segments.rst
@@ -0,0 +1,250 @@
.. _segments:

Reading and Writing File Segments
=================================

Segments are the basis of everything in AsmResolver.
They are the fundamental building blocks that together make up a binary file (such as a PE file).
Segments are organized as a tree, where the leaves are single contiguous chunks of memory, while the nodes are segments that comprise multiple smaller sub-segments.
The aim of segments is to abstract away the complicated mess that comes with calculating offsets, sizes and updating them accordingly, allowing programmers to easily read binary files, as well as construct new ones.

Every class that directly translates to a concrete segment in a file on the disk implements the ``ISegment`` interface.
In the following, some of the basics of ``ISegment`` as well as common examples will be introduced.


Basic Data Segments
-------------------

The simplest and arguably the most commonly used form of segment is the ``DataSegment`` class.
This class wraps a ``byte[]`` in an instance of ``ISegment``, allowing it to be used in any context where a segment is expected in AsmResolver.

.. code-block:: csharp

    byte[] data = new byte[] { 1, 2, 3, 4 };
    var segment = new DataSegment(data);

While the name of the ``DataSegment`` class implies it is used for defining literal data (such as a constant for a variable), it can be used to define *any* type of contiguous memory.
This also includes a raw code stream of a function body and sometimes entire program sections.


Reading Segment Contents
------------------------

Some implementations of ``ISegment`` (such as ``DataSegment``) allow for reading binary data directly.
Segments that allow for this implement ``IReadableSegment``, which defines a function ``CreateReader`` that can be used to create an instance of ``BinaryStreamReader`` that starts at the beginning of the raw contents of the segment.
This reader can then be used to read the contents of the segment.

.. code-block:: csharp

    byte[] data = new byte[] { 1, 2, 3, 4 };
    IReadableSegment segment = new DataSegment(data);

    var reader = segment.CreateReader();
    reader.ReadByte(); // returns 1
    reader.ReadByte(); // returns 2
    reader.ReadByte(); // returns 3
    reader.ReadByte(); // returns 4
    reader.ReadByte(); // throws EndOfStreamException.

Alternatively, an ``IReadableSegment`` can be turned into a ``byte[]`` quickly using the ``ToArray()`` method.

.. code-block:: csharp

    byte[] data = new byte[] { 1, 2, 3, 4 };
    IReadableSegment segment = new DataSegment(data);

    byte[] allData = segment.ToArray(); // Returns { 1, 2, 3, 4 }

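The reader is not limited to single bytes. The following is a minimal sketch of reading wider values from the same segment. It assumes the ``ReadUInt16`` method of ``BinaryStreamReader`` and little-endian decoding, neither of which is shown in the text above.

.. code-block:: csharp

    using AsmResolver;
    using AsmResolver.IO;

    byte[] data = new byte[] { 1, 2, 3, 4 };
    IReadableSegment segment = new DataSegment(data);

    var reader = segment.CreateReader();

    // Each call consumes two bytes and interprets them as little-endian.
    ushort first = reader.ReadUInt16();  // bytes 01 02 -> 0x0201
    ushort second = reader.ReadUInt16(); // bytes 03 04 -> 0x0403

Reading through the reader avoids copying the whole segment into a new array, which ``ToArray()`` would do.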

Composing New Segments
----------------------

Many segments comprise multiple smaller sub-segments.
For example, PE sections often do not contain just a single data structure, but are a collection of structures concatenated together.
To facilitate more complicated structures like these, the ``SegmentBuilder`` class can be used to combine ``ISegment`` instances into one effortlessly:

.. code-block:: csharp

    var builder = new SegmentBuilder();
    builder.Add(new DataSegment(...));
    builder.Add(new DataSegment(...));

Many segments in an executable file format must be aligned to a certain byte boundary.
The ``SegmentBuilder::Add`` method allows for specifying this alignment, and automatically adjusts the offsets and sizes accordingly:

.. code-block:: csharp

    var builder = new SegmentBuilder();

    // Add some segment with potentially a size that is not a multiple of 4 bytes.
    builder.Add(new DataSegment(...));

    // Ensure the next segment is aligned to a 4-byte boundary in the final file.
    builder.Add(new DataSegment(...), alignment: 4);

Since ``SegmentBuilder`` implements ``ISegment`` itself, it can also be used within another ``SegmentBuilder``, allowing for recursive constructions like the following:

.. code-block:: csharp

    var child = new SegmentBuilder();
    child.Add(new DataSegment(...));
    child.Add(new DataSegment(...));

    var root = new SegmentBuilder();
    root.Add(new DataSegment(...));
    root.Add(child); // Nest segment builders into each other.

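To see what the builder does with alignment, the sketch below fills in concrete data for the elided arguments and prints where each sub-segment ends up. The byte values and the start offset of zero are arbitrary choices for illustration, and the call to ``UpdateOffsets`` relies on the API described in the section on calculating offsets and sizes.

.. code-block:: csharp

    using System;
    using AsmResolver;

    var header = new DataSegment(new byte[] { 1, 2, 3 });   // 3 bytes, not a multiple of 4.
    var body = new DataSegment(new byte[] { 4, 5, 6, 7 });

    var builder = new SegmentBuilder();
    builder.Add(header);
    builder.Add(body, alignment: 4);

    // Assign offsets to the builder and, recursively, to its sub-segments.
    builder.UpdateOffsets(new RelocationParameters(offset: 0, rva: 0));

    Console.WriteLine("Header offset: 0x{0:X}", header.Offset);
    Console.WriteLine("Body offset:   0x{0:X}", body.Offset);
    Console.WriteLine("Total size:    0x{0:X}", builder.GetPhysicalSize());

With these inputs, the second segment should land on a multiple of 4, with the padding counted towards the total size of the builder.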

Resizing Segments at Runtime
----------------------------

Most segments in an executable file retain their size at runtime.
However, some segments (such as a ``.bss`` section in a PE file) may be resized upon mapping it into memory.
AsmResolver represents these segments using the ``VirtualSegment`` class:

.. code-block:: csharp

    var physicalContents = new DataSegment(new byte[] {1, 2, 3, 4});
    section.Contents = new VirtualSegment(physicalContents, 0x1000); // Create a new segment with a virtual size of 0x1000 bytes.

Patching Segments
-----------------

Some use-cases of AsmResolver require segments to be hot-patched with new data after serialization.
This is done via the ``PatchedSegment`` class.

Any segment can be wrapped into a ``PatchedSegment`` via its constructor:

.. code-block:: csharp

    using AsmResolver.Patching;

    ISegment segment = ...
    var patchedSegment = new PatchedSegment(segment);

Alternatively, you can use the (preferred) fluent syntax:

.. code-block:: csharp

    using AsmResolver.Patching;

    ISegment segment = ...
    var patchedSegment = segment.AsPatchedSegment();

Applying the patches can then be done by repeatedly calling one of the ``Patch`` method overloads.
Below is an example of patching a section within a PE file:

.. code-block:: csharp

    var peFile = PEFile.FromFile("input.exe");
    var section = peFile.Sections.First(s => s.Name == ".text");

    // Assumption: the symbol is looked up in the PE image of the same file.
    var peImage = PEImage.FromFile(peFile);
    var someSymbol = peImage
        .Imports.First(m => m.Name == "ucrtbased.dll")
        .Symbols.First(s => s.Name == "puts");

    section.Contents = section.Contents.AsPatchedSegment() // Create patched segment.
        .Patch(offset: 0x10, data: new byte[] {1, 2, 3, 4}) // Apply literal bytes patch.
        .Patch(offset: 0x20, AddressFixupType.Absolute64BitAddress, someSymbol); // Apply address fixup patch.

The patching API can be extended by implementing the ``IPatch`` interface yourself.
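
Patching does not require a PE file. As a smaller sketch, the snippet below applies a literal bytes patch to an in-memory ``DataSegment`` and serializes the result. The buffer contents and the patch offset are made up for illustration, and the ``Patch`` arguments are passed positionally.

.. code-block:: csharp

    using AsmResolver;
    using AsmResolver.Patching;

    // Eight zero bytes as a stand-in for real segment contents.
    ISegment original = new DataSegment(new byte[] { 0, 0, 0, 0, 0, 0, 0, 0 });

    var patched = original
        .AsPatchedSegment()                    // Wrap the segment.
        .Patch(2, new byte[] { 0xAA, 0xBB });  // Overwrite two bytes at relative offset 2.

    // Patches are applied when the segment is serialized.
    byte[] result = patched.WriteIntoArray();

The original ``DataSegment`` is left as-is. Only the serialized output is expected to carry the patched bytes.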


Calculating Offsets and Sizes
-----------------------------

Typically, the ``ISegment`` API aims to abstract away any raw offset, relative virtual address (RVA), and/or size of a data structure within a binary file.
However, in case the final offset and/or size of a segment still need to be determined and used (e.g., when implementing new segments), it is important to understand how this is done.

Two properties are responsible for representing the offsets:

- ``Offset``: The starting file or memory address of the segment.
- ``Rva``: The virtual address of the segment, relative to the executable's image base at runtime.


Typically, these properties are read-only and managed by AsmResolver itself.
However, to update the offsets and RVAs of a segment, you can call the ``UpdateOffsets`` method.
This method traverses the entire segment recursively, and updates the offsets accordingly.

.. code-block:: csharp

    ISegment segment = ...

    // Relocate a segment to an offset-RVA pair:
    segment.UpdateOffsets(new RelocationParameters(offset: 0x200, rva: 0x2000));

    Console.WriteLine("Offset: 0x{0:X8}", segment.Offset); // Prints 0x00000200
    Console.WriteLine("Rva: 0x{0:X8}", segment.Rva); // Prints 0x00002000

.. warning::

    Try to call ``UpdateOffsets()`` as sparingly as possible.
    The method does a full pass on the entire segment, and updates all offsets of all sub-segments as well.
    It can thus be very inefficient to call it repeatedly.

The size (in bytes) of a segment can be calculated using either the ``GetPhysicalSize()`` or the ``GetVirtualSize()`` method.
Typically, these two measurements are equal, but for some segments (such as a ``VirtualSegment``) they may differ:

.. code-block:: csharp

    ISegment segment = ...

    // Measure the size of the segment:
    uint physicalSize = segment.GetPhysicalSize();
    uint virtualSize = segment.GetVirtualSize();

    Console.WriteLine("Physical (File) Size: 0x{0:X8}", physicalSize);
    Console.WriteLine("Virtual (Runtime) Size: 0x{0:X8}", virtualSize);

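As a concrete case where the two sizes differ, the sketch below measures the ``VirtualSegment`` from the earlier section on resizing. It reuses the same 4-byte contents and virtual size of 0x1000. The start offset of zero is an arbitrary choice.

.. code-block:: csharp

    using System;
    using AsmResolver;

    var physicalContents = new DataSegment(new byte[] { 1, 2, 3, 4 });
    var segment = new VirtualSegment(physicalContents, 0x1000);

    // Bring the offsets up to date before measuring.
    segment.UpdateOffsets(new RelocationParameters(offset: 0, rva: 0));

    uint physicalSize = segment.GetPhysicalSize(); // 4 bytes are stored in the file.
    uint virtualSize = segment.GetVirtualSize();   // 0x1000 bytes are reserved at runtime.

    Console.WriteLine("Physical (File) Size: 0x{0:X8}", physicalSize);
    Console.WriteLine("Virtual (Runtime) Size: 0x{0:X8}", virtualSize);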

.. warning::

    Only call ``GetPhysicalSize()`` and ``GetVirtualSize()`` whenever you know the offsets of the segment are up to date.
    Due to padding requirements, many segments will have a slightly different size depending on the final file offset they are placed at.

.. warning::

    Try to call ``GetPhysicalSize()`` and ``GetVirtualSize()`` as sparingly as possible.
    These methods do a full pass on the entire segment, and measure the total number of bytes required to represent it.
    It can thus be very inefficient to call them repeatedly.

Serializing Segments
--------------------

Segments are serialized using the ``ISegment::Write`` method.

.. code-block:: csharp

    ISegment segment = ...

    using var stream = new MemoryStream();
    segment.Write(new BinaryStreamWriter(stream));

    byte[] serializedData = stream.ToArray();

Alternatively, you can quickly serialize a segment to a ``byte[]`` using the ``WriteIntoArray()`` extension method:

.. code-block:: csharp

    ISegment segment = ...

    byte[] serializedData = segment.WriteIntoArray();

.. warning::

    Only call ``Write`` whenever you know the offsets of the segment are up to date.
    Many segments will contain offsets to other segments in the file, which may not be accurate until all offsets are calculated.
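
Putting the pieces together, a rough end-to-end sketch could look as follows: compose a segment, bring its offsets up to date, and only then serialize it. The byte values and the start offset of zero are placeholders chosen for illustration.

.. code-block:: csharp

    using System;
    using AsmResolver;

    var builder = new SegmentBuilder();
    builder.Add(new DataSegment(new byte[] { 1, 2, 3 }));
    builder.Add(new DataSegment(new byte[] { 4, 5, 6, 7 }), alignment: 4);

    // Update the offsets before writing, as the warning above advises.
    builder.UpdateOffsets(new RelocationParameters(offset: 0, rva: 0));

    byte[] serialized = builder.WriteIntoArray();

    // Dump the bytes, including any padding inserted for alignment.
    Console.WriteLine(BitConverter.ToString(serialized));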
4 changes: 2 additions & 2 deletions docs/dotnet/advanced-module-reading.rst
@@ -3,7 +3,7 @@
Advanced Module Reading
=======================

Advanced users might have the need to configure AsmResolver's module reader. For example, instead of letting the module reader throw exceptions upon reading invalid data, errors should be ignored and recovered from. Other uses might include changing the way the underlying PE or method bodies are read. These kinds of settings can be configured using the ``ModuleReaderParameters`` class.
Advanced users might need to configure AsmResolver's module reader. For example, instead of letting the module reader throw exceptions upon reading invalid data, errors should be ignored and recovered from. Other uses might include changing the way the underlying PE or method bodies are read. These kinds of settings can be configured using the ``ModuleReaderParameters`` class.

.. code-block:: csharp
@@ -131,7 +131,7 @@ To let the reader use this implementation of the ``IMethodBodyReader``, set the
Custom Field RVA reading
------------------------

By default, the field RVA data storing the initial binary value of a field is interpreted as raw byte blobs, and are turned into instances of the ``DataSegment`` class. To adjust this behaviour, it is possible provide a custom implementation of the ``IFieldRvaDataReader`` interface.
By default, the field RVA data storing the initial binary value of a field is interpreted as a raw byte blob, and is turned into an instance of the ``DataSegment`` class. To adjust this behaviour, it is possible to provide a custom implementation of the ``IFieldRvaDataReader`` interface.


.. code-block:: csharp
37 changes: 30 additions & 7 deletions docs/dotnet/advanced-pe-image-building.rst
@@ -12,7 +12,7 @@ The easiest way to write a .NET module to the disk is by using the ``Write`` met
This method is essentially a shortcut for invoking the ``ManagedPEImageBuilder`` and ``ManagedPEFileBuilder`` classes, and will completely reconstruct the PE image, serialize it into a PE file and write the PE file to the disk.

While this is easy, and would probably work for most .NET module processing, it does not provide much flexibility. To get more control over the construction of the new PE image, it is therefore not recommended to use a different overload of the ``Write`` method, were we pass on a custom ``IPEFileBuilder``, or a configured ``ManagedPEImageBuilder``:
While this is easy, and would probably work for most .NET module processing, it does not provide much flexibility. To get more control over the construction of the new PE image, it is therefore recommended to use a different overload of the ``Write`` method, where we pass on a custom ``IPEFileBuilder``, or a configured ``ManagedPEImageBuilder``:

.. code-block:: csharp
@@ -22,7 +22,7 @@ While this is easy, and would probably work for most .NET module processing, it
module.Write(@"C:\Path\To\Output\Binary.exe", imageBuilder);
Alternatively, it is possible to call the ``CreateImage`` method directly. This allows for inspecting all build artifacts, as well as post processing of the constructed PE image before it is written to the disk.
Alternatively, it is possible to call the ``CreateImage`` method directly. This allows for inspecting all build artifacts, as well as post-processing of the constructed PE image before it is written to the disk.

.. code-block:: csharp
@@ -72,10 +72,34 @@ Some .NET modules are carefully crafted and rely on the raw structure of all met

- RIDs of rows within a metadata table.
- Indices of blobs within the ``#Blob``, ``#Strings``, ``#US`` or ``#GUID`` heaps.

The default PE image builder for .NET modules (``ManagedPEImageBuilder``) defines a property called ``DotNetDirectoryFactory``, which contains the object responsible for constructing the .NET data directory, can be configured to preserve as much of this structure as possible. With the help of the ``MetadataBuilderFlags`` enum, it is possible to indicate which structures of the metadata directory need to preserved.

Below an example on how to configure the image builder to preserve blob data and all metadata tokens to type references:
- Unknown or unconventional metadata streams and their order.

The default PE image builder for .NET modules (``ManagedPEImageBuilder``) defines a property called ``DotNetDirectoryFactory``, which contains the object responsible for constructing the .NET data directory, and which can be configured to preserve as much of this structure as possible. With the help of the ``MetadataBuilderFlags`` enum, it is possible to indicate which structures of the metadata directory need to be preserved. The following table provides an overview of all preservation metadata builder flags that can be used and combined:

+----------------------------------------+-------------------------------------------------------------------+
| Flag                                   | Description                                                       |
+========================================+===================================================================+
| ``PreserveXXXIndices``                 | Preserves all row indices of the original ``XXX`` metadata table. |
+----------------------------------------+-------------------------------------------------------------------+
| ``PreserveTableIndices``               | Preserves all row indices from all original metadata tables.      |
+----------------------------------------+-------------------------------------------------------------------+
| ``PreserveBlobIndices``                | Preserves all blob indices in the ``#Blob`` stream.               |
+----------------------------------------+-------------------------------------------------------------------+
| ``PreserveGuidIndices``                | Preserves all GUID indices in the ``#GUID`` stream.               |
+----------------------------------------+-------------------------------------------------------------------+
| ``PreserveStringIndices``              | Preserves all string indices in the ``#Strings`` stream.          |
+----------------------------------------+-------------------------------------------------------------------+
| ``PreserveUserStringIndices``          | Preserves all user-string indices in the ``#US`` stream.          |
+----------------------------------------+-------------------------------------------------------------------+
| ``PreserveUnknownStreams``             | Preserves any of the unknown / unconventional metadata streams.   |
+----------------------------------------+-------------------------------------------------------------------+
| ``PreserveStreamOrder``                | Preserves the original order of all metadata streams.             |
+----------------------------------------+-------------------------------------------------------------------+
| ``PreserveAll``                        | Preserves as much of the original metadata as possible.           |
+----------------------------------------+-------------------------------------------------------------------+


Below is an example of how to configure the image builder to preserve blob data and all metadata tokens to type references:

.. code-block:: csharp
@@ -84,7 +108,6 @@ Below an example on how to configure the image builder to preserve blob data and
| MetadataBuilderFlags.PreserveTypeReferenceIndices;
imageBuilder.DotNetDirectoryFactory = factory;
If everything is supposed to be preserved as much as possible, then instead of specifying all flags defined in the ``MetadataBuilderFlags`` enum, we can also use ``MetadataBuilderFlags.PreserveAll`` as a shortcut.
.. warning::
