Skip to content

Commit

Permalink
Merge pull request #9307 from JanKrivanek/proto/binlog-events-offsets
Browse files Browse the repository at this point in the history
Binlogs Redacting support + Binlogs forward-compatibility reading support
  • Loading branch information
JanKrivanek committed Jan 16, 2024
2 parents d0af670 + 38f5791 commit d53e436
Show file tree
Hide file tree
Showing 40 changed files with 3,795 additions and 471 deletions.
98 changes: 96 additions & 2 deletions documentation/wiki/Binary-Log.md
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ It is a `GZipStream`-compressed binary stream of serialized `BuildEventArgs` obj

## Incrementing the file format

Every .binlog file has the first three bytes that indicate the file version. The current file format version is 9 (`00 00 09`).
Every .binlog file has the first four bytes that indicate the file version. The current file format is indicated in [`BinaryLogger.cs`](/src/Build/Logging/BinaryLogger/BinaryLogger.cs).

When incrementing the file format, keep this in mind:
* Increment the version and add a summary of the changes: https://github.com/dotnet/msbuild/blob/main/src/Build/Logging/BinaryLogger/BinaryLogger.cs#L22
Expand All @@ -94,4 +94,98 @@ When incrementing the file format, keep this in mind:

The format is backwards compatible, i.e. MSBuild will be able to play back .binlog files created with an older version of MSBuild. The Viewer will also be able to open files of any older version. Since the viewer updates automatically and I can push out updates easily, we can consider the Viewer is always able to read all .binlogs.

However MSBuild of version 15.3 won't be able to read .binlogs created with MSBuild version 15.6. This means the format is unfortunately not forwards-compatible. It is not self-describing, i.e. it doesn't carry its schema around for performance and compactness reasons. This is not a problem with a Viewer because Viewer is always up-to-date (there isn't an "old version" of the Viewer unless people go to great lengths to prevent it from auto-updating).
## Forward compatibility reading

From version 18, the binlog contains as well the minimum version of reader that can interpret it (stored in bytes 4 to 8). Support for best effort forward compatibility is added by this version. It is “best effort” only because the binlog format is not self-describing, i.e. it doesn't carry its schema around for performance and compactness reasons.

This is not of a high importance for users of the Viewer because Viewer is always up-to-date (there isn't an "old version" of the Viewer unless people go to great lengths to prevent it from auto-updating).

## Reading API

We recommend usage of `BinaryLogReplayEventSource`. It provides simplified helpers for creating and configuring `BuildEventArgsReader` and subscribing to the events.

```csharp
var logReader = new BinaryLogReplayEventSource()
{
AllowForwardCompatibility = true
};

// Handling of the structured events contained within the log
logReader.AnyEventRaised += (_, e) =>
{
if (e is BuildErrorEventArgs error)
{
//...
}

// ...
};

// Starts the synchronous log reading loop.
logReader.Replay(path_to_binlog_file);

```

### Handling the recoverable reading errors

In compatibility mode (default for `BinaryLogReplayEventSource`. Only supported for binlogs of version 18 and higher) reader is capable of skipping unknown event types and unknown parts of known events (`BuildEventArgsReader` can configure the behavior via 2 separate properties - `SkipUnknownEvents` and `SkipUnknownEventParts`).

The unknown events and event parts are regarded as recoverable errors, since the reader is able to continue reading subsequent records in the binlog. However the specific user logic should have the last call in deciding whether errors are really recoverable (e.g. is presence of unrecognized or unparseable event ok? It might be fine when searching only for specific events - e.g. errors but not acceptable when trying to provide definitive overview of the built).

To allow the calling code to decide - based on the type of error, type of events getting the error, or the number of errors - the `RecoverableReadError` event is exposed (from both `BinaryLogReplayEventSource` and `BuildEventArgsReader`).

```csharp
/// <summary>
/// An event args for <see cref="IBinaryLogReaderErrors.RecoverableReadError"/> event.
/// </summary>
public sealed class BinaryLogReaderErrorEventArgs : EventArgs
{
/// <summary>
/// Type of the error that occurred during reading.
/// </summary>
public ReaderErrorType ErrorType { get; }

/// <summary>
/// Kind of the record that encountered the error.
/// </summary>
public BinaryLogRecordKind RecordKind { get; }

/// <summary>
/// Materializes the error message.
/// Until it's called the error message is not materialized and no string allocations are made.
/// </summary>
/// <returns>The error message.</returns>
public string GetFormattedMessage() => _formatErrorMessage();
}

/// <summary>
/// Receives recoverable errors during reading.
/// Communicates type of the error, kind of the record that encountered the error and the message detailing the error.
/// In case of <see cref="ReaderErrorType.UnknownEventData"/> this is raised before returning the structured representation of a build event
/// that has some extra unknown data in the binlog. In case of other error types this event is raised and the offending build event is skipped and not returned.
/// </summary>
event Action<BinaryLogReaderErrorEventArgs>? RecoverableReadError;
```

Our sample usage of the [Reading API](#reading-api) can be enhanced with recoverable errors handling e.g. as such:

```csharp

// Those can be raised only during forward compatibility reading mode.
logReader.RecoverableReadError += errorEventArgs =>
{
// ...
// e.g. we can decide to ignore the error and continue reading or break reading
// based on the type of the error or/and type of the record or/and the frequency of the error
// Would we decide to completely ignore some errors - we can aid better performance by not materializing the actual error message.
// Otherwise the error message can be materialized via the provided method on the event argument:
Console.WriteLine($"Recoverable reader error: {errorEventArgs.GetFormattedMessage()}");
};

```

When authoring changes to the specific BuildEventArg types - it is always strongly recommended to **prefer append-only changes**.

This prevents the possibility of collision where some fields are removed in one version and then different fields with same binary size are added in future version. Such a sequence of format changes might not be caught by the decoder and might lead to unnoticed corrupt interpretation of data. For this reason the author of specific OM changes should always check whether there is a possibility of unrecognizable format collision (same binary size, different representation) within binlog versions of a same [minimum reader version support](#forward-compatibility-reading). If this is possible, the [minimum reader version support](#forward-compatibility-reading) should be incremented.
188 changes: 183 additions & 5 deletions src/Build.UnitTests/BinaryLogger_Tests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,10 @@
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Reflection;
using System.Text;

using FluentAssertions;
using Microsoft.Build.BackEnd.Logging;
using Microsoft.Build.Evaluation;
using Microsoft.Build.Execution;
Expand Down Expand Up @@ -83,10 +85,21 @@ public BinaryLoggerTests(ITestOutputHelper output)
_logFile = _env.ExpectFile(".binlog").Path;
}

public enum BinlogRoundtripTestReplayMode
{
NoReplay,
Structured,
RawEvents
}

[Theory]
[InlineData(s_testProject)]
[InlineData(s_testProject2)]
public void TestBinaryLoggerRoundtrip(string projectText)
[InlineData(s_testProject, BinlogRoundtripTestReplayMode.NoReplay)]
[InlineData(s_testProject, BinlogRoundtripTestReplayMode.Structured)]
[InlineData(s_testProject, BinlogRoundtripTestReplayMode.RawEvents)]
[InlineData(s_testProject2, BinlogRoundtripTestReplayMode.NoReplay)]
[InlineData(s_testProject2, BinlogRoundtripTestReplayMode.Structured)]
[InlineData(s_testProject2, BinlogRoundtripTestReplayMode.RawEvents)]
public void TestBinaryLoggerRoundtrip(string projectText, BinlogRoundtripTestReplayMode replayMode)
{
var binaryLogger = new BinaryLogger();

Expand All @@ -110,6 +123,36 @@ public void TestBinaryLoggerRoundtrip(string projectText)
project.Build(new ILogger[] { binaryLogger, mockLogFromBuild, serialFromBuild, parallelFromBuild }).ShouldBeTrue();
}

string fileToReplay;
switch (replayMode)
{
case BinlogRoundtripTestReplayMode.NoReplay:
fileToReplay = _logFile;
break;
case BinlogRoundtripTestReplayMode.Structured:
case BinlogRoundtripTestReplayMode.RawEvents:
{
var logReader = new BinaryLogReplayEventSource();
fileToReplay = _env.ExpectFile(".binlog").Path;
if (replayMode == BinlogRoundtripTestReplayMode.Structured)
{
// need dummy handler to force structured replay
logReader.BuildFinished += (_, _) => { };
}

BinaryLogger outputBinlog = new BinaryLogger()
{
Parameters = fileToReplay
};
outputBinlog.Initialize(logReader);
logReader.Replay(_logFile);
outputBinlog.Shutdown();
}
break;
default:
throw new ArgumentOutOfRangeException(nameof(replayMode), replayMode, null);
}

var mockLogFromPlayback = new MockLogger();

var serialFromPlaybackText = new StringBuilder();
Expand All @@ -126,7 +169,10 @@ public void TestBinaryLoggerRoundtrip(string projectText)
parallelFromPlayback.Initialize(binaryLogReader);

// read the binary log and replay into mockLogger2
binaryLogReader.Replay(_logFile);
binaryLogReader.Replay(fileToReplay);
mockLogFromPlayback.Shutdown();
serialFromPlayback.Shutdown();
parallelFromPlayback.Shutdown();

// the binlog will have more information than recorded by the text log
mockLogFromPlayback.FullLog.ShouldContainWithoutWhitespace(mockLogFromBuild.FullLog);
Expand All @@ -140,6 +186,138 @@ public void TestBinaryLoggerRoundtrip(string projectText)
parallelActual.ShouldContainWithoutWhitespace(parallelExpected);
}

/// <summary>
/// This test validate that binlog file content is identical upon replaying.
/// The identity can be defined via 3 ways:
/// * byte-for-byte equality
/// * byte-for-byte equality of unzipped content
/// * structured equality of events
///
/// They are ordered by their strength (the byte-for-byte equality implies the other two, etc.),
/// but we mainly care about the structured equality. If the more strong equalities are broken -
/// the assertions can be simply removed.
/// However the structured equality is important - it guarantees that binlog reading and writing functionality
/// is not dropping or altering any information.
/// </summary>
/// <param name="projectText"></param>
/// <param name="replayMode"></param>
[Theory]
[InlineData(s_testProject, BinlogRoundtripTestReplayMode.Structured)]
[InlineData(s_testProject, BinlogRoundtripTestReplayMode.RawEvents)]
[InlineData(s_testProject2, BinlogRoundtripTestReplayMode.Structured)]
[InlineData(s_testProject2, BinlogRoundtripTestReplayMode.RawEvents)]
public void TestBinaryLoggerRoundtripEquality(string projectText, BinlogRoundtripTestReplayMode replayMode)
{
var binaryLogger = new BinaryLogger();

binaryLogger.Parameters = _logFile;

// build and log into binary logger
using (ProjectCollection collection = new())
{
Project project = ObjectModelHelpers.CreateInMemoryProject(collection, projectText);
// make sure the project file makes it to the binlog (it has file existence check)
File.WriteAllText(project.FullPath, projectText);
project.Build(new ILogger[] { binaryLogger }).ShouldBeTrue();
File.Delete(project.FullPath);
}

var logReader = new BinaryLogReplayEventSource();
string replayedLogFile = _env.ExpectFile(".binlog").Path;
if (replayMode == BinlogRoundtripTestReplayMode.Structured)
{
// need dummy handler to force structured replay
logReader.BuildFinished += (_, _) => { };
}

BinaryLogger outputBinlog = new BinaryLogger()
{
Parameters = $"LogFile={replayedLogFile};OmitInitialInfo"
};
outputBinlog.Initialize(logReader);
logReader.Replay(_logFile);
outputBinlog.Shutdown();

AssertBinlogsHaveEqualContent(_logFile, replayedLogFile);
// If this assertation complicates development - it can possibly be removed
// The structured equality above should be enough.
AssertFilesAreBinaryEqualAfterUnpack(_logFile, replayedLogFile);
}

private static void AssertFilesAreBinaryEqualAfterUnpack(string firstPath, string secondPath)
{
using var br1 = BinaryLogReplayEventSource.OpenReader(firstPath);
using var br2 = BinaryLogReplayEventSource.OpenReader(secondPath);
const int bufferSize = 4096;

int readCount = 0;
while (br1.ReadBytes(bufferSize) is { Length: > 0 } bytes1)
{
var bytes2 = br2.ReadBytes(bufferSize);

bytes1.SequenceEqual(bytes2).ShouldBeTrue(
$"Buffers starting at position {readCount} differ. First:{Environment.NewLine}{string.Join(",", bytes1)}{Environment.NewLine}Second:{Environment.NewLine}{string.Join(",", bytes2)}");
readCount += bufferSize;
}

br2.ReadBytes(bufferSize).Length.ShouldBe(0, "Second buffer contains bytes after first file end");
}

private static void AssertBinlogsHaveEqualContent(string firstPath, string secondPath)
{
using var reader1 = BinaryLogReplayEventSource.OpenBuildEventsReader(firstPath);
using var reader2 = BinaryLogReplayEventSource.OpenBuildEventsReader(secondPath);

Dictionary<string, string> embedFiles1 = new();
Dictionary<string, string> embedFiles2 = new();

reader1.ArchiveFileEncountered += arg
=> AddArchiveFile(embedFiles1, arg);

// This would be standard subscribe:
// reader2.ArchiveFileEncountered += arg
// => AddArchiveFile(embedFiles2, arg);

// We however use the AddArchiveFileFromStringHandler - to exercise it
// and to assert it's equality with ArchiveFileEncountered handler
string currentFileName = null;
reader2.ArchiveFileEncountered +=
((Action<StringReadEventArgs>)AddArchiveFileFromStringHandler).ToArchiveFileHandler();

int i = 0;
while (reader1.Read() is { } ev1)
{
i++;
var ev2 = reader2.Read();

ev1.Should().BeEquivalentTo(ev2,
$"Binlogs ({firstPath} and {secondPath}) should be equal at event {i}");
}
// Read the second reader - to confirm there are no more events
// and to force the embedded files to be read.
reader2.Read().ShouldBeNull($"Binlogs ({firstPath} and {secondPath}) are not equal - second has more events >{i + 1}");

Assert.Equal(embedFiles1, embedFiles2);

void AddArchiveFile(Dictionary<string, string> files, ArchiveFileEventArgs arg)
{
ArchiveFile embedFile = arg.ArchiveData.ToArchiveFile();
files.Add(embedFile.FullPath, embedFile.Content);
}

void AddArchiveFileFromStringHandler(StringReadEventArgs args)
{
if (currentFileName == null)
{
currentFileName = args.OriginalString;
return;
}

embedFiles2.Add(currentFileName, args.OriginalString);
currentFileName = null;
}
}

[Fact]
public void BinaryLoggerShouldSupportFilePathExplicitParameter()
{
Expand Down
Loading

0 comments on commit d53e436

Please sign in to comment.