Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Binlogs Redacting] API and support for external redaction functionality #9219

Closed
wants to merge 9 commits into from

Conversation

JanKrivanek
Copy link
Member

@JanKrivanek JanKrivanek commented Sep 15, 2023

Fixes #9147 and contributes to #8400

All the fixes are now contributed to the extending branch only - so please review #9307 only.


The fwd-compatability reading support PR: JanKrivanek#1 (this aims to be incorporated into this PR).
If you prefer to review both set of changes within single PR: #9307

Context

#8400

This PR enables:

  • Transparentness of binlogs replaying
    • Equalitity of build events upon de/serialization
    • Ability to configure replaying of binlog embedded files
    • Ability to configure replaying of initial binlog information events (version, params)
    • Ability to opt-in into determinstic compression of binlog stream (via GreedyBufferedStream)
  • Adjustments to Binary log replaying API - (exposing ability to supply file vs reader - plus ownership considerations; ability to subscribe to embedded files replaying events)
  • BuildEventArgsReader exposes event for optional editing embedded files during replaying
    • Either accessible via more versatile ArchiveFileEventArgs - that allows accessing file contents via StreamReader for more optimal work with large files
    • Or handleable via simple Action<StringReadEventArgs handler - allowing to reuse identical handler as exposed for editing strings
    • Binlogs replaying optimaly switches the embedded content reading stratgy based on if/what is the caller subscribing to (just skipping the content vs replaying the embedded stream as is vs unpacking it and interpretting it as embedded files).

In combination with the complement PR implementing the actual redactor (dotutils/MSBuild.BinlogRedactor#2) it enables:

image

@JanKrivanek JanKrivanek marked this pull request as ready for review September 20, 2023 14:50
Copy link
Member

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few comments. Haven't reviewed all changes yet.

src/Build/CompatibilitySuppressions.xml Show resolved Hide resolved
Comment on lines +27 to +31
if (!_resultSet)
{
throw new InvalidOperationException(
"ArchiveFile was obtained, but the final edited version was not set.");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unnecessarily complicated. Do you think that a simple ArchiveFile getter and a pair of SetArchiveFile or maybe ReplaceArchiveFile methods would be misused?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it is needed. Let me try to explain - if it makes sense:

Due to dual access pattern to ArchiveFile - via Stream and via string - the behavior is undefined if the content is obtained multiple time. And for the Stream case - even once it's obtained once, the value cannot then be read by the exposing reader or by the next event in the subscription chain. This contract tries to express this situation explicitly and fail fast and clearly if any subscriber just reads the value and doesn't properly set the result. Event the name of the method tries to suggest it is one-time action, taking the ownership.

tl;dr; Having getter could lead to out of bounds exception (due to reading beyond end of exposed stream) for the second subscriber (or the event firing reader) when subscriber wouldn't call SetResults afterwards.

With this background - would you still prefer getter? Or at least different naming?

Copy link
Member

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few more comments.

@ladipro
Copy link
Member

ladipro commented Oct 9, 2023

Also, I forgot to mention it in the review, please consider covering the new code with unit tests. The new stream adaptors, for one, should be easily testable.

@JanKrivanek
Copy link
Member Author

All the fixes are now contributed to the extending branch only - so please review #9307 only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Binlog redacting - properly process embedded files
2 participants