Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROTOCOL] Protocol changes for log compactions #2122

Closed

Conversation

prakharjain09
Copy link
Collaborator

@prakharjain09 prakharjain09 commented Sep 29, 2023

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other: Protocol

Description

This PR makes PROTOCOL changes for log compactions.
Design: #2072

How was this patch tested?

NA

Does this PR introduce any user-facing changes?

NA

@@ -253,6 +253,47 @@ as of now. The add and remove file actions are stored as their individual column

These files reside in the `_delta_log/_sidecars` directory.

### Log Compaction Files
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing ToC entry:
image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably expand Metadata cleanup section to cover log compaction as well?

PROTOCOL.md Outdated
compaction file represents an [action](#actions).
The commit files for a given range are created by doing [Action Reconciliation](#action-reconciliation)
of the corresponding commits.
Instead of reading ihe individual commit files in range `[x, y]`, an implementation could choose to read
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Instead of reading ihe individual commit files in range `[x, y]`, an implementation could choose to read
Instead of reading the individual commit files in range `[x, y]`, an implementation could choose to read

PROTOCOL.md Outdated
@@ -253,6 +253,47 @@ as of now. The add and remove file actions are stored as their individual column

These files reside in the `_delta_log/_sidecars` directory.

### Log Compaction Files

Log compaction files resides in the `_delta_log` directory. A log compaction file from a start version `x` to an end version `y` will have the following name:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Log compaction files resides in the `_delta_log` directory. A log compaction file from a start version `x` to an end version `y` will have the following name:
Log compaction files reside in the `_delta_log` directory. A log compaction file from a start version `x` to an end version `y` will have the following name:

PROTOCOL.md Outdated
@@ -253,6 +253,47 @@ as of now. The add and remove file actions are stored as their individual column

These files reside in the `_delta_log/_sidecars` directory.

### Log Compaction Files

Log compaction files resides in the `_delta_log` directory. A log compaction file from a start version `x` to an end version `y` will have the following name:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't really any mention of writers here, tho much of it can probably be inferred.
Seems like we could be more clear by saying:

  • Writers
    • Can optionally produce log compactions (with advice about when they might want to do so)
    • Compactions produced by applying action reconciliation to the commits to be compacted
  • Readers
    • Can optionally consume log compactions, if available
    • The compaction replaces the corresponding commits during action reconciliation

Also probably worth pointing out that this is not a table feature, because both writers and readers can safely access the table even if they know nothing about log compactions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Readers section.
Added Writer section except this point: Compactions produced by applying action reconciliation to the commits to be compacted. Isn't this the definition of log compaction which we have already explained in the initial part of this section?

Also explicitly pointing out that minor-compaction is not a table feature seems little out of context from PROTOCOL's perspective?

{"add":{"path":"f4",...}}
{"remove":{"path":"f1",...}}
{"remove":{"path":"f3",...}}
{"txn":{"appId":"3ae45b72-24e1-865a-a211-34987ae02f2a","version":4390}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice example!

@vkorukanti vkorukanti closed this in 5d43f1d Oct 2, 2023
vkorukanti pushed a commit to vkorukanti/delta that referenced this pull request Oct 3, 2023
Protocol changes for log compaction
Issue: delta-io#2072

Closes delta-io#2122

GitOrigin-RevId: c15ff24a2a4242520f5cf8ffdb8604a4ffc36805
vkorukanti pushed a commit to vkorukanti/delta that referenced this pull request Oct 3, 2023
Protocol changes for log compaction
Issue: delta-io#2072

Closes delta-io#2122

GitOrigin-RevId: c15ff24a2a4242520f5cf8ffdb8604a4ffc36805
Kimahriman pushed a commit to Kimahriman/delta that referenced this pull request Oct 3, 2023
Protocol changes for log compaction
Issue: delta-io#2072

Closes delta-io#2122

GitOrigin-RevId: c15ff24a2a4242520f5cf8ffdb8604a4ffc36805
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants