-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROTOCOL] Protocol changes for log compactions #2122
[PROTOCOL] Protocol changes for log compactions #2122
Conversation
@@ -253,6 +253,47 @@ as of now. The add and remove file actions are stored as their individual column | |||
|
|||
These files reside in the `_delta_log/_sidecars` directory. | |||
|
|||
### Log Compaction Files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably expand Metadata cleanup section to cover log compaction as well?
PROTOCOL.md
Outdated
compaction file represents an [action](#actions). | ||
The commit files for a given range are created by doing [Action Reconciliation](#action-reconciliation) | ||
of the corresponding commits. | ||
Instead of reading ihe individual commit files in range `[x, y]`, an implementation could choose to read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of reading ihe individual commit files in range `[x, y]`, an implementation could choose to read | |
Instead of reading the individual commit files in range `[x, y]`, an implementation could choose to read |
PROTOCOL.md
Outdated
@@ -253,6 +253,47 @@ as of now. The add and remove file actions are stored as their individual column | |||
|
|||
These files reside in the `_delta_log/_sidecars` directory. | |||
|
|||
### Log Compaction Files | |||
|
|||
Log compaction files resides in the `_delta_log` directory. A log compaction file from a start version `x` to an end version `y` will have the following name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log compaction files resides in the `_delta_log` directory. A log compaction file from a start version `x` to an end version `y` will have the following name: | |
Log compaction files reside in the `_delta_log` directory. A log compaction file from a start version `x` to an end version `y` will have the following name: |
PROTOCOL.md
Outdated
@@ -253,6 +253,47 @@ as of now. The add and remove file actions are stored as their individual column | |||
|
|||
These files reside in the `_delta_log/_sidecars` directory. | |||
|
|||
### Log Compaction Files | |||
|
|||
Log compaction files resides in the `_delta_log` directory. A log compaction file from a start version `x` to an end version `y` will have the following name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't really any mention of writers here, tho much of it can probably be inferred.
Seems like we could be more clear by saying:
- Writers
- Can optionally produce log compactions (with advice about when they might want to do so)
- Compactions produced by applying action reconciliation to the commits to be compacted
- Readers
- Can optionally consume log compactions, if available
- The compaction replaces the corresponding commits during action reconciliation
Also probably worth pointing out that this is not a table feature, because both writers and readers can safely access the table even if they know nothing about log compactions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added Readers section.
Added Writer section except this point: Compactions produced by applying action reconciliation to the commits to be compacted
. Isn't this the definition of log compaction which we have already explained in the initial part of this section?
Also explicitly pointing out that minor-compaction is not a table feature seems little out of context from PROTOCOL's perspective?
{"add":{"path":"f4",...}} | ||
{"remove":{"path":"f1",...}} | ||
{"remove":{"path":"f3",...}} | ||
{"txn":{"appId":"3ae45b72-24e1-865a-a211-34987ae02f2a","version":4390}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice example!
Protocol changes for log compaction Issue: delta-io#2072 Closes delta-io#2122 GitOrigin-RevId: c15ff24a2a4242520f5cf8ffdb8604a4ffc36805
Protocol changes for log compaction Issue: delta-io#2072 Closes delta-io#2122 GitOrigin-RevId: c15ff24a2a4242520f5cf8ffdb8604a4ffc36805
Protocol changes for log compaction Issue: delta-io#2072 Closes delta-io#2122 GitOrigin-RevId: c15ff24a2a4242520f5cf8ffdb8604a4ffc36805
Which Delta project/connector is this regarding?
Description
This PR makes PROTOCOL changes for log compactions.
Design: #2072
How was this patch tested?
NA
Does this PR introduce any user-facing changes?
NA