-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core, API: View Version implementation #6861
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few small things to address, but other than that it LGTM
core/src/main/java/org/apache/iceberg/view/ViewVersionParser.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/view/ViewVersionParser.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/view/TestViewVersionParser.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/view/ViewVersionParser.java
Outdated
Show resolved
Hide resolved
03783c8
to
baf9be2
Compare
@@ -43,8 +47,7 @@ public interface ViewVersion { | |||
long timestampMillis(); | |||
|
|||
/** | |||
* Return the version summary such as the name of the operation that created that version of the | |||
* view | |||
* Return a string map of summary data for the operation that produced this view version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the data model itself, I think we should follow what Snapshot does where summary returns all the fields other than the operation itself. We have a separate operation method below for the operation. When we serialize the version, the operation will get put in the summary of the version. Let me know if this makes sense @nastra @jackye1995 @rdblue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can avoid having a change of spec here, the information in summary are just FYI purpose, I don't think we need to assign a semantic meaning to operations now, we can always add later if it becomes necessary. But for views I don't see anything other than CREATE and REPLACE, but it can also be identified directly through the version ID, ID=1 means CREATE, everything else are REPLACE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion here, there's no change in spec here, it's a change in the data model layer but since there was no implementation until now we have some flexibility to change if desired.
We'll still serialize the operation
in the summary
map when writing the metadata file, so the spec is maintained.
When we deserialize into the in memory representation, the summary map will not contain operation
, and instead a user would use the operation()
API. It's a similar model to Snapshot
and SnapshotParser
Let me know if that makes sense, or if there's doubts on the API itself after this change! cc: @rdblue @nastra @danielcweeks @jzhuge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the argument also applies to the data model, because for Snapshot I am interested in what is the operation that produces the snapshot, because there are different types for data write. But for view, I am not sure how useful that information is, it's always just create or replace of the metadata definition. So I don't know if we should add this layer of complication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline with @jackye1995 instead of restricting the type of operation up front, we can keep it open. It looks like Snapshot summary also does has the same behavior. There's a set of recognized operations by the Iceberg library like append
, replace
etc but those aren't strictly validated when reading the metadata. So that's a similar model we can follow here. We also don't need an enum following this logic, it can just be defined Strings (which we can define separately when we get to the catalog implementation for createView
replaceView
etc. Also we can always add an operation
API if it becomes to awkward to get the operation from the summary but it's not needed at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline with @jackye1995 we were talking on two different issues here.
1.) The first is on strict validation of the operation
, we concluded that's probably not a desired behavior, and that's the behavior of snapshot
currently.
2.) The second is on the API and serialization of operation itself. Operation will be serialized in the summary but when it's read we deserialize it into a separate field in memory. This is the same as what happens with Snapshot
and I think it makes sense to have that same behavior and have a separate API for operation. Updated the PR to reflect this.
core/src/main/java/org/apache/iceberg/view/ViewVersionParser.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/view/ViewVersionParser.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/view/ViewVersionParser.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/view/TestViewVersionParser.java
Outdated
Show resolved
Hide resolved
baf9be2
to
98f8784
Compare
98f8784
to
616c1f0
Compare
core/src/main/java/org/apache/iceberg/view/ViewVersionParser.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/view/TestViewVersionParser.java
Outdated
Show resolved
Hide resolved
2170c9f
to
c6dc13c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still LGTM, thanks @amogh-jahagirdar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me!
core/src/main/java/org/apache/iceberg/view/ViewVersionParser.java
Outdated
Show resolved
Hide resolved
generator.writeNumberField(VERSION_ID, version.versionId()); | ||
generator.writeNumberField(TIMESTAMP_MS, version.timestampMillis()); | ||
generator.writeObjectFieldStart(SUMMARY); | ||
for (Map.Entry<String, String> summaryEntry : version.summary().entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we also write OPERATION here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The summary should be guaranteed to have operation
at the time of writing the metadata (should already be an entry in the map).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, see #6861 (comment) for the discussion on this
e4d60b6
to
35562fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me!
3b43bf8
to
d6968e7
Compare
Co-authored-by: John Zhuge <jzhuge@apache.org>
d6968e7
to
59feecc
Compare
Thanks for all the changes, let me know when you have the metadata parser PR! |
View Version parser implementation
Co-authored-by: John Zhuge jzhuge@apache.org
CC @jzhuge @rdblue @nastra @danielcweeks @jackye1995 Let me know your thoughts!