-
Notifications
You must be signed in to change notification settings - Fork 3k
Python: Move Snapshot to Pydantic #5201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
python/pyiceberg/table/snapshots.py
Outdated
| manifest_list: Optional[str] = Field( | ||
| alias="manifest-list", description="Location of the snapshot's manifest list file", default=None | ||
| ) | ||
| manifests: List[str] = Field(default_factory=list, repr=False, exclude=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably not include this. Manifests were originally listed in snapshot, but that caused problems and we quickly moved to manifest lists. It's really unlikely that there are tables still listing manifests in the snapshot itself, so I'd probably just drop this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the historical context. Sounds like an excellent idea. I'll drop this one, and make the manifest-list mandatory.
python/pyiceberg/table/snapshots.py
Outdated
| sequence_number: Optional[int] = Field(alias="sequence-number", default=None) | ||
| timestamp_ms: int = Field(alias="timestamp-ms") | ||
| manifest_list: Optional[str] = Field( | ||
| alias="manifest-list", description="Location of the snapshot's manifest list file", default=None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you remove manifests, then there's no need to make this Optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should we plug into manifest-list in that case? I'm using this as a test case: https://github.com/apache/iceberg/blob/master/python/tests/table/test_metadata.py#L45
I can add a manifest-list there, but it is marked as optional in the schema: https://iceberg.apache.org/spec/. How liberal should we read, or just make it mandatory?
python/pyiceberg/table/snapshots.py
Outdated
| def additional_properties(self) -> Dict[str, str]: | ||
| return { | ||
| k: v for k, v in self.__root__.items() if k != OPERATION # type: ignore # We know that they are all string, and we don't want to check | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to keep a copy of this in __init__?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
| snapshot = Snapshot( | ||
| snapshot_id=25, | ||
| parent_snapshot_id=19, | ||
| sequence_number=200, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want a test without a sequence number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I've added a test. I noticed that in Java we set the sequence number to zero. I tried to do this in Python as well, but then we don't want to write the value. Turns out that setting it to None makes it easier since we don't write None keys in the json object (similar to doc in the NestedField).
I don't think this is an issue, but we should add a check when we start modifying the snapshots.
| parent_snapshot_id: Optional[int] = Field(alias="parent-snapshot-id") | ||
| sequence_number: Optional[int] = Field(alias="sequence-number", default=None) | ||
| timestamp_ms: int = Field(alias="timestamp-ms") | ||
| manifest_list: Optional[str] = Field(alias="manifest-list", description="Location of the snapshot's manifest list file") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be required
|
Looks great. Thanks, @Fokko! |
No description provided.