@Fokko
Contributor

@Fokko Fokko commented Jul 5, 2022

No description provided.

@github-actions github-actions bot added the python label Jul 5, 2022
@Fokko Fokko force-pushed the fd-add-snapshots branch from a813688 to 5ac8ad3 July 5, 2022 11:30
manifest_list: Optional[str] = Field(
    alias="manifest-list", description="Location of the snapshot's manifest list file", default=None
)
manifests: List[str] = Field(default_factory=list, repr=False, exclude=True)
Contributor

I would probably not include this. Manifests were originally listed in the snapshot, but that caused problems and we quickly moved to manifest lists. It's really unlikely that there are tables still listing manifests in the snapshot itself, so I'd probably just drop this.

Contributor Author

Thanks for the historical context. Sounds like an excellent idea. I'll drop this one, and make the manifest-list mandatory.

sequence_number: Optional[int] = Field(alias="sequence-number", default=None)
timestamp_ms: int = Field(alias="timestamp-ms")
manifest_list: Optional[str] = Field(
    alias="manifest-list", description="Location of the snapshot's manifest list file", default=None
Contributor

If you remove manifests, then there's no need to make this Optional.

Contributor Author

What should we plug into manifest-list in that case? I'm using this as a test case: https://github.com/apache/iceberg/blob/master/python/tests/table/test_metadata.py#L45

I can add a manifest-list there, but the field is marked as optional in the spec: https://iceberg.apache.org/spec/. How liberally should we read the spec? Or should we just make it mandatory?

def additional_properties(self) -> Dict[str, str]:
    return {
        k: v for k, v in self.__root__.items() if k != OPERATION  # type: ignore # We know that they are all string, and we don't want to check
    }
Contributor

Might be better to keep a copy of this in __init__?

Contributor Author

Done!
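
For readers following along, here is a rough sketch of what "keeping a copy in __init__" could look like. It is not the code that was merged: SummarySketch is a hypothetical plain-Python stand-in for the Pydantic summary model, and the "append" and "added-data-files" values are only illustrative. The point is that the filtered dict is built once at construction time instead of on every access.

```python
from typing import Dict

OPERATION = "operation"


class SummarySketch:
    """Hypothetical stand-in for the summary model discussed in this thread."""

    def __init__(self, root: Dict[str, str]) -> None:
        # Copy the input so later changes by the caller do not leak in.
        self._root = dict(root)
        # Build the filtered view once, here, rather than on every access.
        self._additional_properties = {
            k: v for k, v in self._root.items() if k != OPERATION
        }

    @property
    def operation(self) -> str:
        return self._root[OPERATION]

    @property
    def additional_properties(self) -> Dict[str, str]:
        # Returns the copy prepared in __init__; no filtering happens here.
        return self._additional_properties


# Illustrative values only.
summary = SummarySketch({"operation": "append", "added-data-files": "3"})
```

Repeated reads of summary.additional_properties then return the same dict object, which avoids rebuilding it when the summary is inspected many times.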

snapshot = Snapshot(
    snapshot_id=25,
    parent_snapshot_id=19,
    sequence_number=200,
Contributor

You may want a test without a sequence number?

Contributor Author

@Fokko Fokko Jul 12, 2022

Good catch, I've added a test. I noticed that Java sets the sequence number to zero by default. I tried the same in Python, but then the zero would be written out, which we don't want. Setting it to None turns out to be easier, since we don't write None keys to the JSON object (similar to doc in NestedField).
I don't think this is an issue, but we should add a check when we start modifying the snapshots.
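
To make the None behaviour concrete, here is a minimal sketch using only the standard library. A dataclass stands in for the Pydantic model from the PR, so SnapshotSketch and to_json are hypothetical names, and the timestamp and manifest-list values are made up. It only illustrates the idea that a field left as None never reaches the serialized JSON.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class SnapshotSketch:
    """Hypothetical stand-in for the Snapshot model in this PR."""

    snapshot_id: int
    timestamp_ms: int
    manifest_list: str
    parent_snapshot_id: Optional[int] = None
    sequence_number: Optional[int] = None  # None rather than 0 when absent

    def to_json(self) -> str:
        fields: Dict[str, Any] = {
            "snapshot-id": self.snapshot_id,
            "parent-snapshot-id": self.parent_snapshot_id,
            "sequence-number": self.sequence_number,
            "timestamp-ms": self.timestamp_ms,
            "manifest-list": self.manifest_list,
        }
        # Drop keys whose value is None so they are never written out.
        return json.dumps({k: v for k, v in fields.items() if v is not None})


# Made-up values; one snapshot without a sequence number, one with.
without_seq = SnapshotSketch(
    snapshot_id=25, timestamp_ms=1602638573590, manifest_list="s3:/a/b/c.avro"
)
with_seq = SnapshotSketch(
    snapshot_id=25,
    timestamp_ms=1602638573590,
    manifest_list="s3:/a/b/c.avro",
    parent_snapshot_id=19,
    sequence_number=200,
)
```

With a default of 0 the filter above would keep the key, which is the situation the comment describes wanting to avoid.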

parent_snapshot_id: Optional[int] = Field(alias="parent-snapshot-id")
sequence_number: Optional[int] = Field(alias="sequence-number", default=None)
timestamp_ms: int = Field(alias="timestamp-ms")
manifest_list: Optional[str] = Field(alias="manifest-list", description="Location of the snapshot's manifest list file")
Contributor

I think this can be required.

@rdblue rdblue merged commit 1533eb1 into apache:master Jul 12, 2022
@rdblue
Contributor

rdblue commented Jul 12, 2022

Looks great. Thanks, @Fokko!

@Fokko Fokko deleted the fd-add-snapshots branch July 12, 2022 20:26

2 participants