Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix parse partitions in manifest_list #122

Merged
merged 3 commits into from
Dec 19, 2023
Merged

Conversation

ZENOTME
Copy link
Contributor

@ZENOTME ZENOTME commented Dec 18, 2023

This PR:

  1. fix bug: ManifestList parsing should not require partition_type. #121 and modify unit test to test this case.
  2. fix Support initial-default when reading Avro #119, add init default when reading v1 manifest
  3. add test case for writing back v1 manifest/manifest-list as v2
    • for manifest list, we can directly write back a manifest list as v2.
    • for manifest, we can call upgrade_version and then write it back as v2.

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Dec 18, 2023

@@ -79,6 +79,22 @@ impl Manifest {

Ok(Manifest { metadata, entries })
}

/// Upgrade the format version of this manifest.
pub fn upgrade_version(&mut self, format_version: FormatVersion) -> Result<(), Error> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update version seems tricky to me. We don't need to take other actions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this concern. Version upgrade is part of transaction api, it involves full rewrite of all things like snapshot, and should not do it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this concern. Version upgrade is part of transaction api, it involves full rewrite of all things like snapshot, and should not do it here.

Yes, upgrading the version is more. Here it's more like upgrading this manifest file. #119 (comment)
This interface is confusing. There is another way:

  1. pub fn parse_avro(bs: &[u8], version) -> Result<Self, Error>. Then we can parse v1 manifest as v2. After we parse it, we will modify the version as v2.
  2. pub async fn write(mut self, manifest: Manifest, version). And then we can write as v1 manifest as v2. I think this way may be more clear.🤔

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that comment, but I don't think we need to introduce this api for now. The manifest format upgradation is part of table format upgradation, and we need to think about it in the overall design, rather exposing this single api.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's remove this interface and only fix the init-default. Let's track the upgrade action in another issue as a new feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this concern. Version upgrade is part of transaction api, it involves full rewrite of all things like snapshot, and should not do it here.

This is not correct, and V2 is forward-compatible. If you apply the rules of schema evolution (as with any other Iceberg table), it should work seamlessly. You don't want to rewrite a multi-petabyte table when you do any meta operation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this interface and only fix the init-default.

This will make thing easier, such as setting the sequence-number when the field isn't present (in the case of V1 metadata).

.unwrap_or_default())
} else {
Err(Error::new(
crate::ErrorKind::DataInvalid,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm expecting a better error message with more context, for example, the manifest list file path, the entry path, spec id.

Copy link
Contributor Author

@ZENOTME ZENOTME Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added context of entry path and spec id. We don't have the manifest list file path for now (we pass [u8] to parse directly), I think it need to set in the higher level API.

@@ -79,6 +79,22 @@ impl Manifest {

Ok(Manifest { metadata, entries })
}

/// Upgrade the format version of this manifest.
pub fn upgrade_version(&mut self, format_version: FormatVersion) -> Result<(), Error> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this concern. Version upgrade is part of transaction api, it involves full rewrite of all things like snapshot, and should not do it here.

Copy link
Collaborator

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -1104,8 +1104,8 @@ mod _serde {
Ok(ManifestEntry {
status: self.status.try_into()?,
snapshot_id: Some(self.snapshot_id),
sequence_number: None,
file_sequence_number: None,
sequence_number: Some(0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct 👍

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @ZENOTME

@Fokko Fokko merged commit 7d06a85 into apache:main Dec 19, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: ManifestList parsing should not require partition_type. Support initial-default when reading Avro
4 participants