Skip to content

Conversation

@emkornfield
Copy link
Contributor

Which issue does this PR close?

Split off from #1851

What changes are included in this PR?

This change honors the compression setting for metadata.json file (write.metadata.compression-codec).

Are these changes tested?

Add unit test to verify files are gzipped when the flag is enabled.

@emkornfield
Copy link
Contributor Author

CC @kevinjqliu @liurenjie1024

key: &str,
default: T,
) -> Result<T, anyhow::Error>
) -> crate::error::Result<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> crate::error::Result<T>
) -> Result<T>

Unnecessary fully qualified name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

/// The target file size for files.
pub write_target_file_size_bytes: usize,
/// Compression codec for metadata files (JSON)
pub metadata_compression_codec: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be sth like Option<String>, none should be None in memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


// Modify filename to add .gz before .metadata.json
let location = metadata_location.as_ref();
let new_location = if location.ends_with(".metadata.json") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should simply replace the suffix with this approach. Currently all metadata location generated logic are in MetadataLocation, we should add a new method in like following and deprecate old method as following:

impl MetadataLocation {
#[deprecate("Use new_with_table instead.")]
pub fn new_with_table_location(table_location: impl ToString) -> Self {
}

pub fn new_with_table(table: &Table) -> Self {
  ....
}

Copy link
Contributor Author

@emkornfield emkornfield Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table option seemed a little awkward with the control flow on how it is used istead I added a method that takes table properties. Also added a similar method for incrementing the number.

I modified this logic a bit and kept. My concern here is if we only deprecate the new_with_table_location then users can write out metadata.json files in a format that would be unreadable by other readers.

.map_err(from_aws_sdk_error)?;
let warehouse_location = get_resp.warehouse_location().to_string();
MetadataLocation::new_with_table_location(warehouse_location).to_string()
get_resp.warehouse_location().to_string()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a semantic change because up the update to creation.location below, need to investigate further what the correct thing to do here is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, from the comment above (L451-452) it seems like the "s3tables catalog" generates the metadata file location and does not allow any modification

so maybe we should just use

        let metadata_location =
            MetadataLocation::new_with_properties(metadata_location, None)
                .to_string(); 

so as to not do any path modification before writing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original code was wrong. As best I can test warehouseLocation is actually the table location.

I've updated it to assign to table_location and then assign it below (as far as I can tell, the .location field is meant to be table location and not the metadata.json file).

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, a few nit comments.
would be good to get a second pair of 👀


/// Creates a completely new metadata location starting at version 0,
/// with compression settings from the table's properties.
/// Only used for creating a new table. For updates, see `with_next_version`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Only used for creating a new table. For updates, see `with_next_version`.
/// Only used for creating a new table. For updates, see `with_next_version_and_properties`.

since with_next_version is also deprecated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll wait till I understand @liurenjie1024 concerns about with_next_version_and_properties to update this.

.map_err(from_aws_sdk_error)?;
let warehouse_location = get_resp.warehouse_location().to_string();
MetadataLocation::new_with_table_location(warehouse_location).to_string()
get_resp.warehouse_location().to_string()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, from the comment above (L451-452) it seems like the "s3tables catalog" generates the metadata file location and does not allow any modification

so maybe we should just use

        let metadata_location =
            MetadataLocation::new_with_properties(metadata_location, None)
                .to_string(); 

so as to not do any path modification before writing

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @emkornfield for this pr, looks much better now, still need some refinements.

/// Creates a completely new metadata location starting at version 0,
/// with compression settings from the table's properties.
/// Only used for creating a new table. For updates, see `with_next_version`.
pub fn new_with_properties(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn new_with_properties(
pub fn new_table(table: &Table)

We may do refactoring to table properties, and we may need other

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue I have with taking a table here is that metadata location is an input to building a table, how about taking TableMetadataRef as input?

Or are you suggesting changing the return type of Table, so that this method updates it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @emkornfield changing the input param to take a &TableMetadata sounds good to me.

since = "0.8.0",
note = "Use with_next_version_and_properties instead to properly handle compression settings changes"
)]
pub fn with_next_version(&self) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need change this method, it's not a constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as the comment below, lets consolidate conversation there.

/// Creates a new metadata location for an updated metadata file.
/// Takes table properties to determine compression settings, which may have changed
/// from the previous version.
pub fn with_next_version_and_properties(&self, properties: &HashMap<String, String>) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this.

Copy link
Contributor Author

@emkornfield emkornfield Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what you mean by we don't need it? I updated crates/iceberg/src/catalog/mod.rs to use it, if new properties change compression settings, I'm not sure how this would be reflected otherwise? I might have missed it but I don't think there is anything in the spec that prevents users from changing compression properties.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If properties has been changes, we should build a new instance of this MetadataLocation struct.

properties
.get(TableProperties::PROPERTY_METADATA_COMPRESSION_CODEC)
.and_then(|codec| match codec.to_lowercase().as_str() {
"gzip" => Some(".gz".to_string()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make ".gz" a constant rather a magic literal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

emkornfield and others added 4 commits November 21, 2025 08:54
Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
…nfield/iceberg-rust into feat/metadata-compression
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] iceberg-rust does not respect compression settings for metadata & avro

3 participants