Skip to content

Comments

Plumbing to add extra custom metadata to snapshot summary#1241

Merged
rdblue merged 5 commits intoapache:masterfrom
moulimukherjee:extra-snapshot-metadata
Jul 24, 2020
Merged

Plumbing to add extra custom metadata to snapshot summary#1241
rdblue merged 5 commits intoapache:masterfrom
moulimukherjee:extra-snapshot-metadata

Conversation

@moulimukherjee
Copy link
Contributor

@moulimukherjee moulimukherjee commented Jul 24, 2020

Plumbing to add extra custom metadata to snapshot summary via write options #1242

Context: At Stripe we make use of airflow to run spark jobs, and want to pass a separate external ID and some versioning information linking an airflow run for each snapshot created. Currently to do this, we have to fork the Writer just to override the commitOperation.

The changes in this PR would allow custom metadata via write options which start with a particular prefix (extra-metadata.).

cc? @rdblue

public static final String PUBLISHED_WAP_ID_PROP = "published-wap-id";
public static final String SOURCE_SNAPSHOT_ID_PROP = "source-snapshot-id";
public static final String REPLACE_PARTITIONS_PROP = "replace-partitions";
public static final String EXTRA_METADATA_PREFIX = "extra-metadata.";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to change if a different prefix string would be more suitable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. What about context. or snapshot.property.? I want it to be somewhat obvious what the prefix indicates, but as short as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snapshot.property sounds good. I'll make changes.

Comment on lines 382 to 388
.option("extra-metadata.extra-key", "someValue")
.option("extra-metadata.another-key", "anotherValue")
.save(tableLocation);

Table table = tables.load(tableLocation);

Assert.assertTrue(table.currentSnapshot().summary().get("extra-key").equals("someValue"));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plumbing to pass extra information from write options

@rdblue
Copy link
Contributor

rdblue commented Jul 24, 2020

Looks good to me. I just want to find a good prefix for the feature.

It would also be good to document this in the write options docs: http://iceberg.apache.org/configuration/#write-options

@moulimukherjee
Copy link
Contributor Author

Ack, will update docs too 👍

@moulimukherjee
Copy link
Contributor Author

@rdblue Added docs. Btw, I changed the prefix slightly to use - instead . as I noticed all other options have dashes.

@moulimukherjee
Copy link
Contributor Author

moulimukherjee commented Jul 24, 2020

@rdblue rdblue merged commit bf7edc4 into apache:master Jul 24, 2020
@rdblue
Copy link
Contributor

rdblue commented Jul 24, 2020

Thanks @moulimukherjee!

@moulimukherjee moulimukherjee deleted the extra-snapshot-metadata branch July 24, 2020 22:21
rdblue pushed a commit to rdblue/iceberg that referenced this pull request Jul 29, 2020
aokolnychyi pushed a commit to aokolnychyi/iceberg that referenced this pull request Aug 18, 2020
cmathiesen pushed a commit to ExpediaGroup/iceberg that referenced this pull request Aug 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Have a way to pass custom metadata for iceberg snapshots via write options

3 participants