Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Add MetadataUpdate.applyTo to re-apply changes #4320

Merged
merged 3 commits into from Mar 15, 2022

Conversation

rdblue
Copy link
Contributor

@rdblue rdblue commented Mar 13, 2022

This updates TableMetadata and MetadataUpdate so that changes can be reconstructed in TableMetadata.Builder from a list of MetadataUpdate instances that are produced by the builder.

This also fixes a few minor issues with the builder:

  • lastAssignedPartitionId was not public
  • Supports set changes that use -1 to refer to the last added schema, spec, or sort order
  • Updates snapshot history timestamp based on whether the snapshot was added, rather than the setCurrentSnapshot call

This is part of #4241.


@Override
public void applyTo(TableMetadata.Builder metadataBuilder) {
throw new UnsupportedOperationException("Not implemented");
Copy link
Contributor Author

@rdblue rdblue Mar 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not part of the REST API, so I left it unimplemented. I'm not sure whether we will need this because tables have had UUIDs for a long time (added in #264). It seems unlikely that someone will update from 0.6.0 or earlier directly to 0.14.0 and want to use the REST catalog.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is unlikely that will be the case. And existing tables without a UUID will have a UUID added on successful snapshot commit, so people would need to go directly from 0.6.0 to 0.14.0 and then try to use the REST api if I understand correctly.

I think it's fine to ignore that very specific edge case, and if somebody has a complaint, a procedure / utility function can be provided that will simply add a UUID as a direct table maintenance operation or something.

But I agree it's very unlikely for this to occur and probably better not to pollute the code to take it into account.

Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, but overall this LGTM.

Thank you @rdblue!

changes.add(new MetadataUpdate.SetCurrentSchema(schemaId));
if (lastAddedSchemaId != null && lastAddedSchemaId == schemaId) {
// use -1 to signal that current was set to the last added schema
changes.add(new MetadataUpdate.SetCurrentSchema(-1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be something to make as a constant? Perhaps USE_LAST_ADDED_SCHEMA = -1?

EDIT: Looking at further code, we use the same constant for partition spec and many other things. So maybe this comment would be better placed above the builder class or something? Or using a more generic constant like private static final int KEEP_EXISTING_ID = -1?

I'm overall ok with this though and like the idea of using -1 this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added LAST_ADDED. Good suggestion, it's easier to read.

Comment on lines +1396 to +1399
private <U extends MetadataUpdate> Stream<U> changes(Class<U> updateClass) {
return changes.stream()
.filter(updateClass::isInstance)
.map(updateClass::cast);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the only place this is used, is it really needed? If so, maybe changesFor(Class<U> updateClass) would be better. Specifically as ther eis already a class field called changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few places where we go through the changes, so I think it's a good idea to keep this for future changes rather than implementing it slightly differently in multiple places.

@@ -821,6 +827,12 @@ private Builder(TableMetadata base) {
this.sortOrdersById = Maps.newHashMap(base.sortOrdersById);
}

public Builder withMetadataLocation(String newMetadataLocation) {
this.metadataLocation = newMetadataLocation;
this.discardChanges = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for discarding changes automatically in this case? Is it that we don't want to apply on top of unknown metadata? A comment here would be helpful since we're performing a hidden action on this builder call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix this. I think I want the caller to specifically call discardChanges() or fail validation if there are any changes.

The contract for TableMetadata is that the contents of the metadata location matches the table metadata. That's because the only way to get a TableMetadata with a non-null metadata location was to read it from the location using TableMetadataParser.

Now, table metadata is coming from a REST endpoint that also sends metadataLocation, so we need to be able to read the table metadata and re-attach the metadata location.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a precondition to the build method and removed this.

@rdblue rdblue merged commit a701416 into apache:master Mar 15, 2022
@rdblue
Copy link
Contributor Author

rdblue commented Mar 15, 2022

Thanks for the reviews, @danielcweeks and @kbendick!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants