Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core, API: Add addNonDefaultSpec to UpdatePartitionSpec to not set the new partition spec as default #10736

Conversation

shanielh
Copy link
Contributor

This may be used in cases where we want to add a partition spec but not set the new partition spec as default.

For example, if I want to make sure that there's an unpartitioned spec since I might want to commit specific data to it in some cases, I would first validate that there's an unpartition spec in the table, and if there isn't, I would try to add the unpartitioned spec but not set it as the new default partition spec

@shanielh shanielh force-pushed the feature/update-partition-spec-set-as-default branch from 2c2a54b to a2e1cd7 Compare July 21, 2024 10:05
@RussellSpitzer
Copy link
Member

My big question here is what is the value of adding the Spec before you are ready to write data?

@shanielh
Copy link
Contributor Author

My big question here is what is the value of adding the Spec before you are ready to write data?

I don't think it matters whether I add the spec just before writing to it or prior. But I wouldn't like to set it as the new default.

@RussellSpitzer
Copy link
Member

My big question here is what is the value of adding the Spec before you are ready to write data?

I don't think it matters whether I add the spec just before writing to it or prior. But I wouldn't like to set it as the new default.

Wouldn't that mean we would want to modify the write APIs? Isn't that essentially just an engine decision?

@shanielh
Copy link
Contributor Author

My big question here is what is the value of adding the Spec before you are ready to write data?

I don't think it matters whether I add the spec just before writing to it or prior. But I wouldn't like to set it as the new default.

Wouldn't that mean we would want to modify the write APIs? Isn't that essentially just an engine decision?

The write APIs already support that.

@RussellSpitzer
Copy link
Member

Got it, Ok let me take a closer look at this

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable but we do need to add some tests to TestUpdatePartitionSpec to validate the behavior

@shanielh
Copy link
Contributor Author

This seems reasonable but we do need to add some tests to TestUpdatePartitionSpec to validate the behavior

I had to add the spec to TestTableUpdatePartitionSpec since that's the only one that actually commit the spec to the table, PLMK if there's anything else I need to do for this to get merged.

* @param value whether to set the new partition spec as default
* @return this for method chaining
*/
UpdatePartitionSpec setAsDefault(Boolean value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the intent and it's reasonable imo (don't think there's any other API where adding this would make sense). Typically we try avoid adding boolean APIs but to facilitate this case, it may be unavoidable. I'd like to think through more alternatives though.

On another note, at minimum I think this should be made a primitive boolean named setAsDefaultSpec (same with the API name imo)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could do a no arg method instead something like

notDefault()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah couldn’t think of a good name for a no arg method so I discarded it initially but another one came to mind: addWithoutSettingCurrent (maybe a bit verbose)

@@ -327,7 +336,12 @@ public PartitionSpec apply() {

@Override
public void commit() {
TableMetadata update = base.updatePartitionSpec(apply());
TableMetadata update;
if (this.setAsDefault) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically in the project, this is only used when writing to the field. Here we could just reference setAsDefault

@shanielh
Copy link
Contributor Author

shanielh commented Jul 25, 2024

Hi (@amogh-jahagirdar, @RussellSpitzer), I've applied the changes you suggested within a new fixup commit, please review and let me know when it's good to squash and rebase.

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now, we can merge if @amogh-jahagirdar agrees

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very against this idea, but after some more thinking, I think I'm struggling to understand the use case and can be convinced otherwise. I thought I originally understood the intent, but now it's a bit confusing to me:

After the spec is added without making it the current, active table spec, how will it later be set as the current? With the current APIs a user has to again call updateSpec(), reconstruct the desired spec and then commit, and rely on the behavior of the implementation to essentially re-use the partition spec that was previously added.

At minimum though would advocate for a more clear API name which indicates what "nonDefault" is referring to; I'm proposing addNonDefaultSpec or addInactiveSpec

*
* @return this for method chaining
*/
UpdatePartitionSpec notDefault();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call this addNonDefaultSpec or addInactiveSpec cc @RussellSpitzer @shanielh?

When I was going through the test code the API name felt a bit odd (the heuristic I try and aim for is does it sound right when speaking as a sentence).

 table.updateSpec().addField("id").addNonDefaultSpec().commit();

reads more clear to me since it's more clear what the action is ("adding"), and what non-default is referring to ("spec")

inactive maybe a bit more extreme but technically it is not the active table spec if it's added, since every write to the table should just ignore that spec until it is made active.

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah just saw/remembered #10736 (comment), the write APIs Iceberg exposes allow passing in a given spec ID. So you're not really looking to set the spec as current but rather just leverage the existence of the spec in the write APIs. That's reasonable.

Cool, I think I'd still advocate for a different method name. #10736 (comment)

After that, I think the PR looks good to me.

@shanielh shanielh force-pushed the feature/update-partition-spec-set-as-default branch from 503f8fb to 175a391 Compare July 27, 2024 04:02
@shanielh
Copy link
Contributor Author

Ah just saw/remembered #10736 (comment), the write APIs Iceberg exposes allow passing in a given spec ID. So you're not really looking to set the spec as current but rather just leverage the existence of the spec in the write APIs. That's reasonable.

Cool, I think I'd still advocate for a different method name. #10736 (comment)

After that, I think the PR looks good to me.

Ok, I've modified the method name and squashed all commits into one.

Thanks!

@shanielh shanielh force-pushed the feature/update-partition-spec-set-as-default branch from 175a391 to 5d69bc4 Compare July 28, 2024 06:22
@RussellSpitzer
Copy link
Member

Revapi broken (as expected) please fix

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, as @RussellSpitzer mentioned you'll need to fix RevAPI. Also @RussellSpitzer not sure if you saw the naming change to make the API called addNonDefaultSpec , just want to make sure you're also good with that!

@RussellSpitzer
Copy link
Member

Overall LGTM, as @RussellSpitzer mentioned you'll need to fix RevAPI. Also @RussellSpitzer not sure if you saw the naming change to make the API called addNonDefaultSpec , just want to make sure you're also good with that!

A rose by any other name :) still +1

@amogh-jahagirdar amogh-jahagirdar changed the title UpdatePartitionSpec: Added ability to not set the new partition spec as default Core, API: Add addNonDefaultSpec to UpdatePartitionSpec to not set the new partition spec as default Jul 29, 2024
…as default

This may be used in cases where we want to add a partition spec but not set the new partition
spec as default.

For example, if I want to make sure that there's an unpartitioned spec since I might want
to commit specific data to it in some cases, I would first validate that there's an unpartition
spec in the table, and if there isn't, I would try to add the unpartitioned spec but not
set it as the new default partition spec
@shanielh shanielh force-pushed the feature/update-partition-spec-set-as-default branch from 5d69bc4 to ac1a9df Compare July 31, 2024 13:34
@shanielh
Copy link
Contributor Author

Overall LGTM, as @RussellSpitzer mentioned you'll need to fix RevAPI. Also @RussellSpitzer not sure if you saw the naming change to make the API called addNonDefaultSpec , just want to make sure you're also good with that!

Fixed, thanks.

@shanielh
Copy link
Contributor Author

shanielh commented Aug 4, 2024

@RussellSpitzer, @amogh-jahagirdar Any chance to merge this? All checks have passed and two approvals 😄

@RussellSpitzer
Copy link
Member

@shanielh In general we want to wait extra time on API module changes since we are committing the project to support this new API for ever. That said I think we've had this open long enough so I'll merge today.

@RussellSpitzer RussellSpitzer merged commit d9aacd2 into apache:main Aug 5, 2024
61 checks passed
@RussellSpitzer
Copy link
Member

Thanks @shanielh for the PR, and @amogh-jahagirdar for Reviewing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants