-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core, API: Add addNonDefaultSpec to UpdatePartitionSpec to not set the new partition spec as default #10736
Core, API: Add addNonDefaultSpec to UpdatePartitionSpec to not set the new partition spec as default #10736
Conversation
2c2a54b
to
a2e1cd7
Compare
My big question here is what is the value of adding the Spec before you are ready to write data? |
I don't think it matters whether I add the spec just before writing to it or prior. But I wouldn't like to set it as the new default. |
Wouldn't that mean we would want to modify the write APIs? Isn't that essentially just an engine decision? |
The write APIs already support that. |
Got it, Ok let me take a closer look at this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable but we do need to add some tests to TestUpdatePartitionSpec to validate the behavior
I had to add the spec to TestTableUpdatePartitionSpec since that's the only one that actually commit the spec to the table, PLMK if there's anything else I need to do for this to get merged. |
* @param value whether to set the new partition spec as default | ||
* @return this for method chaining | ||
*/ | ||
UpdatePartitionSpec setAsDefault(Boolean value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the intent and it's reasonable imo (don't think there's any other API where adding this would make sense). Typically we try avoid adding boolean APIs but to facilitate this case, it may be unavoidable. I'd like to think through more alternatives though.
On another note, at minimum I think this should be made a primitive boolean named setAsDefaultSpec
(same with the API name imo)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could do a no arg method instead something like
notDefault()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah couldn’t think of a good name for a no arg method so I discarded it initially but another one came to mind: addWithoutSettingCurrent (maybe a bit verbose)
@@ -327,7 +336,12 @@ public PartitionSpec apply() { | |||
|
|||
@Override | |||
public void commit() { | |||
TableMetadata update = base.updatePartitionSpec(apply()); | |||
TableMetadata update; | |||
if (this.setAsDefault) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typically in the project, this
is only used when writing to the field. Here we could just reference setAsDefault
Hi (@amogh-jahagirdar, @RussellSpitzer), I've applied the changes you suggested within a new fixup commit, please review and let me know when it's good to squash and rebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me now, we can merge if @amogh-jahagirdar agrees
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very against this idea, but after some more thinking, I think I'm struggling to understand the use case and can be convinced otherwise. I thought I originally understood the intent, but now it's a bit confusing to me:
After the spec is added without making it the current, active table spec, how will it later be set as the current? With the current APIs a user has to again call updateSpec(), reconstruct the desired spec and then commit, and rely on the behavior of the implementation to essentially re-use the partition spec that was previously added.
At minimum though would advocate for a more clear API name which indicates what "nonDefault" is referring to; I'm proposing addNonDefaultSpec
or addInactiveSpec
* | ||
* @return this for method chaining | ||
*/ | ||
UpdatePartitionSpec notDefault(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we call this addNonDefaultSpec
or addInactiveSpec
cc @RussellSpitzer @shanielh?
When I was going through the test code the API name felt a bit odd (the heuristic I try and aim for is does it sound right when speaking as a sentence).
table.updateSpec().addField("id").addNonDefaultSpec().commit();
reads more clear to me since it's more clear what the action is ("adding"), and what non-default is referring to ("spec")
inactive maybe a bit more extreme but technically it is not the active table spec if it's added, since every write to the table should just ignore that spec until it is made active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah just saw/remembered #10736 (comment), the write APIs Iceberg exposes allow passing in a given spec ID. So you're not really looking to set the spec as current but rather just leverage the existence of the spec in the write APIs. That's reasonable.
Cool, I think I'd still advocate for a different method name. #10736 (comment)
After that, I think the PR looks good to me.
503f8fb
to
175a391
Compare
Ok, I've modified the method name and squashed all commits into one. Thanks! |
175a391
to
5d69bc4
Compare
Revapi broken (as expected) please fix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, as @RussellSpitzer mentioned you'll need to fix RevAPI. Also @RussellSpitzer not sure if you saw the naming change to make the API called addNonDefaultSpec
, just want to make sure you're also good with that!
A rose by any other name :) still +1 |
…as default This may be used in cases where we want to add a partition spec but not set the new partition spec as default. For example, if I want to make sure that there's an unpartitioned spec since I might want to commit specific data to it in some cases, I would first validate that there's an unpartition spec in the table, and if there isn't, I would try to add the unpartitioned spec but not set it as the new default partition spec
5d69bc4
to
ac1a9df
Compare
Fixed, thanks. |
@RussellSpitzer, @amogh-jahagirdar Any chance to merge this? All checks have passed and two approvals 😄 |
@shanielh In general we want to wait extra time on API module changes since we are committing the project to support this new API for ever. That said I think we've had this open long enough so I'll merge today. |
Thanks @shanielh for the PR, and @amogh-jahagirdar for Reviewing |
This may be used in cases where we want to add a partition spec but not set the new partition spec as default.
For example, if I want to make sure that there's an unpartitioned spec since I might want to commit specific data to it in some cases, I would first validate that there's an unpartition spec in the table, and if there isn't, I would try to add the unpartitioned spec but not set it as the new default partition spec