Core: Move a column with the same name as a deleted column #8325
I'm not sure. I would do this in two actions: remove the old column, then add the new column.
I don't think that will work. Removing and re-adding only puts the column at the end of the schema, but I want to move the column to a specific position. It also puts the table at risk of a partially updated schema if the job fails after the delete and before the add. You can remove and add in a single transaction and then move in another one, but that still has the partially-updated-schema problem.
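To make the ordering point concrete, here is a simplified, hypothetical model of a schema's column order. It tracks column names only and is not Iceberg's API: `ColumnOrderModel`, `deleteThenAdd`, `moveAfter`, and the extra `ts` column are illustrative assumptions. It assumes, as stated above, that a re-added column is appended at the end. Under that assumption, a move is still needed to place the column at a specific position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of column order only (no field IDs, types, or commits).
final class ColumnOrderModel {
  // Delete a column and re-add one with the same name. Like any added
  // column, the new one is appended at the end of the column list.
  static List<String> deleteThenAdd(List<String> columns, String name) {
    List<String> result = new ArrayList<>(columns);
    result.remove(name);
    result.add(name);
    return result;
  }

  // Move column `name` so that it directly follows column `after`.
  static List<String> moveAfter(List<String> columns, String name, String after) {
    List<String> result = new ArrayList<>(columns);
    if (!result.remove(name)) {
      throw new IllegalArgumentException("Cannot move missing column: " + name);
    }
    int afterIndex = result.indexOf(after);
    if (afterIndex < 0) {
      throw new IllegalArgumentException("Cannot move after missing column: " + after);
    }
    result.add(afterIndex + 1, name);
    return result;
  }
}
```

In this model, `deleteThenAdd(List.of("id", "data", "ts"), "id")` yields `[data, ts, id]`, and a following `moveAfter(..., "id", "data")` yields `[data, id, ts]`. The PR's goal is to perform both steps in one schema update rather than across two commits.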
@CodingCat Thanks for raising this PR, and sorry for the long wait. As you mentioned in the PR description, this operation should be possible in a single commit. I dug into it, and I was able to make the test pass (thank you for writing those!) with the following change. From:

```java
private Integer findForMove(String name) {
  Types.NestedField field = findField(name);
  if (field != null) {
    return field.fieldId();
  }
  return addedNameToId.get(name);
}
```

To:

```java
private Integer findForMove(String name) {
  Integer addedId = addedNameToId.get(name);
  if (addedId != null) {
    return addedId;
  }
  Types.NestedField field = findField(name);
  if (field != null) {
    return field.fieldId();
  }
  return null;
}
```

This way we keep all the logic of moving fields in the same place. WDYT?
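A self-contained sketch of why the lookup order in `findForMove` matters. It is a simplified, hypothetical model: `FindForMoveModel`, `baseNameToId`, `findForMoveBefore`, and `findForMoveAfter` are illustrative names, and a plain map over the base schema stands in for `findField`. Field IDs 1 (the deleted `id`) and 4 (the re-added `id`) follow the PR description; ID 2 for `data` is assumed. Consistent with the analysis in the PR description, the deleted column is still visible in the base schema, so the original order resolves the move to field 1.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the name lookup used when a move is recorded.
final class FindForMoveModel {
  // The base schema being updated (name -> field ID). Deletes are not
  // applied to this map, so a column deleted in this update is still found.
  final Map<String, Integer> baseNameToId = new HashMap<>();
  // Columns added in this update (name -> newly assigned field ID).
  final Map<String, Integer> addedNameToId = new HashMap<>();

  // Original order: the base schema wins, so "id" resolves to the
  // deleted field.
  Integer findForMoveBefore(String name) {
    Integer existingId = baseNameToId.get(name);
    if (existingId != null) {
      return existingId;
    }
    return addedNameToId.get(name);
  }

  // Proposed order: a column added in this update wins.
  Integer findForMoveAfter(String name) {
    Integer addedId = addedNameToId.get(name);
    if (addedId != null) {
      return addedId;
    }
    return baseNameToId.get(name);
  }

  // Example state: "id" (field 1) deleted and re-added as field 4.
  static FindForMoveModel example() {
    FindForMoveModel model = new FindForMoveModel();
    model.baseNameToId.put("id", 1);
    model.baseNameToId.put("data", 2); // assumed ID
    model.addedNameToId.put("id", 4);
    return model;
  }
}
```

Preferring the added column should be safe: as far as I understand, `addColumn` rejects a name that already exists unless that column is being deleted. So a name can appear in both maps only in this delete-and-re-add case.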
Hi @Fokko, thanks for the review and comments. I agree this is a cleaner fix, and I've updated the PR.
Is this a flaky test?
The branch was updated from 2174a22 to 53508d7.
@Fokko OK, all tests pass now. Thank you for the review! Would you mind taking another look?
Gentle ping @Fokko: would you mind giving it another review? Thank you!
Fokko left a comment:
@CodingCat Thanks for pinging me, this looks great! 👍
Currently, users are not allowed to perform the following operation, which is a valid use case when we want to change a column's type and no longer care about the old data.
(Of course, we can add/delete columns and commit, then move columns and commit again, but this workaround requires two transactions, which leaves a batch data application at risk of a partially updated schema if it fails in the middle.)
When a user runs the above code, the line below throws a NoSuchElementException:
iceberg/core/src/main/java/org/apache/iceberg/SchemaUpdate.java
Line 754 in bb36f28
More detailed analysis of the code: this fails because `List<Types.NestedField> fields` only contains undeleted columns, so in the above example `id` (with field ID 1) is not there. Additionally, when we build `Collection<Move> moves`, `.moveAfter("id", "data")` refers to field 1 instead of field 4 (the column with the same name added by `addRequiredColumn("id", Types.IntegerType.get())`).

The proposal in this PR is essentially to reconstruct `moves` in the constructor of `ApplyChanges`, replacing references to the deleted column with references to the newly added column with the same name.
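The originally proposed approach, rewriting the pending moves before they are applied, can be sketched as follows. This is a hypothetical, simplified model rather than Iceberg's actual `ApplyChanges` or `Move` classes. `MoveRemapModel`, `PendingMove`, `remap`, and `redirect` are illustrative names. A real move also records its kind (first, before, after) and may refer to nested fields; both are omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: redirect moves that refer to a deleted field ID to the
// column re-added in the same update under the same name.
final class MoveRemapModel {
  // A pending "move field `fieldId` after field `referenceId`" operation.
  static final class PendingMove {
    final int fieldId;
    final int referenceId;

    PendingMove(int fieldId, int referenceId) {
      this.fieldId = fieldId;
      this.referenceId = referenceId;
    }
  }

  static List<PendingMove> remap(
      List<PendingMove> moves,
      Set<Integer> deletedIds,
      Map<Integer, String> baseIdToName,
      Map<String, Integer> addedNameToId) {
    List<PendingMove> result = new ArrayList<>();
    for (PendingMove move : moves) {
      result.add(new PendingMove(
          redirect(move.fieldId, deletedIds, baseIdToName, addedNameToId),
          redirect(move.referenceId, deletedIds, baseIdToName, addedNameToId)));
    }
    return result;
  }

  // Returns the ID of the column re-added under the deleted field's name, or
  // the original ID if the field was not deleted or was not re-added.
  private static int redirect(
      int id,
      Set<Integer> deletedIds,
      Map<Integer, String> baseIdToName,
      Map<String, Integer> addedNameToId) {
    if (!deletedIds.contains(id)) {
      return id;
    }
    String name = baseIdToName.get(id);
    if (name == null) {
      return id;
    }
    Integer replacement = addedNameToId.get(name);
    return replacement != null ? replacement : id;
  }
}
```

With the example above (field 1 `id` deleted and re-added as field 4, `data` assumed to be field 2), a move of 1 after 2 becomes a move of 4 after 2. The reviewer's `findForMove` change reaches the same result earlier, by resolving the name to the added column when the move is recorded, which keeps the move logic in one place.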