-
Notifications
You must be signed in to change notification settings - Fork 3k
API: Fix NPE when validating identifier fields #4372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Fix NPE when validating identifier fields #4372
Conversation
|
I have a unit test but I'm not sure it's worth to add it. package org.apache.iceberg;
import org.apache.iceberg.relocated.com.google.common.collect.ImmutableSet;
import org.apache.iceberg.types.Types;
import org.junit.Assert;
import org.junit.Test;
public class TestSchema {
@Test
public void testValidateIdentifierField() {
try {
new Schema(
Types.StructType.of(Types.NestedField.required(1, "id", Types.StringType.get())).fields(),
ImmutableSet.of(2));
Assert.fail("Should have failed because identifier id 2 does not exist");
} catch (IllegalArgumentException e) {
Assert.assertTrue(e.getMessage().contains("field not exists"));
} // all other exception fails the test
}
} |
803df95 to
e61479a
Compare
I think we can put one in TestSchemaUpdate (where the original author seems to put some test of the other preconditions?) |
szehon-ho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check looks good, some comments (inline and above)
| Map<Integer, Integer> idToParent) { | ||
| Types.NestedField field = idToField.get(fieldId); | ||
| Preconditions.checkArgument(field != null, | ||
| "Can not add filedId %d as an identifier field: field not exists", fieldId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo 'fileId'. Should we put something like 'field with id %s'?
Also, I thought Preconditions only supports %s, does this actually work?
https://javadoc.io/doc/com.google.guava/guava/latest/com/google/common/base/Preconditions.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe change it to Preconditions.checkArgument(null != field, "Cannot use fieldId %s: field does not exist", fieldId);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to adding a unit test in TestSchemaUpdate alongside the others.
Here's an example testing similar preconditions in this file that you can adapt your unit test to:
iceberg/core/src/test/java/org/apache/iceberg/TestSchemaUpdate.java
Lines 623 to 630 in fafe33a
| public void testUpdateMissingColumn() { | |
| AssertHelpers.assertThrows("Should reject rename missing column", | |
| IllegalArgumentException.class, "missing column: col", () -> { | |
| UpdateSchema update = new SchemaUpdate(SCHEMA, SCHEMA_LAST_COLUMN_ID); | |
| update.updateColumn("col", Types.DateType.get()); | |
| } | |
| ); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I thought Preconditions only supports %s, does this actually work?
https://javadoc.io/doc/com.google.guava/guava/latest/com/google/common/base/Preconditions.html
You are right. I changed to %s now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe change it to Preconditions.checkArgument(null != field, "Cannot use fieldId %s: field does not exist", fieldId);
I replaced can not with cannot which is also suggested in another comment. I think filed != null reads smoother and I see more places use this pattern. Keep "as an identifier field" in error message as this error may happen when constructing a full Schema, where it's clearer to report the error is for identifier fields.
kbendick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @liuml07.
I left some comments.
| Map<Integer, Integer> idToParent) { | ||
| Types.NestedField field = idToField.get(fieldId); | ||
| Preconditions.checkArgument(field != null, | ||
| "Can not add filedId %d as an identifier field: field not exists", fieldId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to adding a unit test in TestSchemaUpdate alongside the others.
Here's an example testing similar preconditions in this file that you can adapt your unit test to:
iceberg/core/src/test/java/org/apache/iceberg/TestSchemaUpdate.java
Lines 623 to 630 in fafe33a
| public void testUpdateMissingColumn() { | |
| AssertHelpers.assertThrows("Should reject rename missing column", | |
| IllegalArgumentException.class, "missing column: col", () -> { | |
| UpdateSchema update = new SchemaUpdate(SCHEMA, SCHEMA_LAST_COLUMN_ID); | |
| update.updateColumn("col", Types.DateType.get()); | |
| } | |
| ); | |
| } |
| Map<Integer, Integer> idToParent) { | ||
| Types.NestedField field = idToField.get(fieldId); | ||
| Preconditions.checkArgument(field != null, | ||
| "Can not add filedId %d as an identifier field: field not exists", fieldId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can not -> Cannot.
RE the phrasing: In this case, the phrasing does match the rest of the file, though it's not what we normally use in Iceberg. Normally we use {Problem} {Cause} and possibly {Problem} {Cause}: {relevant value}. And the messages ideally read as plain English phrases.
So it's abnormal to see the : in the middle, but it does match the rest of the file. I would say Cannot add identifier partition field from missing fieldId: %d and then let the rest be inferred from the stack trace, as adding a partition field is what the user is trying to do here.
But in this case, your phrasing is similar to the rest of the file so it's not the worst phrasing.
So I'd update Cannot and fieldId typos and then add the test. The phrase is easy to update, but the test is needed regardless.
|
Thank you all for prompt reviews! I have updated the PR a bit. |
szehon-ho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good to me, one nit below, and see if the others have more comments.
| Map<Integer, Integer> idToParent) { | ||
| Types.NestedField field = idToField.get(fieldId); | ||
| Preconditions.checkArgument(field != null, | ||
| "Cannot add fieldId %s as an identifier field: field not exists", fieldId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "field does not exist" (more gramatically correct)
If the identifier field lookup by id fails, the Schema validation fails with NPE. I think it's better to fail with more meaningful exception and message.