Member

@liuml07 liuml07 commented Mar 22, 2022

If looking up an identifier field by ID fails, Schema validation fails with an NPE. I think it's better to fail with a more meaningful exception and message.
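
To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use Iceberg's classes: `IdentifierFieldCheckSketch`, its `Field` stand-in, and both `validate...` methods are hypothetical names that only mimic the pattern described above (look a field up by ID in a map, then dereference the result). The error message wording is illustrative, based on the phrasing discussed in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the validation discussed in this PR; not Iceberg code.
final class IdentifierFieldCheckSketch {
  // Minimal stand-in for a schema field.
  static final class Field {
    final int id;
    final String name;

    Field(int id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  // Before: a missing ID makes the map return null, and the dereference throws an NPE.
  static String validateUnchecked(Map<Integer, Field> idToField, int fieldId) {
    Field field = idToField.get(fieldId);
    return field.name; // NullPointerException when fieldId is unknown
  }

  // After: fail fast with an exception that names the offending ID.
  static String validateChecked(Map<Integer, Field> idToField, int fieldId) {
    Field field = idToField.get(fieldId);
    if (field == null) {
      throw new IllegalArgumentException(
          "Cannot add fieldId " + fieldId + " as an identifier field: field does not exist");
    }
    return field.name;
  }

  public static void main(String[] args) {
    Map<Integer, Field> idToField = new HashMap<>();
    idToField.put(1, new Field(1, "id"));
    try {
      validateChecked(idToField, 2);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The checked variant throws `IllegalArgumentException`, which is also what Guava's `Preconditions.checkArgument` throws when its condition is false.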

@github-actions github-actions bot added the API label Mar 22, 2022
Member Author

liuml07 commented Mar 22, 2022

I have a unit test, but I'm not sure it's worth adding.

package org.apache.iceberg;

import org.apache.iceberg.relocated.com.google.common.collect.ImmutableSet;
import org.apache.iceberg.types.Types;
import org.junit.Assert;
import org.junit.Test;

public class TestSchema {
  @Test
  public void testValidateIdentifierField() {
    try {
      new Schema(
          Types.StructType.of(Types.NestedField.required(1, "id", Types.StringType.get())).fields(),
          ImmutableSet.of(2));
      Assert.fail("Should have failed because identifier id 2 does not exist");
    } catch (IllegalArgumentException e) {
      Assert.assertTrue(e.getMessage().contains("field not exists"));
    } // all other exception fails the test
  }
}

@liuml07 liuml07 force-pushed the fix-npe-schema-validate-identifer-fields branch from 803df95 to e61479a Compare March 22, 2022 00:54
@szehon-ho
Member

> I have a unit test but I'm not sure it's worth to add it.

I think we can put one in TestSchemaUpdate, where the original author seems to have put tests for the other preconditions.

Member

@szehon-ho szehon-ho left a comment

Check looks good, some comments (inline and above)

Map<Integer, Integer> idToParent) {
Types.NestedField field = idToField.get(fieldId);
Preconditions.checkArgument(field != null,
"Can not add filedId %d as an identifier field: field not exists", fieldId);
Member

Typo: 'filedId'. Should we use something like 'field with id %s'?

Also, I thought Preconditions only supports %s, does this actually work?
https://javadoc.io/doc/com.google.guava/guava/latest/com/google/common/base/Preconditions.html
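
For context on the question above: Guava documents that its Preconditions message templates only substitute `%s`. Other specifiers such as `%d` are left as literal text, and arguments without a matching placeholder are appended in square brackets. The sketch below is a simplified, hypothetical re-implementation of that documented behavior (the class and method names are made up, and it is not Guava's code), meant only to show roughly what a `%d` template would produce.

```java
// Simplified stand-in for the "%s only" template behavior documented for Guava's Preconditions.
// Hypothetical helper for illustration; not Guava's implementation.
final class LenientFormatSketch {
  static String lenientFormat(String template, Object... args) {
    StringBuilder out = new StringBuilder();
    int templateStart = 0;
    int argIndex = 0;
    // Substitute each "%s" with the next argument, in order.
    while (argIndex < args.length) {
      int placeholder = template.indexOf("%s", templateStart);
      if (placeholder == -1) {
        break;
      }
      out.append(template, templateStart, placeholder);
      out.append(args[argIndex++]);
      templateStart = placeholder + 2;
    }
    out.append(template, templateStart, template.length());
    // Arguments with no matching "%s" are appended in square brackets.
    if (argIndex < args.length) {
      out.append(" [");
      out.append(args[argIndex++]);
      while (argIndex < args.length) {
        out.append(", ");
        out.append(args[argIndex++]);
      }
      out.append(']');
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // "%d" is not substituted, so the value ends up in trailing brackets.
    System.out.println(lenientFormat("Cannot add fieldId %d as an identifier field", 2));
    // "%s" is substituted in place.
    System.out.println(lenientFormat("Cannot add fieldId %s as an identifier field", 2));
  }
}
```

Under this behavior the `%d` version does not throw a formatting error; it just yields a less readable message, which is why switching to `%s` is the safer choice.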

Contributor

maybe change it to Preconditions.checkArgument(null != field, "Cannot use fieldId %s: field does not exist", fieldId);

Contributor

+1 to adding a unit test in TestSchemaUpdate alongside the others.

Here's an example from that file that tests similar preconditions, which you can adapt for your unit test:

public void testUpdateMissingColumn() {
  AssertHelpers.assertThrows("Should reject rename missing column",
      IllegalArgumentException.class, "missing column: col", () -> {
        UpdateSchema update = new SchemaUpdate(SCHEMA, SCHEMA_LAST_COLUMN_ID);
        update.updateColumn("col", Types.DateType.get());
      }
  );
}
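
Adapting that pattern to this PR might look roughly like the sketch below. To keep it runnable without Iceberg or JUnit on the classpath, `assertThrows` here is a hypothetical, simplified stand-in for `AssertHelpers.assertThrows`, and `newSchema` is a made-up stand-in for constructing a `Schema` with identifier field IDs. The expected message fragment is an assumption based on the wording discussed in this thread, not the merged test.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical, dependency-free sketch of the suggested test; not the test that was merged.
final class IdentifierFieldTestSketch {
  // Simplified stand-in for AssertHelpers.assertThrows(message, expected, containedInMessage, runnable).
  static void assertThrows(
      String message, Class<? extends Exception> expected, String containedInMessage, Runnable runnable) {
    try {
      runnable.run();
    } catch (Exception e) {
      if (!expected.isInstance(e)) {
        throw new AssertionError(message + ": unexpected exception " + e, e);
      }
      if (e.getMessage() == null || !e.getMessage().contains(containedInMessage)) {
        throw new AssertionError(message + ": unexpected message " + e.getMessage(), e);
      }
      return;
    }
    throw new AssertionError(message + ": no exception was thrown");
  }

  // Made-up stand-in for building a schema: every identifier field ID must resolve to a field.
  static void newSchema(Map<Integer, String> idToName, Set<Integer> identifierFieldIds) {
    for (Integer fieldId : identifierFieldIds) {
      if (!idToName.containsKey(fieldId)) {
        throw new IllegalArgumentException(
            "Cannot add fieldId " + fieldId + " as an identifier field: field does not exist");
      }
    }
  }

  static void testIdentifierFieldMissing() {
    assertThrows("Should reject identifier field ID that is not in the schema",
        IllegalArgumentException.class, "field does not exist",
        () -> newSchema(Collections.singletonMap(1, "id"), Collections.singleton(2)));
  }

  public static void main(String[] args) {
    testIdentifierFieldMissing();
    System.out.println("ok");
  }
}
```

Compared with the try/catch version earlier in the thread, this shape checks the exception type and message in one call and fails clearly if nothing is thrown.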

Member Author

> Also, I thought Preconditions only supports %s, does this actually work?
> https://javadoc.io/doc/com.google.guava/guava/latest/com/google/common/base/Preconditions.html

You are right. I changed it to %s now.

Member Author

> maybe change it to Preconditions.checkArgument(null != field, "Cannot use fieldId %s: field does not exist", fieldId);

I replaced "Can not" with "Cannot", which was also suggested in another comment. I think field != null reads more smoothly, and I see more places using this pattern. I kept "as an identifier field" in the error message because this error may happen when constructing a full Schema, where it's clearer to report that the error concerns identifier fields.

Contributor

@kbendick kbendick left a comment

Thanks @liuml07.

I left some comments.

Map<Integer, Integer> idToParent) {
Types.NestedField field = idToField.get(fieldId);
Preconditions.checkArgument(field != null,
"Can not add filedId %d as an identifier field: field not exists", fieldId);
Contributor

Nit: Can not -> Cannot.

RE the phrasing: In this case, the phrasing does match the rest of the file, though it's not what we normally use in Iceberg. Normally we use {Problem} {Cause} and possibly {Problem} {Cause}: {relevant value}. And the messages ideally read as plain English phrases.

So it's unusual to see the ":" in the middle, but it does match the rest of the file. I would say "Cannot add identifier partition field from missing fieldId: %d" and then let the rest be inferred from the stack trace, as adding a partition field is what the user is trying to do here.

But in this case, your phrasing is similar to the rest of the file so it's not the worst phrasing.

So I'd fix the "Cannot" and "fieldId" typos and then add the test. The phrasing is easy to update, but the test is needed regardless.

@github-actions github-actions bot added the core label Mar 22, 2022
Member Author

liuml07 commented Mar 22, 2022

Thank you all for the prompt reviews! I have updated the PR a bit.

Member

@szehon-ho szehon-ho left a comment

It looks good to me. One nit below; let's see if the others have more comments.

Map<Integer, Integer> idToParent) {
Types.NestedField field = idToField.get(fieldId);
Preconditions.checkArgument(field != null,
"Cannot add fieldId %s as an identifier field: field not exists", fieldId);
Member

Nit: "field does not exist" (more grammatically correct)

@szehon-ho szehon-ho merged commit 718ff6a into apache:master Mar 23, 2022
@szehon-ho
Member

Merged. Thanks @liuml07 for the change and @kbendick @nastra for the additional review.

@liuml07 liuml07 deleted the fix-npe-schema-validate-identifer-fields branch April 3, 2022 22:16