Skip to content

Core: do not allow setting fields nested in optional struct as identi…#4535

Merged
szehon-ho merged 1 commit intoapache:masterfrom
zhongyujiang:identifier-fields
Apr 25, 2022
Merged

Core: do not allow setting fields nested in optional struct as identi…#4535
szehon-ho merged 1 commit intoapache:masterfrom
zhongyujiang:identifier-fields

Conversation

@zhongyujiang
Copy link
Contributor

…fier fields and do not allow deleting columns with nested identifier fields

This PR does 2 things:

  1. Disable setting required fields nested in optional struct as identifier fields. Currently we only check if the identifier fields is required to prevent nullable identifier fields, but a required field nested in an optional struct is also nullable;
  2. Disable deletion of struct type columns with nested identifier fields.

@jackye1995 Could you help review this? Thanks!

"Cannot delete identifier field %s. To force deletion, " +
"also call setIdentifierFields to update identifier fields.", field);
idNotToDelete = idToParent.get(idNotToDelete);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now deleting struct type columns with nested identifier fields will also fail bacause this check, just the failure message can be a little confusing.

// validate identifier requirements based on the latest schema
Map<String, Integer> nameToId = TypeUtil.indexByName(struct);
Set<Integer> freshIdentifierFieldIds = Sets.newHashSet();
for (String name : identifierFieldNames) {
Preconditions.checkArgument(nameToId.containsKey(name),
"Cannot add field %s as an identifier field: not found in current schema or added columns", name);
freshIdentifierFieldIds.add(nameToId.get(name));
}

Copy link
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense, some style comments.

.setIdentifierFields("preferences.feature1")
.apply();

Assert.assertEquals("set existing nested field as identifier should succeed",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we should remove this test?

Copy link
Contributor Author

@zhongyujiang zhongyujiang Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail now because preferences is optional, I'll add test against a required struct field.

Copy link
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, just two more very small comments, should be good to go after.

.setIdentifierFields("out.nested")
.apply();

AssertHelpers.assertThrows("delete an identifier column without setting identifier fields should fail",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the message can be clearer like 'delete a struct with a nested identifier column without setting identifier fields should fail'?

.setIdentifierFields("locations.key.zip")
.apply());

AssertHelpers.assertThrows("add a nested field in list should fail",
Copy link
Member

@szehon-ho szehon-ho Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this test, it should not need to be removed right? (if i understand, the first Precondition will still fail with the original message about nesting).

Seems the new test "required_list.element.x" is about required list, this can be kept and test about optional list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it still fails, only now the reason it fails is because the hidden middle field element is optional:

"Cannot add field x as an identifier field: must not be nested in an optional field 14: element: optional struct<15: x: required long, 16: y: required long>"
to contain:

This test is against "list" type, so I added the "required_list.element.x" test to make sure the Precondtion will fail for the "list" type reason, to replace it.

Copy link
Member

@szehon-ho szehon-ho Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah i see, it checks from bottom upward , and hits the "element" optional check first. What do you think about, we still keep the test as "add a nested field in optional list should fail", and change the assert message? This test may still have value as there's not another that test this case, unless I am missing seeing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep this test, and I think changing the validation logic might be a better option than changing the assert message.
I feel we shouldn't expose the element or key-value field in Preconditions error message because these fields are invisible to the users and may confuse them. I think a validation from top to bottom can give a more correct error message.

Deque<Integer> deque = Lists.newLinkedList();
while (parentId != null) {
  deque.push(parentId);
  parentId = idToParent.get(parentId);
}

while (!deque.isEmpty()) {
  Types.NestedField parent = idToField.get(deque.pop());
  Preconditions.checkArgument(parent.type().isStructType(),
      "Cannot add field %s as an identifier field: must not be nested in %s", field.name(), parent);
  Preconditions.checkArgument(parent.isRequired(),
      "Cannot add field %s as an identifier field: must not be nested in an optional field %s",
      field.name(), parent);
}

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that sounds good to me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe can add a small code comment, like 'Exploring from root for better error message for list and map types'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@zhongyujiang zhongyujiang requested a review from szehon-ho April 22, 2022 02:18
@szehon-ho szehon-ho merged commit 6188c77 into apache:master Apr 25, 2022
@szehon-ho
Copy link
Member

Merged, thanks @zhongyujiang

@zhongyujiang
Copy link
Contributor Author

Thanks for your review!

@zhongyujiang zhongyujiang deleted the identifier-fields branch April 26, 2022 01:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants