
ARROW-10566: [C++] Allow validating ArrayData directly #8652

Closed
wants to merge 2 commits into from


@pitrou (Member) commented Nov 12, 2020

Constructing an Array from ArrayData runs various assertions that may fail.
It is therefore safer to be able to validate the ArrayData *before* constructing the Array.
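To illustrate the motivation, here is a minimal, stdlib-only sketch. `SimpleArrayData`, `SimpleInt32Array`, `ValidateInt32Data` and `MakeArraySafely` are hypothetical stand-ins, not Arrow's API. The point is only that a constructor which asserts on its input can be guarded by a validation step that reports an error instead of crashing.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for arrow::ArrayData:
// a logical length plus a single values buffer.
struct SimpleArrayData {
  int64_t length = 0;
  std::vector<int32_t> values;
};

// Validation reports problems as an error message instead of asserting,
// so malformed input can be rejected gracefully.
std::optional<std::string> ValidateInt32Data(const SimpleArrayData& data) {
  if (data.length < 0) {
    return "negative length";
  }
  if (static_cast<int64_t>(data.values.size()) < data.length) {
    return "values buffer too small for array length";
  }
  return std::nullopt;  // data is well-formed
}

// Hypothetical array wrapper whose constructor simply assumes valid data,
// mirroring the assertions mentioned in the PR description.
class SimpleInt32Array {
 public:
  explicit SimpleInt32Array(const SimpleArrayData& data) : data_(data) {
    assert(data_.length >= 0);
    assert(static_cast<int64_t>(data_.values.size()) >= data_.length);
  }
  int32_t Value(int64_t i) const { return data_.values[static_cast<size_t>(i)]; }
  int64_t length() const { return data_.length; }

 private:
  SimpleArrayData data_;
};

// Validate the data first; construct the array only if validation passes.
std::optional<SimpleInt32Array> MakeArraySafely(const SimpleArrayData& data,
                                                std::string* error) {
  if (auto err = ValidateInt32Data(data)) {
    *error = *err;
    return std::nullopt;
  }
  return SimpleInt32Array(data);
}
```

With this ordering, a buffer that is too short for its declared length yields an error message from `MakeArraySafely` rather than tripping an assertion inside the constructor.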

@pitrou requested a review from @bkietz on November 12, 2020 17:44

@bkietz left a comment


Thanks for doing this! A few minor notes:

cpp/src/arrow/array/validate.cc (outdated, resolved)
cpp/src/arrow/array/validate.cc (outdated, resolved)
cpp/src/arrow/array/validate.cc (outdated, resolved)
cpp/src/arrow/array/validate.cc (resolved):
```cpp
// Review context (excerpt of dense-union validation; truncated in the review view)
  }
  // Check offsets are in bounds
  const int32_t* offsets = data.GetValues<int32_t>(2);
  for (int64_t i = 0; i < data.length; ++i) {
    const int32_t code = type_codes[i];
    const int32_t offset = offsets[i];
    if (offset < 0) {
```
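The review excerpt above begins the in-bounds check for dense-union offsets: each offset must be non-negative and smaller than the length of the child its type code selects. A minimal stdlib-only sketch of that check follows. `CheckDenseUnionOffsetsInBounds` is hypothetical, and it assumes type codes index the children directly (real Arrow union types map type codes to child ids, which this sketch omits).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: checks that every dense-union offset points inside
// the child array selected by its type code. Assumes type codes are used
// directly as child indices. Returns an empty string when valid.
std::string CheckDenseUnionOffsetsInBounds(const std::vector<int8_t>& type_codes,
                                           const std::vector<int32_t>& offsets,
                                           const std::vector<int64_t>& child_lengths) {
  if (type_codes.size() != offsets.size()) {
    return "type_codes and offsets have different lengths";
  }
  for (size_t i = 0; i < type_codes.size(); ++i) {
    const int32_t code = type_codes[i];
    const int32_t offset = offsets[i];
    // The type code must select an existing child.
    if (code < 0 || static_cast<size_t>(code) >= child_lengths.size()) {
      return "invalid type code at position " + std::to_string(i);
    }
    // Lower bound: offsets may not be negative.
    if (offset < 0) {
      return "negative offset at position " + std::to_string(i);
    }
    // Upper bound: the offset must address a value within the selected child.
    if (offset >= child_lengths[static_cast<size_t>(code)]) {
      return "offset out of child bounds at position " + std::to_string(i);
    }
  }
  return "";
}
```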
Member


We should probably also validate that the offsets are strictly increasing.

Member Author


No, that's not required for unions. cc @wesm

Member


https://arrow.apache.org/docs/format/Columnar.html#dense-union

"The respective offsets for each child value array must be in order / increasing."

Member


Doesn't need to happen in this PR: https://issues.apache.org/jira/browse/ARROW-10580
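The quoted spec requirement applies per child: offsets must increase among the slots that select the same child, not across the offsets buffer as a whole. Below is a stdlib-only sketch of the kind of check deferred to ARROW-10580. `CheckDenseUnionOffsetsMonotonic` is hypothetical. The sketch also assumes type codes identify children directly and reads "increasing" as non-decreasing; a stricter reading would reject equal offsets by replacing `<` with `<=`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: for each child (identified by its type code), checks
// that the offsets of the slots selecting that child never decrease.
// Equal offsets for the same child are accepted (non-decreasing reading).
// Returns an empty string when valid.
std::string CheckDenseUnionOffsetsMonotonic(const std::vector<int8_t>& type_codes,
                                            const std::vector<int32_t>& offsets) {
  if (type_codes.size() != offsets.size()) {
    return "type_codes and offsets have different lengths";
  }
  // Last offset seen so far for each type code.
  std::unordered_map<int8_t, int32_t> last_offset;
  for (size_t i = 0; i < type_codes.size(); ++i) {
    auto it = last_offset.find(type_codes[i]);
    if (it != last_offset.end() && offsets[i] < it->second) {
      return "non-monotonic offset at position " + std::to_string(i);
    }
    last_offset[type_codes[i]] = offsets[i];
  }
  return "";
}
```

For example, type codes `{0, 1, 0, 1}` with offsets `{0, 0, 1, 1}` pass this check even though the offsets buffer as a whole is not strictly increasing, because each child's own offsets (0 then 1) do increase.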

@bkietz closed this in 7be266b Nov 13, 2020
@pitrou deleted the ARROW-10566-validate-data branch November 13, 2020 14:33
yordan-pavlov pushed a commit to yordan-pavlov/arrow that referenced this pull request Nov 14, 2020

Closes apache#8652 from pitrou/ARROW-10566-validate-data

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021