Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-8354: [R] Fix segfault in Table to Array conversion #6871

Closed
wants to merge 3 commits into from

Conversation

fsaintjacques
Copy link
Contributor

@fsaintjacques fsaintjacques commented Apr 8, 2020

The Converter::Make did not like receiving empty ArrayVector. The bug was exposed in ARROW-8216 which could return an empty selection vector due to a randomly generated fixture in test-dplyr.R

@github-actions
Copy link

github-actions bot commented Apr 8, 2020

@@ -118,6 +119,15 @@ test_that("filter() with NAs in selection", {
)
})

test_that("Filter returning an empty Table should not segfault (ARROW-8354)", {
expect_dplyr_error(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not error: it should return a 0-row data.frame:

filter(mtcars,FALSE)
##  [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
## <0 rows> (or 0-length row.names)

I'll pull the branch and try to special-case that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fsaintjacques I did this in f93f08c.

std::shared_ptr<Converter> Converter::Make(const ArrayVector& arrays) {
switch (arrays[0]->type_id()) {
if (arrays.empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To save us future pain, should we hunt for other places that assume >0 length vectors? grepping [0] in r/src hits a few places that look similarly suspect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked them all and found one bug, I'll add a test and the fix in the current PR.

@@ -656,8 +656,15 @@ class Converter_Null : public Converter {
return Status::OK();
}
};

std::shared_ptr<Converter> Converter::Make(const ArrayVector& arrays) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should accept the type of arrays explicitly; a Converter should not fail when arrays is empty. Instead the result of conversion should also be empty.

Copy link
Contributor Author

@fsaintjacques fsaintjacques Apr 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated https://issues.apache.org/jira/browse/ARROW-7798 to include this. I tried to quickly refactor what you suggest but it bubbles to rewriting almost all methods/functions of the file.

Copy link
Contributor Author

@fsaintjacques fsaintjacques Apr 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also lead me to find a bug with handling Dictionary ChunkedArray, https://issues.apache.org/jira/browse/ARROW-8374

@nealrichardson
Copy link
Member

#6878 may help give more visibility into the unexpected failures

fsaintjacques and others added 3 commits April 8, 2020 15:29
The Converter::Make did not like receiving empty ArrayVector. The bug
was introduced in ARROW-8216 which could return an empty selection
vector due to a randomly generated fixture in test-dplyr.R
@wesm
Copy link
Member

wesm commented Apr 8, 2020

Hmm, continuing to have arrow::r::Converter crash on no chunks isn't too great. It seems that in most cases the first element is being used to obtain type metadata. This could be fixed in most cases from what I can see by passing the type as a parameter to Converter::Make. Thoughts?

@nealrichardson
Copy link
Member

At least now it errors instead of segfaults, and I changed some of the choreography to keep away from the error. Agreed that we can do better, and I think what you say has been added to the scope of https://issues.apache.org/jira/browse/ARROW-7798.

I'm going to merge this since we've at least fixed the issue and have a passing build.

@wesm
Copy link
Member

wesm commented Apr 8, 2020

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants