-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-8354: [R] Fix segfault in Table to Array conversion #6871
Conversation
r/tests/testthat/test-dplyr.R
Outdated
@@ -118,6 +119,15 @@ test_that("filter() with NAs in selection", { | |||
) | |||
}) | |||
|
|||
test_that("Filter returning an empty Table should not segfault (ARROW-8354)", { | |||
expect_dplyr_error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not error: it should return a 0-row data.frame:
filter(mtcars,FALSE)
## [1] mpg cyl disp hp drat wt qsec vs am gear carb
## <0 rows> (or 0-length row.names)
I'll pull the branch and try to special-case that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fsaintjacques I did this in f93f08c.
std::shared_ptr<Converter> Converter::Make(const ArrayVector& arrays) { | ||
switch (arrays[0]->type_id()) { | ||
if (arrays.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To save us future pain, should we hunt for other places that assume >0 length vectors? grepping [0]
in r/src
hits a few places that look similarly suspect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked them all and found one bug, I'll add a test and the fix in the current PR.
@@ -656,8 +656,15 @@ class Converter_Null : public Converter { | |||
return Status::OK(); | |||
} | |||
}; | |||
|
|||
std::shared_ptr<Converter> Converter::Make(const ArrayVector& arrays) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should accept the type of arrays
explicitly; a Converter should not fail when arrays is empty. Instead the result of conversion should also be empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated https://issues.apache.org/jira/browse/ARROW-7798 to include this. I tried to quickly refactor what you suggest but it bubbles to rewriting almost all methods/functions of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also lead me to find a bug with handling Dictionary ChunkedArray, https://issues.apache.org/jira/browse/ARROW-8374
#6878 may help give more visibility into the unexpected failures |
The Converter::Make did not like receiving empty ArrayVector. The bug was introduced in ARROW-8216 which could return an empty selection vector due to a randomly generated fixture in test-dplyr.R
f93f08c
to
c8e2c9f
Compare
Hmm, continuing to have |
At least now it errors instead of segfaults, and I changed some of the choreography to keep away from the error. Agreed that we can do better, and I think what you say has been added to the scope of https://issues.apache.org/jira/browse/ARROW-7798. I'm going to merge this since we've at least fixed the issue and have a passing build. |
👍 |
The Converter::Make did not like receiving empty ArrayVector. The bug was exposed in ARROW-8216 which could return an empty selection vector due to a randomly generated fixture in test-dplyr.R