-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R] Push projection expressions into ScanNode #33760
Comments
nealrichardson
added a commit
that referenced
this issue
Jan 18, 2023
This PR implements `$.Expression` and `[[.Expression` methods, such that if the Expression is a FieldRef, it returns a nested FieldRef. This required revising some assumptions in a few places, particularly that if an Expression is a FieldRef, it has a `name`, and that all FieldRefs correspond to a Field in a Schema. In the case where the Expression is not a FieldRef, it will create an Expression call to `struct_field` to extract the field, iff the Expression has a knowable `type`, the type is `StructType`, and the field name exists in the struct. Things not done because they weren't needed to get this working: * `Expression$field_ref()` take a vector to construct a nested ref * Method to return vector of nested components of a field ref in R Next steps for future PRs: * Wrap this in [tidyr::unpack()](https://tidyr.tidyverse.org/reference/pack.html) method (but unfortunately, unpack() is not a generic) * #33756 * #33757 * #33760 * Closes: #18818 Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
nealrichardson
added a commit
that referenced
this issue
Jan 24, 2023
### Rationale for this change Followup to https://github.com/apache/arrow/pull/19706/files#r1073391100 with the goal of deleting and simplifying some code. As it turned out, it was more about moving code from the R bindings to the C++ library. ### Are there any user-facing changes? Not for R users, but this fixes a bug in the dataset C++ library where nested field refs could not be handled by the scanner. * Closes: #33760 Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
westonpace
pushed a commit
to westonpace/arrow
that referenced
this issue
Jan 25, 2023
…33770) ### Rationale for this change Followup to https://github.com/apache/arrow/pull/19706/files#r1073391100 with the goal of deleting and simplifying some code. As it turned out, it was more about moving code from the R bindings to the C++ library. ### Are there any user-facing changes? Not for R users, but this fixes a bug in the dataset C++ library where nested field refs could not be handled by the scanner. * Closes: apache#33760 Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
sjperkins
pushed a commit
to sjperkins/arrow
that referenced
this issue
Feb 10, 2023
…33770) ### Rationale for this change Followup to https://github.com/apache/arrow/pull/19706/files#r1073391100 with the goal of deleting and simplifying some code. As it turned out, it was more about moving code from the R bindings to the C++ library. ### Are there any user-facing changes? Not for R users, but this fixes a bug in the dataset C++ library where nested field refs could not be handled by the scanner. * Closes: apache#33760 Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the enhancement requested
https://github.com/apache/arrow/pull/19706/files#r1073391100 pointed out that in creating the ScanNode, we're extracting field names from Expressions in order to pass them to C++, which then makes FieldRef Expressions again. We can probably eliminate that step. Doing so may mean we need to drop a following Project step (or not, we'll have to see), and if so that means our
show_query()
output would change too--but if the projection doesn't show up faithfully in the print method of the ScanNode, we may want to reconsider (or, better, improve the ScanNode print).Component(s)
R
The text was updated successfully, but these errors were encountered: