Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] Push projection expressions into ScanNode #33760

Closed
nealrichardson opened this issue Jan 18, 2023 · 0 comments · Fixed by #33770
Closed

[R] Push projection expressions into ScanNode #33760

nealrichardson opened this issue Jan 18, 2023 · 0 comments · Fixed by #33770

Comments

@nealrichardson
Copy link
Member

Describe the enhancement requested

https://github.com/apache/arrow/pull/19706/files#r1073391100 pointed out that in creating the ScanNode, we're extracting field names from Expressions in order to pass them to C++, which then makes FieldRef Expressions again. We can probably eliminate that step. Doing so may mean we need to drop a following Project step (or not, we'll have to see), and if so that means our show_query() output would change too--but if the projection doesn't show up faithfully in the print method of the ScanNode, we may want to reconsider (or, better, improve the ScanNode print).

Component(s)

R

nealrichardson added a commit that referenced this issue Jan 18, 2023
This PR implements `$.Expression` and `[[.Expression` methods, such that if the Expression is a FieldRef, it returns a nested FieldRef. This required revising some assumptions in a few places, particularly that if an Expression is a FieldRef, it has a `name`, and that all FieldRefs correspond to a Field in a Schema. In the case where the Expression is not a FieldRef, it will create an Expression call to `struct_field` to extract the field, iff the Expression has a knowable `type`, the type is `StructType`, and the field name exists in the struct. 

Things not done because they weren't needed to get this working:

  * `Expression$field_ref()` take a vector to construct a nested ref
  * Method to return vector of nested components of a field ref in R

Next steps for future PRs:

* Wrap this in [tidyr::unpack()](https://tidyr.tidyverse.org/reference/pack.html) method (but unfortunately, unpack() is not a generic)
* #33756
* #33757
* #33760

* Closes: #18818

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
@nealrichardson nealrichardson added this to the 12.0.0 milestone Jan 24, 2023
nealrichardson added a commit that referenced this issue Jan 24, 2023
### Rationale for this change

Followup to https://github.com/apache/arrow/pull/19706/files#r1073391100 with the goal of deleting and simplifying some code. As it turned out, it was more about moving code from the R bindings to the C++ library.

### Are there any user-facing changes?

Not for R users, but this fixes a bug in the dataset C++ library where nested field refs could not be handled by the scanner.

* Closes: #33760

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
westonpace pushed a commit to westonpace/arrow that referenced this issue Jan 25, 2023
…33770)

### Rationale for this change

Followup to https://github.com/apache/arrow/pull/19706/files#r1073391100 with the goal of deleting and simplifying some code. As it turned out, it was more about moving code from the R bindings to the C++ library.

### Are there any user-facing changes?

Not for R users, but this fixes a bug in the dataset C++ library where nested field refs could not be handled by the scanner.

* Closes: apache#33760

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
sjperkins pushed a commit to sjperkins/arrow that referenced this issue Feb 10, 2023
…33770)

### Rationale for this change

Followup to https://github.com/apache/arrow/pull/19706/files#r1073391100 with the goal of deleting and simplifying some code. As it turned out, it was more about moving code from the R bindings to the C++ library.

### Are there any user-facing changes?

Not for R users, but this fixes a bug in the dataset C++ library where nested field refs could not be handled by the scanner.

* Closes: apache#33760

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant