[R] Retain schema in new Expressions #28820
Comments
Neal Richardson / @nealrichardson: |
Ian Cook / @ianmcook: is.double(x + y) If that seems like a silly or contrived example: There are some R functions in which we need to know what the type of the input is in order to know which Arrow kernels to call. For example, to make |
Neal Richardson / @nealrichardson: In case (2), I would think we could handle this in a more targeted way inside of where() or across() etc. Re: ARROW-12055, I see where you're going with it but it feels like we're hacking on ourselves to support that, and we shouldn't have to do that. I'd personally prefer to add is_nan methods for all other types in C++ (always returning false). My pushback comes from various past experiences of trying to hack together interfaces that seemingly need to track their state, and trying to get certain APIs to conform to expectations from R. Sometimes that's the right choice, but it's a slippery slope and we should spend some extra time looking for a cleaner solution before going down it. |
Ian Cook / @ianmcook: The principle I had in mind with this PR is straightforward: An expression can optionally have a schema bound to it, and once a schema is bound to an expression, it will stay bound to derivative expressions. Before this PR, as soon as you create an expression inside a dplyr verb, you immediately lose the ability to know what its type is, despite the fact that it is fully knowable. You need to wait until map(.data$selected_columns, ~(.$schema <- .data$.data$schema)) After this PR, the types of expressions are known at all times. I think it is a clean solution; it's +7, -2 lines of actual package code; everything else is tests, docs, etc. If the central objection here is that this causes the I experimented with trying to achieve the changes in this PR through changes in |
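The principle stated in the comment above (a schema, once bound to an expression, stays bound to derivative expressions) can be illustrated with a small toy model. This is only a sketch in plain Python, not the arrow R package's code or API: the `Expr` class, the `field()` and `call()` helpers, the string type names, and the type-promotion rule are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for a schema: field name -> type name.
Schema = Dict[str, str]


@dataclass
class Expr:
    """Toy expression node that may carry a schema (illustrative only)."""
    op: str                          # "field", or a function name such as "add"
    args: Tuple = ()
    schema: Optional[Schema] = None

    def type(self) -> str:
        # Without a bound schema the type cannot be resolved.
        if self.schema is None:
            raise ValueError("no schema bound; type is unknown")
        if self.op == "field":
            return self.schema[self.args[0]]
        # Assumed promotion rule for the sketch: any double makes the result double.
        arg_types = [a.type() for a in self.args]
        return "double" if "double" in arg_types else arg_types[0]


def field(name: str, schema: Optional[Schema] = None) -> Expr:
    """Create a field reference, optionally bound to a schema."""
    return Expr("field", (name,), schema)


def call(op: str, *args: Expr) -> Expr:
    """Build a derived expression, retaining the first schema found on its inputs."""
    schema = next((a.schema for a in args if a.schema is not None), None)
    return Expr(op, args, schema)


schema = {"x": "int32", "y": "double"}
total = call("add", field("x", schema), field("y", schema))
print(total.type())
```

In this sketch, an expression built from unbound fields keeps `schema` as `None`, so calling `type()` on it raises an error until a schema is attached. That mirrors the "before" situation described in the comment, under the assumptions stated above.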
Ian Cook / @ianmcook: |
Ian Cook / @ianmcook:
|
When a new Expression is created, schema should be retained from the expression(s) it was created from. That way, the type() and type_id() methods of the new Expression will work.
For example, currently this happens:
This is what we want to happen:
Reporter: Ian Cook / @ianmcook
Assignee: Ian Cook / @ianmcook
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-13117. Please see the migration documentation for further details.