This was originally reported by @jonkeane in ARROW-13893 but that issue was covering a different topic so I am opening a new issue for this specific behavior.
Weston Pace / @westonpace:
The call to head appears to trigger an immediate call to the legacy scanner's head method, and the resulting dataset is returned. The remaining dplyr execution is then resolved against that in-memory data, so ExecPlan is not used at all. The result is that it first fetches the first 4 rows and then sorts them, instead of sorting and then fetching the first 4.
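To illustrate why the order of the two operations matters, here is a minimal Python sketch. It does not use Arrow or dplyr; the sample rows and the function names `head_then_sort` and `sort_then_head` are made up for illustration only.

```python
# Hypothetical sample data standing in for a dataset; not taken from the issue.
rows = [{"id": i, "value": v} for i, v in enumerate([7, 3, 9, 1, 8, 2, 6])]


def head_then_sort(rows, n, key):
    """Take the first n rows, then sort only those rows (the behavior described above)."""
    return sorted(rows[:n], key=lambda r: r[key])


def sort_then_head(rows, n, key):
    """Sort every row, then take the first n rows (the behavior a user would expect)."""
    return sorted(rows, key=lambda r: r[key])[:n]


print([r["value"] for r in head_then_sort(rows, 4, "value")])
print([r["value"] for r in sort_then_head(rows, 4, "value")])
```

The two functions agree only when the n smallest values happen to sit in the first n rows, which is why fetching before sorting gives a different answer in general.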
If this is truly a blocker for 6.0.0 then it might be a problem. The head can't be applied in R because it would read in all of the data (presumably you could abort the read partway through but I think this would be overly complex).
If we want to do a proper ordered head in C++ then my recommendation would be the batch index scheme proposed in the sequencing doc here but I'm not sure we want to tackle that as part of 6.0.0.
As a short-term solution we can modify the sorting sink node to accept a limit argument. That should be a reasonably quick solution and could maybe fit in 6.0.0, but I'm not sure how much time we want to invest in stop-gap measures.
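As a rough picture of what a sorting sink with a limit could look like, here is a minimal Python sketch. It is not Arrow's C++ sink node or any real Arrow API; the class `TopKSortSink` and its `consume` and `finish` methods are hypothetical names, and the batches are plain lists of dicts.

```python
import heapq


class TopKSortSink:
    """Hypothetical sink that keeps only the `limit` smallest rows seen so far."""

    def __init__(self, key, limit):
        self.key = key
        self.limit = limit
        self._kept = []  # at most `limit` rows, in ascending order of `key`

    def consume(self, batch):
        # Merge the new batch with the rows kept so far and retain the smallest `limit`.
        candidates = self._kept + list(batch)
        self._kept = heapq.nsmallest(self.limit, candidates, key=lambda r: r[self.key])

    def finish(self):
        # heapq.nsmallest returns rows sorted ascending, so no further sort is needed.
        return list(self._kept)


# Made-up batches for illustration.
batches = [
    [{"value": 7}, {"value": 3}],
    [{"value": 9}, {"value": 1}, {"value": 8}],
    [{"value": 2}, {"value": 6}],
]
sink = TopKSortSink(key="value", limit=4)
for batch in batches:
    sink.consume(batch)
print([r["value"] for r in sink.finish()])
```

Under these assumptions the sink still has to see every batch, but it never holds more than `limit` rows plus one batch at a time, which is the appeal of pushing the limit into the sort rather than applying head afterwards.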
Reporter: Weston Pace / @westonpace
Assignee: Neal Richardson / @nealrichardson
Note: This issue was originally created as ARROW-14162. Please see the migration documentation for further details.