In the discussion on ARROW-15260, if we run the following code in R, we might expect it to push down the filter so we can just read in the relevant files:
As mentioned by @westonpace:
"You might think we would get the hint and only read files matching that pattern. This is not the case. We will read the entire dataset and apply the 'cyl=8' filter in memory.
If we want to pushdown filters on the filename column we will need to add some special logic."
Weston Pace / @westonpace:
So this is possible. And something like regex on filename might be interesting. However, I'm not terribly motivated to work on this because:
In the above example the user could establish a partitioning on cyl and then just filter for cyl == 8.
For more general filename filtering the user can often do this themselves by creating a dataset, getting the list of files, picking the files they want, and then creating a new dataset from the smaller list of files.
So it might be nice to first know of some key use cases that aren't solvable with other features.
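The file-list workaround described in the comment above can be sketched roughly as follows. This is only an illustration, written in Python with `pyarrow.dataset` rather than the R bindings the issue uses; the helper names `select_files` and `open_subset`, the Parquet default, and the example paths are assumptions and do not come from the issue.

```python
import re
from typing import Iterable, List


def select_files(paths: Iterable[str], pattern: str) -> List[str]:
    """Return the paths that match the regular expression `pattern`."""
    regex = re.compile(pattern)
    return [p for p in paths if regex.search(p)]


def open_subset(root: str, pattern: str, file_format: str = "parquet"):
    """Open a dataset made only of the files under `root` matching `pattern`.

    Hypothetical helper: it mirrors the manual steps from the comment
    (create a dataset, list its files, pick some, create a new dataset).
    """
    # Imported lazily so select_files() can be used without pyarrow installed.
    import pyarrow.dataset as ds

    # Step 1: discover the files. This lists paths but does not read the data.
    full = ds.dataset(root, format=file_format)

    # Step 2: pick the files we want by filename.
    wanted = select_files(full.files, pattern)
    if not wanted:
        raise ValueError(f"no files under {root!r} match {pattern!r}")

    # Step 3: build a smaller dataset from the chosen files only.
    return ds.dataset(wanted, format=file_format, filesystem=full.filesystem)


if __name__ == "__main__":
    example_paths = [
        "data/cyl=4/part-0.parquet",
        "data/cyl=8/part-0.parquet",
    ]
    print(select_files(example_paths, r"cyl=8"))
```

One caveat with this sketch: a dataset rebuilt from a bare list of files does not automatically carry over partition columns derived from the directory names, so any partitioning would have to be specified again when opening the subset.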
Reporter: Nicola Crane / @thisisnic
Note: This issue was originally created as ARROW-16164. Please see the migration documentation for further details.