[C++] Pushdown filters on augmented columns like fragment filename #20187

Open
asfimport opened this issue Apr 11, 2022 · 1 comment

Comments

asfimport commented Apr 11, 2022

In the discussion on ARROW-15260, it came up that if we run the following code in R, we might expect the filter to be pushed down so that only the relevant files are read:

  filter = Expression$create(
    "match_substring",
    Expression$field_ref("__filename"),
    options = list(pattern = "cyl=8")
  )

As mentioned by @westonpace:

"You might think we would get the hint and only read files matching that pattern. This is not the case. We will read the entire dataset and apply the "cyl=8" filter in memory.

If we want to pushdown filters on the filename column we will need to add some special logic."

Reporter: Nicola Crane / @thisisnic

Related issues:

Note: This issue was originally created as ARROW-16164. Please see the migration documentation for further details.


Weston Pace / @westonpace:
So this is possible. And something like regex on filename might be interesting. However, I'm not terribly motivated to work on this because:

  • In the above example the user could establish a partitioning on cyl and then just filter for cyl == 8.

  • For more general filename filtering the user can often do this themselves by creating a dataset, getting the list of files, picking the files they want, and then creating a new dataset from the smaller list of files.

So it might be nice to first know of some key use cases that aren't solvable with other features.
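
For reference, the second workaround (list the files, pick the ones you want, build a new dataset) might look roughly like the sketch below. It uses pyarrow rather than R, and `select_files` and `open_filtered_dataset` are hypothetical helper names made up for illustration, not part of Arrow. It assumes the dataset is file-system based, so that `files` and `filesystem` are available on it.

```python
import re


def select_files(paths, pattern):
    """Return the paths that match the regular expression `pattern`."""
    rx = re.compile(pattern)
    return [p for p in paths if rx.search(p)]


def open_filtered_dataset(base_dir, pattern, file_format="parquet"):
    """Open only the files under `base_dir` whose path matches `pattern`.

    Hypothetical helper, not part of Arrow. Requires pyarrow.
    """
    # Imported lazily so select_files can be used without pyarrow installed.
    import pyarrow.dataset as ds

    # Dataset discovery lists the files; no table data is scanned yet.
    full = ds.dataset(base_dir, format=file_format)

    # Do the "pushdown" by hand: keep only the matching file paths.
    wanted = select_files(full.files, pattern)
    if not wanted:
        raise ValueError(f"no files under {base_dir!r} match {pattern!r}")

    # Build a second, smaller dataset from the reduced file list.
    return ds.dataset(wanted, format=file_format, filesystem=full.filesystem)
```

With a layout like the one in the example above, something like `open_filtered_dataset("path/to/data", r"cyl=8")` would then only ever open the files under the `cyl=8` directory. Note that a dataset built from an explicit file list does not pick up partition columns unless a partitioning is also supplied.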
