[C++] The new scan node should use values from fragment guarantees instead of loading them from disk #15059

Closed
westonpace opened this issue Dec 21, 2022 · 0 comments · Fixed by #15129

Describe the enhancement requested

The main reason to do this is that these columns will not always be on disk (right now the new scan node fails in this case). Skipping the load of these columns is also a performance enhancement. The solution will, I suspect, also lay the groundwork for supporting the augmented columns (filename, batch index, file index).

Component(s)

C++

westonpace added a commit to westonpace/arrow that referenced this issue Dec 26, 2022
…instead of loading the data from the fragment
westonpace added a commit to westonpace/arrow that referenced this issue Dec 30, 2022
…instead of loading the data from the fragment
westonpace added a commit to westonpace/arrow that referenced this issue Feb 22, 2023
…instead of loading the data from the fragment
westonpace added a commit to westonpace/arrow that referenced this issue Feb 23, 2023
…instead of loading the data from the fragment
westonpace added a commit that referenced this issue Feb 25, 2023
…stead of fragment (#15129)

If a fragment has a guarantee like `x == 5`, then we don't need to load the column `x` from disk and can instead use the scalar `5`. This is not just a performance improvement. In many cases, users will create partitioned datasets without actually storing the partition value as a separate column (e.g. the file `my_dataset/x=5/foo.parquet` will not have a column named `x`).
* Closes: #15059
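
To illustrate the idea described above, here is a minimal, self-contained sketch. It does not use Arrow's actual API: `Fragment`, `Column`, `ResolveColumn`, `MakeExampleFragment`, and the `disk_reads` counter are all hypothetical stand-ins, and the stored column `y` is invented for the example. It only shows the lookup order: check the guarantee first, and touch the stored data only when no guarantee covers the column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Arrow types.
using Column = std::vector<int64_t>;

struct Fragment {
  // Columns physically present in the file.
  std::map<std::string, Column> stored;
  // Equality guarantees such as `x == 5`, keyed by column name.
  std::map<std::string, int64_t> guarantees;
  std::size_t num_rows = 0;
  // Counts simulated disk reads so the chosen path is observable.
  mutable int disk_reads = 0;
};

// Resolve a column, preferring a guaranteed scalar over a disk read.
// Returns std::nullopt if the column is neither guaranteed nor stored.
std::optional<Column> ResolveColumn(const Fragment& frag,
                                    const std::string& name) {
  auto g = frag.guarantees.find(name);
  if (g != frag.guarantees.end()) {
    // Broadcast the scalar to the fragment's row count; nothing is read.
    return Column(frag.num_rows, g->second);
  }
  auto s = frag.stored.find(name);
  if (s != frag.stored.end()) {
    assert(s->second.size() == frag.num_rows);
    ++frag.disk_reads;  // simulated read of a stored column
    return s->second;
  }
  return std::nullopt;
}

// Models `my_dataset/x=5/foo.parquet`: `x` exists only as a guarantee,
// while the (invented) column `y` is stored in the file.
Fragment MakeExampleFragment() {
  Fragment frag;
  frag.num_rows = 3;
  frag.stored["y"] = {10, 20, 30};
  frag.guarantees["x"] = 5;
  return frag;
}
```

In this sketch, resolving `x` succeeds even though the file has no such column, and it leaves `disk_reads` at zero; resolving `y` falls through to the stored data. A real implementation would presumably keep the value as a scalar rather than materializing a full column, which this toy version does only for simplicity.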

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
@westonpace westonpace added this to the 12.0.0 milestone Feb 25, 2023