@suxiaogang223 suxiaogang223 commented Oct 5, 2025

Summary

Implements Phase 2 of the RowSelection API as proposed in #58.
Related Issue: #58 (Phase 2 implementation)

What's Implemented

This PR implements Phase 2 (Efficient Skipping) of the ORC RowSelection API, building on the MVP from Phase 1 (#59). It reduces the cost of row filtering by adding efficient skip operations to the major decoder types, so that rows excluded by a selection are neither decoded nor given memory. The one exception is the rle_v2 decoder, which is listed under TODO.
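To illustrate the idea, here is a minimal sketch of how a row selection could drive a decoder that supports skipping. All names (`RowSelector`, `ColumnDecoder`, `PlainDecoder`, `read_selected`) are hypothetical and chosen for illustration; they are not taken from this PR or from the crate's actual API. The default `skip_rows` shows the decode-and-discard fallback, while the override shows a skip that does no decoding.

```rust
/// Hypothetical selector: a run of consecutive rows to either read or skip.
#[derive(Debug, Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Hypothetical decoder interface (an assumption, not the crate's real trait).
trait ColumnDecoder {
    /// Decodes the next value, or returns None at end of stream.
    fn decode_next(&mut self) -> Option<i64>;

    /// Fallback skip: decode each value and throw it away.
    /// Returns the number of rows actually skipped.
    fn skip_rows(&mut self, n: usize) -> usize {
        let mut skipped = 0;
        while skipped < n && self.decode_next().is_some() {
            skipped += 1;
        }
        skipped
    }
}

/// Toy decoder over already-materialized values, where skipping is O(1).
struct PlainDecoder {
    values: Vec<i64>,
    pos: usize,
}

impl ColumnDecoder for PlainDecoder {
    fn decode_next(&mut self) -> Option<i64> {
        let value = self.values.get(self.pos).copied();
        if value.is_some() {
            self.pos += 1;
        }
        value
    }

    /// Efficient skip: advance the cursor without touching the values.
    fn skip_rows(&mut self, n: usize) -> usize {
        let take = n.min(self.values.len() - self.pos);
        self.pos += take;
        take
    }
}

/// Applies a selection to a decoder and collects only the selected rows.
fn read_selected<D: ColumnDecoder>(decoder: &mut D, selection: &[RowSelector]) -> Vec<i64> {
    let mut out = Vec::new();
    for selector in selection {
        if selector.skip {
            decoder.skip_rows(selector.row_count);
        } else {
            for _ in 0..selector.row_count {
                match decoder.decode_next() {
                    Some(value) => out.push(value),
                    None => return out, // stream ended early
                }
            }
        }
    }
    out
}
```

With ten values `0..10` and the selection "skip 3, read 2, skip 4, read 1", `read_selected` would return rows 3, 4 and 9 while only ever decoding those three values.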

Phase 2: Efficient Skipping

  • Implement efficient skip_rows() for each decoder type
  • Skip without decoding for:
    • Integer types (RLE skip)
    • Boolean (skip bit runs)
    • Strings (skip dictionary entries)
    • Timestamps, decimals, etc.
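As a rough sketch of what "skip without decoding" can look like for a run-length encoded stream, the example below uses ORC's byte RLE layout (a control byte of 0 to 127 means a run of `control + 3` copies of the following byte; a negative control byte means `-control` literal bytes follow). The `ByteRleReader` type and its methods are hypothetical and are not the implementation from this PR; the sketch also assumes well-formed input and would panic on a truncated stream.

```rust
/// Hypothetical, simplified reader for ORC byte run-length encoding.
struct ByteRleReader<'a> {
    data: &'a [u8],
    /// Index of the next unread byte in `data`.
    pos: usize,
    /// Values left in the current run or literal group.
    remaining: usize,
    /// Some(v) while inside a repeated run, None while inside literals.
    run_value: Option<u8>,
}

impl<'a> ByteRleReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0, remaining: 0, run_value: None }
    }

    /// Reads the next control byte; returns false at end of input.
    fn read_header(&mut self) -> bool {
        let header = match self.data.get(self.pos) {
            Some(&byte) => byte,
            None => return false,
        };
        self.pos += 1;
        let control = header as i8;
        if control >= 0 {
            // Run: (control + 3) copies of the single byte that follows.
            self.remaining = control as usize + 3;
            self.run_value = Some(self.data[self.pos]);
            self.pos += 1;
        } else {
            // Literals: -control raw bytes follow.
            self.remaining = (-(control as i16)) as usize;
            self.run_value = None;
        }
        true
    }

    /// Decodes one value.
    fn next_value(&mut self) -> Option<u8> {
        if self.remaining == 0 && !self.read_header() {
            return None;
        }
        self.remaining -= 1;
        match self.run_value {
            Some(value) => Some(value),
            None => {
                let value = self.data[self.pos];
                self.pos += 1;
                Some(value)
            }
        }
    }

    /// Skips up to `n` values without materializing them.
    /// Returns the number of values actually skipped.
    fn skip(&mut self, mut n: usize) -> usize {
        let mut skipped = 0;
        while n > 0 {
            if self.remaining == 0 && !self.read_header() {
                break;
            }
            let take = n.min(self.remaining);
            if self.run_value.is_none() {
                // Literals occupy one byte each, so jump over them.
                self.pos += take;
            }
            // For a run, only the counter changes; no bytes are read.
            self.remaining -= take;
            n -= take;
            skipped += take;
        }
        skipped
    }
}
```

The cost of `skip` here is proportional to the number of runs crossed rather than the number of values skipped, which is the property the list above is after. More complex encodings need more bookkeeping, but the principle of advancing counters and offsets instead of producing values is the same.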

TODO

Implement efficient skip for the rle_v2 decoder

@suxiaogang223 suxiaogang223 marked this pull request as draft October 5, 2025 07:41
@suxiaogang223 suxiaogang223 changed the title from "feat: Impl Efficient Skipping for RowSelection (Phase 2)" to "feat: Implement Efficient Skipping for RowSelection (Phase 2)" Oct 5, 2025
@suxiaogang223 suxiaogang223 marked this pull request as ready for review October 5, 2025 09:08
@suxiaogang223 (Contributor, Author) commented

Hi @WenyXu, this PR is ready for review 🚀. It implements the skip method for every decoder except the rle_v2 decoder. Because the rle_v2 decoder is very complex 🥲, its skip support will need to be implemented in a separate PR.

@WenyXu WenyXu left a comment

Nice work!

@WenyXu WenyXu merged commit b1b70f7 into datafusion-contrib:main Oct 10, 2025
12 checks passed
@sunng87 sunng87 mentioned this pull request Oct 24, 2025