-
Notifications
You must be signed in to change notification settings - Fork 16
Open
Description
Overview
This document describes the proposed RowSelection API for orc-rust, modeled after arrow-rs Parquet's implementation. This API will enable fine-grained row filtering at the decoder level, significantly reducing I/O and improving query performance.
Motivation
Currently, ORC readers must decode all rows in a stripe, even if only a subset is needed. RowSelection allows:
- Skip rows before decoding: Avoid reading and decompressing data for filtered rows
- Reduce I/O: Skip data pages that contain only filtered rows
- Improve performance: Lower CPU and memory usage by decoding only selected rows
- Page-level pruning: Leverage ORC row indexes to skip entire row groups
API Design
Core Types
/// A single selector specifying rows to select or skip
#[derive(Debug, Clone, Copy, Eq, PartialEq)]
pub struct RowSelector {
/// The number of rows
pub row_count: usize,
/// If true, skip `row_count` rows
pub skip: bool,
}
impl RowSelector {
/// Select `row_count` rows
pub fn select(row_count: usize) -> Self {
Self {
row_count,
skip: false,
}
}
/// Skip `row_count` rows
pub fn skip(row_count: usize) -> Self {
Self {
row_count,
skip: true,
}
}
}
/// A collection of [`RowSelector`] used to skip rows when scanning an ORC file
///
/// Invariants:
/// * Contains no [`RowSelector`] of 0 rows
/// * Consecutive [`RowSelector`]s alternate between skipping and selecting
#[derive(Debug, Clone, Default, Eq, PartialEq)]
pub struct RowSelection {
selectors: Vec<RowSelector>,
}Usage Examples
Example 1: Skip First N Rows
use orc_rust::{ArrowReaderBuilder, RowSelection, RowSelector};
let selection = RowSelection::from(vec![
RowSelector::skip(10000), // Skip first 10k rows
RowSelector::select(1000000), // Read next 1M rows
]);
let reader = ArrowReaderBuilder::try_new(file_data)?
.with_row_selection(selection)
.build();
for batch in reader {
// Process only rows 10000-1010000
}Example 2: From Filter Results
use arrow::array::BooleanArray;
use orc_rust::RowSelection;
// After evaluating predicates against row indexes
let filter1 = BooleanArray::from(vec![true, false, true, false, true]);
let filter2 = BooleanArray::from(vec![false, true, true, false, true]);
let selection = RowSelection::from_filters(&[filter1, filter2]);
let reader = ArrowReaderBuilder::try_new(file_data)?
.with_row_selection(selection)
.build();Example 3: Combining with Stripe Filtering
use datafusion_orc::stripe_filter::StripeAccessPlanFilter;
use orc_rust::RowSelection;
// 1. First, filter stripes by statistics
let mut stripe_filter = StripeAccessPlanFilter::new(access_plan);
stripe_filter.prune_by_statistics(&schema, metadata, &predicate, &metrics);
let stripe_plan = stripe_filter.build();
// 2. For selected stripes, create row-level selection
let mut row_selections = Vec::new();
for stripe_idx in stripe_plan.stripe_indexes() {
// Use ORC row index to create fine-grained selection
let row_index = metadata.stripe_metadatas()[stripe_idx].row_index()?;
let selection = evaluate_predicate_on_row_index(&row_index, &predicate)?;
row_selections.push(selection);
}
// 3. Combine all selections
let combined = row_selections.into_iter()
.fold(RowSelection::default(), |acc, sel| acc.union(&sel));
// 4. Read with both stripe and row filtering
let reader = ArrowReaderBuilder::try_new(file_data)?
.with_row_groups(stripe_plan.stripe_indexes()) // Stripe filtering
.with_row_selection(combined) // Row filtering
.build();Example 4: Two-Stage Filtering
// Stage 1: Filter based on column A
let filter_a = evaluate_predicate_a(&row_index)?; // BooleanArray
let selection_a = RowSelection::from_filters(&[filter_a]);
// Stage 2: Among selected rows, filter based on column B
let filter_b = evaluate_predicate_b(&row_index)?; // BooleanArray
let selection_b = RowSelection::from_filters(&[filter_b]);
// Combine: only rows where both A and B match
let final_selection = selection_a.and_then(&selection_b);
let reader = ArrowReaderBuilder::try_new(file_data)?
.with_row_selection(final_selection)
.build();Implementation Phases
Phase 1: Basic API (MVP)
- Define
RowSelectorandRowSelectiontypes - Implement
FromIteratorand basic conversions - Add
with_row_selection()toArrowReaderBuilder - Pass selection through to stripe decoder
- Implement basic skip in decoders (decode and discard)
Phase 2: Efficient Skipping
- Implement efficient
skip_rows()for each decoder type - Skip without decoding for:
- Integer types (RLE skip)
- Boolean (skip bit runs)
- Strings (skip dictionary entries)
- Timestamps, decimals, etc.
Phase 3: Row Index Integration
- Parse ORC row indexes
- Implement
row_selection_from_row_index() - Integrate with stripe filtering
- Add benchmarks
Phase 4: Advanced Features
-
scan_ranges()for page-level I/O planning -
and_then()for multi-stage filtering -
intersection()andunion()operations - Expand to batch boundaries for caching
References
- Parquet RowSelection:
arrow-rs/parquet/src/arrow/arrow_reader/selection.rs - ORC Specification: https://orc.apache.org/specification/ORCv1/
- ORC Row Index: https://orc.apache.org/docs/indexes.html
- DataFusion Pruning:
datafusion/pruning/src/lib.rs
cxzl25, WenyXu and sollhui
Metadata
Metadata
Assignees
Labels
No labels