ORC RowSelection API Design #58

@suxiaogang223

Overview

This document proposes a RowSelection API for orc-rust, modeled on the RowSelection type in the arrow-rs Parquet reader. The API lets callers state which rows to read so that the decoder can skip the rest, reducing I/O and decode work when only a subset of rows is needed.

Motivation

Currently, the ORC reader decodes every row in a stripe, even when only a subset is needed. RowSelection allows the reader to:

  • Skip rows before decoding: avoid reading and decompressing data for filtered-out rows
  • Reduce I/O: skip row groups that contain only filtered-out rows
  • Improve performance: lower CPU and memory usage by decoding only the selected rows
  • Prune at row-group level: use ORC row indexes to skip entire row groups

API Design

Core Types

/// A single selector specifying rows to select or skip
#[derive(Debug, Clone, Copy, Eq, PartialEq)]
pub struct RowSelector {
    /// The number of rows
    pub row_count: usize,
    
    /// If true, skip `row_count` rows
    pub skip: bool,
}

impl RowSelector {
    /// Select `row_count` rows
    pub fn select(row_count: usize) -> Self {
        Self {
            row_count,
            skip: false,
        }
    }
    
    /// Skip `row_count` rows
    pub fn skip(row_count: usize) -> Self {
        Self {
            row_count,
            skip: true,
        }
    }
}

/// A collection of [`RowSelector`] used to skip rows when scanning an ORC file
///
/// Invariants:
/// * Contains no [`RowSelector`] of 0 rows
/// * Consecutive [`RowSelector`]s alternate between skipping and selecting
#[derive(Debug, Clone, Default, Eq, PartialEq)]
pub struct RowSelection {
    selectors: Vec<RowSelector>,
}
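
The two invariants can be enforced when a selection is built from arbitrary selectors. The following is a minimal sketch, not the proposed implementation: `normalize` is a hypothetical helper name, and `RowSelector` is redeclared only so the snippet compiles on its own.

```rust
/// Redeclared from the proposal so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowSelector {
    pub row_count: usize,
    pub skip: bool,
}

/// Hypothetical helper: drops zero-row selectors and merges neighbours of the
/// same kind, so the result alternates between skip and select.
pub fn normalize(selectors: impl IntoIterator<Item = RowSelector>) -> Vec<RowSelector> {
    let mut out: Vec<RowSelector> = Vec::new();
    for s in selectors {
        if s.row_count == 0 {
            continue; // invariant: no selector of 0 rows
        }
        if let Some(last) = out.last_mut() {
            if last.skip == s.skip {
                // invariant: consecutive selectors must alternate, so merge
                last.row_count += s.row_count;
                continue;
            }
        }
        out.push(s);
    }
    out
}
```

A `From<Vec<RowSelector>>` or `FromIterator` implementation for `RowSelection` could route through a helper like this, so callers never have to pre-merge their input.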

Usage Examples

Example 1: Skip First N Rows

use orc_rust::{ArrowReaderBuilder, RowSelection, RowSelector};

let selection = RowSelection::from(vec![
    RowSelector::skip(10000),      // Skip first 10k rows
    RowSelector::select(1000000),  // Read next 1M rows
]);

let reader = ArrowReaderBuilder::try_new(file_data)?
    .with_row_selection(selection)
    .build();

for batch in reader {
    // Process only rows 10,000 through 1,009,999 (0-based)
}

Example 2: From Filter Results

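use arrow::array::BooleanArray;
use orc_rust::{ArrowReaderBuilder, RowSelection};

// After evaluating predicates against row indexes.
// Assuming arrow-rs semantics: each array covers the next consecutive batch
// of rows, so the two filters are concatenated, not ANDed.
let filter1 = BooleanArray::from(vec![true, false, true, false, true]);
let filter2 = BooleanArray::from(vec![false, true, true, false, true]);

let selection = RowSelection::from_filters(&[filter1, filter2]);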

let reader = ArrowReaderBuilder::try_new(file_data)?
    .with_row_selection(selection)
    .build();
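
To make the conversion concrete, here is a sketch of how boolean filter results could be turned into selectors. It assumes the arrow-rs behaviour, in which the filters describe consecutive batches of rows and are concatenated. `selectors_from_filters` is a hypothetical name, and plain `Vec<bool>` stands in for `BooleanArray` to keep the snippet dependency-free.

```rust
/// Redeclared from the proposal so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowSelector {
    pub row_count: usize,
    pub skip: bool,
}

/// Hypothetical helper: run-length encodes the concatenated filters.
/// `true` means the row is selected, `false` means it is skipped.
pub fn selectors_from_filters(filters: &[Vec<bool>]) -> Vec<RowSelector> {
    let mut out: Vec<RowSelector> = Vec::new();
    for keep in filters.iter().flatten().copied() {
        let skip = !keep;
        if let Some(last) = out.last_mut() {
            if last.skip == skip {
                last.row_count += 1; // extend the current run
                continue;
            }
        }
        out.push(RowSelector { row_count: 1, skip });
    }
    out
}
```

Under this assumption, the two five-element filters in Example 2 describe ten rows in total, not five rows filtered twice.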

Example 3: Combining with Stripe Filtering

use datafusion_orc::stripe_filter::StripeAccessPlanFilter;
use orc_rust::{ArrowReaderBuilder, RowSelection};

// 1. First, filter stripes by statistics
let mut stripe_filter = StripeAccessPlanFilter::new(access_plan);
stripe_filter.prune_by_statistics(&schema, metadata, &predicate, &metrics);
let stripe_plan = stripe_filter.build();

// 2. For selected stripes, create row-level selection
let mut row_selections = Vec::new();
for stripe_idx in stripe_plan.stripe_indexes() {
    // Use ORC row index to create fine-grained selection
    let row_index = metadata.stripe_metadatas()[stripe_idx].row_index()?;
    let selection = evaluate_predicate_on_row_index(&row_index, &predicate)?;
    row_selections.push(selection);
}

// 3. Combine all selections
let combined = row_selections.into_iter()
    .fold(RowSelection::default(), |acc, sel| acc.union(&sel));

// 4. Read with both stripe and row filtering
let reader = ArrowReaderBuilder::try_new(file_data)?
    .with_row_groups(stripe_plan.stripe_indexes())  // Stripe filtering
    .with_row_selection(combined)                    // Row filtering
    .build();

Example 4: Two-Stage Filtering

// Stage 1: Filter based on column A
let filter_a = evaluate_predicate_a(&row_index)?;  // BooleanArray
let selection_a = RowSelection::from_filters(&[filter_a]);

// Stage 2: Among selected rows, filter based on column B
let filter_b = evaluate_predicate_b(&row_index)?;  // BooleanArray
let selection_b = RowSelection::from_filters(&[filter_b]);

// Combine: only rows where both A and B match
let final_selection = selection_a.and_then(&selection_b);

let reader = ArrowReaderBuilder::try_new(file_data)?
    .with_row_selection(final_selection)
    .build();
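
The sketch below shows one possible meaning for `and_then`, assuming it follows arrow-rs: the second selection is expressed relative to the rows the first one selected, and the result is expressed in the original row space. The free function, the `push` helper, and the choice to treat rows past the end of the second selection as skipped are all assumptions made for illustration.

```rust
/// Redeclared from the proposal so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowSelector {
    pub row_count: usize,
    pub skip: bool,
}

/// Appends a run, merging with the previous run when the kind matches.
fn push(out: &mut Vec<RowSelector>, row_count: usize, skip: bool) {
    if row_count == 0 {
        return;
    }
    if let Some(last) = out.last_mut() {
        if last.skip == skip {
            last.row_count += row_count;
            return;
        }
    }
    out.push(RowSelector { row_count, skip });
}

/// `second` addresses only the rows selected by `first`.
pub fn and_then(first: &[RowSelector], second: &[RowSelector]) -> Vec<RowSelector> {
    let mut out = Vec::new();
    let mut rest = second.iter().copied();
    let mut cur = rest.next(); // unconsumed part of the current `second` run
    for f in first {
        if f.skip {
            push(&mut out, f.row_count, true); // already skipped by `first`
            continue;
        }
        let mut remaining = f.row_count;
        while remaining > 0 {
            match cur {
                Some(s) => {
                    let n = remaining.min(s.row_count);
                    push(&mut out, n, s.skip);
                    remaining -= n;
                    cur = if n == s.row_count {
                        rest.next()
                    } else {
                        Some(RowSelector { row_count: s.row_count - n, skip: s.skip })
                    };
                }
                None => {
                    // Assumption: rows not covered by `second` are skipped.
                    push(&mut out, remaining, true);
                    remaining = 0;
                }
            }
        }
    }
    out
}
```

With these semantics, the stage 2 filter needs one entry per row that survived stage 1, not one entry per row in the stripe.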

Implementation Phases

Phase 1: Basic API (MVP)

  • Define RowSelector and RowSelection types
  • Implement FromIterator and basic conversions
  • Add with_row_selection() to ArrowReaderBuilder
  • Pass selection through to stripe decoder
  • Implement basic skip in decoders (decode and discard)
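
The "decode and discard" fallback in Phase 1 can be pictured as filtering an already decoded column. This is only a sketch: `apply_selection` is a hypothetical name, and a real decoder would work on Arrow arrays, not slices.

```rust
/// Redeclared from the proposal so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowSelector {
    pub row_count: usize,
    pub skip: bool,
}

/// Hypothetical helper: keeps only the selected rows of a fully decoded column.
/// Selectors that run past the end of `decoded` are clamped to its length.
pub fn apply_selection<T: Clone>(decoded: &[T], selection: &[RowSelector]) -> Vec<T> {
    let mut out = Vec::new();
    let mut pos = 0;
    for s in selection {
        let end = (pos + s.row_count).min(decoded.len());
        if !s.skip {
            out.extend_from_slice(&decoded[pos..end]);
        }
        pos = end; // skipped rows were decoded but are dropped here
    }
    out
}
```

This gives correct results without touching the individual decoders. The cost is that every skipped row is still decoded, which is what Phase 2 sets out to avoid.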

Phase 2: Efficient Skipping

  • Implement efficient skip_rows() for each decoder type
  • Skip without decoding for:
    • Integer types (RLE skip)
    • Boolean (skip bit runs)
    • Strings (skip dictionary entries)
    • Timestamps, decimals, etc.
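
As an illustration of skipping without decoding, the sketch below advances a cursor over run-length-encoded data without materializing the skipped values. The `Run` and `RunDecoder` types are hypothetical and model only plain repeated-value runs. The actual ORC integer encodings are more involved, so this shows the idea and does not describe the decoder.

```rust
/// Hypothetical simplified run: `value` repeated `len` times.
pub struct Run {
    pub value: i64,
    pub len: usize,
}

/// Cursor over a slice of runs.
pub struct RunDecoder<'a> {
    runs: &'a [Run],
    run_idx: usize, // index of the current run
    offset: usize,  // rows already consumed within the current run
}

impl<'a> RunDecoder<'a> {
    pub fn new(runs: &'a [Run]) -> Self {
        Self { runs, run_idx: 0, offset: 0 }
    }

    /// Advances by whole or partial runs; returns the number of rows skipped.
    pub fn skip_rows(&mut self, mut n: usize) -> usize {
        let mut skipped = 0;
        while n > 0 && self.run_idx < self.runs.len() {
            let left = self.runs[self.run_idx].len - self.offset;
            let step = n.min(left);
            self.offset += step;
            n -= step;
            skipped += step;
            if self.offset == self.runs[self.run_idx].len {
                self.run_idx += 1;
                self.offset = 0;
            }
        }
        skipped
    }

    /// Decodes up to `n` rows into `out`; returns the number of rows read.
    pub fn read_rows(&mut self, mut n: usize, out: &mut Vec<i64>) -> usize {
        let mut read = 0;
        while n > 0 && self.run_idx < self.runs.len() {
            let run = &self.runs[self.run_idx];
            let step = n.min(run.len - self.offset);
            out.extend(std::iter::repeat(run.value).take(step));
            self.offset += step;
            n -= step;
            read += step;
            if self.offset == run.len {
                self.run_idx += 1;
                self.offset = 0;
            }
        }
        read
    }
}
```

Here `skip_rows` does work proportional to the number of runs crossed, not the number of rows, which is the property Phase 2 aims for in each decoder.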

Phase 3: Row Index Integration

  • Parse ORC row indexes
  • Implement row_selection_from_row_index()
  • Integrate with stripe filtering
  • Add benchmarks
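
One way `row_selection_from_row_index()` could work, sketched under assumptions: the predicate has already been evaluated against each row index entry, yielding one keep/prune flag per row group, and every row group except possibly the last holds `stride` rows. The function name and parameters below are hypothetical.

```rust
/// Redeclared from the proposal so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowSelector {
    pub row_count: usize,
    pub skip: bool,
}

/// Hypothetical helper: turns per-row-group keep flags into a row selection
/// for one stripe. `stride` is the row index stride and `stripe_rows` is the
/// number of rows in the stripe.
pub fn selection_from_row_groups(
    keep: &[bool],
    stride: usize,
    stripe_rows: usize,
) -> Vec<RowSelector> {
    let mut out: Vec<RowSelector> = Vec::new();
    let mut start = 0;
    for &k in keep {
        if start >= stripe_rows {
            break;
        }
        // The last row group may be shorter than the stride.
        let n = stride.min(stripe_rows - start);
        start += n;
        if n == 0 {
            continue;
        }
        let skip = !k;
        if let Some(last) = out.last_mut() {
            if last.skip == skip {
                last.row_count += n; // merge adjacent row groups of the same kind
                continue;
            }
        }
        out.push(RowSelector { row_count: n, skip });
    }
    out
}
```

The resulting selection is relative to the start of the stripe, so per-stripe selections would need to be laid end to end, in stripe order, to address rows across the whole file.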

Phase 4: Advanced Features

  • scan_ranges() for row-group-level I/O planning
  • and_then() for multi-stage filtering
  • intersection() and union() operations
  • Expand to batch boundaries for caching
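
For the set operations, a two-cursor walk over both selector lists is enough. The sketch below shows a possible `intersection`, where both inputs address the same row space and a row is selected only if both select it. The free-function form and the decision to stop at the end of the shorter input are assumptions.

```rust
/// Redeclared from the proposal so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowSelector {
    pub row_count: usize,
    pub skip: bool,
}

/// Appends a run, merging with the previous run when the kind matches.
fn push(out: &mut Vec<RowSelector>, row_count: usize, skip: bool) {
    if row_count == 0 {
        return;
    }
    if let Some(last) = out.last_mut() {
        if last.skip == skip {
            last.row_count += row_count;
            return;
        }
    }
    out.push(RowSelector { row_count, skip });
}

/// Consumes whatever is left of `cur` after `n` rows, or moves to the next run.
fn advance(
    cur: RowSelector,
    n: usize,
    rest: &mut impl Iterator<Item = RowSelector>,
) -> Option<RowSelector> {
    if n == cur.row_count {
        rest.next()
    } else {
        Some(RowSelector { row_count: cur.row_count - n, skip: cur.skip })
    }
}

/// Selects a row only if both `a` and `b` select it.
pub fn intersection(a: &[RowSelector], b: &[RowSelector]) -> Vec<RowSelector> {
    let mut out = Vec::new();
    let mut rest_a = a.iter().copied();
    let mut rest_b = b.iter().copied();
    let (mut ca, mut cb) = (rest_a.next(), rest_b.next());
    while let (Some(x), Some(y)) = (ca, cb) {
        let n = x.row_count.min(y.row_count); // overlap of the two current runs
        push(&mut out, n, x.skip || y.skip);
        ca = advance(x, n, &mut rest_a);
        cb = advance(y, n, &mut rest_b);
    }
    out
}
```

A `union` would follow the same walk with `x.skip && y.skip` as the combined flag, plus a rule for rows covered by only one input.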
