Skip to content

KoVal177/arrow-view-state

Repository files navigation

arrow-view-state

Crates.io version Docs.rs CI License: MIT OR Apache-2.0 MSRV

High-performance columnar permutation index and filter engine for Apache Arrow.

Overview

arrow-view-state is a pure indexing engine — no I/O, no column names, no pagination, no application state. It separates view-state computation (sorting, filtering, selection) from data access so that UI layers can reorder and filter millions of Arrow rows without copying or rewriting the underlying RecordBatches.

Features

  • SortBuilder — streaming multi-column sort; feed RecordBatches one at a time, finish into a PermutationIndex
  • PermutationIndex — virtual-to-physical row mapping; windowed reads, late materialisation
  • FilterIndex — sparse filter backed by Roaring Bitmap with set algebra (and, or, not)
  • PhysicalSelection — physical row IDs ready for Arrow take or Parquet row-selection pushdown
  • Parallel argsort via Rayon (optional, default-on)
  • Memory-mapped storage for large indices that exceed RAM (persist feature)
  • Hash index for O(1) equality lookups (hash-index feature)
  • Inverted index for token-based text search (inverted-index feature)
  • SIMD predicate evaluation via arrow-ord (evaluate feature)

Installation

[dependencies]
arrow-view-state = "0.1"

For WASM or minimal builds, disable the default parallel feature:

arrow-view-state = { version = "0.1", default-features = false }

Quick Start

use std::sync::Arc;
use arrow_array::{ArrayRef, Int64Array, StringArray, RecordBatch};
use arrow_schema::{Schema, Field, DataType, SortOptions};
use arrow_row::SortField;
use arrow_view_state::{SortBuilder, FilterIndex};

let schema = Arc::new(Schema::new(vec![
    Field::new("name",  DataType::Utf8,  false),
    Field::new("value", DataType::Int64, false),
]));
let batch = RecordBatch::try_new(schema.clone(), vec![
    Arc::new(StringArray::from(vec!["B", "A", "B", "A"])) as ArrayRef,
    Arc::new(Int64Array::from(vec![10, 20, 30, 40]))       as ArrayRef,
]).unwrap();

// Sort by name asc, then value desc.
let fields = vec![
    SortField::new(DataType::Utf8),
    SortField::new_with_options(
        DataType::Int64,
        SortOptions { descending: true, nulls_first: true },
    ),
];
let mut builder = SortBuilder::new(fields);
builder.push(&[batch.column(0).clone(), batch.column(1).clone()]).unwrap();
let sorted = builder.finish().unwrap();

let range     = sorted.read_range(0, 2);              // windowed read
let filter    = FilterIndex::from_ids([0, 2]);
let filtered  = filter.apply_to_permutation(&sorted); // sparse filter
let selection = sorted.to_physical_selection(0..2);   // late materialisation

Examples

See EXAMPLES.md for annotated walkthroughs.

Example What it shows
Sort & Window Multi-column sort, windowed read, late materialisation
Filter Algebra Composing FilterIndex with and / or / not
Persist to Disk Save and reload a PermutationIndex via the persist feature

Feature Flags

Flag Default Description
parallel Parallel argsort via Rayon
evaluate SIMD predicate evaluation → FilterIndex via arrow-ord
hash-index O(1) equality lookup index
inverted-index Token-based text search index
mmap Memory-mapped temp storage for large sorts
persist Save/load indices to disk (implies mmap)
full All of the above

Further Reading

MSRV

Minimum supported Rust version: 1.94 (edition 2024).

Contributing

See CONTRIBUTING.md.

License

Licensed under either of MIT or Apache-2.0 at your option.

About

High-performance columnar permutation index and filter engine for Apache Arrow

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE-APACHE
MIT
LICENSE-MIT

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages