Skip to content

0xDaizz/cudf-rs

cudf-rs

License Crates.io docs.rs Rust 1.85+ CUDA 12.2+ Ask DeepWiki

Unofficial Rust bindings for NVIDIA's libcudf -- GPU-accelerated DataFrame operations.

This project is unofficial and not affiliated with NVIDIA or RAPIDS.

Features

  • Near-zero unsafe public API -- all unsafe is confined to the internal FFI layer (sole exception: DLPackTensor::from_raw_ptr)
  • Zero-cost FFI -- cxx bridge with no serialization overhead
  • 61 bridge modules covering the full libcudf surface: compute, I/O, strings, nested types, interop
  • Arrow interop -- conversion to/from arrow-rs via Arrow C Data Interface and IPC
  • Builder-pattern I/O -- ParquetReader, CsvReader, JsonReader, OrcReader, AvroReader
  • GPU string processing -- case, find, replace, split, regex, extract, and more
  • RAII memory management -- Column and Table drops free GPU memory automatically

Prerequisites

Requirement Version
OS Linux (libcudf does not support macOS or Windows)
GPU NVIDIA Volta or newer (compute capability 7.0+)
CUDA 12.2+
libcudf 24.0+ (tested with 26.2.1)
Rust 1.85+ (edition 2024)

Installation

1. Install libcudf

Conda (recommended):

conda create -n cudf-dev -c rapidsai -c conda-forge libcudf cuda-version=12.2
conda activate cudf-dev

From source:

Follow the RAPIDS build guide, then set:

export CUDF_ROOT=/path/to/libcudf/prefix
# expects: $CUDF_ROOT/lib/libcudf.so and $CUDF_ROOT/include/cudf/

2. Set CUDA path (if non-standard)

export CUDA_PATH=/usr/local/cuda

3. Add to your project

[dependencies]
cudf = "0.2"

4. Build

cargo build --release

Quick Start

use cudf::{Column, Table, Result};
use cudf::sorting::{SortOrder, NullOrder};
use cudf::groupby::GroupBy;
use cudf::aggregation::AggregationKind;
use cudf::io::parquet;

fn main() -> Result<()> {
    // Create GPU columns from host data
    let ids    = Column::from_slice(&[1i32, 1, 2, 2, 3])?;
    let values = Column::from_slice(&[10.0f64, 20.0, 30.0, 40.0, 50.0])?;
    let table  = Table::new(vec![ids, values])?;

    // Sort
    let sorted = table.sort(
        &[SortOrder::Ascending, SortOrder::Descending],
        &[NullOrder::After,     NullOrder::After],
    )?;

    // GroupBy
    let keys   = Table::new(vec![Column::from_slice(&[1i32, 1, 2, 2, 3])?])?;
    let vals   = Table::new(vec![Column::from_slice(&[10.0f64, 20.0, 30.0, 40.0, 50.0])?])?;
    let result = GroupBy::new(&keys)
        .agg(0, AggregationKind::Sum)
        .execute(&vals)?;

    // Parquet I/O
    parquet::write_parquet(&table, "/tmp/test.parquet")?;
    let loaded = parquet::read_parquet("/tmp/test.parquet")?;
    assert_eq!(loaded.num_rows(), 5);

    // Read data back to host
    let col  = loaded.column(0)?;
    let data: Vec<i32> = col.to_vec()?;
    println!("{:?}", data);

    Ok(())
}

Architecture

+----------------------------------------------+
|  cudf       -- safe, idiomatic Rust API      |
|             -- Column, Table, GroupBy, I/O   |
+----------------------------------------------+
                      |
+----------------------------------------------+
|  cudf-cxx   -- cxx bridge + C++ shim layer   |
|             -- one bridge per libcudf module  |
+----------------------------------------------+
                      |
+----------------------------------------------+
|  cudf-sys   -- links libcudf.so (build only) |
|             -- CUDF_ROOT / conda / pkg-config|
+----------------------------------------------+
                      |
           NVIDIA libcudf (C++)

Each libcudf C++ module maps to three files:

Layer Files Role
C++ shim cudf-cxx/cpp/{include,src}/{module}_shim.{h,cpp} Wraps libcudf types for cxx compatibility
cxx bridge cudf-cxx/src/{module}.rs #[cxx::bridge] FFI declarations
Safe API cudf/src/{module}.rs Idiomatic Rust wrappers with full safety

Supported Operations

Core

Module Description
column GPU-resident column with typed data and optional null bitmask
table Ordered collection of columns (DataFrame equivalent)
scalar GPU-resident single typed value with validity flag
types TypeId and DataType mirroring libcudf's type system

Compute

Module Description
sorting Sort, rank, is-sorted checks
groupby Builder-pattern groupby with multi-column aggregation
reduction Reduce a column to a scalar (sum, min, max, ...)
quantiles Quantile and percentile computation
rolling Fixed-size rolling window aggregation
binaryop Element-wise binary ops (arithmetic, comparison, logic)
unary Element-wise unary ops (math, casts, null/NaN checks)
round Numeric rounding
transform NaN-to-null conversion, boolean mask generation
search Binary search and containment checks
hashing Row-wise hashing (Murmur3, MD5, SHA-256, ...)
datetime Extract year, month, day, hour, ... from timestamps

Data Manipulation

Module Description
copying Gather, scatter, slice, split, conditional copy
filling Fill, repeat, sequence generation
concatenate Vertical stacking of columns and tables
merge Merge pre-sorted tables
join Inner, left, full outer, semi, anti, cross joins
stream_compaction Drop nulls, boolean mask, unique, distinct
replace Replace nulls, NaNs, clamp values
reshape Interleave and tile
transpose Swap rows and columns
partitioning Hash and round-robin partitioning
lists Explode, sort, contains on list columns
structs Extract child columns from struct columns
dictionary Dictionary encoding/decoding
json JSONPath queries on string columns

I/O

Format Read Write Compression
Parquet ParquetReader ParquetWriter Snappy, Gzip, Zstd, LZ4
CSV CsvReader CsvWriter --
JSON JsonReader JsonWriter --
ORC OrcReader OrcWriter Snappy, Zlib, Zstd
Avro AvroReader -- --

String Operations (strings::)

Case, find, contains, replace, split, strip, slice, combine, convert, extract, findall, like, padding, partition, repeat, reverse, split_re, attributes, char_types, translate, wrap -- 21 submodules total.

Interop

Arrow C Data Interface, Arrow IPC, DLPack tensor exchange, pack/unpack/contiguous_split.

Limitations / Notes

  • GroupBy maintain_order: Approximated by key-column sort, not true input-order preservation.
  • Std/Var ddof: Default standalone reduction uses ddof=1. Full ddof support via reduce_var_with_ddof / reduce_std_with_ddof.
  • Polars version: cudf-polars is compatible with Polars 0.53.0.
  • Unsupported types: Date, Datetime, Duration, Categorical, List, Struct are not yet mapped (returns explicit error).
  • Unsupported expressions: Window functions (.over()), IsIn, expression-level Sort/Filter/Slice.
  • Unsupported IR nodes: Cache, MapFunction, ExtContext, Sink.
  • Multi-file Parquet: Only reads the first file in multi-file scans.

Testing

A GPU with CUDA is required to run tests.

# Unit tests (cudf crate — 55 tests)
cargo test -p cudf --features gpu-tests

# End-to-end tests (cudf-polars — 56 tests + 1 doctest)
cargo test -p cudf-polars --features gpu-tests

# Python polars-gpu integration tests (81 tests)
python tests/polars_gpu_integration.py

Total: 193 tests covering type roundtrips, filter, select, sort, groupby, join, slice, binary ops, distinct, with_columns, concat, nulls, edge cases, and error paths.

Contributing

See CONTRIBUTING.md for guidelines on adding new bindings, code conventions, and testing.

License

Licensed under either of

at your option.

About

Unofficial Rust bindings for NVIDIA libcudf — GPU-accelerated DataFrame operations

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors