DataFusion: SQL Query Execution in Rust
DataFusion is an attempt at building a modern distributed compute platform in Rust, using Apache Arrow as the memory model.
See my article "How To Build a Modern Distributed Compute Platform" to learn about the design and my motivation for building this. The TL;DR is that this project is a great way to learn about building distributed systems, but there are plenty of better choices if you need something mature and supported.
The following features are currently supported:
- SQL Parser, Planner and Optimizer
- DataFrame API
- Columnar processing using Apache Arrow
- Support for local CSV and Apache Parquet files
- Single-threaded execution of SQL queries, supporting:
  - Scalar Functions
  - Aggregates (Min, Max, Count)
- User-defined Scalar Functions (UDFs)
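
To make the "columnar processing" and "Aggregates (Min, Max, Count)" items above more concrete, here is a minimal sketch of how such aggregates can be computed over a single nullable column. It uses only the Rust standard library. The `Int64Column` type and its methods are hypothetical names invented for this illustration; they are not DataFusion's or Apache Arrow's actual API, and the real implementation may differ substantially (Arrow, for example, tracks nulls with a packed validity bitmap rather than a `Vec<bool>`).

```rust
/// Hypothetical nullable 64-bit integer column, for illustration only.
/// This is NOT a DataFusion or Arrow type. Values are stored contiguously,
/// and `valid[i]` is false where the value at index `i` is NULL.
pub struct Int64Column {
    values: Vec<i64>,
    valid: Vec<bool>,
}

impl Int64Column {
    /// Build a column from optional values; NULL slots store a placeholder 0.
    pub fn from_options(data: &[Option<i64>]) -> Self {
        let values = data.iter().map(|v| v.unwrap_or(0)).collect();
        let valid = data.iter().map(|v| v.is_some()).collect();
        Int64Column { values, valid }
    }

    /// Iterate over the non-NULL values only.
    fn non_null(&self) -> impl Iterator<Item = i64> + '_ {
        self.values
            .iter()
            .zip(self.valid.iter())
            .filter(|(_, ok)| **ok)
            .map(|(v, _)| *v)
    }

    /// MIN over non-NULL values; None if there are no such values.
    pub fn min(&self) -> Option<i64> {
        self.non_null().min()
    }

    /// MAX over non-NULL values; None if there are no such values.
    pub fn max(&self) -> Option<i64> {
        self.non_null().max()
    }

    /// COUNT of non-NULL values.
    pub fn count(&self) -> usize {
        self.non_null().count()
    }
}
```

With this sketch, a column built from `[Some(3), None, Some(-1), Some(7)]` would report a minimum of `-1`, a maximum of `7`, and a count of `3`, since NULL entries are skipped by all three aggregates.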
DataFusion can be used as a crate dependency in your project to add SQL support for custom data sources.
A Docker image is also available if you just want to run SQL queries against your CSV and Parquet files.
Project Home Page
- Rust nightly (required by
There is a Gitter channel where you can ask questions about the project or make feature suggestions too.
Contributors are welcome! Please see CONTRIBUTING.md for details.