A series of articles that explore working with data using DataFusion and Apache Arrow.
- Introduction to Apache Arrow with Rust
- Data Aggregation with DataFusion
- Data in Flight
- Extending DataFusion
In general, the first two articles provide high level overviews, while the second two article articles take a deeper look into the libraries.
This article gives an overview of Apache Arrow. The focus is introducing the library and exploring its primary use cases. Important features are explored at a high level, and code samples are used to provide examples.
This article provides a more in-depth look at DataFusion. The focus is on exploring how to use DataFusion to aggregate data by using either the DataFrame API or the SQL Parser API. Both methods are described in detail and illustrated with code samples.
This article describes how to use Arrow Flight to move data accross a network. The focus is on building an Arrow Flight server and client to move aggregated from server to client. Methods that are described in Data Aggregation with DataFusion will be used in this article. Arrow Flight will be discussed in detail while building out a small example project.
This article describes how DataFusion can be extended to allow for data sources that are not currently supported. The focus is on implementing and XML file reader that can be utilized with DataFusion and that reads data into the Apache Arrow memory model. In-depth features of both DataFusion and Apache Arrow will explored and described in detail while building the example project.