
a-agmon/rs-parquet-gql


rs-parquet-gql

This repo is a Rust implementation of the approach described in my Towards Data Science post, "Data Access API over Data Lake Tables Without the Complexity".
You can find the post here

In short, it is a GraphQL query service that serves GraphQL queries over Parquet files in a data lake table. It is implemented using Axum and Apache Arrow DataFusion. This is the intro from the post:

"...providing thin clients the ability to query data lake files fast usually comes at the price of adding more moving parts and processes to our pipeline, in order to either copy and ingest data to more expensive customer-facing warehouses or aggregate and transform it to fit low-latency databases. The purpose of this post is to explore and demonstrate a different and simpler approach to tackle this requirement using lighter in-process query engines. Specifically, I show how we can use in-process engines, such as DuckDB and Arrow Data Fusion, in order to create services that can both handle data lake files and volumes and act as a fast memory store that serves low-latency API calls. Using this approach, we can efficiently consolidate the required functionality into a single query service, which can be horizontally scaled, that will load data, aggregate and store it in memory, and serve API calls efficiently and fast."
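
The "load data, aggregate and store it in memory, and serve API calls" flow from the quote can be sketched with the standard library alone. This is only an illustration of the pattern, not the code in this repo: the `Event`, `UserTotal`, and `AggregateStore` names are hypothetical, a plain `HashMap` stands in for the in-process query engine, and a slice of rows stands in for the Parquet files.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical row shape. In a real service the rows would be read
/// from Parquet files by the query engine.
#[derive(Debug, Clone)]
struct Event {
    user_id: String,
    amount: f64,
}

/// Aggregated view that is kept in memory and returned to API callers.
#[derive(Debug, Clone, PartialEq)]
struct UserTotal {
    events: u64,
    total_amount: f64,
}

/// Shared, read-mostly store. Cloning it only clones the `Arc`, so each
/// request handler can hold its own handle to the same data.
#[derive(Clone, Default)]
struct AggregateStore {
    inner: Arc<RwLock<HashMap<String, UserTotal>>>,
}

impl AggregateStore {
    /// "Load and aggregate" step: fold raw rows into per-user totals,
    /// then replace the in-memory map in one write.
    fn refresh(&self, rows: &[Event]) {
        let mut totals: HashMap<String, UserTotal> = HashMap::new();
        for row in rows {
            let entry = totals.entry(row.user_id.clone()).or_insert(UserTotal {
                events: 0,
                total_amount: 0.0,
            });
            entry.events += 1;
            entry.total_amount += row.amount;
        }
        *self.inner.write().unwrap() = totals;
    }

    /// "Serve" step: answer a lookup from memory without touching the files.
    fn get(&self, user_id: &str) -> Option<UserTotal> {
        self.inner.read().unwrap().get(user_id).cloned()
    }
}
```

A handler would typically hold a clone of `AggregateStore`, call `get` per request, and leave `refresh` to a background task that re-reads the table. Building the new map before taking the write lock keeps the time readers are blocked short.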
