GitHub - a-agmon/rs-parquet-gql

rs-parquet-gql

This repo is a Rust implementation of the query service described in my Towards Data Science post, "Data Access API over Data Lake Tables Without the Complexity".
You can find the post here

In short, it is a GraphQL query service that serves GraphQL queries over Parquet files in a data lake table. It is implemented using Axum and Apache Arrow DataFusion. This is the intro from the post:

"...providing thin clients the ability to query data lake files fast usually comes at the price of adding more moving parts and processes to our pipeline, in order to either copy and ingest data to more expensive customer-facing warehouses or aggregate and transform it to fit low-latency databases. The purpose of this post is to explore and demonstrate a different and simpler approach to tackle this requirement using lighter in-process query engines. Specifically, I show how we can use in-process engines, such as DuckDB and Arrow Data Fusion, in order to create services that can both handle data lake files and volumes and act as a fast memory store that serves low-latency API calls. Using this approach, we can efficiently consolidate the required functionality into a single query service, which can be horizontally scaled, that will load data, aggregate and store it in memory, and serve API calls efficiently and fast."

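The "load data, aggregate and store it in memory, and serve API calls" flow from the intro can be sketched in a few lines of plain Rust. This is only an illustration of the pattern, not the code in this repo: it uses `std::collections::HashMap` as a stand-in for the DataFusion in-memory table, and the names `Event`, `AccountSummary`, and `SummaryStore` are hypothetical.

```rust
use std::collections::HashMap;

/// One raw record, standing in for a row read from a Parquet file.
/// (Hypothetical schema, chosen only for illustration.)
#[derive(Debug, Clone)]
pub struct Event {
    pub account_id: String,
    pub amount: f64,
}

/// Pre-aggregated value that is kept in memory and served to API callers.
#[derive(Debug, Clone, PartialEq)]
pub struct AccountSummary {
    pub account_id: String,
    pub event_count: u64,
    pub total_amount: f64,
}

/// In-memory store built once at startup and queried on every request.
pub struct SummaryStore {
    by_account: HashMap<String, AccountSummary>,
}

impl SummaryStore {
    /// Load step: consume the raw rows and aggregate them per account.
    pub fn load(events: impl IntoIterator<Item = Event>) -> Self {
        let mut by_account: HashMap<String, AccountSummary> = HashMap::new();
        for event in events {
            let summary = by_account
                .entry(event.account_id.clone())
                .or_insert_with(|| AccountSummary {
                    account_id: event.account_id.clone(),
                    event_count: 0,
                    total_amount: 0.0,
                });
            summary.event_count += 1;
            summary.total_amount += event.amount;
        }
        SummaryStore { by_account }
    }

    /// Serve step: a lookup against the aggregated data, with no file I/O.
    pub fn get(&self, account_id: &str) -> Option<&AccountSummary> {
        self.by_account.get(account_id)
    }
}
```

In a real service, a request handler (for example a GraphQL resolver) would call something like `store.get("some-account")` on a store shared across requests, so the Parquet files are only read when the store is built.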
Folders and files (5 commits):
.vscode
src
.gitignore
Cargo.lock
Cargo.toml
README.md
config.toml
