fdw: Add foreign data wrapper for parquet files (read-only) #15721

Open
mfussenegger opened this issue Mar 19, 2024 · 0 comments
Labels
complexity: 5-8, feature: cold store, feature: fdw (Foreign data wrapper), needs upvotes (Please use the reaction feature on the issue to signal your interest. This helps us prioritize.)

Comments

mfussenegger (Member) commented Mar 19, 2024

Problem Statement

Keeping all data forever in CrateDB can get expensive.
Deleting the data isn't an option because it might still be needed, even though it is rarely queried.

It would be nice to have the option of using cheaper storage at the expense of query performance.

Possible Solutions

Make it possible to query Parquet files hosted on S3 via a foreign data wrapper/foreign table.

This is similar to: #15718

Downsides:

Advantages:

  • Stable file format
  • The file layout is optimized for fetching a subset of columns; S3 Range requests make it possible to take advantage of this.
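To illustrate how Range requests can exploit the Parquet layout, here is a minimal sketch (not CrateDB code). A Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes PAR1. The footer (Thrift-encoded FileMetaData) records, for every column chunk, its starting offset and compressed size. A reader can therefore fetch the 8-byte trailer, then the footer, and then only the byte ranges of the columns a query needs. The helper names below are hypothetical. Decoding the Thrift footer and issuing the actual S3 GET requests are assumed to happen elsewhere.

```python
import struct
from typing import Optional

MAGIC = b"PAR1"
TRAILER_LEN = 8  # 4-byte little-endian footer length followed by the 4-byte magic


def http_range(start: int, end_exclusive: int) -> str:
    """Format a half-open byte interval as an HTTP Range header value (inclusive end)."""
    return f"bytes={start}-{end_exclusive - 1}"


def trailer_range(file_size: int) -> str:
    """Range covering the fixed-size trailer at the end of a Parquet file."""
    # Smallest possible file: leading magic + trailer (length + trailing magic).
    if file_size < len(MAGIC) + TRAILER_LEN:
        raise ValueError("file is too small to be a Parquet file")
    return http_range(file_size - TRAILER_LEN, file_size)


def footer_range(file_size: int, trailer: bytes) -> str:
    """Range covering the Thrift-encoded FileMetaData, given the 8 trailer bytes."""
    if len(trailer) != TRAILER_LEN or trailer[4:] != MAGIC:
        raise ValueError("missing PAR1 magic; not a Parquet trailer")
    (footer_len,) = struct.unpack("<I", trailer[:4])
    footer_end = file_size - TRAILER_LEN
    footer_start = footer_end - footer_len
    if footer_start < len(MAGIC):
        raise ValueError("footer length is larger than the file")
    return http_range(footer_start, footer_end)


def column_chunk_range(data_page_offset: int, total_compressed_size: int,
                       dictionary_page_offset: Optional[int] = None) -> str:
    """Range for one column chunk, using offsets taken from the decoded footer.

    A chunk starts at its dictionary page if it has one, otherwise at its
    first data page; total_compressed_size covers all of its pages.
    """
    start = dictionary_page_offset if dictionary_page_offset is not None else data_page_offset
    return http_range(start, start + total_compressed_size)
```

For a 1000-byte object, the first request would use `trailer_range(1000)`, i.e. `bytes=992-999`. If the trailer reports a 200-byte footer, the second request uses `bytes=792-991`. After that, only the column chunks the query references need to be downloaded, which is what makes column subsets cheap.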

Considered Alternatives

Technical constraints

Open questions

  • Distributed execution/reads

Initial Scope (Estimate is only for this part)

  • Simple but slow version: no caching of downloaded data, always read from remote as needed, and no attempts at operation push-down or partial result retrieval

Follow up (for later dedicated issues, not included in the first implementation)

  • Download only the required data to minimize traffic, and possibly add a cache for the remote data to avoid repeated downloads
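One possible shape for such a cache is sketched below. This is an assumption, not a design the issue commits to: an LRU cache of downloaded byte ranges that is bounded by total size. The `fetch` callable is hypothetical and stands in for a single S3 GET with a Range header. Entries are keyed by the exact `(uri, start, end)` triple, so a request for a sub-range of a cached range still misses. That keeps the sketch simple at the cost of some redundant downloads.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# Hypothetical fetcher: (uri, start, end_exclusive) -> bytes, e.g. one ranged S3 GET.
Fetcher = Callable[[str, int, int], bytes]


class RangeCache:
    """LRU cache of downloaded byte ranges, bounded by the total number of cached bytes."""

    def __init__(self, fetch: Fetcher, max_bytes: int) -> None:
        self._fetch = fetch
        self._max_bytes = max_bytes
        self._used = 0
        self._entries: "OrderedDict[Tuple[str, int, int], bytes]" = OrderedDict()
        self.misses = 0

    def read(self, uri: str, start: int, end: int) -> bytes:
        key = (uri, start, end)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        data = self._fetch(uri, start, end)
        # Ranges larger than the whole budget are returned but never cached.
        if len(data) <= self._max_bytes:
            self._entries[key] = data
            self._used += len(data)
            # Evict least recently used ranges until the cache fits its budget again.
            while self._used > self._max_bytes:
                _, evicted = self._entries.popitem(last=False)
                self._used -= len(evicted)
        return data
```

Repeated scans of the same footer or column chunks would then be served locally, while cold ranges are still fetched on demand. Bounding the cache by bytes rather than by entry count matters here because column chunks can differ in size by orders of magnitude.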
mfussenegger added the labels complexity: no estimate, needs upvotes, feature: fdw, and feature: cold store on Mar 19, 2024
mfussenegger changed the title from "fdw: Add foreign data wrapper for parquet files" to "fdw: Add foreign data wrapper for parquet files (read-only)" on Mar 27, 2024
Projects
Status: Candidates
Development

No branches or pull requests