Tracking issues of Data Lake with Iceberg Support #12272

Open
6 tasks
Xuanwo opened this issue Jul 31, 2023 · 2 comments

@Xuanwo
Member

Xuanwo commented Jul 31, 2023

After the close of #11947, Databend has completed all preparation work required for implementing data lake support!

Databend now has multi-catalog support!

We can create a new catalog like:

CREATE CATALOG iceberg_ctl
TYPE=ICEBERG
CONNECTION=(
    URL='s3://testbucket/iceberg_ctl/'
    AWS_KEY_ID='minioadmin'
    AWS_SECRET_KEY='minioadmin'
    ENDPOINT_URL='${STORAGE_S3_ENDPOINT_URL}'
);

And we can show/drop them:

SHOW DATABASES IN iceberg_ctl;
SHOW TABLES IN iceberg_ctl.iceberg_db;
DROP CATALOG IF EXISTS iceberg_ctl;

Databend can now read existing Iceberg tables!

We can query data in an existing Iceberg table like this:

SELECT count(*) FROM iceberg_ctl.iceberg_db.iceberg_tbl;

We have found a way to add data lake features to Databend. Here are some ideas that we can start working on:

Tasks

Our current goal is to make reading from Iceberg tables fast and reliable.

  • Implement partition for iceberg table
  • Implement push_down for iceberg table
  • Implement iceberg rest catalog support
  • Work with iceberg community to build iceberg-rust
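
To make the partition and push_down tasks a bit more concrete, here is a rough Python sketch of the general idea: use per-file metadata to skip data files that cannot match a query's filter. This is only an illustration, not how Databend or iceberg-rust implement it. The `DataFile` type, its fields, and `prune_files` are hypothetical names invented for this example, and it assumes a table partitioned by day with min/max statistics on an `id` column.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one data file entry in a manifest."""
    path: str
    partition_day: date  # value of an assumed day(ts) partition transform
    min_id: int          # lower bound of the `id` column in this file
    max_id: int          # upper bound of the `id` column in this file


def prune_files(files, day, id_lower):
    """Keep files that may hold rows with day(ts) = day AND id >= id_lower."""
    kept = []
    for f in files:
        if f.partition_day != day:
            continue  # partition pruning: file belongs to another partition
        if f.max_id < id_lower:
            continue  # push-down on column stats: no row in this file can match
        kept.append(f)
    return kept


files = [
    DataFile("data/a.parquet", date(2023, 7, 30), 1, 100),
    DataFile("data/b.parquet", date(2023, 7, 31), 1, 100),
    DataFile("data/c.parquet", date(2023, 7, 31), 101, 200),
]

# Only files that survive both checks need to be read from storage.
to_scan = prune_files(files, day=date(2023, 7, 31), id_lower=150)
print([f.path for f in to_scan])  # → ['data/c.parquet']
```

In this toy case only one of the three files has to be read, which is where the "fast" part of the goal would come from.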

Future

  • Implement write operation for iceberg table (users can ingest data into Iceberg directly!)
  • Implement optimize operation for iceberg table (users can use Databend Cloud as a serverless table optimizer!)
@chrisfw

chrisfw commented Nov 27, 2023

Hi @Xuanwo, this is an exciting feature! I was wondering, though, whether the initial implementation supports Iceberg's temporal/as-of queries?

Regards,
Chris Whelan

@atifiu

atifiu commented Apr 16, 2024

Does Databend currently support querying Iceberg tables partitioned on a timestamp column with a day/month/year transform, or is that what the task "Implement partition for iceberg table" covers?
