any plan for Iceberg Table on S3? #1468

Lindayangyy · 2020-09-16T19:00:19Z

New to Apache Iceberg. We are looking for an Iceberg table or warehouse (catalog) implementation on S3. Is that possible without any dependency on Hive or HDFS (Hadoop)? The current implementation seems tightly coupled to Hive and Hadoop.

RussellSpitzer · 2020-09-16T19:05:09Z

You can use it with S3 using only the Hadoop client libraries; you don't actually need a Hadoop cluster or HDFS.

HeartSaVioR · 2020-09-16T22:28:06Z

Supporting S3 requires Hive because of S3's eventual consistency. I see the OSS version of Delta Lake solved it in a different way, but that approach is pretty limited. (It assumes concurrent writes to S3 only happen in a single Spark driver: https://github.com/delta-io/delta/blob/master/src/main/scala/org/apache/spark/sql/delta/storage/S3SingleDriverLogStore.scala)

aokolnychyi · 2020-09-16T22:47:10Z

Iceberg works reliably with S3 even if the same table is accessed via multiple clusters and query engines. Using Iceberg requires a catalog that can swap a pointer to the metadata file atomically. This can be done using a compare-and-swap or a lock/unlock API. Iceberg contains a built-in implementation that uses the Hive metastore to work with S3 reliably (lock/unlock). Anyone could easily build an integration for any catalog. For example, one may have a Cassandra-based catalog and use compare-and-swap to commit new table versions. That will be enough to work with S3 reliably.
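
To make the atomic pointer swap concrete, here is a minimal sketch in Python. It is not Iceberg's API: `InMemoryCatalog`, `commit`, and the `write_metadata` callback are hypothetical names, and a thread lock stands in for whatever compare-and-swap primitive a real catalog (Hive metastore, Cassandra, and so on) would provide.

```python
import threading


class InMemoryCatalog:
    """Toy stand-in for an external catalog that offers compare-and-swap."""

    def __init__(self):
        self._pointers = {}            # table name -> current metadata file location
        self._lock = threading.Lock()  # makes the check-and-set below atomic

    def current(self, table):
        with self._lock:
            return self._pointers.get(table)

    def compare_and_swap(self, table, expected, new):
        """Point `table` at `new` only if it still points at `expected`."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False  # another writer committed first
            self._pointers[table] = new
            return True


def commit(catalog, table, write_metadata, max_attempts=5):
    """Optimistic commit loop: re-read the pointer and retry on conflict.

    `write_metadata(base)` is a hypothetical callback that writes a new,
    uniquely named metadata file derived from `base` and returns its location.
    """
    for _ in range(max_attempts):
        base = catalog.current(table)
        new_location = write_metadata(base)
        if catalog.compare_and_swap(table, base, new_location):
            return new_location
    raise RuntimeError("commit failed after %d attempts" % max_attempts)
```

The metadata files themselves are only ever written once under new names in S3; the single mutable value is the pointer held by the catalog, which is why the catalog, not S3, has to provide the atomicity.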

jacques-n · 2020-09-16T22:51:01Z

We've been working on a non-Hive way to provide this functionality and plan on contributing it to the project within the next two weeks.

Lindayangyy · 2020-09-16T22:53:47Z

That will be awesome, can't wait to see it. Thank you - jacques-n!

Lindayangyy · 2020-09-16T22:54:47Z

Thanks for all the responses as alternatives. All answers are great!

HeartSaVioR · 2020-09-16T22:58:12Z

That sounds great! I assume it still needs to do CAS against some external storage (I'd be really curious if it doesn't rely on external storage). Which storage is that? Is it one of the AWS services? If so, even better, as there would be no external dependency outside of AWS, given that using S3 already means being locked in to AWS.

jacques-n · 2020-09-17T00:02:11Z

We're doing something pluggable but the default implementation is on top of DynamoDB.

ismailsimsek · 2020-09-20T20:19:18Z

Is it possible to write a JDBC-based catalog? That could unlock many catalog options.
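
As a rough illustration of how a relational database could provide the atomic swap, the sketch below expresses compare-and-swap as a conditional `UPDATE`. It uses Python's built-in `sqlite3` in place of a JDBC connection, and the table and column names (`iceberg_tables`, `table_name`, `metadata_location`) are assumptions made for this example, not a schema taken from Iceberg.

```python
import sqlite3


def create_catalog(conn):
    """Create the (hypothetical) pointer table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS iceberg_tables ("
        " table_name TEXT PRIMARY KEY,"
        " metadata_location TEXT NOT NULL)"
    )
    conn.commit()


def register_table(conn, table_name, metadata_location):
    """Create the pointer; the primary key rejects a second registration."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO iceberg_tables (table_name, metadata_location)"
                " VALUES (?, ?)",
                (table_name, metadata_location),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def swap_metadata(conn, table_name, expected, new):
    """Compare-and-swap expressed as a conditional UPDATE.

    The row is changed only if it still holds `expected`; a row count of
    zero means another writer committed first.
    """
    with conn:
        cur = conn.execute(
            "UPDATE iceberg_tables SET metadata_location = ?"
            " WHERE table_name = ? AND metadata_location = ?",
            (new, table_name, expected),
        )
    return cur.rowcount == 1
```

A writer that gets `False` back from `swap_metadata` would re-read the current location, rebase its changes, and try again.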

kbendick · 2020-09-30T04:21:19Z

> We're doing something pluggable but the default implementation is on top of DynamoDB.

That's a good idea. I know that AWS Glue is backed by DynamoDB, so if you can make a catalog using Dynamo, then possibly the AWS team can implement the atomic swap in Glue. If I'm not mistaken, you'd need to use either read / write consistency or possibly a DynamoDB versioned object.

Looking forward to seeing the DynamoDB catalog, as I assume many companies looking to write to S3 are also likely using DynamoDB. I know that my company uses DynamoDB a ton, so this would be a great workaround until there is Glue Catalog support (which I've been giving some thought to myself).
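
One way the swap could be done in DynamoDB is a conditional write. The sketch below only builds the request parameters for an `UpdateItem` call; it is not the implementation discussed in this thread. The function name and the item layout (`table_name` as the key, `metadata_location` as the pointer attribute) are assumptions for illustration.

```python
def build_swap_request(catalog_table, iceberg_table, expected_location, new_location):
    """Build keyword arguments for a DynamoDB UpdateItem call that succeeds
    only if the stored pointer still equals `expected_location`.

    The key and attribute names are hypothetical, chosen for this example.
    """
    return {
        "TableName": catalog_table,
        "Key": {"table_name": {"S": iceberg_table}},
        # Set the pointer to the new metadata file...
        "UpdateExpression": "SET #loc = :new",
        # ...but only if nobody has moved it since we read it.
        "ConditionExpression": "#loc = :expected",
        "ExpressionAttributeNames": {"#loc": "metadata_location"},
        "ExpressionAttributeValues": {
            ":new": {"S": new_location},
            ":expected": {"S": expected_location},
        },
    }
```

With boto3 this would be passed as `boto3.client("dynamodb").update_item(**request)`. If the condition does not hold, DynamoDB rejects the write with a `ConditionalCheckFailedException`, which the writer would treat as a commit conflict and retry.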

jackye1995 · 2020-10-01T02:35:10Z

Hi @jacques-n, this is Jack from AWS. We are planning to introduce a new iceberg-aws module, and we do plan to offer a Glue + DynamoDB implementation of Catalog and TableOperations. Since you say you already have something working, let's sync once you have a PR up and work out the best way to ship all of this together 😃

jacques-n · 2020-10-01T21:34:21Z

Hey guys, we just posted more information on the new stuff we've been building for Iceberg + DynamoDB. You can check it out here: https://projectnessie.org/

We'll have a PR up against Iceberg shortly to contribute the Iceberg integrations:
https://github.com/projectnessie/nessie/tree/main/clients/iceberg

RussellSpitzer · 2020-10-01T21:38:04Z

Very cool!

jackye1995 · 2020-10-13T18:41:00Z

I just sent out a PR for AWS Glue support. With this update you can use HiveCatalog without the need to set up any Hive infrastructure and build your data lake on top of S3. #1608

jackye1995 · 2021-06-21T16:34:29Z

For anyone new to this issue, I think we have summarized all information in https://iceberg.apache.org/aws/, and we can close this issue. @Lindayangyy
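
For readers who want a starting point, the sketch below assembles Spark catalog properties for a Glue-backed Iceberg catalog on S3. The property names and class names are given as I recall them from the Iceberg AWS documentation and may differ between versions, so treat them as assumptions and check the linked page. The helper functions, the catalog name, and the bucket path are hypothetical.

```python
def glue_catalog_conf(catalog_name, warehouse):
    """Return Spark properties for an Iceberg catalog backed by AWS Glue.

    Property and class names are assumed from the Iceberg AWS docs;
    verify them against the version you run.
    """
    prefix = "spark.sql.catalog." + catalog_name
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        prefix + ".catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        prefix + ".io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        prefix + ".warehouse": warehouse,
    }


def as_spark_args(conf):
    """Render the properties as `--conf key=value` command-line arguments."""
    return ["--conf %s=%s" % (key, value) for key, value in sorted(conf.items())]


if __name__ == "__main__":
    # Hypothetical catalog name and bucket, for illustration only.
    for arg in as_spark_args(glue_catalog_conf("my_catalog", "s3://my-bucket/warehouse")):
        print(arg)
```

The same key/value pairs can be set programmatically on a Spark session builder instead of on the command line.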

github-actions · 2024-02-25T00:12:42Z

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions · 2024-03-11T00:11:41Z

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

shengkui mentioned this issue Sep 10, 2021

java.io.IOException: Mkdirs failed to create file:/user/hive/warehouse/bench/metadata #3079

Closed

github-actions bot added the stale label Feb 25, 2024

github-actions bot closed this as not planned Mar 11, 2024
