
Tracking issues of iceberg-rust v0.3.0 #348

Open · 20 of 72 tasks
Fokko opened this issue Apr 24, 2024 · 9 comments

@Fokko (Contributor) commented Apr 24, 2024

Iceberg-rust 0.3.0

The main objective of 0.3.0 is to have a working read path (this is a non-exhaustive list :)

Blocking issues:

Nice to have (related to the query plan optimizations above):

State of catalog integration:

For the release after that, I think the commit path is going to be important.

Iceberg-rust 0.4.0 and beyond

These would be nice to have for the 0.3.0 release, but are not required. Of course, this is open for debate.

  • Support for Positional Deletes: entails matching the delete files to the data files based on the statistics (see the sketch after this list).
  • Support for Equality Deletes: entails putting the delete files in the right order so that they are applied in the correct sequence.
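
To make the positional-delete case more concrete, here is a minimal Rust sketch. The `PositionalDelete` struct and `apply_positional_deletes` function are hypothetical simplifications for illustration, not iceberg-rust APIs: a positional delete identifies a data file path and a row position, and the reader drops the matching positions while scanning that file.

```rust
use std::collections::HashSet;

/// A positional delete marks one row of one data file as deleted.
/// (Hypothetical simplification of the delete-file row schema.)
struct PositionalDelete {
    file_path: String,
    pos: u64,
}

/// Keep only the row positions of `data_file_path` that are not deleted.
fn apply_positional_deletes(
    data_file_path: &str,
    row_positions: impl Iterator<Item = u64>,
    deletes: &[PositionalDelete],
) -> Vec<u64> {
    // Collect the deleted positions that apply to this data file.
    let deleted: HashSet<u64> = deletes
        .iter()
        .filter(|d| d.file_path == data_file_path)
        .map(|d| d.pos)
        .collect();
    row_positions.filter(|pos| !deleted.contains(pos)).collect()
}

fn main() {
    let deletes = vec![PositionalDelete {
        file_path: "s3://bucket/data/f1.parquet".to_string(),
        pos: 1,
    }];
    let kept = apply_positional_deletes("s3://bucket/data/f1.parquet", 0..4, &deletes);
    println!("{kept:?}"); // [0, 2, 3]
}
```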

Commit path

The commit path entails writing a new metadata JSON.

  • Applying updates to the metadata: updating the metadata is important both for writing a new version of the JSON in the case of a non-REST catalog, and for keeping an up-to-date version in memory. It is strongly recommended to re-use the Updates/Requirements objects provided by the REST catalog protocol (see the sketch after this list).
  • Update table properties: sets properties on the table. Probably the best place to start, since it doesn't require a complicated API.
  • Schema evolution: API to update the schema and produce new metadata.
  • Partition spec evolution: API to update the partition spec and produce new metadata.
  • Sort order evolution: API to update the sort order and produce new metadata.
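
As a sketch of the Updates/Requirements pattern mentioned in the first item, loosely modeled on the REST catalog commit protocol: all of the types and functions below (`TableUpdate`, `TableRequirement`, `TableMetadata`, `check`, `apply`) are hypothetical simplifications for illustration, not the actual iceberg-rust API.

```rust
use std::collections::HashMap;

/// Updates, loosely mirroring the REST catalog protocol's commit body.
enum TableUpdate {
    SetProperties { updates: HashMap<String, String> },
    RemoveProperties { removals: Vec<String> },
}

/// Requirements that must hold against the current metadata for the commit to be valid.
enum TableRequirement {
    AssertTableUuid { uuid: String },
}

/// In-memory table metadata (heavily simplified).
#[derive(Debug, Default)]
struct TableMetadata {
    table_uuid: String,
    properties: HashMap<String, String>,
}

/// Validate a requirement against the current metadata before committing.
fn check(metadata: &TableMetadata, requirement: &TableRequirement) -> Result<(), String> {
    match requirement {
        TableRequirement::AssertTableUuid { uuid } => {
            if &metadata.table_uuid == uuid {
                Ok(())
            } else {
                Err(format!("table UUID mismatch: {} != {}", metadata.table_uuid, uuid))
            }
        }
    }
}

/// Apply an update to produce the next metadata version. The same logic serves
/// both writing a new metadata JSON (non-REST catalogs) and keeping the
/// in-memory copy up to date.
fn apply(mut metadata: TableMetadata, update: TableUpdate) -> TableMetadata {
    match update {
        TableUpdate::SetProperties { updates } => metadata.properties.extend(updates),
        TableUpdate::RemoveProperties { removals } => {
            for key in removals {
                metadata.properties.remove(&key);
            }
        }
    }
    metadata
}

fn main() -> Result<(), String> {
    let mut metadata = TableMetadata {
        table_uuid: "9c12d441-03fe-4693-9a96-a0705ddf69c1".to_string(),
        ..Default::default()
    };
    check(
        &metadata,
        &TableRequirement::AssertTableUuid { uuid: metadata.table_uuid.clone() },
    )?;
    metadata = apply(metadata, TableUpdate::SetProperties {
        updates: HashMap::from([("commit.retry.num-retries".to_string(), "4".to_string())]),
    });
    metadata = apply(metadata, TableUpdate::RemoveProperties {
        removals: vec!["write.wap.enabled".to_string()],
    });
    println!("{:?}", metadata.properties);
    Ok(())
}
```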

Metadata tables

Metadata tables are used to inspect the table. Having these tables also makes it easy to implement maintenance procedures, since you can list all the snapshots and expire the ones that are older than a certain threshold (see the sketch below).
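
For example, a snapshot-expiration routine could be built on top of a "snapshots" metadata table roughly like this; `Snapshot` and `expire_snapshots` are hypothetical simplifications, not the actual API:

```rust
/// Simplified view of a row in the "snapshots" metadata table.
#[derive(Debug)]
struct Snapshot {
    snapshot_id: i64,
    timestamp_ms: i64,
    manifest_list: String,
}

/// Keep snapshots newer than the threshold; never expire the current snapshot.
fn expire_snapshots(snapshots: Vec<Snapshot>, older_than_ms: i64, current_id: i64) -> Vec<Snapshot> {
    snapshots
        .into_iter()
        .filter(|s| s.timestamp_ms >= older_than_ms || s.snapshot_id == current_id)
        .collect()
}

fn main() {
    let snapshots = vec![
        Snapshot { snapshot_id: 1, timestamp_ms: 1_700_000_000_000, manifest_list: "snap-1.avro".into() },
        Snapshot { snapshot_id: 2, timestamp_ms: 1_713_000_000_000, manifest_list: "snap-2.avro".into() },
    ];
    // Expire everything older than the epoch-millis threshold, keeping current snapshot 2.
    let kept = expire_snapshots(snapshots, 1_710_000_000_000, 2);
    println!("{kept:?}");
}
```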

Write support

Most of the work in write support is around generating the correct Iceberg metadata. Some scoping decisions can be made, for example supporting only FastAppends and only V2 metadata at first.

It is common to have multiple snapshots in a single commit to the catalog. For example, an overwrite operation on a partition can be a delete + append operation (see the sketch below). This makes the implementation easier, since you can separate the problems and tackle them one by one. It also makes the roadmap easier, since the operations can be developed in parallel.
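
A sketch of that "overwrite = delete + append" idea, again with hypothetical simplified types rather than the actual iceberg-rust structs:

```rust
/// The snapshot operations relevant to this example
/// (see the full set under snapshot generation below).
#[derive(Debug)]
enum Operation {
    Append,
    Delete,
}

/// Simplified snapshot entry as it would be appended to the metadata JSON.
#[derive(Debug)]
struct Snapshot {
    snapshot_id: i64,
    parent_snapshot_id: Option<i64>,
    operation: Operation,
}

/// A partition overwrite expressed as two snapshots in one catalog commit:
/// first delete the partition's old data files, then append the new ones.
fn overwrite_partition(parent_id: Option<i64>, next_id: i64) -> Vec<Snapshot> {
    let delete = Snapshot {
        snapshot_id: next_id,
        parent_snapshot_id: parent_id,
        operation: Operation::Delete,
    };
    let append = Snapshot {
        snapshot_id: next_id + 1,
        parent_snapshot_id: Some(next_id),
        operation: Operation::Append,
    };
    vec![delete, append]
}

fn main() {
    for snapshot in overwrite_partition(Some(41), 42) {
        println!("{snapshot:?}");
    }
}
```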

  • Commit semantics
    • MergeAppend merges new manifest entries into existing manifest files. This reduces the amount of metadata produced, but takes more time to commit since existing metadata has to be rewritten, and retries are also more costly.
    • FastAppend generates a new manifest per commit, which allows fast commits but generates more metadata in the long run. PR by @ZENOTME in feat: support append data file and add e2e test #349
  • Snapshot generation: manipulation of data within a table is done by appending snapshots to the metadata JSON.
    • APPEND: only data files were added and no files were removed.
    • REPLACE: data and delete files were added and removed without changing table data; i.e., compaction, changing the data file format, or relocating data files.
    • OVERWRITE: data and delete files were added and removed in a logical overwrite operation.
    • DELETE: data files were removed and their contents logically deleted, and/or delete files were added to delete rows.
  • Add files: to add existing Parquet files to a table. Issue in Support to append file on table #345
  • Summary generation: the part of the snapshot that indicates what is in the snapshot.
  • Metrics collection. There are two situations:
    • Collect metrics when writing: as in the Java API, the upper and lower bounds are tracked during writing, and the numbers of null and NaN records are counted (see the sketch after this list).
    • Collect metrics from the footer: when an existing file is added, the footer of the Parquet file is opened to reconstruct all the metrics needed for Iceberg.
  • Deletes: this mainly relies on strict projection to check whether the data files cannot match the predicate.
    • Strict projection needs to be added to the transforms.
    • Strict Metrics Evaluator to determine if the predicate cannot match.
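
As referenced in the metrics-collection item above, here is a minimal sketch of collecting metrics while writing, for a single f64 column. The `ColumnMetrics` struct is a hypothetical simplification; in practice metrics are tracked per field ID and serialized into the manifest entry for the data file.

```rust
/// Running metrics for one column, tracked while rows are written.
/// (Hypothetical simplification; real metrics are keyed by field ID.)
#[derive(Debug, Default)]
struct ColumnMetrics {
    value_count: u64,
    null_count: u64,
    nan_count: u64,
    lower_bound: Option<f64>,
    upper_bound: Option<f64>,
}

impl ColumnMetrics {
    /// Update the running metrics with one value, as a writer would per row.
    fn update(&mut self, value: Option<f64>) {
        self.value_count += 1;
        match value {
            None => self.null_count += 1,
            Some(v) if v.is_nan() => self.nan_count += 1,
            Some(v) => {
                self.lower_bound = Some(self.lower_bound.map_or(v, |lo| lo.min(v)));
                self.upper_bound = Some(self.upper_bound.map_or(v, |hi| hi.max(v)));
            }
        }
    }
}

fn main() {
    let mut metrics = ColumnMetrics::default();
    for value in [Some(3.5), None, Some(f64::NAN), Some(-1.0)] {
        metrics.update(value);
    }
    // These counts and bounds end up in the manifest entry for the data file.
    println!("{metrics:?}");
}
```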

Future topics

  • Python bindings
  • WASM to run Iceberg-rust in the browser

Contribute

If you want to contribute to the upcoming milestone, feel free to comment on this issue. If there is anything unclear or missing, feel free to reach out here as well 👍

Fokko added this to the 0.3.0 Release milestone Apr 24, 2024
Fokko changed the title from "Tracking issues of iceberg-rust v0.2.0" to "Tracking issues of iceberg-rust v0.3.0" Apr 24, 2024
Fokko pinned this issue Apr 24, 2024
@marvinlanhenke (Contributor)

@Fokko thanks for your effort here

@Fokko (Contributor, Author) commented Apr 24, 2024

@marvinlanhenke No problem, thank you for all the work on the project. While compiling this I realized how much work has been done 🚀

@sdd (Contributor) commented Apr 24, 2024

Thanks for putting this together @Fokko! It's great to have this clarity on where we're heading. Let's go! 🙌

@liurenjie1024 (Collaborator) commented Apr 25, 2024

Hi @Fokko, regarding the read projection part: we can currently convert Parquet files into Arrow streams, but there are some limitations. It only supports primitive types, and schema evolution is not supported yet. Our discussion is in issue #244, and here is the first step of projection by @viirya: #245

@liurenjie1024 (Collaborator)

Also, as we discussed in this doc, do you mind adding DataFusion integration, Python bindings, and WASM bindings to the future topics?

@Fokko (Contributor, Author) commented Apr 25, 2024

Hi @Fokko, regarding the read projection part: we can currently convert Parquet files into Arrow streams, but there are some limitations. It only supports primitive types, and schema evolution is not supported yet. Our discussion is in issue #244, and here is the first step of projection by @viirya: #245

Thanks for the context, I've just added this to the list.

About the Glue, Hive, and REST catalogs, I think we already have integrations:

Ah yes, I forgot to check those marks, thanks!

Also, as we discussed in this doc, do you mind adding DataFusion integration, Python bindings, and WASM bindings to the future topics?

Certainly! Great suggestions! I'm less familiar with some of these topics (like DataFusion), so feel free to edit the post if you feel something is missing.

@marvinlanhenke (Contributor)

Certainly! Great suggestions! I'm less familiar with some of these topics (like DataFusion), so feel free to edit the post if you feel something is missing.

...for DataFusion I have provided a basic design proposal and an implementation for some of the DataFusion traits, like the catalog & schema providers; perhaps we can also move forward on this: #324

@liurenjie1024 (Collaborator)

Certainly! Great suggestions! I'm less familiar with some of these topics (like DataFusion), so feel free to edit the post if you feel something is missing.

...for DataFusion I have provided a basic design proposal and an implementation for some of the DataFusion traits, like the catalog & schema providers; perhaps we can also move forward on this: #324

Yeah, I'll take a look later.
