docs: Add RFC for iceberg-kernel #1854
base: main
Signed-off-by: Xuanwo <github@xuanwo.io>
license-eye has checked 374 files.
| Valid | Invalid | Ignored | Fixed |
|---|---|---|---|
| 306 | 1 | 67 | 0 |
Invalid file list:
- docs/rfcs/0001_kernel.md

Use this command to fix any missing license headers:
```bash
docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix
```
Signed-off-by: Xuanwo <github@xuanwo.io>
I'm thinking it's reasonable to split out
I think that's an interesting idea. It's fine to just expose a spec crate, but how useful would it be? For reading snapshots, manifest lists, and manifests, users still need a FileIO. Is there a use case where users just want to ser/de a manifest file?
alamb
left a comment
I think this looks like a great plan personally -- though I am not a direct user of the crate so I don't have a whole lot of gravitas.
I will also try to hype this PR up some more to get some more comments
| ## Background | ||
| Issue #1819 proposes decoupling the protocol/metadata/plan logic that currently lives inside the `iceberg` crate so that it can serve as a reusable “kernel,” similar to the approach taken by delta-kernel-rs. Today the `iceberg` crate simultaneously exposes the public trait surface and the default engine (Tokio runtime, opendal-backed FileIO, Arrow readers, etc.). This tight coupling makes it difficult for downstream projects to embed Iceberg metadata while providing their own storage, runtime, or execution stack. |
| Issue #1819 proposes decoupling the protocol/metadata/plan logic that currently lives inside the `iceberg` crate so that it can serve as a reusable “kernel,” similar to the approach taken by [delta-kernel-rs](https://github.com/delta-io/delta-kernel-rs). Today the `iceberg` crate simultaneously exposes the public trait surface and the default engine (Tokio runtime, opendal-backed FileIO, Arrow readers, etc.). This tight coupling makes it difficult for downstream projects to embed Iceberg metadata while providing their own storage, runtime, or execution stack. |
| async fn read(path: &str) -> Result<Bytes>; | ||
| async fn reader(path: &str) -> Result<FileReader>; | ||
| async fn write(path: &str, bs: Bytes) -> Result<FileMetadata>; | ||
| async fn writer(path: &str) -> Result<FileWriter>; |
If this already returns a writer, what is the benefit of also allowing a direct write? Is it just convenience?
| ``` | ||
| - The kernel only defines the trait and error types. | ||
| - `iceberg-fileio-opendal` (new crate) ships an opendal-based implementation; other backends can publish their own crates. |
makes sense to me
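For illustration, a minimal sketch of how a backend crate might implement a kernel-defined `FileIO` trait. This is not the RFC's API: the `&self` receivers, the `Vec<u8>` stand-in for `Bytes`, the `FileMetadata` and `Error` shapes, `MemoryFileIO`, and the toy `block_on` helper are all assumptions, and only `read` and `write` are covered. It needs Rust 1.75+ for `async fn` in traits.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// Stand-in for the RFC's `Bytes` type (assumption: any owned byte buffer works).
pub type Bytes = Vec<u8>;

// Hypothetical metadata returned by `write`; the real fields are not in the excerpt.
#[derive(Debug, PartialEq)]
pub struct FileMetadata {
    pub size: u64,
}

// Hypothetical kernel error type.
#[derive(Debug, PartialEq)]
pub enum Error {
    NotFound(String),
}

pub type Result<T> = std::result::Result<T, Error>;

// Kernel side: only the trait and error types live here.
#[allow(async_fn_in_trait)]
pub trait FileIO {
    async fn read(&self, path: &str) -> Result<Bytes>;
    async fn write(&self, path: &str, bs: Bytes) -> Result<FileMetadata>;
}

// Backend side: an in-memory implementation that a separate crate could ship.
#[derive(Default)]
pub struct MemoryFileIO {
    files: Mutex<HashMap<String, Bytes>>,
}

impl FileIO for MemoryFileIO {
    async fn read(&self, path: &str) -> Result<Bytes> {
        let files = self.files.lock().unwrap();
        files
            .get(path)
            .cloned()
            .ok_or_else(|| Error::NotFound(path.to_string()))
    }

    async fn write(&self, path: &str, bs: Bytes) -> Result<FileMetadata> {
        let size = bs.len() as u64;
        self.files.lock().unwrap().insert(path.to_string(), bs);
        Ok(FileMetadata { size })
    }
}

// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Toy single-future executor so the sketch runs without an async runtime.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

// Code written against the trait works with any backend.
pub async fn roundtrip<IO: FileIO>(io: &IO) -> Result<(FileMetadata, Bytes)> {
    let meta = io.write("metadata/v1.json", b"{}".to_vec()).await?;
    let data = io.read("metadata/v1.json").await?;
    Ok((meta, data))
}
```

Calling `block_on(roundtrip(&MemoryFileIO::default()))` writes a two-byte file and reads it back; an opendal-backed type would implement the same trait in its own crate.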
| F: Future<Output = T> + Send + 'static, | ||
| T: Send + 'static; | ||
| fn spawn_blocking<F, T>(&self, f: F) -> Self::JoinHandle<T> |
When would `spawn_blocking` be called? Avoiding blocking IO might be a nice design for the kernel -- and since all the IO traits are async, I am not sure what the kernel would be calling that is blocking 🤔
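To make the shape of the runtime abstraction concrete, here is a rough sketch of a thread-backed implementation. Only the `Future` bounds and the `spawn_blocking` signature come from the quoted diff; the trait name `Runtime`, the `spawn` method signature, the `FnOnce() -> T` bound, `ThreadRuntime`, and the `block_on` helper are assumptions made for illustration.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// Assumed trait shape; the excerpt shows only parts of it.
pub trait Runtime {
    type JoinHandle<T>;

    fn spawn<F, T>(&self, fut: F) -> Self::JoinHandle<T>
    where
        F: Future<Output = T> + Send + 'static,
        T: Send + 'static;

    // The closure bound is an assumption; the excerpt omits the where clause.
    fn spawn_blocking<F, T>(&self, f: F) -> Self::JoinHandle<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static;
}

// Waker that unparks the thread driving the future.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Drives one future to completion on the current thread.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

// Hypothetical engine-side runtime: one OS thread per task, no async runtime.
pub struct ThreadRuntime;

impl Runtime for ThreadRuntime {
    type JoinHandle<T> = thread::JoinHandle<T>;

    fn spawn<F, T>(&self, fut: F) -> Self::JoinHandle<T>
    where
        F: Future<Output = T> + Send + 'static,
        T: Send + 'static,
    {
        // Each future gets a dedicated thread that polls it to completion.
        thread::spawn(move || block_on(fut))
    }

    fn spawn_blocking<F, T>(&self, f: F) -> Self::JoinHandle<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        // Blocking work is already a plain closure, so it runs directly.
        thread::spawn(f)
    }
}
```

With this sketch, `ThreadRuntime.spawn(async { 1 + 2 }).join()` yields `Ok(3)`. A Tokio-backed engine could implement the same trait with its own handle type.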
| #### Scan / Planner | ||
| - The kernel produces pure `TableScanPlan` descriptions (manifests, data-files, predicates, task graph). | ||
| - Engines provide executors (e.g., `ArrowExecutor`) that transform plans into record batches or other runtime-specific artifacts. |
👍
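A small sketch of the plan/executor split described above, under stated assumptions: the fields of `TableScanPlan`, the `DataFileTask` type, the `Executor` trait, and `RowCountExecutor` are hypothetical, and the task graph is left out. The point is only that the plan is plain data and each engine decides what executing it produces.

```rust
// Hypothetical description of one data file to scan.
#[derive(Debug, Clone, PartialEq)]
pub struct DataFileTask {
    pub path: String,
    pub record_count: u64,
}

// Kernel output: a pure description with no IO or runtime handles inside.
#[derive(Debug, Clone, PartialEq)]
pub struct TableScanPlan {
    pub manifests: Vec<String>,
    pub data_files: Vec<DataFileTask>,
    // Predicate kept as text here; a real plan would carry a typed expression.
    pub predicate: Option<String>,
}

// Engine side: each executor chooses its own output type.
pub trait Executor {
    type Output;

    fn execute(&self, plan: &TableScanPlan) -> Self::Output;
}

// Toy executor that sums the record counts listed in the plan.
// An Arrow-based executor would return record batches instead.
pub struct RowCountExecutor;

impl Executor for RowCountExecutor {
    type Output = u64;

    fn execute(&self, plan: &TableScanPlan) -> u64 {
        plan.data_files.iter().map(|file| file.record_count).sum()
    }
}

// Builds a two-file plan with made-up paths for demonstration.
pub fn sample_plan() -> TableScanPlan {
    TableScanPlan {
        manifests: vec!["metadata/manifest-0.avro".to_string()],
        data_files: vec![
            DataFileTask { path: "data/a.parquet".to_string(), record_count: 100 },
            DataFileTask { path: "data/b.parquet".to_string(), record_count: 50 },
        ],
        predicate: Some("id > 10".to_string()),
    }
}
```

Because the plan holds no runtime state, it can be cloned, compared, or handed to a different executor without touching the kernel.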
btw #1857 should resolve the CI issue
Which issue does this PR close?
What changes are included in this PR?
Add RFC for iceberg-kernel
Are these changes tested?