@Xuanwo
Member

@Xuanwo Xuanwo commented Nov 13, 2025

Which issue does this PR close?

What changes are included in this PR?

Add RFC for iceberg-kernel

Are these changes tested?

Signed-off-by: Xuanwo <github@xuanwo.io>
@github-actions github-actions bot left a comment

license-eye has checked 374 files.

| Valid | Invalid | Ignored | Fixed |
|-------|---------|---------|-------|
| 306   | 1       | 67      | 0     |
Invalid files:
  • docs/rfcs/0001_kernel.md

Use this command to fix any missing license headers:
```bash
docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix
```

Signed-off-by: Xuanwo <github@xuanwo.io>
@ZENOTME
Contributor

ZENOTME commented Nov 13, 2025

I'm wondering whether it would be reasonable to split out the spec part as a single minimal module. It would just provide the in-memory representation of the Iceberg spec and the way to serialize and deserialize it. It could evolve quickly and, as a minimal module, be reused by external users or internal modules.

@Xuanwo
Member Author

Xuanwo commented Nov 14, 2025

> I'm wondering whether it would be reasonable to split out the spec part as a single minimal module. It would just provide the in-memory representation of the Iceberg spec and the way to serialize and deserialize it. It could evolve quickly and, as a minimal module, be reused by external users or internal modules.

I think that's an interesting idea. It’s fine to just expose a spec crate, but how useful would it be? For reading snapshots, manifest lists, and manifests, users still need a FileIO. Is there a use case where users just want to serialize or deserialize a manifest file?

@alamb alamb left a comment

I think this looks like a great plan personally -- though I am not a direct user of the crate so I don't have a whole lot of gravitas.

I will also try to hype this PR up some more to get some more comments.


## Background

Issue #1819 proposes decoupling the protocol/metadata/plan logic that currently lives inside the `iceberg` crate so that it can serve as a reusable “kernel,” similar to the approach taken by delta-kernel-rs. Today the `iceberg` crate simultaneously exposes the public trait surface and the default engine (Tokio runtime, opendal-backed FileIO, Arrow readers, etc.). This tight coupling makes it difficult for downstream projects to embed Iceberg metadata while providing their own storage, runtime, or execution stack.
Suggested change
Issue #1819 proposes decoupling the protocol/metadata/plan logic that currently lives inside the `iceberg` crate so that it can serve as a reusable “kernel,” similar to the approach taken by [delta-kernel-rs](https://github.com/delta-io/delta-kernel-rs). Today the `iceberg` crate simultaneously exposes the public trait surface and the default engine (Tokio runtime, opendal-backed FileIO, Arrow readers, etc.). This tight coupling makes it difficult for downstream projects to embed Iceberg metadata while providing their own storage, runtime, or execution stack.

```rust
async fn read(path: &str) -> Result<Bytes>;
async fn reader(path: &str) -> Result<FileReader>;
async fn write(path: &str, bs: Bytes) -> Result<FileMetadata>;
async fn writer(path: &str) -> Result<FileWriter>;
```

If this returns a writer already, what is the benefit of also allowing a direct write? Is it just convenience?

- The kernel only defines the trait and error types.
- `iceberg-fileio-opendal` (new crate) ships an opendal-based implementation; other backends can publish their own crates.
makes sense to me
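
To make the split in the two bullets above concrete, here is a minimal sketch of a kernel-side trait with a backend that would live in its own crate. It is an illustration under stated assumptions, not the RFC's actual API: the `&self` receiver, the `FileIoError` type, `MemoryFileIo`, `copy_file`, and the use of `Vec<u8>` and `usize` in place of `Bytes` and `FileMetadata` are all hypothetical and chosen only to keep the example std-only. The small `block_on` helper is likewise just a stand-in for a real runtime.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical error type; the RFC only says the kernel defines "error types".
#[derive(Debug, PartialEq)]
pub enum FileIoError {
    NotFound(String),
}

/// Kernel side: only the trait. The `&self` receiver is an assumption.
#[allow(async_fn_in_trait)]
pub trait FileIo {
    async fn read(&self, path: &str) -> Result<Vec<u8>, FileIoError>;
    async fn write(&self, path: &str, bs: Vec<u8>) -> Result<usize, FileIoError>;
}

/// Backend side: an in-memory store standing in for a crate such as
/// `iceberg-fileio-opendal`.
#[derive(Default)]
pub struct MemoryFileIo {
    files: Mutex<HashMap<String, Vec<u8>>>,
}

impl FileIo for MemoryFileIo {
    async fn read(&self, path: &str) -> Result<Vec<u8>, FileIoError> {
        let files = self.files.lock().unwrap();
        let found = files.get(path).cloned();
        found.ok_or_else(|| FileIoError::NotFound(path.to_string()))
    }

    async fn write(&self, path: &str, bs: Vec<u8>) -> Result<usize, FileIoError> {
        let len = bs.len();
        self.files.lock().unwrap().insert(path.to_string(), bs);
        Ok(len)
    }
}

/// Kernel-style logic depends on the trait only, never on a concrete backend.
pub async fn copy_file<IO: FileIo>(io: &IO, from: &str, to: &str) -> Result<usize, FileIoError> {
    let bytes = io.read(from).await?;
    io.write(to, bytes).await
}

/// Tiny polling executor, sufficient for futures that never actually wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

With this shape, swapping the storage backend means passing a different `FileIo` implementation to `copy_file`; the kernel-side function itself does not change.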

```rust
F: Future<Output = T> + Send + 'static,
T: Send + 'static;

fn spawn_blocking<F, T>(&self, f: F) -> Self::JoinHandle<T>
```
When would `spawn_blocking` be called? Avoiding blocking IO might be a nice design for the kernel -- and since all the IO traits are async I am not sure what the kernel would be calling that was blocking 🤔
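
For context on the question above, here is a rough sketch of what a `spawn_blocking` hook could look like and one plausible use for it. The RFC excerpt does not show the bounds on `spawn_blocking`, so the `FnOnce() -> T` bound (borrowed from how Tokio's `spawn_blocking` is declared), the thread-backed `ThreadRuntime`, and the `sum_lengths` helper are all assumptions. The guess illustrated here is that such a hook would serve CPU-heavy work (for example decoding or decompressing metadata) rather than blocking IO.

```rust
use std::thread;

/// Hypothetical slice of the runtime trait; only `spawn_blocking` is sketched.
pub trait Runtime {
    /// Handle type chosen by the engine (a generic associated type).
    type JoinHandle<T>;

    /// Run a synchronous closure off the async executor. Bounds are assumed.
    fn spawn_blocking<F, T>(&self, f: F) -> Self::JoinHandle<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static;
}

/// Engine side: a plain OS-thread runtime, used here only for illustration.
pub struct ThreadRuntime;

impl Runtime for ThreadRuntime {
    type JoinHandle<T> = thread::JoinHandle<T>;

    fn spawn_blocking<F, T>(&self, f: F) -> Self::JoinHandle<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        thread::spawn(f)
    }
}

/// Stand-in for CPU-bound work such as decoding a large manifest.
pub fn sum_lengths(entries: Vec<String>) -> usize {
    entries.iter().map(|entry| entry.len()).sum()
}
```

A caller would hand the closure to the runtime and join on the handle, for example `ThreadRuntime.spawn_blocking(move || sum_lengths(entries)).join()`. If the kernel never needs this, dropping the method from the trait would keep the runtime surface smaller.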

#### Scan / Planner

- The kernel produces pure `TableScanPlan` descriptions (manifests, data-files, predicates, task graph).
- Engines provide executors (e.g., `ArrowExecutor`) that transform plans into record batches or other runtime-specific artifacts.
👍
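
The plan/executor separation in the Scan / Planner bullets can be sketched roughly as follows. Every name and field here other than `TableScanPlan` is hypothetical (`DataFile`, `IdAtLeast`, `plan_scan`, `Executor`, `RowCountExecutor`), and the plan is reduced to a predicate plus a pruned file list. The real RFC types may look quite different.

```rust
/// Hypothetical per-file metadata, as it might be read from a manifest.
#[derive(Debug, Clone, PartialEq)]
pub struct DataFile {
    pub path: String,
    pub record_count: u64,
    pub min_id: i64,
    pub max_id: i64,
}

/// Hypothetical predicate meaning `id >= lower_bound`.
#[derive(Debug, Clone, PartialEq)]
pub struct IdAtLeast(pub i64);

/// Kernel output: a pure description of the scan, with no IO and no runtime.
#[derive(Debug, Clone, PartialEq)]
pub struct TableScanPlan {
    pub predicate: IdAtLeast,
    pub data_files: Vec<DataFile>,
}

/// Kernel side: keep only files whose `id` range can satisfy the predicate.
pub fn plan_scan(files: &[DataFile], predicate: IdAtLeast) -> TableScanPlan {
    let data_files: Vec<DataFile> = files
        .iter()
        .filter(|file| file.max_id >= predicate.0)
        .cloned()
        .collect();
    TableScanPlan { predicate, data_files }
}

/// Engine side: turns a plan into a runtime-specific artifact.
pub trait Executor {
    type Output;
    fn execute(&self, plan: &TableScanPlan) -> Self::Output;
}

/// Toy executor standing in for something like `ArrowExecutor`: it reports
/// an upper bound on matching rows instead of producing record batches.
pub struct RowCountExecutor;

impl Executor for RowCountExecutor {
    type Output = u64;

    fn execute(&self, plan: &TableScanPlan) -> u64 {
        plan.data_files.iter().map(|file| file.record_count).sum()
    }
}
```

Because the plan is plain data, it can be inspected, logged, or shipped to another process before any engine touches it, which is the property the bullets are after.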

@kevinjqliu
Contributor

btw #1857 should resolve the CI issue
