diff --git a/docs/rfcs/0001_kernel.md b/docs/rfcs/0001_kernel.md
new file mode 100644
index 0000000000..fd711b0a54
--- /dev/null
+++ b/docs/rfcs/0001_kernel.md
@@ -0,0 +1,192 @@
+
+# RFC: Extract `iceberg-kernel` for Pluggable Execution Layers
+
+## Background
+
+Issue #1819 proposes decoupling the protocol/metadata/planning logic that currently lives inside the `iceberg` crate so that it can serve as a reusable “kernel,” similar to the approach taken by [delta-kernel-rs](https://github.com/delta-io/delta-kernel-rs). Today the `iceberg` crate exposes both the public trait surface and the default engine (Tokio runtime, opendal-backed FileIO, Arrow readers, etc.). This tight coupling makes it difficult for downstream projects to embed Iceberg metadata handling while providing their own storage, runtime, or execution stack.
+
+## Goals and Scope
+
+- **Full read & write coverage**: the kernel must contain every protocol component required for both scan planning and transactional writes (append, rewrite, commit, etc.).
+- **No default runtime dependency**: the kernel defines a `Runtime` trait instead of depending on Tokio or Smol.
+- **No default storage dependency**: the kernel defines `FileIO` traits only; concrete implementations (for example `iceberg-fileio-opendal`) live in dedicated crates.
+- **Stable facade for existing users**: the top-level `iceberg` crate continues to expose the familiar API by re-exporting the kernel plus a default engine feature.
+
+Out of scope: changes to the Iceberg table specification or rewriting catalog adapters.
+
+## Architecture Overview
+
+### Workspace Layout
+
+```
+crates/
+  kernel/            # new: pure protocols & planning logic
+    spec/ expr/ catalog/ table/ transaction/ scan/ runtime_api
+    io/traits.rs     # FileIO traits (no opendal)
+  fileio/
+    opendal/         # e.g. `iceberg-fileio-opendal`
+    fs/              # other FileIO implementations
+  runtime/
+    tokio/           # e.g. `iceberg-runtime-tokio`
+    smol/
+  iceberg/           # facade re-exporting kernel + default engine
+  catalog/*          # depend on kernel (+ chosen FileIO/Runtime crates)
+  integrations/*     # e.g. datafusion using facade or composing crates
+```
+
+### Trait Surfaces
+
+#### FileIO
+
+```rust
+pub struct FileMetadata {
+    pub size: u64,
+    ...
+}
+
+pub type FileReader = Box<dyn FileRead>;
+
+#[async_trait::async_trait]
+pub trait FileRead: Send + Sync + 'static {
+    async fn read(&self, range: Range<u64>) -> Result<Bytes>;
+}
+
+pub type FileWriter = Box<dyn FileWrite>;
+
+#[async_trait::async_trait]
+pub trait FileWrite: Send + Unpin + 'static {
+    async fn write(&mut self, bs: Bytes) -> Result<()>;
+    async fn close(&mut self) -> Result<FileMetadata>;
+}
+
+pub type StorageFactory = fn(attrs: HashMap<String, String>) -> Result<Arc<dyn Storage>>;
+
+#[async_trait::async_trait]
+pub trait Storage: Send + Sync {
+    async fn reader(&self, path: &str) -> Result<FileReader>;
+    async fn writer(&self, path: &str) -> Result<FileWriter>;
+    async fn delete(&self, path: &str) -> Result<()>;
+    async fn exists(&self, path: &str) -> Result<bool>;
+
+    ...
+}
+
+pub struct FileIO {
+    registry: DashMap<String, StorageFactory>,
+}
+
+impl FileIO {
+    fn register(&self, scheme: &str, factory: StorageFactory);
+
+    async fn read(&self, path: &str) -> Result<Bytes>;
+    async fn reader(&self, path: &str) -> Result<FileReader>;
+    async fn write(&self, path: &str, bs: Bytes) -> Result<()>;
+    async fn writer(&self, path: &str) -> Result<FileWriter>;
+
+    async fn delete(&self, path: &str) -> Result<()>;
+    ...
+}
+```
+
+- The kernel only defines the traits and error types.
+- `iceberg-fileio-opendal` (new crate) ships an opendal-based implementation; other backends can publish their own crates (a sketch of such a backend follows).
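+
+To illustrate how a storage backend plugs into these traits, the sketch below registers a toy in-memory implementation. Every name in it (`MemoryStorage`, `MemoryRead`, `MemoryWrite`, `register_memory_backend`) is hypothetical and exists only to show the shape of the integration; a real backend such as `iceberg-fileio-opendal` would wrap its own client instead.
+
+```rust
+// Assumes the kernel's proposed items (`FileRead`, `FileWrite`, `Storage`,
+// `FileIO`, `FileMetadata`, `Result`, ...) are in scope.
+use std::collections::HashMap;
+use std::ops::Range;
+use std::sync::Arc;
+
+use bytes::Bytes;
+use dashmap::DashMap;
+
+#[derive(Default)]
+pub struct MemoryStorage {
+    files: Arc<DashMap<String, Bytes>>,
+}
+
+pub struct MemoryRead(Bytes);
+
+#[async_trait::async_trait]
+impl FileRead for MemoryRead {
+    async fn read(&self, range: Range<u64>) -> Result<Bytes> {
+        // Panics on out-of-range reads; a real backend would return an error.
+        Ok(self.0.slice(range.start as usize..range.end as usize))
+    }
+}
+
+pub struct MemoryWrite {
+    path: String,
+    buf: Vec<u8>,
+    files: Arc<DashMap<String, Bytes>>,
+}
+
+#[async_trait::async_trait]
+impl FileWrite for MemoryWrite {
+    async fn write(&mut self, bs: Bytes) -> Result<()> {
+        self.buf.extend_from_slice(&bs);
+        Ok(())
+    }
+
+    async fn close(&mut self) -> Result<FileMetadata> {
+        let size = self.buf.len() as u64;
+        self.files
+            .insert(self.path.clone(), Bytes::from(std::mem::take(&mut self.buf)));
+        Ok(FileMetadata { size /* remaining fields elided in this sketch */ })
+    }
+}
+
+#[async_trait::async_trait]
+impl Storage for MemoryStorage {
+    async fn reader(&self, path: &str) -> Result<FileReader> {
+        // A missing path should surface a not-found error; the kernel's error
+        // type is left unspecified here, so the sketch falls back to empty bytes.
+        let bytes = self.files.get(path).map(|b| b.clone()).unwrap_or_default();
+        Ok(Box::new(MemoryRead(bytes)))
+    }
+
+    async fn writer(&self, path: &str) -> Result<FileWriter> {
+        Ok(Box::new(MemoryWrite {
+            path: path.to_string(),
+            buf: Vec::new(),
+            files: self.files.clone(),
+        }))
+    }
+
+    async fn delete(&self, path: &str) -> Result<()> {
+        self.files.remove(path);
+        Ok(())
+    }
+
+    async fn exists(&self, path: &str) -> Result<bool> {
+        Ok(self.files.contains_key(path))
+    }
+
+    // Any further `Storage` methods implied by the `...` above are omitted.
+}
+
+// A non-capturing closure coerces to the `StorageFactory` fn pointer, so a
+// backend crate only needs to expose a registration helper like this one.
+pub fn register_memory_backend(file_io: &FileIO) {
+    file_io.register("memory", |_attrs: HashMap<String, String>| {
+        let storage: Arc<dyn Storage> = Arc::new(MemoryStorage::default());
+        Ok(storage)
+    });
+}
+```
+
+Nothing in this sketch touches opendal or Tokio; a backend needs only the kernel traits and its own I/O primitives, which is exactly the decoupling this RFC is after.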
+
+#### Runtime
+
+```rust
+pub trait Runtime: Send + Sync + 'static {
+    type JoinHandle<T: Send + 'static>: Future<Output = T> + Send + 'static;
+
+    fn spawn<F, T>(&self, fut: F) -> Self::JoinHandle<T>
+    where
+        F: Future<Output = T> + Send + 'static,
+        T: Send + 'static;
+
+    fn sleep(&self, dur: Duration) -> Pin<Box<dyn Future<Output = ()> + Send>>;
+}
+```
+
+- `TableScan` planning, metadata refresh, and `Transaction::commit` depend only on this trait.
+- Crates such as `iceberg-runtime-tokio` provide concrete schedulers; consumers pick whichever runtime crate fits their stack.
+
+#### Catalog / Table / Transaction
+
+- The `Catalog` trait moves into the kernel and returns lightweight `TableHandle` objects (metadata + FileIO + Runtime).
+- `TableHandle` no longer embeds Arrow helpers; Arrow-specific logic lives in engine crates.
+- Transactions and their actions remain in the kernel, but rely on the injected `Runtime` for retries/backoff.
+
+#### Scan / Planner
+
+- The kernel produces pure `TableScanPlan` descriptions (manifests, data files, predicates, task graph).
+- Engines provide executors (e.g., `ArrowExecutor`) that transform plans into record batches or other runtime-specific artifacts.
+
+### Facade Behavior
+
+- The top-level `iceberg` crate becomes a facade (`pub use iceberg_kernel::*`) that enables a *composition* of default crates (e.g. `iceberg-runtime-tokio`, `iceberg-fileio-opendal`, and a reference executor) behind feature flags.
+- Existing convenience APIs (`Table::scan().to_arrow()`, `MemoryCatalog`, etc.) stay available but internally assemble the kernel with those default building blocks.
+
+## Migration Plan
+
+1. **Phase 1 – Create the kernel crate**
+   - Add `crates/kernel` and move `spec`, `expr`, `catalog`, `table`, `transaction`, `scan`, and supporting modules.
+   - Introduce temporary shim modules in the facade so existing imports keep working (mark them deprecated).
+
+2. **Phase 2 – Abstract runtime & IO**
+   - Define the `Runtime` and `FileIO` traits inside the kernel.
+   - Remove direct `tokio`/`opendal` dependencies from kernel modules.
+   - Introduce standalone crates (`iceberg-runtime-tokio`, `iceberg-fileio-opendal`, etc.) that implement the new traits.
+
+3. **Phase 3 – Detach Arrow/execution**
+   - Move the `arrow` helpers and `ArrowReaderBuilder` into a reference executor crate (e.g. `iceberg-engine-arrow`).
+   - Update the DataFusion integration to depend on the facade or to compose kernel + runtime + fileio + executor crates directly.
+
+4. **Phase 4 – Catalog & integration updates**
+   - Point catalog crates and other integrations at the kernel interfaces; depend on specific FileIO/Runtime crates only when required.
+   - Keep `iceberg-catalog-loader` kernel-only so users can inject their preferred combinations.
+
+5. **Phase 5 – Release & documentation**
+   - Finish the split within the 0.y.z series, publish an upgrade guide, and add kernel acceptance tests to guarantee trait stability.
+
+## Compatibility
+
+- Users who stick with the `iceberg` facade keep their existing API surface; the facade simply composes kernel + default runtime + default FileIO + reference executor under the hood.
+- Advanced integrators can depend solely on `iceberg-kernel` and mix in whichever `FileIO`, `Runtime`, and executor crates they need, or author their own (see the runtime sketch below).
+- CI keeps running the current integration tests and adds kernel-specific acceptance suites.
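+
+To give a feel for how thin such a runtime adapter can be, here is a minimal sketch of a Tokio-backed implementation of the `Runtime` trait defined above. The `TokioRuntime` name and the boxed join handle are illustrative simplifications, not the committed design of `iceberg-runtime-tokio`.
+
+```rust
+use std::future::Future;
+use std::pin::Pin;
+use std::time::Duration;
+
+/// Hypothetical adapter; assumes the kernel's `Runtime` trait is in scope.
+#[derive(Clone, Copy, Default)]
+pub struct TokioRuntime;
+
+impl Runtime for TokioRuntime {
+    // Boxing keeps the sketch short; a real crate could expose a dedicated
+    // join-handle type instead of a boxed future.
+    type JoinHandle<T: Send + 'static> = Pin<Box<dyn Future<Output = T> + Send>>;
+
+    fn spawn<F, T>(&self, fut: F) -> Self::JoinHandle<T>
+    where
+        F: Future<Output = T> + Send + 'static,
+        T: Send + 'static,
+    {
+        Box::pin(async move {
+            // Surface panics from the spawned task instead of swallowing them.
+            tokio::spawn(fut).await.expect("spawned task panicked")
+        })
+    }
+
+    fn sleep(&self, dur: Duration) -> Pin<Box<dyn Future<Output = ()> + Send>> {
+        Box::pin(tokio::time::sleep(dur))
+    }
+}
+```
+
+With an adapter like this injected, kernel code such as `Transaction::commit` can express its retry/backoff delays through `Runtime::sleep` without ever naming a concrete scheduler.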
+
+## Risks and Mitigations
+
+| Risk | Description | Mitigation |
+| ---- | ----------- | ---------- |
+| Trait churn | Updating catalog/scan traits could break downstream crates | Maintain shim modules, provide `#[deprecated]` transition windows, and document migration steps |
+| Generic complexity | New traits may introduce complicated type bounds | Prefer type-erased handles such as `Arc<dyn Trait>` and `BoxFuture` to keep signatures manageable |
+| Documentation gap | Users may not know which engine to pick | Publish new docs, diagrams, and “custom engine” tutorials alongside the split |
+
+## Open Questions
+
+1. Should the kernel expose any Arrow helpers, or should every Arrow-specific function live exclusively in engine crates?
+2. Do we need a `ScanExecutor` trait inside the kernel for non-Arrow consumers?
+
+## Conclusion
+
+Extracting `iceberg-kernel` plus a pluggable engine layer lets the project:
+
+- Offer a lightweight, embeddable implementation of the Iceberg protocol,
+- Enable external engines (DataFusion, Spark Connect, custom services) to reuse Iceberg metadata without inheriting specific runtime/storage dependencies,