diff --git a/README.md b/README.md index b1523e35..7f45268d 100644 --- a/README.md +++ b/README.md @@ -39,6 +39,15 @@ Apache Paimon Rust is an exciting project currently under active development. Wh - Start discussion thread at [dev mailing list](mailto:dev@paimon.apache.org) ([subscribe]() / [unsubscribe]() / [archives](https://lists.apache.org/list.html?dev@paimon.apache.org)) - Talk to community directly at [Slack #paimon channel](https://join.slack.com/t/the-asf/shared_invite/zt-2l9rns8pz-H8PE2Xnz6KraVd2Ap40z4g). +## Documentation + +The project documentation is built with [MkDocs](https://www.mkdocs.org/). See [docs/README.md](docs/README.md) for details. + +```bash +pip3 install mkdocs-material +cd docs && mkdocs serve +``` + ## Getting help Submit [issues](https://github.com/apache/paimon-rust/issues/new/choose) for bug report or asking questions in [discussion](https://github.com/apache/paimon-rust/discussions/new?category=q-a). diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 00000000..1e7d4cf3 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,55 @@ + + +# Documentation + +This directory contains the source files for the Apache Paimon Rust documentation site, built with [MkDocs](https://www.mkdocs.org/) and the [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/) theme. + +## Prerequisites + +- Python 3.8+ +- pip3 + +## Setup + +```bash +pip3 install mkdocs-material +``` + +## Development + +Preview the docs locally with live reload: + +```bash +cd docs +mkdocs serve +``` + +Then open [http://127.0.0.1:8000](http://127.0.0.1:8000) in your browser. + +## Build + +Generate the static site: + +```bash +cd docs +mkdocs build +``` + +The output will be in the `docs/site/` directory. 
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml new file mode 100644 index 00000000..4e59001e --- /dev/null +++ b/docs/mkdocs.yml @@ -0,0 +1,65 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +site_name: Apache Paimon Rust +site_description: The Rust implementation of Apache Paimon +site_url: https://apache.github.io/paimon-rust/ +repo_url: https://github.com/apache/paimon-rust +repo_name: apache/paimon-rust + +docs_dir: src + +theme: + name: material + palette: + - scheme: default + primary: indigo + accent: indigo + toggle: + icon: material/brightness-7 + name: Switch to dark mode + - scheme: slate + primary: indigo + accent: indigo + toggle: + icon: material/brightness-4 + name: Switch to light mode + features: + - navigation.sections + - navigation.expand + - navigation.top + - search.suggest + - content.code.copy + +nav: + - Home: index.md + - Getting Started: getting-started.md + - Architecture: architecture.md + - Releases: releases.md + - Contributing: contributing.md + +markdown_extensions: + - admonition + - pymdownx.details + - pymdownx.superfences + - pymdownx.highlight: + anchor_linenums: true + - pymdownx.inlinehilite + - pymdownx.tabbed: + alternate_style: true + - toc: + permalink: true diff --git 
a/docs/src/architecture.md b/docs/src/architecture.md new file mode 100644 index 00000000..f12950e8 --- /dev/null +++ b/docs/src/architecture.md @@ -0,0 +1,60 @@ + + +# Architecture + +## Overview + +Apache Paimon Rust is organized as a Cargo workspace with multiple crates, each responsible for a distinct layer of functionality. + +## Crate Structure + +### `crates/paimon` — Core Library + +The core crate implements the Paimon table format, including: + +- **Catalog** — Catalog client for discovering and managing databases and tables +- **Table** — Table abstraction for reading Paimon tables +- **Snapshot & Manifest** — Reading snapshot and manifest metadata +- **Schema** — Table schema management and evolution +- **File IO** — Abstraction layer for storage backends (local filesystem, S3) +- **File Format** — Parquet file reading and writing via Apache Arrow + +### `crates/integrations/datafusion` — DataFusion Integration + +Provides a `TableProvider` implementation that allows querying Paimon tables using [Apache DataFusion](https://datafusion.apache.org/)'s SQL engine. + +## Data Model + +Paimon organizes data in a layered structure: + +``` +Catalog + └── Database + └── Table + ├── Schema + └── Snapshot + └── Manifest + └── Data Files (Parquet) +``` + +- **Catalog** manages databases and tables, accessed via REST API +- **Snapshot** represents a consistent view of a table at a point in time +- **Manifest** lists the data files that belong to a snapshot +- **Data Files** store the actual data in Parquet format diff --git a/docs/src/contributing.md b/docs/src/contributing.md new file mode 100644 index 00000000..6e5c0bd9 --- /dev/null +++ b/docs/src/contributing.md @@ -0,0 +1,62 @@ + + +# Contributing + +Apache Paimon Rust welcomes contributions from everyone. See the full [Contributing Guide](https://github.com/apache/paimon-rust/blob/main/CONTRIBUTING.md) for detailed instructions. + +## Quick Start + +1. 
Fork the [repository](https://github.com/apache/paimon-rust) +2. Clone your fork: `git clone https://github.com//paimon-rust.git` +3. Create a feature branch: `git checkout -b feature/my-feature` +4. Make your changes and add tests +5. Run checks locally before submitting +6. Open a Pull Request + +## Development Setup + +```bash +# Ensure you have the correct Rust toolchain +rustup show + +# Build the project +cargo build + +# Run all tests +cargo test + +# Format code +cargo fmt + +# Lint (matches CI) +cargo clippy --all-targets --workspace -- -D warnings +``` + +## Finding Issues + +- Check [open issues](https://github.com/apache/paimon-rust/issues) for tasks to work on +- Issues labeled `good first issue` are great starting points +- See the [0.1.0 tracking issue](https://github.com/apache/paimon-rust/issues/3) for the current roadmap + +## Community + +- **GitHub Issues**: [apache/paimon-rust/issues](https://github.com/apache/paimon-rust/issues) +- **Mailing List**: [dev@paimon.apache.org](mailto:dev@paimon.apache.org) ([subscribe](mailto:dev-subscribe@paimon.apache.org) / [archives](https://lists.apache.org/list.html?dev@paimon.apache.org)) +- **Slack**: [#paimon channel](https://join.slack.com/t/the-asf/shared_invite/zt-2l9rns8pz-H8PE2Xnz6KraVd2Ap40z4g) on the ASF Slack diff --git a/docs/src/getting-started.md b/docs/src/getting-started.md new file mode 100644 index 00000000..c1dc287c --- /dev/null +++ b/docs/src/getting-started.md @@ -0,0 +1,180 @@ + + +# Getting Started + +## Installation + +Add `paimon` to your `Cargo.toml`: + +```toml +[dependencies] +paimon = "0.0.0" +tokio = { version = "1", features = ["full"] } +``` + +By default, the `storage-fs` (local filesystem) and `storage-memory` (in-memory) backends are enabled. 
To use additional storage backends, enable the corresponding feature flags: + +```toml +[dependencies] +paimon = { version = "0.0.0", features = ["storage-s3"] } +``` + +Available storage features: + +| Feature | Backend | +|------------------|------------------| +| `storage-fs` | Local filesystem | +| `storage-memory` | In-memory | +| `storage-s3` | Amazon S3 | +| `storage-oss` | Alibaba Cloud OSS| +| `storage-all` | All of the above | + +## Catalog Management + +`FileSystemCatalog` manages databases and tables stored on a local (or remote) filesystem. + +### Create a Catalog + +```rust +use paimon::FileSystemCatalog; + +let catalog = FileSystemCatalog::new("/tmp/paimon-warehouse")?; +``` + +### Manage Databases + +```rust +use paimon::Catalog; // import the trait +use std::collections::HashMap; + +// Create a database +catalog.create_database("my_db", false, HashMap::new()).await?; + +// List databases +let databases = catalog.list_databases().await?; + +// Drop a database (cascade = true to drop all tables inside) +catalog.drop_database("my_db", false, true).await?; +``` + +### Manage Tables + +```rust +use paimon::catalog::Identifier; +use paimon::spec::{DataType, IntType, VarCharType, Schema}; + +// Define a schema +let schema = Schema::builder() + .column("id", DataType::Int(IntType::new())) + .column("name", DataType::VarChar(VarCharType::string_type())) + .build()?; + +// Create a table +let identifier = Identifier::new("my_db", "my_table"); +catalog.create_table(&identifier, schema, false).await?; + +// List tables in a database +let tables = catalog.list_tables("my_db").await?; + +// Get a table handle +let table = catalog.get_table(&identifier).await?; +``` + +## Reading a Table + +Paimon Rust uses a scan-then-read pattern: first scan the table to produce splits, then read data from those splits as Arrow `RecordBatch` streams. 
+ +```rust +use futures::StreamExt; + +// Get a table from the catalog +let table = catalog.get_table(&Identifier::new("my_db", "my_table")).await?; + +// Create a read builder +let read_builder = table.new_read_builder(); + +// Step 1: Scan — produces a Plan containing DataSplits +let plan = { + let scan = read_builder.new_scan(); + scan.plan().await? +}; + +// Step 2: Read — consumes splits and returns Arrow RecordBatches +let reader = read_builder.new_read()?; +let mut stream = reader.to_arrow(plan.splits())?; + +while let Some(batch) = stream.next().await { + let batch = batch?; + println!("RecordBatch: {batch:#?}"); +} +``` + +## DataFusion Integration + +Query Paimon tables using SQL with [Apache DataFusion](https://datafusion.apache.org/). Add the integration crate: + +```toml +[dependencies] +paimon = "0.0.0" +paimon-datafusion = "0.0.0" +datafusion = "52" +``` + +Register a Paimon table and run SQL queries: + +```rust +use std::sync::Arc; +use datafusion::prelude::SessionContext; +use paimon_datafusion::PaimonTableProvider; + +// Get a Paimon table from your catalog +let table = catalog.get_table(&identifier).await?; + +// Register as a DataFusion table +let provider = PaimonTableProvider::try_new(table)?; +let ctx = SessionContext::new(); +ctx.register_table("my_table", Arc::new(provider))?; + +// Query with SQL +let df = ctx.sql("SELECT * FROM my_table").await?; +df.show().await?; +``` + +> **Note:** The DataFusion integration currently supports full table scans only. Column projection and predicate pushdown are not yet implemented. 
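
To make the note above concrete, here is a minimal, std-only sketch of what "full table scans only" means for a query that filters rows and selects a single column. It is a model under stated assumptions, not the paimon or DataFusion API: `Row`, `FullScanSource`, and `select_names_where_id_gt` are hypothetical names introduced purely for illustration. The source hands back every row and every column, and the filter and column selection are applied afterwards by the caller, which stands in for the query engine.

```rust
// Hypothetical model: none of these types come from paimon or DataFusion.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i32,
    name: String,
}

/// Stand-in for a table source that only supports full scans.
struct FullScanSource {
    rows: Vec<Row>,
}

impl FullScanSource {
    /// Returns every row with every column; it accepts no filter or projection.
    fn scan(&self) -> Vec<Row> {
        self.rows.clone()
    }
}

/// Stand-in for the query engine: filter and project after the scan.
/// Returns the number of rows read from the source and the selected names.
fn select_names_where_id_gt(source: &FullScanSource, min_id: i32) -> (usize, Vec<String>) {
    let scanned = source.scan();
    let rows_read = scanned.len();
    let names = scanned
        .into_iter()
        .filter(|row| row.id > min_id) // predicate applied above the scan
        .map(|row| row.name) // projection applied above the scan
        .collect();
    (rows_read, names)
}

fn main() {
    let source = FullScanSource {
        rows: vec![
            Row { id: 1, name: "a".to_string() },
            Row { id: 2, name: "b".to_string() },
            Row { id: 3, name: "c".to_string() },
        ],
    };
    let (rows_read, names) = select_names_where_id_gt(&source, 1);
    println!("rows read: {rows_read}, rows returned: {}", names.len());
}
```

In this model the number of rows read is always the full table size, however selective the predicate is. That is the cost pushdown would avoid, and it is worth keeping in mind when querying large tables.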
+ +## Building from Source + +```bash +git clone https://github.com/apache/paimon-rust.git +cd paimon-rust +cargo build +``` + +## Running Tests + +```bash +# Unit tests +cargo test + +# Integration tests (requires Docker) +make docker-up +cargo test -p integration_tests +make docker-down +``` diff --git a/docs/src/index.md b/docs/src/index.md new file mode 100644 index 00000000..b9c14723 --- /dev/null +++ b/docs/src/index.md @@ -0,0 +1,41 @@ + + +# Apache Paimon Rust + +The Rust implementation of [Apache Paimon](https://paimon.apache.org/) — a streaming data lake platform with high-speed data ingestion, changelog tracking, and efficient real-time analytics. + +## Overview + +Apache Paimon Rust provides native Rust libraries for reading and writing Paimon tables, enabling high-performance data lake access from the Rust ecosystem. + +Key features: + +- Native Rust reader for Paimon table format +- Support for local filesystem, S3, and OSS storage backends +- REST Catalog integration +- Apache DataFusion integration for SQL queries + +## Status + +The project is under active development, tracking the [0.1.0 milestone](https://github.com/apache/paimon-rust/issues/3). + +## License + +Apache Paimon Rust is licensed under the [Apache License 2.0](https://github.com/apache/paimon-rust/blob/main/LICENSE). diff --git a/docs/src/releases.md b/docs/src/releases.md new file mode 100644 index 00000000..11b8b390 --- /dev/null +++ b/docs/src/releases.md @@ -0,0 +1,44 @@ + + +# Releases + +## Release Policy + +Apache Paimon Rust follows [Semantic Versioning](https://semver.org/). All releases are published to [crates.io](https://crates.io/crates/paimon). + +## Upcoming + +### 0.1.0 (In Development) + +The first release of Apache Paimon Rust. Track progress at the [0.1.0 milestone](https://github.com/apache/paimon-rust/issues/3). 
+ +Planned features: + +- Paimon table format reader +- Local filesystem, S3, and OSS storage backends +- REST Catalog client +- Apache DataFusion integration +- Partitioned table support +- C FFI bindings +- Go bindings + +## Past Releases + +No releases yet. Stay tuned!