[doc] Update index documentation to not only realtime data lake #7519
Merged
JingsongLi merged 1 commit into apache:master on Mar 25, 2026
Apache Paimon is a lake format for building a Lakehouse architecture for both streaming and batch
operations. Paimon provides large-scale data lake storage for analytics, realtime streaming updates
powered by its LSM (Log-structured merge-tree) structure, and multimodal data management for AI workloads,
all in a single unified format.
Large-Scale Data Lake
Paimon is built for huge analytic datasets. A single table can contain tens of petabytes of data, and even
these huge tables can be read efficiently without a distributed SQL engine.
… examine changes. Version rollback allows users to quickly correct problems by resetting a table to a known good state.
File Index (BloomFilter, Bitmap, Range Bitmap) and aggregate push-down further accelerate queries.
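To show why a Bloom filter file index speeds up queries, here is a minimal conceptual sketch in Python. It is not Paimon's implementation. The assumption is that each data file keeps a Bloom filter over one column, so a point lookup only opens the files whose filter might contain the value. The `BloomFilter` class, the `files_to_scan` helper, and the file names are all hypothetical.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Simple Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(value))


def files_to_scan(file_index: Dict[str, BloomFilter], value: str) -> List[str]:
    """Keep only the files whose Bloom filter might contain `value`."""
    return [name for name, bf in file_index.items() if bf.might_contain(value)]


# Hypothetical table: two data files, each with a Bloom filter on "user_id".
data_files = {"data-0.parquet": ["u1", "u2"], "data-1.parquet": ["u3", "u4"]}
file_index: Dict[str, BloomFilter] = {}
for name, user_ids in data_files.items():
    bf = BloomFilter()
    for user_id in user_ids:
        bf.add(user_id)
    file_index[name] = bf

# Always includes "data-1.parquet"; "data-0.parquet" appears only on a false positive.
print(files_to_scan(file_index, "u3"))
```

A Bloom filter never produces false negatives, so skipping a file is always safe. A false positive only costs one unnecessary file read.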
… Doris, working just like a SQL table.
Realtime Data Lake
Paimon's Primary Key Table brings realtime streaming updates into the lake architecture, powered by the LSM
(Log-structured merge-tree) structure.
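The general LSM idea can be sketched in a few lines of Python under simplifying assumptions: a dict serves as the in-memory write buffer, Python lists serve as immutable sorted runs, and each record is a single key/value pair. `TinyLSM` is hypothetical and does not reflect Paimon's file format or compaction strategy.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple

Run = List[Tuple[str, Any]]  # one immutable sorted run of (key, value) pairs


class TinyLSM:
    """Toy LSM tree: a mutable write buffer flushed into immutable sorted runs."""

    def __init__(self, buffer_limit: int = 2) -> None:
        self.buffer: Dict[str, Any] = {}  # in-memory write buffer
        self.runs: List[Run] = []         # sorted runs, oldest first
        self.buffer_limit = buffer_limit

    def put(self, key: str, value: Any) -> None:
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self) -> None:
        # Each flush writes the buffer out as one new sorted run.
        if self.buffer:
            self.runs.append(sorted(self.buffer.items()))
            self.buffer = {}

    def get(self, key: str) -> Optional[Any]:
        # Newest data wins: check the buffer, then runs from newest to oldest.
        if key in self.buffer:
            return self.buffer[key]
        for run in reversed(self.runs):
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)  # binary search within the run
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        # Merge all runs into one; later runs overwrite earlier ones per key.
        merged: Dict[str, Any] = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


lsm = TinyLSM()
lsm.put("k1", "a")
lsm.put("k2", "b")  # buffer full: flushed as run 1
lsm.put("k1", "c")
lsm.put("k3", "d")  # flushed as run 2, holding the newer value of k1
print(lsm.get("k1"), len(lsm.runs))  # c 2
lsm.compact()
print(lsm.get("k1"), len(lsm.runs))  # c 1
```

Writes stay cheap because they only touch the buffer, while reads may have to check several runs; periodic compaction merges runs to keep reads fast.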
… Aggregation to aggregate values, or First Row to keep the earliest record: update records however you like.
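As a rough sketch of what these merge strategies mean, the hypothetical `merge_rows` function below combines rows that share a primary key, given in arrival order. Alongside aggregation and first-row it also covers deduplicate (keep the latest record) and partial-update (non-null fields overwrite), which Paimon also offers as merge engines. The single `agg` function is a simplification, since real aggregation is configured per field; this is an illustration, not Paimon's API.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
ENGINES = {"deduplicate", "first-row", "partial-update", "aggregation"}


def merge_rows(
    rows: List[Row],
    engine: str,
    key: str = "id",
    agg: Callable[[Any, Any], Any] = lambda a, b: a + b,
) -> Row:
    """Merge rows that share one primary key, given in arrival order."""
    if engine not in ENGINES:
        raise ValueError(f"unknown merge engine: {engine}")
    if engine == "deduplicate":
        return dict(rows[-1])  # keep the latest record as-is
    if engine == "first-row":
        return dict(rows[0])   # keep the earliest record, ignore later ones
    merged = dict(rows[0])
    for row in rows[1:]:
        for field, value in row.items():
            if field == key or value is None:
                continue  # the key never changes; nulls neither overwrite nor aggregate
            if engine == "partial-update":
                merged[field] = value  # latest non-null value wins per field
            else:  # aggregation
                prev = merged.get(field)
                merged[field] = value if prev is None else agg(prev, value)
    return merged


rows = [{"id": 1, "clicks": 2, "city": "Paris"}, {"id": 1, "clicks": 3, "city": None}]
for engine in ["deduplicate", "first-row", "partial-update", "aggregation"]:
    print(engine, merge_rows(rows, engine))
```

With these two input rows, deduplicate keeps `clicks=3, city=None`, first-row keeps `clicks=2, city="Paris"`, partial-update yields `clicks=3, city="Paris"`, and aggregation sums clicks to 5 while keeping `city="Paris"`.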
… for flexible read/write trade-offs.
… engines, simplifying your streaming analytics.
Multimodal Data Lake
Paimon is a multimodal lakehouse for AI. Keep multimodal data, metadata, and embeddings in the same table and query
them via vector search, full-text search, or SQL.
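To make "query embeddings next to metadata" concrete, here is an exact brute-force nearest-neighbor search in plain Python over rows that carry both a metadata field and an embedding. The row layout, the `knn` function, and its `where` filter are hypothetical. Production vector search typically relies on an approximate index rather than scanning every row.

```python
import math
from typing import Any, Callable, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def knn(
    query: Sequence[float],
    rows: List[Row],
    k: int,
    where: Optional[Callable[[Row], bool]] = None,
) -> List[str]:
    """Exact k-nearest-neighbor search by cosine similarity, with an optional metadata filter."""
    candidates = [r for r in rows if where is None or where(r)]
    candidates.sort(key=lambda r: cosine_similarity(query, r["embedding"]), reverse=True)
    return [r["id"] for r in candidates[:k]]


# Hypothetical rows: metadata ("id", "label") stored next to an embedding.
rows = [
    {"id": "img-1", "label": "cat", "embedding": [1.0, 0.0]},
    {"id": "img-2", "label": "dog", "embedding": [0.0, 1.0]},
    {"id": "img-3", "label": "dog", "embedding": [0.9, 0.1]},
]
print(knn([1.0, 0.0], rows, k=2))                                        # ['img-1', 'img-3']
print(knn([1.0, 0.0], rows, k=1, where=lambda r: r["label"] == "dog"))  # ['img-3']
```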
… new features (columns) as your application evolves, without copying existing data.
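One way to add a feature column without rewriting existing files is to store the new column in its own file covering the same rows, then stitch the columns together by row position at read time. The sketch below assumes exactly that layout. The file dictionaries and the `start_row` field are hypothetical, and this is a conceptual illustration rather than Paimon's on-disk format.

```python
from typing import Any, Dict, List

# Hypothetical layout: each file stores some columns for a contiguous row range.
base_file = {"start_row": 0, "columns": {"id": [1, 2, 3], "text": ["a", "b", "c"]}}
# A new feature column is written later as a separate file over the same rows;
# base_file itself is never rewritten.
feature_file = {"start_row": 0, "columns": {"score": [0.1, 0.5, 0.9]}}


def stitch(files: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Combine column files that cover the same row positions into full rows."""
    rows: Dict[int, Dict[str, Any]] = {}
    for f in files:
        for name, values in f["columns"].items():
            for offset, value in enumerate(values):
                rows.setdefault(f["start_row"] + offset, {})[name] = value
    return [rows[row_id] for row_id in sorted(rows)]


for row in stitch([base_file, feature_file]):
    print(row)
```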
… layout: blob data is stored in dedicated .blob files while metadata stays in standard columnar files.
… nearest neighbor search.
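The separation of blob bytes from metadata can be sketched as follows. The sketch assumes each metadata row stores only an offset and length pointing into a dedicated blob file. `BlobWriter` and the `blob_offset`/`blob_length` field names are hypothetical, and in-memory buffers stand in for real files.

```python
import io
from typing import Any, Dict, List


class BlobWriter:
    """Toy writer that keeps blob bytes apart from small metadata rows."""

    def __init__(self) -> None:
        self.blob_file = io.BytesIO()             # stands in for a dedicated blob file
        self.metadata: List[Dict[str, Any]] = []  # stands in for a columnar metadata file

    def write(self, row_id: int, name: str, payload: bytes) -> None:
        self.blob_file.seek(0, io.SEEK_END)  # always append at the end
        offset = self.blob_file.tell()
        self.blob_file.write(payload)
        # The metadata row holds only a reference (offset, length), not the bytes.
        self.metadata.append(
            {"id": row_id, "name": name, "blob_offset": offset, "blob_length": len(payload)}
        )

    def read_blob(self, meta: Dict[str, Any]) -> bytes:
        # Fetch the bytes lazily, only when a caller actually needs them.
        self.blob_file.seek(meta["blob_offset"])
        return self.blob_file.read(meta["blob_length"])


writer = BlobWriter()
writer.write(1, "cat.jpg", b"\xff\xd8cat")
writer.write(2, "dog.jpg", b"dogdog")
print([m["name"] for m in writer.metadata])  # reads metadata only, no blob bytes
print(writer.read_blob(writer.metadata[1]))  # b'dogdog'
```

Scans that only need metadata never touch the blob bytes, which keeps columnar reads small even when individual blobs are large.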
… including Ray, PyTorch, Pandas, and PyArrow for data loading, training, and inference workflows.