Conversation

@luoyuxia (Contributor) commented Mar 20, 2025

Purpose

Linked issue: close #430

Brief change log

  1. Introduce LakeStoragePluginSetUp, which loads the LakeStoragePlugin by data lake format
  2. Introduce LakeCatalog to create tables in the lake
  3. When a table is created with lake enabled, create the table in the lake via LakeCatalog (see the sketch after this list)
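
A minimal sketch of the flow, assuming hypothetical names (fromDataLakeFormat, isDataLakeEnabled, and the exact factory/createTable signatures are illustrative, not the real Fluss API):

// Resolve the LakeStoragePlugin registered for the configured lake format.
LakeStoragePlugin plugin =
        LakeStoragePluginSetUp.fromDataLakeFormat(DataLakeFormat.PAIMON, classLoader);

// Build a LakeCatalog from the plugin and the cluster configuration
// (assumed factory chain, for illustration only).
LakeCatalog lakeCatalog = plugin.createLakeStorage(configuration).createLakeCatalog();

// When a table is created with lake enabled, mirror it into the lake as well.
if (tableDescriptor.isDataLakeEnabled()) {
    lakeCatalog.createTable(tablePath, tableDescriptor);
}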

Tests

LakeEnabledTableCreateITCase

API and Format

Documentation

@luoyuxia luoyuxia force-pushed the create-lake-table branch 4 times, most recently from ad3e4a7 to b768922 on March 21, 2025 03:12
<useTransitiveDependencies>true</useTransitiveDependencies>
<useTransitiveFiltering>true</useTransitiveFiltering>
<includes>
<include>org.apache.flink:flink-shaded-hadoop-2-uber</include>
@luoyuxia (Contributor, Author):

Paimon requires Hadoop bundled, so we include it in the Paimon plugin dir:
https://paimon.apache.org/docs/master/flink/quick-start/

@luoyuxia luoyuxia force-pushed the create-lake-table branch from b768922 to 74248a9 on March 21, 2025 08:09
@wuchong (Member) commented Mar 22, 2025

Is it ready to review? @luoyuxia

}

// set pk
if (tableDescriptor.getSchema().getPrimaryKey().isPresent()) {
@luoyuxia (Contributor, Author):

At first, I introduced additional offset and timestamp columns to enable Fluss to subscribe to the data in the lake from a given offset or timestamp via the Fluss client. But now, I feel we can remove these two additional columns, at least for now:

  1. Currently, we mainly focus on Flink reading the historical data in Paimon and the real-time data in Fluss, so the offset and timestamp columns are not used. Introducing these columns may bring unnecessary complexity at this early stage.

  2. Subscribing via the offset and timestamp columns only works for log tables, and only for Paimon with bucket-num specified. But in Paimon it's recommended not to set bucket-num, so the offset and timestamp columns would be useless in most cases.

Still, we keep the possibility of supporting subscription to the data in the lake from a given offset or timestamp in the future. We can then introduce an option to enable this feature for lake tables.

@luoyuxia (Contributor, Author):

After discussion, we still keep the ability to subscribe via offset/timestamp, so let's introduce another column, bucket, to help us subscribe via bucket + offset.
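
A minimal sketch of how these system columns could be used, assuming a hypothetical LogScanner#subscribe(bucket, offset) signature on the Fluss client (illustrative only):

// Suppose a batch query over the Paimon table yielded, per __bucket, the
// maximum __offset already tiered into the lake.
Map<Integer, Long> lakeEndOffsets = new HashMap<>();
lakeEndOffsets.put(0, 1024L);
lakeEndOffsets.put(1, 987L);

// Resume the real-time read from Fluss right after those offsets.
for (Map.Entry<Integer, Long> entry : lakeEndOffsets.entrySet()) {
    int bucket = entry.getKey();
    long nextOffset = entry.getValue() + 1;
    logScanner.subscribe(bucket, nextOffset); // hypothetical signature
}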

@luoyuxia luoyuxia force-pushed the create-lake-table branch from 74248a9 to 569d5e9 on March 24, 2025 03:41
@luoyuxia luoyuxia marked this pull request as ready for review March 24, 2025 03:51
@luoyuxia luoyuxia force-pushed the create-lake-table branch from 569d5e9 to 8e06422 on March 24, 2025 03:52
@luoyuxia (Contributor, Author):

> Is it ready to review? @luoyuxia

Yes, it's ready to review now.

@luoyuxia luoyuxia requested review from leonardBang and wuchong March 24, 2025 03:52
@luoyuxia (Contributor, Author):

@wuchong @leonardBang Could you please help review?

@luoyuxia luoyuxia force-pushed the create-lake-table branch from 8e06422 to 85e28a2 on March 25, 2025 04:17
@leonardBang (Contributor) left a comment:

Thanks @luoyuxia for the contribution. The PR looks generally good to me; I only left some minor comments.

public static final String OFFSET_COLUMN_NAME = "__offset";
public static final String TIMESTAMP_COLUMN_NAME = "__timestamp";
public static final String BUCKET_COLUMN_NAME = "__bucket";

@leonardBang (Contributor):

For the system metadata columns, could we expose their configuration to users to avoid potential column-name conflicts?

@luoyuxia (Contributor, Author):

Currently, these system metadata columns are used by the Fluss client to subscribe from a given timestamp/offset. I'd prefer to keep them fixed for now, since most systems' metadata columns are fixed, and fixed columns are easier to understand.
I think we can make them configurable in the future if we find that helpful; it's a compatible change.

String.format(
"The table %s already exists in %s catalog, please "
+ "first drop the table in %s catalog.",
tablePath, dataLakeFormat, dataLakeFormat));
@leonardBang (Contributor):

Both dropping the existing table and suggesting a new table name make sense in this case; perhaps the latter is better?

@luoyuxia (Contributor, Author):

I'll suggest both of them in the message.

<!-- end exclude for lakehouse-paimon -->
<exclude>com.alibaba.fluss.lakehouse.cli.*</exclude>
<exclude>com.alibaba.fluss.kafka.*</exclude>
<exclude>com.alibaba.fluss.lake.paimon.FlussDataTypeToPaimonDataType</exclude>
@leonardBang (Contributor):

Would adding a full-types test in LakeEnabledTableCreateITCase be better?

@luoyuxia (Contributor, Author):

I added the full set of Fluss types, but we still have to keep this exclusion since Fluss doesn't support array, map, and row types, so the maximum line coverage can only reach 65%.
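
For illustration, a minimal sketch of the kind of mapping FlussDataTypeToPaimonDataType performs. The Fluss-side type API shown (getTypeRoot()/isNullable()) is an assumption mirroring Flink's type system, not the exact Fluss interface:

static org.apache.paimon.types.DataType toPaimonType(com.alibaba.fluss.types.DataType flussType) {
    switch (flussType.getTypeRoot()) {
        case BOOLEAN:
            return new org.apache.paimon.types.BooleanType(flussType.isNullable());
        case INT:
            return new org.apache.paimon.types.IntType(flussType.isNullable());
        case BIGINT:
            return new org.apache.paimon.types.BigIntType(flussType.isNullable());
        case STRING:
            return new org.apache.paimon.types.VarCharType(
                    flussType.isNullable(), org.apache.paimon.types.VarCharType.MAX_LENGTH);
        default:
            // Fluss doesn't support array/map/row types yet, so branches for
            // them can't be exercised from LakeEnabledTableCreateITCase.
            throw new UnsupportedOperationException("Unsupported type: " + flussType);
    }
}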

@leonardBang (Contributor):

makes sense to me

@luoyuxia luoyuxia force-pushed the create-lake-table branch 4 times, most recently from f5ef0f4 to eea9c6a on April 17, 2025 11:53
        CoreOptions.ChangelogProducer.INPUT.toString());
} else {
    // for log table, need to set bucket, offset and timestamp
    schemaBuilder.column(BUCKET_COLUMN_NAME, DataTypes.INT());
@leonardBang (Contributor):

I mean we need to check whether the original schema contains a system column name like __bucket, to avoid conflicts with users' original columns.

@wuchong (Member) left a comment:

@luoyuxia the pull request looks good in general. I left some minor comments.

for (Schema.Column column : tableDescriptor.getSchema().getColumns()) {
    String columnName = column.getName();
    if (systemColumns.containsKey(columnName)) {
        throw new InvalidTableException(
@wuchong (Member):

I created issue #810 to avoid creating tables with system columns even when the table is not lake-enabled.

@wuchong (Member) commented Apr 30, 2025

Besides, could you rename the module fluss-lake-format-paimon to fluss-lake-paimon? We will introduce the tiering service in fluss-flink instead of fluss-lake/, so fluss-lake/ may only contain lake formats (e.g., fluss-lake-iceberg, fluss-lake-delta).

@luoyuxia luoyuxia force-pushed the create-lake-table branch 2 times, most recently from 8c14974 to baf72d7 on May 7, 2025 03:19
@luoyuxia (Contributor, Author) commented May 7, 2025

@wuchong Comments addressed

@luoyuxia luoyuxia force-pushed the create-lake-table branch from baf72d7 to d4887cb on May 9, 2025 08:05
@wuchong wuchong merged commit bd9e1c4 into apache:main May 10, 2025
4 checks passed
ZmmBigdata pushed a commit to ZmmBigdata/fluss that referenced this pull request Jun 20, 2025
polyzos pushed a commit to polyzos/fluss that referenced this pull request Aug 30, 2025
polyzos pushed a commit to Alibaba-HZY/fluss that referenced this pull request Aug 31, 2025

Development

Successfully merging this pull request may close these issues.

Synchronously create Lake tables when Fluss's lakehouse is enabled during Fluss table creation
