Skip to content

Conversation

@zcoo
Copy link
Contributor

@zcoo zcoo commented Apr 7, 2025

Purpose

Linked issue: close #712

Brief change log

Drop table's all partitions when coordinator listens a partitioned table dropped.

Tests

API and Format

Documentation

@wuchong
Copy link
Member

wuchong commented Apr 12, 2025

I think there are issues with the current fix.

When deleting tables/t1, its child partitions, such as tables/t1/partitions/p1, should also be deleted automatically. There should be no need to explicitly and additionally delete the partition directories.

In my opinion, the root cause here is that after ZooKeeper deletes tables/t1, the AutoPartitionManager is triggered and recreates the partitions (with the old table ID). This results in an exception when attempting to recreate the table.

The solution is straightforward: before dropping the table, the AutoPartitionManager should unregister the table id first.

With the current fix, which deletes partitions in the processDropTableEvent, may lead to the deletion of partitions belonging to the new table, potentially causing data loss.

cc @luoyuxia

@zcoo
Copy link
Contributor Author

zcoo commented Apr 14, 2025

@wuchong I think your solution may be inconsistent with current code implement.

Here is the original code logic:

Step 1. Create Table 1 (only created on zk node without adding to AutoPartitionManager)
Step 2. Delete Table 1 (only dropped on zk node)
Step 3. zk table creation is being monitored and the table creation request is added to the CoordinatorEventProcessor request queue
Step 4. The zk deletion table is being monitored and the deletion request is added to the CoordinatorEventProcessor request queue
Step 5. CoordinatorEventProcessor queue executes table creation request, AutoPartitionManager adds the table zk, creates table partition nodes
Step 6. CoordinatEventProcessor queue executes droping a table, but only unregister for table 1 from AutoPartitionManager without deleting the table partition node, resulting in the retention of the zk table partition node

"The solution is straightforward: before dropping the table, the AutoPartitionManager should unregister the table id first."
So this solution doesn't work because before Step 2, the Table 1 haven't been register from AutoPartitionManager.

@wuchong
Copy link
Member

wuchong commented Apr 14, 2025

Step 6. CoordinatEventProcessor queue executes droping a table, but only unregister for table 1 from AutoPartitionManager without deleting the table partition node, resulting in the retention of the zk table partition node

The partition node should has already been deleted in Step2, because we use deletingChildrenIfNeeded() for deleting the table node, see the code:
https://github.com/alibaba/fluss/blob/main/fluss-server/src/main/java/com/alibaba/fluss/server/zk/ZooKeeperClient.java#L362

So there is no need for AutoPartitionManager to delete any partition nodes. If there is any dirty partition node exist, that's caused by unregitstering AutoPartitionManager for the table too late.

So this solution doesn't work because before Step 2, the Table 1 haven't been register from AutoPartitionManager.

Yes, I agree. So we need to register it into AutoPartitionManager in Step1 and unregister it in Step2. Besides, we need to remove the registering and unregsitering in CoordinatorEventProcessor.

cc @luoyuxia

@luoyuxia
Copy link
Contributor

+1 for register into AutoPartitionManager in CoordinatorService#createTable. No need to wait to get CreateTableEvent to put to AutoPartitionManager.

@wuchong
Copy link
Member

wuchong commented Apr 19, 2025

@zcoo @luoyuxia I reviewed the new fix and the failed cases. It seems the fix is not such easy as expected. We also need to handle the databse drop: read all the table_id of the tables in the database, which is very heavy operation. Besides, the partition node may still created under a re-created table node because a table-drop-recreate may happen in the middle of auto partition scheduling. It seems we have to tolerate the inconsistent case.

Thus, I prefer a simple fix for this problem (see the PR #766), and fix the rare inconsistent case in a long-term issue #767.

@wuchong
Copy link
Member

wuchong commented Apr 25, 2025

fixed by #766

@wuchong wuchong closed this Apr 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The table id in table zknode is not same with the partition id in the children node of the table in zk

3 participants