branch-4.1: [Feature](Iceberg) Implement publish_changes procedure for Iceberg tables #58755 #61358

Open
xylaaaaa wants to merge 1 commit into apache:branch-4.1 from xylaaaaa:auto-pick-58755-branch-4.1

@xylaaaaa
Contributor

Cherry-pick #58755 to branch-4.1

What problem does this PR solve?

Implements the publish_changes action for Iceberg tables (WAP pattern).

Cherry-pick commit

🤖 Generated with Claude Code

@xylaaaaa xylaaaaa requested a review from yiguolei as a code owner March 16, 2026 03:10
Copilot AI review requested due to automatic review settings March 16, 2026 03:10
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copilot AI left a comment

Pull request overview

Implements the Iceberg publish_changes EXECUTE action in FE to support the Write-Audit-Publish (WAP) workflow, and adds regression coverage plus Docker preinstalled data to exercise the behavior in external Iceberg tests.

Changes:

  • Add FE action publish_changes that locates a snapshot by wap.id and cherry-picks it into the current table state.
  • Register publish_changes in the Iceberg execute-action factory.
  • Add external regression tests (and expected output) plus a Spark preinstalled SQL script to set up WAP/non-WAP Iceberg tables.
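The lookup-then-publish behavior described in the first bullet can be sketched with a small in-memory model. This is an illustrative sketch only, not the code in `IcebergPublishChangesAction.java`: the `Snapshot`, `Table`, and `Result` classes and the `stage` helper are hypothetical stand-ins, and the cherry-pick is simplified to moving the current-snapshot pointer. In the Apache Iceberg Java API the corresponding operations are `Table.snapshots()`, `Snapshot.summary()`, and `ManageSnapshots.cherrypick(long)`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Toy model of publish_changes; all types here are hypothetical, not Doris or Iceberg classes. */
final class PublishChangesSketch {

    /** Stand-in for a snapshot: an id plus its summary properties. */
    static final class Snapshot {
        final long id;
        final Map<String, String> summary;

        Snapshot(long id, Map<String, String> summary) {
            this.id = id;
            this.summary = summary;
        }
    }

    /** Mirrors the two STRING output columns named in the PR description. */
    static final class Result {
        final String previousSnapshotId;
        final String currentSnapshotId;

        Result(String previousSnapshotId, String currentSnapshotId) {
            this.previousSnapshotId = previousSnapshotId;
            this.currentSnapshotId = currentSnapshotId;
        }
    }

    /** Stand-in for a table: every known snapshot plus a pointer to the current one. */
    static final class Table {
        final List<Snapshot> snapshots = new ArrayList<>();
        Long currentSnapshotId; // null models an empty main branch
        private long nextId = 1;

        /** Records a staged snapshot tagged with wap.id without making it current. */
        long stage(String wapId) {
            long id = nextId++;
            snapshots.add(new Snapshot(id, Collections.singletonMap("wap.id", wapId)));
            return id;
        }
    }

    /** Finds the snapshot whose summary carries the given wap.id and makes it current. */
    static Result publishChanges(Table table, String wapId) {
        for (Snapshot s : table.snapshots) {
            if (wapId.equals(s.summary.get("wap.id"))) {
                String previous = table.currentSnapshotId == null
                        ? null
                        : Long.toString(table.currentSnapshotId);
                // Assumption: a plain pointer move stands in for the real cherry-pick.
                table.currentSnapshotId = s.id;
                return new Result(previous, Long.toString(s.id));
            }
        }
        throw new IllegalArgumentException("Cannot find snapshot with wap.id = " + wapId);
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.stage("test_wap_001");
        Result r = publishChanges(table, "test_wap_001");
        System.out.println("previous=" + r.previousSnapshotId + " current=" + r.currentSnapshotId);
    }
}
```

The error text for an unknown ID is copied from the regression test expectations in this PR; everything else about the control flow is an assumption made for illustration.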

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| regression-test/suites/external_table_p0/iceberg/action/test_iceberg_execute_actions.groovy | Adds WAP publish_changes happy-path + negative regression cases |
| regression-test/data/external_table_p0/iceberg/action/test_iceberg_execute_actions.out | Updates golden output for new WAP queries |
| fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/action/IcebergPublishChangesAction.java | New FE action implementation for publish_changes |
| fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/action/IcebergExecuteActionFactory.java | Registers publish_changes as a supported Iceberg action |
| docker/thirdparties/docker-compose/iceberg/scripts/create_preinstalled_scripts/iceberg/run23.sql | Preinstalls WAP/non-WAP Iceberg tables & staged snapshot for regression testing |

import org.apache.doris.common.UserException;
import org.apache.doris.datasource.ExternalTable;
import org.apache.doris.datasource.iceberg.IcebergExternalTable;
import org.apache.doris.info.PartitionNamesInfo;
Comment on lines +551 to +640
// Test Case 6: publish_changes action with WAP (Write-Audit-Publish) pattern
// Simplified workflow:
//
// - Main branch is initially empty (0 rows)
// - A WAP snapshot exists with wap.id = "test_wap_001" and 2 rows
// - publish_changes should cherry-pick the WAP snapshot into the main branch
// =====================================================================================

logger.info("Starting simplified WAP (Write-Audit-Publish) workflow verification test")

// WAP test database and table
String wap_db = "wap_test"
String wap_table = "orders_wap"

// Step 1: Verify no data is visible before publish_changes
logger.info("Step 1: Verifying table is empty before publish_changes")
qt_wap_before_publish """
SELECT order_id, customer_id, amount, order_date
FROM ${catalog_name}.${wap_db}.${wap_table}
ORDER BY order_id
"""

// Step 2: Publish the WAP changes with wap_id = "test_wap_001"
logger.info("Step 2: Publishing WAP changes with wap_id=test_wap_001")
sql """
ALTER TABLE ${catalog_name}.${wap_db}.${wap_table}
EXECUTE publish_changes("wap_id" = "test_wap_001")
"""
logger.info("Publish changes executed successfully")

// Step 3: Verify WAP data is visible after publish_changes
logger.info("Step 3: Verifying WAP data is visible after publish_changes")
qt_wap_after_publish """
SELECT order_id, customer_id, amount, order_date
FROM ${catalog_name}.${wap_db}.${wap_table}
ORDER BY order_id
"""

logger.info("Simplified WAP (Write-Audit-Publish) workflow verification completed successfully")

// Negative tests for publish_changes

// publish_changes on table without write.wap.enabled = true (should fail)
test {
    String nonWapDb = "wap_test"
    String nonWapTable = "orders_non_wap"

    sql """
        ALTER TABLE ${catalog_name}.${nonWapDb}.${nonWapTable}
        EXECUTE publish_changes("wap_id" = "test_wap_001")
    """
    exception "Cannot find snapshot with wap.id = test_wap_001"
}

// publish_changes with missing wap_id (should fail)
test {
    sql """
        ALTER TABLE ${catalog_name}.${db_name}.${table_name}
        EXECUTE publish_changes ()
    """
    exception "Missing required argument: wap_id"
}

// publish_changes with invalid wap_id (should fail)
test {
    sql """
        ALTER TABLE ${catalog_name}.${wap_db}.${wap_table}
        EXECUTE publish_changes("wap_id" = "non_existing_wap_id")
    """
    exception "Cannot find snapshot with wap.id = non_existing_wap_id"
}

// publish_changes with partition specification (should fail)
test {
    sql """
        ALTER TABLE ${catalog_name}.${db_name}.${table_name}
        EXECUTE publish_changes ("wap_id" = "test_wap_001") PARTITIONS (part1)
    """
    exception "Action 'publish_changes' does not support partition specification"
}

// publish_changes with WHERE condition (should fail)
test {
    sql """
        ALTER TABLE ${catalog_name}.${db_name}.${table_name}
        EXECUTE publish_changes ("wap_id" = "test_wap_001") WHERE id > 0
    """
    exception "Action 'publish_changes' does not support WHERE condition"
}
INSERT INTO orders_wap VALUES
(1, 103, 150.00, '2025-12-03'),
(2, 104, 320.25, '2025-12-04');
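The negative cases in the regression suite imply a set of precondition checks before any snapshot lookup happens. A minimal sketch of such checks follows, assuming a hypothetical `validate` helper that receives the action arguments, the partition list, and the WHERE clause text; only the three error messages are taken from the tests, and the order of the checks is a guess, since each test case violates exactly one rule.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical argument validation for publish_changes; not the Doris implementation. */
final class PublishChangesValidationSketch {

    /**
     * Returns the wap_id to publish, or throws with the message the regression
     * tests expect. A null or empty partition list and a null WHERE clause mean
     * "not specified" in this model.
     */
    static String validate(Map<String, String> args, List<String> partitions, String whereClause) {
        if (partitions != null && !partitions.isEmpty()) {
            throw new IllegalArgumentException(
                    "Action 'publish_changes' does not support partition specification");
        }
        if (whereClause != null) {
            throw new IllegalArgumentException(
                    "Action 'publish_changes' does not support WHERE condition");
        }
        String wapId = args.get("wap_id");
        if (wapId == null || wapId.isEmpty()) {
            throw new IllegalArgumentException("Missing required argument: wap_id");
        }
        return wapId;
    }

    public static void main(String[] args) {
        String wapId = validate(
                java.util.Collections.singletonMap("wap_id", "test_wap_001"), null, null);
        System.out.println("validated wap_id=" + wapId);
    }
}
```

Note that a table without `write.wap.enabled = true` is not rejected at this stage in the tests above: that case fails later with the same "Cannot find snapshot" message as an unknown `wap_id`.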

@xylaaaaa
Contributor Author

run buildall

…bles (apache#58755)

- **Issue Number**: part of apache#58199
- **Related PR**: N/A

Problem Summary:
This PR implements the `publish_changes` action for Iceberg tables. This
action serves as the "Publish" step in the Write-Audit-Publish (WAP)
pattern. The procedure locates a snapshot tagged with a specific
`wap.id` property and cherry-picks it into the current table state. This
allows users to atomically make "staged" data visible after validation.

Syntax:
```sql
ALTER TABLE catalog.db.table_name EXECUTE publish_changes("wap_id" = "batch_123");
```

Output:
Returns `previous_snapshot_id` (STRING) and `current_snapshot_id`
(STRING) indicating the state transition.

Use cases:

1.  Implement Write-Audit-Publish (WAP) workflows.
2.  Atomically publish validated data to the main branch.
3.  Manage staged snapshots based on custom WAP IDs.

Co-authored-by: Chenjunwei <chenjunwei@ChenjunweideMacBook-Pro.local>
@xylaaaaa xylaaaaa force-pushed the auto-pick-58755-branch-4.1 branch from cea928c to af87e5f Compare March 16, 2026 05:10
@xylaaaaa
Contributor Author

run buildall
