Add commit info to partition exports table#1832
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0a9022d641
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| LOG_INFO(log, | ||
| "Export transaction {} already committed, skipping re-commit", | ||
| transaction_id); | ||
| return; | ||
| /// Empty commit info: paths produced by the original commit are not recoverable | ||
| /// here without re-reading the snapshot. ExportPartitionUtils::commit only | ||
| /// writes the commit_info znode when this struct is non-empty, so the original | ||
| /// committer's record (if any) is preserved. | ||
| return {}; |
There was a problem hiding this comment.
Return commit info for already-committed recovery path
When isExportPartitionTransactionAlreadyCommitted is true here, the function returns an empty ExportPartitionCommitInfo, which makes ExportPartitionUtils::commit skip writing <export-entry>/commit_info and only flip status to COMPLETED. In the common recovery scenario (snapshot committed previously, then retried before ZooKeeper was updated), this leaves committed_* fields permanently empty and contradicts the new observability behavior/documentation that expects a visible marker for already-committed retries.
Useful? React with 👍 / 👎.
| assert committed_metadata_file == "<committed in a previous run, paths unavailable>", ( | ||
| f"Expected sentinel in committed_metadata_file for already-committed retry, got: {committed_metadata_file!r}" |
There was a problem hiding this comment.
Fix sentinel expectation in post-publish failpoint test
This assertion is inconsistent with the implementation under iceberg_writes_post_publish_throw: that failpoint is ONCE, and the published catch path now returns real file paths (storage_metadata_name, manifest list, manifest file), so committed_metadata_file should be a metadata path, not the "previous run" sentinel. As written, the test will fail despite correct behavior and can block CI.
Useful? React with 👍 / 👎.
|
@codex review |
|
Codex Review: Didn't find any major issues. Already looking forward to the next diff. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
AI audit note: This review comment was generated by AI (gpt-5.3-codex). Audit update for PR #1832 (Add commit info to partition exports table): Confirmed defectsMedium: commit_info serialization failure can block COMPLETED transitionImpact: A successful destination commit (Iceberg snapshot or object-storage marker) can remain stuck in Anchor: Trigger: Any exception thrown by Why defect: The new observability side-effect (JSON stringify) is now on the critical path of the state transition; a stringify exception aborts the function before the status update, changing behavior from “commit then mark completed” to “commit then throw”. Fix direction (short): Make commit_info persistence best-effort: wrap Regression test direction (short): Add a failpoint (or targeted fault injection) that throws during commit_info serialization and assert the task still reaches EvidenceSerialization enables stream exceptions: std::string toJsonString() const
{
Poco::JSON::Object json;
json.set("iceberg_metadata_file", iceberg_metadata_file);
// ...
std::ostringstream oss;
oss.exceptions(std::ios::failbit);
Poco::JSON::Stringifier::stringify(json, oss);
return oss.str();
}
if (!destination_commit_info.empty())
{
ExportReplicatedMergeTreePartitionCommitInfoEntry commit_info_entry { /* ... */ };
const std::string commit_info_path = fs::path(entry_path) / "commit_info";
Coordination::Requests ops;
ops.emplace_back(zkutil::makeCreateRequest(commit_info_path, commit_info_entry.toJsonString(), zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeSetRequest(status_path, completed_name, -1));
// ...
if (rc == Coordination::Error::ZOK)
{
LOG_INFO(log, "ExportPartition: Marked export as completed and persisted commit_info");
return;
}
// fall through to status-only set on ZNODEEXISTS / other errors
}Coverage summary
|
| ### Commit info columns | ||
|
|
||
| These columns surface paths produced by the destination storage during commit, so it is possible to inspect what was written without consulting the destination directly: | ||
|
|
There was a problem hiding this comment.
Description for destination_file_paths column is missing.
| /// files and reaching this point, the task still completes via the recovery | ||
| /// path but commit_info will be absent. Recovering commit_info from the | ||
| /// live Iceberg snapshot in that case is a possible future enhancement. | ||
| const std::string status_path = fs::path(entry_path) / "status"; |
There was a problem hiding this comment.
AI thinks that exception in the block below can breaks commit transaction.
There was a problem hiding this comment.
I don't think that is totally true. here's what can happen:
once we commit to iceberg, we need to mark it as completed in zookeeper. If the code that creates the commit info throws, we don't mark it as completed in zookeeper and it remains in pending state. In the next scheduler tick, we'll try to commit it again, and we'll notice it has already been committed. In that case, we just mark it as completed. It won't remain in pending forever as far as I can tell
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add Iceberg metadata file paths and data file paths to the
system.replicated_partition_exportsto improve observability and debuggingDocumentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: