[python] Add CLI command for table repair (Part 3/3) #7945
Implement read-only metadata consistency verification for Paimon tables. This verifies the chain LATEST -> snapshot -> manifest list -> manifest files -> data files, and reports any broken links or corrupted files.

Key components:
- RepairIssue/RepairReport: data classes for structured issue reporting
- TableRepair.verify(): walks the metadata chain and detects issues
- Support for branch-qualified tables and partitioned data file paths
- Respects custom partition.default-name configuration
- Progress logging every 1000 data files when check_data_files=True
- Documented time complexity: O(total_data_files)
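The chain walk described above can be illustrated with an in-memory model. This is a minimal sketch under stated assumptions, not the code in operation/repair.py: `verify_chain` and its dict/set arguments are hypothetical stand-ins for the real file reads, and it only inspects the snapshot that LATEST points to.

```python
from typing import Dict, List, Set


def verify_chain(latest: int,
                 snapshots: Dict[int, str],
                 manifest_lists: Dict[str, List[str]],
                 manifests: Dict[str, List[str]],
                 data_files: Set[str],
                 check_data_files: bool = True) -> List[str]:
    """Walk LATEST -> snapshot -> manifest list -> manifests -> data files.

    Hypothetical model: each dict maps a metadata object to the names it
    references, and a missing key stands for a missing or unreadable file.
    Nothing is modified; the function only collects issue descriptions.
    """
    manifest_list = snapshots.get(latest)
    if manifest_list is None:
        return [f"LATEST points to missing snapshot {latest}"]

    manifest_names = manifest_lists.get(manifest_list)
    if manifest_names is None:
        return [f"snapshot {latest}: missing manifest list {manifest_list}"]

    issues: List[str] = []
    for name in manifest_names:
        entries = manifests.get(name)
        if entries is None:
            issues.append(f"missing manifest file {name}")
            continue
        if check_data_files:
            # This inner loop is what makes the cost grow with the
            # total number of data files.
            for path in entries:
                if path not in data_files:
                    issues.append(f"missing data file {path}")
    return issues
```

With `check_data_files=False` the walk stops at the manifest files, which keeps a check cheap when only the metadata links matter.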
Add the ability to fix metadata inconsistencies found during verification. Currently supports fixing the LATEST hint file to point to the newest valid snapshot when it references a missing one.

Key additions:
- TableRepair.repair(dry_run=False): applies fixes after verification
- repair_table/repair_database/repair_catalog module-level entry points
- Catalog.repair_table/repair_database/repair_catalog API with type annotations
- FileSystemCatalog implementation delegating to repair module
- Fix mode selects newest snapshot with intact manifest chain
- check_data_files is respected when choosing which snapshot to fix to
- Per-table error isolation in repair_database (continues on failure)
- Idempotent fix operations (safe to re-run after interruption)
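The LATEST fix described above might be sketched as follows. The function names and the `chain_is_intact` callback are hypothetical, chosen only for illustration; the actual TableRepair.repair() may differ in how it reads and writes the hint file.

```python
from typing import Callable, Iterable, Optional


def pick_fix_target(snapshot_ids: Iterable[int],
                    chain_is_intact: Callable[[int], bool]) -> Optional[int]:
    """Return the newest snapshot id whose chain check passes, or None."""
    for snapshot_id in sorted(snapshot_ids, reverse=True):
        if chain_is_intact(snapshot_id):
            return snapshot_id
    return None


def fix_latest_hint(current_hint: Optional[int],
                    snapshot_ids: Iterable[int],
                    chain_is_intact: Callable[[int], bool],
                    dry_run: bool = True) -> Optional[int]:
    """Return the value LATEST should hold after the fix.

    Assumption: only a hint that references a missing snapshot is repaired.
    """
    ids = set(snapshot_ids)
    if current_hint in ids:
        # Hint already references an existing snapshot, so a second run
        # after a successful fix changes nothing (idempotent).
        return current_hint
    target = pick_fix_target(ids, chain_is_intact)
    if dry_run or target is None:
        # Report-only mode, or no valid snapshot to fall back to.
        return current_hint
    return target
```

Passing a stricter `chain_is_intact` (one that also checks data files) is one way the check_data_files setting could influence which snapshot is chosen.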
Add 'table repair' CLI command that verifies and optionally fixes table metadata consistency.

Usage: pypaimon table repair <database.table> [--fix]
- Default mode: dry-run (report issues only)
- With --fix: apply available repairs
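For illustration, the usage line above could be parsed with argparse roughly like this. It is a sketch only: `build_parser` and `is_dry_run` are hypothetical names, and pypaimon's real CLI module may be organized differently.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: pypaimon table repair <database.table> [--fix]."""
    parser = argparse.ArgumentParser(prog="pypaimon")
    commands = parser.add_subparsers(dest="command", required=True)

    table = commands.add_parser("table")
    table_commands = table.add_subparsers(dest="table_command", required=True)

    repair = table_commands.add_parser("repair")
    repair.add_argument("table", help="table identifier in database.table form")
    repair.add_argument("--fix", action="store_true",
                        help="apply available repairs instead of only reporting")
    return parser


def is_dry_run(argv: List[str]) -> bool:
    """Dry-run is the default; only an explicit --fix turns it off."""
    args = build_parser().parse_args(argv)
    return not args.fix
```

Making the destructive path opt-in via `--fix` means a mistyped or copy-pasted command can only report issues, never modify the table.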
JingsongLi
left a comment
This PR seems to contain the full diff from Parts 1+2+3 combined (the operation/repair.py and repair_test.py appear in full). If this is meant to be reviewed as a standalone "Part 3/3" (CLI addition only), the diff should only contain the CLI changes — the repair module changes belong in Parts 1 and 2.
If the intent is that this PR is self-contained (all 3 parts in one), then Parts 1 and 2 (#7943, #7944) should probably be closed.
Regarding the CLI-specific additions:
- cmd_table_repair looks good: simple and follows the pattern of other commands.
- Missing --check-data-files flag: same comment as on Part 2; consider exposing this option in the CLI.
- Exit code: when report.has_errors and it's a dry run, you print a hint but exit 0. Consider exit(1) for unhealthy tables even in dry-run mode, so scripting/CI can detect issues:
    if report.has_errors:
        sys.exit(1)
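The exit-code suggestion could look something like the sketch below. `Report` is a hypothetical stand-in for RepairReport (only the `has_errors` attribute is taken from the review comment), and `exit_code_for` is an illustrative helper name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Report:
    """Hypothetical stand-in for RepairReport; only has_errors matters here."""
    issues: List[str] = field(default_factory=list)

    @property
    def has_errors(self) -> bool:
        return bool(self.issues)


def exit_code_for(report: Report) -> int:
    """Map a report to a process exit code, independent of dry-run mode."""
    return 1 if report.has_errors else 0
```

A command handler would then end with `sys.exit(exit_code_for(report))`, so a CI job running the dry-run check fails whenever the table is unhealthy.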
Please clarify the PR dependency/structure between #7943, #7944, and this PR.
JunRuiLee
left a comment
Thanks for the review! This PR depends on Part 1 (#7943) and Part 2 (#7944). The current diff includes all three parts because the base hasn't been rebased yet. Marking as Draft — once #7943 and #7944 are merged in order, I'll rebase onto master so this PR only contains the CLI-specific changes. Will address your feedback points then.
Summary
- pypaimon table repair <db.table> [--fix] CLI command
- --fix applies available repairs
Context
Split from #7940 following @JingsongLi's review comment.
Depends on Part 2: #7944
Please merge in order: Part 1 → Part 2 → Part 3.
Tests added