[python] Add CLI command for table repair (Part 3/3) #7945

Draft
JunRuiLee wants to merge 3 commits into apache:master from JunRuiLee:pypaimon-repair-cli

@JunRuiLee
Contributor

Summary

  • Add pypaimon table repair <db.table> [--fix] CLI command
  • Default is dry-run (report only); --fix applies available repairs
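
The command shape described above could be parsed roughly as follows. This is a minimal sketch using the standard library's argparse, not the code in this PR: `build_parser` and `split_identifier` are hypothetical names, and the real pypaimon CLI may wire its subcommands differently.

```python
import argparse
from typing import Tuple


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser layout for: pypaimon table repair <db.table> [--fix]
    parser = argparse.ArgumentParser(prog="pypaimon")
    commands = parser.add_subparsers(dest="command", required=True)

    table = commands.add_parser("table", help="table operations")
    table_commands = table.add_subparsers(dest="table_command", required=True)

    repair = table_commands.add_parser(
        "repair", help="verify table metadata consistency")
    repair.add_argument("table", metavar="database.table")
    # Without --fix the command only reports issues (dry-run).
    repair.add_argument("--fix", action="store_true",
                        help="apply available repairs instead of only reporting")
    return parser


def split_identifier(identifier: str) -> Tuple[str, str]:
    # Split on the first dot only; everything after it is treated as the table part.
    database, sep, table = identifier.partition(".")
    if not sep or not database or not table:
        raise ValueError(f"expected <database.table>, got {identifier!r}")
    return database, table
```

With this layout, `build_parser().parse_args(["table", "repair", "db.t"])` leaves `fix` set to `False`, which matches the dry-run default stated in the summary.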

Context

Split from #7940 following @JingsongLi's review comment.

Depends on Part 2: #7944

  • Part 1: Read-only verification logic ✅
  • Part 2: Fix mode + catalog integration ✅
  • Part 3 (this PR): CLI command

Please merge in order: Part 1 → Part 2 → Part 3.

Tests added

  • No new test cases in this PR (CLI integration tested via existing catalog-level tests)

JunRuiLee added 3 commits May 24, 2026 11:16
Implement read-only metadata consistency verification for Paimon tables.
This verifies the chain: LATEST -> snapshot -> manifest list -> manifest
files -> data files, and reports any broken links or corrupted files.
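
To illustrate the chain walk described above, here is a simplified sketch. It assumes a toy on-disk layout in which every metadata file is small JSON (real Paimon metadata uses different file names and formats), and the field names on `RepairIssue` and `RepairReport` are guesses rather than the definitions from this commit.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class RepairIssue:
    severity: str  # "ERROR" or "WARN" (assumed values)
    path: str
    message: str


@dataclass
class RepairReport:
    issues: List[RepairIssue] = field(default_factory=list)

    @property
    def has_errors(self) -> bool:
        return any(issue.severity == "ERROR" for issue in self.issues)


def verify(table_dir: Path, check_data_files: bool = False) -> RepairReport:
    """Walk LATEST -> snapshot -> manifest list -> manifests -> data files."""
    report = RepairReport()

    def error(path: Path, message: str) -> None:
        report.issues.append(RepairIssue("ERROR", str(path), message))

    latest = table_dir / "snapshot" / "LATEST"
    if not latest.exists():
        error(latest, "LATEST hint is missing")
        return report

    snapshot = table_dir / "snapshot" / f"snapshot-{latest.read_text().strip()}"
    if not snapshot.exists():
        error(snapshot, "LATEST points at a missing snapshot")
        return report

    # Toy format: the snapshot is a JSON object naming its manifest list.
    manifest_list = table_dir / "manifest" / json.loads(snapshot.read_text())["manifest_list"]
    if not manifest_list.exists():
        error(manifest_list, "manifest list is missing")
        return report

    # Toy format: the manifest list is a JSON array of manifest file names.
    for manifest_name in json.loads(manifest_list.read_text()):
        manifest = table_dir / "manifest" / manifest_name
        if not manifest.exists():
            error(manifest, "manifest file is missing")
            continue
        if check_data_files:
            # Toy format: a manifest is a JSON array of relative data file paths.
            for data_file in json.loads(manifest.read_text()):
                if not (table_dir / data_file).exists():
                    error(table_dir / data_file, "data file is missing")
    return report
```

The sketch stops at the first broken link near the top of the chain, since nothing below it can be resolved, but keeps going across sibling manifests so that one missing manifest does not hide others. It does not handle unparseable files, which a full implementation would also report as corrupted.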

Key components:
- RepairIssue/RepairReport: data classes for structured issue reporting
- TableRepair.verify(): walks the metadata chain and detects issues
- Support for branch-qualified tables and partitioned data file paths
- Respects custom partition.default-name configuration
- Progress logging every 1000 data files when check_data_files=True
- Documented time complexity: O(total_data_files)

Add the ability to fix metadata inconsistencies found during verification.
Currently supports fixing the LATEST hint file to point to the newest
valid snapshot when it references a missing one.
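
The LATEST fix described above could be sketched as below. This is an illustration under assumptions, not the code from this commit: `newest_valid_snapshot` and `fix_latest` are hypothetical helpers, the `is_intact` callback stands in for the real manifest-chain check, and the write-then-rename step assumes a local filesystem where rename is atomic.

```python
from pathlib import Path
from typing import Callable, Optional


def newest_valid_snapshot(table_dir: Path,
                          is_intact: Callable[[Path], bool]) -> Optional[int]:
    """Return the highest snapshot id whose chain passes is_intact, if any."""
    snapshot_dir = table_dir / "snapshot"
    ids = []
    for path in snapshot_dir.glob("snapshot-*"):
        suffix = path.name[len("snapshot-"):]
        if suffix.isdigit():
            ids.append(int(suffix))
    # Try candidates from newest to oldest and stop at the first intact one.
    for snapshot_id in sorted(ids, reverse=True):
        if is_intact(snapshot_dir / f"snapshot-{snapshot_id}"):
            return snapshot_id
    return None


def fix_latest(table_dir: Path,
               is_intact: Callable[[Path], bool],
               dry_run: bool = True) -> Optional[int]:
    """Point LATEST at the newest valid snapshot; only report it when dry_run."""
    target = newest_valid_snapshot(table_dir, is_intact)
    if target is None or dry_run:
        return target
    latest = table_dir / "snapshot" / "LATEST"
    tmp = latest.with_name("LATEST.tmp")
    tmp.write_text(str(target))
    # Rename over the old hint so readers never observe a half-written file.
    tmp.replace(latest)
    return target
```

Because the function always recomputes the target and overwrites LATEST with the same value, running it a second time after an interruption produces the same result.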

Key additions:
- TableRepair.repair(dry_run=False): applies fixes after verification
- repair_table/repair_database/repair_catalog module-level entry points
- Catalog.repair_table/repair_database/repair_catalog API with type annotations
- FileSystemCatalog implementation delegating to repair module
- Fix mode selects newest snapshot with intact manifest chain
- check_data_files is respected when choosing which snapshot to fix to
- Per-table error isolation in repair_database (continues on failure)
- Idempotent fix operations (safe to re-run after interruption)

Add 'table repair' CLI command that verifies and optionally fixes
table metadata consistency.

Usage: pypaimon table repair <database.table> [--fix]
- Default mode: dry-run (report issues only)
- With --fix: apply available repairs
Contributor

@JingsongLi JingsongLi left a comment

This PR seems to contain the full diff from Parts 1+2+3 combined (the operation/repair.py and repair_test.py appear in full). If this is meant to be reviewed as a standalone "Part 3/3" (CLI addition only), the diff should only contain the CLI changes — the repair module changes belong in Parts 1 and 2.

If the intent is that this PR is self-contained (all 3 parts in one), then Parts 1 and 2 (#7943, #7944) should probably be closed.

Regarding the CLI-specific additions:

  1. cmd_table_repair looks good — simple and follows the pattern of other commands.

  2. Missing --check-data-files flag: Same comment as on Part 2 — consider exposing this option in the CLI.

  3. Exit code: When report.has_errors is true in a dry run, the command prints a hint but still exits with 0. Consider exit(1) for unhealthy tables even in dry-run mode, so scripting/CI can detect issues:

    if report.has_errors:
        sys.exit(1)
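
A slightly fuller sketch of that exit-code suggestion, keeping the existing dry-run hint, might look like this. `Report` is a hypothetical stand-in for the PR's RepairReport and `finish` is an illustrative name, so treat the details as assumptions.

```python
import sys
from dataclasses import dataclass


@dataclass
class Report:
    # Hypothetical stand-in: only the attribute used by the CLI is modeled.
    has_errors: bool


def finish(report: Report, dry_run: bool) -> int:
    """Return the process exit code: 0 for a healthy table, 1 otherwise."""
    if not report.has_errors:
        return 0
    if dry_run:
        # Keep the hint, but still signal failure to scripts and CI.
        print("Issues found; re-run with --fix to apply available repairs.",
              file=sys.stderr)
    return 1
```

The command handler would then end with something like `sys.exit(finish(report, dry_run=not args.fix))`, so a dry run on an unhealthy table fails a CI step without modifying anything.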

Please clarify the PR dependency/structure between #7943, #7944, and this PR.

@JunRuiLee JunRuiLee marked this pull request as draft May 24, 2026 12:57
Contributor Author

@JunRuiLee JunRuiLee left a comment

Thanks for the review! This PR depends on Part 1 (#7943) and Part 2 (#7944). The current diff includes all three parts because this branch hasn't been rebased yet. Marking as Draft: once #7943 and #7944 are merged in order, I'll rebase onto master so this PR only contains the CLI-specific changes. I'll address your feedback points then.
