
Adding hudi-cli-mcp to assist AI assistants to help and manage hudi tables #18554

Open
nsivabalan wants to merge 1 commit into apache:master from nsivabalan:hudi-cli-mcp-support

@nsivabalan (Contributor) commented Apr 23, 2026

Describe the issue this Pull Request addresses

Adds an MCP (Model Context Protocol) server for the Hudi CLI, enabling AI assistants (Claude, ChatGPT, etc.) to inspect, diagnose, and operate on Hudi tables through natural language.
This bridges the gap between Hudi's powerful CLI tooling and the growing ecosystem of AI-powered developer tools.

Summary and Changelog

Introduces scripts/hudi-cli-mcp, a Python-based MCP server that wraps the Hudi CLI in a structured interface with 40+ tools.

  • Connection management: connect/disconnect to local or cloud Hudi tables
  • Read-only execution: execute any read-only Hudi CLI command with structured JSON output
  • Composite read workflows: table health check, timeline summary, storage analysis, commit details
  • Write operations with tiered safety: LOW (immediate), MEDIUM (token confirmation), HIGH (dry-run preview + token confirmation)
  • 19 write operation tools: savepoints, compaction, clustering, cleaning, rollbacks, metadata management, table config updates, repair, archival
  • 4 guided write workflows: compaction, clustering, safe rollback, table repair
  • ASCII FlipTable parser: converts raw Hudi CLI output into structured JSON for reliable LLM consumption
  • Token-based confirmation protocol: 5-minute TTL, single-use tokens for destructive operations
  • Full test suite: unit tests for command validation, output parsing, safety manager, session management, and write operations
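The ASCII FlipTable parser is the piece that turns the CLI's box-drawn tables into JSON. Below is a minimal sketch of the idea. It assumes FlipTable's default style, where header and data rows are framed by '║' and cells are separated by '│', and border rows (╔═╤╗, ╠═╪╣, ╟─┼╢, ╚═╧╝) carry no data. The function name parse_fliptable and the sample column names are hypothetical. This is not the PR's parser: it ignores multi-line cells and leaves every value as a string.

```python
import json
from typing import Dict, List

# Assumed FlipTable glyphs: header and data rows start and end with '║' and
# use '│' between cells; border rows (╔═╤╗, ╠═╪╣, ╟─┼╢, ╚═╧╝) never start
# with '║', so they are skipped.
ROW_EDGE = "║"
CELL_SEP = "│"


def parse_fliptable(text: str) -> List[Dict[str, str]]:
    """Convert one FlipTable-rendered table into a list of row dicts.

    The first framed row is treated as the header; values stay strings.
    """
    rows: List[List[str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 2 or not (line.startswith(ROW_EDGE) and line.endswith(ROW_EDGE)):
            continue  # border, blank, or non-table line
        inner = line[1:-1]  # drop the outer '║' on both sides
        rows.append([cell.strip() for cell in inner.split(CELL_SEP)])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, cells)) for cells in body]


if __name__ == "__main__":
    sample = """
╔═══════════════════╤═══════════════╗
║ Instant           │ Action        ║
╠═══════════════════╪═══════════════╣
║ 20260422001326843 │ commit        ║
╟───────────────────┼───────────────╢
║ 20260422001330433 │ replacecommit ║
╚═══════════════════╧═══════════════╝
"""
    print(json.dumps(parse_fliptable(sample), indent=2))
```

Keying each row by header name, rather than by column position, means an LLM client can refer to fields by name even if the CLI reorders columns between versions.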

No existing code was modified. All new code is under scripts/hudi-cli-mcp/.

Impact

No impact on existing Hudi code, APIs, or storage format. This is a standalone Python tool under scripts/ that invokes the Hudi CLI as a subprocess.
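Because the server hands commands to the existing CLI, the read-only tool needs a way to refuse anything that could change table state. One simple approach is a prefix allow-list, sketched below. The prefixes are only the read-only commands exercised in the sample session later in this thread. The names is_read_only and READ_ONLY_PREFIXES, and the rule that rejects embedded newlines, are illustrative assumptions and are not the tool's actual validation logic.

```python
import shlex
from typing import Tuple

# Hypothetical allow-list: read-only commands exercised in the sample session.
# The real tool's list and matching rules are not shown in the PR description.
READ_ONLY_PREFIXES: Tuple[Tuple[str, ...], ...] = (
    ("timeline", "show"),
    ("metadata", "timeline", "show"),
    ("show", "fsview"),
    ("stats", "filesizes"),
    ("commit", "showpartitions"),
)


def is_read_only(command: str) -> bool:
    """Return True only for a single command that starts with an allowed prefix."""
    if "\n" in command or "\r" in command:
        return False  # assumption: one command per line, so reject smuggled extras
    try:
        tokens = [token.lower() for token in shlex.split(command)]
    except ValueError:  # e.g. unbalanced quotes
        return False
    return any(tuple(tokens[: len(prefix)]) == prefix for prefix in READ_ONLY_PREFIXES)


if __name__ == "__main__":
    for cmd in ("timeline show incomplete", "stats filesizes", "commit rollback"):
        print(f"{cmd!r}: {'read-only' if is_read_only(cmd) else 'rejected'}")
```

With this check, "timeline show incomplete" passes and "commit rollback" is rejected. A rollback therefore has to go through a write tool and its confirmation step.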

Risk Level

none

This is a new, additive script with no changes to existing Hudi modules. It shells out to the existing hudi-cli binary and does not modify any table state without explicit user
confirmation through the safety protocol.
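The confirmation protocol (LOW/MEDIUM/HIGH tiers, a 5-minute TTL, single-use tokens) can be sketched as a small in-memory manager. Every name here (Tier, ConfirmationManager, request, confirm) is hypothetical. The sketch binds each token to the exact operation string it was issued for; that binding is a design assumption. The HIGH tier's dry-run preview is left to the caller.

```python
import secrets
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional

TOKEN_TTL_SECONDS = 5 * 60  # 5-minute TTL, per the PR description


class Tier(Enum):
    LOW = "low"        # execute immediately
    MEDIUM = "medium"  # require a confirmation token
    HIGH = "high"      # caller shows a dry-run preview, then requires a token


@dataclass
class _Pending:
    operation: str     # exact operation string the token was issued for
    expires_at: float  # clock value after which the token is invalid


class ConfirmationManager:
    """Issues single-use, time-limited confirmation tokens (hypothetical API)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._pending: Dict[str, _Pending] = {}

    def request(self, operation: str, tier: Tier) -> Optional[str]:
        """Return None if the operation may run now, else a token to echo back."""
        if tier is Tier.LOW:
            return None
        token = secrets.token_urlsafe(16)
        self._pending[token] = _Pending(operation, self._clock() + TOKEN_TTL_SECONDS)
        return token

    def confirm(self, token: str, operation: str) -> bool:
        """Consume the token; valid once, before expiry, for the same operation."""
        pending = self._pending.pop(token, None)  # pop first: a token is never reusable
        if pending is None:
            return False
        return pending.operation == operation and self._clock() < pending.expires_at
```

The token is popped before it is checked, so it is single-use even when confirmation fails. A mistyped or expired confirmation therefore forces a fresh request. Injecting the clock keeps the TTL testable. A longer-lived implementation would also purge expired tokens that were never confirmed.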

Documentation Update

A detailed README.md is included in scripts/hudi-cli-mcp/ covering:

  • Prerequisites and environment setup
  • Installation instructions
  • MCP client configuration (Claude Desktop, Claude Code, generic clients)
  • Full tool reference organized by category
  • Safety model explanation
  • Usage examples
  • Architecture overview
  • Contributing guide

A follow-up update to the Hudi website can be done to add this to the CLI/tooling documentation.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@nsivabalan (Contributor, Author)

Sample commands that I was able to try out

"connect to /tmp/trips_table"

⏺ Here's what's at /tmp/trips_table:

┌──────────────────┬────────────────────┐
│ Property │ Value │
├──────────────────┼────────────────────┤
│ Table name │ trips_table │
├──────────────────┼────────────────────┤
│ Table type │ COPY_ON_WRITE │
├──────────────────┼────────────────────┤
│ Record key │ uuid │
├──────────────────┼────────────────────┤
│ Partition field │ city │
├──────────────────┼────────────────────┤
│ Base file format │ Parquet │
├──────────────────┼────────────────────┤
│ Key generator │ SimpleKeyGenerator │
├──────────────────┼────────────────────┤
│ Table version │ 6 │
└──────────────────┴────────────────────┘

Partitions: chennai, san_francisco, sao_paulo

Timeline (5 completed commits, 1 completed replacecommit, 1 inflight):

  • 20260422001020750 — commit
  • 20260422001112975 — commit
  • 20260422001125821 — commit
  • 20260422001236612 — commit
  • 20260422001326843 — commit
  • 20260422001330433 — replacecommit (clustering)
  • 20260422001401481 — inflight (incomplete)

Plus 2 clean actions.

What would you like to do with this table? I can help with inspecting commits, reading data, analyzing the timeline, running CLI commands, etc.

❯ analyze the timeline

Timeline Analysis for /tmp/trips_table

Schema

fields: ts (long), uuid (string), rider (string), driver (string), fare (double), city (string)

Commit History

┌─────┬──────────────┬───────────────┬─────────┬───────────────┬───────────┬─────────┬──────────────┬────────────────┐
│ # │ Instant │ Type │ Op │ Inserts │ Updates │ Deletes │ Total Writes │ Size/partition │
├─────┼──────────────┼───────────────┼─────────┼───────────────┼───────────┼─────────┼──────────────┼────────────────┤
│ 1 │ ...001020750 │ commit │ UPSERT │ 5 (3+1+1) │ 0 │ 0 │ 5 │ ~436KB each │
├─────┼──────────────┼───────────────┼─────────┼───────────────┼───────────┼─────────┼──────────────┼────────────────┤
│ 2 │ ...001112975 │ commit │ UPSERT │ 0 │ 5 (3+1+1) │ 0 │ 5 │ ~436KB each │
├─────┼──────────────┼───────────────┼─────────┼───────────────┼───────────┼─────────┼──────────────┼────────────────┤
│ 3 │ ...001125821 │ commit │ UPSERT │ 0 │ 5 (3+1+1) │ 0 │ 5 │ ~436KB each │
├─────┼──────────────┼───────────────┼─────────┼───────────────┼───────────┼─────────┼──────────────┼────────────────┤
│ 4 │ ...001236612 │ commit │ UPSERT │ 0 │ 5 (3+1+1) │ 0 │ 5 │ ~436KB each │
├─────┼──────────────┼───────────────┼─────────┼───────────────┼───────────┼─────────┼──────────────┼────────────────┤
│ 5 │ ...001326843 │ commit │ UPSERT │ 0 │ 5 (3+1+1) │ 0 │ 5 │ ~436KB each │
├─────┼──────────────┼───────────────┼─────────┼───────────────┼───────────┼─────────┼──────────────┼────────────────┤
│ 6 │ ...001330433 │ replacecommit │ CLUSTER │ 5 (new files) │ 0 │ 0 │ 5 │ ~436KB each │
├─────┼──────────────┼───────────────┼─────────┼───────────────┼───────────┼─────────┼──────────────┼────────────────┤
│ 7 │ ...001401481 │ inflight │ UPSERT │ — │ — │ — │ — │ not committed │
└─────┴──────────────┴───────────────┴─────────┴───────────────┴───────────┴─────────┴──────────────┴────────────────┘

Clean Actions

  • ...001239093 — Cleaned files from commit 1 (...001020750), retaining from commit 3 onward. Policy: KEEP_LATEST_COMMITS. Deleted 3 files (one per partition).
  • ...001329128 — Cleaned files from commit 2 (...001112975), retaining from commit 4 onward. Deleted 3 files (one per partition).

File Lineage

Original file group (commits 1-5, one file per partition):

┌───────────────┬──────────────┐
│ Partition │ FileId │
├───────────────┼──────────────┤
│ san_francisco │ 415dbece-... │
├───────────────┼──────────────┤
│ sao_paulo │ b0b31443-... │
├───────────────┼──────────────┤
│ chennai │ 30ae9920-... │
└───────────────┴──────────────┘

After clustering (commit 6 replaced all original files with new ones):

┌───────────────┬──────────────┐
│ Partition │ New FileId │
├───────────────┼──────────────┤
│ san_francisco │ 6014f1fa-... │
├───────────────┼──────────────┤
│ sao_paulo │ bc88c4f3-... │
├───────────────┼──────────────┤
│ chennai │ eec5aa74-... │
└───────────────┴──────────────┘

Key Observations

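  1. Initial load + 4 update rounds: Commit 1 inserted 5 records (3 SF, 1 SP, 1 CH). Commits 2-5 each updated all 5 records with zero inserts. Because this is a copy-on-write table, each round rewrote a new file slice per partition rather than updating files in place.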
  2. Clustering executed: Commit 6 is a replacecommit/CLUSTER that replaced all 3 original file IDs with new ones across all partitions. The data was rewritten into fresh files.
  3. Inflight commit (stuck?): ...001401481 has a .requested and .inflight but no .commit — this write never completed. The inflight metadata shows it was attempting upserts against the
    post-clustering file IDs (e.g., 6014f1fa-...), with entries for empty fileIds alongside the real ones. This could indicate a failed or interrupted write.
  4. Small table: Only 5 total records across 3 partitions. File sizes are ~436KB each (dominated by Parquet/metadata overhead for such a small dataset).
  5. Clean policy working correctly: KEEP_LATEST_COMMITS has removed the oldest file slices (from commits 1 and 2) while retaining the more recent ones, as expected.

Would you like me to dig deeper into the inflight commit, inspect the actual parquet data, or check the archived timeline?

❯ timeline show incomplete

⏺ There's one incomplete instant on the timeline:

┌───────────────────┬──────────┬────────┐
│ Instant │ State │ Action │
├───────────────────┼──────────┼────────┤
│ 20260422001401481 │ inflight │ UPSERT │
└───────────────────┴──────────┴────────┘

Details from the inflight metadata:

  • Partitions touched: san_francisco, sao_paulo, chennai
  • File groups targeted: The post-clustering file IDs (6014f1fa-..., bc88c4f3-..., eec5aa74-...)
  • Planned writes: 3 updates (SF) + 1 update (SP) + 1 update (CH) = 5 update writes
  • Bytes written: 0 (the inflight metadata records only the planned workload, so this does not mean that no data files were written)
  • Has .requested: yes
  • Has .inflight: yes
  • Has .commit: no — this write was started but never completed

This looks like a failed or interrupted upsert that was left dangling. If you want to recover the table, this instant would need to be rolled back (commit rollback).
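The check used above (a .requested and an .inflight file but no completed .commit file) can be expressed over the file names in .hoodie. Below is a rough sketch that assumes the table-version-6 naming seen in this session (<instant>.commit.requested, <instant>.inflight, <instant>.commit). COMPLETED_SUFFIXES is an illustrative subset, not Hudi's full list of actions, and the CLI's own "timeline show incomplete" remains the authoritative answer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Assumed completed-instant suffixes for the table-version-6 layout seen in
# this session ("<ts>.commit", "<ts>.replacecommit", "<ts>.clean", ...).
# Pending files look like "<ts>.commit.requested" or "<ts>.inflight".
# Illustrative subset only, not Hudi's full list of actions.
COMPLETED_SUFFIXES = {"commit", "deltacommit", "replacecommit", "clean", "rollback", "savepoint"}


def incomplete_instants(timeline_files: Iterable[str]) -> List[str]:
    """Return instant times that have timeline files but no completed file."""
    suffixes: Dict[str, Set[str]] = defaultdict(set)
    for name in timeline_files:
        instant, _, suffix = name.partition(".")
        if instant.isdigit() and suffix:  # skips hoodie.properties, folders, etc.
            suffixes[instant].add(suffix)
    return sorted(ts for ts, seen in suffixes.items() if not seen & COMPLETED_SUFFIXES)
```

Called as incomplete_instants(os.listdir("/tmp/trips_table/.hoodie")), this should list only 20260422001401481, provided the timeline files follow the naming assumed here.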

❯ metadata timeline show incomplete

Read 1 file

⏺ The metadata table timeline has no incomplete instants. All deltacommits are fully completed (each has .requested, .inflight, and .deltacommit):

┌───────────────────┬────────────────────────────┐
│ Metadata Instant │ Corresponds To │
├───────────────────┼────────────────────────────┤
│ 00000000000000010 │ Bootstrap/init │
├───────────────────┼────────────────────────────┤
│ 20260422001020750 │ Commit 1 (initial inserts) │
├───────────────────┼────────────────────────────┤
│ 20260422001112975 │ Commit 2 (upsert) │
├───────────────────┼────────────────────────────┤
│ 20260422001125821 │ Commit 3 (upsert) │
├───────────────────┼────────────────────────────┤
│ 20260422001236612 │ Commit 4 (upsert) │
├───────────────────┼────────────────────────────┤
│ 20260422001239093 │ Clean 1 │
├───────────────────┼────────────────────────────┤
│ 20260422001326843 │ Commit 5 (upsert) │
├───────────────────┼────────────────────────────┤
│ 20260422001329128 │ Clean 2 │
├───────────────────┼────────────────────────────┤
│ 20260422001330433 │ Clustering replacecommit │
└───────────────────┴────────────────────────────┘

Notable: the inflight data table commit (20260422001401481) has no corresponding entry in the metadata table timeline, so the metadata table was never updated for that failed write. This is consistent with the write failing before its commit phase, because the metadata table is updated as part of committing a data table instant.

❯ show fsview latest

⏺ Bash(for partition in chennai san_francisco sao_paulo; do echo "=== $partition ==="; ls -la /tmp/trips_table/$partition/*.parquet 2>/dev/null; done)
⎿  === chennai ===
-rw-r--r-- 1 nsb wheel 436028 Apr 22 00:13 /tmp/trips_table/chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-124-202_20260422001326843.parquet
-rw-r--r-- 1 nsb wheel 436024 Apr 22 00:11 /tmp/trips_table/chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-68-108_20260422001125821.parquet
… +14 lines

⏺ Latest File System View

The latest valid file slice per file group is determined by the last completed commit. Since 20260422001401481 is inflight (not committed), those files are not valid for reads.
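The rule above can be sketched from the base file names alone. The sketch assumes the "<fileId>_<writeToken>_<instantTime>.parquet" pattern visible in the listings, and it assumes the caller already knows the set of completed instants and the file groups that clustering replaced. It simplifies Hudi's file-system view: log files, pending compaction and savepoints are ignored. The function names are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class BaseFile(NamedTuple):
    file_id: str      # file group id, e.g. "eec5aa74-...-0"
    write_token: str  # e.g. "0-144-232"
    instant: str      # instant time of the commit that wrote the slice


def parse_base_file(path: str) -> BaseFile:
    """Split '<partition>/<fileId>_<writeToken>_<instantTime>.parquet'."""
    stem = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    file_id, write_token, instant = stem.split("_")
    return BaseFile(file_id, write_token, instant)


def latest_valid_slices(
    paths: Iterable[str], completed: Set[str], replaced_groups: Set[str]
) -> Dict[str, str]:
    """Map each live file group to the base file of its latest completed slice."""
    best: Dict[str, Tuple[str, str]] = {}  # file_id -> (instant, path)
    for path in paths:
        f = parse_base_file(path)
        if f.instant not in completed or f.file_id in replaced_groups:
            continue  # written by an inflight instant, or group replaced by clustering
        # Instant times have a fixed width, so string comparison orders them in time.
        if f.file_id not in best or f.instant > best[f.file_id][0]:
            best[f.file_id] = (f.instant, path)
    return {file_id: path for file_id, (_, path) in best.items()}
```

Suppose it is given the chennai files from the listing, with 20260422001401481 left out of the completed set and 30ae9920-ee63-40cb-9527-64661b63dcbb-0 marked as replaced. It then keeps only eec5aa74-7eab-42e9-a67a-37c354dd9288-0_0-144-232_20260422001330433.parquet.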

chennai (1 record)

┌──────────────────────────┬────────────────────────────────────────┬────────────┬────────┐
│ File Group │ Latest Valid Slice │ Commit │ Size │
├──────────────────────────┼────────────────────────────────────────┼────────────┼────────┤
│ eec5aa74-... (clustered) │ eec5aa74-..._20260422001330433.parquet │ clustering │ 436 KB │
├──────────────────────────┼────────────────────────────────────────┼────────────┼────────┤
│ 30ae9920-... │ replaced by clustering │ — │ — │
└──────────────────────────┴────────────────────────────────────────┴────────────┴────────┘

Stale slices still on disk (not yet cleaned):

  • 30ae9920-..._20260422001125821.parquet (436 KB)
  • 30ae9920-..._20260422001236612.parquet (436 KB)
  • 30ae9920-..._20260422001326843.parquet (436 KB)

Inflight (invalid): eec5aa74-..._20260422001401481.parquet — 0 bytes (empty file)

san_francisco (3 records)

┌──────────────────────────┬────────────────────────────────────────┬────────────┬────────┐
│ File Group │ Latest Valid Slice │ Commit │ Size │
├──────────────────────────┼────────────────────────────────────────┼────────────┼────────┤
│ 6014f1fa-... (clustered) │ 6014f1fa-..._20260422001330433.parquet │ clustering │ 436 KB │
├──────────────────────────┼────────────────────────────────────────┼────────────┼────────┤
│ 415dbece-... │ replaced by clustering │ — │ — │
└──────────────────────────┴────────────────────────────────────────┴────────────┴────────┘

Stale slices still on disk:

  • 415dbece-..._20260422001125821.parquet (436 KB)
  • 415dbece-..._20260422001236612.parquet (436 KB)
  • 415dbece-..._20260422001326843.parquet (436 KB)

Inflight (invalid): 6014f1fa-..._20260422001401481.parquet (436 KB — data written but not committed)

sao_paulo (1 record)

┌──────────────────────────┬────────────────────────────────────────┬────────────┬────────┐
│ File Group │ Latest Valid Slice │ Commit │ Size │
├──────────────────────────┼────────────────────────────────────────┼────────────┼────────┤
│ bc88c4f3-... (clustered) │ bc88c4f3-..._20260422001330433.parquet │ clustering │ 436 KB │
├──────────────────────────┼────────────────────────────────────────┼────────────┼────────┤
│ b0b31443-... │ replaced by clustering │ — │ — │
└──────────────────────────┴────────────────────────────────────────┴────────────┴────────┘

Stale slices still on disk:

  • b0b31443-..._20260422001125821.parquet (436 KB)
  • b0b31443-..._20260422001236612.parquet (436 KB)
  • b0b31443-..._20260422001326843.parquet (436 KB)

Inflight (invalid): bc88c4f3-..._20260422001401481.parquet (436 KB)

Summary

  • 3 active file groups (post-clustering), one per partition
  • 9 stale file slices from pre-clustering file groups (awaiting next clean)
  • 3 inflight files from the failed commit — 1 is 0 bytes (chennai), 2 have data but are uncommitted
  • Total disk usage: 15 files, ~6.1 MB (5.82 MiB), mostly overhead for 5 tiny records

❯ Can you not trim or shorten the content while printing. I could not see the full file path or file group id.

⏺ Understood. Let me redo that with full values.

Read 1 file, recalled 1 memory, wrote 2 memories

⏺ Latest File System View

chennai (1 record)

Active file group: eec5aa74-7eab-42e9-a67a-37c354dd9288-0

┌────────────────────────────────────────────────────────────────────────────────────┬────────────────────────────────┬───────────────┬──────────────────────┐
│ File │ Commit │ Size │ Status │
├────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────┼───────────────┼──────────────────────┤
│ chennai/eec5aa74-7eab-42e9-a67a-37c354dd9288-0_0-144-232_20260422001330433.parquet │ 20260422001330433 (clustering) │ 436,161 bytes │ Latest valid slice │
├────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────┼───────────────┼──────────────────────┤
│ chennai/eec5aa74-7eab-42e9-a67a-37c354dd9288-0_2-167-273_20260422001401481.parquet │ 20260422001401481 (inflight) │ 0 bytes │ Inflight, empty file │
└────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────┴───────────────┴──────────────────────┘

Replaced file group: 30ae9920-ee63-40cb-9527-64661b63dcbb-0 (replaced by clustering at 20260422001330433)

┌────────────────────────────────────────────────────────────────────────────────────┬───────────────────┬───────────────┬───────────────────────┐
│ File │ Commit │ Size │ Status │
├────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼───────────────┼───────────────────────┤
│ chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-68-108_20260422001125821.parquet │ 20260422001125821 │ 436,024 bytes │ Stale, awaiting clean │
├────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼───────────────┼───────────────────────┤
│ chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-91-147_20260422001236612.parquet │ 20260422001236612 │ 436,025 bytes │ Stale, awaiting clean │
├────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼───────────────┼───────────────────────┤
│ chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-124-202_20260422001326843.parquet │ 20260422001326843 │ 436,028 bytes │ Stale, awaiting clean │
└────────────────────────────────────────────────────────────────────────────────────┴───────────────────┴───────────────┴───────────────────────┘


san_francisco (3 records)

Active file group: 6014f1fa-25f1-41cf-b91f-91737c25691a-0

┌──────────────────────────────────────────────────────────────────────────────────────────┬────────────────────────────────┬───────────────┬───────────────────────┐
│ File │ Commit │ Size │ Status │
├──────────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────┼───────────────┼───────────────────────┤
│ san_francisco/6014f1fa-25f1-41cf-b91f-91737c25691a-0_1-144-233_20260422001330433.parquet │ 20260422001330433 (clustering) │ 436,378 bytes │ Latest valid slice │
├──────────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────┼───────────────┼───────────────────────┤
│ san_francisco/6014f1fa-25f1-41cf-b91f-91737c25691a-0_0-167-271_20260422001401481.parquet │ 20260422001401481 (inflight) │ 436,235 bytes │ Inflight, uncommitted │
└──────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────┴───────────────┴───────────────────────┘

Replaced file group: 415dbece-a4c1-42e3-9eae-c9b871634675-0 (replaced by clustering at 20260422001330433)

┌──────────────────────────────────────────────────────────────────────────────────────────┬───────────────────┬───────────────┬───────────────────────┐
│ File │ Commit │ Size │ Status │
├──────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼───────────────┼───────────────────────┤
│ san_francisco/415dbece-a4c1-42e3-9eae-c9b871634675-0_0-68-106_20260422001125821.parquet │ 20260422001125821 │ 436,238 bytes │ Stale, awaiting clean │
├──────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼───────────────┼───────────────────────┤
│ san_francisco/415dbece-a4c1-42e3-9eae-c9b871634675-0_0-91-145_20260422001236612.parquet │ 20260422001236612 │ 436,238 bytes │ Stale, awaiting clean │
├──────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼───────────────┼───────────────────────┤
│ san_francisco/415dbece-a4c1-42e3-9eae-c9b871634675-0_0-124-200_20260422001326843.parquet │ 20260422001326843 │ 436,243 bytes │ Stale, awaiting clean │
└──────────────────────────────────────────────────────────────────────────────────────────┴───────────────────┴───────────────┴───────────────────────┘


sao_paulo (1 record)

Active file group: bc88c4f3-b6af-4928-8b9a-8f23bd776c14-0

┌──────────────────────────────────────────────────────────────────────────────────────┬────────────────────────────────┬───────────────┬───────────────────────┐
│ File │ Commit │ Size │ Status │
├──────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────┼───────────────┼───────────────────────┤
│ sao_paulo/bc88c4f3-b6af-4928-8b9a-8f23bd776c14-0_2-144-234_20260422001330433.parquet │ 20260422001330433 (clustering) │ 436,185 bytes │ Latest valid slice │
├──────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────┼───────────────┼───────────────────────┤
│ sao_paulo/bc88c4f3-b6af-4928-8b9a-8f23bd776c14-0_1-167-272_20260422001401481.parquet │ 20260422001401481 (inflight) │ 436,051 bytes │ Inflight, uncommitted │
└──────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────┴───────────────┴───────────────────────┘

Replaced file group: b0b31443-70a7-4123-9c06-4233b57997ac-0 (replaced by clustering at 20260422001330433)

┌──────────────────────────────────────────────────────────────────────────────────────┬───────────────────┬───────────────┬───────────────────────┐
│ File │ Commit │ Size │ Status │
├──────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼───────────────┼───────────────────────┤
│ sao_paulo/b0b31443-70a7-4123-9c06-4233b57997ac-0_1-68-107_20260422001125821.parquet │ 20260422001125821 │ 436,048 bytes │ Stale, awaiting clean │
├──────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼───────────────┼───────────────────────┤
│ sao_paulo/b0b31443-70a7-4123-9c06-4233b57997ac-0_1-91-146_20260422001236612.parquet │ 20260422001236612 │ 436,048 bytes │ Stale, awaiting clean │
├──────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼───────────────┼───────────────────────┤
│ sao_paulo/b0b31443-70a7-4123-9c06-4233b57997ac-0_1-124-201_20260422001326843.parquet │ 20260422001326843 │ 436,052 bytes │ Stale, awaiting clean │
└──────────────────────────────────────────────────────────────────────────────────────┴───────────────────┴───────────────┴───────────────────────┘


Summary

  • 3 active file groups (post-clustering), one per partition
  • 9 stale file slices from replaced file groups (awaiting next clean)
  • 3 inflight files from failed commit 20260422001401481 (chennai one is 0 bytes, other two have data but are uncommitted)
  • Total: 15 files on disk, ~6.1 MB (5.82 MiB)


❯ can you show lineage for file group 30ae9920-ee63-40cb-9527-64661b63dcbb-0

⏺ File Group Lineage: 30ae9920-ee63-40cb-9527-64661b63dcbb-0 (partition: chennai)

┌───────────────────┬────────────────────────────────────────────────────────────────────────────────────┬───────────────────┬─────────┬─────────┬─────────┬────────────┬────────────┐
│ Commit │ File Path │ Prev Commit │ Inserts │ Updates │ Deletes │ Total │ Size │
│ │ │ │ │ │ │ Writes │ (bytes) │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────┼─────────┼─────────┼────────────┼────────────┤
│ 20260422001020750 │ chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-22-30_20260422001020750.parquet │ null (created) │ 1 │ 0 │ 0 │ 1 │ 436,018 │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────┼─────────┼─────────┼────────────┼────────────┤
│ 20260422001112975 │ chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-45-69_20260422001112975.parquet │ 20260422001020750 │ 0 │ 1 │ 0 │ 1 │ 436,020 │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────┼─────────┼─────────┼────────────┼────────────┤
│ 20260422001125821 │ chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-68-108_20260422001125821.parquet │ 20260422001112975 │ 0 │ 1 │ 0 │ 1 │ 436,024 │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────┼─────────┼─────────┼────────────┼────────────┤
│ 20260422001236612 │ chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-91-147_20260422001236612.parquet │ 20260422001125821 │ 0 │ 1 │ 0 │ 1 │ 436,025 │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────┼─────────┼─────────┼────────────┼────────────┤
│ 20260422001326843 │ chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-124-202_20260422001326843.parquet │ 20260422001236612 │ 0 │ 1 │ 0 │ 1 │ 436,028 │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────┼─────────┼─────────┼────────────┼────────────┤
│ 20260422001330433 │ REPLACED by clustering │ — │ — │ — │ — │ — │ — │
└───────────────────┴────────────────────────────────────────────────────────────────────────────────────┴───────────────────┴─────────┴─────────┴─────────┴────────────┴────────────┘

Successor file group: eec5aa74-7eab-42e9-a67a-37c354dd9288-0 (created by replacecommit/CLUSTER at 20260422001330433)

Cleaned file slices:

  • chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-22-30_20260422001020750.parquet — cleaned at 20260422001239093
  • chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-45-69_20260422001112975.parquet — cleaned at 20260422001329128

Still on disk (not yet cleaned):

  • chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-68-108_20260422001125821.parquet (436,024 bytes)
  • chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-91-147_20260422001236612.parquet (436,025 bytes)
  • chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-124-202_20260422001326843.parquet (436,028 bytes)

Lifecycle: Created at commit 20260422001020750 with 1 insert, received 4 successive upserts (1 record updated each time), then replaced by clustering at 20260422001330433. The file
group is now dead — 2 of 5 slices have been cleaned, 3 remain on disk awaiting the next clean action.

❯ stats filesizes

⏺ Bash(for f in $(find /tmp/trips_table -name "*.parquet" -not -path "*/.hoodie/*" | sort); do stat -f "%z" "$f" | tr '\n' ' '; echo "$f"; done)
⎿  436028 /tmp/trips_table/chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-124-202_20260422001326843.parquet
436024 /tmp/trips_table/chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-68-108_20260422001125821.parquet
436025 /tmp/trips_table/chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-91-147_20260422001236612.parquet
… +12 lines

⏺ File Sizes

┌──────────────────────────────────────────────────────────────────────────────────────────┬──────────────┐
│ File │ Size (bytes) │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-68-108_20260422001125821.parquet │ 436,024 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-91-147_20260422001236612.parquet │ 436,025 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ chennai/30ae9920-ee63-40cb-9527-64661b63dcbb-0_2-124-202_20260422001326843.parquet │ 436,028 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ chennai/eec5aa74-7eab-42e9-a67a-37c354dd9288-0_0-144-232_20260422001330433.parquet │ 436,161 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ chennai/eec5aa74-7eab-42e9-a67a-37c354dd9288-0_2-167-273_20260422001401481.parquet │ 0 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ san_francisco/415dbece-a4c1-42e3-9eae-c9b871634675-0_0-68-106_20260422001125821.parquet │ 436,238 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ san_francisco/415dbece-a4c1-42e3-9eae-c9b871634675-0_0-91-145_20260422001236612.parquet │ 436,238 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ san_francisco/415dbece-a4c1-42e3-9eae-c9b871634675-0_0-124-200_20260422001326843.parquet │ 436,243 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ san_francisco/6014f1fa-25f1-41cf-b91f-91737c25691a-0_1-144-233_20260422001330433.parquet │ 436,378 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ san_francisco/6014f1fa-25f1-41cf-b91f-91737c25691a-0_0-167-271_20260422001401481.parquet │ 436,235 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ sao_paulo/b0b31443-70a7-4123-9c06-4233b57997ac-0_1-68-107_20260422001125821.parquet │ 436,048 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ sao_paulo/b0b31443-70a7-4123-9c06-4233b57997ac-0_1-91-146_20260422001236612.parquet │ 436,048 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ sao_paulo/b0b31443-70a7-4123-9c06-4233b57997ac-0_1-124-201_20260422001326843.parquet │ 436,052 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ sao_paulo/bc88c4f3-b6af-4928-8b9a-8f23bd776c14-0_2-144-234_20260422001330433.parquet │ 436,185 │
├──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ sao_paulo/bc88c4f3-b6af-4928-8b9a-8f23bd776c14-0_1-167-272_20260422001401481.parquet │ 436,051 │
└──────────────────────────────────────────────────────────────────────────────────────────┴──────────────┘

Summary Statistics

┌───────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Metric │ Value │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Total files │ 15 │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Total size │ 6,105,954 bytes (5.82 MiB) │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Min size │ 0 bytes (chennai/eec5aa74-7eab-42e9-a67a-37c354dd9288-0_2-167-273_20260422001401481.parquet) │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Max size │ 436,378 bytes (san_francisco/6014f1fa-25f1-41cf-b91f-91737c25691a-0_1-144-233_20260422001330433.parquet) │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Mean size │ 407,064 bytes │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Median size │ 436,052 bytes │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Std deviation │ 112,611 bytes (sample std. dev.; skewed by the 0-byte inflight file) │
└───────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Per-Partition Breakdown

┌───────────────┬───────┬─────────────────┬───────────────┐
│ Partition │ Files │ Total Size │ Avg Size │
├───────────────┼───────┼─────────────────┼───────────────┤
│ chennai │ 5 │ 1,744,238 bytes │ 348,848 bytes │
├───────────────┼───────┼─────────────────┼───────────────┤
│ san_francisco │ 5 │ 2,181,332 bytes │ 436,266 bytes │
├───────────────┼───────┼─────────────────┼───────────────┤
│ sao_paulo │ 5 │ 2,180,384 bytes │ 436,077 bytes │
└───────────────┴───────┴─────────────────┴───────────────┘

Note: Excluding the 0-byte inflight file in chennai, all files are tightly clustered around 436 KB — which is almost entirely Parquet metadata/overhead given there are only 1-3 records
per file.
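The summary figures above can be checked by recomputing them from the per-file sizes with Python's statistics module. The (partition, size) pairs are copied from the file-size table. The sketch reports the sample (n - 1) standard deviation; statistics.pstdev would give the population figure instead.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

# (partition, size in bytes) for the 15 base files in the file-size table.
FILES: List[Tuple[str, int]] = [
    ("chennai", 436024), ("chennai", 436025), ("chennai", 436028),
    ("chennai", 436161), ("chennai", 0),
    ("san_francisco", 436238), ("san_francisco", 436238), ("san_francisco", 436243),
    ("san_francisco", 436378), ("san_francisco", 436235),
    ("sao_paulo", 436048), ("sao_paulo", 436048), ("sao_paulo", 436052),
    ("sao_paulo", 436185), ("sao_paulo", 436051),
]


def summarize(sizes: List[int]) -> Dict[str, float]:
    """Overall size statistics; stdev is the sample (n - 1) standard deviation."""
    return {
        "files": len(sizes),
        "total": sum(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
        "stdev": statistics.stdev(sizes),
    }


def per_partition(files: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int, float]]:
    """Partition -> (file count, total bytes, average bytes)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for partition, size in files:
        groups[partition].append(size)
    return {p: (len(s), sum(s), sum(s) / len(s)) for p, s in groups.items()}


overall = summarize([size for _, size in FILES])
for key, value in overall.items():
    print(f"{key:>6}: {value:,.0f}")
for partition, (count, total, avg) in per_partition(FILES).items():
    print(f"{partition:<14} {count} files  {total:,} bytes  avg {avg:,.0f}")
```

The per-partition totals and averages it prints agree with the breakdown table. With an odd number of files, the median is the 8th of the 15 sorted sizes.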

❯ commit showpartitions

⏺ Which commit instant? Here are the available ones:

  1. 20260422001020750 — commit (initial inserts)
  2. 20260422001112975 — commit (upsert)
  3. 20260422001125821 — commit (upsert)
  4. 20260422001236612 — commit (upsert)
  5. 20260422001326843 — commit (upsert)
  6. 20260422001330433 — replacecommit (clustering)
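The instant times listed above are 17-digit timestamps in Hudi's `yyyyMMddHHmmssSSS` format. Here is a small hypothetical helper, not part of this PR, that turns them into datetimes, for example to see how far apart consecutive commits are. It assumes millisecond-precision instants like the ones in this transcript. It returns naive datetimes, because the instant string itself carries no timezone.

```python
from datetime import datetime, timedelta


def parse_instant(instant: str) -> datetime:
    """Parse a 17-digit Hudi instant time (yyyyMMddHHmmssSSS) into a naive datetime."""
    if len(instant) != 17 or not instant.isdigit():
        raise ValueError(f"expected a 17-digit instant, got {instant!r}")
    seconds_part = datetime.strptime(instant[:14], "%Y%m%d%H%M%S")
    # The last three digits are milliseconds; datetime stores microseconds.
    return seconds_part.replace(microsecond=int(instant[14:]) * 1000)


def gaps_between(instants: list[str]) -> list[tuple[str, timedelta]]:
    """Pair each instant (after the first) with the time elapsed since the previous one."""
    # Fixed-width digit strings sort chronologically as plain strings.
    ordered = sorted(instants)
    return [(cur, parse_instant(cur) - parse_instant(prev)) for prev, cur in zip(ordered, ordered[1:])]
```

On the six instants above, the last gap is 3.59 s. That is the time between the final upsert (20260422001326843) and the clustering replacecommit (20260422001330433).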

❯ 20260422001326843

⏺ Commit 20260422001326843 — Partition Details

┌───────────────┬───────────┬────────────────────────────────────────┬─────────┬─────────┬─────────┬──────────────┬─────────────┬───────────────┐
│ Partition │ Operation │ File Group │ Inserts │ Updates │ Deletes │ Total Writes │ Write Bytes │ File Size │
├───────────────┼───────────┼────────────────────────────────────────┼─────────┼─────────┼─────────┼──────────────┼─────────────┼───────────────┤
│ san_francisco │ UPSERT │ 415dbece-a4c1-42e3-9eae-c9b871634675-0 │ 0 │ 3 │ 0 │ 3 │ 436,243 │ 436,243 bytes │
├───────────────┼───────────┼────────────────────────────────────────┼─────────┼─────────┼─────────┼──────────────┼─────────────┼───────────────┤
│ sao_paulo │ UPSERT │ b0b31443-70a7-4123-9c06-4233b57997ac-0 │ 0 │ 1 │ 0 │ 1 │ 436,052 │ 436,052 bytes │
├───────────────┼───────────┼────────────────────────────────────────┼─────────┼─────────┼─────────┼──────────────┼─────────────┼───────────────┤
│ chennai │ UPSERT │ 30ae9920-ee63-40cb-9527-64661b63dcbb-0 │ 0 │ 1 │ 0 │ 1 │ 436,028 │ 436,028 bytes │
└───────────────┴───────────┴────────────────────────────────────────┴─────────┴─────────┴─────────┴──────────────┴─────────────┴───────────────┘

┌─────────────────────┬───────────────────┐
│ Metric │ Value │
├─────────────────────┼───────────────────┤
│ Previous commit │ 20260422001236612 │
├─────────────────────┼───────────────────┤
│ Total partitions │ 3 │
├─────────────────────┼───────────────────┤
│ Total inserts │ 0 │
├─────────────────────┼───────────────────┤
│ Total updates │ 5 │
├─────────────────────┼───────────────────┤
│ Total deletes │ 0 │
├─────────────────────┼───────────────────┤
│ Total writes │ 5 │
├─────────────────────┼───────────────────┤
│ Total bytes written │ 1,308,323 │
├─────────────────────┼───────────────────┤
│ Write errors │ 0 │
└─────────────────────┴───────────────────┘

Runtime Stats

┌───────────────┬────────────────┬──────────────────┬──────────────────┐
│ Partition │ Scan Time (ms) │ Upsert Time (ms) │ Create Time (ms) │
├───────────────┼────────────────┼──────────────────┼──────────────────┤
│ san_francisco │ 0 │ 236 │ 0 │
├───────────────┼────────────────┼──────────────────┼──────────────────┤
│ sao_paulo │ 0 │ 235 │ 0 │
├───────────────┼────────────────┼──────────────────┼──────────────────┤
│ chennai │ 0 │ 110 │ 0 │
└───────────────┴────────────────┴──────────────────┴──────────────────┘
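The commit-level totals are a column-wise sum over the per-partition rows. The hypothetical sketch below (field names are illustrative, not the PR's code) shows that aggregation and uses the `commit showpartitions` rows above as a check. It sums the reported Total Writes column instead of recomputing inserts + updates + deletes. In Hudi's write stats, the write count can also include records that were rewritten unchanged during a copy-on-write merge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionWriteStat:
    """One row of per-partition write stats (illustrative field names)."""
    partition: str
    inserts: int
    updates: int
    deletes: int
    total_writes: int
    write_bytes: int


def commit_totals(rows: list[PartitionWriteStat]) -> dict[str, int]:
    """Aggregate per-partition rows into commit-level totals."""
    return {
        "total_partitions": len({r.partition for r in rows}),
        "total_inserts": sum(r.inserts for r in rows),
        "total_updates": sum(r.updates for r in rows),
        "total_deletes": sum(r.deletes for r in rows),
        # Use the reported write count rather than inserts + updates + deletes.
        "total_writes": sum(r.total_writes for r in rows),
        "total_bytes_written": sum(r.write_bytes for r in rows),
    }


# Rows from commit 20260422001326843 above.
rows = [
    PartitionWriteStat("san_francisco", 0, 3, 0, 3, 436_243),
    PartitionWriteStat("sao_paulo", 0, 1, 0, 1, 436_052),
    PartitionWriteStat("chennai", 0, 1, 0, 1, 436_028),
]
print(commit_totals(rows))
```

For this commit the sums come to 5 updates, 5 writes, and 1,308,323 bytes, matching the totals table.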

❯ commit show_write_stats

⏺ Which commit instant?

  1. 20260422001020750 — commit (initial inserts)
  2. 20260422001112975 — commit (upsert)
  3. 20260422001125821 — commit (upsert)
  4. 20260422001236612 — commit (upsert)
  5. 20260422001326843 — commit (upsert)
  6. 20260422001330433 — replacecommit (clustering)

❯ 20260422001330433

⏺ Write Stats for Commit 20260422001330433 (replacecommit / CLUSTER)

Per-File Write Stats

┌───────────────┬────────────────────────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────┬─────────────┬─────────┬─────────┬─────────┬──────────────┬─────────────┬───────────────┐
│ Partition │ File Group │ File Path │ Prev Commit │ Inserts │ Updates │ Deletes │ Total Writes │ Write Bytes │ File Size │
├───────────────┼────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┼─────────────┼─────────┼─────────┼─────────┼──────────────┼─────────────┼───────────────┤
│ san_francisco │ 6014f1fa-25f1-41cf-b91f-91737c25691a-0 │ san_francisco/6014f1fa-25f1-41cf-b91f-91737c25691a-0_1-144-233_20260422001330433.parquet │ null (new) │ 3 │ 0 │ 0 │ 3 │ 436,378 │ 436,378 bytes │
├───────────────┼────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┼─────────────┼─────────┼─────────┼─────────┼──────────────┼─────────────┼───────────────┤
│ sao_paulo │ bc88c4f3-b6af-4928-8b9a-8f23bd776c14-0 │ sao_paulo/bc88c4f3-b6af-4928-8b9a-8f23bd776c14-0_2-144-234_20260422001330433.parquet │ null (new) │ 1 │ 0 │ 0 │ 1 │ 436,185 │ 436,185 bytes │
├───────────────┼────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┼─────────────┼─────────┼─────────┼─────────┼──────────────┼─────────────┼───────────────┤
│ chennai │ eec5aa74-7eab-42e9-a67a-37c354dd9288-0 │ chennai/eec5aa74-7eab-42e9-a67a-37c354dd9288-0_0-144-232_20260422001330433.parquet │ null (new) │ 1 │ 0 │ 0 │ 1 │ 436,161 │ 436,161 bytes │
└───────────────┴────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────┴─────────────┴─────────┴─────────┴─────────┴──────────────┴─────────────┴───────────────┘
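Box-drawing tables are easy for people to read but awkward for a model to consume, especially once a wide table is wrapped and cell values are split across lines. The PR handles this with its own FlipTable parser, which is not reproduced here. The following is an independent, hypothetical sketch for single-line box tables like the ones in this transcript. It assumes every logical row is closed by a horizontal border line (`├`, `└`), and it glues the fragments of a wrapped cell back together. As a result, a header wrapped as `Prev` / `Com` / `mit` comes back as `PrevCommit`.

```python
from typing import Optional

# Lines starting with these characters are horizontal borders, not data.
BORDER_START = set("┌├└")


def parse_box_table(text: str) -> list[dict[str, str]]:
    """Parse a single-line box-drawing table into one dict per body row."""
    rows: list[list[str]] = []
    current: Optional[list[str]] = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped[0] in BORDER_START:
            # A border closes the logical row collected so far.
            if current is not None:
                rows.append(current)
                current = None
            continue
        # Drop the empty strings before the first and after the last "│".
        cells = [c.strip() for c in stripped.split("│")[1:-1]]
        if current is None:
            current = cells
        else:
            # Continuation line of a wrapped row: append each fragment to its column.
            current = [a + b for a, b in zip(current, cells)]
    if current is not None:
        rows.append(current)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]
```

The output is a list of plain dicts, which `json.dumps` can serialize directly for an MCP tool response.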

Replaced File Groups

┌───────────────┬────────────────────────────────────────┐
│ Partition │ Replaced File Group │
├───────────────┼────────────────────────────────────────┤
│ san_francisco │ 415dbece-a4c1-42e3-9eae-c9b871634675-0 │
├───────────────┼────────────────────────────────────────┤
│ sao_paulo │ b0b31443-70a7-4123-9c06-4233b57997ac-0 │
├───────────────┼────────────────────────────────────────┤
│ chennai │ 30ae9920-ee63-40cb-9527-64661b63dcbb-0 │
└───────────────┴────────────────────────────────────────┘

Runtime Stats

┌───────────────┬────────────────┬──────────────────┬──────────────────┐
│ Partition │ Scan Time (ms) │ Upsert Time (ms) │ Create Time (ms) │
├───────────────┼────────────────┼──────────────────┼──────────────────┤
│ san_francisco │ 0 │ 0 │ 220 │
├───────────────┼────────────────┼──────────────────┼──────────────────┤
│ sao_paulo │ 0 │ 0 │ 104 │
├───────────────┼────────────────┼──────────────────┼──────────────────┤
│ chennai │ 0 │ 0 │ 220 │
└───────────────┴────────────────┴──────────────────┴──────────────────┘

Totals

┌─────────────────────┬───────────┐
│ Metric │ Value │
├─────────────────────┼───────────┤
│ Total inserts │ 5 │
├─────────────────────┼───────────┤
│ Total updates │ 0 │
├─────────────────────┼───────────┤
│ Total deletes │ 0 │
├─────────────────────┼───────────┤
│ Total writes │ 5 │
├─────────────────────┼───────────┤
│ Total bytes written │ 1,308,724 │
├─────────────────────┼───────────┤
│ Write errors │ 0 │
├─────────────────────┼───────────┤
│ Log records │ 0 │
├─────────────────────┼───────────┤
│ Log files compacted │ 0 │
├─────────────────────┼───────────┤
│ Compacted │ false │
└─────────────────────┴───────────┘

All records were written as inserts into new file groups — this is expected for clustering, which rewrites data from old file groups into new ones.
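That closing observation can be turned into a quick sanity check. The sketch below is hypothetical, and its names are illustrative rather than the PR's. It flags any written file that is not a brand-new file group, any update or delete, any file group that is both written and replaced, and any partition whose file groups were replaced without a new file being written. These are heuristics drawn from this transcript, not rules that Hudi enforces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileWriteStat:
    partition: str
    file_group: str
    prev_commit: Optional[str]  # None means the file group is brand new
    inserts: int
    updates: int
    deletes: int


def clustering_anomalies(stats: list[FileWriteStat], replaced: dict[str, list[str]]) -> list[str]:
    """Return warnings; an empty list means the write stats look like a clean clustering."""
    warnings = []
    replaced_ids = {fg for groups in replaced.values() for fg in groups}
    written_partitions = {s.partition for s in stats}
    for s in stats:
        if s.prev_commit is not None:
            warnings.append(f"{s.file_group}: expected a new file group, prev commit is {s.prev_commit}")
        if s.updates or s.deletes:
            warnings.append(f"{s.file_group}: expected inserts only, saw updates or deletes")
        if s.file_group in replaced_ids:
            warnings.append(f"{s.file_group}: written and replaced in the same commit")
    for partition in sorted(set(replaced) - written_partitions):
        warnings.append(f"{partition}: file groups replaced but no new file written")
    return warnings


# Write stats and replaced file groups from replacecommit 20260422001330433 above.
stats = [
    FileWriteStat("san_francisco", "6014f1fa-25f1-41cf-b91f-91737c25691a-0", None, 3, 0, 0),
    FileWriteStat("sao_paulo", "bc88c4f3-b6af-4928-8b9a-8f23bd776c14-0", None, 1, 0, 0),
    FileWriteStat("chennai", "eec5aa74-7eab-42e9-a67a-37c354dd9288-0", None, 1, 0, 0),
]
replaced = {
    "san_francisco": ["415dbece-a4c1-42e3-9eae-c9b871634675-0"],
    "sao_paulo": ["b0b31443-70a7-4123-9c06-4233b57997ac-0"],
    "chennai": ["30ae9920-ee63-40cb-9527-64661b63dcbb-0"],
}
print(clustering_anomalies(stats, replaced))  # → []
```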


@hudi-agent hudi-agent left a comment


🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the docs update! The README clearly explains the MCP server's purpose, tool inventory, safety tiers, and client configuration. A few specifics may need verification: the example JAR versions pair hudi-spark3.5-bundle with 0.14.1 (Spark 3.5 support did not land until a later Hudi release), pytest isn't listed in requirements.txt even though the Running Tests section uses it, the supported Java versions may need to be broadened, and the Claude Code config path/location could be clarified. Please have a Hudi committer or PMC member verify the version examples and finalize review.


| Variable | Description | Example |
|----------|-------------|---------|
| `SPARK_HOME` | Path to your Spark installation | `/opt/spark-3.5.3-bin-hadoop3` |

🤖 The example pairs hudi-spark3.5-bundle_2.12-0.14.1.jar with Hudi 0.14.1, but Spark 3.5 support wasn't added until a later Hudi release (0.15.x/1.x) — this artifact coordinate likely doesn't exist on Maven Central. Could we point to a version combination that actually ships a Spark 3.5 bundle (or show a Spark 3.4 bundle with 0.14.1)? This appears in multiple places (the env-variable table, the Running the Server section, and the Claude Desktop / Claude Code client configs).

- Generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.


### Environment Variables

The following environment variables must be set before starting the server:

🤖 Is Java 8 or 11 still the intended supported range? Recent Hudi releases also support Java 17, and since this tool will commonly be used against current Hudi installations, it may help to list Java 17 as well (or explicitly state which Hudi versions this is tested against).



```bash
cd scripts/hudi-cli-mcp
python -m venv venv
```

🤖 This section states the only dependency is fastmcp>=2.0.0, but the Running Tests section below uses python -m pytest. It would help to either add pytest to requirements.txt (or a requirements-dev.txt) and mention it here, or call out that contributors need to pip install pytest separately before running tests.


"command": "/path/to/hudi/scripts/hudi-cli-mcp/venv/bin/python",
"args": ["/path/to/hudi/scripts/hudi-cli-mcp/server.py"],
"env": {
"SPARK_HOME": "/path/to/spark",

🤖 Could we confirm that ~/.claude/mcp.json is the correct configuration location for Claude Code? Claude Code typically manages MCP servers via the claude mcp add CLI (writing to ~/.claude.json or a project-scoped .mcp.json), so a user who creates ~/.claude/mcp.json manually may find it isn't picked up. Linking to the official Claude Code MCP docs, or showing the claude mcp add command, would be more reliable.



## Overview

The server wraps the Hudi CLI (`hudi-cli`) in a structured MCP interface with:

🤖 Minor: the README refers to the Hudi CLI as hudi-cli, but the launcher in the Hudi repo is hudi-cli.sh (under hudi-cli/). A one-line note on how the server locates/invokes the CLI binary (env var? PATH lookup? bundled bundles?) would help operators who don't already know the layout.


@hudi-bot

CI report:

@hudi-bot supports the following commands:
  • `@hudi-bot run azure`: re-run the last Azure build

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.97%. Comparing base (ddbdbb9) to head (503754d).
⚠️ Report is 38 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18554      +/-   ##
============================================
- Coverage     68.85%   66.97%   -1.88%     
+ Complexity    28473    27765     -708     
============================================
  Files          2475     2475              
  Lines        136672   136672              
  Branches      16631    16631              
============================================
- Hits          94110    91542    -2568     
- Misses        34999    37673    +2674     
+ Partials       7563     7457     -106     
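The coverage percentages in the diff above can be reproduced from its Hits, Misses, and Partials rows. On both sides these sum to the 136,672 tracked lines. Below is a quick Python check. It assumes Codecov counts only full hits toward coverage and truncates (rather than rounds) to two decimals. The base value works out to 68.858%, which is shown as 68.85%, so truncation fits these numbers.

```python
def codecov_percent(hits: int, misses: int, partials: int) -> str:
    """Coverage as Hits / (Hits + Misses + Partials), truncated to two decimals."""
    lines = hits + misses + partials
    basis_points = hits * 10_000 // lines  # integer math avoids float rounding surprises
    return f"{basis_points // 100}.{basis_points % 100:02d}"


print(codecov_percent(94110, 34999, 7563))  # master
print(codecov_percent(91542, 37673, 7457))  # this PR's head
```

The two results are 68.85 and 66.97, a difference of 1.88 points, which matches the -1.88% in the diff.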
| Flag | Coverage Δ |
|------|------------|
| common-and-other-modules | 44.46% <ø> (-0.01%) ⬇️ |
| hadoop-mr-java-client | 44.78% <ø> (-0.01%) ⬇️ |
| spark-client-hadoop-common | 48.56% <ø> (+<0.01%) ⬆️ |
| spark-java-tests | 49.40% <ø> (+0.01%) ⬆️ |
| spark-scala-tests | 45.32% <ø> (+<0.01%) ⬆️ |
| utilities | 0.00% <ø> (-38.04%) ⬇️ |

Flags with carried forward coverage won't be shown.
see 132 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress


5 participants