Adding hudi-cli-mcp to assist AI assistants to help and manage hudi tables #18554

nsivabalan wants to merge 1 commit into apache:master
Conversation
Sample commands that I was able to try out:

❯ connect to /tmp/trips_table
⏺ Here's what's at /tmp/trips_table: [table output omitted]
  Partitions: chennai, san_francisco, sao_paulo
  Timeline: 5 completed commits + 1 inflight, plus 2 clean actions.
  What would you like to do with this table? I can help with inspecting commits, reading data, analyzing the timeline, running CLI commands, etc.

❯ analyze the timeline
⏺ Timeline Analysis for /tmp/trips_table
  Schema fields: ts (long), uuid (string), rider (string), driver (string), fare (double), city (string)
  Commit History, Clean Actions: [tables omitted]
  File Lineage: original file group (commits 1-5, one file per partition); after clustering, commit 6 replaced all original files with new ones. [tables omitted]
  Key Observations: [omitted]
  Would you like me to dig deeper into the inflight commit, inspect the actual parquet data, or check the archived timeline?

❯ timeline show incomplete
⏺ There's one incomplete instant on the timeline: [table omitted]
  Details from the inflight metadata: [omitted]
  This looks like a failed or interrupted upsert that was left dangling. If you want to recover the table, this instant would need to be rolled back (commit rollback).

❯ metadata timeline show incomplete
⏺ The metadata table timeline has no incomplete instants. All deltacommits are fully completed (each has .requested, .inflight, and .deltacommit). [table omitted]
  Notable: the inflight data table commit (20260422001401481) has no corresponding entry in the metadata table timeline, meaning the metadata table was never updated for that failed commit.

❯ show fsview latest
⏺ Bash(for partition in chennai san_francisco sao_paulo; do echo "=== $partition ==="; ls -la /tmp/trips_table/$partition/*.parquet 2>/dev/null; done)
⏺ Latest File System View
  The latest valid file slice per file group is determined by the last completed commit. Since 20260422001401481 is inflight (not committed), those files are not valid for reads.
  chennai (1 record): [table omitted]; stale slices still on disk (not yet cleaned); inflight (invalid): eec5aa74-..._20260422001401481.parquet, 0 bytes (empty file)
  san_francisco (3 records): [table omitted]; stale slices still on disk; inflight (invalid): 6014f1fa-..._20260422001401481.parquet (436 KB, data written but not committed)
  sao_paulo (1 record): [table omitted]; stale slices still on disk; inflight (invalid): bc88c4f3-..._20260422001401481.parquet (436 KB)
  Summary: [omitted]

❯ Can you not trim or shorten the content while printing. I could not see the full file path or file group id.
⏺ Understood. Let me redo that with full values.
  Latest File System View
  chennai (1 record): active file group eec5aa74-7eab-42e9-a67a-37c354dd9288-0; replaced file group 30ae9920-ee63-40cb-9527-64661b63dcbb-0 (replaced by clustering at 20260422001330433)
  san_francisco (3 records): active file group 6014f1fa-25f1-41cf-b91f-91737c25691a-0; replaced file group 415dbece-a4c1-42e3-9eae-c9b871634675-0 (replaced by clustering at 20260422001330433)
  sao_paulo (1 record): active file group bc88c4f3-b6af-4928-8b9a-8f23bd776c14-0; replaced file group b0b31443-70a7-4123-9c06-4233b57997ac-0 (replaced by clustering at 20260422001330433)
  Summary: [omitted]

❯ can you show lineage for file group 30ae9920-ee63-40cb-9527-64661b63dcbb-0
⏺ File Group Lineage: 30ae9920-ee63-40cb-9527-64661b63dcbb-0 (partition: chennai) [table omitted]
  Successor file group: eec5aa74-7eab-42e9-a67a-37c354dd9288-0 (created by replacecommit/CLUSTER at 20260422001330433)
  Cleaned file slices, and slices still on disk (not yet cleaned): [omitted]
  Lifecycle: created at commit 20260422001020750 with 1 insert, received 4 successive upserts (1 record updated each time), then replaced by clustering at 20260422001330433.

❯ stats filesizes
⏺ Bash(for f in $(find /tmp/trips_table -name "*.parquet" -not -path "*/.hoodie/*" | sort); do stat -f "%z" "$f" | tr '\n' ' '; echo "$f"; done)
⏺ File Sizes, Summary Statistics, Per-Partition Breakdown: [tables omitted]
  Note: excluding the 0-byte inflight file in chennai, all files are tightly clustered around 436 KB, which is almost entirely Parquet metadata/overhead given there are only 1-3 records per file.

❯ commit showpartitions
⏺ Which commit instant? Here are the available ones: [list omitted]

❯ 20260422001326843
⏺ Commit 20260422001326843: Partition Details, Runtime Stats [tables omitted]

❯ commit show_write_stats
⏺ Which commit instant?

❯ 20260422001330433
⏺ Write Stats for Commit 20260422001330433 (replacecommit / CLUSTER): Per-File Write Stats, Replaced File Groups, Runtime Stats, Totals [tables omitted]
  All records were written as inserts into new file groups; this is expected for clustering, which rewrites data from old file groups into new ones.
hudi-agent left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the docs update! The README clearly explains the MCP server's purpose, tool inventory, safety tiers, and client configuration. A few specifics may need verification: the example JAR versions pair hudi-spark3.5-bundle with 0.14.1 (Spark 3.5 support did not land until a later Hudi release), pytest isn't listed in requirements.txt even though the Running Tests section uses it, the supported Java versions may need to be broadened, and the Claude Code config path/location could be clarified. Please have a Hudi committer or PMC member verify the version examples and finalize the review.
> | Variable | Description | Example |
> |----------|-------------|---------|
> | `SPARK_HOME` | Path to your Spark installation | `/opt/spark-3.5.3-bin-hadoop3` |
🤖 The example pairs hudi-spark3.5-bundle_2.12-0.14.1.jar with Hudi 0.14.1, but Spark 3.5 support wasn't added until a later Hudi release (0.15.x/1.x) — this artifact coordinate likely doesn't exist on Maven Central. Could we point to a version combination that actually ships a Spark 3.5 bundle (or show a Spark 3.4 bundle with 0.14.1)? This appears in multiple places (the env-variable table, the Running the Server section, and the Claude Desktop / Claude Code client configs).
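To make the pairing concrete, a small helper along these lines could accompany the docs. The Spark-line-to-bundle mapping below is an assumption to be verified against Maven Central; the function name is invented for illustration.

```shell
# Hypothetical helper mapping a Spark line to its Hudi bundle artifact id.
# Artifact names follow the hudi-sparkX.Y-bundle_2.12 convention; verify the
# exact coordinates on Maven Central before documenting them.
hudi_bundle_for_spark() {
  case "$1" in
    3.3) echo "hudi-spark3.3-bundle_2.12" ;;
    3.4) echo "hudi-spark3.4-bundle_2.12" ;;  # available for Hudi 0.14.x
    3.5) echo "hudi-spark3.5-bundle_2.12" ;;  # needs a later Hudi release (0.15.x/1.x)
    *)   echo "unsupported Spark line: $1" >&2; return 1 ;;
  esac
}

hudi_bundle_for_spark 3.5   # prints hudi-spark3.5-bundle_2.12
```

A README table generated from one source of truth like this would keep the env-variable table, the Running the Server section, and the client configs consistent.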
> ### Environment Variables
>
> The following environment variables must be set before starting the server:
🤖 Is Java 8 or 11 still the intended supported range? Recent Hudi releases also support Java 17, and since this tool will commonly be used against current Hudi installations, it may help to list Java 17 as well (or explicitly state which Hudi versions this is tested against).
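If the supported range is widened, the launcher could guard it up front. This sketch is illustrative only: the helper names and the 8/11/17 allow-list are assumptions, and it parses a version string rather than invoking `java -version` itself.

```shell
# Extract the Java feature version from a version string:
# "1.8.0_392" -> 8 (legacy 1.x scheme), "17.0.9" -> 17.
java_feature_version() {
  v="$1"
  case "$v" in
    1.*) echo "${v#1.}" | cut -d. -f1 ;;
    *)   echo "$v" | cut -d. -f1 ;;
  esac
}

# Assumed supported set; adjust once the tested Hudi/Java matrix is confirmed.
is_supported_java() {
  f=$(java_feature_version "$1")
  case "$f" in 8|11|17) return 0 ;; *) return 1 ;; esac
}
```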
> ```bash
> cd scripts/hudi-cli-mcp
> python -m venv venv
> ```
🤖 This section states the only dependency is fastmcp>=2.0.0, but the Running Tests section below uses python -m pytest. It would help to either add pytest to requirements.txt (or a requirements-dev.txt) and mention it here, or call out that contributors need to pip install pytest separately before running tests.
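One hedged way to close the gap, assuming a separate requirements-dev.txt (a filename this sketch invents; the PR does not ship one):

```shell
# Sketch only: write test-only dependencies to a dev requirements file.
# Using mktemp here so the sketch is self-contained; in the repo this would
# be scripts/hudi-cli-mcp/requirements-dev.txt.
req_dev=$(mktemp)
printf 'pytest>=7.0\n' > "$req_dev"
cat "$req_dev"   # prints pytest>=7.0

# Inside the venv, contributors would then run (not executed here):
#   pip install -r requirements.txt -r requirements-dev.txt
#   python -m pytest
```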
> "command": "/path/to/hudi/scripts/hudi-cli-mcp/venv/bin/python",
> "args": ["/path/to/hudi/scripts/hudi-cli-mcp/server.py"],
> "env": {
>   "SPARK_HOME": "/path/to/spark",
🤖 Could we confirm that ~/.claude/mcp.json is the correct configuration location for Claude Code? Claude Code typically manages MCP servers via the claude mcp add CLI (writing to ~/.claude.json or a project-scoped .mcp.json), so a user who creates ~/.claude/mcp.json manually may find it isn't picked up. Linking to the official Claude Code MCP docs, or showing the claude mcp add command, would be more reliable.
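If hand-editing ~/.claude/mcp.json turns out not to be picked up, CLI registration is the alternative. This sketch only prints the candidate command rather than running it; the paths are the README's placeholders, and the `claude mcp add` flags shown should be checked against the official Claude Code docs.

```shell
# Build (but do not execute) a registration command for the MCP server.
VENV_PY=/path/to/hudi/scripts/hudi-cli-mcp/venv/bin/python
SERVER=/path/to/hudi/scripts/hudi-cli-mcp/server.py
cmd="claude mcp add hudi-cli -e SPARK_HOME=/path/to/spark -- $VENV_PY $SERVER"
echo "$cmd"
```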
> ## Overview
>
> The server wraps the Hudi CLI (`hudi-cli`) in a structured MCP interface with:
🤖 Minor: the README refers to the Hudi CLI as hudi-cli, but the launcher in the Hudi repo is hudi-cli.sh (under hudi-cli/). A one-line note on how the server locates/invokes the CLI binary (env var? PATH lookup? bundled bundles?) would help operators who don't already know the layout.
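For illustration, a resolution order such a note could document might look like this. The HUDI_CLI_PATH variable and the fallback to a PATH lookup of hudi-cli.sh are assumptions, not the PR's actual mechanism.

```shell
# Hedged sketch: locate the Hudi CLI launcher, preferring an explicit env
# var, then falling back to whatever hudi-cli.sh is on PATH.
find_hudi_cli() {
  if [ -n "${HUDI_CLI_PATH:-}" ] && [ -x "$HUDI_CLI_PATH" ]; then
    echo "$HUDI_CLI_PATH"
  elif command -v hudi-cli.sh >/dev/null 2>&1; then
    command -v hudi-cli.sh
  else
    echo "hudi-cli launcher not found" >&2
    return 1
  fi
}
```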
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## master #18554 +/- ##
============================================
- Coverage 68.85% 66.97% -1.88%
+ Complexity 28473 27765 -708
============================================
Files 2475 2475
Lines 136672 136672
Branches 16631 16631
============================================
- Hits 94110 91542 -2568
- Misses 34999 37673 +2674
+ Partials 7563 7457 -106
Describe the issue this Pull Request addresses
Adds an MCP (Model Context Protocol) server for the Hudi CLI, enabling AI assistants (Claude, ChatGPT, etc.) to inspect, diagnose, and operate on Hudi tables through natural language.
This bridges the gap between Hudi's powerful CLI tooling and the growing ecosystem of AI-powered developer tools.
Summary and Changelog
Introduces scripts/hudi-cli-mcp, a Python-based MCP server that wraps the Hudi CLI in a structured interface with 40+ tools.
No existing code was modified. All new code is under scripts/hudi-cli-mcp/.
Impact
No impact on existing Hudi code, APIs, or storage format. This is a standalone Python tool under scripts/ that invokes the Hudi CLI as a subprocess.
Risk Level
none
This is a new, additive script with no changes to existing Hudi modules. It shells out to the existing hudi-cli binary and does not modify any table state without explicit user
confirmation through the safety protocol.
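The safety protocol mentioned above can be pictured as a simple gate on mutating commands; the classification below is illustrative only, not the PR's actual tier list.

```shell
# Illustrative safety gate: return 0 (needs confirmation) for commands that
# mutate table state, 1 for read-only inspection. Which commands fall in
# each tier here is an assumption for the sketch.
requires_confirmation() {
  case "$1" in
    "commit rollback"*|"savepoint delete"*|"clean run"*) return 0 ;;  # mutating
    *) return 1 ;;  # read-only commands pass straight through to the CLI
  esac
}
```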
Documentation Update
A detailed README.md is included in scripts/hudi-cli-mcp/ covering:
A follow-up update to the Hudi website can be done to add this to the CLI/tooling documentation.
Contributor's checklist