chore: add timing logs for file index partition and file listing by suryaprasanna · Pull Request #18417 · apache/hudi

suryaprasanna · 2026-03-29T20:22:55Z

Describe the issue this Pull Request addresses

File index operations currently do not provide enough visibility into how much time is spent listing partitions, fetching files from metadata, and filtering files into file slices. This makes it harder to debug slow queries and identify where file-index time is being spent.

Summary and Changelog

Adds timing and context logs in BaseHoodieTableFileIndex for key file-index stages.

Log time taken to list partition paths with and without partition predicates
Log cache miss counts before fetching uncached partition files
Log time taken by getAllFilesInPartitions
Log time taken by filterFiles while building file slices
Include table name in partition listing failure messages
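
As a rough illustration of the pattern these changes follow, the sketch below times a stage and records the message in a `finally` block, so the timing is emitted even when the stage throws. `StageTimer`, `timed`, and the in-memory `MESSAGES` list are hypothetical names used only for illustration; the list stands in for a real logger, and this is not the code in BaseHoodieTableFileIndex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper, not part of Hudi: times one stage and records a message.
final class StageTimer {
  // Stand-in for a logger so the sketch has no external dependencies.
  static final List<String> MESSAGES = new ArrayList<>();

  static <T> T timed(String tableName, String stage, Supplier<T> work) {
    long startNanos = System.nanoTime();
    try {
      return work.get();
    } finally {
      // Runs on both the success and the failure path.
      long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
      MESSAGES.add(String.format("On %s, %s took %dms", tableName, stage, elapsedMs));
    }
  }
}
```

A caller would wrap the stage it wants measured, for example `StageTimer.timed(tableName, "listing partition paths", () -> listPartitions())`, where `listPartitions` is again a placeholder.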

Impact

No public API or user-facing behavior change. This improves observability for file-index execution and helps diagnose performance issues in partition and file listing paths.

Risk Level

low

This change only adds logging and slightly updates an internal exception message with table context. No functional behavior is intended to change.

Documentation Update

none

Contributor's checklist

Read through contributor's guide
Enough context is provided in the sections above
Adequate tests were added if applicable

codecov-commenter · 2026-03-29T21:42:12Z

Codecov Report

❌ Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 68.22%. Comparing base (1eb97b3) to head (6b4efb3).
⚠️ Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
...java/org/apache/hudi/BaseHoodieTableFileIndex.java	94.73%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##             master   #18417   +/-   ##
=========================================
  Coverage     68.21%   68.22%           
+ Complexity    27709    27698   -11     
=========================================
  Files          2440     2440           
  Lines        134249   134265   +16     
  Branches      16179    16181    +2     
=========================================
+ Hits          91578    91599   +21     
+ Misses        35565    35560    -5     
  Partials       7106     7106

Flag	Coverage Δ
common-and-other-modules	`44.32% <78.94%> (+<0.01%)`	⬆️
hadoop-mr-java-client	`45.01% <78.94%> (+0.08%)`	⬆️
spark-client-hadoop-common	`48.32% <78.94%> (+<0.01%)`	⬆️
spark-java-tests	`48.72% <94.73%> (+0.01%)`	⬆️
spark-scala-tests	`45.25% <94.73%> (+0.01%)`	⬆️
utilities	`38.40% <78.94%> (+0.02%)`	⬆️

Files with missing lines	Coverage Δ
...java/org/apache/hudi/BaseHoodieTableFileIndex.java	`84.54% <94.73%> (+1.06%)`	⬆️

... and 14 files with indirect coverage changes

hudi-bot · 2026-03-29T21:45:54Z

CI report:

6b4efb3 Azure: SUCCESS

voonhous

Main concern here is that the logs are too noisy.

Are these logs only used for debugging?
Should we use log.debug(...) for routine timing logs instead of log.info(...)?
We could log at info level only when something is slow, e.g. if (elapsed > threshold) log.info(...), or unexpected (e.g. a large cache miss ratio).
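
A minimal sketch of what that suggestion could look like, assuming a fixed slowness threshold. `TimingLogPolicy`, `levelFor`, and `SLOW_THRESHOLD_MS` are hypothetical names, the 1000 ms value is an arbitrary placeholder, and `java.util.logging` is used only to keep the example dependency-free (its `FINE` level plays the role of debug); the PR itself logs through `log.info(...)` with `{}` placeholders.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical policy class, not part of Hudi.
final class TimingLogPolicy {
  private static final Logger LOG = Logger.getLogger(TimingLogPolicy.class.getName());

  // Assumed threshold; a real value would need tuning or a config knob.
  static final long SLOW_THRESHOLD_MS = 1000;

  // Routine timings go to FINE (debug); only slow ones are raised to INFO.
  static Level levelFor(long elapsedMs, long thresholdMs) {
    return elapsedMs > thresholdMs ? Level.INFO : Level.FINE;
  }

  static void logFilterTime(String tableName, long elapsedMs, int fileCount, int partitionCount) {
    LOG.log(levelFor(elapsedMs, SLOW_THRESHOLD_MS),
        "On {0}, it took {1}ms to filter {2} files into file slices across {3} partitions",
        new Object[] {tableName, elapsedMs, fileCount, partitionCount});
  }
}
```

With this shape, a 1 ms call stays out of info-level output while a call that exceeds the threshold is still visible by default.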

voonhous · 2026-03-30T07:54:57Z

hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java

                      .collect(Collectors.toList())
          ));
+    } finally {
+      log.info("On {} with query instant as {}, it took {}ms to filter {} files into file slices across {} partitions",


My main concern is that the logs generated here will be too noisy.

This fires on every single query, even if it took 1ms. For a busy Spark application hitting multiple Hudi tables, this alone generates one info line per query.

voonhous · 2026-03-30T07:57:01Z

hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java

+      log.info("On {}, out of {} partition paths, {} are missing from cache. Loading them.",
+          metaClient.getTableConfig().getTableName(), partitionPaths.size(), missingPartitionPaths.size());


Guard the "missing from cache" log with if (missingPartitionPaths.size() > 0); there is no point logging when everything is cached.
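
One way the guard could be sketched, using `isEmpty()` as the equivalent of the `size() > 0` check. `CacheMissLogging` and `cacheMissMessage` are hypothetical names; the method returns the message instead of logging it so the example stays self-contained, and the wording mirrors the log line in the diff above.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Hudi.
final class CacheMissLogging {
  // Returns a message only when at least one partition is missing from the cache.
  static Optional<String> cacheMissMessage(String tableName,
                                           List<String> partitionPaths,
                                           List<String> missingPartitionPaths) {
    if (missingPartitionPaths.isEmpty()) {
      // Everything is cached, so there is nothing worth reporting.
      return Optional.empty();
    }
    return Optional.of(String.format(
        "On %s, out of %d partition paths, %d are missing from cache. Loading them.",
        tableName, partitionPaths.size(), missingPartitionPaths.size()));
  }
}
```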

Log time taken in listing partitions and list file under them

6b4efb3

github-actions bot added the size:S label (PR with lines of changes in (10, 100]) on Mar 29, 2026

voonhous reviewed Mar 30, 2026

suryaprasanna wants to merge 1 commit into apache:master from suryaprasanna:file-index-logging

4 participants
