[HUDI-18691] Honor IF NOT EXISTS when creating indexes by 201573 · Pull Request #18699 · apache/hudi

201573 · 2026-05-07T17:28:43Z

Describe the issue this Pull Request addresses

Spark SQL parses IF NOT EXISTS for CREATE INDEX, but the parsed flag was not propagated into the Spark index client. As a result, duplicate index creation still failed even when users explicitly requested idempotent behavior.

Summary and Changelog

This pull request honors IF NOT EXISTS for Spark SQL CREATE INDEX statements.

Changes:

Pass the parsed ignoreIfExists flag from Spark SQL CREATE INDEX commands into the Spark index client.
Skip index creation when the index already exists and IF NOT EXISTS is used.
Preserve the existing duplicate-index failure behavior when IF NOT EXISTS is not specified.
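
The behavior listed above can be modeled with a small, self-contained sketch. This is not the Hudi implementation: `CreateIndexSketch`, its in-memory set of index names, and the `IllegalStateException` are hypothetical stand-ins chosen for illustration. Only the `ignoreIfExists` flag name comes from this PR.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the IF NOT EXISTS semantics described above; not Hudi code. */
final class CreateIndexSketch {
  // Hypothetical stand-in for the index definitions recorded in table metadata.
  private final Set<String> existingIndexes = new HashSet<>();

  /**
   * Returns true if the index was created and false if creation was skipped.
   * Throws when the index already exists and ignoreIfExists is false.
   */
  boolean create(String indexName, boolean ignoreIfExists) {
    if (existingIndexes.contains(indexName)) {
      if (ignoreIfExists) {
        return false; // CREATE INDEX IF NOT EXISTS: skip without error
      }
      // Plain CREATE INDEX keeps the strict duplicate-index failure.
      throw new IllegalStateException("Index already exists: " + indexName);
    }
    existingIndexes.add(indexName);
    return true;
  }

  public static void main(String[] args) {
    CreateIndexSketch client = new CreateIndexSketch();
    System.out.println(client.create("idx_city", false)); // first creation
    System.out.println(client.create("idx_city", true));  // duplicate, skipped
    try {
      client.create("idx_city", false);                   // duplicate, strict
    } catch (IllegalStateException e) {
      System.out.println("failed: " + e.getMessage());
    }
  }
}
```

Returning a boolean is a choice made for the sketch so the skip is observable; the real client may simply return without a value.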

No code was copied from external sources.

Impact

Low user-facing impact. This makes CREATE INDEX IF NOT EXISTS behave as expected for existing indexes while keeping the existing strict failure behavior for plain CREATE INDEX.

There is no public API change, storage format change, or expected performance impact.

Risk Level

low

The change is scoped to Spark SQL index creation and preserves the existing non-IF NOT EXISTS failure path. Verification covered both syntax handling and secondary index behavior.

Documentation Update

none

This fixes existing command semantics and does not add a new user-facing command, config, or API.

Contributor's checklist

Read through contributor's guide
Enough context is provided in the sections above
Adequate tests were added if applicable

Testing:

git diff --check
mvn -pl hudi-spark-datasource/hudi-spark -am -Pspark3.5 -DskipTests -DskipITs -DskipUTs -DskipFTs -DskipDocker -Drat.skip=true -Dmaven.javadoc.skip=true install
mvn -pl hudi-spark-datasource/hudi-spark -Pspark3.5 -DwildcardSuites=org.apache.spark.sql.hudi.feature.index.TestIndexSyntax -Drat.skip=true org.scalatest:scalatest-maven-plugin:2.2.0:test
mvn -pl hudi-spark-datasource/hudi-spark -Pspark3.5 -DwildcardSuites=org.apache.spark.sql.hudi.feature.index.TestSecondaryIndex -Drat.skip=true org.scalatest:scalatest-maven-plugin:2.2.0:test

hudi-agent

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR threads an ignoreIfExists flag through HoodieSparkIndexClient.create so that CREATE INDEX IF NOT EXISTS is honored for record, secondary, expression, bloom_filters, and column_stats indexes, and adds tests covering the new path. The existence-check prefix logic mirrors what HoodieIndexUtils.getSecondaryOrExpressionIndexDefinition already uses, and the default behavior (without IF NOT EXISTS) is preserved. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One small readability suggestion on the guard pattern in createExpressionOrSecondaryIndex; rest of the change is clean.

cc @yihua

hudi-agent · 2026-05-07T17:35:59Z

+    String fullIndexName = indexType.equals(PARTITION_NAME_SECONDARY_INDEX)
+        ? PARTITION_NAME_SECONDARY_INDEX_PREFIX + userIndexName
+        : PARTITION_NAME_EXPRESSION_INDEX_PREFIX + userIndexName;
+    if (indexExists(metaClient, fullIndexName) && ignoreIfExists) {


🤖 nit: the && guard here reads differently from the nested-if pattern used in createRecordIndex (line ~117). A future reader might wonder what happens when indexExists && !ignoreIfExists — not immediately obvious that the throw is handled further down. Could you use the same nested-if shape for consistency, e.g. if (indexExists(...)) { if (ignoreIfExists) { return; } }?

AI-generated; verify before applying. React 👍/👎 to flag quality.

Done, updated this guard to match the nested shape used in createRecordIndex. Verified with git diff --check and the hudi-spark-client Spark 3.5 build.
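
For readers unfamiliar with the name resolution in the hunk above, the following is a minimal sketch of the same ternary in isolation. The constant values are placeholders assumed for illustration; the real `PARTITION_NAME_*` constants are defined in Hudi and may differ. `IndexNameSketch` and `fullIndexName` are hypothetical names.

```java
/** Isolated illustration of the prefix selection; constant values are assumed. */
final class IndexNameSketch {
  // Placeholder values, not copied from Hudi.
  static final String SECONDARY_INDEX_TYPE = "secondary_index";
  static final String SECONDARY_INDEX_PREFIX = "secondary_index_";
  static final String EXPRESSION_INDEX_PREFIX = "expr_index_";

  /** Secondary indexes get one prefix; every other type falls back to the expression prefix. */
  static String fullIndexName(String indexType, String userIndexName) {
    return indexType.equals(SECONDARY_INDEX_TYPE)
        ? SECONDARY_INDEX_PREFIX + userIndexName
        : EXPRESSION_INDEX_PREFIX + userIndexName;
  }

  public static void main(String[] args) {
    System.out.println(fullIndexName("secondary_index", "idx_city"));
    System.out.println(fullIndexName("column_stats", "idx_ts"));
  }
}
```

The existence check has to use the prefixed name, since that is the form the duplicate check compares against according to the review summary.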

hudi-agent

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for working on this! The PR threads an ignoreIfExists flag from CreateIndexCommand through HoodieSparkIndexClient.create so that CREATE INDEX IF NOT EXISTS short-circuits when the index already exists, while leaving the existing duplicate-error path intact for plain CREATE INDEX. I traced the record-index and secondary/expression-index paths; the fullIndexName prefix logic matches what HoodieIndexUtils.getSecondaryOrExpressionIndexDefinition uses for its own duplicate check, so behavior stays consistent when ignoreIfExists is false. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One small simplification suggestion below; otherwise the change reads cleanly.

cc @yihua

hudi-agent · 2026-05-07T19:07:27Z

+    String fullIndexName = indexType.equals(PARTITION_NAME_SECONDARY_INDEX)
+        ? PARTITION_NAME_SECONDARY_INDEX_PREFIX + userIndexName
+        : PARTITION_NAME_EXPRESSION_INDEX_PREFIX + userIndexName;
+    if (indexExists(metaClient, fullIndexName)) {


🤖 nit: since there's no throw after the inner branch (unlike createRecordIndex), this nested if is just a conjunction — could you flatten to if (ignoreIfExists && indexExists(metaClient, fullIndexName)) { ... } for readability?

AI-generated; verify before applying. React 👍/👎 to flag quality.

Thanks. I kept the nested shape here to match the record-index guard above and make the plain CREATE INDEX path fall through to the existing duplicate-index check in HoodieIndexUtils.getSecondaryOrExpressionIndexDefinition. That keeps the two CREATE INDEX paths visually consistent.
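
The two guard shapes discussed in this thread can be compared directly. The sketch below is a hypothetical model, not code from the PR: it reduces each guard to a boolean "skip creation" decision and checks that they agree for every input, assuming nothing follows the inner branch.

```java
/** Compares the nested and flattened guard shapes; names are illustrative only. */
final class GuardShapeSketch {
  /** Nested shape kept in the PR: outer existence check, inner flag check. */
  static boolean skipNested(boolean exists, boolean ignoreIfExists) {
    if (exists) {
      if (ignoreIfExists) {
        return true; // skip creation
      }
      // No throw here: the duplicate check is assumed to run further down.
    }
    return false;
  }

  /** Flattened shape suggested in review. */
  static boolean skipFlat(boolean exists, boolean ignoreIfExists) {
    return ignoreIfExists && exists;
  }

  /** True when both shapes make the same decision for all four input combinations. */
  static boolean shapesAgree() {
    boolean[] values = {false, true};
    for (boolean exists : values) {
      for (boolean ignore : values) {
        if (skipNested(exists, ignore) != skipFlat(exists, ignore)) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println("shapes agree: " + shapesAgree());
  }
}
```

One practical difference is that `&&` short-circuits in Java, so placing `ignoreIfExists` first would avoid the existence lookup for plain CREATE INDEX. The nested shape always performs the lookup but mirrors the record-index guard.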

hudi-bot · 2026-05-07T20:00:13Z

CI report:

f93f2c6 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

voonhous · 2026-05-12T13:51:19Z

Thank you for the contribution. Could you please edit the PR description to follow the provided template so that the compliance check passes?

Thank you!

codecov-commenter · 2026-05-12T15:25:03Z

Codecov Report

❌ Patch coverage is 90.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.87%. Comparing base (34e9c7c) to head (f93f2c6).
⚠️ Report is 11 commits behind head on master.

Files with missing lines	Patch %	Lines
.../org/apache/hudi/index/HoodieSparkIndexClient.java	85.71%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #18699      +/-   ##
============================================
- Coverage     67.92%   67.87%   -0.05%     
+ Complexity    29003    28978      -25     
============================================
  Files          2522     2522              
  Lines        141166   141181      +15     
  Branches      17506    17509       +3     
============================================
- Hits          95881    95826      -55     
- Misses        37415    37489      +74     
+ Partials       7870     7866       -4

Flag	Coverage Δ
common-and-other-modules	`44.17% <0.00%> (-0.01%)`	⬇️
hadoop-mr-java-client	`45.00% <ø> (+<0.01%)`	⬆️
spark-client-hadoop-common	`48.34% <0.00%> (-0.01%)`	⬇️
spark-java-tests	`48.76% <40.00%> (-0.24%)`	⬇️
spark-scala-tests	`44.91% <90.00%> (+0.01%)`	⬆️
utilities	`37.63% <25.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
.../apache/spark/sql/hudi/command/IndexCommands.scala	`87.35% <100.00%> (+0.45%)`	⬆️
.../org/apache/hudi/index/HoodieSparkIndexClient.java	`85.61% <85.71%> (+1.28%)`	⬆️

... and 15 files with indirect coverage changes


201573 · 2026-05-12T16:30:12Z

I updated the PR description to match the current Hudi PR template sections. The new PR Compliance run triggered by the edit is currently in action_required state and appears to need maintainer approval.

For the Java CI failure, the actual Java UT 1 - Common & Spark step completed successfully; the failed step is Generate merged coverage report in the Spark 4.0 / Scala 2.13 Java 17 job. I do not have permission to rerun the upstream checks directly, so could a maintainer please approve/re-run the failed checks when convenient?

[HUDI-18691] Honor IF NOT EXISTS when creating indexes

9e64b9b

hudi-agent reviewed May 7, 2026

[HUDI-18691] Align index existence guard style

f93f2c6

github-actions Bot added the size:S PR with lines of changes in (10, 100] label May 7, 2026

hudi-agent reviewed May 7, 2026

[HUDI-18691] Honor IF NOT EXISTS when creating indexes #18699
201573 wants to merge 2 commits into apache:master from 201573:codex/hudi-18691-create-index-if-not-exists

5 participants
