[HUDI-18691] Honor IF NOT EXISTS when creating indexes#18699
Conversation
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR threads an ignoreIfExists flag through HoodieSparkIndexClient.create so that CREATE INDEX IF NOT EXISTS is honored for record, secondary, expression, bloom_filters, and column_stats indexes, and adds tests covering the new path. The existence-check prefix logic mirrors what HoodieIndexUtils.getSecondaryOrExpressionIndexDefinition already uses, and the default behavior (without IF NOT EXISTS) is preserved. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One small readability suggestion on the guard pattern in createExpressionOrSecondaryIndex; rest of the change is clean.
cc @yihua
| String fullIndexName = indexType.equals(PARTITION_NAME_SECONDARY_INDEX) | ||
| ? PARTITION_NAME_SECONDARY_INDEX_PREFIX + userIndexName | ||
| : PARTITION_NAME_EXPRESSION_INDEX_PREFIX + userIndexName; | ||
| if (indexExists(metaClient, fullIndexName) && ignoreIfExists) { |
There was a problem hiding this comment.
🤖 nit: the && guard here reads differently from the nested-if pattern used in createRecordIndex (line ~117). A future reader might wonder what happens when indexExists && !ignoreIfExists — not immediately obvious that the throw is handled further down. Could you use the same nested-if shape for consistency, e.g. if (indexExists(...)) { if (ignoreIfExists) { return; } }?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
There was a problem hiding this comment.
Done, updated this guard to match the nested shape used in createRecordIndex. Verified with git diff --check and the hudi-spark-client Spark 3.5 build.
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for working on this! The PR threads an ignoreIfExists flag from CreateIndexCommand through HoodieSparkIndexClient.create so that CREATE INDEX IF NOT EXISTS short-circuits when the index already exists, while leaving the existing duplicate-error path intact for plain CREATE INDEX. I traced the record-index and secondary/expression-index paths; the fullIndexName prefix logic matches what HoodieIndexUtils.getSecondaryOrExpressionIndexDefinition uses for its own duplicate check, so behavior stays consistent when ignoreIfExists is false. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One small simplification suggestion below; otherwise the change reads cleanly.
cc @yihua
| String fullIndexName = indexType.equals(PARTITION_NAME_SECONDARY_INDEX) | ||
| ? PARTITION_NAME_SECONDARY_INDEX_PREFIX + userIndexName | ||
| : PARTITION_NAME_EXPRESSION_INDEX_PREFIX + userIndexName; | ||
| if (indexExists(metaClient, fullIndexName)) { |
There was a problem hiding this comment.
🤖 nit: since there's no throw after the inner branch (unlike createRecordIndex), this nested if is just a conjunction — could you flatten to if (ignoreIfExists && indexExists(metaClient, fullIndexName)) { ... } for readability?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
There was a problem hiding this comment.
Thanks. I kept the nested shape here to match the record-index guard above and make the plain CREATE INDEX path fall through to the existing duplicate-index check in HoodieIndexUtils.getSecondaryOrExpressionIndexDefinition. That keeps the two CREATE INDEX paths visually consistent.
|
Thank you for the contribution, can you please help to edit the PR using the template provided so that the complaince check passes? Thank you! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #18699 +/- ##
============================================
- Coverage 67.92% 67.87% -0.05%
+ Complexity 29003 28978 -25
============================================
Files 2522 2522
Lines 141166 141181 +15
Branches 17506 17509 +3
============================================
- Hits 95881 95826 -55
- Misses 37415 37489 +74
+ Partials 7870 7866 -4
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
I updated the PR description to match the current Hudi PR template sections. The new PR Compliance run triggered by the edit is currently in action_required state and appears to need maintainer approval.\n\nFor the Java CI failure, the actual |
Describe the issue this Pull Request addresses
Closes #18691.
Spark SQL parses
IF NOT EXISTSforCREATE INDEX, but the parsed flag was not propagated into the Spark index client. As a result, duplicate index creation still failed even when users explicitly requested idempotent behavior.Summary and Changelog
This pull request honors
IF NOT EXISTSfor Spark SQLCREATE INDEXstatements.Changes:
ignoreIfExistsflag from Spark SQLCREATE INDEXcommands into the Spark index client.IF NOT EXISTSis used.IF NOT EXISTSis not specified.No code was copied from external sources.
Impact
Low user-facing impact. This makes
CREATE INDEX IF NOT EXISTSbehave as expected for existing indexes while keeping the existing strict failure behavior for plainCREATE INDEX.There is no public API change, storage format change, or expected performance impact.
Risk Level
low
The change is scoped to Spark SQL index creation and preserves the existing non-
IF NOT EXISTSfailure path. Verification covered both syntax handling and secondary index behavior.Documentation Update
none
This fixes existing command semantics and does not add a new user-facing command, config, or API.
Contributor's checklist
Testing:
git diff --checkmvn -pl hudi-spark-datasource/hudi-spark -am -Pspark3.5 -DskipTests -DskipITs -DskipUTs -DskipFTs -DskipDocker -Drat.skip=true -Dmaven.javadoc.skip=true installmvn -pl hudi-spark-datasource/hudi-spark -Pspark3.5 -DwildcardSuites=org.apache.spark.sql.hudi.feature.index.TestIndexSyntax -Drat.skip=true org.scalatest:scalatest-maven-plugin:2.2.0:testmvn -pl hudi-spark-datasource/hudi-spark -Pspark3.5 -DwildcardSuites=org.apache.spark.sql.hudi.feature.index.TestSecondaryIndex -Drat.skip=true org.scalatest:scalatest-maven-plugin:2.2.0:test