Part of #18676. RFC-104 / design PR.
Scope
Add the user-facing SQL entry point (`CREATE INDEX ... USING vector_index`) that triggers the bootstrap pipeline from sub-issue 5.
Tasks
- Extend `hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/IndexCommands.scala` to recognize the `vector_index` index type.
- Extend `hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/HoodieSparkIndexClient.java` to:
  - Accept SQL options: `vectorColumn` (required), `numClusters` (optional, default from config), `fgPerCluster` (optional, default from config).
  - Validate that the column exists on the table and is of type `array<float>` or `array<double>`.
  - Persist the user-supplied params into `HoodieIndexDefinition` so the bootstrap can read them back.
  - Invoke the `ScheduleIndexActionExecutor` → metadata writer bootstrap path.
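The option handling and validation in `HoodieSparkIndexClient` could look roughly like the minimal Java sketch below. It assumes the table schema is available as a map from column name to Spark SQL type string (as `DataType.simpleString` renders it, e.g. `array<float>`), and it takes plain integers in place of the config defaults. It reads `vectorColumn` from the options map, as the task list describes. `VectorIndexParams`, `resolve`, and the positivity check on the numeric options are hypothetical, not existing Hudi APIs.

```java
import java.util.Map;

/**
 * Hypothetical sketch (not a Hudi API): resolves and validates the options of
 * CREATE INDEX ... USING vector_index before they are persisted.
 */
final class VectorIndexParams {
  final String vectorColumn;
  final int numClusters;
  final int fgPerCluster;

  private VectorIndexParams(String vectorColumn, int numClusters, int fgPerCluster) {
    this.vectorColumn = vectorColumn;
    this.numClusters = numClusters;
    this.fgPerCluster = fgPerCluster;
  }

  /**
   * @param sqlOptions          OPTIONS (...) pairs; SQL passes every value as a string
   * @param columnTypes         assumed schema shape: column name -> Spark simpleString type
   * @param defaultNumClusters  stand-in for the config default
   * @param defaultFgPerCluster stand-in for the config default
   */
  static VectorIndexParams resolve(Map<String, String> sqlOptions,
                                   Map<String, String> columnTypes,
                                   int defaultNumClusters,
                                   int defaultFgPerCluster) {
    // Required option: fail with a message that names the missing key.
    String column = sqlOptions.get("vectorColumn");
    if (column == null || column.isEmpty()) {
      throw new IllegalArgumentException("vector_index requires the 'vectorColumn' option");
    }
    // The column must exist on the table (exact-match lookup in this sketch).
    String type = columnTypes.get(column);
    if (type == null) {
      throw new IllegalArgumentException("Column '" + column + "' does not exist on the table");
    }
    // Only float or double arrays are accepted as embeddings.
    if (!type.equals("array<float>") && !type.equals("array<double>")) {
      throw new IllegalArgumentException("Column '" + column + "' has type " + type
          + "; vector_index requires array<float> or array<double>");
    }
    int numClusters = positiveInt(sqlOptions, "numClusters", defaultNumClusters);
    int fgPerCluster = positiveInt(sqlOptions, "fgPerCluster", defaultFgPerCluster);
    return new VectorIndexParams(column, numClusters, fgPerCluster);
  }

  // Optional integer option: falls back to the default when absent, rejects
  // non-integers and non-positive values (the positivity rule is an assumption).
  private static int positiveInt(Map<String, String> options, String key, int defaultValue) {
    String raw = options.get(key);
    if (raw == null) {
      return defaultValue;
    }
    int value;
    try {
      value = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Option '" + key + "' must be an integer, got '" + raw + "'");
    }
    if (value <= 0) {
      throw new IllegalArgumentException("Option '" + key + "' must be positive, got " + value);
    }
    return value;
  }
}
```

Since Spark resolves identifiers case-insensitively by default (`spark.sql.caseSensitive=false`), a real implementation would more likely look the column up through Spark's resolver than through an exact-match map.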
Example DDL the change must support:
```sql
CREATE INDEX my_vec_idx ON hudi_tbl
USING vector_index (embedding)
OPTIONS (numClusters = '128', fgPerCluster = '2');
```
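On the `IndexCommands.scala` side, the example DDL implies two checks, shown here as a minimal hypothetical Java sketch (the real change is in Scala): the `USING` clause names `vector_index`, matched case-insensitively here as an assumption, and the column list holds exactly one column, also an assumption based on the single `(embedding)` column in the example. `VectorIndexTypeCheck` and its methods are illustrative names, not Hudi APIs.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of recognizing USING vector_index (...) in a CREATE INDEX command. */
final class VectorIndexTypeCheck {
  static final String VECTOR_INDEX = "vector_index";

  // True when the USING clause names the vector index type, ignoring case and surrounding spaces.
  static boolean isVectorIndex(String usingClause) {
    return usingClause != null
        && usingClause.trim().toLowerCase(Locale.ROOT).equals(VECTOR_INDEX);
  }

  // Assumed rule: a vector index covers exactly one embedding column, e.g. (embedding).
  static String singleColumn(List<String> columns) {
    if (columns == null || columns.size() != 1) {
      throw new IllegalArgumentException(
          "vector_index expects exactly one column, got " + columns);
    }
    return columns.get(0);
  }
}
```

Any other index type would fall through to the existing handling in `IndexCommands.scala` unchanged.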
Tests
- Negative test: missing `vectorColumn` option → clear error.
- Negative test: non-array column → clear error.
- Positive test: valid DDL parses and persists the `HoodieIndexDefinition` correctly.
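The positive test amounts to a round trip: whatever the DDL path writes, the bootstrap must read back unchanged, including resolved defaults. A minimal hypothetical Java sketch follows; it assumes the params are stored as string key/value index options, which is the shape assumed here for `HoodieIndexDefinition`'s options, and `VectorIndexOptionsCodec` with its methods are illustrative names rather than Hudi APIs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: flatten vector index params to string options and read them back. */
final class VectorIndexOptionsCodec {
  static final String VECTOR_COLUMN = "vectorColumn";
  static final String NUM_CLUSTERS = "numClusters";
  static final String FG_PER_CLUSTER = "fgPerCluster";

  // Write side: the resolved values (defaults already applied) that the DDL path would persist.
  static Map<String, String> toOptions(String vectorColumn, int numClusters, int fgPerCluster) {
    Map<String, String> options = new HashMap<>();
    options.put(VECTOR_COLUMN, vectorColumn);
    options.put(NUM_CLUSTERS, Integer.toString(numClusters));
    options.put(FG_PER_CLUSTER, Integer.toString(fgPerCluster));
    return options;
  }

  // Read side: what the bootstrap would recover; fails loudly if a key was never persisted.
  static int readInt(Map<String, String> options, String key) {
    String raw = options.get(key);
    if (raw == null) {
      throw new IllegalStateException("Index definition is missing option '" + key + "'");
    }
    return Integer.parseInt(raw);
  }
}
```

Persisting the resolved defaults, not only the user-supplied values, keeps the bootstrap independent of config changes made after the index was created; whether that is the desired behavior is a design choice for this change.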
Depends on
- Sub-issues 1 and 5 (the partition type and the bootstrap implementation, respectively)
Out of scope
`DROP INDEX` and `REFRESH INDEX` for vector indexes (later milestone).