
[Vector Index] (Tracking) RaBitQ, read-path pruning, write path, maintenance #18857

@rahil-c

Description

Part of #18676. RFC-104 / design PR.

Scope

This is a tracking issue only. The items below are explicitly out of scope for the milestone-1 sub-issues (1–7) and are recorded here so they don't get lost. Each will be broken out into its own sub-issue once milestone 1 lands.

Deferred work

  • RaBitQ quantization: replace the raw array<float> payload with packed binary codes + optional norm scalar (see RaBitQEncoder.java, VectorQuantizer.java in the design PR).
  • Generation manifest & quantizer record: __manifest__ / __centroids__ / __quantizer__ rows for atomic generation activation.
  • Read-path pruning: VectorIndexPruner, VectorIndexMdtSearchUtils, RaBitQApproxDistanceUDF, VectorIndexSupport.scala.
  • Write path: assign incoming records to clusters at write time and write tombstones for deletes (see the RFC-104 write-path doc).
  • Maintenance: cluster-imbalance / centroid-drift detection, LIRE-style incremental rebalancing, generation rebuild.
  • Flink and Java engine support (milestone 1 is Spark-only).
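For the RaBitQ item, the following Python sketch illustrates what "packed binary codes + optional norm scalar" can look like and how a distance can be estimated from the code alone. It is a simplified, hypothetical illustration, not the `RaBitQEncoder.java` / `VectorQuantizer.java` code from the design PR: it omits the random orthogonal rotation that RaBitQ applies before taking signs (which is what gives RaBitQ's estimator its error guarantees), keeps the query in full precision, and stores one extra correction scalar besides the norm. `BinaryCode`, `encode`, and `approx_sq_distance` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class BinaryCode:
    bits: bytes      # packed sign bits of the residual x - c, one bit per dimension
    dim: int         # original vector dimension
    norm: float      # ||x - c||, the optional norm scalar
    self_ip: float   # <x_bar, o>, correction factor used by the estimator (assumption)


def encode(x, centroid):
    """Quantize x against its cluster centroid into packed sign bits plus two scalars."""
    residual = [a - b for a, b in zip(x, centroid)]
    dim = len(residual)
    norm = math.sqrt(sum(r * r for r in residual))
    packed = bytearray((dim + 7) // 8)
    for i, r in enumerate(residual):
        if r >= 0:  # a set bit means +1, a cleared bit means -1
            packed[i // 8] |= 1 << (i % 8)
    # With x_bar = sign(residual) / sqrt(dim) and o = residual / norm:
    # <x_bar, o> = sum(|r_i|) / (sqrt(dim) * norm).
    self_ip = sum(abs(r) for r in residual) / (math.sqrt(dim) * norm) if norm > 0 else 1.0
    return BinaryCode(bytes(packed), dim, norm, self_ip)


def _sign(code, i):
    """Decode the sign stored for dimension i."""
    return 1.0 if (code.bits[i // 8] >> (i % 8)) & 1 else -1.0


def approx_sq_distance(query, centroid, code):
    """Estimate ||query - x||^2 without access to the original float vector x."""
    q_res = [a - b for a, b in zip(query, centroid)]
    q_norm = math.sqrt(sum(v * v for v in q_res))
    if code.norm == 0 or q_norm == 0:
        # One residual is zero, so the cross term vanishes and the result is exact.
        return code.norm ** 2 + q_norm ** 2
    # <x_bar, q_unit>, computed from the sign bits only.
    ip = sum(_sign(code, i) * v for i, v in enumerate(q_res)) / (math.sqrt(code.dim) * q_norm)
    est_cos = ip / code.self_ip  # RaBitQ-style estimate of <o, q_unit>
    return code.norm ** 2 + q_norm ** 2 - 2 * code.norm * q_norm * est_cos
```

If the two scalars were stored as 4-byte floats, each vector would shrink from `4 * dim` bytes of `array<float>` to `ceil(dim / 8) + 8` bytes.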
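For the generation-manifest item, one way to read "atomic generation activation" is: write every `__centroids__` / `__quantizer__` row for the new generation first, then flip the `__manifest__` row last, so readers always resolve one complete generation. The sketch below assumes a plain dict stands in for the metadata table, that a single manifest write is atomic (in practice that guarantee would come from the table's commit, not from this code), and that rows are keyed as `__centroids__/<gen>`; the key layout and function names are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

MANIFEST_KEY = "__manifest__"


def stage_generation(table: Dict[str, Any], gen: int, centroids: Any, quantizer: Any) -> None:
    """Write the new generation's rows; readers ignore them until the manifest points here."""
    table[f"__centroids__/{gen}"] = centroids
    table[f"__quantizer__/{gen}"] = quantizer


def activate_generation(table: Dict[str, Any], gen: int) -> None:
    """Flip the active generation with a single manifest write (the commit point)."""
    if f"__centroids__/{gen}" not in table or f"__quantizer__/{gen}" not in table:
        raise ValueError(f"generation {gen} is not fully staged")
    table[MANIFEST_KEY] = {"active_generation": gen}


def read_active(table: Dict[str, Any]) -> Optional[Tuple[int, Any, Any]]:
    """Resolve the manifest first, then read only that generation's rows."""
    manifest = table.get(MANIFEST_KEY)
    if manifest is None:
        return None
    gen = manifest["active_generation"]
    return gen, table[f"__centroids__/{gen}"], table[f"__quantizer__/{gen}"]
```

Because readers never look up a generation the manifest does not name, a half-written generation is invisible rather than partially visible.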
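For the read-path pruning item, a minimal sketch of IVF-style pruning, assuming vectors are grouped by cluster id and only the `nprobe` clusters whose centroids are closest to the query are scanned. It uses exact squared L2 distance where the real path would presumably use an approximate distance such as the one `RaBitQApproxDistanceUDF` is meant to provide; `probe_clusters`, `search`, and `nprobe` are hypothetical names, not the API of `VectorIndexPruner` or `VectorIndexMdtSearchUtils`.

```python
import heapq
from typing import Dict, List, Tuple

Vector = List[float]


def sq_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def probe_clusters(query: Vector, centroids: Dict[int, Vector], nprobe: int) -> List[int]:
    """Return the ids of the nprobe centroids closest to the query, nearest first."""
    scored = ((sq_l2(query, c), cid) for cid, c in centroids.items())
    return [cid for _, cid in heapq.nsmallest(nprobe, scored)]


def search(query: Vector,
           centroids: Dict[int, Vector],
           clusters: Dict[int, List[Tuple[str, Vector]]],
           k: int,
           nprobe: int) -> List[str]:
    """Top-k record keys, scanning only the probed clusters; all others are pruned."""
    candidates = []
    for cid in probe_clusters(query, centroids, nprobe):
        for key, vec in clusters.get(cid, []):
            candidates.append((sq_l2(query, vec), key))
    return [key for _, key in heapq.nsmallest(k, candidates)]
```

Raising `nprobe` trades more cluster scans for higher recall; with `nprobe` equal to the number of clusters the search becomes exhaustive.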
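For the write-path item, a sketch of assigning records to their nearest centroid at write time and appending tombstones for deletes, so that readers can replay rows in write order. It assumes an in-memory `key_to_cluster` lookup (a real writer would need some index to find a deleted key's cluster) and also tombstones the old entry when an update moves a record to a different cluster; that second rule is an assumption, not something stated in RFC-104. `IndexRow`, `VectorIndexWriter`, and `live_rows` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class IndexRow:
    record_key: str
    cluster_id: int
    vector: Optional[List[float]]  # None marks a tombstone


def nearest_cluster(vector: List[float], centroids: Dict[int, List[float]]) -> int:
    """Assign a vector to the centroid with the smallest squared L2 distance."""
    return min(centroids, key=lambda cid: sum((a - b) ** 2 for a, b in zip(vector, centroids[cid])))


class VectorIndexWriter:
    """Hypothetical append-only writer for cluster-partitioned index rows."""

    def __init__(self, centroids: Dict[int, List[float]]):
        self.centroids = centroids
        self.rows: List[IndexRow] = []
        self.key_to_cluster: Dict[str, int] = {}  # assumed lookup of each key's current cluster

    def upsert(self, key: str, vector: List[float]) -> None:
        cid = nearest_cluster(vector, self.centroids)
        old = self.key_to_cluster.get(key)
        if old is not None and old != cid:
            # Assumption: an update that changes cluster tombstones the stale entry.
            self.rows.append(IndexRow(key, old, None))
        self.rows.append(IndexRow(key, cid, list(vector)))
        self.key_to_cluster[key] = cid

    def delete(self, key: str) -> None:
        cid = self.key_to_cluster.pop(key, None)
        if cid is not None:
            self.rows.append(IndexRow(key, cid, None))


def live_rows(rows: List[IndexRow]) -> Dict[Tuple[str, int], IndexRow]:
    """Replay rows in write order: the latest row per (key, cluster) wins; tombstones drop it."""
    latest: Dict[Tuple[str, int], IndexRow] = {}
    for row in rows:
        latest[(row.record_key, row.cluster_id)] = row
    return {k: r for k, r in latest.items() if r.vector is not None}
```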
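For the maintenance item, a sketch of the two detection signals only. A cluster is flagged as imbalanced when its size exceeds `max_size_ratio` times the mean cluster size, and as drifted when the mean of its currently assigned vectors lies more than `max_drift` from its stored centroid. The thresholds, their defaults, and the function names are hypothetical, and the LIRE-style rebalancing or generation rebuild that would follow a flag is not shown.

```python
import math
from typing import Dict, List

Vector = List[float]


def centroid_drift(centroid: Vector, vectors: List[Vector]) -> float:
    """L2 distance between the stored centroid and the mean of the vectors assigned to it."""
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(len(centroid))]
    return math.sqrt(sum((m - c) ** 2 for m, c in zip(mean, centroid)))


def flag_clusters(centroids: Dict[int, Vector],
                  clusters: Dict[int, List[Vector]],
                  max_size_ratio: float = 2.0,
                  max_drift: float = 1.0) -> List[int]:
    """Return ids (ascending) of clusters that look oversized or whose centroid has drifted."""
    sizes = {cid: len(clusters.get(cid, [])) for cid in centroids}
    mean_size = sum(sizes.values()) / len(sizes)
    flagged = []
    for cid in sorted(centroids):
        too_big = mean_size > 0 and sizes[cid] / mean_size > max_size_ratio
        # Empty clusters have no assigned mean, so drift is only checked when size > 0.
        drifted = sizes[cid] > 0 and centroid_drift(centroids[cid], clusters[cid]) > max_drift
        if too_big or drifted:
            flagged.append(cid)
    return flagged
```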

Action

Leave this issue open; close once milestone 1 (sub-issues 1–7) is merged and follow-up sub-issues are filed for each item above.

Metadata


Assignees

No one assigned

    Labels

    type:feature (New features and enhancements)

    Type

    No type

    Projects

    Status
    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
