fix: add new validator for cosine metric #209
Greptile Summary
This PR adds a validator inside
Confidence Score: 4/5
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[FieldSchema::validate] --> B{is_vector_field?}
B -- No --> Z[Scalar index check]
B -- Yes --> C{is_sparse?}
C -- Yes --> D{data_type in\nsupport_sparse_vector_type?}
D -- No --> ERR1[InvalidArgument:\nunsupported sparse data type]
D -- Yes --> E{metric == IP?}
E -- No --> ERR2[InvalidArgument:\nsparse only supports IP metric]
E -- Yes --> F{index_params set?}
C -- No --> G{data_type in\nsupport_dense_vector_type?}
G -- No --> ERR3[InvalidArgument:\nunsupported dense data type]
G -- Yes --> H{index_type in\nsupport_dense_vector_index?}
H -- No --> ERR4[InvalidArgument:\nunsupported dense index type]
H -- Yes --> F
F -- No --> OK[OK]
F -- Yes --> I{quantize_type != UNDEFINED?}
I -- Yes --> J{data_type in\nquantize_type_map?}
J -- No --> ERR5[InvalidArgument:\ndata type does not support quantize]
J -- Yes --> K{quantize_type in\nallowed set?}
K -- No --> ERR6[InvalidArgument:\nunsupported quantize type]
K -- Yes --> L{IVF && IP?}
I -- No --> L
L -- Yes --> M{data_type is\nFP16 or FP32?}
M -- No --> ERR7[InvalidArgument:\nIVF+IP requires FP32/FP16]
M -- Yes --> N
L -- No --> N{metric == COSINE?\n🆕 NEW CHECK}
N -- Yes --> O{data_type is\nFP16 or FP32?}
O -- No --> ERR8[InvalidArgument:\nCOSINE requires FP32/FP16\nfield name + actual data type]
O -- Yes --> OK
N -- No --> OK
Last reviewed commit: 5e01a0c
@greptile
Why is cosine + int8 not supported at the moment?
Currently, the corresponding methods are missing and still need to be developed.
@greptile
Greptile Summary
This PR adds a new schema validation check that rejects the COSINE metric when the field's data type is not `VECTOR_FP32` or `VECTOR_FP16`, preventing configurations such as `VECTOR_INT8` + COSINE metric from being accepted. The fix correctly addresses three of the four issues raised in the previous review cycle:

- The previous check compared against a single type (`data_type_ == VECTOR_INT8`), which would silently pass any other non-FP type. The new check inverts this, rejecting anything that is not `VECTOR_FP16` and not `VECTOR_FP32`, which makes it forward-proof.
- The previous check carried a `quantize_type == UNDEFINED` sub-condition that was dead code for INT8 data; the new code omits it entirely.
- The error message now includes the field name (`field[...]`) and the actual offending data type via `DataTypeCodeBook::AsString(data_type_)`, consistent with the style of other error messages in the function.

The check is placed after the sparse/dense and index-type guards, which means sparse vectors cannot trigger it (they are already rejected if their metric is not IP), and the code path is logically sound. The validation aligns with the existing IVF+IP guard pattern, though the COSINE check is intentionally broader (index-type-agnostic) since the FP32/FP16 requirement for COSINE is not IVF-specific.

Confidence Score: 4/5
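The difference between the narrow INT8-only check and the inverted allow-list check described in the summary can be sketched as follows. This is a minimal illustration, not the PR's actual code: the enums, the `VECTOR_OTHER` placeholder, the free function `AsString` (standing in for `DataTypeCodeBook::AsString`), the helper names, and the exact error wording are all assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the project's enums; the real definitions
// in the codebase may differ. VECTOR_OTHER is a placeholder for any
// future non-FP vector type.
enum class DataType { VECTOR_FP16, VECTOR_FP32, VECTOR_INT8, VECTOR_OTHER };
enum class MetricType { L2, IP, COSINE };

// Simplified stand-in for DataTypeCodeBook::AsString.
inline std::string AsString(DataType t) {
  switch (t) {
    case DataType::VECTOR_FP16:  return "VECTOR_FP16";
    case DataType::VECTOR_FP32:  return "VECTOR_FP32";
    case DataType::VECTOR_INT8:  return "VECTOR_INT8";
    case DataType::VECTOR_OTHER: return "VECTOR_OTHER";
  }
  return "UNKNOWN";
}

// Narrow form: rejects only INT8, so any other non-FP type
// slips through when combined with COSINE.
inline bool NarrowCheckRejects(DataType t, MetricType m) {
  return m == MetricType::COSINE && t == DataType::VECTOR_INT8;
}

// Allow-list form: with COSINE, everything except FP16/FP32 is rejected.
// Returns an empty string when the combination is valid, otherwise an
// error message naming the field and the offending data type.
inline std::string ValidateCosine(const std::string& field_name,
                                  DataType t, MetricType m) {
  if (m != MetricType::COSINE) {
    return "";  // the guard applies to COSINE only
  }
  if (t == DataType::VECTOR_FP16 || t == DataType::VECTOR_FP32) {
    return "";  // supported combination
  }
  return "field[" + field_name +
         "]: COSINE metric only supports VECTOR_FP32/VECTOR_FP16, got " +
         AsString(t);
}
```

With these definitions, `NarrowCheckRejects(DataType::VECTOR_OTHER, MetricType::COSINE)` is `false`, while `ValidateCosine("emb", DataType::VECTOR_OTHER, MetricType::COSINE)` returns a non-empty error. That gap is the one the allow-list form closes.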
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[FieldSchema::validate] --> B{is_vector_field?}
B -- No --> C{index_params_ && is_vector_index_type?}
C -- Yes --> D[Error: scalar field has vector index]
C -- No --> E[OK]
B -- Yes --> F{is_sparse?}
F -- Yes --> G{data_type in support_sparse_vector_type?}
G -- No --> H[Error: unsupported sparse data type]
G -- Yes --> I{index_params_?}
F -- No --> J{data_type in support_dense_vector_type?}
J -- No --> K[Error: unsupported dense data type]
J -- Yes --> I
I -- No --> L[OK]
I -- Yes --> M{is_sparse?}
M -- Yes --> N{index in support_sparse_vector_index?}
N -- No --> O[Error: unsupported sparse index]
N -- Yes --> P{metric == IP?}
P -- No --> Q[Error: sparse only supports IP]
P -- Yes --> R
M -- No --> S{index in support_dense_vector_index?}
S -- No --> T[Error: unsupported dense index]
S -- Yes --> R
R{quantize_type != UNDEFINED?} -- Yes --> U{data_type in quantize_type_map?}
U -- No --> V[Error: data type does not support quantize]
U -- Yes --> W{quantize_type in allowed set?}
W -- No --> X[Error: unsupported quantize type]
W -- Yes --> Y
R -- No --> Y
Y{IVF index AND IP metric?} -- Yes --> Z{data_type FP32 or FP16?}
Z -- No --> AA[Error: IVF+IP only supports FP32/FP16]
Z -- Yes --> AB
Y -- No --> AB
AB{metric == COSINE?} -- Yes --> AC{data_type FP32 or FP16?}
AC -- No --> AD[NEW: Error: cosine only supports FP32/FP16, field name + actual type]
AC -- Yes --> AE[OK]
AB -- No --> AE
Last reviewed commit: 5e01a0c