From 532788c78d08a5acf2e55d93d38a83f400c0133e Mon Sep 17 00:00:00 2001
From: LJ
Date: Thu, 6 Mar 2025 16:50:43 -0800
Subject: [PATCH] Add documents to clarify indexable types and vector indexing metrics.

---
 docs/docs/core/data_types.mdx | 28 +++++++++++++++++++++++++++-
 docs/docs/core/flow_def.mdx   |  4 ++--
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/docs/docs/core/data_types.mdx b/docs/docs/core/data_types.mdx
index 4d72b8b9..791b0cc7 100644
--- a/docs/docs/core/data_types.mdx
+++ b/docs/docs/core/data_types.mdx
@@ -43,4 +43,30 @@ A struct has a bunch of fields, each with a name and a type.
 
 A table has a collection of rows, each of which is a struct with specified schema.
 
-The first field of a table is always the primary key.
\ No newline at end of file
+The first field of a table is always the primary key.
+
+## Indexable Types
+
+### Key Types
+
+Currently, the following types are supported as types for key fields:
+
+- `bytes`
+- `str`
+- `bool`
+- `int64`
+- `range`
+- Struct with all fields being key types
+
+### Vector Type
+
+Users can create vector index on fields with `vector` types.
+A vector index also needs to be configured with a similarity metric, and the index is only effective when this metric is used during retrieval.
+
+Following metrics are supported:
+
+| Metric Name | Description | Similarity Order |
+|-------------|-------------|------------------|
+| `CosineSimilarity` | [Cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) | Larger is more similar |
+| `L2Distance` | [L2 distance (a.k.a. Euclidean distance)](https://en.wikipedia.org/wiki/Euclidean_distance) | Smaller is more similar |
+| `InnerProduct` | [Inner product](https://en.wikipedia.org/wiki/Inner_product_space) | Larger is more similar |
diff --git a/docs/docs/core/flow_def.mdx b/docs/docs/core/flow_def.mdx
index 6ed0b64f..f8675640 100644
--- a/docs/docs/core/flow_def.mdx
+++ b/docs/docs/core/flow_def.mdx
@@ -198,8 +198,8 @@ Export must happen at the top level of a flow, i.e. not within any child scopes
 
 * `name`: the name to identify the export target.
 * `target_spec`: the storage spec as the export target.
-* `primary_key_fields` (optional): the fields to be used as primary key.
-* `vector_index` (optional): the fields to create vector index.
+* `primary_key_fields` (optional): the fields to be used as primary key. Types of the fields must be supported as key fields. See [Key Types](data_types#key-types) for more details.
+* `vector_index` (optional): the fields to create vector index. Each item is a tuple of a field name and a similarity metric. See [Vector Type](data_types#vector-type) for more details about supported similarity metrics.