From 3cc07c65a179c231a5fa72117ca933339d5fa0be Mon Sep 17 00:00:00 2001 From: Gary Gray <137797428+ggray-cb@users.noreply.github.com> Date: Fri, 12 Sep 2025 15:30:49 -0400 Subject: [PATCH 1/2] * Added note to Euclidean Distance to explain that Couchbase uses Euclidean Squared Distance instead behind the scenes. * Changed "FTS Vector Index" to "Search Vector Index" which is PM's current nomenclature. --- .../pages/hyperscale-vector-index.adoc | 2 +- .../pages/use-vector-indexes.adoc | 28 +++++++++---------- .../pages/vectors-and-indexes-overview.adoc | 27 +++++++++++------- .../partials/fts-vector-app-workflow.puml | 2 +- 4 files changed, 33 insertions(+), 26 deletions(-) diff --git a/modules/vector-index/pages/hyperscale-vector-index.adoc b/modules/vector-index/pages/hyperscale-vector-index.adoc index 036dcf776..9ca86667a 100644 --- a/modules/vector-index/pages/hyperscale-vector-index.adoc +++ b/modules/vector-index/pages/hyperscale-vector-index.adoc @@ -13,7 +13,7 @@ They can scale up to a billion documents containing vectors with a large number of dimensions. Because they provide the best performance, consider testing a Hyperscale Vector index for your application before resorting to the other types of indexes. -If you find theirs performance does not meet your needs, then test using a Composite Vector Index or a FTS Vector Index. +If you find their performance does not meet your needs, then test using a Composite Vector Index or a Search Vector Index. == How the Hyperscale Vector Index Works diff --git a/modules/vector-index/pages/use-vector-indexes.adoc b/modules/vector-index/pages/use-vector-indexes.adoc index 1faaea4ee..970ab7926 100644 --- a/modules/vector-index/pages/use-vector-indexes.adoc +++ b/modules/vector-index/pages/use-vector-indexes.adoc @@ -55,7 +55,7 @@ Use Composite Vector indexes when you want to perform searches of documents usin To learn how to use Composite Vector indexes, see xref:vector-index:composite-vector-index.adoc[]. 
-FTS Vector Index:: +Search Vector Index:: + -- * Combines a Couchbase {product-name} Search index with a single vector column @@ -67,7 +67,7 @@ FTS Vector Index:: Use this index type when you need to perform hybrid searches that combine vectors with full-text or geospatial searches. + -To learn how to use FTS Vector indexes, see xref:vector-search:vector-search.adoc[]. +To learn how to use Search Vector Indexes, see xref:vector-search:vector-search.adoc[]. == Choosing the Right Index Type @@ -82,7 +82,7 @@ The following table summarizes the differences between the three types of vector [%autowidth] |=== -| | Hyperscale Vector Index | Composite Vector Index | FTS Vector Index +| | Hyperscale Vector Index | Composite Vector Index | Search Vector Index | *First Available in Version* | 8.0 @@ -142,7 +142,7 @@ When choosing which type of index to use, consider the following: * In most cases, test using a Hyperscale Vector index. If you find that the performance is not what you need, you can try using one of the other index types. -* If your dataset will not grow beyond 100 million documents and you need to perform hybrid searches that combine vector searches with Full-Text Search or geospatial searches, use an FTS Vector index. +* If your dataset will not grow beyond 100 million documents and you need to perform hybrid searches that combine vector searches with Full-Text Search or geospatial searches, use a Search Vector Index. == Applications for Vector Indexes @@ -238,10 +238,10 @@ Finally, it sends the results of the similarity search back to the Query Service See xref:vector-index:composite-vector-index.adoc[] for more information about Composite Vector indexes. [#fts] -=== FTS Vector Index Applications +=== Search Vector Index Applications -FTS Vector indexes contain a single vector column in addition to a Full-Text Search index. 
-Some of the applications for FTS Vector indexes include: +Search Vector Indexes contain a single vector column in addition to a Full-Text Search index. +Some of the applications for Search Vector Indexes include: E-Commerce product recommendations:: E-Commerce applications can use scalar, text, and vector searches to find products that match a customer's search. @@ -256,16 +256,16 @@ Users often want to search for hotels using multiple criteria: * Semantic searches of descriptions and reviews for searches that do not rely on literal text matches, such as "modern beach resort with chic décor," which requires vector searches. + -An FTS vector index can combine geospatial, keyword, and semantic searches into a single index. +A Search Vector Index can combine geospatial, keyword, and semantic searches into a single index. Real estate searches:: -Real estate applications can use FTS Vector indexes to find properties within a search region and have floor plan similar to an uploaded image. +Real estate applications can use Search Vector Indexes to find properties that are within a search region and have a floor plan similar to an uploaded image. -=== FTS Vector Index Application Workflow +=== Search Vector Index Application Workflow -After you create an FTS Vector index, your application follows the workflow shown in the following diagram: +After you create a Search Vector Index, your application follows the workflow shown in the following diagram: -.Application Workflow with FTS Vector Indexes +.Application Workflow with Search Vector Indexes [plantuml,fts-app-workflow,svg] .... include::vector-index:partial$fts-vector-app-workflow.puml[] @@ -275,10 +275,10 @@ The steps shown in the diagram are: . When your application loads data it wants to search semantically, it calls an embedding model to generate a vector for it. . It sends the data and the vector to Couchbase {product-name} for storage. -. 
The Data Service sends the embedded vector along with scalar fields to the Search Service for inclusion in the FTS Vector index. +. The Data Service sends the embedded vector along with scalar fields to the Search Service for inclusion in the Search Vector Index. . When your application needs to perform a search that includes a vector, it uses the same embedding model to generate a vector for the search value. . It sends the search vector and text, geospatial, and other search values as part of a search request to the Couchbase {product-name} Search Service. -. The Search Service performs an index scan in the FTS Vector index to find documents that match the text or geospatial portions of the query. +. The Search Service performs an index scan in the Search Vector Index to find documents that match the text or geospatial portions of the query. Then it performs a vector similarity search on the results using the search vector. . The Search Service returns results to your application. diff --git a/modules/vector-index/pages/vectors-and-indexes-overview.adoc b/modules/vector-index/pages/vectors-and-indexes-overview.adoc index 868d2f5c9..d6c2e5a8b 100644 --- a/modules/vector-index/pages/vectors-and-indexes-overview.adoc +++ b/modules/vector-index/pages/vectors-and-indexes-overview.adoc @@ -127,7 +127,13 @@ Use this method when the actual distance of the vectors and their magnitudes are This method is useful if the distance between vectors represents a real-world value. image::euclidean-distance-example.svg["Three-dimensional plot showing two vectors with points along each vector joined by dotted lines, indicating the summing of corresponding points."] - + +NOTE: When you select Euclidean Distance or L2 as the metric for a vector index, Couchbase {product-name} internally uses the <<#euclidean-squared>> metric (explained in the sext section) to perform vector comparisons. 
+This approach improves performance because it avoids performing a computationally expensive square root operation. +Vector searches using the Euclidean Squared metric return the same relevant vectors and ranking of results as Euclidean Distance. +If your query materializes or projects the actual distance between vectors, Couchbase {product-name} calculates the actual Euclidean Distance. +For example, if your query returns the distance between vectors as a column, Couchbase {product-name} calculates the square root of the Euclidean Squared distance to return the actual Euclidean Distance. + Euclidean Distance is useful for tasks such as: * 3D motion capture where you're detecting similar positions or trajectories of joints, objects, where finding real-world values for thresholds is important. @@ -135,13 +141,14 @@ Euclidean Distance is useful for tasks such as: * Other cases where you use the results as filters in calculations that require the actual distance between the vectors. NOTE: Only Hyperscale Vector and Composite Vector indexes support this metric. -FTS Vector indexes do not support it. +Search Vector Indexes do not support it. [#euclidean-squared] === Euclidean Squared Distance -Euclidean Squared Distance (also known as L2 Squared or L2^2^) is similar to Euclidean Distance, but it does not take the square root of the sum distances between the vectors: +Euclidean Squared Distance (also known as L2 Squared or L2^2^) is similar to Euclidean Distance. +However, it does not take the square root of the sum distances between the vectors: Euclidean Distance Formula:: @@ -169,7 +176,7 @@ For example: * Locating similar genomic and biological sequences in a dataset, such as related gene profiles. NOTE: Only Hyperscale Vector and Composite Vector indexes support this metric. -FTS Vector indexes do not support it. +Search Vector Indexes do not support it. [#dot] === Dot Product @@ -226,7 +233,7 @@ However, searching a flat index is inefficient. 
The search must compare every vector in the index to find matches. You should only use it for small data sets or for testing. -NOTE: FTS Vector indexes use a flat index when indexing datasets with 1000 or fewer vectors. +NOTE: Search Vector Indexes use a flat index when indexing datasets with 1000 or fewer vectors. Hyperscale Vector and Composite Vector indexes only support the next algorithm, IVF. [#IVF] @@ -256,7 +263,7 @@ This graph connects related centroids together so that a similarity search can q Hyperscale also adds other proprietary optimizations that allow it to scale to billions of vectors. -FTS Vector indexes automatically uses IVF when the indexing datasets larger than 1000 vectors. +Search Vector Indexes automatically use IVF when indexing datasets larger than 1000 vectors. [#quantization] == Quantization @@ -313,7 +320,7 @@ Use PQ quantization when: Hyoerscale Vector and Composite Vector indexes support PQ quantization. -FTS Vector indexes do not support it. +Search Vector Indexes do not support it. [#sq] === Scalar Quantization (SQ) @@ -359,11 +366,11 @@ All three types of vector indexes support SQ quantization. === Choosing a Quantization Method -You do not choose a quantization method for FTS vector indexes. +You do not choose a quantization method for Search Vector Indexes. Instead, they automatically choose whether to use quantization: -* FTS Vector indexes do not use quantization for datasets smaller than 10000 vectors. -* FTS Vector indexes automatically use 8-bit SQ quantization for datasets with 10000 vectors or larger. +* Search Vector Indexes do not use quantization for datasets smaller than 10000 vectors. +* Search Vector Indexes automatically use 8-bit SQ quantization for datasets with 10000 or more vectors. When creating a Hyperscale Vector or Composite Index, you choose which quantization method to use when creating the index. 
When deciding, consider the following: diff --git a/modules/vector-index/partials/fts-vector-app-workflow.puml b/modules/vector-index/partials/fts-vector-app-workflow.puml index 2e5a3074d..800174fe4 100644 --- a/modules/vector-index/partials/fts-vector-app-workflow.puml +++ b/modules/vector-index/partials/fts-vector-app-workflow.puml @@ -15,7 +15,7 @@ sprite Couchbase d="m 82.1,57.6 c 0,2.9 -1.7,5.5 -5,6.1 -5.8,1 -17.9,1.6 -28.1,1.6 -10.2,0 -22.3,-0.7 -28.1,-1.6 -3.3,-0.6 -5,-3.2 -5,-6.1 V 38.4 c 0,-2.9 2.3,-5.7 5,-6.1 1.7,-0.3 5.6,-0.6 8.8,-0.6 1.2,0 2.2,0.9 2.2,2.3 V 47.3 C 37.8,47.3 43,47 49,47 c 6,0 11.2,0.3 17.2,0.3 V 34.1 c 0,-1.4 1,-2.3 2.2,-2.3 3.2,0 7.1,0.3 8.8,0.6 2.7,0.4 5,3.2 5,6.1 z M 49,0 C 21.9,0 0,21.9 0,49 0,76.1 21.9,98 49,98 76.1,98 98,76.1 98,49 98,21.9 76.1,0 49,0 Z" /> -'title: Application Workflow with FTS Vector Indexes +'title: Application Workflow with Search Vector Indexes skinparam defaultTextAlignment center From e1b06778addc90c6831d8c988864642382c8bf3d Mon Sep 17 00:00:00 2001 From: Gary Gray <137797428+ggray-cb@users.noreply.github.com> Date: Tue, 16 Sep 2025 12:33:27 -0400 Subject: [PATCH 2/2] Fixing a typo spotted by Nischal --- modules/vector-index/pages/vectors-and-indexes-overview.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/vector-index/pages/vectors-and-indexes-overview.adoc b/modules/vector-index/pages/vectors-and-indexes-overview.adoc index d6c2e5a8b..eaeddd904 100644 --- a/modules/vector-index/pages/vectors-and-indexes-overview.adoc +++ b/modules/vector-index/pages/vectors-and-indexes-overview.adoc @@ -128,7 +128,7 @@ This method is useful if the distance between vectors represents a real-world va image::euclidean-distance-example.svg["Three-dimensional plot showing two vectors with points along each vector joined by dotted lines, indicating the summing of corresponding points."] -NOTE: When you select Euclidean Distance or L2 as the metric for a vector index, Couchbase {product-name} 
internally uses the <<#euclidean-squared>> metric (explained in the sext section) to perform vector comparisons. +NOTE: When you select Euclidean Distance or L2 as the metric for a vector index, Couchbase {product-name} internally uses the <<#euclidean-squared>> metric (explained in the next section) to perform vector comparisons. This approach improves performance because it avoids performing a computationally expensive square root operation. Vector searches using the Euclidean Squared metric return the same relevant vectors and ranking of results as Euclidean Distance. If your query materializes or projects the actual distance between vectors, Couchbase {product-name} calculates the actual Euclidean Distance.
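
The NOTE added by this patch states that Euclidean Squared returns the same ranking as Euclidean Distance, and that the square root is only needed when a query projects the distance. The following is a minimal sketch of that reasoning in plain Python. It does not use any Couchbase API; the helper names, document keys, and vectors are made up for illustration.

```python
import math


def euclidean_squared(a, b):
    """Sum of squared component differences (L2 squared); no square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean (L2) distance: the square root of the L2 squared value."""
    return math.sqrt(euclidean_squared(a, b))


# Hypothetical query vector and stored vectors, chosen only for illustration.
query = [1.0, 2.0, 3.0]
vectors = {
    "doc1": [1.0, 2.0, 4.0],
    "doc2": [4.0, 6.0, 3.0],
    "doc3": [0.0, 0.0, 0.0],
}

# Rank the stored vectors by each metric, nearest first.
rank_l2 = sorted(vectors, key=lambda k: euclidean(query, vectors[k]))
rank_l2_squared = sorted(vectors, key=lambda k: euclidean_squared(query, vectors[k]))

# The square root is monotonic, so both metrics give the same order.
print(rank_l2 == rank_l2_squared)

# Taking the square root of the squared value recovers the distance,
# which is only needed when the distance itself is returned.
projected = math.sqrt(euclidean_squared(query, vectors["doc2"]))
print(projected)
```

Because the square root never changes the order of non-negative values, a search can compare vectors using the cheaper squared form and defer the square root to the few results whose distance is actually returned.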