## Introduction

In this guide, we will walk you through building a semantic search engine using Couchbase as the backend database and [Mistral AI](https://mistral.ai/) as the provider of the embedding model. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions for building a working semantic search system from scratch. Alternatively, if you want to perform semantic search using the Search Vector Index, see [this tutorial](https://developer.couchbase.com/tutorial-mistralai-couchbase-vector-search-with-search-vector-index).

Couchbase is a NoSQL distributed document database (JSON) with many of the best features of a relational DBMS: SQL, distributed ACID transactions, and much more. [Couchbase Capella™](https://cloud.couchbase.com/sign-up) is the easiest way to get started, but you can also download and run [Couchbase Server](http://couchbase.com/downloads) on-premises.

Mistral AI is a research lab building the best open source models in the world. La Plateforme enables developers and enterprises to build new products and applications, powered by Mistral's open source and commercial LLMs. 

The [Mistral AI APIs](https://console.mistral.ai/) empower LLM applications via:

- [Text generation](https://docs.mistral.ai/capabilities/completion/), enables streaming and provides the ability to display partial model results in real-time
- [Code generation](https://docs.mistral.ai/capabilities/code_generation/), empowers code generation tasks, including fill-in-the-middle and code completion
- [Embeddings](https://docs.mistral.ai/capabilities/embeddings/), useful for RAG where it represents the meaning of text as a list of numbers
- [Function calling](https://docs.mistral.ai/capabilities/function_calling/), enables Mistral models to connect to external tools
- [Fine-tuning](https://docs.mistral.ai/capabilities/finetuning/), enables developers to create customized and specialized models
- [JSON mode](https://docs.mistral.ai/capabilities/json_mode/), enables developers to set the response format to json_object
- [Guardrailing](https://docs.mistral.ai/capabilities/guardrailing/), enables developers to enforce policies at the system level of Mistral models

This tutorial demonstrates how to use Mistral AI's embedding capabilities with Couchbase's **Hyperscale and Composite Vector Indexes** for optimized vector search operations. Hyperscale and Composite Vector Indexes provide superior performance for vector operations compared to traditional search methods, especially for large-scale applications.


## Before you start

## Get Credentials for Mistral AI

Please follow the [instructions](https://console.mistral.ai/api-keys/) to generate the Mistral AI credentials.

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

**Note: Hyperscale and Composite Vector Indexes require Couchbase Server 8.0 or above, unlike Search Vector Index which works with 7.6+**

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) with Read and Write access to the bucket used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.


## Install necessary libraries


In [1]:
%pip install couchbase==4.4.0 mistralai==1.9.10 langchain-couchbase==0.5.0 langchain-core==0.3.76 python-dotenv==1.1.1


Note: you may need to restart the kernel to use updated packages.


## Imports


In [2]:
from datetime import timedelta
from mistralai import Mistral
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy, IndexType
from langchain_core.embeddings import Embeddings
from typing import List
from dotenv import load_dotenv
import os
import time

## Prerequisites


In [3]:
import getpass

## Load environment variables from .env file if it exists
load_dotenv()

## Load from environment variables or prompt for input
CB_HOST = os.getenv('CB_HOST') or input("Cluster URL:")
CB_USERNAME = os.getenv('CB_USERNAME') or input("Couchbase username:")
CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass("Couchbase password:")
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input("Couchbase bucket:")
SCOPE_NAME = os.getenv('SCOPE_NAME') or input("Couchbase scope:")
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input("Couchbase collection:")


Cluster URL: couchbases://cb.b96outuetbo8z5l9.cloud.couchbase.com
Couchbase username: Administrator
Couchbase password: ········
Couchbase bucket: color-vector-sample
Couchbase scope: color
Couchbase collection: rgb
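
Before connecting, it can help to confirm which settings were actually found in the environment. The sketch below is a hypothetical helper (`missing_settings` is not part of the tutorial or of any SDK) that reports which of the variable names read in this tutorial are absent or blank. `MISTRAL_API_KEY` is included because it is read later on.

```python
import os
from typing import Iterable, List, Mapping

# Variable names taken from the tutorial's own os.getenv() calls.
REQUIRED_SETTINGS = (
    "CB_HOST", "CB_USERNAME", "CB_PASSWORD",
    "CB_BUCKET_NAME", "SCOPE_NAME", "COLLECTION_NAME",
    "MISTRAL_API_KEY",
)

def missing_settings(
    env: Mapping[str, str],
    required: Iterable[str] = REQUIRED_SETTINGS,
) -> List[str]:
    """Return the names in `required` that are absent or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]

missing = missing_settings(os.environ)
if missing:
    print("Not set in the environment (you will be prompted):", ", ".join(missing))
```

Anything listed here is simply requested interactively by the `input()` and `getpass` fallbacks above.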


## Couchbase Connection


In [6]:
auth = PasswordAuthenticator(
    CB_USERNAME,
    CB_PASSWORD
)

options = ClusterOptions(auth)
options.apply_profile("wan_development")

In [7]:
cluster = Cluster(CB_HOST, options)
cluster.wait_until_ready(timedelta(seconds=40))

bucket = cluster.bucket(CB_BUCKET_NAME)
scope = bucket.scope(SCOPE_NAME)
collection = scope.collection(COLLECTION_NAME)


## Setting Up Collections in Couchbase

The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:

1. Bucket Creation:
   - Checks if specified bucket exists, creates it if not
   - Sets bucket properties like RAM quota (1024MB) and replication (disabled)
   - Note: You will not be able to create a bucket on Capella

2. Scope Management:  
   - Verifies if requested scope exists within bucket
   - Creates new scope if needed (unless it's the default "_default" scope)

3. Collection Setup:
   - Checks for collection existence within scope
   - Creates collection if it doesn't exist
   - Waits 2 seconds for collection to be ready

Additional Tasks:
- Clears any existing documents so each run starts from a clean state
- Wraps any failure in a `RuntimeError` with a descriptive message


In [8]:
from couchbase.management.buckets import CreateBucketSettings

def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        ## Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
        except Exception as e:
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)

        bucket_manager = bucket.collections()

        ## Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)
        
        if not scope_exists and scope_name != "_default":
            bucket_manager.create_scope(scope_name)

        ## Check if collection exists, create if it doesn't
        collections = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in collections
        )

        if not collection_exists:
            bucket_manager.create_collection(scope_name, collection_name)

        ## Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        ## Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
        except Exception as e:
            print(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")
    
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)


<couchbase.collection.Collection at 0x1118984a0>

## Creating Mistral AI Embeddings Wrapper

This tutorial calls the Mistral AI SDK directly instead of relying on a separate LangChain integration package, so we create a small wrapper class that implements the LangChain `Embeddings` interface. This allows us to use Mistral AI's embedding model with Couchbase's Hyperscale and Composite Vector Indexes.


In [10]:
class MistralAIEmbeddings(Embeddings):
    """Custom Mistral AI Embeddings wrapper for LangChain compatibility."""
    
    def __init__(self, api_key: str, model: str = "mistral-embed"):
        self.client = Mistral(api_key=api_key)
        self.model = model
    
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed search docs."""
        try:
            response = self.client.embeddings.create(
                model=self.model,
                inputs=texts,
            )
            return [embedding.embedding for embedding in response.data]
        except Exception as e:
            raise ValueError(f"Error generating embeddings: {str(e)}")
    
    def embed_query(self, text: str) -> List[float]:
        """Embed query text."""
        try:
            response = self.client.embeddings.create(
                model=self.model,
                inputs=[text],
            )
            return response.data[0].embedding
        except Exception as e:
            raise ValueError(f"Error generating query embedding: {str(e)}")
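
The wrapper above sends every text in a single request. For larger corpora it is common to split the input into batches so that each request stays small. The following is a minimal sketch of that idea, not part of the tutorial: `chunked` and `embed_in_batches` are hypothetical helpers, and the default batch size of 32 is an arbitrary illustrative value rather than a documented Mistral limit.

```python
from typing import Callable, Iterator, List, Sequence

def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])

def embed_in_batches(
    embed_fn: Callable[[List[str]], List[List[float]]],
    texts: Sequence[str],
    batch_size: int = 32,  # assumption: arbitrary value chosen for illustration
) -> List[List[float]]:
    """Call `embed_fn` once per batch and concatenate the results in input order."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors
```

Any callable with the same shape as `embed_documents` can be passed as `embed_fn`, for example the bound method of a `MistralAIEmbeddings` instance.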


## Mistral Connection


In [11]:
MISTRAL_API_KEY = os.getenv('MISTRAL_API_KEY') or getpass.getpass("Mistral API Key:")
embeddings = MistralAIEmbeddings(api_key=MISTRAL_API_KEY, model="mistral-embed")
mistral_client = Mistral(api_key=MISTRAL_API_KEY)


Mistral API Key: ········


## Understanding Hyperscale and Composite Vector Indexes

### Optimizing Vector Search with Hyperscale and Composite Vector Indexes

With Couchbase 8.0+, you can leverage the power of Hyperscale and Composite Vector Indexes, which offer significant performance improvements over Search Vector Index approaches for vector-first workloads. These indexes provide high-performance vector similarity search with advanced filtering capabilities and are designed to scale to billions of vectors.

#### Hyperscale and Composite Vector Indexes vs Search Vector Indexes: Choosing the Right Approach

| Feature               | Hyperscale and Composite Vector Index                                               | Search Vector Index                       |
| --------------------- | --------------------------------------------------------------- | ----------------------------------------- |
| **Best For**          | Vector-first workloads, complex filtering, high QPS performance| Hybrid search and high recall rates      |
| **Couchbase Version** | 8.0.0+                                                         | 7.6+                                      |
| **Filtering**         | Pre-filtering with `WHERE` clauses (Composite) or post-filtering (Hyperscale) | Pre-filtering with flexible ordering |
| **Scalability**       | Up to billions of vectors (Hyperscale)                              | Up to 10 million vectors                  |
| **Performance**       | Optimized for concurrent operations with low memory footprint  | Good for mixed text and vector queries   |


#### Vector Index Types

Couchbase offers two distinct query-based vector index types, each optimized for different use cases:

##### Hyperscale Vector Indexes 

- **Best for**: Pure vector searches like content discovery, recommendations, and semantic search
- **Use when**: You primarily perform vector-only queries without complex scalar filtering
- **Features**: 
  - High performance with low memory footprint
  - Optimized for concurrent operations
  - Designed to scale to billions of vectors
  - Supports post-scan filtering for basic metadata filtering

##### Composite Vector Indexes

- **Best for**: Filtered vector searches that combine vector similarity with scalar value filtering
- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data
- **Features**: 
  - Efficient pre-filtering where scalar attributes reduce the vector comparison scope
  - Best for well-defined workloads requiring complex filtering using Composite Vector Index features
  - Supports range lookups combined with vector search

#### Index Type Selection for This Tutorial

In this tutorial, we'll demonstrate creating a **Hyperscale Vector Index** and running vector similarity queries. Hyperscale Vector Index is ideal for semantic search scenarios where you want:

1. **High-performance vector search** across large datasets
2. **Low latency** for real-time applications
3. **Scalability** to handle growing vector collections
4. **Concurrent operations** for multi-user environments

The Hyperscale Vector Index is well suited to our Mistral AI embedding-based semantic search implementation.

#### Alternative: Composite Vector Index

If your use case requires complex filtering with scalar attributes, you may want to consider using a **Composite Vector Index** instead:

```python
## Alternative: Create a Composite index for filtered searches
vector_store.create_index(
    index_type=IndexType.COMPOSITE,
    index_description="IVF,SQ8",
    distance_metric=DistanceStrategy.COSINE,
    index_name="mistral_composite_index",
)
```

**Use Composite indexes when:**
- You need to filter by document metadata or attributes before vector similarity
- Your queries combine vector search with WHERE clauses
- You have well-defined filtering requirements that can reduce the search space

**Note**: Composite indexes enable pre-filtering with scalar attributes, making them ideal for applications where you need to search within specific categories, date ranges, or user-specific data segments.

#### Understanding Index Configuration (Couchbase 8.0 Feature)

Before creating our Hyperscale index, it's important to understand the configuration parameters that optimize vector storage and search performance. The `index_description` parameter controls how Couchbase optimizes vector storage through centroids and quantization.

##### Index Description Format: `'IVF[<centroids>],{PQ|SQ}<settings>'`

##### Centroids (IVF - Inverted File)

- Controls how the dataset is subdivided for faster searches
- **More centroids** = faster search, slower training time
- **Fewer centroids** = slower search, faster training time
- If omitted (like `IVF,SQ8`), Couchbase auto-selects based on dataset size
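
To make the role of centroids concrete, here is a toy inverted-file search in plain Python. It is a conceptual sketch under simplifying assumptions, not Couchbase's implementation: the centroids are supplied by hand rather than trained, distances are Euclidean, and `n_probe` (the number of nearest cells to scan) is a hypothetical parameter name.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def build_cells(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Assign each vector index to the cell of its nearest centroid."""
    cells: Dict[int, List[int]] = defaultdict(list)
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))
        cells[nearest].append(idx)
    return cells

def ivf_search(
    query: Vector,
    vectors: List[Vector],
    centroids: List[Vector],
    cells: Dict[int, List[int]],
    k: int = 1,
    n_probe: int = 1,
) -> List[Tuple[int, float]]:
    """Scan only the n_probe cells whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))[:n_probe]
    candidates = [i for c in probe for i in cells.get(c, [])]
    scored = sorted((math.dist(query, vectors[i]), i) for i in candidates)
    return [(i, d) for d, i in scored[:k]]

vectors = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
cells = build_cells(vectors, centroids)

# Only the cell around (5.0, 5.0) is scanned, so vectors 0 and 1 are never compared.
print(ivf_search((4.8, 5.1), vectors, centroids, cells, k=2))
```

With more centroids each cell holds fewer vectors, so a probe compares against fewer candidates, while assigning vectors to cells (and, in a real index, training the centroids) takes longer.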

##### Quantization Options

**Scalar Quantization (SQ):**
- `SQ4`, `SQ6`, `SQ8` (4, 6, or 8 bits per dimension)
- Lower memory usage, faster search, slightly reduced accuracy

**Product Quantization (PQ):**
- Format: `PQ<subquantizers>x<bits>` (e.g., `PQ32x8`)
- Better compression for very large datasets
- More complex but can maintain accuracy with smaller index size
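
As a rough illustration of the memory trade-off, the sketch below estimates the raw storage per vector under each scheme. It assumes 1024-dimensional vectors (the output size of `mistral-embed`) stored as 32-bit floats when unquantized, and it ignores all index overhead such as centroids and codebooks, so these figures are not what Couchbase will actually report.

```python
def float32_bytes(dims: int) -> int:
    """Unquantized storage: 4 bytes per dimension."""
    return dims * 4

def sq_bytes(dims: int, bits: int) -> float:
    """Scalar quantization: `bits` bits for every dimension."""
    return dims * bits / 8

def pq_bytes(subquantizers: int, bits: int) -> float:
    """Product quantization: one `bits`-bit code per subquantizer."""
    return subquantizers * bits / 8

DIMS = 1024  # assumption: embedding dimension used for this estimate

print(float32_bytes(DIMS))  # 4096
print(sq_bytes(DIMS, 8))    # 1024.0
print(sq_bytes(DIMS, 4))    # 512.0
print(pq_bytes(32, 8))      # 32.0
```

Under these assumptions `SQ8` stores a quarter of the unquantized size, while `PQ32x8` compresses far more aggressively, which is why it is attractive for very large datasets.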

##### Common Configuration Examples

- **`IVF,SQ8`** - Auto centroids, 8-bit scalar quantization (good default)
- **`IVF1000,SQ6`** - 1000 centroids, 6-bit scalar quantization
- **`IVF,PQ32x8`** - Auto centroids, 32 subquantizers with 8 bits
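
The format can be checked on the client side before an index is created. The sketch below is a hypothetical validator covering only the forms listed in this section. The actual grammar accepted by Couchbase may be broader, and the server remains the authority on what is valid.

```python
import re
from typing import NamedTuple, Optional

# Matches IVF[<centroids>],SQ<4|6|8> or IVF[<centroids>],PQ<subquantizers>x<bits>
_PATTERN = re.compile(
    r"^IVF(?P<centroids>\d+)?,"
    r"(?:SQ(?P<sq_bits>[468])|PQ(?P<pq_sub>\d+)x(?P<pq_bits>\d+))$"
)

class IndexDescription(NamedTuple):
    centroids: Optional[int]      # None means the server auto-selects
    quantizer: str                # "SQ" or "PQ"
    subquantizers: Optional[int]  # only set for PQ
    bits: int

def parse_index_description(text: str) -> IndexDescription:
    """Parse an index description string or raise ValueError."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized index description: {text!r}")
    centroids = int(match["centroids"]) if match["centroids"] else None
    if match["sq_bits"]:
        return IndexDescription(centroids, "SQ", None, int(match["sq_bits"]))
    return IndexDescription(centroids, "PQ", int(match["pq_sub"]), int(match["pq_bits"]))

print(parse_index_description("IVF1000,SQ6"))
```

For example, `parse_index_description("IVF,SQ8")` yields `centroids=None`, reflecting the auto-selected centroid count.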

For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).

For more information on Hyperscale and Composite Vector Indexes, see [Couchbase Hyperscale and Composite Vector Indexes Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).

##### Our Configuration Choice

In this tutorial, we use `IVF,SQ8` which provides:
- **Auto-selected centroids** optimized for our dataset size
- **8-bit scalar quantization** for good balance of speed, memory usage, and accuracy
- **COSINE distance metric** ideal for semantic similarity search
- **Optimal performance** for most semantic search use cases

## Setting Up Couchbase Hyperscale Vector Store

Instead of using FTS (Full-Text Search), we'll use Couchbase's Hyperscale Index for vector operations. Hyperscale Indexes provide better performance for vector search operations.


In [12]:
vector_store = CouchbaseQueryVectorStore(
    cluster=cluster,
    bucket_name=CB_BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    distance_metric=DistanceStrategy.COSINE
)

print("Hyperscale Vector Store created successfully!")


Hyperscale Vector Store created successfully!


## Embedding Documents

The Mistral client can generate vector embeddings for text fragments. These embeddings capture the semantic meaning of the corresponding fragments and can be stored in Couchbase for later retrieval. You can also add your own text to the `texts` list when prompted by the following code block:


In [13]:
texts = [
    "Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable.",
    "It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.",
    input("custom embedding text")
]

# Store documents in the Hyperscale Vector store
vector_store.add_texts(texts)

print("Documents added to Hyperscale Vector store successfully!")


custom embedding text Couchbase is commonly used to store and search embeddings for GenAI and retrieval augmented generation applications.


Documents added to Hyperscale Vector store successfully!


## Understanding Semantic Search in Couchbase

Semantic search goes beyond traditional keyword matching by understanding the meaning and context behind queries. Here's how it works in Couchbase:

## How Semantic Search Works

1. **Vector Embeddings**: Documents and queries are converted into high-dimensional vectors using an embeddings model (in our case, Mistral AI's mistral-embed)

2. **Similarity Calculation**: When a query is made, Couchbase compares the query vector against stored document vectors using the COSINE distance metric

3. **Result Ranking**: Documents are ranked by their vector distance (lower distance = more similar meaning)

4. **Flexible Configuration**: Different distance metrics (cosine, euclidean, dot product) and embedding models can be used based on your needs

The `similarity_search_with_score` method performs this entire process, returning documents along with their vector distances (the scores). This enables you to find semantically related content even when exact keywords don't match.
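
The ranking step can be sketched in a few lines of plain Python. This example assumes cosine distance is defined as 1 minus cosine similarity, which is the common convention. The two-dimensional vectors and document names are hypothetical stand-ins for real embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)

def rank(
    query: Sequence[float],
    docs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k documents with the smallest distance to the query."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Hypothetical toy "embeddings"
docs = {"x": (1.0, 0.0), "y": (0.0, 1.0), "z": (1.0, 1.0)}
for name, distance in rank((1.0, 0.1), docs, k=2):
    print(f"{name}: {distance:.4f}")
```

A vector store performs the same comparison at scale, using an index to avoid scoring every stored vector.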

Now let's see semantic search in action and measure its performance with different optimization strategies.


## Vector Search Performance Optimization

Now let's measure and compare the performance benefits of different optimization strategies. We'll conduct a comprehensive performance analysis across two phases:

## Performance Testing Phases

1. **Phase 1 - Baseline Performance**: Test vector search without Hyperscale & Composite Vector indexes to establish baseline metrics

2. **Phase 2 - Hyperscale & Composite Vector Index-Optimized Search**: Create Hyperscale index and measure performance improvements

**Important Context:**

- Hyperscale & Composite Vector Index performance benefits scale with dataset size and concurrent load
- With our dataset (~3 documents), improvements may be modest
- Production environments with millions of vectors show significant Hyperscale and Composite Vector Index advantages
- The combination of Hyperscale and Composite Vector Index + embeddings provides optimal semantic search performance


## Phase 1: Baseline Performance (Without Hyperscale Vector Index or Composite Vector Index)

First, let's test the search performance without Hyperscale Vector index or Composite Vector Index. This will help us establish a baseline for comparison.


In [14]:
import logging

## Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Phase 1: Baseline Performance (Without Hyperscale or Composite Vector Index)
print("="*80)
print("PHASE 1: BASELINE PERFORMANCE (Without Hyperscale or Composite Vector Index)")
print("="*80)

query = "name a multipurpose database with distributed capability"

try:
    # Perform the semantic search
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=3)
    baseline_time = time.time() - start_time

    logging.info(f"Baseline search completed in {baseline_time:.2f} seconds")

    # Display search results
    print(f"\nBaseline Search Results (completed in {baseline_time:.4f} seconds):")
    print("-" * 80)
    for i, (doc, distance) in enumerate(search_results, 1):
        print(f"[Result {i}] Vector Distance: {distance:.4f}")
        # Truncate for readability
        content_preview = doc.page_content[:150] + "..." if len(doc.page_content) > 150 else doc.page_content
        print(f"Text: {content_preview}")
        print("-" * 80)

except Exception as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")


PHASE 1: BASELINE PERFORMANCE (Without Hyperscale or Composite Vector Index)


2026-02-02 12:17:21,919 - INFO - HTTP Request: POST https://api.mistral.ai/v1/embeddings "HTTP/1.1 200 OK"
2026-02-02 12:17:22,607 - INFO - Baseline search completed in 0.98 seconds



Baseline Search Results (completed in 0.9779 seconds):
--------------------------------------------------------------------------------
[Result 1] Vector Distance: 0.2870
Text: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON'...
--------------------------------------------------------------------------------
[Result 2] Vector Distance: 0.3484
Text: It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
--------------------------------------------------------------------------------
[Result 3] Vector Distance: 0.3865
Text: Couchbase is commonly used to store and search embeddings for GenAI and retrieval augmented generation applications.
--------------------------------------------------------------------------------
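
A single wall-clock measurement like the one above also includes the round trip to the Mistral embeddings API (the log shows that HTTP request completing before the search finishes), and it is sensitive to network jitter. A minimal sketch of a steadier measurement, using only the standard library, is shown below. `time_calls` and `summarize` are hypothetical helpers, and the repeat count is arbitrary.

```python
import statistics
import time
from typing import Callable, Dict, List

def time_calls(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run `fn` `repeats` times and return each duration in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic clock intended for timing
        fn()
        durations.append(time.perf_counter() - start)
    return durations

def summarize(durations: List[float]) -> Dict[str, float]:
    """Report min, median, and max, which are less noisy than a single sample."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

For example, `summarize(time_calls(lambda: vector_store.similarity_search_with_score(query, k=3)))` times the full search, while `summarize(time_calls(lambda: embeddings.embed_query(query)))` shows how much of that is spent obtaining the query embedding.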


## Phase 2: Hyperscale Vector Index-Optimized Performance

Now let's create a Hyperscale index and measure the performance improvements when searching.


In [17]:
## Create a Hyperscale index for optimal vector search performance
print("\nCreating Hyperscale index for index optimization...")
vector_store.create_index(
    index_type=IndexType.BHIVE,  # BHIVE is this library version's enum name for a Hyperscale index
    index_name="mistral_hyperscale_index_optimized",
    index_description="IVF,SQ8"
)
print("Hyperscale index created successfully!")



Creating Hyperscale index for index optimization...


2026-02-02 12:18:39,480 - INFO - HTTP Request: POST https://api.mistral.ai/v1/embeddings "HTTP/1.1 200 OK"


Hyperscale index created successfully!


Note: To create a Composite index instead, use the call below. Choose the index type based on your use case and query patterns. For this tutorial's semantic search scenario either index type would work, but Hyperscale is likely the better fit because the queries are pure vector searches with no scalar filters.

`vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="mistral_composite_index", index_description="IVF,SQ8")`

In [18]:
# Phase 2: Hyperscale Index-Optimized Performance
print("\n" + "="*80)
print("PHASE 2: Hyperscale Index-OPTIMIZED PERFORMANCE ")
print("="*80)

query = "name a multipurpose database with distributed capability"

try:
    # Perform the semantic search with Hyperscale Vector Index
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=3)
    hyperscale_time = time.time() - start_time

    logging.info(f"Hyperscale Vector index-optimized search completed in {hyperscale_time:.2f} seconds")

    # Display search results
    print(f"\nHyperscale Vector index-Optimized Search Results (completed in {hyperscale_time:.4f} seconds):")
    print("-" * 80)
    for i, (doc, distance) in enumerate(search_results, 1):
        print(f"[Result {i}] Vector Distance: {distance:.4f}")
        # Truncate for readability
        content_preview = doc.page_content[:150] + "..." if len(doc.page_content) > 150 else doc.page_content
        print(f"Text: {content_preview}")
        print("-" * 80)

except Exception as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")



PHASE 2: Hyperscale Index-OPTIMIZED PERFORMANCE 


2026-02-02 12:18:45,737 - INFO - HTTP Request: POST https://api.mistral.ai/v1/embeddings "HTTP/1.1 200 OK"
2026-02-02 12:18:46,873 - INFO - Hyperscale Vector index-optimized search completed in 1.51 seconds



Hyperscale Vector index-Optimized Search Results (completed in 1.5112 seconds):
--------------------------------------------------------------------------------
[Result 1] Vector Distance: 0.2870
Text: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON'...
--------------------------------------------------------------------------------
[Result 2] Vector Distance: 0.3484
Text: It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
--------------------------------------------------------------------------------
[Result 3] Vector Distance: 0.3865
Text: Couchbase is commonly used to store and search embeddings for GenAI and retrieval augmented generation applications.
--------------------------------------------------------------------------------


## Performance Summary

Let's compare the timings from the two phases and see whether the Hyperscale Vector Index made a measurable difference.


In [19]:
print("\n" + "="*80)
print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY")
print("="*80)

print(f"\n📊 Performance Comparison:")
print(f"{'Optimization Level':<35} {'Time (seconds)':<20} {'Status'}")
print("-" * 80)
print(f"{'Phase 1 - Baseline (No Index)':<35} {baseline_time:.4f}{'':16} ⚪ Baseline")
print(f"{'Phase 2 - Hyperscale & Composite Vector Index-Optimized (Hyperscale)':<35} {hyperscale_time:.4f}{'':16} ✅ Optimized")

# Calculate improvement
if baseline_time > hyperscale_time:
    speedup = baseline_time / hyperscale_time
    improvement = ((baseline_time - hyperscale_time) / baseline_time) * 100
    print(f"\n✨ Index Performance Gain: {speedup:.2f}x faster ({improvement:.1f}% improvement)")
elif hyperscale_time > baseline_time:
    slowdown_pct = ((hyperscale_time - baseline_time) / baseline_time) * 100
    print(f"\n⚠️  Note: Hyperscale Vector Index was {slowdown_pct:.1f}% slower than baseline in this run")
    print(f"   This can happen with small datasets. Hyperscale Vector Index benefits emerge with scale.")
else:
    print(f"\n⚖️  Performance: Comparable to baseline")

print("\n" + "-"*80)
print("KEY INSIGHTS:")
print("-"*80)
print("1. 🚀 Hyperscale & Composite Vector Index Optimization:")
print("   • Hyperscale vector indexes excel with large-scale datasets (millions+ vectors)")
print("   • Performance gains increase with dataset size and concurrent queries")
print("   • Optimal for production workloads with sustained traffic patterns")

print("\n2. 📦 Dataset Size Impact:")
print(f"   • Current dataset: ~3 sample documents")
print("   • At this scale, performance differences may be minimal or variable")
print("   • Significant gains typically seen with 10M+ vectors")

print("\n3. 🎯 When to Use Hyperscale & Composite Vector Index:")
print("   • Large-scale vector search applications")
print("   • High query-per-second (QPS) requirements")
print("   • Multi-user concurrent access scenarios")
print("   • Production environments requiring scalability")

print("\n" + "="*80)



VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY

📊 Performance Comparison:
Optimization Level                  Time (seconds)       Status
--------------------------------------------------------------------------------
Phase 1 - Baseline (No Index)       0.9779                 ⚪ Baseline
Phase 2 - Hyperscale & Composite Vector Index-Optimized (Hyperscale) 1.5112                 ✅ Optimized

⚠️  Note: Hyperscale Vector Index was 54.5% slower than baseline in this run
   This can happen with small datasets. Hyperscale Vector Index benefits emerge with scale.

--------------------------------------------------------------------------------
KEY INSIGHTS:
--------------------------------------------------------------------------------
1. 🚀 Hyperscale & Composite Vector Index Optimization:
   • Hyperscale vector indexes excel with large-scale datasets (millions+ vectors)
   • Performance gains increase with dataset size and concurrent queries
   • Optimal for production 

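## Conclusion

This tutorial demonstrated how to use Mistral AI's embedding capabilities with Couchbase's Hyperscale and Composite Vector Indexes, including a timing comparison of searches with and without a Hyperscale index. With only three sample documents, the indexed search was not faster in this run (1.51 seconds versus 0.98 seconds for the baseline); the benefits of these indexes are expected to emerge at larger scale. The combination of Mistral AI's embeddings and Couchbase's Hyperscale and Composite Vector Indexes provides a scalable foundation for building intelligent search applications.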
