# Index
- [Vector Search: Understanding the True Meaning of Queries](#vector-search-understanding-the-true-meaning-of-queries)

- [Indexing and Search](#indexing-and-search)
    - [Measuring Vector Distance (Similarity Metrics)](00a-docs-theory-on-embeddings.ipynb)
    - [Fast and Scalable Vector Search](#2-fast-and-scalable-vector-search)
    - [Vertex AI Vector Search (formerly Matching Engine)](#vertex-ai-vector-search-formerly-matching-engine)

- [The Problem: AI Hallucination 😵‍💫](#the-problem-ai-hallucination-)
    - [What Causes AI Hallucinations?](#what-causes-ai-hallucinations)
    - [Traditional Solutions and Their Limitations](#traditional-solutions-and-their-limitations)
    - [The RAG Solution: An Open-Book Exam for AI 📖](#the-rag-solution-an-open-book-exam-for-ai-)
    - [How RAG Works with Vector Search](#how-rag-works-with-vector-search)

- [The Challenge: Beyond Semantic Search 🤔](#the-challenge-beyond-semantic-search-)
    - [What is Hybrid Search? 🤝](#what-is-hybrid-search-)
    - [How Hybrid Search Works ⚙️](#how-hybrid-search-works-)
    - [Implementation with Vertex AI Vector Search 🛠️](#implementation-with-vertex-ai-vector-search-)
    - [Example Result](#example-result)

# Vector Search: Understanding the True Meaning of Queries

## What is Vector Search?

Vector Search is a technology that enables search systems to understand the **semantic meaning of queries**, rather than just matching keywords. It focuses on **semantic similarity**, allowing systems to find results that are conceptually related, even if exact terms aren't used.

- Traditional **keyword search** excels at matching explicit terms but cannot grasp context or underlying meaning. For example, a query for "summer tops" might find items labeled "summer tops" but miss related items like "swimsuits", and it cannot infer the intent behind a query such as "attire for a beach party."

- Vector Search, with its focus on semantic search, closes this gap by transforming data into **meaningful embeddings** that capture user intent rather than surface wording.

## Benefits of Vector Search

1.  **Semantic Understanding:**
      * Finds results similar in meaning to a query, even without exact keyword matches.
      * Highly effective for natural language queries where precise or technical language may not be used.
2.  **Multimodal Search Capabilities:**
      * Can be applied to various data types, including text, images, and audio.
      * Enables applications where users can search using multiple data types (e.g., image search, voice search).
3.  **Personalization and Recommendation:**
      * Leverages context understanding to personalize search results and recommendations.
      * Helps users discover more relevant and interesting information.
4.  **Generative AI Integration:**
      * A critical component in generative AI applications for fast and efficient information retrieval.
      * Becoming a foundational element in AI and machine learning services.

Vector Search is transforming how people engage with information, leading to more relevant, efficient, and personalized search experiences as data volumes and user expectations grow.

## How Vector Search Works

Vector Search involves a three-step process:

![](docs/images/vector-search-steps.png)

1.  **Encode Data into Vectors:** AI models, known as **Embedding Models**, convert various data types (text, images, audio) into numerical representations called **vectors**. These vectors capture the semantic meaning of the data.
2.  **Create an Index:** An index is built from these vectors to enable fast and scalable search across billions of items.
3.  **Search the Vector Space:** When a query is made, it is also encoded into a vector. This query vector is then used to efficiently search the indexed vector space for other vectors (data items) that are semantically similar.
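
The three steps can be sketched in a few lines of plain Python. This is a toy under stated assumptions: the vectors in `TOY_EMBEDDINGS` are hand-made 3-dimensional stand-ins for real embedding-model output (real models return hundreds of dimensions), and the "index" is a plain list scanned exhaustively rather than an ANN structure. `build_index` corresponds to development time and `search` to serving time.

```python
import math

# Step 1 (assumption): hand-made "embeddings" standing in for an embedding model.
# The three dimensions loosely mean (beachwear, formal wear, electronics).
TOY_EMBEDDINGS = {
    "swimsuit": [0.9, 0.1, 0.0],
    "summer top": [0.8, 0.3, 0.0],
    "evening gown": [0.1, 0.9, 0.0],
    "phone charger": [0.0, 0.0, 1.0],
}

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def build_index(embeddings):
    """Step 2 (development time): store (id, unit vector) pairs."""
    return [(item_id, normalize(vec)) for item_id, vec in embeddings.items()]

def search(index, query_vector, k=2):
    """Step 3 (serving time): score every indexed vector, return the top-k ids."""
    q = normalize(query_vector)
    scored = [(sum(a * b for a, b in zip(q, vec)), item_id) for item_id, vec in index]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]

index = build_index(TOY_EMBEDDINGS)
# Hypothetical embedding of the query "attire for a beach party".
print(search(index, [0.85, 0.2, 0.0]))  # ['swimsuit', 'summer top']
```

Neither result shares a word with the query; they are returned because their vectors point in nearly the same direction as the query vector.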

### Detailed View: Development vs. Serving

  * **At Development Time (Building the System):**
      * Generate embeddings for all data.
      * Build and deploy the vector index.
  * **At Serving Time (Responding to Queries):**
      * Encode the user's query into a vector.
      * Search the vector space using the query vector.
      * Serve the most relevant results.

## Core Challenges

To implement Vector Search, two major challenges must be addressed:

1.  **How to Encode Data:** Converting diverse, multimodal data into representations that accurately capture semantic meanings. (Answer: **Embeddings**)
2.  **How to Index and Search Data:** Building an efficient search space that enables fast and scalable lookups. (Answer: **Vector Search Indexing**)

-----

## Further Reading & Resources

To deepen your understanding of Vector Search, Embeddings, and related technologies, explore these resources:

### General Concepts & Explanations:

  * **Google Cloud Blog - What is Vector Search?**
      * A good starting point for understanding the fundamentals and applications.
      * [Link to a Google Cloud "What is Vector Search" article](https://www.google.com/search?q=https://cloud.google.com/learn/what-is-vector-search) (You can search for the most recent official one)
  * **Pinecone Blog - What is Vector Search?**
      * Pinecone is a dedicated vector database company, and their blog often has excellent, in-depth explanations.
      * [Link to a Pinecone "What is Vector Search" article](https://www.google.com/search?q=https://www.pinecone.io/learn/vector-search/)

### Vector Databases & Open-Source Frameworks:

These are specialized databases and libraries designed to store and query vectors efficiently.

  * **Pinecone:**
      * A popular managed vector database service.
      * [Website: pinecone.io](https://www.pinecone.io/)
  * **Weaviate:**
      * An open-source vector database with built-in support for hybrid (keyword plus vector) search.
      * [Website: weaviate.io](https://weaviate.io/)
      * [GitHub: Weaviate](https://github.com/weaviate/weaviate)
  * **Qdrant:**
      * Another open-source vector similarity search engine.
      * [Website: qdrant.tech](https://qdrant.tech/)
      * [GitHub: Qdrant](https://github.com/qdrant/qdrant)
  * **Faiss (Facebook AI Similarity Search):**
      * A library for efficient similarity search and clustering of dense vectors. It's not a database, but a powerful library often used as a backend for vector search systems.
      * [GitHub: Faiss](https://github.com/facebookresearch/faiss)
  * **Chroma:**
      * An open-source embedding database for building AI applications.
      * [Website: trychroma.com](https://www.trychroma.com/)
      * [GitHub: Chroma](https://github.com/chroma-core/chroma)
  * **Elasticsearch (with Vector Search capabilities):**
      * While primarily a full-text search engine, recent versions of Elasticsearch (and OpenSearch) have added native vector search capabilities.
      * [Elasticsearch Vector Search documentation](https://www.elastic.co/what-is/vector-search)

### Embeddings & Models:

  * **Hugging Face Hub (Models):**
      * A vast repository of pre-trained models, including many open models for generating embeddings (e.g., Sentence-BERT models from the `sentence-transformers` library).
      * [Website: huggingface.co/models](https://huggingface.co/models)
  * **OpenAI Embeddings:**
      * OpenAI offers powerful embedding models accessible via API, widely used for various semantic tasks.
      * [OpenAI Embeddings Documentation](https://platform.openai.com/docs/guides/embeddings)
  * **Google's Universal Sentence Encoder:**
      * A model for encoding text into high-dimensional vectors.
      * [TensorFlow Hub: Universal Sentence Encoder](https://tfhub.dev/google/universal-sentence-encoder/4)

---

# Indexing and Search

Following the generation of embeddings (covered previously), the next critical steps in Vector Search are **indexing** and **searching** these vector spaces efficiently. This involves addressing two primary challenges:

1.  How to measure the distance (similarity) between vectors (see *Similarity Metrics* in [00a-docs-theory-on-embeddings.ipynb](00a-docs-theory-on-embeddings.ipynb)).
2.  How to search vectors in a fast and scalable way.
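
As a compact reminder for the search sketches that follow, here are three common similarity metrics in plain Python. The example vectors are arbitrary; the final comment states a standard identity for unit-length vectors.

```python
import math

def dot(a, b):
    """Dot product: larger means more similar (also sensitive to vector length)."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 0.0], [0.6, 0.8]  # both vectors have unit length
print(dot(a, b), cosine_similarity(a, b), euclidean_distance(a, b))
# For unit vectors ||a - b||^2 = 2 - 2 * (a . b), so all three metrics
# produce the same nearest-neighbor ranking once vectors are normalized.
```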


## 2\. Fast and Scalable Vector Search

Once distances can be measured, the challenge shifts to finding similar vectors efficiently within potentially vast vector spaces (millions to billions of embeddings).

### Search Algorithms:

1.  **Brute Force Algorithm:**

      * **Process:**
          1.  Calculate distances from the query vector to *all* other vectors in the space.
          2.  Sort all distances.
          3.  Find the top `k` nearest vectors.
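      * **Complexity:** `O(N*d)` to compute the distances, where `N` is the number of vectors and `d` is the number of dimensions, plus `O(N log N)` to sort them (or `O(N log k)` if only the top `k` are kept in a heap).
      * **Drawback:** Becomes a computational bottleneck for large datasets (`N` in the millions or billions), because every query touches every vector.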

2.  **Approximate Nearest Neighbor (ANN) Algorithms:**

      * **Concept:** Accelerate search by trading a small amount of accuracy for significant speed improvements. They avoid exhaustive search by intelligently pruning the search space.
      * **How it works (general idea):** Divides the search space into smaller partitions, indexes them using data structures (like trees or hashes), and then searches only the most relevant partitions.
      * **Example (TreeAH, shallow tree plus asymmetric hashing):** A production-ready algorithm that partitions the space with a shallow tree and uses asymmetric hashing, a quantization technique, to approximate distances quickly.
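
The two approaches can be compared in a short sketch. `brute_force_knn` is the exact method above; `PartitionedIndex` is a simple IVF-style (inverted file) ANN index, a single-level version of the partition-then-prune idea, not TreeAH or ScaNN. The names and the `n_probe` parameter (how many partitions are scanned) are chosen for this sketch only.

```python
import heapq
import math
import random

def brute_force_knn(query, vectors, k):
    """Exact search: compute all N distances (O(N*d)), keep the k smallest (O(N log k))."""
    distances = ((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, distances)]

def kmeans(vectors, n_clusters, iters=10, seed=0):
    """Plain k-means; the resulting centroids define the partitions."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_clusters)
    for _ in range(iters):
        buckets = [[] for _ in range(n_clusters)]
        for v in vectors:
            nearest = min(range(n_clusters), key=lambda c: math.dist(v, centroids[c]))
            buckets[nearest].append(v)
        # New centroid = mean of its bucket; keep the old one if the bucket is empty.
        centroids = [
            [sum(col) / len(b) for col in zip(*b)] if b else centroids[c]
            for c, b in enumerate(buckets)
        ]
    return centroids

class PartitionedIndex:
    """IVF-style ANN index: each vector is stored in the partition of its nearest centroid."""

    def __init__(self, vectors, n_partitions=8):
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_partitions)
        self.partitions = [[] for _ in self.centroids]
        for i, v in enumerate(vectors):
            self.partitions[self.closest_partitions(v, 1)[0]].append(i)

    def closest_partitions(self, v, n):
        """Ids of the n partitions whose centroids are nearest to v."""
        order = sorted(range(len(self.centroids)), key=lambda c: math.dist(v, self.centroids[c]))
        return order[:n]

    def search(self, query, k, n_probe=2):
        """Approximate search: brute-force only inside the n_probe closest partitions."""
        candidates = [i for p in self.closest_partitions(query, n_probe) for i in self.partitions[p]]
        local = brute_force_knn(query, [self.vectors[i] for i in candidates], k)
        return [candidates[j] for j in local]  # map back to global ids

random.seed(1)
data = [[random.random() for _ in range(4)] for _ in range(2000)]
index = PartitionedIndex(data, n_partitions=16)
query = [0.5, 0.5, 0.5, 0.5]
exact = brute_force_knn(query, data, k=10)
approx = index.search(query, k=10, n_probe=3)
recall = len(set(exact) & set(approx)) / len(exact)
print(f"recall@10 = {recall:.1f}")  # typically high, but not guaranteed to be 1.0
```

Raising `n_probe` scans more partitions, so recall approaches 1.0 at the cost of more distance computations; with `n_probe` equal to the number of partitions, the search degenerates to brute force.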

### ScaNN: Scalable Approximate Nearest Neighbor

In 2020, Google Research introduced **ScaNN** (Scalable Approximate Nearest Neighbor), a leading ANN algorithm powering services like Google Search and YouTube's recommendation system. 

ScaNN achieves fast and scalable vector search by combining:

1.  **Reduced Search Space (Space Pruning):**
      * **Multilevel Tree Search:** The vector space is divided into hierarchical partitions. A search tree represents this structure, with nodes as centroids of partitions.
      * **Pruning:** During a query, the tree is traversed (from root to branches to leaves), and irrelevant partitions are pruned, focusing the search on the most relevant sub-partitions.
2.  **Compressed Vector Size (Data Quantization):**
      * **Technique:** Compresses data points to save memory and speed up distance computations (e.g., reducing a 9-dimensional vector from 9 floats, i.e. 288 bits as 32-bit floats, to 12 bits).
3.  **Increased Ranking Efficiency (Business Logic Integration):**
      * **Filtering:** Incorporates business logic to filter results based on specific criteria (e.g., "resorts in the United States," "red dresses") *before* or *during* similarity ranking, restricting the search to a relevant subset of the dataset.
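
The quantization example in point 2 can be made concrete with plain product quantization (PQ). This is a simplified stand-in, not ScaNN's own quantizer: a 9-dimensional vector is split into 3 sub-vectors, and each is replaced by the index of its nearest entry in a 16-entry codebook (4 bits), giving 3 × 4 = 12 bits. For brevity the codebooks are sampled from the training vectors; real PQ learns them with k-means.

```python
import math
import random

N_SUBSPACES = 3     # 9 dims -> 3 sub-vectors
SUB_DIM = 3         # dims per sub-vector
CODEBOOK_SIZE = 16  # 16 codewords per subspace -> 4 bits per sub-vector

def make_codebooks(training_vectors, seed=0):
    """Simplified codebooks (assumption): 16 training sub-vectors sampled per subspace."""
    rng = random.Random(seed)
    books = []
    for s in range(N_SUBSPACES):
        subs = [v[s * SUB_DIM:(s + 1) * SUB_DIM] for v in training_vectors]
        books.append(rng.sample(subs, CODEBOOK_SIZE))
    return books

def encode(vector, codebooks):
    """Replace each sub-vector by its nearest codeword index; pack 3 x 4 bits into one int."""
    code = 0
    for s, book in enumerate(codebooks):
        sub = vector[s * SUB_DIM:(s + 1) * SUB_DIM]
        idx = min(range(CODEBOOK_SIZE), key=lambda j: math.dist(sub, book[j]))
        code |= idx << (4 * s)
    return code

def decode(code, codebooks):
    """Approximate reconstruction: concatenate the selected codewords."""
    out = []
    for s, book in enumerate(codebooks):
        out.extend(book[(code >> (4 * s)) & 0xF])
    return out

random.seed(0)
train = [[random.random() for _ in range(9)] for _ in range(200)]
books = make_codebooks(train)
code = encode(train[0], books)
print(code.bit_length() <= 12, len(decode(code, books)))  # True 9
```

Distances can then be approximated from the compact codes (for example, with a per-query table of query-to-codeword distances), which is what makes searching quantized vectors cheap.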

## Vertex AI Vector Search (formerly Matching Engine)

Vertex AI Vector Search is a fully managed similarity vector search service provided by Google Cloud.

  * **Foundation:** Utilizes an advanced version of the ScaNN algorithm.
  * **Benefits:** Offers fast searching, low latencies, and scalability to billions of vectors, often at a lower cost compared to similar services.

-----

## Further Reading & Resources

### Approximate Nearest Neighbor (ANN) Algorithms & Vector Databases:

  * **ScaNN: Efficient Vector Similarity Search (Google AI Blog):**
      * The official announcement and explanation of ScaNN.
      * [Link: ScaNN: Efficient Vector Similarity Search](https://www.google.com/search?q=https://ai.googleblog.com/2020/07/scann-efficient-vector-similarity-search.html)
  * **The Missing Piece: An Introduction to Approximate Nearest Neighbor (ANN) Search (Pinecone Blog):**
      * A good overview of ANN concepts.
      * [Link: Introduction to ANN Search](https://www.google.com/search?q=https://www.pinecone.io/learn/approximate-nearest-neighbor/)
  * **Faiss (Facebook AI Similarity Search):**
      * An open-source library for efficient similarity search. Essential for understanding ANN implementations.
      * [GitHub: Faiss](https://github.com/facebookresearch/faiss)
  * **HNSW (Hierarchical Navigable Small Worlds):**
      * A popular graph-based ANN algorithm widely used in vector databases.
      * [Paper: Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs](https://arxiv.org/abs/1603.09320)
  * **Vertex AI Vector Search Documentation:**
      * Official Google Cloud documentation for their managed vector search service.
      * [Link: Vertex AI Vector Search](https://cloud.google.com/vertex-ai/docs/vector-search/overview)

-----

## The Problem: AI Hallucination 😵‍💫

A significant challenge with AI models like chatbots is **hallucination**, a situation where the AI confidently delivers a completely inaccurate response. This occurs because Large Language Models (LLMs) have an understanding limited to their training data, which can become outdated or lack specific organizational knowledge. This "grounding problem" undermines user trust in AI systems.

---

### What Causes AI Hallucinations?

LLMs are prone to hallucination for several key reasons:

* **Limited Knowledge:** Their understanding is confined to their training data. They lack awareness of your company's internal data, specific industry knowledge, or real-time information.
* **Inability to Verify:** They cannot check the accuracy of their own training data.
* **Lack of Context:** They often assume user prompts are factually correct and are unable to ask for clarifying information.

---

### Traditional Solutions and Their Limitations

Several methods have been used to combat hallucinations, but each has drawbacks:

* **Fine-tuning:** This involves retraining an LLM with new, specific data. While effective, it is often costly and requires extensive data and computational resources.
* **Human Review:** Having humans verify AI responses increases accuracy but is expensive, time-consuming, and not always scalable enough to catch every error.
* **Prompt Engineering:** Carefully crafting prompts can help steer the AI toward more accurate answers, but its effectiveness is limited, especially at scale.

---

## The RAG Solution: An Open-Book Exam for AI 📖

A more effective solution is **Retrieval-Augmented Generation (RAG)**. RAG is an architecture that combines the strengths of retrieval technology (like Vector Search) and generative AI models (like LLMs).

* **Retrieval Models (Vector Search):** Excellent at finding specific, factual information from a large set of documents.
* **Generative Models (LLMs):** Excellent at generating coherent, fluent, and creative text.

RAG bridges the gap between these two. It effectively gives the LLM an **"open-book exam,"** allowing it to look up information from an external, up-to-date knowledge base *before* generating an answer. This grounds the AI's response in verifiable facts, reducing the likelihood of hallucination.

---

### How RAG Works with Vector Search

**Vector Search** is the key technology that powers the retrieval function in a RAG system. The process works as follows:

1.  **Encode and Index:** New, trustworthy information (e.g., company policies, product docs, real-time alerts) is encoded into vector embeddings and stored in a vector database for efficient searching.
2.  **Query:** A user's question is also converted into a vector embedding.
3.  **Search and Retrieve:** The system uses Vector Search to find the most semantically similar and relevant documents from the vector database based on the user's query embedding.
4.  **Augment and Generate:** The original question, along with the retrieved factual information, is passed to the LLM.
5.  **Grounded Response:** The LLM then generates a final answer that incorporates the fresh, verified information, resulting in a more reliable and trustworthy response.


![](docs/images/rag-pipeline.png)


This creates a **grounded agent**—an AI that can perform fact-checks against a trusted source of information.
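
Steps 3 and 4 can be sketched as follows. To keep the example self-contained, `retrieve` is a loudly labeled stand-in that ranks documents by shared words; in a real RAG system it would be a Vector Search query on the question's embedding. The knowledge base, the prompt wording, and both function names are hypothetical, and the LLM call in step 5 is left out.

```python
def retrieve(question, knowledge_base, k=2):
    """Stand-in retriever (assumption): ranks documents by the number of words they
    share with the question. A real RAG system runs a Vector Search query here."""
    q_words = set(question.lower().split())
    return sorted(
        knowledge_base,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:k]

def build_grounded_prompt(question, documents):
    """Step 4 (augment): place the retrieved facts in front of the question."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

kb = [
    "Employees may work remotely up to three days per week.",
    "The cafeteria opens at 8 am.",
    "Remote work requests are approved by your manager.",
]
question = "How many days per week can employees work remotely?"
prompt = build_grounded_prompt(question, retrieve(question, kb))
print(prompt)
# Step 5 would send `prompt` to an LLM, whose answer is grounded in the listed context.
```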

---

## The Next Step: Hybrid Search

While semantic search is powerful for understanding context, it can struggle to retrieve specific, exact terms (like a new product SKU) that the embedding model never saw during training. To address this, the next evolution is **hybrid search**, which combines the contextual understanding of semantic search with the precision of traditional keyword search to improve retrieval performance.

## The Challenge: Beyond Semantic Search 🤔

While **semantic search** is excellent at understanding the meaning and context of words, it can struggle with **out-of-domain information**—data the embedding model hasn't been trained on, such as a brand-new product name or a specific barcode. This is where **Hybrid Search** comes in.

---

## What is Hybrid Search? 🤝

**Hybrid Search** combines the strengths of two search methods to achieve a more comprehensive and precise search experience:

* **Semantic Search:** Handles nuanced, contextual queries by understanding meaning.
* **Keyword Search:** Accurately captures specific, literal terms, especially those that are out-of-domain.

By merging these two, you get the best of both worlds. A well-known example is **Google Search**, which integrated semantic search with its existing keyword algorithms to significantly improve search quality.

### The Old Way vs. The New Way

Previously, building a hybrid search engine was a difficult task, requiring the maintenance of two separate engines and a complex process to merge and re-rank their results. Modern platforms like **Vertex AI Vector Search** have simplified this process, allowing for the creation of a single, powerful search system.

---

## How Hybrid Search Works ⚙️

Hybrid search follows the familiar `encode -> index -> search` process, but it runs two parallel tracks that are later combined.

### 1. The Keyword Search Track (Token-based)

This track focuses on matching exact words or tokens.

* **Encoding (Creating Sparse Embeddings):**
    * Text is broken into tokens (words or sub-words).
    * Instead of simple one-hot encoding, this method often uses a weighting algorithm like **TF-IDF (Term Frequency-Inverse Document Frequency)**.
    * TF-IDF assesses a word's importance within a document relative to a whole collection of documents, emphasizing significant terms.
    * The result is a high-dimensional vector with mostly zero values, known as a **sparse embedding**.

* **Indexing & Searching:**
    * A vector space is created to organize these sparse embeddings. Texts with similar keyword distributions are placed near each other, enabling efficient keyword matching.
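
The keyword track can be sketched with a from-scratch TF-IDF in plain Python, using raw term counts for TF and `log(N / df)` for IDF. This is one textbook variant; scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2 normalization by default, so its weights will differ. The three-product corpus is hypothetical.

```python
import math
from collections import Counter

def build_vocab(corpus):
    """Map every token in the corpus to a dimension index."""
    tokens = sorted({tok for doc in corpus for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def tfidf_sparse(text, corpus, vocab):
    """Sparse embedding as {dimension: weight}; only non-zero entries are stored.

    tf  = count of the token in `text`
    idf = log(N / df), where df = number of corpus documents containing the token
    """
    n_docs = len(corpus)
    counts = Counter(tok for tok in text.lower().split() if tok in vocab)
    sparse = {}
    for tok, tf in counts.items():
        df = sum(1 for doc in corpus if tok in doc.lower().split())
        weight = tf * math.log(n_docs / df)
        if weight > 0:  # tokens present in every document get idf = 0 and are dropped
            sparse[vocab[tok]] = weight
    return sparse

def sparse_dot(a, b):
    """Keyword-match score: only dimensions present in both vectors contribute."""
    return sum(w * b[d] for d, w in a.items() if d in b)

corpus = [
    "google blue kids sunglasses",
    "google white classic youth tee",
    "google red kids hoodie",
]
vocab = build_vocab(corpus)
docs = [tfidf_sparse(d, corpus, vocab) for d in corpus]
query = tfidf_sparse("kids sunglasses", corpus, vocab)
scores = [sparse_dot(query, d) for d in docs]
print([round(s, 3) for s in scores])  # the sunglasses product scores highest
```

Because "google" appears in every document, its IDF is zero and it contributes nothing, while the rare term "sunglasses" dominates the score.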

### 2. The Semantic Search Track

This track runs in parallel and focuses on meaning.

* **Encoding (Creating Dense Embeddings):**
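    * As covered previously, an embedding model (like those available through the Vertex AI Embeddings API) converts text into a **dense embedding**: a vector with far fewer dimensions than the vocabulary-sized sparse vectors, in which most values are non-zero and together encode the text's meaning.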

### 3. Combining and Re-ranking the Results

This is the final, crucial step where the results from both tracks are merged.

* **Reciprocal Rank Fusion (RRF):**
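    * Instead of comparing raw scores, which are not on comparable scales for sparse and dense search, RRF combines the lists using only each item's rank: an item receives `1 / (k + rank)` from every list it appears in, where `k` is a smoothing constant (commonly 60), and the summed scores determine the final order.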
    * It elevates items that rank highly in *any* of the individual lists.
    * An item that ranks very high in just one list (e.g., a perfect keyword match) or ranks consistently well across both lists will be prioritized in the final results.
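
A minimal RRF sketch in plain Python, using `k = 60` (the value from the original RRF paper and a common default). The two ranked lists are hypothetical outputs of the dense and sparse tracks.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of ids: score(id) = sum over lists of 1 / (k + rank).

    Ranks start at 1. Raw similarity scores are never used, so dense and sparse
    results can be fused even though their scores live on different scales.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_results = ["kids-sunglasses", "youth-tee", "adult-sunglasses"]
sparse_results = ["kids-sunglasses", "kids-hoodie", "adult-sunglasses"]
fused = reciprocal_rank_fusion([dense_results, sparse_results])
print(fused)
# ['kids-sunglasses', 'adult-sunglasses', 'youth-tee', 'kids-hoodie']
```

"adult-sunglasses" is only third in each list, yet it overtakes items ranked second in a single list, because it collects a contribution from both searches.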

---

## Implementation with Vertex AI Vector Search 🛠️

Modern APIs abstract away much of the complexity, making implementation straightforward.

1.  **Generate Embeddings:**
    * **Sparse Embeddings:** Use a vectorizer library (like `scikit-learn`'s `TfidfVectorizer`) to convert your text data into sparse embeddings for keyword search.
    * **Dense Embeddings:** Use a service like the Vertex AI Embeddings API to generate dense embeddings for semantic search.

2.  **Store and Index:**
    * Combine both the dense and sparse embeddings for each data point into a single record (e.g., in a JSON file).
    * Use this file to create a single hybrid vector index in Vertex AI Vector Search.

3.  **Query the Index:**
    * When performing a search, create a **hybrid query object** that contains both the sparse embedding (for keywords) and the dense embedding (for semantics) of your search query.
    * The system executes the query, leveraging both embedding types and using RRF to fuse the results.
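
Step 2 can be sketched as building one JSON object per line, each holding a datapoint's dense and sparse embeddings. The field names (`id`, `embedding`, `sparse_embedding` with parallel `values` and `dimensions` lists) follow the Vertex AI hybrid index input format as documented at the time of writing; verify them against the current documentation. The 3-dimensional dense vectors and the sparse weights are made-up placeholders.

```python
import json

def make_hybrid_record(item_id, dense, sparse):
    """One datapoint carrying both embedding types.

    `sparse` is a {dimension: weight} dict; it is stored as two parallel
    lists, sorted by dimension.
    """
    dims = sorted(sparse)
    return {
        "id": item_id,
        "embedding": dense,
        "sparse_embedding": {
            "values": [sparse[d] for d in dims],
            "dimensions": dims,
        },
    }

records = [
    # Placeholder vectors: real dense embeddings have hundreds of dimensions.
    make_hybrid_record("kids-sunglasses", [0.12, -0.40, 0.33], {17: 1.10, 3: 0.41}),
    make_hybrid_record("youth-tee", [0.08, -0.35, 0.29], {9: 1.10}),
]
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl)
```

The resulting file is placed in a Cloud Storage bucket from which the index is built; at query time, the query's dense vector and its sparse dimensions and values are likewise paired in a single hybrid query.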

### Example Result

A hybrid search for "kids sunglasses" might return:

* **Top Result:** "Google Blue Kids Sunglasses" (high similarity for both dense and sparse embeddings).
* **Middle Result:** "Google White Classic Youth Tee" (lower rank because it doesn't contain the keyword "kids," but "youth tee" is semantically similar enough to be included).

This demonstrates how hybrid search enables the rapid finding of similar items based on both literal keywords and conceptual meaning.