<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo"  />
    </a>
</p>


# **Create and Configure a Vector Database to Store Document Embeddings**


Estimated time needed: **30** minutes


## Overview


Imagine you are working in a customer support center that receives a high volume of inquiries and tickets every day. Your task is to create a system that can quickly provide support agents with the most relevant information to resolve customer issues. Traditional methods of searching through FAQs or support documents can be slow and inefficient, leading to delayed responses and dissatisfied customers.

To address this challenge, you will use embedding models to convert support documents and past inquiry responses into numerical vectors that capture their semantic content. These vectors will be stored in a vector database, enabling fast and accurate similarity searches. For example, when a support agent receives a new inquiry about a product issue, the system can instantly retrieve similar past inquiries and their resolutions, helping the agent to provide a quicker and more accurate response.


<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/veZYoygp9GqZrIw5f6SD0g/vector%20db.png" width="50%" alt="vector db"/>


In this lab, you will learn how to use vector databases to store embeddings generated from textual data using LangChain. The focus will be on two popular vector databases: Chroma DB and FAISS (Facebook AI Similarity Search). You will also learn how to perform similarity searches in these databases based on a query, enabling efficient retrieval of relevant information. By the end of this lab, you will be able to effectively use vector databases to store and query embeddings, enhancing your data analysis and retrieval capabilities.


## __Table of Contents__

<ol>
    <li><a href="#Objectives">Objectives</a></li>
    <li>
        <a href="#Setup">Setup</a>
        <ol>
            <li><a href="#Installing-required-libraries">Installing required libraries</a></li>
            <li><a href="#Load-text">Load text</a></li>
            <li><a href="#Split-data">Split data</a></li>
            <li><a href="#Embedding-model">Embedding model</a></li>
        </ol>
    </li>
    <li>
        <a href="#Vector-store">Vector store</a>
        <ol>
            <li><a href="#Chroma-DB">Chroma DB</a></li>
            <li><a href="#FIASS-DB">FAISS DB</a></li>
            <li><a href="#Managing-vector-store:-Adding,-updating,-and-deleting-entries">Managing vector store: Adding, updating, and deleting entries</a></li>
        </ol>
    </li>
</ol>

<a href="#Exercises">Exercises</a>
<ol>
    <li><a href="#Exercise-1---Use-another-query-to-conduct-similarity-search.">Exercise 1. Use another query to conduct similarity search.</a></li>
</ol>


## Objectives

After completing this lab you will be able to:

- Prepare and preprocess documents for embeddings.
- Generate embeddings using watsonx.ai's embedding model.
- Store these embeddings in Chroma DB and FAISS.
- Perform similarity searches to retrieve relevant documents based on new inquiries.


----


## Setup


For this lab, you will use the following libraries:

* [`ibm-watsonx-ai`](https://ibm.github.io/watsonx-ai-python-sdk/) for using LLMs and embedding models from IBM's watsonx.ai.
* [`langchain`, `langchain-ibm`, `langchain-community`](https://www.langchain.com/) for using relevant features from LangChain.
* [`chromadb`](https://www.trychroma.com/) is an open-source vector database used to store embeddings.
* [`faiss-cpu`](https://pypi.org/project/faiss-cpu/) is used to support the FAISS vector store.


### Installing required libraries

The following required libraries are __not__ preinstalled in the Skills Network Labs environment. __You must run the following cell__ to install them:

**Note:** The library versions are pinned so that this lab keeps working even if newer releases change behavior. It is recommended that you pin versions in your own projects as well.

This might take approximately 1-2 minutes.

If you uncomment `%%capture` at the top of the install cell, the installation output is hidden. Either way, a number appears beside the cell once the installation completes.
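
Once the installation has finished, a quick sanity check like the following can confirm that the installed versions match the pins. This is a minimal sketch that relies only on the standard library's `importlib.metadata`; the `PINS` dictionary (copied from the pinned install cell) and the `find_mismatches` helper are illustrative names, not part of the lab.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pinned install cell in this lab.
PINS = {
    "ibm-watsonx-ai": "1.0.4",
    "langchain": "0.2.1",
    "langchain-ibm": "0.1.7",
    "langchain-community": "0.2.1",
    "chromadb": "0.4.24",
    "faiss-cpu": "1.8.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


for package, (expected, found) in find_mismatches(PINS).items():
    print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package is present at the expected version. The `lookup` parameter exists only so the comparison logic can be exercised without touching the real environment.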


In [3]:
!pip install -r requirements.txt

Collecting aiohttp==3.12.13 (from -r requirements.txt (line 9))
  Downloading aiohttp-3.12.13-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.6 kB)
Collecting fastapi==0.115.13 (from -r requirements.txt (line 62))
  Downloading fastapi-0.115.13-py3-none-any.whl.metadata (27 kB)
Collecting fsspec==2025.5.1 (from -r requirements.txt (line 72))
  Downloading fsspec-2025.5.1-py3-none-any.whl.metadata (11 kB)
Collecting google-auth==2.40.3 (from -r requirements.txt (line 74))
  Downloading google_auth-2.40.3-py2.py3-none-any.whl.metadata (6.2 kB)
Collecting hf-xet==1.1.5 (from -r requirements.txt (line 88))
  Downloading hf_xet-1.1.5-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (879 bytes)
Collecting multidict==6.5.0 (from -r requirements.txt (line 167))
  Downloading multidict-6.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.3 kB)
Collecting oauthlib==3.3.1 (from -r requirements.txt (line 182))
  Downloading oauthlib-3

In [1]:
# Skipped in this run; kept for reference.
#%%capture
!pip install --user "ibm-watsonx-ai==1.0.4"
!pip install  --user "langchain==0.2.1"
!pip install  --user "langchain-ibm==0.1.7"
!pip install  --user "langchain-community==0.2.1"
!pip install --user "chromadb==0.4.24"
!pip install  --user "faiss-cpu==1.8.0"
print("Installation completed.")

Collecting ibm-watsonx-ai==1.0.4
  Downloading ibm_watsonx_ai-1.0.4-py3-none-any.whl.metadata (5.7 kB)
Collecting pandas<2.2.0,>=0.24.2 (from ibm-watsonx-ai==1.0.4)
  Downloading pandas-2.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting ibm-cos-sdk<2.14.0,>=2.12.0 (from ibm-watsonx-ai==1.0.4)
  Downloading ibm-cos-sdk-2.13.6.tar.gz (58 kB)
  Preparing metadata (setup.py) ... done
Collecting ibm-cos-sdk-core==2.13.6 (from ibm-cos-sdk<2.14.0,>=2.12.0->ibm-watsonx-ai==1.0.4)
  Downloading ibm-cos-sdk-core-2.13.6.tar.gz (1.1 MB)
  Preparing metadata (setup.py) ... done
Collecting ibm-cos-sdk-s3transfer==2.13.6 (from ibm-cos-sdk<2.14.0,>=2.12.0->ibm-watsonx-ai==1.0.4)
  Downloadi

After you install the libraries, restart your kernel. You can do that by clicking the **Restart the kernel** icon.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/build-a-hotdog-not-hotdog-classifier-guided-project/images/Restarting_the_Kernel.png" width="50%" alt="Restart kernel">


-----


The following steps are prerequisites for this lab's main topic, the vector store. These steps include:

- Loading the source document.
- Splitting the document into chunks.
- Building an embedding model.
  
The details of these steps have been introduced in previous lessons.


### Load text


A text file has been prepared as the source document for the downstream vector database task.

Now, let's download and load it using LangChain's `TextLoader`.


In [1]:
!wget "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/BYlUHaillwM8EUItaIytHQ/companypolicies.txt"

--2025-06-21 11:35:07--  https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/BYlUHaillwM8EUItaIytHQ/companypolicies.txt
Resolving cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)... 169.45.118.108
Connecting to cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)|169.45.118.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15660 (15K) [text/plain]
Saving to: ‘companypolicies.txt’


2025-06-21 11:35:08 (88.5 MB/s) - ‘companypolicies.txt’ saved [15660/15660]



In [2]:
from langchain_community.document_loaders import TextLoader

In [3]:
loader = TextLoader("companypolicies.txt")
data = loader.load()

You can have a look at this document.


In [4]:
data



### Split data


The next step is to split the document using LangChain's text splitter. Here, you will use the `RecursiveCharacterTextSplitter`, which is well-suited for generic text. The following parameters have been set:

- `chunk_size = 100`
- `chunk_overlap = 20`
- `length_function = len`


In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [6]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function=len,
)

In [7]:
chunks = text_splitter.split_documents(data)

Let's take a look at how many chunks you get.


In [8]:
len(chunks)

215

So, in total, you get 215 chunks.
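
To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified sketch: a fixed-width sliding window over characters. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs, lines, and spaces, so its chunks are often shorter than the limit), and `sliding_window_chunks` is a hypothetical helper written for this illustration.

```python
def sliding_window_chunks(text, chunk_size=100, chunk_overlap=20):
    """Split text into fixed-width character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# A 250-character sample made of repeating digits.
sample = "".join(str(i % 10) for i in range(250))
pieces = sliding_window_chunks(sample, chunk_size=100, chunk_overlap=20)

print([len(p) for p in pieces])            # [100, 100, 90]
print(pieces[0][-20:] == pieces[1][:20])   # True: neighbors share 20 characters
```

The overlap means the last 20 characters of one chunk reappear at the start of the next, which reduces the chance that a sentence cut at a chunk boundary loses its context.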


### Embedding model


The following code demonstrates how to build an embedding model using the `watsonx.ai` package.

For this project, the `ibm/slate-125m-english-rtrvr` embedding model will be used.


In [None]:
# Skipped in this run; kept for reference.
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames
from langchain_ibm import WatsonxEmbeddings

In [None]:
# Skipped in this run; kept for reference.
embed_params = {
    EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 3,
    EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": True},
}

watsonx_embedding = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="skills-network",
    params=embed_params,
)

In [11]:
import os
from getpass import getpass
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames
from langchain_ibm import WatsonxEmbeddings

try:
    # In Google Colab, read the credentials from the built-in secrets feature
    from google.colab import userdata
    watsonx_api_key = userdata.get('IBM_API_KEY')
    ibm_project_id = userdata.get('IBM_PROJECT_ID')
except ImportError:
    # Outside Colab, prompt for the credentials instead
    watsonx_api_key = getpass("Enter your WATSONX_APIKEY: ")
    ibm_project_id = input("Enter your watsonx.ai project ID: ")

# Configure embedding parameters
embed_params = {
    # Truncates each input to at most 3 tokens before embedding;
    # raise this value to embed more of each chunk
    EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 3,
    EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": True},
}

# Create WatsonxEmbeddings with the API key (never print the key itself)
watsonx_embedding = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url="https://us-south.ml.cloud.ibm.com",
    project_id=ibm_project_id,
    apikey=watsonx_api_key,
    params=embed_params,
)

print("✓ WatsonxEmbeddings configured successfully")

✓ WatsonxEmbeddings configured successfully


The embedding model is now available as the `watsonx_embedding` object.
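
LangChain vector stores interact with an embedding model through two methods: `embed_documents` (a list of texts in, a list of vectors out) and `embed_query` (one text in, one vector out). The toy class below mimics that interface offline so the shapes are easy to inspect. `ToyHashEmbeddings` is a made-up stand-in based on word hashing; its vectors carry no real semantics and it is unrelated to the watsonx.ai model.

```python
import hashlib
import math
from typing import List


class ToyHashEmbeddings:
    """Offline stand-in exposing the same two methods as a LangChain embedding model."""

    def __init__(self, dimensions: int = 16):
        self.dimensions = dimensions

    def _embed(self, text: str) -> List[float]:
        vector = [0.0] * self.dimensions
        for word in text.lower().split():
            # Hash each word to a stable bucket index and count it there.
            digest = hashlib.sha256(word.encode("utf-8")).digest()
            vector[digest[0] % self.dimensions] += 1.0
        # Scale to unit length so vectors of different texts are comparable.
        norm = math.sqrt(sum(v * v for v in vector))
        return [v / norm for v in vector] if norm else vector

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(text) for text in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)


toy = ToyHashEmbeddings(dimensions=16)
vectors = toy.embed_documents(["Email policy", "Smoking policy"])
print(len(vectors), len(vectors[0]))  # 2 16
```

Every text maps to a vector of the same length regardless of how long the text is, which is what lets a vector store compare a short query against longer chunks.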


## Vector store


In this section, you will be guided on how to use two commonly used vector databases: Chroma DB and FAISS DB. You will also see how to perform a similarity search based on an input query using these databases.


### Chroma DB


#### Build the database


First, you need to import `Chroma` from LangChain's vector stores.


In [12]:
from langchain_community.vectorstores import Chroma

Next, you need to create an ID list that will be used to assign each chunk a unique identifier, allowing you to track them later in the vector database. The length of this list should match the length of the chunks.

Note: The IDs should be in string format.


In [14]:
ids = [str(i) for i in range(0, len(chunks))]
print(ids)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', '137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', '148', '149', '150', '151', '152', '153', '154', '155', '156', '157', '15

The next step is to use the embedding model to create embeddings for each chunk and then store them in the Chroma database.

The following code demonstrates how to do this.


In [15]:
# Skipped: calling the API directly exceeds the Lite plan's limit of 2 requests per second (see the 429 error below)
#vectordb = Chroma.from_documents(chunks, watsonx_embedding, ids=ids)

Status code: 429, body: {"errors":[{"code":"rate_limit_reached_requests","message":"Rate limit of 2 requests per 1s was reached for instance id 491921c2-e8d0-47b4-84f9-fc00517474e3 (user RedHat-7121551, plan lite)","more_info":"https://cloud.ibm.com/apidocs/watsonx-ai#text-embeddings"}],"trace":"301dec410afc512a9f6b44536e080152","status_code":429}
Status code: 429, body: {"errors":[{"code":"rate_limit_reached_requests","message":"Rate limit of 2 requests per 1s was reached for instance id 491921c2-e8d0-47b4-84f9-fc00517474e3 (user RedHat-7121551, plan lite)","more_info":"https://cloud.ibm.com/apidocs/watsonx-ai#text-embeddings"}],"trace":"3531475570c2f8efdf2a2254af58273d","status_code":429}


ApiRequestFailure: Failure during generate. (POST https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2024-05-10)
Status code: 429, body: {"errors":[{"code":"rate_limit_reached_requests","message":"Rate limit of 2 requests per 1s was reached for instance id 491921c2-e8d0-47b4-84f9-fc00517474e3 (user RedHat-7121551, plan lite)","more_info":"https://cloud.ibm.com/apidocs/watsonx-ai#text-embeddings"}],"trace":"301dec410afc512a9f6b44536e080152","status_code":429}

In [16]:
import time
from typing import List

class RateLimitedEmbeddings:
    def __init__(self, embedding_model, requests_per_second=1.5, batch_size=10):
        """
        Wrapper around your embedding model that respects rate limits

        Args:
            embedding_model: Your watsonx embedding model
            requests_per_second: Maximum requests per second (set below limit)
            batch_size: Number of texts to embed in each request
        """
        self.embedding_model = embedding_model
        self.delay = 1.0 / requests_per_second  # Delay between requests
        self.batch_size = batch_size
        self.last_request_time = 0

    def _wait_if_needed(self):
        """Wait if we need to respect rate limits"""
        current_time = time.time()
        time_since_last = current_time - self.last_request_time

        if time_since_last < self.delay:
            sleep_time = self.delay - time_since_last
            print(f"⏳ Rate limiting: waiting {sleep_time:.2f}s...")
            time.sleep(sleep_time)

        self.last_request_time = time.time()

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed documents with rate limiting and batching"""
        if not texts:
            return []

        all_embeddings = []
        total_batches = (len(texts) + self.batch_size - 1) // self.batch_size

        print(f"📊 Processing {len(texts)} texts in {total_batches} batches (batch size: {self.batch_size})")

        for i in range(0, len(texts), self.batch_size):
            batch = texts[i:i + self.batch_size]
            batch_num = i // self.batch_size + 1

            print(f"🔄 Processing batch {batch_num}/{total_batches} ({len(batch)} texts)...")

            # Wait if needed to respect rate limits
            self._wait_if_needed()

            try:
                # Make the actual embedding request
                batch_embeddings = self.embedding_model.embed_documents(batch)
                all_embeddings.extend(batch_embeddings)
                print(f"✅ Batch {batch_num} completed")

            except Exception as e:
                if "rate_limit" in str(e).lower() or "429" in str(e):
                    print(f"⚠️ Rate limit hit in batch {batch_num}, waiting longer...")
                    time.sleep(2.0)  # Wait 2 seconds on rate limit
                    # Retry the batch
                    batch_embeddings = self.embedding_model.embed_documents(batch)
                    all_embeddings.extend(batch_embeddings)
                    print(f"✅ Batch {batch_num} completed after retry")
                else:
                    raise e

        print(f"🎉 All {len(texts)} texts embedded successfully!")
        return all_embeddings

    def embed_query(self, text: str) -> List[float]:
        """Embed a single query with rate limiting"""
        self._wait_if_needed()
        return self.embedding_model.embed_query(text)

# Wrap your existing embedding model
rate_limited_embedding = RateLimitedEmbeddings(
    embedding_model=watsonx_embedding,
    requests_per_second=1.5,  # Stay below 2 req/s limit
    batch_size=10  # Embed 10 texts per request
)

# Now use the rate-limited version
print("Creating vector database with rate limiting...")
vectordb = Chroma.from_documents(
    chunks,
    rate_limited_embedding,  # Use the rate-limited wrapper
    ids=ids
)

Creating vector database with rate limiting...
📊 Processing 215 texts in 22 batches (batch size: 10)
🔄 Processing batch 1/22 (10 texts)...
✅ Batch 1 completed
🔄 Processing batch 2/22 (10 texts)...
⏳ Rate limiting: waiting 0.03s...
✅ Batch 2 completed
🔄 Processing batch 3/22 (10 texts)...
⏳ Rate limiting: waiting 0.20s...
✅ Batch 3 completed
🔄 Processing batch 4/22 (10 texts)...
⏳ Rate limiting: waiting 0.18s...
✅ Batch 4 completed
🔄 Processing batch 5/22 (10 texts)...
⏳ Rate limiting: waiting 0.23s...
✅ Batch 5 completed
🔄 Processing batch 6/22 (10 texts)...
⏳ Rate limiting: waiting 0.26s...
✅ Batch 6 completed
🔄 Processing batch 7/22 (10 texts)...
⏳ Rate limiting: waiting 0.22s...
✅ Batch 7 completed
🔄 Processing batch 8/22 (10 texts)...
⏳ Rate limiting: waiting 0.25s...
✅ Batch 8 completed
🔄 Processing batch 9/22 (10 texts)...
⏳ Rate limiting: waiting 0.27s...
✅ Batch 9 completed
🔄 Processing batch 10/22 (10 texts)...
⏳ Rate limiting: waiting 0.18s...
✅ Batch 10 completed
🔄 Processin
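
The batch counts and pacing in the log above follow from simple arithmetic, sketched here with hypothetical helper names. The batch count is a ceiling division, and a rate of 1.5 requests per second (chosen to stay under the 2 requests per second reported in the 429 error) implies a minimum gap between consecutive requests.

```python
import math


def batch_count(num_texts: int, batch_size: int) -> int:
    """Number of requests needed to embed num_texts in batches of batch_size."""
    return math.ceil(num_texts / batch_size)


def min_interval(requests_per_second: float) -> float:
    """Minimum spacing in seconds between the starts of two requests."""
    return 1.0 / requests_per_second


def min_pacing_time(num_texts: int, batch_size: int, requests_per_second: float) -> float:
    """Lower bound on the time between the first and the last request start."""
    gaps = max(batch_count(num_texts, batch_size) - 1, 0)
    return gaps * min_interval(requests_per_second)


print(batch_count(215, 10))                      # 22 batches with batch_size=10
print(batch_count(215, 8))                       # 27 batches with batch_size=8
print(round(min_interval(1.5), 3))               # 0.667 seconds between requests
print(round(min_pacing_time(215, 10, 1.5), 1))   # 14.0 seconds
```

The waits printed in the log are shorter than 0.667 seconds because each request itself takes time; the wrapper only sleeps for whatever remains of the interval.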

Now that you have built the vector store named `vectordb`, you can use the method `._collection.get()` to print some of the chunks by their IDs.

Note: Although each chunk is stored together with its embedding, retrieving it by ID returns the chunk text and metadata by default. The embedding vector itself is not included, which is why `'embeddings'` is `None` in the output.


In [17]:
for i in range(3):
    print(vectordb._collection.get(ids=str(i)))

{'ids': ['0'], 'embeddings': None, 'metadatas': [{'source': 'companypolicies.txt'}], 'documents': ['1.\tCode of Conduct'], 'uris': None, 'data': None}
{'ids': ['1'], 'embeddings': None, 'metadatas': [{'source': 'companypolicies.txt'}], 'documents': ['Our Code of Conduct outlines the fundamental principles and ethical standards that guide every'], 'uris': None, 'data': None}
{'ids': ['2'], 'embeddings': None, 'metadatas': [{'source': 'companypolicies.txt'}], 'documents': ['that guide every member of our organization. We are committed to maintaining a workplace that is'], 'uris': None, 'data': None}


You can also use the method `._collection.count()` to see the number of entries in the vector database, which should equal the number of chunks.


In [18]:
vectordb._collection.count()

215

#### Similarity search


Similarity search in a vector database involves finding items that are most similar to a given query item based on their vector representations.

In this process, data objects are converted into vectors (which you've already done), and the search algorithm identifies and retrieves those with the closest vector distances to the query, enabling efficient and accurate identification of similar items in large datasets.
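
The idea can be sketched without any database: compute a distance between the query vector and every stored vector, then keep the closest ones. The vectors below are made up for illustration, and `top_k` is a hypothetical brute-force helper; this linear scan is only meant to show the principle, not how Chroma or FAISS are implemented.

```python
import math
from typing import Dict, List, Tuple


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 4) -> List[Tuple[str, float]]:
    """Return the k (id, distance) pairs closest to the query, nearest first."""
    scored = [(doc_id, l2_distance(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Made-up 2-D "embeddings" keyed by string IDs.
store = {
    "0": [0.9, 0.1],
    "1": [0.8, 0.3],
    "2": [0.1, 0.9],
    "3": [0.5, 0.5],
    "4": [0.0, 1.0],
}
query_vector = [1.0, 0.0]

for doc_id, distance in top_k(query_vector, store, k=2):
    print(doc_id, round(distance, 3))
```

With `k=2` the two entries pointing in nearly the same direction as the query (IDs `"0"` and `"1"`) come back first; leaving `k` at its default of 4 would return four of the five entries.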


LangChain supports similarity search in vector stores using the method `.similarity_search()`.

The following is an example of how to perform a similarity search based on the query "Email policy."

By default, it will return the top four closest vectors to the query.


In [19]:
query = "Email policy"
docs = vectordb.similarity_search(query)
docs

[Document(metadata={'source': 'companypolicies.txt'}, page_content='internet and email usage, including those related to copyright and data protection.'),
 Document(metadata={'source': 'companypolicies.txt'}, page_content='to this policy. Non-compliance may lead to appropriate disciplinary action, which could include'),
 Document(metadata={'source': 'companypolicies.txt'}, page_content='This policy serves as a framework for handling discipline and termination. The organization'),
 Document(metadata={'source': 'companypolicies.txt'}, page_content='Policy Purpose: The Smoking Policy has been established to provide clear guidance and expectations')]

You can specify `k = 1` to retrieve only the top result.


In [20]:
vectordb.similarity_search(query, k = 1)

[Document(metadata={'source': 'companypolicies.txt'}, page_content='internet and email usage, including those related to copyright and data protection.')]

<a id="FIASS-DB"></a>
### FAISS DB


FAISS (Facebook AI Similarity Search) is a vector similarity search library that LangChain also supports as a vector store.

The process of building and using FAISS is similar to Chroma DB.

However, there may be differences in the retrieval results between FAISS and Chroma DB.
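
One possible reason two stores could disagree is the distance metric each is configured with. The sketch below, on made-up vectors, shows Euclidean distance and cosine similarity picking different nearest neighbors when vectors are not normalized, and agreeing once they are scaled to unit length. Which metric a given Chroma or FAISS setup uses depends on its configuration, so treat this as an illustration of the effect rather than a description of either library.

```python
import math
from typing import Dict, List


def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def l2_distance(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def normalize(v: List[float]) -> List[float]:
    length = norm(v)
    return [x / length for x in v]


def nearest_by_l2(query: List[float], candidates: Dict[str, List[float]]) -> str:
    return min(candidates, key=lambda name: l2_distance(query, candidates[name]))


def nearest_by_cosine(query: List[float], candidates: Dict[str, List[float]]) -> str:
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))


query = [1.0, 1.0]
candidates = {
    "short": [0.9, 0.6],  # near the query in space, slightly different direction
    "long": [5.0, 5.0],   # same direction as the query, but far away
}

print(nearest_by_l2(query, candidates))       # short
print(nearest_by_cosine(query, candidates))   # long

# After scaling everything to unit length, both metrics agree.
unit_query = normalize(query)
unit_candidates = {name: normalize(vec) for name, vec in candidates.items()}
print(nearest_by_l2(unit_query, unit_candidates))      # long
print(nearest_by_cosine(unit_query, unit_candidates))  # long
```

For unit-length vectors, the squared Euclidean distance equals 2 minus twice the cosine similarity, so the two rankings always coincide.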


#### Build the database


Build the FAISS vector store from the chunks and their embeddings.


In [21]:
#from langchain_community.vectorstores import FAISS

In [24]:
# Skipped: the direct call hits the API rate limit; see the rate-limited version in the next cell
#from langchain_community.vectorstores import FAISS
#faissdb = FAISS.from_documents(chunks, watsonx_embedding, ids=ids)

In [23]:
import time
from typing import List
from langchain_community.vectorstores import FAISS

# Create rate-limited embedding wrapper
rate_limited_embedding = RateLimitedEmbeddings(
    embedding_model=watsonx_embedding,
    requests_per_second=1.5,  # Stay below 2 req/s limit
    batch_size=8  # Embed 8 texts per request
)

# Create FAISS database with rate limiting
print("Creating FAISS vector database with rate limiting...")
faissdb = FAISS.from_documents(
    chunks,
    rate_limited_embedding,
    ids=ids
)
print("✅ FAISS database created successfully!")

Creating FAISS vector database with rate limiting...
📊 Processing 215 texts in 27 batches (batch size: 8)
🔄 Processing batch 1/27 (8 texts)...
✅ Batch 1 completed
🔄 Processing batch 2/27 (8 texts)...
⏳ Rate limiting: waiting 0.16s...
✅ Batch 2 completed
🔄 Processing batch 3/27 (8 texts)...
⏳ Rate limiting: waiting 0.21s...
✅ Batch 3 completed
🔄 Processing batch 4/27 (8 texts)...
⏳ Rate limiting: waiting 0.21s...
✅ Batch 4 completed
🔄 Processing batch 5/27 (8 texts)...
⏳ Rate limiting: waiting 0.29s...
✅ Batch 5 completed
🔄 Processing batch 6/27 (8 texts)...
⏳ Rate limiting: waiting 0.28s...
✅ Batch 6 completed
🔄 Processing batch 7/27 (8 texts)...
⏳ Rate limiting: waiting 0.30s...
✅ Batch 7 completed
🔄 Processing batch 8/27 (8 texts)...
⏳ Rate limiting: waiting 0.28s...
✅ Batch 8 completed
🔄 Processing batch 9/27 (8 texts)...
⏳ Rate limiting: waiting 0.26s...
✅ Batch 9 completed
🔄 Processing batch 10/27 (8 texts)...
⏳ Rate limiting: waiting 0.27s...
✅ Batch 10 completed
🔄 Processing bat



✅ Batch 27 completed
🎉 All 215 texts embedded successfully!
✅ FAISS database created successfully!


Next, print the first three entries in the database by ID.


In [25]:
for i in range(3):
    print(faissdb.docstore.search(str(i)))

page_content='1.	Code of Conduct' metadata={'source': 'companypolicies.txt'}
page_content='Our Code of Conduct outlines the fundamental principles and ethical standards that guide every' metadata={'source': 'companypolicies.txt'}
page_content='that guide every member of our organization. We are committed to maintaining a workplace that is' metadata={'source': 'companypolicies.txt'}


#### Similarity search


Let's run a similarity search on the same query, this time using FAISS.


In [28]:
#query = "Email policy"
#docs = faissdb.similarity_search(query)
#docs

In [27]:
import time
from typing import List
from langchain_community.vectorstores import FAISS

class RateLimitedEmbeddings:
    def __init__(self, embedding_model, requests_per_second=1.5, batch_size=10):
        """
        Wrapper around your embedding model that respects rate limits
        """
        self.embedding_model = embedding_model
        self.delay = 1.0 / requests_per_second
        self.batch_size = batch_size
        self.last_request_time = 0

    def _wait_if_needed(self):
        """Wait if we need to respect rate limits"""
        current_time = time.time()
        time_since_last = current_time - self.last_request_time

        if time_since_last < self.delay:
            sleep_time = self.delay - time_since_last
            print(f"⏳ Rate limiting: waiting {sleep_time:.2f}s...")
            time.sleep(sleep_time)

        self.last_request_time = time.time()

    def __call__(self, text):
        """Make the class callable for FAISS compatibility"""
        return self.embed_query(text)

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed documents with rate limiting and batching"""
        if not texts:
            return []

        all_embeddings = []
        total_batches = (len(texts) + self.batch_size - 1) // self.batch_size

        print(f"📊 Processing {len(texts)} texts in {total_batches} batches (batch size: {self.batch_size})")

        for i in range(0, len(texts), self.batch_size):
            batch = texts[i:i + self.batch_size]
            batch_num = i // self.batch_size + 1

            print(f"🔄 Processing batch {batch_num}/{total_batches} ({len(batch)} texts)...")

            self._wait_if_needed()

            try:
                batch_embeddings = self.embedding_model.embed_documents(batch)
                all_embeddings.extend(batch_embeddings)
                print(f"✅ Batch {batch_num} completed")

            except Exception as e:
                if "rate_limit" in str(e).lower() or "429" in str(e):
                    print(f"⚠️ Rate limit hit in batch {batch_num}, waiting longer...")
                    time.sleep(2.0)
                    batch_embeddings = self.embedding_model.embed_documents(batch)
                    all_embeddings.extend(batch_embeddings)
                    print(f"✅ Batch {batch_num} completed after retry")
                else:
                    raise e

        print(f"🎉 All {len(texts)} texts embedded successfully!")
        return all_embeddings

    def embed_query(self, text: str) -> List[float]:
        """Embed a single query with rate limiting"""
        print(f"🔍 Embedding query: '{text[:50]}{'...' if len(text) > 50 else ''}'")
        self._wait_if_needed()

        try:
            result = self.embedding_model.embed_query(text)
            print(f"✅ Query embedded successfully")
            return result
        except Exception as e:
            if "rate_limit" in str(e).lower() or "429" in str(e):
                print(f"⚠️ Rate limit hit on query, waiting 2s...")
                time.sleep(2.0)
                result = self.embedding_model.embed_query(text)
                print(f"✅ Query embedded after retry")
                return result
            else:
                raise e

# Create rate-limited embedding wrapper
rate_limited_embedding = RateLimitedEmbeddings(
    embedding_model=watsonx_embedding,
    requests_per_second=1.5,
    batch_size=8
)

# Create FAISS database with rate limiting
print("Creating FAISS vector database with rate limiting...")
faissdb = FAISS.from_documents(
    chunks,
    rate_limited_embedding,
    ids=ids
)
print("✅ FAISS database created successfully!")

# With the callable wrapper in place, the similarity search now runs
query = "Email policy"
docs = faissdb.similarity_search(query)
print(f"Found {len(docs)} similar documents")
docs

Creating FAISS vector database with rate limiting...
📊 Processing 215 texts in 27 batches (batch size: 8)
🔄 Processing batch 1/27 (8 texts)...
✅ Batch 1 completed
🔄 Processing batch 2/27 (8 texts)...
⏳ Rate limiting: waiting 0.16s...
✅ Batch 2 completed
🔄 Processing batch 3/27 (8 texts)...
⏳ Rate limiting: waiting 0.19s...
✅ Batch 3 completed
🔄 Processing batch 4/27 (8 texts)...
✅ Batch 4 completed
🔄 Processing batch 5/27 (8 texts)...
⏳ Rate limiting: waiting 0.30s...
✅ Batch 5 completed
🔄 Processing batch 6/27 (8 texts)...
⏳ Rate limiting: waiting 0.28s...
✅ Batch 6 completed
🔄 Processing batch 7/27 (8 texts)...
⏳ Rate limiting: waiting 0.30s...
✅ Batch 7 completed
🔄 Processing batch 8/27 (8 texts)...
⏳ Rate limiting: waiting 0.26s...
✅ Batch 8 completed
🔄 Processing batch 9/27 (8 texts)...
⏳ Rate limiting: waiting 0.29s...
✅ Batch 9 completed
🔄 Processing batch 10/27 (8 texts)...
⏳ Rate limiting: waiting 0.27s...
✅ Batch 10 completed
🔄 Processing batch 11/27 (8 texts)...
⏳ Rate limit



✅ Batch 27 completed
🎉 All 215 texts embedded successfully!
✅ FAISS database created successfully!
🔍 Embedding query: 'Email policy'
⏳ Rate limiting: waiting 0.15s...
✅ Query embedded successfully
Found 4 similar documents


[Document(metadata={'source': 'companypolicies.txt'}, page_content='internet and email usage, including those related to copyright and data protection.'),
 Document(metadata={'source': 'companypolicies.txt'}, page_content='to this policy. Non-compliance may lead to appropriate disciplinary action, which could include'),
 Document(metadata={'source': 'companypolicies.txt'}, page_content='This policy serves as a framework for handling discipline and termination. The organization'),
 Document(metadata={'source': 'companypolicies.txt'}, page_content='Policy Purpose: The Smoking Policy has been established to provide clear guidance and expectations')]

The results retrieved by this similarity search appear to be the same as those returned by Chroma DB.

You can try other queries or documents to see whether the two vector stores continue to return the same results.
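
To make the idea of a similarity search more concrete, the following is a minimal pure-Python sketch of a brute-force nearest-neighbor search by Euclidean (L2) distance. It is only an illustration of the concept, not how FAISS or Chroma DB are implemented internally. The function names, the three-dimensional vectors, and the texts are all invented for this example; real embeddings come from the embedding model and have many more dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(query_vec, stored, k=4):
    """Return the k stored texts closest to query_vec, nearest first.

    `stored` is a list of (text, vector) pairs.
    """
    ranked = sorted(stored, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy 3-dimensional "embeddings" (invented values, not model output).
toy_store = [
    ("email usage rules", [0.9, 0.1, 0.0]),
    ("smoking areas", [0.0, 0.9, 0.1]),
    ("disciplinary action", [0.5, 0.2, 0.6]),
]
toy_query = [1.0, 0.0, 0.0]

top_two = brute_force_search(toy_query, toy_store, k=2)
print(top_two)
```

A vector database does the same ranking job, but with index structures that avoid comparing the query against every stored vector when the collection is large.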


### Managing vector store: Adding, updating, and deleting entries


In a RAG application, new documents may arrive that you want to add to the existing vector database, existing documents may need to be deleted, and documents that are already stored may need to be updated when their content changes.

The following sections show how to perform each of these tasks, using Chroma DB as the example.


#### Add


Imagine you have a new piece of text information that you want to add to the vector database. First, this information should be formatted into a document object.


In [29]:
text = "Instructlab is the best open source tool for fine-tuning a LLM."

In [30]:
from langchain_core.documents import Document

Form the text into a `Document` object named `new_chunk`.


In [31]:
new_chunk =  Document(
    page_content=text,
    metadata={
        "source": "ibm.com",
        "page": 1
    }
)

Next, put the new chunk into a list, because the vector database only accepts documents in a list.


In [32]:
new_chunks = [new_chunk]

The database currently holds 215 chunks with IDs from 0 to 214, so looking up ID 215 before adding the new document should return an empty result. Let's verify this.


In [33]:
print(vectordb._collection.get(ids=['215']))

{'ids': [], 'embeddings': None, 'metadatas': [], 'documents': [], 'uris': None, 'data': None}
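
If you prefer a programmatic check over reading the printed dictionary, a small helper can test whether an ID is present. This is a sketch that assumes the result has the same dictionary shape as the output shown above; the helper name `has_id` and the sample dictionaries are hypothetical.

```python
def has_id(get_result, doc_id):
    """Return True if doc_id appears in a get()-style result dictionary."""
    return doc_id in get_result.get("ids", [])


# Sample results shaped like the printed output (values invented for illustration).
empty_result = {"ids": [], "embeddings": None, "metadatas": [], "documents": []}
found_result = {
    "ids": ["215"],
    "embeddings": None,
    "metadatas": [{"source": "example"}],
    "documents": ["some text"],
}

print(has_id(empty_result, "215"))
print(has_id(found_result, "215"))
```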


Next, use the `.add_documents()` method to add `new_chunk`. When calling this method, assign an ID to the document. Since IDs 0 to 214 are already in use, assign ID 215. The ID must be a string and must be passed in a list.


In [34]:
vectordb.add_documents(
    new_chunks,
    ids=["215"]
)

📊 Processing 1 texts in 1 batches (batch size: 10)
🔄 Processing batch 1/1 (1 texts)...
✅ Batch 1 completed
🎉 All 1 texts embedded successfully!


['215']
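
When you add several documents at once, you need one string ID per document. The following sketch generates the next IDs from the current entry count. It assumes the existing IDs are the contiguous strings "0" through "count - 1", as in this lab; the helper name `next_ids` is hypothetical, and the assumption no longer holds once entries in the middle of the range have been deleted.

```python
def next_ids(current_count, n_new):
    """Return n_new consecutive string IDs starting at current_count.

    Assumes existing IDs are "0" .. str(current_count - 1).
    """
    return [str(i) for i in range(current_count, current_count + n_new)]


print(next_ids(215, 1))
print(next_ids(215, 3))
```

In the lab, `current_count` would be the value returned by `vectordb._collection.count()` before the new documents are added.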

Now count the entries in the vector database again to confirm that the total has increased by one.


In [35]:
vectordb._collection.count()

216

You can then print this newly added document from the database by its ID.


In [36]:
print(vectordb._collection.get(ids=['215']))

{'ids': ['215'], 'embeddings': None, 'metadatas': [{'page': 1, 'source': 'ibm.com'}], 'documents': ['Instructlab is the best open source tool for fine-tuning a LLM.'], 'uris': None, 'data': None}


#### Update


Imagine you want to update the content of a document that is already stored in the database. The following code demonstrates how to do this.


As before, you first need to wrap the updated text in a `Document` object.


In [37]:
update_chunk =  Document(
    page_content="Instructlab is a perfect open source tool for fine-tuning a LLM.",
    metadata={
        "source": "ibm.com",
        "page": 1
    }
)

Then, use the `.update_document()` method to update the stored entry, which is identified by its ID.


In [38]:
vectordb.update_document(
    '215',
    update_chunk,
)

📊 Processing 1 texts in 1 batches (batch size: 10)
🔄 Processing batch 1/1 (1 texts)...
✅ Batch 1 completed
🎉 All 1 texts embedded successfully!


In [39]:
print(vectordb._collection.get(ids=['215']))

{'ids': ['215'], 'embeddings': None, 'metadatas': [{'page': 1, 'source': 'ibm.com'}], 'documents': ['Instructlab is a perfect open source tool for fine-tuning a LLM.'], 'uris': None, 'data': None}


As you can see, the document information has been updated.


#### Delete


If you want to delete documents from the vector database, you can use the method `_collection.delete()` and specify the document ID to delete it.


In [40]:
vectordb._collection.delete(ids=['215'])

In [41]:
print(vectordb._collection.get(ids=['215']))

{'ids': [], 'embeddings': None, 'metadatas': [], 'documents': [], 'uris': None, 'data': None}


As you can see, the lookup for that ID now returns an empty result, which confirms that the document has been deleted.
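
The three operations above can be pictured with a toy in-memory store. This is a minimal sketch for intuition only, not Chroma DB's implementation: the class `ToyVectorStore`, its methods, and the stand-in `fake_embed` function are all hypothetical. It shows that an update has to re-embed the text so that the stored vector matches the new content, which is consistent with the embedding messages printed when `.update_document()` ran above.

```python
class ToyVectorStore:
    """Illustrative in-memory store: maps a string ID to (text, vector)."""

    def __init__(self, embed):
        self._embed = embed  # function: text -> list of floats
        self._rows = {}

    def add(self, doc_id, text):
        # Rejecting duplicate IDs is a choice of this sketch;
        # real stores may behave differently.
        if doc_id in self._rows:
            raise ValueError(f"ID {doc_id!r} already exists")
        self._rows[doc_id] = (text, self._embed(text))

    def update(self, doc_id, text):
        if doc_id not in self._rows:
            raise KeyError(doc_id)
        # Re-embed so the stored vector matches the new text.
        self._rows[doc_id] = (text, self._embed(text))

    def delete(self, doc_id):
        self._rows.pop(doc_id, None)

    def get(self, doc_id):
        return self._rows.get(doc_id)

    def count(self):
        return len(self._rows)


def fake_embed(text):
    """Stand-in for a real embedding model (hypothetical, for illustration)."""
    return [float(len(text)), float(text.count(" "))]


store = ToyVectorStore(fake_embed)
store.add("215", "Instructlab is the best open source tool for fine-tuning a LLM.")
vector_before = store.get("215")[1]

store.update("215", "Instructlab is a perfect open source tool for fine-tuning a LLM.")
vector_after = store.get("215")[1]

store.delete("215")
print(store.count(), store.get("215"))
```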


# Exercises


### Exercise 1 - Use another query to conduct similarity search.

Can you use another query to conduct the similarity search?


In [42]:
query = "Smoking policy"
docs = vectordb.similarity_search(query)
docs

[Document(metadata={'source': 'companypolicies.txt'}, page_content='Smoking Restrictions: Smoking inside company buildings, offices, meeting rooms, and other enclosed'),
 Document(metadata={'source': 'companypolicies.txt'}, page_content='Designated Smoking Areas: Smoking is only permitted in designated smoking areas, as marked by'),
 Document(metadata={'source': 'companypolicies.txt'}, page_content='No Smoking in Company Vehicles: Smoking is not permitted in company vehicles, whether they are'),
 Document(metadata={'source': 'companypolicies.txt'}, page_content='Policy Purpose: The Smoking Policy has been established to provide clear guidance and expectations')]

<details>
    <summary>Click here for solution</summary>

```python
query = "Smoking policy"
docs = vectordb.similarity_search(query)
docs
```

</details>


## Authors


[Kang Wang](https://author.skills.network/instructors/kang_wang)

Kang Wang is a Data Scientist at IBM. He is also a PhD Candidate at the University of Waterloo.

[Cal Page](https://www.linkedin.com/in/cal-page-1084311/)

Cal Page is a software engineering wizard who added rate limiting for IBM access and pulled the token keys from the secret area on Jupyter.

### Other Contributors


[Joseph Santarcangelo](https://author.skills.network/instructors/joseph_santarcangelo)

Joseph has a Ph.D. in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2024-07-24|0.1|Kang Wang|Create the lab|



Copyright © IBM Corporation. All rights reserved.


#
# This file is autogenerated by pip-compile with Python 3.12
# by the following command:
#
#    pip-compile
#
aiohappyeyeballs==2.6.1
    # via aiohttp
aiohttp==3.12.13
    # via
    #   langchain
    #   langchain-community
aiosignal==1.3.2
    # via aiohttp
annotated-types==0.7.0
    # via pydantic
anyio==4.9.0
    # via
    #   httpx
    #   starlette
    #   watchfiles
asgiref==3.8.1
    # via opentelemetry-instrumentation-asgi
attrs==25.3.0
    # via aiohttp
backoff==2.2.1
    # via posthog
bcrypt==4.3.0
    # via chromadb
build==1.2.2.post1
    # via chromadb
cachetools==5.5.2
    # via google-auth
certifi==2025.6.15
    # via
    #   httpcore
    #   httpx
    #   ibm-watsonx-ai
    #   kubernetes
    #   pulsar-client
    #   requests
charset-normalizer==3.4.2
    # via requests
chroma-hnswlib==0.7.3
    # via chromadb
chromadb==0.4.24
    # via -r requirements.in
click==8.2.1
    # via
    #   typer
    #   uvicorn
coloredlogs==15.0.1
    # via onnxruntime
dataclasses-json==0.6.7
    # via langchain-community
distro==1.9.0
    # via posthog
durationpy==0.10
    # via kubernetes
faiss-cpu==1.8.0
    # via -r requirements.in
fastapi==0.115.13
    # via chromadb
filelock==3.18.0
    # via huggingface-hub
flatbuffers==25.2.10
    # via onnxruntime
frozenlist==1.7.0
    # via
    #   aiohttp
    #   aiosignal
fsspec==2025.5.1
    # via huggingface-hub
google-auth==2.40.3
    # via kubernetes
googleapis-common-protos==1.70.0
    # via opentelemetry-exporter-otlp-proto-grpc
greenlet==3.2.3
    # via sqlalchemy
grpcio==1.73.0
    # via
    #   chromadb
    #   opentelemetry-exporter-otlp-proto-grpc
h11==0.16.0
    # via
    #   httpcore
    #   uvicorn
hf-xet==1.1.5
    # via huggingface-hub
httpcore==1.0.9
    # via httpx
httptools==0.6.4
    # via uvicorn
httpx==0.28.1
    # via langsmith
huggingface-hub==0.33.0
    # via tokenizers
humanfriendly==10.0
    # via coloredlogs
ibm-cos-sdk==2.13.6
    # via ibm-watsonx-ai
ibm-cos-sdk-core==2.13.6
    # via
    #   ibm-cos-sdk
    #   ibm-cos-sdk-s3transfer
ibm-cos-sdk-s3transfer==2.13.6
    # via ibm-cos-sdk
ibm-watsonx-ai==1.0.4
    # via
    #   -r requirements.in
    #   langchain-ibm
idna==3.10
    # via
    #   anyio
    #   httpx
    #   requests
    #   yarl
importlib-metadata==8.7.0
    # via
    #   ibm-watsonx-ai
    #   opentelemetry-api
importlib-resources==6.5.2
    # via chromadb
jmespath==1.0.1
    # via
    #   ibm-cos-sdk
    #   ibm-cos-sdk-core
jsonpatch==1.33
    # via langchain-core
jsonpointer==3.0.0
    # via jsonpatch
kubernetes==33.1.0
    # via chromadb
langchain==0.2.1
    # via
    #   -r requirements.in
    #   langchain-community
langchain-community==0.2.1
    # via -r requirements.in
langchain-core==0.2.43
    # via
    #   langchain
    #   langchain-community
    #   langchain-ibm
    #   langchain-text-splitters
langchain-ibm==0.1.7
    # via -r requirements.in
langchain-text-splitters==0.2.4
    # via langchain
langsmith==0.1.147
    # via
    #   langchain
    #   langchain-community
    #   langchain-core
lomond==0.3.3
    # via ibm-watsonx-ai
markdown-it-py==3.0.0
    # via rich
marshmallow==3.26.1
    # via dataclasses-json
mdurl==0.1.2
    # via markdown-it-py
mmh3==5.1.0
    # via chromadb
mpmath==1.3.0
    # via sympy
multidict==6.5.0
    # via
    #   aiohttp
    #   yarl
mypy-extensions==1.1.0
    # via typing-inspect
numpy==1.26.4
    # via
    #   chroma-hnswlib
    #   chromadb
    #   faiss-cpu
    #   langchain
    #   langchain-community
    #   onnxruntime
    #   pandas
oauthlib==3.3.1
    # via
    #   kubernetes
    #   requests-oauthlib
onnxruntime==1.22.0
    # via chromadb
opentelemetry-api==1.34.1
    # via
    #   chromadb
    #   opentelemetry-exporter-otlp-proto-grpc
    #   opentelemetry-instrumentation
    #   opentelemetry-instrumentation-asgi
    #   opentelemetry-instrumentation-fastapi
    #   opentelemetry-sdk
    #   opentelemetry-semantic-conventions
opentelemetry-exporter-otlp-proto-common==1.34.1
    # via opentelemetry-exporter-otlp-proto-grpc
opentelemetry-exporter-otlp-proto-grpc==1.34.1
    # via chromadb
opentelemetry-instrumentation==0.55b1
    # via
    #   opentelemetry-instrumentation-asgi
    #   opentelemetry-instrumentation-fastapi
opentelemetry-instrumentation-asgi==0.55b1
    # via opentelemetry-instrumentation-fastapi
opentelemetry-instrumentation-fastapi==0.55b1
    # via chromadb
opentelemetry-proto==1.34.1
    # via
    #   opentelemetry-exporter-otlp-proto-common
    #   opentelemetry-exporter-otlp-proto-grpc
opentelemetry-sdk==1.34.1
    # via
    #   chromadb
    #   opentelemetry-exporter-otlp-proto-grpc
opentelemetry-semantic-conventions==0.55b1
    # via
    #   opentelemetry-instrumentation
    #   opentelemetry-instrumentation-asgi
    #   opentelemetry-instrumentation-fastapi
    #   opentelemetry-sdk
opentelemetry-util-http==0.55b1
    # via
    #   opentelemetry-instrumentation-asgi
    #   opentelemetry-instrumentation-fastapi
orjson==3.10.18
    # via
    #   chromadb
    #   langsmith
overrides==7.7.0
    # via chromadb
packaging==24.2
    # via
    #   build
    #   huggingface-hub
    #   ibm-watsonx-ai
    #   langchain-core
    #   marshmallow
    #   onnxruntime
    #   opentelemetry-instrumentation
pandas==2.1.4
    # via ibm-watsonx-ai
posthog==5.4.0
    # via chromadb
propcache==0.3.2
    # via
    #   aiohttp
    #   yarl
protobuf==5.29.5
    # via
    #   googleapis-common-protos
    #   onnxruntime
    #   opentelemetry-proto
pulsar-client==3.7.0
    # via chromadb
pyasn1==0.6.1
    # via
    #   pyasn1-modules
    #   rsa
pyasn1-modules==0.4.2
    # via google-auth
pydantic==2.11.7
    # via
    #   chromadb
    #   fastapi
    #   langchain
    #   langchain-core
    #   langsmith
pydantic-core==2.33.2
    # via pydantic
pygments==2.19.1
    # via rich
pypika==0.48.9
    # via chromadb
pyproject-hooks==1.2.0
    # via build
python-dateutil==2.9.0.post0
    # via
    #   ibm-cos-sdk-core
    #   kubernetes
    #   pandas
    #   posthog
python-dotenv==1.1.0
    # via uvicorn
pytz==2025.2
    # via pandas
pyyaml==6.0.2
    # via
    #   chromadb
    #   huggingface-hub
    #   kubernetes
    #   langchain
    #   langchain-community
    #   langchain-core
    #   uvicorn
requests==2.32.2
    # via
    #   chromadb
    #   huggingface-hub
    #   ibm-cos-sdk-core
    #   ibm-watsonx-ai
    #   kubernetes
    #   langchain
    #   langchain-community
    #   langsmith
    #   posthog
    #   requests-oauthlib
    #   requests-toolbelt
requests-oauthlib==2.0.0
    # via kubernetes
requests-toolbelt==1.0.0
    # via langsmith
rich==14.0.0
    # via typer
rsa==4.9.1
    # via google-auth
shellingham==1.5.4
    # via typer
six==1.17.0
    # via
    #   kubernetes
    #   lomond
    #   posthog
    #   python-dateutil
sniffio==1.3.1
    # via anyio
sqlalchemy==2.0.41
    # via
    #   langchain
    #   langchain-community
starlette==0.46.2
    # via fastapi
sympy==1.14.0
    # via onnxruntime
tabulate==0.9.0
    # via ibm-watsonx-ai
tenacity==8.5.0
    # via
    #   chromadb
    #   langchain
    #   langchain-community
    #   langchain-core
tokenizers==0.21.1
    # via chromadb
tqdm==4.67.1
    # via
    #   chromadb
    #   huggingface-hub
typer==0.16.0
    # via chromadb
typing-extensions==4.14.0
    # via
    #   anyio
    #   chromadb
    #   fastapi
    #   huggingface-hub
    #   langchain-core
    #   opentelemetry-api
    #   opentelemetry-exporter-otlp-proto-grpc
    #   opentelemetry-sdk
    #   opentelemetry-semantic-conventions
    #   pydantic
    #   pydantic-core
    #   sqlalchemy
    #   typer
    #   typing-inspect
    #   typing-inspection
typing-inspect==0.9.0
    # via dataclasses-json
typing-inspection==0.4.1
    # via pydantic
tzdata==2025.2
    # via pandas
urllib3==2.5.0
    # via
    #   ibm-cos-sdk-core
    #   ibm-watsonx-ai
    #   kubernetes
    #   requests
uvicorn[standard]==0.34.3
    # via chromadb
uvloop==0.21.0
    # via uvicorn
watchfiles==1.1.0
    # via uvicorn
websocket-client==1.8.0
    # via kubernetes
websockets==15.0.1
    # via uvicorn
wrapt==1.17.2
    # via opentelemetry-instrumentation
yarl==1.20.1
    # via aiohttp
zipp==3.23.0
    # via importlib-metadata
