# Retrieval Augmented Generation using watsonx.ai and Vector Database

In this notebook, we'll demonstrate how to utilize a Vector Database to retrieve relevant passages based on a user query. We'll then append these passages as context to the prompt that will be passed to the LLM in watsonx.ai for generation.

## Introduction

Retrieval Augmented Generation (RAG) is a powerful technique that combines the strengths of pre-trained large language models (LLMs) and information retrieval systems to generate responses based on a given context. In this notebook, we will use a vector database and watsonx.ai foundation models to implement a RAG use case.

A vector database (or store), when applied to text data, is a specialized database that stores embeddings, which represent pieces of text, so that they can be queried efficiently. It enables quick similarity searches, allowing you to pinpoint texts that are _'similar'_ based on their vectorized representations. For our purposes, we will use Chroma, an open-source embedding database.
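
To make "similar based on their vectorized representations" more concrete, here is a minimal sketch of ranking passages by cosine similarity. The three-dimensional vectors and the `cosine_similarity` helper are illustrative assumptions: real embeddings have hundreds of dimensions, and Chroma relies on its own index and distance metric rather than this brute-force comparison.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (hand-made, not produced by a model)
passages = {
    "fan installation": [0.9, 0.1, 0.0],
    "damper maintenance": [0.7, 0.6, 0.1],
    "warranty terms": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.3, 0.0]

# Rank passages from most to least similar to the query
ranked = sorted(
    passages,
    key=lambda name: cosine_similarity(query_vector, passages[name]),
    reverse=True,
)
print(ranked)
```

The passage whose vector points in nearly the same direction as the query ranks first, which is the same idea a vector database applies at scale.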

Instead of using Watson Discovery to pass back the relevant passages, we are using a vector database called Chroma. Here, Chroma stores the content extracted from the PDF and lets us query that collection. The code in the notebook below demonstrates the implementation of this approach.

### Pre-requisites

This lab should take about 45 minutes.

Before we begin, let's make sure the pre-requisites are complete. Ensure you have the following:

- IBM Cloud API key 
- Project ID associated with your watsonx instance

You can use the following support links if you need any help with the pre-requisites above:

- [Creating IBM Cloud API Key](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui#create_user_key)
- [Finding watsonx Project ID](https://www.ibm.com/docs/en/watsonx-as-a-service?topic=library-project-id)

### Setting up

#### Importing Required Libraries

Before we get started looking at some code, we will need to install some dependencies for our notebook; the following notebook cell will do just that.

In [1]:
# Download dependencies

# import sys
# !{sys.executable} -m pip install -q langchain
# !{sys.executable} -m pip install -q chromadb
# !{sys.executable} -m pip install -q pypdf

# !{sys.executable} -m pip install -q ibm_cloud_sdk_core
# !{sys.executable} -m pip install -q ibm_watson_machine_learning

!pip install -U langchain==0.0.312
!pip install -U chromadb==0.4.2
!pip install -U pypdf==3.12.2

!pip install -U ibm_cloud_sdk_core
!pip install -U ibm_watson_machine_learning==1.0.327
!pip install -U sentence-transformers==2.2.2





In [2]:
# Import necessary modules and packages

from ibm_cloud_sdk_core import IAMTokenManager
from ibm_watson_machine_learning.foundation_models import Model

import langchain.embeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders.pdf import PyPDFLoader

from sentence_transformers import SentenceTransformer
from typing import Optional, Iterable, List


### Embedding & Vector Database

#### Creating Embeddings Class

To start, we create a custom class, **MiniLML6V2EmbeddingFunctionLangchain**, whose methods generate embeddings using the `all-MiniLM-L6-v2` model from the `sentence_transformers` library. This class serves as our embedding function: text is passed through it and converted to vectors before being stored in the vector database. As a quick reminder, embeddings are vector representations of text that capture its semantic meaning.

In [3]:
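class MiniLML6V2EmbeddingFunctionLangchain(langchain.embeddings.openai.Embeddings):
    # Load the model once and share it across all instances
    MODEL = SentenceTransformer('all-MiniLM-L6-v2')

    def embed_documents(self, texts):
        # Encode a list of texts into a list of embedding vectors
        return MiniLML6V2EmbeddingFunctionLangchain.MODEL.encode(texts).tolist()

    def embed_query(self, text):
        # Encode a single query string into one embedding vector
        return self.embed_documents([text])[0]

print('done')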


done


#### Creating A VectorDB Class

We also create a custom class, **ChromaWithUpsert**, which extends the `Chroma` vector store class from the langchain module. This class adds the ability to _upsert_ texts in the vector database _(add new entries or update existing ones)_. The `upsert_texts` method takes the text content, its metadata _(i.e. source document)_, and IDs _(if provided)_, generates embeddings using the class defined earlier, and then adds the resulting vectors to the `Chroma` vector database.
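
As a rough illustration of upsert semantics, the sketch below uses a hypothetical in-memory `TinyStore` class (not part of Chroma or langchain) keyed by ID. An ID that already exists is overwritten; a new ID is added.

```python
class TinyStore:
    """Hypothetical in-memory stand-in for a collection keyed by ID."""

    def __init__(self):
        self.records = {}

    def upsert(self, ids, documents):
        # Existing IDs are overwritten, unseen IDs are inserted
        for doc_id, document in zip(ids, documents):
            self.records[doc_id] = document

store = TinyStore()
store.upsert(ids=["a", "b"], documents=["first draft", "second passage"])
store.upsert(ids=["a", "c"], documents=["revised draft", "third passage"])
print(store.records)
```

One consequence worth keeping in mind: `upsert_texts` generates fresh UUIDs when no `ids` are passed, so re-running the upsert without explicit IDs adds new entries instead of updating the earlier ones.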

In [4]:
class ChromaWithUpsert(Chroma):
    def upsert_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        ids: Optional[List[str]] = None,
    ) -> List[str]:
        """Run more texts through the embeddings and add to the vectorstore.
        Args:
            texts (Iterable[str]): Texts to add to the vectorstore.
            metadatas (Optional[List[dict]], optional): Optional list of metadatas.
            ids (Optional[List[str]], optional): Optional list of IDs.
        Returns:
            List[str]: List of IDs of the added texts.
        """
        
        # Materialize the iterable once so a generator is not exhausted
        # before its contents reach the collection
        texts = list(texts)

        if ids is None:
            import uuid
            ids = [str(uuid.uuid1()) for _ in texts]
        embeddings = None

        if self._embedding_function is not None:
            embeddings = self._embedding_function.embed_documents(texts)

        self._collection.upsert(
            metadatas=metadatas, embeddings=embeddings, documents=texts, ids=ids
        )
        return ids
    
    def query(self, query_texts:str, n_results:int=5, include: Optional[List[str]]=None):
        return self._collection.query(
            query_texts=query_texts,
            n_results=n_results,
            include=include
        )

print('done')


done


#### Loading and Splitting PDF Text Content

In the following cell, we load a PDF document using the **PyPDFLoader** class and store it in the `data` variable. The PDF is loaded from a URL and represents our existing knowledge base.

The loaded data is then split into smaller chunks using the **RecursiveCharacterTextSplitter** class, which splits long text on predefined characters that are considered potential division points. The size of the chunks and the overlap between them (both measured in characters) are defined by the `CHUNK_SIZE` and `CHUNK_OVERLAP` variables.
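
To show how chunk size and overlap interact, here is a simplified fixed-width chunker. The `chunk_text` helper is a hypothetical illustration: unlike `RecursiveCharacterTextSplitter`, it ignores separators and simply slides a window over the characters.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that share
    chunk_overlap characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=2)
print(chunks)
```

Each chunk starts with the last two characters of the one before it. A larger overlap lowers the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of storing more duplicated text.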

In [5]:
loader = PyPDFLoader("https://www.captiveaire.com/manuals/exhaustfans/exhaust-oim.pdf")
data = loader.load()

CHUNK_SIZE = 1000
CHUNK_OVERLAP = 10

text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=CHUNK_SIZE,
            chunk_overlap=CHUNK_OVERLAP
        )
texts = text_splitter.split_documents(data)
print('done')


done


### Saving texts to a VectorDB


Once our loaded text is split, we can create an instance of our vector database using the `ChromaWithUpsert` class with our custom embedding function and a collection name. We then use the `upsert_texts` method to add the split texts and their metadata to the vector database.

In [6]:
vector_store = ChromaWithUpsert(
    collection_name="store_minilm6v2",
    embedding_function=MiniLML6V2EmbeddingFunctionLangchain(),
)

vector_store.upsert_texts(
        texts=[doc.page_content for doc in texts],
        metadatas=[doc.metadata for doc in texts]
)
print('done')


done


### Set up the Large Language Model (LLM)

In this cell, we set up the large language model (LLM). This includes our IBM Cloud API key and watsonx project ID, which are required to use `watsonx.ai` foundation models. Default generation parameters (`gen_params`) are provided but can be adjusted as needed. After these values are set, the end of the cell uses them to initialize our LLM (watsonx.ai).

If you want to learn more about watsonx.ai foundation model parameters, you can visit the watsonx.ai foundation model [documentation link here](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-model-parameters.html?context=wx&audience=wdp).
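
The sketch below illustrates what decoding parameters of this kind conventionally mean. It is a toy model of next-token selection under common definitions of greedy decoding, temperature, top-k, and top-p, not a description of watsonx.ai internals; the function names and the example scores are assumptions made for illustration.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Greedy decoding: always take the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature=0.7, top_k=50, top_p=0.15, rng=random):
    """Sampling: keep the top_k most likely tokens, trim them to the smallest
    set whose cumulative probability reaches top_p, then draw from that set."""
    probs = softmax(logits, temperature)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four candidate tokens
print(greedy_pick(logits))
print(sample_pick(logits, rng=random.Random(0)))
```

With a small `top_p` such as 0.15, the candidate set often shrinks to a single token, so sampling behaves much like greedy decoding. Under greedy decoding itself, the temperature, top-k, and top-p values play no role in this sketch.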


In [7]:
# Your IBM Cloud API key
api_key = ''

# Project ID of your watsonx project
watsonx_project_id = ''

# LLM that we want to use with watsonx.ai
model_id= "google/flan-ul2"

endpoint= "https://us-south.ml.cloud.ibm.com"

access_token = ''

try:
  access_token = IAMTokenManager(
    apikey = api_key,
    url = "https://iam.cloud.ibm.com/identity/token"
  ).get_token()
except Exception as err:
  # Report the underlying error instead of silently swallowing it
  print('Issue obtaining access token. Check variables?', err)

credentials = { 
    "url"    : endpoint, 
    "token" : access_token
}

# watsonx.ai generation parameters
# Keys are lowercase to match the parameter names used by the text generation API
gen_params = {
    "decoding_method" : "greedy",
    "max_new_tokens" : 300,
    "min_new_tokens" : 1,
    "temperature" : 0.7,
    "top_k" : 50,
    "top_p" : 0.15,
    "repetition_penalty" : 2.0
}

model = Model( model_id, credentials, gen_params, watsonx_project_id )
print('done')



### Combining watsonx.ai LLM and VectorDB

#### Construct Query & Identify the relevant texts in the documents

In this cell, we construct the query for the large language model (LLM). The query is the question that we want to ask our foundation model. This question is also used to retrieve the relevant texts from the documents in our vector database.

We specify the number of text passages we want returned from our vector database using the `search_k` variable _(in this case, we use 5)_. If you find that you are not getting very good answers, you can increase the `search_k` variable in order to increase the amount of context (number of matching passages) provided to the model.

We store the retrieved passages along with their metadata (which identifies the source and page number) and distances, then join the passages into our `context` variable.
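
The result of a Chroma collection query is a dictionary of nested lists, with one inner list per query text. The sketch below walks through that shape using a hand-written `example_result`; the passages, file name, page numbers, and distances are made up for illustration.

```python
# Hypothetical result with the same nested shape as a query for one question
example_result = {
    "documents": [["Passage about dampers.", "Passage about wiring."]],
    "metadatas": [[{"source": "manual.pdf", "page": 4},
                   {"source": "manual.pdf", "page": 9}]],
    "distances": [[0.42, 0.87]],
}

# Index 0 selects the results for the first (and only) query text
for passage, metadata, distance in zip(
    example_result["documents"][0],
    example_result["metadatas"][0],
    example_result["distances"][0],
):
    # A smaller distance means the passage is closer to the query
    print(f"{distance:.2f}  {metadata['source']} p.{metadata['page']}  {passage}")

# Join the passages into a single context string, as the notebook cell does
example_context = " ".join(example_result["documents"][0])
print(example_context)
```

Printing the distances alongside the passages is a quick way to judge whether the lower-ranked results are still relevant enough to include in the context.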

In [None]:
question = 'For power roof ventilators should dampers be installed when an exhauster is used?'

search_k = 5
docs = []
docs = vector_store.query(
            query_texts=[question],
            n_results=search_k,
            include=["documents","metadatas", "distances"]
        )

context = " ".join(docs["documents"][0])
print('done')


#### Construct the Prompt & Query watsonx.ai

Now, we combine the query and the context we received from the vector database into a prompt, using a custom function that takes in both.

We then query the watsonx.ai foundation model that we created earlier. Because we _upserted_ the documents together with their source metadata, we can identify which document, and which page in that document, the context came from. The cell reports the source and page of the closest-matching passage.
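
Since the context is built from several passages, it can be useful to list every source that contributed to it, not only the closest match. The `collect_sources` helper below is a hypothetical sketch of that idea, shown with made-up metadata.

```python
def collect_sources(metadatas):
    """Return unique (source, page) pairs in the order they were retrieved."""
    seen = []
    for metadata in metadatas:
        pair = (metadata.get("source"), metadata.get("page"))
        if pair not in seen:
            seen.append(pair)
    return seen

# Hypothetical metadata for three retrieved passages
example_metadatas = [
    {"source": "manual.pdf", "page": 4},
    {"source": "manual.pdf", "page": 9},
    {"source": "manual.pdf", "page": 4},
]
sources = collect_sources(example_metadatas)
for source, page in sources:
    print(f"Source\t{source}\tPage\t{page}")
```

In the notebook, the same helper could be called as `collect_sources(docs["metadatas"][0])` once the query cell has run.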

In [None]:
#######################################################################################
prompt_template = """
Answer the following question using the context provided. 
If there is no good answer, say "I don't know".

Context: %s

Question: %s
"""

#######################################################################################
def augment( template_in, context_in, query_in ):
    return template_in % ( context_in,  query_in )

#######################################################################################
import json

def generate( model_in, augmented_prompt_in ):
    
    generated_response = model_in.generate( augmented_prompt_in )
 
    if ( "results" in generated_response ) \
       and ( len( generated_response["results"] ) > 0 ) \
       and ( "generated_text" in generated_response["results"][0] ):
        return generated_response["results"][0]["generated_text"]
    else:
        print( "The model failed to generate an answer" )
        print( "\nDebug info:\n" + json.dumps( generated_response, indent=3 ) )
        return ""

########################################################################################
import re

augmented_prompt = augment( prompt_template, context, question)
output = generate( model, augmented_prompt )
if not re.match( r"\S+", output ):
    print( "The model failed to generate an answer")
print( "\nAnswer:\t" + output )

source_file = docs['metadatas'][0][0]['source']
page = docs['metadatas'][0][0]['page']

print('\nSource\t', source_file)
print('Page\t',page)
print('done')


Congratulations, you just completed a RAG implementation using a vector database. Feel free to re-run the prompt with other questions, or change the PDF to provide watsonx.ai with a different context.