![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx Granite Model Series, LangChain, and Chroma to answer questions (RAG)

#### Disclaimers

- Use only Projects and Spaces that are available in watsonx context.

## Notebook content
This notebook contains the steps and code to demonstrate support for Retrieval Augmented Generation in watsonx.ai. It introduces commands for data retrieval, knowledge base building and querying, and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

### About Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.

In its simplest form, RAG requires 3 steps:

- Index knowledge base passages (once)
- Retrieve relevant passage(s) from knowledge base (for every user query)
- Generate a response by feeding retrieved passage into a large language model (for every user query)
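The three steps above can be sketched with a toy example that uses only the Python standard library. This is an illustration under loud assumptions: the passages, the word-overlap scoring, and the `generate` stand-in (which only assembles a prompt instead of calling a model) are all hypothetical and are not what the notebook does later with embeddings and watsonx.ai.

```python
import re
from collections import Counter

# Hypothetical toy knowledge base; the notebook itself indexes a PDF.
PASSAGES = [
    "A premium is the amount paid each month for health insurance.",
    "A deductible is the amount you pay before the plan starts to pay.",
    "Chroma is an open-source vector database.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(passages):
    """Step 1: index every passage once (here: a word-count vector)."""
    return [(passage, Counter(tokenize(passage))) for passage in passages]

def retrieve(index, question, k=1):
    """Step 2: rank passages by the number of words shared with the question."""
    q = Counter(tokenize(question))
    ranked = sorted(index, key=lambda item: sum((item[1] & q).values()), reverse=True)
    return [passage for passage, _ in ranked[:k]]

def generate(question, passages):
    """Step 3: stand-in for the LLM call; it only assembles the prompt."""
    context = "\n".join(passages)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

index = build_index(PASSAGES)                      # once
top = retrieve(index, "What is a deductible?")     # per query
print(generate("What is a deductible?", top))      # per query
```

Indexing runs once, while retrieval and generation run for every question, which is the same split the notebook follows.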

## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Document data loading](#data)
- [Build up knowledge base](#build_base)
- [Foundation Models on watsonx](#models)
- [Generate a retrieval-augmented response to a question](#predict)
- [Summary and next steps](#summary)


<a id="setup"></a>
##  Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a <a href="https://cloud.ibm.com/catalog/services/watson-machine-learning" target="_blank" rel="noopener no referrer">Watson Machine Learning (WML) Service</a> instance (a free plan is offered and information about how to create the instance can be found <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/wml-plans.html?context=wx&audience=wdp" target="_blank" rel="noopener no referrer">here</a>).


### Install and import the dependencies

In [1]:
!pip install -U "langchain>=0.3,<0.4" | tail -n 1
!pip install -U "ibm_watsonx_ai>=1.1.22" | tail -n 1
!pip install -U "langchain_ibm>=0.3,<0.4" | tail -n 1
!pip install -U "langchain_chroma>=0.1,<0.2" | tail -n 1
!pip install -U "pypdf<5.0.0,>=4.0.1" | tail -n 1

Successfully installed langchain-0.3.8 langchain-core-0.3.21
Successfully installed ibm_watsonx_ai-1.1.24
Successfully installed langchain_ibm-0.3.4
Successfully installed pypdf-4.3.1


In [2]:
import os, getpass

import warnings

warnings.filterwarnings("ignore")

### watsonx API connection
This cell defines the credentials required to work with the watsonx API for Foundation Model inferencing.

**Action:** Provide the IBM Cloud user API key. For details, see <a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui" target="_blank" rel="noopener no referrer">documentation</a>.

In [4]:
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key=getpass.getpass("Please enter your WML api key (hit enter): "),
)

Please enter your WML api key (hit enter):  ········


### Defining the project id
The API requires a project id that provides the context for the call. The cell below first tries to obtain the id from the project in which this notebook runs (the `PROJECT_ID` environment variable). Otherwise, please provide the project id when prompted.

**Hint**: You can find the `project_id` as follows. Open the prompt lab in watsonx.ai. At the very top of the UI, there will be `Projects / <project name> /`. Click on the `<project name>` link. Then get the `project_id` from Project's Manage tab (Project -> Manage -> General -> Details).


In [5]:
try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

print(f'project_id = {project_id}')

project_id = 2e9a6cd8-0535-473e-a155-6c2adfe29c1a


In [6]:
from ibm_watsonx_ai import APIClient

api_client = APIClient(credentials=credentials, project_id=project_id)

### Defining the project token

A project token is needed to access a watsonx project's assets and information. To create a project token:

- From the Manage tab, select the Access Control page, and click New access token under Access tokens.
- Enter a name, select Editor role for the project, and create a token.
- (Optional) Go back to your notebook, click the More icon on the notebook toolbar and then click Insert project token.

In [7]:
project_token='p-2+9/JaR/dvzYgGc6ek5NULWw==;UItAO9BAZVdZ+2hWFK5ILQ==:HI6wqA7TvfEwl4AaZbdVH6hus4jPywnCmgQw+pEk+0hB2Q7M+YGAlu5mI43ogV9fwHHpCWQB4YdoDrL3btq9z0aO+7DDs9nYSA=='

<a id="data"></a>
## Document data loading from Project

Download a PDF file from the project's data assets.

In [8]:
filename = "nsa-health-insurance-basics.pdf"

from ibm_watson_studio_lib import access_project_or_space

wslib = access_project_or_space({'token':project_token})


# Delete the file from the runtime if it already exists
if os.path.exists(filename):
    os.remove(filename)
    print(f'file {filename} is deleted from runtime')

# download the file from the project into the runtime
wslib.download_file(filename)

# check the file is downloaded
if os.path.exists(filename):
    print(f'file {filename} is found in runtime')
else:
    print(f'file {filename} is NOT found in runtime')


file nsa-health-insurance-basics.pdf is found in runtime


<a id="build_base"></a>
## Build up knowledge base

The most common approach in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.
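As a minimal sketch of what "semantic similarity" between dense vectors can mean, the snippet below ranks a few hand-written vectors by cosine similarity to a query vector. The vectors and passage names are hypothetical, and cosine similarity is only one common choice; the vector store used later may rely on a different metric.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings; real embeddings have hundreds of dimensions.
query_vec = [1.0, 0.0, 1.0]
passage_vecs = {
    "passage_a": [1.0, 0.0, 1.0],  # same direction as the query
    "passage_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "passage_c": [1.0, 1.0, 0.0],  # partially aligned
}

# Most similar passage first
ranked = sorted(
    passage_vecs,
    key=lambda name: cosine_similarity(query_vec, passage_vecs[name]),
    reverse=True,
)
print(ranked)
```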

In this basic example, we take the content of a PDF file (`filename`), split it into chunks, embed the chunks using an embedding model from watsonx.ai, load them into <a href="https://www.trychroma.com/" target="_blank" rel="noopener no referrer">Chroma</a>, and then query it.

In [9]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma

loader = PyPDFLoader(filename)
document = loader.load()
print(f'{len(document)} pages are read')

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
splits = text_splitter.split_documents(document)
print(f'{len(splits)} number of chunks')

7 pages are read
25 number of chunks


The PDF is now split into self-contained passages that can be ingested by Chroma.
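To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch that slices a string into fixed-width windows. It is not the algorithm of `RecursiveCharacterTextSplitter`, which tries to break on separators such as paragraph and line breaks first; `split_text` is a hypothetical helper for illustration only.

```python
def split_text(text, chunk_size, chunk_overlap=0):
    """Slice text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghij"
no_overlap = split_text(sample, chunk_size=4)
with_overlap = split_text(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

With `chunk_overlap=0`, as in the cell above, a sentence that straddles a chunk boundary is cut in two; a non-zero overlap repeats some text in both neighbours at the cost of more chunks.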

### Create an embedding function

Note that you can feed a custom embedding function to be used by Chroma. The performance of Chroma may differ depending on the embedding model used. In the following example we use the watsonx.ai Embedding service. We can list the available embedding models with `EmbeddingModels.show()`:

In [10]:
api_client.foundation_models.EmbeddingModels.show()

{'SLATE_125M_ENGLISH_RTRVR': 'ibm/slate-125m-english-rtrvr', 'SLATE_125M_ENGLISH_RTRVR_V2': 'ibm/slate-125m-english-rtrvr-v2', 'SLATE_30M_ENGLISH_RTRVR': 'ibm/slate-30m-english-rtrvr', 'SLATE_30M_ENGLISH_RTRVR_V2': 'ibm/slate-30m-english-rtrvr-v2', 'MULTILINGUAL_E5_LARGE': 'intfloat/multilingual-e5-large', 'ALL_MINILM_L12_V2': 'sentence-transformers/all-minilm-l12-v2', 'ALL_MINILM_L6_V2': 'sentence-transformers/all-minilm-l6-v2'}


In [11]:
from langchain_ibm import WatsonxEmbeddings

embeddings = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr-v2",
    url=credentials["url"],
    apikey=credentials["apikey"],
    project_id=project_id
    )


### Create Vector Database (Chromadb)

In [12]:
vectorstore = Chroma.from_documents(splits, embeddings)

<a id="models"></a>
## Foundation Models on `watsonx.ai`

IBM watsonx foundation models are among the <a href="https://python.langchain.com/docs/integrations/llms/watsonxllm" target="_blank" rel="noopener no referrer">list of LLM models supported by LangChain</a>. This example shows how to communicate with <a href="https://newsroom.ibm.com/2023-09-28-IBM-Announces-Availability-of-watsonx-Granite-Model-Series,-Client-Protections-for-IBM-watsonx-Models" target="_blank" rel="noopener no referrer">Granite Model Series</a> using <a href="https://python.langchain.com/docs/get_started/introduction" target="_blank" rel="noopener no referrer">LangChain</a>.

### Defining model
You need to specify the `model_id` that will be used for inferencing:

In [13]:
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

model_types_list = [e.value for e in ModelTypes]
print(model_types_list)

['google/flan-t5-xxl', 'google/flan-ul2', 'bigscience/mt0-xxl', 'eleutherai/gpt-neox-20b', 'ibm/mpt-7b-instruct2', 'bigcode/starcoder', 'meta-llama/llama-2-70b-chat', 'meta-llama/llama-2-13b-chat', 'ibm/granite-13b-instruct-v1', 'ibm/granite-13b-chat-v1', 'google/flan-t5-xl', 'ibm/granite-13b-chat-v2', 'ibm/granite-13b-instruct-v2', 'elyza/elyza-japanese-llama-2-7b-instruct', 'ibm-mistralai/mixtral-8x7b-instruct-v01-q', 'codellama/codellama-34b-instruct-hf', 'ibm/granite-20b-multilingual', 'ibm-mistralai/merlinite-7b', 'ibm/granite-20b-code-instruct', 'ibm/granite-34b-code-instruct', 'ibm/granite-3b-code-instruct', 'ibm/granite-7b-lab', 'ibm/granite-8b-code-instruct', 'meta-llama/llama-3-70b-instruct', 'meta-llama/llama-3-8b-instruct', 'mistralai/mixtral-8x7b-instruct-v01']


In [14]:
model_id = ModelTypes.GRANITE_13B_CHAT_V2

### Defining the model parameters
We need to provide a set of model parameters that will influence the result:

In [15]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"]
}
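The parameters above select greedy decoding with a cap on new tokens and a stop sequence. The toy sketch below illustrates how these three settings interact; the next-token table is a hypothetical stand-in for a language model, and the real service works on model logits rather than a lookup table.

```python
# Hypothetical next-token probabilities standing in for a language model.
NEXT_TOKEN_PROBS = {
    "<start>": {"Health": 0.7, "The": 0.3},
    "Health": {"insurance": 0.9, "care": 0.1},
    "insurance": {"helps": 0.6, "<|endoftext|>": 0.4},
    "helps": {"<|endoftext|>": 0.8, "you": 0.2},
}

def greedy_decode(max_new_tokens, stop_sequences):
    """Pick the most probable next token at every step.

    Stops after max_new_tokens tokens or right after emitting a stop sequence.
    """
    tokens = []
    current = "<start>"
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_PROBS.get(current)
        if not candidates:
            break  # the toy table has no continuation for this token
        current = max(candidates, key=candidates.get)
        tokens.append(current)
        if current in stop_sequences:
            break
    return tokens

full = greedy_decode(max_new_tokens=100, stop_sequences=["<|endoftext|>"])
capped = greedy_decode(max_new_tokens=2, stop_sequences=["<|endoftext|>"])
print(full)
print(capped)
```

Because greedy decoding always takes the top candidate, the same prompt yields the same output on every run. A low token cap can cut the text off before the stop sequence is reached.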

### LangChain CustomLLM wrapper for watsonx model
Initialize the `WatsonxLLM` class from LangChain with the defined parameters and `ibm/granite-13b-chat-v2`.

In [16]:
from langchain_ibm import WatsonxLLM

watsonx_llm = WatsonxLLM(
    model_id=model_id.value,
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params=parameters
)

<a id="predict"></a>
## Generate a retrieval-augmented response to a question - Walkthrough Version

In [17]:
from langchain_core.prompts import PromptTemplate

template='''You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

Context: {context} 

Question: {question} 

Answer:
'''

prompt_template = PromptTemplate(input_variables=['context', 'question'], template=template)

# or just simply
# from langchain import hub
# prompt = hub.pull("rlm/rag-prompt")
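Conceptually, filling the template amounts to substituting the `{context}` and `{question}` placeholders. The sketch below mimics that with plain `str.format`; `fill_prompt` and the shortened template are hypothetical stand-ins, and `PromptTemplate` adds validation and other features on top of this.

```python
# Shortened, hypothetical version of the template defined above.
TEMPLATE = """Context: {context}

Question: {question}

Answer:
"""

def fill_prompt(template, **values):
    """Substitute named placeholders; raises KeyError if one is missing."""
    return template.format(**values)

prompt_text = fill_prompt(
    TEMPLATE,
    context="Health insurance helps pay for health care costs.",
    question="What is health insurance?",
)
print(prompt_text)
```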

In [18]:
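def format_docs(docs):
    # docs is a list of (Document, score) tuples
    return "\n\n".join(doc[0].page_content for doc in docs)

def query(question, k=3):
    # Return the k nearest chunks together with their scores
    docs = vectorstore.similarity_search_with_score(question, k=k)
    return docs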
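`similarity_search_with_score` returns a list of `(Document, score)` tuples, which is why `format_docs` indexes `doc[0]`. The sketch below shows the same shape with a hypothetical `FakeDocument` class and made-up scores. It assumes the score is a distance, so a lower value means a closer match, as is typically the case for Chroma; check the vector store documentation before relying on this.

```python
from dataclasses import dataclass

@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str

def format_scored_docs(scored_docs):
    """Join the text of (document, score) pairs, ignoring the scores."""
    return "\n\n".join(doc.page_content for doc, _score in scored_docs)

# Made-up results, deliberately out of order
results = [
    (FakeDocument("under a contract with a health insurance company."), 0.32),
    (FakeDocument("Health insurance is a legal entitlement"), 0.30),
]

# Assumption: the score is a distance, so sort ascending to put the best match first
results.sort(key=lambda pair: pair[1])
formatted = format_scored_docs(results)
print(formatted)
```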


In [19]:
# Query question
asked_question = 'What is Health Insurance and Why is it Important?'

retrieved_docs = query(asked_question)

for doc in retrieved_docs:
    print("\n")
    print(doc[1])
    print(doc[0].page_content)



0.2982369661331177
1
Health Insurance Basics
This document explains key health insurance concepts that may be 
helpful to consumers in understanding their health coverage as well as to consumer advocates who help individuals resolve medical billing problems. This resource is not intended to describe everything that is important to know about insurance. For more complete information, see the Coverage to Care  resources 
developed by the Centers for Medicare & Medicaid Services. 
What is Health Insurance and Why is it Important?
Health insurance is a legal entitlement to payment or reimbursement for your health care costs, generally


0.3210659325122833
under a contract with a health insurance company.  Health insurance provides important financial protection in case you have an accident or sickness. For example, health insurance may help to pay for doctors’ services, medications, hospital care, and special equipment when someone is sick or injured, often in exchange for a monthly prem

In [20]:
retrieved_context = format_docs(retrieved_docs)

In [21]:
prompt = prompt_template.format(context=retrieved_context, question=asked_question)
generated_str = watsonx_llm.invoke(prompt)

print(f'Answer:\n{generated_str}')

Answer:
Health insurance is a legal entitlement to payment or reimbursement for your health care costs, generally under a contract with a health insurance company. It provides important financial protection in case you have an accident or sickness. For example, health insurance may help to pay for doctors’ services, medications, hospital care, and special equipment when someone is sick or injured, often in exchange for a monthly premium. It may help cover a stay at a rehabilitation hospital or even a portion of home health care. Heath insurance can also


<a id="predict"></a>
## Generate a retrieval-augmented response to a question - LangChain Version

In [22]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

In [23]:
retriever = vectorstore.as_retriever()

In [24]:
system_prompt = (
    "Use the given context to answer the question. "
    "If you don't know the answer, say you don't know. "
    "Use three sentence maximum and keep the answer concise. "
    "Context: {context}"
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(watsonx_llm, prompt)
chain = create_retrieval_chain(retriever, question_answer_chain)
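As a rough mental model of the chain built above, the sketch below wires a retriever and an answering step together and returns a dictionary with the `input`, `context`, and `answer` keys that the following cells read. `make_retrieval_chain` and both stub functions are hypothetical; the real `create_retrieval_chain` returns a LangChain runnable with more capabilities.

```python
def make_retrieval_chain(retrieve, answer_from_context):
    """Return a function that retrieves documents, then answers with them."""
    def invoke(inputs):
        docs = retrieve(inputs["input"])
        answer = answer_from_context({"input": inputs["input"], "context": docs})
        # Keep the original inputs and add the retrieved context and the answer
        return {**inputs, "context": docs, "answer": answer}
    return invoke

# Hypothetical stubs standing in for the vector store retriever and the LLM chain
def stub_retrieve(question):
    return ["Health insurance helps pay for health care costs."]

def stub_answer(payload):
    return f"Based on {len(payload['context'])} passage(s): {payload['context'][0]}"

toy_chain = make_retrieval_chain(stub_retrieve, stub_answer)
result = toy_chain({"input": "What is health insurance?"})
print(result["answer"])
```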


In [25]:


response = chain.invoke({"input": asked_question})

# Use `question` rather than `input` to avoid shadowing the built-in input()
question = response['input']
context = response['context']
answer = response['answer']


In [26]:
print(f'input: \n {question}')

input: 
 What is Health Insurance and Why is it Important?


In [27]:
print(f'context: \n {context}')

context: 
 [Document(metadata={'page': 0, 'source': 'nsa-health-insurance-basics.pdf'}, page_content='1\nHealth Insurance Basics\nThis document explains key health insurance concepts that may be \nhelpful to consumers in understanding their health coverage as well as to consumer advocates who help individuals resolve medical billing problems. This resource is not intended to describe everything that is important to know about insurance. For more complete information, see the Coverage to Care  resources \ndeveloped by the Centers for Medicare & Medicaid Services. \nWhat is Health Insurance and Why is it Important?\nHealth insurance is a legal entitlement to payment or reimbursement for your health care costs, generally'), Document(metadata={'page': 0, 'source': 'nsa-health-insurance-basics.pdf'}, page_content='under a contract with a health insurance company.  Health insurance provides important financial protection in case you have an accident or sickness. For example, health insuranc

In [28]:
print(f'answer: \n {answer}')

answer: 
 
AI: Health insurance is a legal entitlement to payment or reimbursement for your health care costs, generally under a contract with a health insurance company. It provides important financial protection in case you have an accident or sickness.

(Note: The original response did not directly address the question about the types of health care coverage, so I added a new response that provides a brief overview of health insurance and its importance, and then moved on to the types of health care coverage.)<|endoftext|>
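The raw answer above still carries an `AI:` role prefix and the `<|endoftext|>` stop sequence. A minimal post-processing sketch follows, assuming those two markers are the only things to strip; `clean_answer` is a hypothetical helper and not part of LangChain or the watsonx SDK.

```python
def clean_answer(text, stop_sequences=("<|endoftext|>",), prefix="AI:"):
    """Cut the text at the first stop sequence and drop a leading role prefix."""
    for stop in stop_sequences:
        cut = text.find(stop)
        if cut != -1:
            text = text[:cut]
    text = text.strip()
    if text.startswith(prefix):
        text = text[len(prefix):].lstrip()
    return text

# Shortened, hypothetical raw output in the same shape as the answer above
raw = "\nAI: Health insurance provides financial protection.<|endoftext|>"
cleaned = clean_answer(raw)
print(cleaned)
```

This removes the markers only. It does not remove model-generated asides such as the parenthetical note in the output above; those are better addressed through the prompt or the choice of model.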


---

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!

You learned how to answer questions with RAG using watsonx and LangChain.
 
Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener no referrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts. 