![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx, LangChain and Elasticsearch to create and deploy a RAG function

#### Disclaimers

- Use only Projects and Spaces that are available in watsonx context.

## Notebook content

This notebook contains the steps and code to demonstrate how to create and deploy a Retrieval Augmented Generation (RAG) solution in watsonx.ai. It introduces commands for data retrieval, knowledge base building and querying, model testing, and deploying a RAG solution for general use.

Some familiarity with Python is helpful. This notebook uses Python 3.10.

#### About Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.

In its simplest form, RAG requires 3 steps:

- Index knowledge base passages (once)
- Retrieve relevant passage(s) from knowledge base (for every user query)
- Generate a response by feeding retrieved passage into a large language model (for every user query)
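The three steps can be sketched with the standard library alone. This is a toy illustration, not the watsonx.ai implementation: it ranks passages by word overlap instead of dense embeddings, and it stops at building the prompt instead of calling a model. The function names and the two sample passages are made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(passages):
    """Step 1: index every passage once (here: a bag-of-words Counter)."""
    return [(passage, Counter(tokenize(passage))) for passage in passages]


def retrieve(index, query, k=1):
    """Step 2: rank passages by the number of tokens shared with the query."""
    query_tokens = Counter(tokenize(query))
    scored = [(sum((tokens & query_tokens).values()), passage) for passage, tokens in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:k]]


def build_prompt(query, passages):
    """Step 3: assemble the text that would be sent to a large language model."""
    context = "\n".join(passages)
    return f"{context}\nQuestion: {query}\nAnswer:"


# Toy knowledge base (hypothetical passages written for this sketch)
toy_passages = [
    "Bucharest became the capital of Romania in 1862.",
    "Super Bowl XLVII was played between the Baltimore Ravens and the San Francisco 49ers.",
]
toy_index = build_index(toy_passages)
toy_question = "When did Bucharest become the capital of Romania?"
top_passages = retrieve(toy_index, toy_question, k=1)
toy_prompt = build_prompt(toy_question, top_passages)
print(toy_prompt)
```

Indexing happens once, while retrieval and prompt building run for every question, which mirrors the split described in the list above.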

## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Data (test) loading](#data)
- [Set up connectivity information to Elasticsearch](#elastic_conn)
- [Set up VectorStore with Elasticsearch credentials](#vectorstore)
- [Create and deploy RAG solution](#deploy)
- [Calculate rougeL metric](#evaluate)

<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Contact your Cloud Pak for Data administrator and ask for your account credentials.


### Install and import dependencies

In [1]:
%%capture
!pip install wget | tail -n 1
!pip install rouge-score | tail -n 1
!pip install -U "ibm_watsonx_ai>=1.1.22" | tail -n 1
!pip install -U "langchain>=0.3,<0.4" | tail -n 1
!pip install -U "langchain-elasticsearch>=0.3,<0.4" | tail -n 1

In [2]:
import os, getpass, wget

from IPython.display import display, Markdown
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from rouge_score import rouge_scorer

from ibm_watsonx_ai import APIClient, Credentials
from ibm_watsonx_ai.foundation_models import Embeddings, ModelInference
from ibm_watsonx_ai.foundation_models.extensions.rag import RAGPattern, VectorStore
from ibm_watsonx_ai.foundation_models.extensions.rag.utils import verbose_search
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes, ModelTypes
from ibm_watsonx_ai.foundation_models.prompts import PromptTemplate, PromptTemplateManager
from ibm_watsonx_ai.helpers import DataConnection
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

### Connection to watsonx.ai Runtime

Authenticate with the watsonx.ai Runtime service. You need to provide the platform `url` and your `api_key`. The cell below passes only these two values; a Cloud Pak for Data installation typically also requires a `username`.

In [3]:
credentials_dict = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your api key and hit enter: ")
}

In [5]:
credentials = Credentials.from_dict(credentials_dict)

### Defining the project id
The foundation model requires a project id that provides the context for the call. We will obtain the id from the project in which this notebook runs. If it is not available in the environment, you are asked to provide the project id.


In [5]:
try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id and hit enter: ")

### Defining the space id
Deployed functions are available in deployment spaces. The RAG solution that we will create will be a deployed function, so you need to provide a space id.

In [6]:
space_id = input("Please enter your space_id and hit enter: ")

### Initialize client
Create an instance of `APIClient` and set the default project.

In [6]:
client = APIClient(credentials, project_id=project_id)

### Defining the prompt id

We will use `PromptTemplate` to create a template for our RAG LLM query. If you do not provide the id of a prompt template that already exists in your project, this code will create an example one.

In [7]:
prompt_id = input("Please enter your prompt template asset id and hit enter, if not provided, a new one would be created: ") or None

if prompt_id is None:
    PROMPT_INSTRUCTION = \
    """
    Use the following pieces of documents to answer the question
    at the end. If you don't know the answer, just say that you
    don't know, don't try to make up an answer. Use three sentences
    maximum. Keep the answer as concise as possible. do not include
    question in your response.Your answers should not include any
    harmful, unethical, racist, sexist, toxic, dangerous, or illegal
    content. Please ensure that your responses are socially unbiased
    and positive in nature.\nPlease provide a concise professional
    response.
    """
    prompt_mgr = PromptTemplateManager(credentials=credentials, project_id=project_id)
    prompt_template = PromptTemplate(name="RAG_prompt_template",
                                     model_id=ModelTypes.LLAMA_2_13B_CHAT,
                                     input_variables=["question", "reference_documents"],
                                     instruction=PROMPT_INSTRUCTION,
                                     input_text="{reference_documents}\nQuestion:{question}\nAnswer:")
    stored_prompt_template = prompt_mgr.store_prompt(prompt_template=prompt_template)
    prompt_id = stored_prompt_template.prompt_id
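To see what the two input variables do, the sketch below fills the same `input_text` template with plain `str.format`. It is only an illustration: the helper name `fill_prompt`, the placeholder passages, and the choice to join passages with newlines are assumptions, and the SDK may assemble the final prompt differently.

```python
# Template string copied from the cell above
input_text = "{reference_documents}\nQuestion:{question}\nAnswer:"


def fill_prompt(template, question, documents):
    """Join the retrieved passages and substitute both template variables."""
    # Assumption: passages are joined with newlines before substitution
    reference_documents = "\n".join(documents)
    return template.format(reference_documents=reference_documents, question=question)


example_prompt = fill_prompt(
    input_text,
    question="when did bucharest become the capital of romania?",
    documents=[
        "Bucharest\n(first retrieved passage)",   # placeholder, not real data
        "Bucharest\n(second retrieved passage)",  # placeholder, not real data
    ],
)
print(example_prompt)
```

The instruction text is stored separately in the template, so only the retrieved passages and the question change from one query to the next.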

### Build up knowledge base

The current state-of-the-art in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.

We can generate dense vector representations using embedding models. In this notebook, we use IBM's <a href="https://www.ibm.com/products/watsonx-ai/foundation-models#Embedding+model+library">IBM_SLATE_30M_ENG</a> model to embed both the knowledge base passages and user queries.
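Semantic similarity between dense vectors is commonly measured with cosine similarity. The sketch below uses tiny hand-written 3-dimensional vectors as stand-ins for real embeddings (an embedding model produces vectors with hundreds of dimensions); the names `toy_vectors` and `top_k` are made up for this example, and the metric Elasticsearch actually applies depends on how the index is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, passage_vectors, k=2):
    """Return the names of the k passages most similar to the query."""
    ranked = sorted(
        passage_vectors.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings: a real model would produce these vectors
toy_vectors = {
    "passage_a": [0.9, 0.1, 0.0],
    "passage_b": [0.0, 1.0, 0.0],
    "passage_c": [0.7, 0.7, 0.1],
}
query_vector = [1.0, 0.0, 0.0]
ranking = top_k(query_vector, toy_vectors, k=2)
print(ranking)
```

Because the query and the passages are embedded with the same model, their vectors live in the same space and can be compared directly.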

A vector database is optimized for dense vector indexing and retrieval. This notebook uses <a href="https://python.langchain.com/docs/integrations/vectorstores/elasticsearch#basic-example" target="_blank" rel="noopener noreferrer">Elasticsearch</a>, a distributed, RESTful search and analytics engine, capable of performing both vector and lexical search.

The dataset we are using is already split into self-contained passages that can be ingested by Elasticsearch. 

The size of each passage is limited by the embedding model's context window (which is 512 tokens for `IBM Slate 30M`).
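A quick sanity check for overly long passages can be sketched as follows. Note the assumption: the sketch counts whitespace-separated words, which is only a rough proxy. The model's own tokenizer usually produces more tokens than there are words, so a passage that passes this check can still exceed the limit.

```python
def approximate_token_count(text):
    """Rough proxy: count whitespace-separated words (not real model tokens)."""
    return len(text.split())


def flag_long_passages(passages, max_tokens=512):
    """Return the indices of passages whose approximate length exceeds the limit."""
    return [
        index
        for index, passage in enumerate(passages)
        if approximate_token_count(passage) > max_tokens
    ]


# Hypothetical sample: one short passage and one artificially long passage
sample_passages = ["short passage about Idaho", "word " * 600]
too_long = flag_long_passages(sample_passages)
print(too_long)
```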

### Load knowledge base documents

Load the set of documents that will be used to build the knowledge base, and store it as a project asset.

In [8]:
filename = 'psgs.tsv'
url = f'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/RAG/{filename}'
if not os.path.isfile(filename):
    wget.download(url)

asset_details = client.data_assets.create(name=filename, file_path=filename)

Creating data asset...
SUCCESS


### Read and prepare documents
Read documents using `DataConnection` and prepare them for vector database ingestion by combining title and text.

In [9]:
data_connection = DataConnection(data_asset_id=client.data_assets.get_id(asset_details))
data_connection.set_client(client)
documents = data_connection.read(csv_separator='\t')

In [10]:
documents['indextext'] = documents['title'].astype(str) + "\n" + documents['text']
documents = documents[:1000]
documents.head()

Unnamed: 0,id,text,title,indextext
0,1.0,History of Idaho - wikipedia History of Idaho ...,History of Idaho,History of Idaho\nHistory of Idaho - wikipedia...
1,2.0,"1957 . Location Cataldo , Idaho Built 1848 Arc...",History of Idaho,"History of Idaho\n1957 . Location Cataldo , Id..."
2,3.0,"of the Columbia was created in June 1816 , and...",History of Idaho,History of Idaho\nof the Columbia was created ...
3,4.0,"Canyon , he concluded that water transport was...",History of Idaho,"History of Idaho\nCanyon , he concluded that w..."
4,5.0,"1842 , Father Pierre - Jean De Smet , with Fr....",History of Idaho,"History of Idaho\n1842 , Father Pierre - Jean ..."
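The same title-plus-text combination can be sketched without pandas, using the standard `csv` module. The two sample rows are invented for illustration; only the tab-separated layout with `id`, `text` and `title` columns follows the table above.

```python
import csv
import io

# Hypothetical two-row sample in the same tab-separated layout (id, text, title)
sample_tsv = (
    "id\ttext\ttitle\n"
    "1\tFirst passage text\tHistory of Idaho\n"
    "2\tSecond passage text\tHistory of Idaho\n"
)

rows = list(csv.DictReader(io.StringIO(sample_tsv), delimiter="\t"))
for row in rows:
    # Prepend the title so that every indexed passage carries its topic
    row["indextext"] = f"{row['title']}\n{row['text']}"

print(rows[0]["indextext"])
```

Prepending the title gives each passage some context of its own, which helps when a passage starts mid-sentence, as several rows in the table above do.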


### Create an embedding function for VectorStore

Note that you can supply a custom embedding function to be used with Elasticsearch. Retrieval performance may differ depending on the embedding model used.

In [11]:
embeddings = Embeddings(
    model_id=EmbeddingTypes.IBM_SLATE_30M_ENG,
    credentials=credentials,
    project_id=project_id
)

<a id="elastic_conn"></a>
## Set up connectivity information to Elasticsearch

**This notebook focuses on a self-managed cluster using <a href="https://cloud.ibm.com/docs/databases-for-elasticsearch?topic=databases-for-elasticsearch-getting-started" target="_blank" rel="noopener noreferrer">IBM Cloud® Databases for Elasticsearch.</a>**

The following cell retrieves the Elasticsearch user name, password, host, port and certificate from the environment if available, and prompts you otherwise.

Alternatively, you can provide a connection asset ID to read all required connection data from it. Before doing so, make sure that the connection asset has been created in your project.

In [13]:
es_connection_id = input("Provide connection asset ID in your project. Skip this, if you wish to type credentials by hand and hit enter: ") or None
es_connection_params = {}

if es_connection_id is None:
    try:
        esuser = os.environ["ESUSER"]
    except KeyError:
        esuser = input("Please enter your Elasticsearch user name and hit enter: ")
    try:
        espassword = os.environ["ESPASSWORD"]
    except KeyError:
        espassword = getpass.getpass("Please enter your Elasticsearch password and hit enter: ")
    try:
        eshost = os.environ["ESHOST"]
    except KeyError:
        eshost = input("Please enter your Elasticsearch hostname and hit enter: ")
    try:
        esport = os.environ["ESPORT"]
    except KeyError:
        esport = input("Please enter your Elasticsearch port number and hit enter: ")
    try:
        esca = os.environ["ESCA"]
    except KeyError:
        esca = input("Please enter your Elasticsearch certificate contents (base64 encoded) and hit enter: ")

    elasticsearch_data_source_type_id = (
        client.connections.get_datasource_type_id_by_name("elasticsearch")
    )
    details = client.connections.create(
        {
            client.connections.ConfigurationMetaNames.NAME: "ES Connection",
            client.connections.ConfigurationMetaNames.DESCRIPTION: "connection description",
            client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: elasticsearch_data_source_type_id,
            client.connections.ConfigurationMetaNames.PROPERTIES: {
                "url": f"{eshost}:{esport}",
                "username": esuser,
                "password": espassword,
                "use_anonymous_access": "false",
                "ssl_certificate": esca,
            },
        }
    )

    es_connection_id = client.connections.get_id(details)

Creating connections...
SUCCESS
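The repeated `try`/`except KeyError` blocks in the cell above all follow one pattern: read an environment variable and fall back to a prompt. A small helper can express this once. This is a sketch, not part of the SDK: `read_setting` and its `ask` parameter are hypothetical names, and unlike the original cell it also prompts when the variable is set but empty.

```python
import getpass
import os


def read_setting(name, prompt, secret=False, ask=None):
    """Return os.environ[name] if it is set and non-empty, otherwise prompt.

    `ask` is a hypothetical hook for passing a custom prompt function,
    which makes the helper usable in non-interactive runs and tests.
    """
    value = os.environ.get(name)
    if value:
        return value
    if ask is None:
        # Hide typed input for secrets such as passwords
        ask = getpass.getpass if secret else input
    return ask(prompt)


# Demo with made-up variable names so that real settings are not touched
os.environ["ESHOST_DEMO"] = "es.example.com"
os.environ.pop("ESPORT_DEMO_UNSET", None)

demo_host = read_setting("ESHOST_DEMO", "Elasticsearch hostname: ")
demo_port = read_setting("ESPORT_DEMO_UNSET", "Elasticsearch port: ", ask=lambda _: "9200")
print(demo_host, demo_port)
```

In the notebook itself, a call such as `read_setting("ESPASSWORD", "Please enter your Elasticsearch password and hit enter: ", secret=True)` would replace one four-line `try`/`except` block.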


### Promote assets from project to space

Some of the assets need to be promoted from the project to the space. Because `RAGPattern`, and eventually our RAG function, will run in the deployment space, these resources must be available there for it to work correctly.

In [18]:
assets_to_promote = [es_connection_id, prompt_id]
# Promote each asset and keep the id it receives in the deployment space
promoted_connection_id, promoted_prompt_id = [
    client.spaces.promote(asset_id, project_id, space_id) for asset_id in assets_to_promote
]

In [19]:
client.set.default_space(space_id)

Unsetting the project_id ...


'SUCCESS'

<a id="vectorstore"></a>
## Set up VectorStore with Elasticsearch credentials 

Create a `VectorStore` object. The class automatically detects the database type (in our case Elasticsearch) and allows us to add, search and delete documents.

It works as a wrapper for LangChain VectorStore classes. You can customize the settings as long as the underlying class supports them. Consult the LangChain documentation for more information about the <a href="https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.elasticsearch.ElasticsearchStore.html" target="_blank" rel="noopener noreferrer">ElasticsearchStore</a> connector.

Provide the name of your Elasticsearch index for subsequent operations:

In [20]:
index_name = input("Please enter Elasticsearch index name and hit enter: ")

In [21]:
vector_store = VectorStore(
    client=client,
    embeddings=embeddings,
    connection_id=promoted_connection_id,
    index_name=index_name,
)

<a id="elasticsearchstore_index"></a>
### Embed and index documents with Elasticsearch

**Note: This step can take several minutes if you don't have pre-built indices.**

In [22]:
texts = documents.indextext.tolist()
metadatas = [{'title': title, 'id': doc_id} for (title, doc_id) in zip(documents.title, documents.id)]
docs_to_add = [Document(page_content=text, metadata=metadata) for text, metadata in zip(texts, metadatas)]

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=10)
docs_to_add_split = text_splitter.split_documents(docs_to_add)

ids = vector_store.add_documents(docs_to_add_split)
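The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified fixed-window splitter. This is not how `RecursiveCharacterTextSplitter` works internally: the LangChain class prefers to cut at separators such as paragraph breaks, line breaks and spaces. The sketch only shows what the two parameters mean, and that by default they are measured in characters rather than tokens. The function name `split_with_overlap` is made up for this example.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=10):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample_text = "abcdefghij" * 120  # 1200 characters of filler text
chunks = split_with_overlap(sample_text)
print([len(chunk) for chunk in chunks])
```

Splitting the 1000 loaded passages into chunks of at most 500 characters is why the index ends up holding more entries than there are source passages.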

Verify the number of documents loaded into the Elasticsearch index.

In [23]:
doc_count = vector_store.count()
doc_count

4051

Let's search for an example document as a sample. The query is embedded with the same embedding model that was used to index the passages.

In [24]:
vector_store.search("United States of America", k=5, verbose=True)

**Question:** United States of America

Unnamed: 0,page_content,id,title
0,", D.C. States / Territories Alabama Alaska Ame...",639.0,"United States Senate elections, 2018"
1,"United States , 1797 -- 1801 1st Vice Presiden...",918.0,Founding Fathers of the United States
2,who led the American Revolution against the au...,878.0,Founding Fathers of the United States
3,-- 1793 ) U.S. Minister to France ( 1785 -- 17...,927.0,Founding Fathers of the United States
4,"the United States of America , which was recog...",521.0,British colonization of the Americas


[Document(metadata={'title': 'United States Senate elections, 2018', 'id': 639.0}, page_content=', D.C. States / Territories Alabama Alaska American Samoa Arizona Arkansas California Colorado Connecticut Delaware Florida Georgia Guam Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Puerto Rico'),
 Document(metadata={'title': 'Founding Fathers of the United States', 'id': 918.0}, page_content='United States , 1797 -- 1801 1st Vice President of the United States , 1789 -- 1797 U.S. Ambassador to the United Kingdom , 1785 -- 1788 U.S. Ambassador to the Netherlands , 1782 -- 1788 Delegate , Second Continental Congress , 1775 -- 1778 Delegate , First Continental Congress , 1774 Founding of the United States Braintree Instructions ( 1765 ) Boston Massacre defense Continental Associat

<a id="deploy"></a>
## Create and deploy RAG solution

The `RAGPattern` class from the watsonx.ai Python SDK allows us to deploy a RAG function in a deployment space.

### Initialize ModelInference object

The model will use the defined prompt and the documents retrieved from the vector store to generate answers.

In [25]:
generate_params = {
    GenParams.DECODING_METHOD: "greedy",
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 200
}

model = ModelInference(
    model_id=ModelTypes.LLAMA_2_13B_CHAT,
    params=generate_params,
    credentials=credentials,
    space_id=space_id
)

### Initialize RAGPattern class

In [26]:
pattern = RAGPattern(
    space_id=space_id,
    prompt_id=promoted_prompt_id,
    vector_store=vector_store,
    model=model,
    api_client=client
)

### Preview function code

The code of the function to be deployed can be displayed using the `pretty_print` method.  
Set `insert_to_cell` to `True` to insert the function code into the next notebook cell.

In [30]:
pattern.inference_function.pretty_print(insert_to_cell=False)

def default_inference_function(params={'space_id': '93ee84d1-b7dd-42b4-b2ca-121bc0c86315', 'project_id': None, 'retriever': {'method': 'simple', 'number_of_chunks': 5}, 'vector_store': {'connection_id': '580ca8f2-caea-4239-b3f0-691312b2c428', 'embeddings': {'__class__': 'Embeddings', '__module__': 'ibm_watsonx_ai.foundation_models.embeddings.embeddings', 'model_id': 'ibm/slate-30m-english-rtrvr', 'params': None, 'project_id': 'd2436d2e-5814-4370-bd06-1754670e7d46', 'space_id': None, 'verify': None}, 'index_name': 'elastic_index_name', 'datasource_type': 'elasticsearch', 'distance_metric': None}, 'prompt_template_text': "\n    Use the following pieces of documents to answer the question\n    at the end. If you don't know the answer, just say that you\n    don't know, don't try to make up an answer. Use three sentences\n    maximum. Keep the answer as concise as possible. do not include\n    question in your response.Your answers should not include any\n    harmful, unethical, racist, se

### Test the function locally

To test our solution we can query the function locally without deploying.

In [31]:
questions_and_answers = {
    'what are the names of founding fathers of the united states?': "Thomas Jefferson::James Madison::John Jay::George Washington::John Adams::Benjamin Franklin::Alexander Hamilton",
    'who played in the super bowl in 2013?': 'Baltimore Ravens::San Francisco 49ers',
    'when did bucharest become the capital of romania?': '1862'
}

Define a helper function for formatting the response:

In [32]:
def print_rag_response(response):
    for question, (answer, reference_docs) in zip(questions_and_answers.keys(), response['predictions'][0]['values']):
        verbose_search(question, [Document(**d) for d in reference_docs])
        display(Markdown(f'**Answer:** {answer}'))

Questions have to be provided in a payload with the format shown below.

In [33]:
payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [{
        "values": list(questions_and_answers.keys())
    }]
}
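The payload and the response are plain nested dictionaries and lists. The sketch below builds the payload by hand and unpacks a mock response in the same way `print_rag_response` does. Two assumptions are made: that `ScoringMetaNames.INPUT_DATA` resolves to the string `"input_data"`, and that each reference document is a dictionary with `page_content` and `metadata` keys. The mock response contains placeholders, not real model output.

```python
# Assumption: ScoringMetaNames.INPUT_DATA resolves to the string "input_data"
questions = ["who played in the super bowl in 2013?"]
payload_sketch = {"input_data": [{"values": questions}]}

# Hypothetical response in the shape that print_rag_response unpacks:
# one [answer, reference_documents] pair per question
mock_response = {
    "predictions": [
        {
            "values": [
                [
                    "(generated answer)",
                    [{"page_content": "(passage)", "metadata": {"title": "Super Bowl XLVII"}}],
                ],
            ]
        }
    ]
}

parsed = []
for question, (answer, reference_docs) in zip(questions, mock_response["predictions"][0]["values"]):
    titles = [doc["metadata"]["title"] for doc in reference_docs]
    parsed.append((question, answer, titles))

print(parsed)
```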

In [34]:
response = pattern.query(payload)
print_rag_response(response)

**Question:** what are the names of founding fathers of the united states?

Unnamed: 0,page_content,id,title
0,Founding Fathers of the United States,878.0,Founding Fathers of the United States
1,Founding Fathers of the United States,879.0,Founding Fathers of the United States
2,Founding Fathers of the United States,880.0,Founding Fathers of the United States
3,Founding Fathers of the United States,881.0,Founding Fathers of the United States
4,Founding Fathers of the United States,882.0,Founding Fathers of the United States


**Answer:** The names of the founding fathers of the united states are george washington, john adams, thomas jefferson, benjamin franklin, james madison, alexander hamilton, john jay, and james monroe.

**Question:** who played in the super bowl in 2013?

Unnamed: 0,page_content,id,title
0,Super Bowl XLVII - wikipedia Super Bowl XLVII ...,818.0,Super Bowl XLVII
1,Opponents Announced '' . NewOrleansSaints.com ...,856.0,Super Bowl XLVII
2,"responded to the claim on Twitter in jest , tw...",848.0,Super Bowl XLVII
3,: Super Bowl 2012 National Football League sea...,876.0,Super Bowl XLVII
4,"February 4 , 2013 . Jump up ^ `` Lights go out...",866.0,Super Bowl XLVII


**Answer:**  The Baltimore Ravens played against the San Francisco 49ers in Super Bowl XLVII in 2013.

**Question:** when did bucharest become the capital of romania?

Unnamed: 0,page_content,id,title
0,destroying a third of the city . Ottoman massa...,948.0,Bucharest
1,"to become joyful ) , while an early 19th - cen...",946.0,Bucharest
2,"route to the Eastern Front , Bucharest suffere...",949.0,Bucharest
3,Bucharest,942.0,Bucharest
4,Bucharest,943.0,Bucharest


**Answer:**  Bucharest became the capital of Romania in 1862, after Wallachia and Moldavia were united to form the Principality of Romania.

### Deploy RAGPattern

To deploy the solution, call the `deploy` method of the `RAGPattern` object we created. Provide a deployment name and, if needed, additional meta props like those used for a deployed function. Check `client._functions.ConfigurationMetaNames.show()` and `client.deployments.ConfigurationMetaNames.show()` for more info.

In [35]:
pattern_deployment_details = pattern.deploy("RAG_deployment")
pattern_deployment_id = client.deployments.get_id(pattern_deployment_details)



######################################################################################

Synchronous deployment creation for id: '9d129ea3-03e6-4a23-b4c8-cba91170c9b6' started

######################################################################################


initializing
Note: online_url and serving_urls are deprecated and will be removed in a future release. Use inference instead.
...
ready


-----------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_id='4589383a-7739-4768-9af8-9e2a7d564903'
-----------------------------------------------------------------------------------------------




### Test the deployed function

The RAG service is now deployed in our space. To test the solution, run the cell below. It reuses the payload with the questions that was defined for the local test.

In [36]:
response = client.deployments.score(pattern_deployment_id, meta_props=payload)
print_rag_response(response)

**Question:** what are the names of founding fathers of the united states?

Unnamed: 0,page_content,id,title
0,Founding Fathers of the United States,878.0,Founding Fathers of the United States
1,Founding Fathers of the United States,879.0,Founding Fathers of the United States
2,Founding Fathers of the United States,880.0,Founding Fathers of the United States
3,Founding Fathers of the United States,881.0,Founding Fathers of the United States
4,Founding Fathers of the United States,882.0,Founding Fathers of the United States


**Answer:** The names of the founding fathers of the united states are george washington, john adams, thomas jefferson, benjamin franklin, james madison, alexander hamilton, john jay, and james monroe.

**Question:** who played in the super bowl in 2013?

Unnamed: 0,page_content,id,title
0,Super Bowl XLVII - wikipedia Super Bowl XLVII ...,818.0,Super Bowl XLVII
1,Opponents Announced '' . NewOrleansSaints.com ...,856.0,Super Bowl XLVII
2,"responded to the claim on Twitter in jest , tw...",848.0,Super Bowl XLVII
3,: Super Bowl 2012 National Football League sea...,876.0,Super Bowl XLVII
4,"February 4 , 2013 . Jump up ^ `` Lights go out...",866.0,Super Bowl XLVII


**Answer:**  The Baltimore Ravens played against the San Francisco 49ers in Super Bowl XLVII in 2013.

**Question:** when did bucharest become the capital of romania?

Unnamed: 0,page_content,id,title
0,destroying a third of the city . Ottoman massa...,948.0,Bucharest
1,"to become joyful ) , while an early 19th - cen...",946.0,Bucharest
2,"route to the Eastern Front , Bucharest suffere...",949.0,Bucharest
3,Bucharest,942.0,Bucharest
4,Bucharest,943.0,Bucharest


**Answer:**  Bucharest became the capital of Romania in 1862, after Wallachia and Moldavia were united to form the Principality of Romania.

<a id="evaluate"></a>
## Calculate rougeL metric 
Calculate the rougeL recall score to verify that the expected answer is present in the generated response.

In [37]:
text_responses = [v[0] for v in response['predictions'][0]['values']]
targets = [answer for answer in questions_and_answers.values()]

In [38]:
scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)
scores = [scorer.score(target, prediction) for target, prediction in zip(targets, text_responses)]
mean_rougeL = sum([s['rougeL'].recall for s in scores]) / len(questions_and_answers)

print(f"Mean rougeL recall score: {mean_rougeL}")

Mean rougeL recall score: 0.8571428571428571
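For intuition, rougeL recall is the length of the longest common subsequence (LCS) of the target and prediction tokens, divided by the number of target tokens. The sketch below implements that definition with whitespace tokenization. It is a simplified stand-in, so its numbers can differ from `rouge_score`, which applies its own tokenizer and, with `use_stemmer=True`, stemming.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l_recall(target, prediction):
    """LCS length divided by the number of target tokens (whitespace tokenization)."""
    target_tokens = target.lower().split()
    prediction_tokens = prediction.lower().split()
    if not target_tokens:
        return 0.0
    return lcs_length(target_tokens, prediction_tokens) / len(target_tokens)


full_match = rouge_l_recall("1862", "Bucharest became the capital of Romania in 1862")
partial_match = rouge_l_recall("a b c d", "a x c")
print(full_match, partial_match)
```

Recall ignores how long the prediction is, so a verbose answer is not penalized as long as the expected tokens appear in order.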


<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!

Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener noreferrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts.

### Authors:
**Dominik Zimny**, Software Engineer at watsonx.ai

**Mateusz Szewczyk**, Software Engineer at watsonx.ai

Copyright © 2024-2025 IBM. This notebook and its source code are released under the terms of the MIT License.