![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx, Chroma, and LangChain to answer questions in lightweight clusters

#### Disclaimers

- Projects and Spaces are not available in lightweight clusters.

## Notebook content
This notebook contains the steps and code to demonstrate support for Retrieval Augmented Generation in watsonx.ai. It introduces commands for data retrieval, knowledge base building and querying, and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

### About Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.

In its simplest form, RAG requires 3 steps:

- Index knowledge base passages (once)
- Retrieve relevant passage(s) from knowledge base (for every user query)
- Generate a response by feeding retrieved passage into a large language model (for every user query)
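
The three steps above can be sketched without any external service. This is a minimal illustration only: it ranks passages by shared words instead of embeddings, and the helper names (`build_index`, `retrieve`, `build_prompt`) and the toy passages are assumptions made for this example, not part of watsonx.ai or LangChain.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(passages):
    """Step 1: index every passage once (here: precompute its token set)."""
    return [(passage, tokenize(passage)) for passage in passages]


def retrieve(index, query, k=1):
    """Step 2: return the k passages sharing the most tokens with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(index, key=lambda item: len(item[1] & query_tokens), reverse=True)
    return [passage for passage, _ in ranked[:k]]


def build_prompt(passages, query):
    """Step 3 (input side): put the retrieved passages in front of the question."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical toy knowledge base
toy_passages = [
    "Chroma is a vector store for embeddings.",
    "LangChain chains language model calls together.",
    "watsonx.ai hosts foundation models.",
]
toy_question = "Which vector store keeps the embeddings?"

toy_index = build_index(toy_passages)
top_passages = retrieve(toy_index, toy_question)
toy_prompt = build_prompt(top_passages, toy_question)
print(toy_prompt)  # this text would be sent to the large language model
```

The rest of the notebook replaces the word-overlap ranking with dense embeddings stored in Chroma and sends the prompt to a watsonx.ai model.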

## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Document data loading](#data)
- [Build up knowledge base](#build_base)
- [Foundation Models on watsonx](#models)
- [LangChain - Initialize the WatsonxLLM from langchain-ibm](#langchain_init)
- [Generate a retrieval-augmented response to a question](#predict)
- [Summary and next steps](#summary)


<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Contact your Cloud Pak for Data administrator and ask for your account credentials.


### Install and import `ibm-watsonx-ai` and its dependencies
**Note:** `ibm-watsonx-ai` documentation can be found <a href="https://ibm.github.io/watsonx-ai-python-sdk/index.html" target="_blank" rel="noopener no referrer">here</a>.

In [None]:
!pip install -U langchain | tail -n 1
!pip install -U langchain-ibm | tail -n 1
!pip install wget | tail -n 1
!pip install sentence-transformers | tail -n 1
!pip install "chromadb==0.3.26" | tail -n 1
!pip install -U ibm-watsonx-ai | tail -n 1

### Client initialization

To authenticate to IBM Cloud Pak for Data (with the lightweight engine), you need to provide the platform `url`, your `username`, and your `password`.

**Note**: There is no need to create a space or project and pass it to the credentials below.

In [20]:
username = 'PASTE YOUR USERNAME HERE'
password = 'PASTE YOUR PASSWORD HERE'
url = 'PASTE THE PLATFORM URL HERE'
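
As an alternative to pasting credentials into the notebook, the values could be read from environment variables, with an interactive prompt as a fallback. This is only a sketch: the helper `read_setting` and the variable names `CPD_USERNAME`, `CPD_PASSWORD`, and `CPD_URL` are hypothetical and are not defined by `ibm-watsonx-ai`.

```python
import os
from getpass import getpass


def read_setting(name, secret=False, env=os.environ):
    """Return a setting from the environment, prompting only if it is missing."""
    value = env.get(name)
    if value:
        return value
    prompt = f"Enter {name}: "
    # getpass hides the typed value, which is preferable for passwords
    return getpass(prompt) if secret else input(prompt)


# Demonstration with a stand-in environment, so nothing is prompted here
example_env = {"CPD_USERNAME": "demo-user"}
print(read_setting("CPD_USERNAME", env=example_env))

# In the notebook, the three values could then be set like this (hypothetical names):
# username = read_setting("CPD_USERNAME")
# password = read_setting("CPD_PASSWORD", secret=True)
# url = read_setting("CPD_URL")
```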

In [21]:
from ibm_watsonx_ai import APIClient, Credentials


credentials = Credentials(
    username=username,
    password=password,
    url=url,
    instance_id="openshift",
    version="5.0"
)

# Initialize client
client = APIClient(credentials=credentials)

### List foundation and embedding models
To see the available foundation and embedding models, use the previously initialized `APIClient` object.

Get foundation models:

In [22]:
client.foundation_models.TextModels.show()

{'FLAN_T5_XL': 'google/flan-t5-xl'}


Get embedding models:

In [23]:
client.foundation_models.EmbeddingModels.show()

{'SLATE_125M_ENGLISH_RTRVR': 'ibm/slate-125m-english-rtrvr'}


<a id="data"></a>
## Document data loading

Download the text file containing the State of the Union speech.

In [24]:
import wget
import os

filename = 'state_of_the_union.txt'
# Use a separate name so the platform `url` defined earlier is not overwritten
data_url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/foundation_models/state_of_the_union.txt'

if not os.path.isfile(filename):
    wget.download(data_url, out=filename)

<a id="build_base"></a>
## Build up knowledge base

The current state-of-the-art in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.
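
To make "semantic similarity" concrete, the sketch below ranks a few passage vectors against a query vector by cosine similarity. Cosine similarity is one common measure and is used here as an assumption; the vector store may be configured with a different distance metric. The two-dimensional vectors are made up for illustration, and real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, passage_vectors):
    """Return passage indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vector, v) for v in passage_vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Made-up vectors standing in for embedded passages and an embedded query
toy_vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
toy_query = [0.9, 0.1]

ranking = rank_by_similarity(toy_query, toy_vectors)
print(ranking)
```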

In this basic example, we take the State of the Union speech content (`filename`), split it into chunks, embed it using a watsonx.ai embedding model, load it into <a href="https://www.trychroma.com/" target="_blank" rel="noopener no referrer">Chroma</a>, and then query it.

In [25]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

loader = TextLoader(filename)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

Let's look at the content of the first chunk.

In [26]:
texts[0].page_content

'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.'

Check how many chunks the splitter produced.

In [27]:
len(texts)

42

The speech is now split into 42 self-contained passages that can be ingested by Chroma.
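
As a rough illustration of how a character-based splitter uses `chunk_size`, the sketch below splits text on blank lines and greedily merges paragraphs until the size limit would be exceeded. It is a simplified approximation written for this notebook, not the implementation of `CharacterTextSplitter`; the function name `split_paragraphs` and the sample text are assumptions.

```python
def split_paragraphs(text, chunk_size=1000, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate  # the piece still fits, or it starts a new chunk
        else:
            chunks.append(current)  # close the full chunk and start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
toy_chunks = split_paragraphs(sample, chunk_size=40)
print(toy_chunks)
```

With no overlap between chunks, each paragraph ends up in exactly one chunk, which matches the `chunk_overlap=0` setting used above.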

### Create an embedding function

Note that you can pass a custom embedding function to Chroma. Retrieval quality may differ depending on the embedding model used. The following example uses the watsonx.ai embedding service.

**Note**: To list available embedding models use ```client.foundation_models.EmbeddingModels.show()```

In [28]:
from ibm_watsonx_ai.foundation_models import Embeddings
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes

embeddings = Embeddings(
    model_id=EmbeddingTypes.IBM_SLATE_125M_ENG,
    api_client=client
)

Example embedding query.

In [29]:
embeddings.embed_query("working with embeddings")[:5]

[-0.024377016, -0.025587596, -0.016414203, 0.03369574, -0.013791243]
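
For comparison, a custom embedding function can be sketched as a small class with the same two methods used by LangChain embedding classes, `embed_documents` and `embed_query`. The class below, `HashingEmbeddings`, is a hypothetical toy that hashes words into a fixed number of buckets. It captures no semantics and is shown only to illustrate the shape of the interface.

```python
import hashlib
import math


class HashingEmbeddings:
    """Toy embedding function exposing embed_documents and embed_query."""

    def __init__(self, dimensions=8):
        self.dimensions = dimensions

    def _embed(self, text):
        """Count words per hash bucket, then scale the vector to unit length."""
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            # sha256 gives the same bucket on every run, unlike Python's built-in hash()
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vector[digest[0] % self.dimensions] += 1.0
        norm = math.sqrt(sum(v * v for v in vector))
        return [v / norm for v in vector] if norm else vector

    def embed_documents(self, texts):
        """Embed a list of passages; returns one vector per passage."""
        return [self._embed(text) for text in texts]

    def embed_query(self, text):
        """Embed a single query string; returns one vector."""
        return self._embed(text)


toy_embeddings = HashingEmbeddings(dimensions=8)
toy_vector = toy_embeddings.embed_query("working with embeddings")
print(len(toy_vector))
```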

<a id="models"></a>
## Foundation Models on `watsonx.ai`

IBM watsonx foundation models are among the <a href="https://python.langchain.com/docs/integrations/llms/watsonxllm" target="_blank" rel="noopener no referrer">LLMs supported by LangChain</a>. This example shows how to communicate with the `google/flan-t5-xl` model using <a href="https://python.langchain.com/docs/get_started/introduction" target="_blank" rel="noopener no referrer">LangChain</a>.

### Defining model
You need to specify the `model_id` that will be used for inferencing.

**Note**: To list available foundation models use ```client.foundation_models.TextModels.show()```

In [40]:
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

model_id = ModelTypes.FLAN_T5_XL

### Defining the model parameters
We need to provide a set of model parameters that will influence the result:

In [31]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"]
}
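
To show how these parameters interact, here is a toy sketch of greedy decoding: at each step the highest-scoring token is chosen, and generation ends at a stop sequence or at the token limit. The function `greedy_decode` and the scripted scorer `toy_scores` are hypothetical stand-ins written for this example. They do not reflect how the watsonx.ai service is implemented.

```python
def greedy_decode(next_token_scores, min_new_tokens=1, max_new_tokens=100, stop_sequences=()):
    """Repeatedly pick the highest-scoring next token until a stop condition is met.

    next_token_scores is a hypothetical callable: given the tokens generated so far,
    it returns a dict mapping each candidate token to a score.
    """
    generated = []
    while len(generated) < max_new_tokens:
        scores = next_token_scores(generated)
        token = max(scores, key=scores.get)  # greedy: always take the best-scoring token
        generated.append(token)
        text = "".join(generated)
        # Stop sequences are honoured only after min_new_tokens have been produced;
        # this sketch keeps the stop sequence in the returned text.
        if len(generated) >= min_new_tokens and any(text.endswith(s) for s in stop_sequences):
            break
    return "".join(generated)


def toy_scores(generated):
    """Scripted stand-in for a model: always favours the next token of a fixed answer."""
    script = ["gross", " domestic", " product", "<|endoftext|>"]
    target = script[min(len(generated), len(script) - 1)]
    return {target: 0.9, " other": 0.1}


print(greedy_decode(toy_scores, stop_sequences=["<|endoftext|>"]))
```

Because greedy decoding always takes the top-scoring token, the same prompt yields the same output on every run, which is convenient for a reproducible notebook.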

### Initialize the model
Initialize the `ModelInference` class with the previously defined parameters and `flan_t5_xl`. It is wrapped for LangChain in a later step.

In [32]:
from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(model_id=model_id, params=parameters, api_client=client)

Test the model with an example question.

In [33]:
model.generate_text("What is green and small?")

'frog'

In [34]:
from time import sleep

for x in model.generate_text_stream("What is GDP?"):
    print(x, end="")
    sleep(0.7)

gross domestic product

<a id="langchain_init"></a>
## LangChain - Initialize the WatsonxLLM from langchain-ibm

There is no need to provide credentials again. Reuse the previously created model (`ModelInference`).

In [35]:
from langchain_ibm import WatsonxLLM

watsonx_llm = WatsonxLLM(watsonx_model=model)

Check how far the Sun is from Earth.

In [36]:
watsonx_llm.invoke("How far is sun?")

'93 million miles'

<a id="predict"></a>
## Generate a retrieval-augmented response to a question using langchain-ibm
Build the `RetrievalQA` (question answering chain) to automate the RAG task.

In [37]:
from langchain.chains import RetrievalQA

vector_store = Chroma.from_documents(texts, embeddings)
qa = RetrievalQA.from_chain_type(llm=watsonx_llm, chain_type="stuff", retriever=vector_store.as_retriever())

Ask a question about the loaded dataset.

In [38]:
query = "What did the president say about Ketanji Brown Jackson?"
qa.invoke(query)

{'query': 'What did the president say about Ketanji Brown Jackson?',
 'result': 'One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence'}
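
The `"stuff"` chain type used above places all retrieved passages into a single prompt ahead of the question. The sketch below shows the idea with plain string formatting. The function `stuff_prompt` and the template wording are illustrative assumptions and may differ from the prompt LangChain actually sends.

```python
def stuff_prompt(passages, question):
    """Concatenate every retrieved passage into one context block ahead of the question."""
    context = "\n\n".join(passages)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Placeholder passages standing in for the retriever's results
example_prompt = stuff_prompt(
    ["Passage one about the nominee.", "Passage two about the court."],
    "What did the president say about Ketanji Brown Jackson?",
)
print(example_prompt)
```

Since every retrieved passage lands in one prompt, the number and size of the chunks need to fit within the model's input limit.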

---

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!
 
You learned how to answer questions with RAG using watsonx, LangChain, and the lightweight engine.
 
Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener no referrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts. 

### Author

**Wojciech Rebisz**, Software Engineer at Watson Machine Learning.

Copyright © 2024-2025 IBM. This notebook and its source code are released under the terms of the MIT License.