# Use watsonx and LangChain to answer questions about Think 2024 (RAG)

In this tutorial, we’ll use LangChain to walk through a step-by-step Retrieval Augmented Generation ([RAG](https://research.ibm.com/blog/retrieval-augmented-generation-RAG)) example in Python. For our use case, we’ll be setting up a RAG system for [IBM Think 2024](https://www.ibm.com/events/think). IBM Think 2024 is a conference where IBM announces new products, technologies, and partnerships.

RAG is a technique in natural language processing (NLP) that combines information retrieval and generative models to produce more accurate, relevant and contextually aware responses. 

In traditional language generation tasks, [large language models](https://www.ibm.com/topics/large-language-models) (LLMs) like OpenAI’s GPT-3.5 (Generative Pre-trained Transformer) or [IBM’s Granite Models](https://www.ibm.com/products/watsonx-ai/foundation-models) are used to construct responses based on an input prompt. However, these models may struggle to produce responses that are contextually relevant, factually accurate or up to date. For example, a model trained before the event may not know the latest information on IBM Think 2024. RAG applications address this limitation by incorporating a retrieval step before response generation. During retrieval, [vector search](https://www.ibm.com/topics/vector-search) can be used to identify relevant passages or documents from a large corpus of text, typically stored in a [vector database](https://www.ibm.com/topics/vector-database). Finally, an LLM is used to generate a response based on the retrieved context.

LangChain is a powerful, open-source framework that facilitates the development of applications using LLMs for various NLP tasks. In the context of RAG, LangChain plays a critical role by combining the strengths of retrieval-based methods and generative models to enhance the capabilities of NLP systems.

For this tutorial, we will fetch content from several web pages, mostly on IBM.com. These pages make up the knowledge base that gives an LLM the context it needs to answer some questions about Think 2024.

This Jupyter Notebook is available on [GitHub](https://github.com/Erika-Russi/think/blob/main/tutorials/langchain/langchain-rag.ipynb).

# Prerequisites

You need an [IBM Cloud account](https://cloud.ibm.com/registration?utm_source=ibm_developer&utm_content=in_content_link&utm_id=tutorials_awb-implement-xgboost-in-python&cm_sp=ibmdev-_-developer-_-trial) to create a [watsonx.ai](https://www.ibm.com/products/watsonx-ai?utm_source=ibm_developer&utm_content=in_content_link&utm_id=tutorials_awb-implement-xgboost-in-python&cm_sp=ibmdev-_-developer-_-product) project.

# Steps

## Step 1. Set up your environment

While you can choose from several tools, this tutorial walks you through how to set up an IBM account to use a Jupyter Notebook. Jupyter Notebooks are widely used within [data science](https://www.ibm.com/topics/data-science) to combine code, text, images, and [data visualizations](https://www.ibm.com/topics/data-visualization) in a single, shareable analysis.

1. Log in to [watsonx.ai](https://dataplatform.cloud.ibm.com/registration/stepone) using your IBM Cloud account.

2. Create a [watsonx.ai project](https://www.ibm.com/docs/en/watsonx/saas?topic=projects-creating-project#create-a-project).
    - Note the `project_id` in project > General > Project ID. You'll need this for the tutorial.

3. Create a [Jupyter Notebook](https://www.ibm.com/docs/en/watsonx/saas?topic=editor-creating-notebooks).

This step will open a Notebook environment where you can copy the code from this tutorial to implement a RAG application for Think 2024.

## Step 2. Set up a Watson Machine Learning (WML) Service Instance and API Key

1. Create a [Watson Machine Learning (WML) Service](https://cloud.ibm.com/catalog/services/watson-machine-learning) instance (a free plan is offered).

2. Generate an API Key in WML by going to Manage > Access (IAM) > API Keys (left-hand panel) > Create +. Save this API key for use in this tutorial.

3. Associate the WML service to the project you created in [watsonx.ai](https://dataplatform.cloud.ibm.com/). Project > Services & Integrations > + Associate Service > Add "Watson Machine Learning"

## Step 3. Install and import relevant libraries

We'll need a few libraries for this tutorial. Make sure to import the ones below, and if they're not installed, you can resolve this with a quick pip install.

In [1]:
#installations
%pip install langchain
%pip install langchain_chroma
%pip install langchain-community
%pip install beautifulsoup4
%pip install lxml
%pip install sentence-transformers
%pip install selenium
%pip install unstructured
%pip install "ibm-watson-machine-learning>=1.0.327"

In [2]:
#imports
import os
import json
import getpass

from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams


# these classes now live in the community and partner packages installed above
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import WatsonxLLM
from langchain_chroma import Chroma

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

from langchain_community.document_loaders import SeleniumURLLoader

from langchain_text_splitters import RecursiveCharacterTextSplitter

In [3]:
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your WML api key (hit enter): ")
}

In [4]:
try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

## Step 4. Indexing

We’ll index our Think 2024 specific articles from URLs to create a knowledge base as a vector store. The content from these URLs will be our data sources and context for this exercise. The first step to building vector embeddings is to clean and process the raw dataset. This may involve the removal of noise and standardization of the text. For our example, the text is already fairly clean, so we’ll only normalize white space after loading the documents.

First, let's establish `URLS_DICTIONARY`. `URLS_DICTIONARY` helps us map the URLs from which we will be extracting the content. Let's also set up a name for our collection: `askibm_think_2024`.

In [5]:
URLS_DICTIONARY = {
    "ibm.com_events_think_faq.html": "https://www.ibm.com/events/think/faq",
    "events_think_agenda.html": "https://www.ibm.com/events/think/agenda",
    "products_watsonx_ai.html": "https://www.ibm.com/products/watsonx-ai",
    "products_watsonx_ai_foundation_models.html": "https://www.ibm.com/products/watsonx-ai/foundation-models",
    "watsonx_pricing.html": "https://www.ibm.com/watsonx/pricing",
    "watsonx.html": "https://www.ibm.com/watsonx",
    "products_watsonx_data.html": "https://www.ibm.com/products/watsonx-data",
    "products_watsonx_assistant.html": "https://www.ibm.com/products/watsonx-assistant",
    "products_watsonx_code_assistant.html": "https://www.ibm.com/products/watsonx-code-assistant",
    "products_watsonx_orchestrate.html": "https://www.ibm.com/products/watsonx-orchestrate",
    "products_watsonx_governance.html": "https://www.ibm.com/products/watsonx-governance",
    "granite_code_models_open_source.html": "https://research.ibm.com/blog/granite-code-models-open-source",
    "red_hat_enterprise_linux_ai.html": "https://www.redhat.com/en/about/press-releases/red-hat-delivers-accessible-open-source-generative-ai-innovation-red-hat-enterprise-linux-ai",
    "model_choice.html": "https://www.ibm.com/blog/announcement/enterprise-grade-model-choices/",
    "democratizing.html": "https://www.ibm.com/blog/announcement/democratizing-large-language-model-development-with-instructlab-support-in-watsonx-ai/",
    "ibm_consulting_expands_ai.html": "https://newsroom.ibm.com/Blog-IBM-Consulting-Expands-Capabilities-to-Help-Enterprises-Scale-AI",
    "ibm_data_product_hub.html": "https://www.ibm.com/products/data-product-hub",
    "ibm_price_performance_data.html": "https://www.ibm.com/blog/announcement/delivering-superior-price-performance-and-enhanced-data-management-for-ai-with-ibm-watsonx-data/",
    "ibm_bi_adoption.html": "https://www.ibm.com/blog/a-new-era-in-bi-overcoming-low-adoption-to-make-smart-decisions-accessible-for-all/",
    "watsonx_code_assistant_for_z.html": "https://www.ibm.com/blog/announcement/ibm-watsonx-code-assistant-for-z-accelerate-the-application-lifecycle-with-generative-ai-and-automation/",
    "code_assistant_for_java.html": "https://www.ibm.com/blog/announcement/watsonx-code-assistant-java/",
    "code_assistant_for_orchestrate.html": "https://www.ibm.com/blog/announcement/watsonx-orchestrate-ai-z-assistant/",
    "accelerating_gen_ai.html": "https://newsroom.ibm.com/Blog-How-IBM-Cloud-is-Accelerating-Business-Outcomes-with-Gen-AI",
    "watsonx_open_source.html": "https://newsroom.ibm.com/2024-05-21-IBM-Unveils-Next-Chapter-of-watsonx-with-Open-Source,-Product-Ecosystem-Innovations-to-Drive-Enterprise-AI-at-Scale",
    "ibm_concert.html": "https://www.ibm.com/products/concert",
    "ibm_consulting_advantage_news.html": "https://newsroom.ibm.com/2024-01-17-IBM-Introduces-IBM-Consulting-Advantage,-an-AI-Services-Platform-and-Library-of-Assistants-to-Empower-Consultants",
    "ibm_consulting_advantage_info.html": "https://www.ibm.com/consulting/info/ibm-consulting-advantage"
}
COLLECTION_NAME = "askibm_think_2024"

Next, let's load our documents using the LangChain SeleniumURLLoader for the list of URLs we have. We'll print a sample document at the end to see how it's been loaded.

In [6]:
documents = []

for url in list(URLS_DICTIONARY.values()):
    loader = SeleniumURLLoader(urls=[url])
    data = loader.load()
    documents += data

#show sample document
documents[0]

Document(page_content="FAQ\n\nView session catalog\n\nEvent information\n\nThink 2024 will be held in Boston, MA.\xa0 It will open with an exclusive, partner-only event on 20 May, filled with inspiration, networking, knowledge-sharing and business value for IBM’s global ecosystem of partners. Then, on 21–23 May, we will bring together senior business and technology leaders from across industries for two and a half days of exploration, discussion and innovation.\n\nThink 2024 will host senior business and technology leaders from across industries. Content will be geared toward C-level, line of business and senior IT leaders.\n\nThink 2024 programming will be held at the Boston Convention & Exhibition Center (BCEC), with some activities at the Omni Boston Hotel at the Seaport.\n\nAt IBM, we are committed to sustainability and environmentally responsible event planning. We are proud to partner with two distinguished venues, each known for their exemplary sustainable practices. Our event w

Based on the sample document, it looks like there's a lot of white space and new line characters that we can get rid of. Let's clean that up and add some metadata to our documents, including an id number and the source of the content.

In [8]:
for doc_id, doc in enumerate(documents):
    doc.page_content = " ".join(doc.page_content.split())  # collapse white space and new lines

    doc.metadata["id"] = doc_id  # add a document id to the document metadata

    print(doc.metadata)

{'source': 'https://www.ibm.com/events/think/faq', 'title': 'Think 2024 FAQ | IBM', 'description': 'Think FAQ Questions and Answers.', 'language': 'en-us', 'id': 0}
{'source': 'https://www.ibm.com/events/think/agenda', 'title': 'IBM Think 2024 Agenda', 'description': "IBM's Think 2024 Agenda on  20–23 May 2024, Boston, MA.", 'language': 'en-us', 'id': 1}
{'source': 'https://www.ibm.com/products/watsonx-ai', 'title': 'IBM watsonx.ai', 'description': 'Build in our new studio for foundation models, generative AI and machine learning with watsonx.ai.', 'language': 'en-us', 'id': 2}
{'source': 'https://www.ibm.com/products/watsonx-ai/foundation-models', 'title': 'Foundation Models - IBM watsonx.ai', 'description': 'Explore the family of language and code foundation models within the watsonx platform. ', 'language': 'en-us', 'id': 3}
{'source': 'https://www.ibm.com/watsonx/pricing', 'title': 'IBM watsonx | Pricing', 'description': 'IBM watsonx.ai pricing', 'language': 'en-us', 'id': 4}
{'sou

Let's see how our sample document looks now after we cleaned it up:

In [9]:
documents[0]

Document(page_content="FAQ View session catalog Event information Think 2024 will be held in Boston, MA. It will open with an exclusive, partner-only event on 20 May, filled with inspiration, networking, knowledge-sharing and business value for IBM’s global ecosystem of partners. Then, on 21–23 May, we will bring together senior business and technology leaders from across industries for two and a half days of exploration, discussion and innovation. Think 2024 will host senior business and technology leaders from across industries. Content will be geared toward C-level, line of business and senior IT leaders. Think 2024 programming will be held at the Boston Convention & Exhibition Center (BCEC), with some activities at the Omni Boston Hotel at the Seaport. At IBM, we are committed to sustainability and environmentally responsible event planning. We are proud to partner with two distinguished venues, each known for their exemplary sustainable practices. Our event will take place at the 
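
The white space cleanup above relies on a small Python idiom. As a minimal sketch (the `raw` string below is a made-up stand-in for a scraped page, not the tutorial's data), `str.split()` with no arguments splits on any run of white space, including new lines and non-breaking spaces, and `" ".join(...)` stitches the words back together with single spaces:

```python
# Made-up sample that mimics the raw page text (an assumption for illustration).
raw = "FAQ\n\nView session catalog\n\nThink 2024 will be held in Boston, MA.\xa0 It will open on 20 May."

# split() with no arguments treats any run of white space (spaces, tabs,
# new lines, non-breaking spaces) as one separator.
cleaned = " ".join(raw.split())

print(cleaned)
```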

We need to split up our text into smaller, more manageable pieces known as "chunks". LangChain's `RecursiveCharacterTextSplitter` takes a large text and splits it based on a specified chunk size using a predefined list of separators. The default separators are `["\n\n", "\n", " ", ""]`.

The process starts by attempting to split the text using the first separator, `\n\n`. If the resulting chunks are still too large, it moves to the next separator, `\n`, and tries splitting again. This continues with each separator in the list until the chunks are smaller than the specified chunk size.
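
To make the recursion concrete, here is a simplified sketch of the idea in plain Python. It is not LangChain's implementation (it ignores chunk overlap and other details), and the function name `recursive_split` is our own:

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified sketch: split on the first separator, recurse into oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Last resort (empty separator or none left): cut at fixed character offsets.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, remaining = separators[0], separators[1:]
    pieces = [p for p in text.split(sep) if p]

    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # Still too large: try again with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, remaining))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa bbbb\n\ncccc dddd eeee\nffff"
result = recursive_split(sample, chunk_size=10)
print(result)
```

Paragraph breaks are tried first, so text that belongs together tends to stay in the same chunk; single spaces and raw character cuts are only used when nothing coarser fits.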

We settled on a chunk size of 512 after experimenting with a chunk size of 1000. When the chunks were that large, our model was getting too much context for question-answering, so we changed it to smaller chunks. Feel free to experiment with chunk size further!
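
If you want a feel for how chunk size and overlap change the number of chunks before re-running the whole pipeline, a rough sketch like the one below can help. It uses plain fixed-size character windows rather than the recursive splitter, and `fixed_chunks` is a hypothetical helper, so treat the counts as approximate:

```python
def fixed_chunks(text, chunk_size, chunk_overlap=0):
    """Cut text into fixed-size windows; each window starts chunk_size - chunk_overlap after the last."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


text = "x" * 2000  # placeholder text of 2,000 characters
counts = {
    "size=1000": len(fixed_chunks(text, 1000)),
    "size=512": len(fixed_chunks(text, 512)),
    "size=512, overlap=64": len(fixed_chunks(text, 512, 64)),
}
print(counts)
```

Smaller chunks mean more, tighter pieces of context per query; overlap repeats some text across neighboring chunks so that sentences cut at a boundary still appear whole in at least one chunk.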


In [10]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

Next, we choose an embedding model to generate vector embeddings for each chunk in our Think 2024 dataset. We won't train this model ourselves; a pre-trained embedding model converts each chunk of text into a vector. For text data, popular open-source embedding models include Word2Vec, GloVe, FastText or pre-trained transformer-based models like BERT or RoBERTa. OpenAI embeddings may also be used by calling the OpenAI embeddings API endpoint with an `openai_api_key`; however, there is a cost associated with this usage.

Unfortunately, because embedding models can be large, computing vector embeddings often demands significant computational resources like a GPU. We can greatly lower the costs linked to embedding vectors, while preserving performance and accuracy, by using Hugging Face embeddings.

Hugging Face provides a vast array of pre-trained models and embeddings. These embeddings, generated from models like BERT, GPT and RoBERTa, encapsulate semantic information from text. Unlike traditional embedding methods that necessitate training from scratch, these pre-trained models can be used immediately for various NLP tasks. Called with no arguments, LangChain's `HuggingFaceEmbeddings` loads a default sentence-transformers model, which is why we installed `sentence-transformers` earlier.

In [11]:
embeddings = HuggingFaceEmbeddings()


Let's load our content into a local instance of a vector database, using Chroma.

In [12]:
vectorstore = Chroma.from_documents(documents=docs, embedding=embeddings)


Let's do a quick search of our vector database to test it out! Using `similarity_search_with_score` allows us to return the documents together with the distance score between the query and each of them. With Chroma's default settings, the returned score is a Euclidean (L2) distance measure. Therefore, a lower score is better.
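
To see why a lower score means a closer match, here is a small sketch using plain Euclidean distance on made-up two-dimensional vectors. Real embeddings have hundreds of dimensions, and the names `chunk_a` and `chunk_b` are placeholders:

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up vectors: chunk_a points in nearly the same direction as the query.
query_vector = [1.0, 0.0]
chunk_vectors = {"chunk_a": [0.9, 0.1], "chunk_b": [0.0, 1.0]}

# Sort ascending: the smallest distance is the best match.
ranked = sorted(
    ((name, euclidean_distance(query_vector, vec)) for name, vec in chunk_vectors.items()),
    key=lambda pair: pair[1],
)
print(ranked)
```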

In [13]:
prompt = "What is IBM concert?"
search = vectorstore.similarity_search_with_score(prompt)
search

[(Document(page_content='Book a live demo IBM Concert Simplify and optimize your app management and technology operations with generative AI-driven insights Book a live demo Prioritize and act on the most significant risks to your business applications Explore risk management with IBM Concert Keep applications continuously compliant, even as they evolve Explore compliance management with IBM Concert What is IBM Concert? Get the overview IBM® Concert® puts you in control, so you can simplify and optimize your operations to focus on', metadata={'description': 'IBM Concert is a generative AI-driven technology automation platform that puts you in control, enabling you to simplify and optimize your operations—across your environments.', 'id': 24, 'language': 'en-us', 'source': 'https://www.ibm.com/products/concert', 'title': 'IBM Concert'}),
  0.585679292678833),
 (Document(page_content='responsive in addressing issues and solving problems before they happen. Concert will initially focus on

In [14]:
prompt = "Where is Think 2024?"
search = vectorstore.similarity_search_with_score(prompt)
search

[(Document(page_content='industries. Content will be geared toward C-level, line of business and senior IT leaders. Think 2024 programming will be held at the Boston Convention & Exhibition Center (BCEC), with some activities at the Omni Boston Hotel at the Seaport. At IBM, we are committed to sustainability and environmentally responsible event planning. We are proud to partner with two distinguished venues, each known for their exemplary sustainable practices. Our event will take place at the Boston Convention & Exhibition', metadata={'description': 'Think FAQ Questions and Answers.', 'id': 0, 'language': 'en-us', 'source': 'https://www.ibm.com/events/think/faq', 'title': 'Think 2024 FAQ | IBM'}),
  0.6141491532325745),
 (Document(page_content='Think 2024 Think 2024 What you can expect at a Think event May 20, 2024 Monday: IBM Partner Plus Day Programming for Business Partners May 21, 2024 Tuesday: Think programming kicks off Event registration begins. Scaling your business with AI a

## Step 5. Set up a retriever

We'll set up our vector store as a retriever. The retrieved information from the vector store serves as additional context or knowledge that can be used by a generative model.

In [15]:
retriever = vectorstore.as_retriever()
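
Conceptually, a retriever takes a query, scores every stored chunk against it and returns the top few. The sketch below mimics that behavior with a toy word-count "embedding" in place of a real model. `ToyRetriever` and its helpers are hypothetical, and the default of `k=4` mirrors what we understand to be LangChain's default rather than something this tutorial sets explicitly:

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. A real embedding model returns dense vectors."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyRetriever:
    def __init__(self, chunks, k=4):
        self.k = k
        self.index = [(chunk, embed(chunk)) for chunk in chunks]

    def invoke(self, query):
        query_vec = embed(query)
        # Highest similarity first, then keep the top k chunks.
        ranked = sorted(
            self.index,
            key=lambda item: cosine_similarity(query_vec, item[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[: self.k]]


toy_retriever = ToyRetriever(
    [
        "Think 2024 will be held in Boston",
        "watsonx.ai is a studio for foundation models",
        "IBM Concert is an automation platform",
    ],
    k=2,
)
top_chunks = toy_retriever.invoke("Where is Think 2024 held?")
print(top_chunks)
```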


## Step 6. Generate a response with a Generative Model

Finally, we’ll generate a response. The generative model (like GPT-4 or IBM Granite) uses the retrieved information to produce a more accurate and contextually relevant response to our questions about Think 2024.

First, we'll establish which LLM we're going to use to generate the response. For this tutorial, we'll use IBM's Granite 13B Chat model. 

In [16]:
model_id = ModelTypes.GRANITE_13B_CHAT_V2

In [17]:
parameters = {
    GenParams.DECODING_METHOD: 'greedy',
    GenParams.MIN_NEW_TOKENS: 10,
    GenParams.MAX_NEW_TOKENS: 512,
    GenParams.REPETITION_PENALTY:1.2,
    GenParams.RETURN_OPTIONS: {'input_tokens': True,'generated_tokens': True, 'token_logprobs': True, 'token_ranks': True, }
}

In [18]:
llm = WatsonxLLM(
    model_id=model_id.value,
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params=parameters
)


We'll set up a prompt template so that we can ask multiple questions. The "context" will be derived from our retriever (our vector database) with the relevant documents and the "question" will be derived from the user query.

In [19]:
template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

Let's set up a helper function to format the docs accordingly:

In [20]:
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])
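
To preview what the model will actually receive, the sketch below fills the same template with plain `str.format` and two made-up documents. `SimpleNamespace` stands in for LangChain's `Document` class here, which is an assumption for illustration only:

```python
from types import SimpleNamespace

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""


def format_docs(docs):
    # Join the text of each retrieved chunk, separated by a blank line.
    return "\n\n".join(d.page_content for d in docs)


# Stand-ins for retrieved documents (made-up content).
sample_docs = [
    SimpleNamespace(page_content="Think 2024 will be held in Boston, MA."),
    SimpleNamespace(page_content="Programming will be held at the BCEC."),
]

filled_prompt = template.format(
    context=format_docs(sample_docs),
    question="Where is Think 2024?",
)
print(filled_prompt)
```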

And now we can set up a chain with our context, our prompt and our LLM. The generative model processes the augmented context along with the user's question to produce a response.

In [21]:
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
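
If the `|` syntax is unfamiliar, it may help to think of each stage as a function whose output feeds the next one. The sketch below reproduces that flow with a tiny hypothetical `Step` class and a fake LLM that returns a canned string. It illustrates the data flow only and is not how LangChain implements its runnables:

```python
class Step:
    """Wraps a function so that steps can be chained with the | operator."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Run this step first, then pass its output to the next step.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.fn(value)


def fake_retrieve(question):
    # Stand-in for the retriever plus format_docs (made-up content).
    return "Think 2024 will be held in Boston, MA."


build_inputs = Step(lambda q: {"context": fake_retrieve(q), "question": q})
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt_text: "  Answer: Boston  ")  # canned reply, no model call
parse_output = Step(str.strip)

toy_chain = build_inputs | fill_prompt | fake_llm | parse_output
answer = toy_chain.invoke("Where is Think 2024?")
print(answer)
```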

And now we can ask multiple questions:

In [22]:
chain.invoke("Where is Think 2024?")

'\nAnswer: The Think 2024 event will be held at the Boston Convention & Exhibition Center (BCEC) in Boston, Massachusetts.'

In [23]:
chain.invoke("What is IBM Concert?")

'A) A system that helps users manage their personal finances'

In [24]:
chain.invoke("What is IBM Think 2024?")

"A: IBM's annual conference that brings together senior business and technology leaders from across industries for discussions and innovations related to AI, hybrid cloud, and automation."

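And that's it! Feel free to ask even more questions, and check the answers against the retrieved chunks. Notice that the response to "What is IBM Concert?" above does not match the source page, which describes IBM Concert as a generative AI-driven technology automation platform.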

You can imagine a situation where we can create chatbots to field these questions.

We encourage you to check out the [LangChain documentation page](https://python.langchain.com/v0.2/docs/tutorials/rag/) for more information and tutorials on RAG.