![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx, Elasticsearch, and LangChain to answer questions (RAG)

#### Disclaimers

- Use only Projects and Spaces that are available in watsonx context.

## Notebook content

This notebook contains the steps and code to demonstrate support for Retrieval Augmented Generation in watsonx.ai. It introduces commands for data retrieval, knowledge base building and querying, and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.10.

#### About Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.

In its simplest form, RAG requires 3 steps:

- Index knowledge base passages (once)
- Retrieve relevant passage(s) from knowledge base (for every user query)
- Generate a response by feeding retrieved passage into a large language model (for every user query)
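The three steps above can be sketched in a few lines of plain Python. This is only a toy illustration: it ranks passages by word overlap rather than by dense embeddings, and `fake_llm` is a hypothetical stand-in for a real model call. None of the function names below come from watsonx.ai, Elasticsearch, or LangChain.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(passages):
    """Step 1 (once): store each passage together with its tokens."""
    return [(passage, tokenize(passage)) for passage in passages]

def retrieve(index, query, k=1):
    """Step 2 (per query): rank passages by Jaccard word overlap."""
    query_tokens = tokenize(query)

    def score(entry):
        _, tokens = entry
        union = tokens | query_tokens
        return len(tokens & query_tokens) / len(union) if union else 0.0

    ranked = sorted(index, key=score, reverse=True)
    return [passage for passage, _ in ranked[:k]]

def fake_llm(prompt):
    """Hypothetical stand-in for a model call: reports the prompt size."""
    return f"<answer generated from a {len(prompt)}-character prompt>"

def answer(index, query):
    """Step 3 (per query): feed the retrieved passage to the model."""
    context = "\n".join(retrieve(index, query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return fake_llm(prompt)

passages = [
    "Bucharest became the capital of Romania in 1862.",
    "Carbon fixation in C4 plants occurs in the mesophyll cells.",
]
index = build_index(passages)
top_passage = retrieve(index, "when did bucharest become the capital of romania?")[0]
print(top_passage)
```

The rest of this notebook replaces each toy piece with a real component: an embedding model and Elasticsearch for steps 1 and 2, and a watsonx.ai foundation model for step 3.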

## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Data (test) loading](#data)
- [Foundation Models on watsonx](#models)
- [Basic information on how to connect to Elasticsearch](#elastic_conn)
- **[Set up ElasticsearchStore (Langchain)](#elasticsearchstore)**
    - [Embed and index documents with Elasticsearch](#elasticsearchstore_index)
    - [Generate a retrieval-augmented response to a question](#predict)



<a id="setup"></a>
##  Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a <a href="https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/" target="_blank" rel="noopener no referrer">Watson Machine Learning (WML) Service</a> instance (a free plan is offered and information about how to create the instance can be found <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-service-instance.html?context=analytics" target="_blank" rel="noopener no referrer">here</a>).


### Install and import dependencies

In [None]:
!pip install langchain | tail -n 1
!pip install elasticsearch | tail -n 1
!pip install sentence_transformers | tail -n 1
!pip install pandas | tail -n 1
!pip install rouge_score | tail -n 1
!pip install nltk | tail -n 1
!pip install wget | tail -n 1
!pip install "pydantic==1.10.0" | tail -n 1
!pip install "ibm-watson-machine-learning>=1.0.327" | tail -n 1

In [2]:
import os, getpass
import pandas as pd
from typing import Optional, Any, Iterable, List

### watsonx API connection
This cell defines the credentials required to work with watsonx API for Foundation
Model inferencing.

**Action:** Provide the IBM Cloud user API key. For details, see
[documentation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).

In [3]:
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your WML api key (hit enter): ")
}

### Defining the project id
The API requires a project id that provides the context for the call. We will try to obtain the id from the project in which this notebook runs (through the `PROJECT_ID` environment variable). Otherwise, please provide the project id when prompted.

**Hint**: You can find the `project_id` as follows. Open the prompt lab in watsonx.ai. At the very top of the UI, there will be `Projects / <project name> /`. Click on the `<project name>` link. Then get the `project_id` from Project's Manage tab (Project -> Manage -> General -> Details).


In [4]:
try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

<a id="data"></a>
## Data (test) loading

Download the test and train question datasets. The test dataset is used to calculate the metrics score for the selected model, prompts, and parameters.

In [5]:
import wget

questions_test_filename = 'questions_test.csv'
questions_train_filename = 'questions_train.csv'
questions_test_url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/RAG/questions_test.csv'
questions_train_url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/RAG/questions_train.csv'


if not os.path.isfile(questions_test_filename): 
    wget.download(questions_test_url, out=questions_test_filename)


if not os.path.isfile(questions_train_filename): 
    wget.download(questions_train_url, out=questions_train_filename)

In [6]:
filename_test = './questions_test.csv'
filename_train =  './questions_train.csv'

test_data = pd.read_csv(filename_test)
train_data = pd.read_csv(filename_train)

Inspect data sample

In [7]:
train_data.head()

| Unnamed: 0 | qid | question | answers |
|---|---|---|---|
| 0 | 1961 | where does diffusion occur in the excretory sy... | diffusion |
| 1 | 7528 | when did the us join world war one | April 6 , 1917 |
| 2 | 8685 | who played wilma in the movie the flintstones | Elizabeth Perkins |
| 3 | 6716 | when was the office of the vice president created | 1787 |
| 4 | 2916 | where does carbon fixation occur in c4 plants | in the mesophyll cells |


### Build up knowledge base

The current state-of-the-art in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.

We can generate dense vector representations using embedding models. In this notebook, we use [SentenceTransformers](https://www.google.com/search?client=safari&rls=en&q=sentencetransformers&ie=UTF-8&oe=UTF-8) [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) to embed both the knowledge base passages and user queries. `all-MiniLM-L6-v2` is a performant open-source model that is small enough to run locally.
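To make "semantic similarity between dense vectors" concrete, here is a minimal sketch of cosine similarity, one common way to compare embeddings. The three-dimensional vectors are made up for illustration (real `all-MiniLM-L6-v2` embeddings have 384 dimensions), and the names `query_vec` and `passage_vecs` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" standing in for real model output.
query_vec = [0.9, 0.1, 0.0]
passage_vecs = {
    "passage_a": [0.8, 0.2, 0.1],
    "passage_b": [0.0, 0.1, 0.9],
}

# Score every passage against the query and keep the closest one.
scores = {name: cosine_similarity(query_vec, vec) for name, vec in passage_vecs.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Vectors that point in nearly the same direction score close to 1, while unrelated (orthogonal) vectors score close to 0.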

A vector database is optimized for dense vector indexing and retrieval. This notebook uses [Elasticsearch](https://python.langchain.com/docs/integrations/vectorstores/elasticsearch#basic-example), a distributed, RESTful search and analytics engine capable of performing both vector and lexical search. It is built on top of the Apache Lucene library, which offers good speed and performance with the `all-MiniLM-L6-v2` embedding model.

The dataset we are using is already split into self-contained passages that can be ingested by Elasticsearch. 

The size of each passage is limited by the embedding model's context window (which is 256 tokens for `all-MiniLM-L6-v2`).
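The dataset here is already split, but for your own documents you would typically need to split long texts before embedding them. The following is a minimal sketch, assuming that whitespace-separated words are an acceptable rough proxy for model tokens (they are not identical) and that the hypothetical defaults `max_words=150` and `overlap=20` leave enough headroom under the 256-token limit.

```python
def split_into_chunks(text, max_words=150, overlap=20):
    """Split text into overlapping windows of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    chunks = []
    step = max_words - overlap  # how far the window advances each time
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks

# A synthetic 400-word text to show the resulting chunk sizes.
long_text = " ".join(f"word{i}" for i in range(400))
chunks = split_into_chunks(long_text)
print(len(chunks), [len(chunk.split()) for chunk in chunks])
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks. For exact limits, count tokens with the embedding model's own tokenizer instead of counting words.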

### Load knowledge base documents

Load the set of documents that will be used to build the knowledge base.

In [8]:
knowledge_base_dir = "./knowledge_base"

In [9]:
# Create the knowledge base directory if it does not exist yet
os.makedirs(knowledge_base_dir, exist_ok=True)

In [10]:
documents_filename = 'knowledge_base/psgs.tsv'
documents_url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/RAG/psgs.tsv'


if not os.path.isfile(documents_filename): 
    wget.download(documents_url, out=documents_filename)

In [51]:
documents = pd.read_csv(f"{knowledge_base_dir}/psgs.tsv", sep='\t', header=0)
documents['indextext'] = documents['title'].astype(str) + "\n" + documents['text']
documents = documents[:1000]

### Create an embedding function

Note that you can feed a custom embedding function to be used by Elasticsearch. The performance of Elasticsearch may differ depending on the embedding model used.

In [None]:
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.embeddings.base import Embeddings

emb_func = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

<a id="models"></a>
## Foundation Models on watsonx

### Defining model
You need to specify the `model_id` that will be used for inferencing:

In [13]:
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

model_id = ModelTypes.FLAN_UL2

### Defining the model parameters
We need to provide a set of model parameters that will influence the result:

In [14]:
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 50
}
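With greedy decoding, the most probable next token is chosen at every step, so the same prompt always yields the same output, and the minimum and maximum token settings bound the length of the generated text. The sketch below illustrates that idea on a tiny hard-coded probability table. The table `NEXT_TOKEN_PROBS` and the function `greedy_decode` are hypothetical and say nothing about how watsonx.ai implements decoding internally.

```python
# Hypothetical next-token probabilities keyed by the previous token.
NEXT_TOKEN_PROBS = {
    "<start>": {"April": 0.6, "In": 0.4},
    "April": {"6": 0.7, "<eos>": 0.3},
    "In": {"1917": 0.9, "<eos>": 0.1},
    "6": {",": 0.6, "<eos>": 0.4},
    ",": {"1917": 0.8, "<eos>": 0.2},
    "1917": {"<eos>": 1.0},
}

def greedy_decode(min_new_tokens=1, max_new_tokens=50):
    """Pick the most probable next token until <eos> or the token limit."""
    tokens = []
    previous = "<start>"
    while len(tokens) < max_new_tokens:
        candidates = dict(NEXT_TOKEN_PROBS[previous])
        if len(tokens) < min_new_tokens:
            # Do not allow the sequence to end before the minimum length.
            candidates.pop("<eos>", None)
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        if best == "<eos>":
            break
        tokens.append(best)
        previous = best
    return tokens

print(greedy_decode())
print(greedy_decode(max_new_tokens=2))
```

A small maximum keeps answers short, which suits factoid question answering but would cut off longer explanations.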

### Initialize the `Model` class.

In [15]:
from ibm_watson_machine_learning.foundation_models import Model

model = Model(
    model_id=model_id,
    params=parameters,
    credentials=credentials,
    project_id=project_id
)

<a id="elastic_conn"></a>
## Basic information on how to connect to Elasticsearch (applies to both scenarios)

**This notebook focuses on self-managed cluster using <a href="https://cloud.ibm.com/docs/databases-for-elasticsearch?topic=databases-for-elasticsearch-getting-started" target="_blank" rel="noopener no referrer">IBM Cloud® Databases for Elasticsearch.</a>**

By default Elasticsearch will start with security features like authentication and TLS enabled. To connect to the Elasticsearch cluster you’ll need to configure the Python Elasticsearch client to use HTTPS with the generated CA certificate in order to make requests successfully. Details can be found <a href="https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new" target="_blank" rel="noopener no referrer">here</a>. In this notebook certificate fingerprints will be used for authentication. 

**Verifying HTTPS with certificate fingerprints (Python 3.10 or later)** If you don’t have access to the generated CA file from Elasticsearch you can use the following script to output the root CA fingerprint of the Elasticsearch instance with openssl s_client <a href="https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#_verifying_https_with_certificate_fingerprints_python_3_10_or_later" target="_blank" rel="noopener no referrer"> (docs)</a>:

Replace `hostname` and `9200` with the host and port values of your cluster.
    
    openssl s_client -connect hostname:9200 -showcerts </dev/null 2>/dev/null | openssl x509 -fingerprint -sha256 -noout -in /dev/stdin
    
The output of openssl x509 will look something like this:

SHA256 Fingerprint=A5:2D:D9:35:11:E8:C6:04:5E:21:F1:66:54:B7:7C:9E:E0:F3:4A:EA:26:D9:F4:03:20:B5:31:C4:74:67:62:28

Copy the value after `SHA256 Fingerprint=`; this is your `ssl_assert_fingerprint`.
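If you prefer to handle the fingerprint in Python rather than copying it by hand, the following sketch may help. It assumes you already have either the line printed by `openssl x509 -fingerprint` or the DER-encoded certificate bytes. The helper names `parse_openssl_fingerprint` and `sha256_fingerprint` are hypothetical and not part of the Elasticsearch client.

```python
import hashlib

def parse_openssl_fingerprint(output_line):
    """Extract the colon-separated hex digest from an openssl fingerprint line."""
    _, _, value = output_line.strip().partition("=")
    if not value:
        raise ValueError("expected a line such as 'SHA256 Fingerprint=AA:BB:...'")
    return value

def sha256_fingerprint(der_bytes):
    """Format the SHA-256 digest of DER-encoded bytes the way openssl prints it."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

# The sample output line shown above.
line = ("SHA256 Fingerprint=A5:2D:D9:35:11:E8:C6:04:5E:21:F1:66:54:B7:7C:9E:"
        "E0:F3:4A:EA:26:D9:F4:03:20:B5:31:C4:74:67:62:28")
ssl_assert_fingerprint = parse_openssl_fingerprint(line)
print(ssl_assert_fingerprint)
```

One possible source of the DER bytes is the standard library: `ssl.PEM_cert_to_DER_cert(ssl.get_server_certificate((host, port)))`. Note that this fetches the certificate the server presents, so verify the result against a fingerprint obtained through a trusted channel before relying on it.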

<a id="elasticsearchstore"></a>
## Set up ElasticsearchStore (Langchain)

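Upserting a document means inserting it if it does not exist yet and updating it if it already exists in the database. Without upsert semantics, re-inserting an existing document throws an error. Upserting is therefore useful for experimentation, because the same cell can be re-run safely.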
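The difference between a strict insert and an upsert can be shown with a dictionary-backed stand-in for an index. `TinyStore` below is purely hypothetical and only illustrates the semantics; it is not how Elasticsearch or `ElasticsearchStore` stores documents.

```python
class TinyStore:
    """Dictionary-backed stand-in for a document index (illustration only)."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, text):
        """Strict insert: refuse to overwrite an existing ID."""
        if doc_id in self._docs:
            raise KeyError(f"document {doc_id!r} already exists")
        self._docs[doc_id] = text

    def upsert(self, doc_id, text):
        """Insert the document, or replace it if the ID is already present."""
        self._docs[doc_id] = text

    def get(self, doc_id):
        """Return the stored text for an ID."""
        return self._docs[doc_id]

store = TinyStore()
store.upsert("1", "first version")
store.upsert("1", "second version")  # re-running the same step is harmless
print(store.get("1"))
```

Stable document IDs are what make this work: the same ID always addresses the same slot, so re-indexing replaces a document instead of duplicating it.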

In [46]:
from langchain.vectorstores.elasticsearch import ElasticsearchStore
from elasticsearch import Elasticsearch

class ElasticWrapper:
    
    def __init__(
            self,
            url: str,
            es_user: str,
            es_password: str,
            index_name: str,
            embedding_function: Embeddings,
            ssl_assert_fingerprint: str,
            name: Optional[str] = "watsonx_rag_collection"
    ):
        self._embedding_function = embedding_function
        self._name = name
        self._index_name = index_name
        print("Connecting to Elasticsearch...")
        es_connection = Elasticsearch([url], basic_auth=(es_user, es_password), request_timeout=None, 
                                    ssl_assert_fingerprint=ssl_assert_fingerprint)
        
        self._db = ElasticsearchStore(
            es_connection=es_connection,
            index_name=self._index_name,
            embedding=embedding_function
        )
        print("Connected")
        

    def upsert_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        ids: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> List[str]:
        """Run more texts through the embeddings and add to the vectorstore.
        Args:
            :param texts (Iterable[str]): Texts to add to the vectorstore.
            :param metadatas (Optional[List[dict]], optional): Optional list of metadatas.
            :param ids (Optional[List[str]], optional): Optional list of IDs.
        Returns:
            List[str]: List of IDs of the added texts.
        """
        # Adding metadata to documents
        print("Uploading texts...")
        ids = self._db.add_texts(
            texts, metadatas=metadatas, index_name=self._index_name, ids=ids 
        )
        print("Uploading completed")
        return ids
   

    def query(self, query_texts: str, n_results: int = 5):
        """
        Returns the documents closest to the question, with their similarity scores
        :param query_texts: the question
        :param n_results: number of results to return
        :return: list of (document, score) pairs closest to the given question
        """
        return self._db.similarity_search_with_score(query=query_texts, k=n_results)
    
    def get_retriver(self):
        """
        Returns the vector store wrapped as a LangChain retriever
        :return: retriever that can be passed to a chain such as RetrievalQA
        """
        return self._db.as_retriever()

In [47]:
elasticsearch_store = ElasticWrapper(
    name="nq910_minilm6v2",
    url="<ENTER ELASTICSEARCH INSTANCE URL>",
    es_user="<ENTER ELASTICSEARCH USER>",
    es_password="<ENTER ELASTICSEARCH INSTANCE PASSWORD>",
    index_name="test_index",
    ssl_assert_fingerprint="<ENTER CERTIFICATE FINGERPRINT>",
    embedding_function=emb_func  # you can have something here using /embed endpoint
)

Connecting to Elasticsearch...
Connected


<a id="elasticsearchstore_index"></a>
### Embed and index documents with Elasticsearch

**Note: Could take several minutes if you don't have pre-built indices**

In [18]:
_ = elasticsearch_store.upsert_texts(
    texts=documents.indextext.tolist(),
    # tokenization, embedding, and indexing are handled automatically; you can skip that and add your own embeddings as well
    # note: the keyword must be `metadatas` (plural) to match upsert_texts, otherwise the metadata is silently dropped
    metadatas=[{'title': title, 'id': doc_id}
                for (title, doc_id) in
                zip(documents.title, documents.id)],  # filter on these!
    ids=[str(i) for i in documents.id],  # unique for each doc
)

Uploading texts...
Uploading completed


<a id="predict"></a>
## Generate a retrieval-augmented response to a question

`RetrievalQA` is a chain for question answering over retrieved documents.

**Hint:** To use the Chain interface from LangChain with watsonx.ai models, call the `model.to_langchain()` method.

It returns a `WatsonxLLM` wrapper compatible with the LangChain CustomLLM specification.
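The chain in this notebook is created with `chain_type="stuff"`, which means all retrieved documents are placed ("stuffed") into a single prompt together with the question. The sketch below illustrates that idea in plain Python. The function `build_stuff_prompt` and its prompt wording are hypothetical and are not LangChain's actual template.

```python
def build_stuff_prompt(question, documents):
    """Concatenate all retrieved documents into a single prompt."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Two example passages standing in for retriever output.
retrieved = [
    "Super Bowl XLVII was played on February 3, 2013.",
    "The Baltimore Ravens defeated the San Francisco 49ers.",
]
prompt = build_stuff_prompt("who played in the super bowl in 2013?", retrieved)
print(prompt)
```

Because everything goes into one prompt, the number and size of retrieved passages must fit within the model's input limit.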

### Select questions

Get questions from the previously loaded test dataset.

In [49]:
questions_and_answers = [
            ('names of founding fathers of the united states?', "Thomas Jefferson::James Madison::John Jay::George Washington::John Adams::Benjamin Franklin::Alexander Hamilton"),
            ('who played in the super bowl in 2013?', 'Baltimore Ravens::San Francisco 49ers'),
            ('when did bucharest become the capital of romania?', '1862')
            ]

### Retrieve relevant context

Build a `RetrievalQA` chain that fetches paragraphs similar to the question and passes them to the model.

In [48]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(llm=model.to_langchain(), chain_type="stuff", retriever=elasticsearch_store.get_retriver(), return_source_documents=True)

In [50]:
results = []

for question, _ in questions_and_answers:
    result = qa({"query": question})
    results.append(result)

Print the answer and the retrieved source documents for each question.

In [45]:
for idx, result in enumerate(results):
    print("=========")
    print("Question = ", result['query'])
    print("Answer = ", result['result'])
    print("Expected Answer(s) (may not appear with exact wording in the dataset) = ", questions_and_answers[idx][1])
    print("\n")
    print("Source documents:")
    print(*(x.page_content for x in result['source_documents']), sep='\n')
    print("\n")
    

Question =  names of founding fathers of the united states?
Answer =  John Adams , Benjamin Franklin , Alexander Hamilton , John Jay , Thomas Jefferson , James Madison , and George Washington
Expected Answer(s) (may not appear with exact wording in the dataset) =  Thomas Jefferson::James Madison::John Jay::George Washington::John Adams::Benjamin Franklin::Alexander Hamilton


Source documents:
Founding Fathers of the United States
^ Burstein , Andrew . `` Politics and Personalities : Garry Wills takes a new look at a forgotten founder , slavery and the shaping of America '' , Chicago Tribune ( November 09 , 2003 ) : `` Forgotten founders such as Pickering and Morris made as many waves as those whose faces stare out from our currency . '' ^ Jump up to : Rafael , Ray . The Complete Idiot 's Guide to the Founding Fathers : And the Birth of Our Nation ( Penguin , 2011 ) . Jump up ^ `` Founding Fathers : Virginia '' . FindLaw Constitutional Law Center . 2008 . Retrieved 2008 - 11 - 14 . 
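To compare a generated answer with the expected answers programmatically, one simple option is to check how many of the expected strings occur in the generated text. The sketch below assumes that `::` separates individual acceptable answer strings, as in the `questions_and_answers` list above. The functions `normalize` and `expected_answer_recall` are hypothetical helpers, and this is a plain substring check, not ROUGE or any official metric.

```python
import re

def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())

def expected_answer_recall(generated, expected):
    """Fraction of `::`-separated expected answers found in the generated text."""
    targets = [normalize(item) for item in expected.split("::") if item.strip()]
    if not targets:
        return 0.0
    haystack = normalize(generated)
    hits = sum(1 for target in targets if target in haystack)
    return hits / len(targets)

# The generated and expected answers for the first question above.
generated = ("John Adams , Benjamin Franklin , Alexander Hamilton , John Jay , "
             "Thomas Jefferson , James Madison , and George Washington")
expected = ("Thomas Jefferson::James Madison::John Jay::George Washington::"
            "John Adams::Benjamin Franklin::Alexander Hamilton")
print(expected_answer_recall(generated, expected))
```

Substring matching is deliberately naive: it ignores paraphrases and can match inside longer words or numbers, so treat the score as a rough sanity check.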

---

Copyright © 2023 IBM. This notebook and its source code are released under the terms of the MIT License.