![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx and the Elasticsearch Python SDK to answer questions (RAG)

### Disclaimers

- Use only Projects and Spaces that are available in watsonx context.

# Notebook content

This notebook contains the steps and code to demonstrate support of Retrieval Augmented Generation in watsonx.ai. It introduces commands for data retrieval, knowledge base building and querying, and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.10.

### About Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.

In its simplest form, RAG requires 3 steps:

- Index knowledge base passages (once)
- Retrieve relevant passage(s) from knowledge base (for every user query)
- Generate a response by feeding retrieved passage into a large language model (for every user query)
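
The three steps above can be sketched in a few lines of plain Python. This is only a toy illustration: the bag-of-words `embed` function stands in for a real embedding model, the two passages are made up for the example, and the final step only builds the prompt instead of calling a language model. All function names here are hypothetical and are not part of the notebook's code.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Step 1: index the knowledge base passages (once).
passages = [
    "Bucharest became the capital of Romania in 1862.",
    "The Baltimore Ravens won the Super Bowl played in 2013.",
]
index = [(passage, embed(passage)) for passage in passages]


def retrieve(question, k=1):
    """Step 2: return the k passages most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


def build_prompt(question, k=1):
    """Step 3 (prompt only): combine retrieved passages with the question."""
    context = "\n".join(retrieve(question, k))
    return f"{context}\n\n{question}"


top_passage = retrieve("when did bucharest become the capital of romania?")[0]
```

In the rest of the notebook, the toy pieces are replaced by a sentence embedding model, an Elasticsearch index, and a watsonx.ai foundation model.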

# Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Data (test) loading](#data)
- [Foundation Models on watsonx](#models)
- [Basic information on how to connect to Elasticsearch](#elastic_conn)
- **[Retrieval augmented generation using Elasticsearch (Python SDK)](#elastic)**
    - [Create index](#mapping)
    - [Index data into Elasticsearch](#index_data)
    - [Run semantic knn search queries](#knn)
    - [Calculate rougeL metric](#score)
    



<a id="setup"></a>
# Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

- Create a <a href="https://cloud.ibm.com/catalog/services/watson-machine-learning" target="_blank" rel="noopener noreferrer">Watson Machine Learning (WML) Service</a> instance (a free plan is offered and information about how to create the instance can be found <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/wml-plans.html?context=wx&audience=wdp" target="_blank" rel="noopener noreferrer">here</a>).


## Install and import dependencies

In [None]:
!pip install "langchain==0.0.340" | tail -n 1
!pip install elasticsearch | tail -n 1
!pip install sentence_transformers | tail -n 1
!pip install pandas | tail -n 1
!pip install rouge_score | tail -n 1
!pip install nltk | tail -n 1
!pip install wget | tail -n 1
!pip install evaluate | tail -n 1
!pip install "pydantic==1.10.0" | tail -n 1
!pip install "ibm-watsonx-ai>=0.1.2" | tail -n 1
!pip install python-dotenv | tail -n 1

In [None]:
import os
import getpass
from dotenv import load_dotenv
import pandas as pd

## watsonx API connection
This cell defines the credentials required to work with the watsonx API for foundation model inferencing.

**Action:** Provide the IBM Cloud user API key. For details, see the <a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui" target="_blank" rel="noopener noreferrer">documentation</a>.

In [None]:
# Load variables from a local .env file, if one exists, into the environment.
load_dotenv()

api_key = os.getenv("API_KEY")
if api_key is None:
    # os.getenv() returns None instead of raising, so prompt explicitly.
    api_key = getpass.getpass("Please enter your api key (hit enter): ")

ibm_cloud_url = "https://us-south.ml.cloud.ibm.com"
project_id = os.getenv("PROJECT_ID")

if not api_key or project_id is None:
    raise Exception("Ensure you copied the .env file that you created earlier into the same directory as this notebook")

creds = {
    "url": ibm_cloud_url,
    "apikey": api_key
}
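
Because several later cells read their settings from environment variables, it can help to check them all up front. The sketch below is a hypothetical helper, not part of the original notebook. It assumes the variable names used in this notebook (`API_KEY`, `PROJECT_ID`, `ES_URL`, `ES_API_KEY`) and reports which of them are unset or empty.

```python
import os

# Names taken from the os.getenv() calls in this notebook.
REQUIRED_SETTINGS = ("API_KEY", "PROJECT_ID", "ES_URL", "ES_API_KEY")


def missing_settings(env=os.environ, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are unset or empty."""
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_missing = missing_settings({"API_KEY": "x", "PROJECT_ID": "", "ES_URL": "u"})
```

Calling `missing_settings()` with no arguments checks the real process environment after `load_dotenv()` has run.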

<a id="data"></a>
# Data (test) loading

Download the train and test datasets. The test dataset is used to calculate the metric scores for the selected model, prompts, and parameters.

In [None]:
import wget

questions_test_filename = 'questions_test.csv'
questions_train_filename = 'questions_train.csv'
questions_test_url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/RAG/questions_test.csv'
questions_train_url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/RAG/questions_train.csv'


if not os.path.isfile(questions_test_filename): 
    wget.download(questions_test_url, out=questions_test_filename)


if not os.path.isfile(questions_train_filename): 
    wget.download(questions_train_url, out=questions_train_filename)

In [None]:
filename_test = './questions_test.csv'
filename_train =  './questions_train.csv'

test_data = pd.read_csv(filename_test)
train_data = pd.read_csv(filename_train)

Inspect a data sample.

In [None]:
train_data.head()

## Build up knowledge base

The current state-of-the-art in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.

We can generate dense vector representations using embedding models. In this notebook, we use <a href="https://www.sbert.net/" target="_blank" rel="noopener noreferrer">SentenceTransformers</a> <a href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2" target="_blank" rel="noopener noreferrer">all-MiniLM-L6-v2</a> to embed both the knowledge base passages and user queries. `all-MiniLM-L6-v2` is a performant open-source model that is small enough to run locally.

A vector database is optimized for dense vector indexing and retrieval. This notebook uses <a href="https://python.langchain.com/docs/integrations/vectorstores/elasticsearch#basic-example" target="_blank" rel="noopener noreferrer">Elasticsearch</a>, a distributed, RESTful search and analytics engine, capable of performing both vector and lexical search. It is built on top of the Apache Lucene library, which offers good speed and performance with the `all-MiniLM-L6-v2` embedding model.

The dataset we are using is already split into self-contained passages that can be ingested by Elasticsearch. 

The size of each passage is limited by the embedding model's context window (which is 256 tokens for `all-MiniLM-L6-v2`).
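
The dataset used here is already split, so no chunking is needed in this notebook. For raw documents, a splitting step along the following lines could be applied first. This is a minimal sketch under stated assumptions: it counts whitespace-separated words, which only approximates the model's tokens, and the `max_words` and `overlap` defaults are arbitrary illustrative values, not recommendations from the notebook.

```python
def split_into_passages(text, max_words=200, overlap=20):
    """Split text into overlapping word windows (a rough proxy for a token limit)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    passages = []
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return passages


chunks = split_into_passages("word " * 450, max_words=200, overlap=20)
```

Keeping `max_words` below the 256-token window leaves some headroom, because a single word can map to more than one token.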

## Load knowledge base documents

Load the set of documents that will be used to build the knowledge base.

In [None]:
knowledge_base_dir = "./knowledge_base"

In [None]:
# Create the knowledge base directory if it does not exist yet.
os.makedirs(knowledge_base_dir, exist_ok=True)

In [None]:
documents_filename = 'knowledge_base/psgs.tsv'
documents_url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/RAG/psgs.tsv'


if not os.path.isfile(documents_filename): 
    wget.download(documents_url, out=documents_filename)

In [None]:
documents = pd.read_csv(f"{knowledge_base_dir}/psgs.tsv", sep='\t', header=0)
documents['indextext'] = documents['title'].astype(str) + "\n" + documents['text']
documents = documents[:1000]

## Create an embedding function

Note that you can feed a custom embedding function to be used by Elasticsearch. The performance of Elasticsearch may differ depending on the embedding model used.

In [None]:
from langchain.embeddings import SentenceTransformerEmbeddings

emb_func = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

<a id="models"></a>
# Foundation Models on watsonx

## Defining model
You need to specify `model_id` that will be used for inferencing:

In [None]:
# List available models
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes
print([model.name for model in ModelTypes])

In [None]:
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

model_id = ModelTypes.MIXTRAL_8X7B_INSTRUCT_V01_Q

## Defining the model parameters
We need to provide a set of model parameters that will influence the result:

In [None]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 50
}

## Initialize the `ModelInference` class.

In [None]:
from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(
    model_id=model_id,
    params=parameters,
    credentials=creds,
    project_id=project_id
)

<a id="elastic_conn"></a>
# Basic information on how to connect to Elasticsearch

**This notebook focuses on a self-managed cluster using <a href="https://cloud.ibm.com/docs/databases-for-elasticsearch?topic=databases-for-elasticsearch-getting-started" target="_blank" rel="noopener noreferrer">IBM Cloud® Databases for Elasticsearch.</a>**

The following cell reads the Elasticsearch URL and API key from the `ES_URL` and `ES_API_KEY` environment variables.

Connect to Elasticsearch

In [None]:
from elasticsearch import Elasticsearch

elastic_client = Elasticsearch(
    hosts=[os.getenv('ES_URL')],
    # Certificate verification is disabled here; enable it outside of a demo setup.
    verify_certs=False,
    api_key=os.getenv('ES_API_KEY'),
    request_timeout=None
)

elastic_client.health_report()

<a id="elastic"></a>
# Retrieval augmented generation using Elasticsearch (Python SDK)

In this scenario the same embedding function `all-MiniLM-L6-v2` will be used.

In [None]:
dims = emb_func.client.get_sentence_embedding_dimension()
dims

<a id="mapping"></a>
## Create index
Before data can be indexed, the Elasticsearch index must be created with the necessary mappings.

`dense_vector` is a special field type for storing dense vectors; here it is used for the `embedding` field.
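
The mapping in the next cell uses the `l2_norm` similarity. For that setting, the Elasticsearch documentation gives the hit score as `1 / (1 + d^2)`, where `d` is the Euclidean distance between the query vector and the stored vector. The sketch below reproduces that formula in plain Python to show how distance relates to score; the function names are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l2_norm_score(query, vector):
    """Score as documented for the `l2_norm` similarity: 1 / (1 + d^2)."""
    d = l2_distance(query, vector)
    return 1.0 / (1.0 + d ** 2)


identical = l2_norm_score([1.0, 0.0], [1.0, 0.0])  # distance 0
farther = l2_norm_score([1.0, 0.0], [0.0, 1.0])    # distance sqrt(2)
```

A higher score therefore means a smaller distance, and an exact match scores 1.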

In [None]:
import uuid

index_name = "elastic_knn_index"
index_name = f"{index_name}_{str(uuid.uuid4())[:5]}"

mapping = {
    "properties": {
        "text": {
            "type": "text"
        },
        "embedding": {
            "type": "dense_vector",
            "dims": dims,
            "index": True,
            "similarity": "l2_norm"
        }
    }
}

In [None]:
if elastic_client.indices.exists(index=index_name):
    elastic_client.indices.delete(index=index_name)
    
elastic_client.indices.create(index=index_name, mappings=mapping)

<a id="index_data"></a>
## Index data into Elasticsearch

The following cells embed the passages and build the bulk actions that are passed to Elasticsearch's Bulk API, so that multiple documents are indexed efficiently. To perform semantic search, queries must be encoded with the same embedding model that was used to encode the documents at index time.

In [None]:
texts = documents.indextext.tolist()
embedded_docs = emb_func.embed_documents(texts)

In [None]:
from elasticsearch.helpers import bulk

document_list = []
batch_size = 500
for i, (text, vector) in enumerate(zip(texts, embedded_docs)):
    document = {"_id": i, "embedding": vector, "text": text}
    document_list.append(document)
    # Send a full batch to the Bulk API.
    if len(document_list) == batch_size:
        success, failed = bulk(elastic_client, document_list, index=index_name)
        document_list = []

# Index any remaining documents that did not fill a complete batch.
if document_list:
    success, failed = bulk(elastic_client, document_list, index=index_name)

elastic_client.indices.refresh(index=index_name)
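
The batching logic can also be separated from the indexing call. The following sketch uses two hypothetical helpers, `make_actions` and `batched`, to show the idea on a tiny in-memory example; it makes no Elasticsearch calls and only demonstrates that a trailing partial batch is not lost.

```python
from itertools import islice


def make_actions(texts, vectors):
    """Yield one bulk action (a plain dict) per passage and its embedding."""
    for i, (text, vector) in enumerate(zip(texts, vectors)):
        yield {"_id": i, "text": text, "embedding": vector}


def batched(items, batch_size):
    """Yield lists of at most batch_size items, including the final partial one."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


sample_actions = make_actions(["a", "b", "c"], [[0.1], [0.2], [0.3]])
batch_sizes = [len(batch) for batch in batched(sample_actions, 2)]
```

Each yielded batch could then be passed to `bulk(elastic_client, batch, index=index_name)` in the same way as in the cell above.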

## Select questions

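Select a few questions, together with their expected answers, from the previously loaded test dataset. Multiple expected answers for one question are separated by `::`.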

In [None]:
questions_and_answers = [
            ('names of founding fathers of the united states?', "Thomas Jefferson::James Madison::John Jay::George Washington::John Adams::Benjamin Franklin::Alexander Hamilton"),
            ('who played in the super bowl in 2013?', 'Baltimore Ravens::San Francisco 49ers'),
            ('when did bucharest become the capital of romania?', '1862')
            ]

<a id="knn"></a>
# Run semantic search queries

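Now it's time to run queries against our Elasticsearch index using our encoded questions. We'll be doing a k-nearest neighbors search, using the Elasticsearch kNN query option. The `k` argument is the number of nearest neighbors to return as top hits, and `num_candidates` is the number of candidates considered on each shard. A minimum similarity threshold can optionally be set through the kNN `similarity` parameter; the query below does not set one.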

In [None]:
relevant_contexts = []

for question_text, _ in questions_and_answers:
    embedded_question = emb_func.embed_query(question_text)
    relevant_chunks = elastic_client.search(
          index=index_name,
          knn={
            "field": "embedding",
            "query_vector": embedded_question,
            "k": 4,
            "num_candidates": 50,
            },
          _source=[
                    "text"
                  ]                       
    )
    relevant_contexts.append(relevant_chunks)

In [None]:
relevant_context = relevant_contexts[0]
hits = relevant_context['hits']['hits']
for hit in hits:
    print("=========")
    print("Paragraph index : ", hit["_id"])
    print("Paragraph : ", hit["_source"]['text'])
    # `_score` is a similarity score (higher is better), not a raw distance.
    print("Score : ", hit["_score"])
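
To make the role of `k` concrete, here is an exact brute-force nearest-neighbor search in plain Python. It is only an illustration with made-up two-dimensional vectors: Elasticsearch performs an approximate search over an index structure instead of comparing the query with every vector, and the function name is hypothetical.

```python
import heapq
import math


def knn_brute_force(query, vectors, k):
    """Return (index, distance) pairs for the k vectors closest to the query."""
    distances = ((i, math.dist(query, vector)) for i, vector in enumerate(vectors))
    return heapq.nsmallest(k, distances, key=lambda pair: pair[1])


vectors = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0]]
neighbors = knn_brute_force([0.0, 0.1], vectors, k=2)
nearest_ids = [i for i, _ in neighbors]
```

Here `nearest_ids` holds the positions of the two closest vectors, ordered from nearest to farthest.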
            

## Feed the context and the questions to `watsonx.ai` model.

In [None]:
def make_prompt(context, question_text):
    return (f"Please answer the following.\n"
          + f"{context}:\n\n"
          + f"{question_text}")

In [None]:
prompt_texts = []

for relevant_context, (question_text, _) in zip(relevant_contexts, questions_and_answers):
    hits = [hit for hit in relevant_context["hits"]["hits"]]
    context = "\n\n\n".join([rel_ctx["_source"]['text'] for rel_ctx in hits])
    prompt_text = make_prompt(context, question_text)
    prompt_texts.append(prompt_text)

In [None]:
print(prompt_texts[0])

## Generate a retrieval-augmented response with watsonx.ai model

In [None]:
results = []

for prompt_text in prompt_texts:
    results.append(model.generate_text(prompt_text))

In [None]:
for idx, result in enumerate(results):
    print('*******************************************')
    print("Question = ", questions_and_answers[idx][0])
    print("Answer = ", result)
    print(">>>> Expected Answer(s) (may not appear with exact wording in the dataset) = ",  questions_and_answers[idx][1])
    print("\n")

<a id="score"></a>
# Calculate rougeL metric 
This sample notebook uses the `evaluate` module from Hugging Face to calculate the rougeL metric.

In [None]:
from evaluate import load

rouge = load('rouge')
scores = rouge.compute(predictions=results, references=[answer for _, answer in questions_and_answers])
print(scores)
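
For intuition about what rougeL measures, the sketch below computes an F1 score from the longest common subsequence (LCS) of words shared by a prediction and a reference. It is a simplified illustration, assuming lowercase whitespace tokenization; the `evaluate` implementation applies its own tokenization and options, so its numbers can differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """Simplified ROUGE-L: F1 of LCS-based precision and recall."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("bucharest became the capital in 1862", "1862")
```

In this example the recall is perfect but the precision is low, because the prediction contains many words that are not in the short reference. This is one reason why verbose but correct answers can receive modest rougeL scores.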

---

Copyright © 2023, 2024 IBM. This notebook and its source code are released under the terms of the MIT License.