
## Building a Conversational Chatbot with Custom Data

This notebook guides you through building a chatbot over your own data. We use Elastic Search as the vector store and the "meta-llama/llama-2-13b-chat" model, one of the Large Language Models (LLMs) available through WatsonX. The `langchain` library handles document chunking, indexing data in Elastic Search, managing conversation chains with memory buffers, and building prompt templates.

### Key Features:

- **PDF Content Processing**: When users upload PDF files, the notebook extracts the text, segments it into manageable chunks, and indexes these chunks in Elastic Search using an appropriate `elser` model.
- **Data-Driven Query Handling**: Users can pose questions to the chatbot, which searches the indexed data for relevant answers.
- **Integrating Elastic Search and WatsonX LLMs**: We leverage `langchain`'s capabilities to link Elastic Search indexing with WatsonX's LLMs, enabling a seamless conversational experience with memory and retrieval functionalities.
- **Hallucination Check**: The notebook compares each LLM answer with the retrieved source passages and replaces answers that are not sufficiently similar to any of them with a fallback message.

### Prerequisites for Running the Notebook:


1. **Library Requirements**: Confirm that you have installed all libraries specified in the `requirements.txt` file.
2. **Elastic Search ELSER Model Setup**: Implement an ELSER model within your Elastic Search instance. Refer to the Elastic documentation for setup details: [Elastic Machine Learning: ELSER](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html).
3. **Environment Configuration**: A `.env` file is required, containing critical configuration details:

   - **`elastic_search_url`**: The URL of your Elastic Search instance, a distributed search engine built on the Lucene library. It is your point of interaction for tasks like data indexing and querying. Note that the ingestion code in Step 3 connects with a Cloud ID read from `elastic_search_cloud_id`, so define that variable as well if you run the code as written.
   - **`elastic_search_api_key`**: A key for secure access to your Elastic Search instance, this API key is essential for authentication and authorization, ensuring that only permitted users and applications interact with your Elastic Search server.
   - **`WATSONX_APIKEY`**: Your access key for IBM's WatsonX services, this API key is used for authenticating requests to WatsonX's AI and cognitive computing services. Acquire your WatsonX.ai URL and API key by following these instructions: [IBM Cloud: API Keys](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui#create_user_key) and [WatsonX as a Service Documentation](https://www.ibm.com/docs/en/watsonx-as-a-service?topic=library-credentials).
   - **`WATSONX_URL`**: The primary access point for WatsonX's API, this URL is where you connect to utilize WatsonX's diverse AI functionalities.
   - **`WATSONX_Project_ID`**: A unique identifier for your project in the WatsonX environment. It scopes resources such as datasets and AI models to that project. To obtain a project ID, see [Creating a WatsonX Project](https://www.ibm.com/docs/en/watsonx-as-a-service?topic=projects-creating-project) and [Finding Your Project ID in IBM Cloud](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-project-id.html?context=wx).
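
A missing `.env` entry otherwise only surfaces as a `KeyError` deep inside a later cell. The following is a minimal sketch of an up-front check, assuming the four names that the code cells in this notebook read from the environment; `missing_env_vars` is a hypothetical helper, not part of any library.

```python
import os

# Names read via os.environ in the code cells of this notebook (assumption:
# adjust the list if you change how the notebook connects).
REQUIRED_VARS = [
    "elastic_search_cloud_id",
    "elastic_search_api_key",
    "WATSONX_APIKEY",
    "WATSONX_Project_ID",
]

def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Example with an explicit mapping instead of the real environment.
example_env = {"WATSONX_APIKEY": "abc", "WATSONX_Project_ID": ""}
print(missing_env_vars(REQUIRED_VARS, example_env))
```

Calling `missing_env_vars(REQUIRED_VARS)` right after `load_dotenv()` would report problems before any connection is attempted.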



The cell below imports the libraries required to run this notebook.

In [12]:
from dotenv import load_dotenv
import os
from langchain_community.vectorstores.elasticsearch import ElasticsearchStore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
import PyPDF2
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from langchain_community.llms import WatsonxLLM
from elasticsearch import Elasticsearch
from langchain.prompts import PromptTemplate
from sentence_transformers import SentenceTransformer, util
import panel as pn



### User Inputs 
You can update the inputs below to suit your needs.


`es_model_id`: This refers to the unique identifier of the Elastic Search (ES) model that's being used. Elastic Search models, like the ELSER model, are deployed within the Elastic Search environment to perform specific tasks such as text analysis, natural language processing, or vector search. The es_model_id helps in identifying and referencing the specific model deployed in your Elastic Search instance.

`index_name`: In the context of Elastic Search, an index_name denotes the name of the index where your data is stored. The index_name is used to specify which collection of documents you're querying or modifying in your Elastic Search operations. If the index is not already present, one with this name gets created during runtime.

`llm_model_id`: This is the identifier for the Large Language Model (LLM) from WatsonX that you're using. WatsonX provides various AI models, including LLMs for different tasks like conversation, text completion, or language translation. The llm_model_id allows you to specify which of these models you want to interact with in your application.

`wx_url`: This variable represents the URL for the WatsonX service. WatsonX, being a cloud-based service, can be accessed through its dedicated URL. This URL is used to make API requests, authenticate your application, and access the services provided by WatsonX, like their LLMs or other AI functionalities.

`wx_project_id`: The wx_project_id is a unique identifier for a project within the WatsonX ecosystem. In WatsonX, a project is a workspace where you can organize resources, data, models, and other assets. Each project has a unique ID which is used to access and manage the resources within that specific project. This ID ensures that your interactions with the WatsonX API are scoped and managed within the right project context.

In [13]:
load_dotenv()
es_model_id = '.elser_model_2_linux-x86_64'
index_name = "elser_index_vb_test_2"
llm_model_id = "meta-llama/llama-2-13b-chat" #"ibm/granite-13b-chat-v2"#
wx_url = "https://us-south.ml.cloud.ibm.com"
wx_project_id = os.environ["WATSONX_Project_ID"]

### Enter your PDF file name below


In [14]:
pdf_docs=["Industry accelerators - IBM Documentation.pdf"]


### Step 1: Prepare the documents above and their metadata
The `prepare_docs` function below processes a list of PDF documents by extracting the text of each page and organizing it into two lists: one with the page text and one with the metadata. For each page it forms a title from the PDF name and page number and stores it under the `_id` key of that page's metadata. The function returns both lists, so the content of multiple PDFs can be indexed and referenced at page level.

In [37]:

def prepare_docs(pdf_docs):
    """Extract per-page text and metadata from a list of PDF files."""
    content = []
    metadata = []

    for pdf in pdf_docs:
        pdf_reader = PyPDF2.PdfReader(pdf)
        for index, page in enumerate(pdf_reader.pages):
            # One entry per page; the title combines file name and page number
            content.append(page.extract_text())
            metadata.append({"_id": pdf + " page " + str(index + 1)})
    print("Content and metadata are extracted from the documents")
    return content, metadata
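
To see the shape of the two returned lists without a PDF at hand, here is a small sketch that applies the same title scheme to page text that has already been extracted. `build_page_records` and its dictionary input are hypothetical stand-ins for illustration; they do not call PyPDF2.

```python
def build_page_records(pages_by_pdf):
    """Mimic the return shape of prepare_docs for already-extracted page text.

    `pages_by_pdf` maps a file name to a list of page strings (illustrative input).
    """
    content, metadata = [], []
    for pdf_name, pages in pages_by_pdf.items():
        for index, page_text in enumerate(pages):
            content.append(page_text)
            # Same title scheme as above: "<file name> page <1-based number>"
            metadata.append({"_id": f"{pdf_name} page {index + 1}"})
    return content, metadata

content_demo, metadata_demo = build_page_records(
    {"guide.pdf": ["First page.", "Second page."]}
)
print(content_demo)
print(metadata_demo)
```

The two lists are index-aligned, which is what lets the chunking step attach the right metadata to each page.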



### Step 2: Chunk the documents 
The `get_text_chunks` function takes text content and metadata as inputs and splits the content into smaller chunks. It uses a `RecursiveCharacterTextSplitter` created with `from_tiktoken_encoder`, so the chunk size (512) and overlap (256) are measured in tokens as counted by `tiktoken`, not in characters. The function splits the content into passages while keeping the associated metadata, prints the total number of passages created, and returns the split documents. Smaller, overlapping passages are easier to index and retrieve than whole pages.

In [16]:
def get_text_chunks(content, metadata):
    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=512,
        chunk_overlap=256,
    )
    split_docs = text_splitter.create_documents(content, metadatas=metadata)
    print(f"Documents are split into {len(split_docs)} passages")
    return split_docs
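
The interaction of chunk size and overlap can be illustrated with a plain sliding window. This is a simplified sketch, not how `RecursiveCharacterTextSplitter` works internally (that class splits on separators and merges the pieces); `chunk_with_overlap` is a hypothetical helper, and whitespace-separated words stand in for tokens.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split `tokens` into windows of `chunk_size` that share `chunk_overlap` items."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks

words = "a b c d e f g h i j".split()
for chunk in chunk_with_overlap(words, chunk_size=4, chunk_overlap=2):
    print(chunk)
```

With an overlap of half the chunk size, as in the function above, most of the text appears in two consecutive chunks, which roughly doubles the number of indexed passages compared with no overlap.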


### Step 3: Ingest into Elastic Search 
The `ingest_and_get_vector_store` function initializes and populates an Elastic Search vector store with the provided document chunks (`split_docs`). It creates an `ElasticsearchStore` instance using the Elastic Search Cloud ID and API key from environment variables, the index name, and a sparse vector retrieval strategy based on the specified Elastic Search model ID (`es_model_id`). The function then ingests the split documents into this store and returns the populated `vector_store` for later retrieval.

In [48]:
def ingest_and_get_vector_store(split_docs):
    vector_store = ElasticsearchStore(
                    es_cloud_id= os.environ["elastic_search_cloud_id"],
                    es_api_key=os.environ["elastic_search_api_key"],
                    index_name=index_name,
                    strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(model_id=es_model_id)
                    )
    documents = vector_store.add_documents(
        split_docs)
    print("Documents indexed and vector Store returned")

    return vector_store
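
The prerequisites list an `elastic_search_url` variable, while the function above reads `elastic_search_cloud_id`. One way to support both is sketched below, under the assumption that you connect with either a Cloud ID or a plain URL; `es_connection_kwargs` is a hypothetical helper, and `es_url`, `es_cloud_id`, and `es_api_key` are the `ElasticsearchStore` connection arguments it fills in.

```python
def es_connection_kwargs(env):
    """Pick ElasticsearchStore connection arguments from a mapping of settings."""
    kwargs = {"es_api_key": env["elastic_search_api_key"]}
    if env.get("elastic_search_cloud_id"):
        # Elastic Cloud deployment identified by its Cloud ID
        kwargs["es_cloud_id"] = env["elastic_search_cloud_id"]
    elif env.get("elastic_search_url"):
        # Any deployment reachable through a URL
        kwargs["es_url"] = env["elastic_search_url"]
    else:
        raise KeyError("Set elastic_search_cloud_id or elastic_search_url")
    return kwargs

# Illustrative values only
print(es_connection_kwargs({
    "elastic_search_api_key": "key",
    "elastic_search_url": "https://localhost:9200",
}))
```

Inside `ingest_and_get_vector_store`, the result could be passed as `ElasticsearchStore(index_name=index_name, strategy=..., **es_connection_kwargs(os.environ))`.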


### Step 4: Set up Conversation Chain using LLM
The `get_conversation_chain` function sets up a conversational chain for a chatbot using a vector store, a language model, and memory management. Key steps include:

1. **Setting Up LLM Parameters**: Defines various generation parameters for the WatsonX Large Language Model (LLM), like decoding method, token limits, temperature, and selection criteria (Top-K, Top-P).

2. **Initializing WatsonX LLM**: Creates a `WatsonxLLM` instance using the LLM model ID, WatsonX URL, project ID, the specified parameters, and an API key from the environment.

3. **Creating a Retriever**: Transforms the provided `vector_store` into a retriever for fetching relevant documents.

4. **Preparing a Prompt Template**: Builds a `PromptTemplate` from the template string defined below. You can edit the template or replace it with your own.

5. **Setting Up Conversation Memory**: Implements a `ConversationBufferMemory` to manage chat history and output answers.

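6. **Building the Conversational Chain**: Constructs a `ConversationalRetrievalChain` from the LLM, retriever, and memory. This chain also returns source documents alongside responses. In the code as written, the `condense_question_prompt` argument is commented out, so the custom template is built but not passed to the chain; uncomment that line to use it.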

The function ultimately returns this configured conversation chain, which is essential for handling and responding to user queries effectively in a chatbot, integrating both generative AI and information retrieval capabilities.

In [18]:
old_template ="""[INST]You are a helpful, respectful, and honest assistant. 
Always answer as helpfully as possible, while being safe. Be brief in your answers. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If you don't know the answer to a question, please do not share false information. \n Answer with no more than 150 words, in 2 or 3 sentences. If you cannot base your answer on the given document, please state that you do not have an answer.\n\n{question} Answer with no more than 200 words. If you cannot base your answer on the given document, please state that you do not have an answer. do not include a question in your response. dont prompt to make select correct answers[/INST]"""

template = """[INST]
As an AI, provide accurate and relevant information based on the provided document. Your responses should adhere to the following guidelines:
- Answer the question based on the provided documents.
- Be direct and factual, limited to 50 words and 2-3 sentences. Begin your response without using introductory phrases like yes, no etc.
- Maintain an ethical and unbiased tone, avoiding harmful or offensive content.
- If the document does not contain relevant information, state "I cannot provide an answer based on the provided document."
- Avoid using confirmatory phrases like "Yes, you are correct" or any similar validation in your responses.
- Do not fabricate information or include questions in your responses.
- do not prompt to select answers. do not ask me questions

{question}


[/INST]
"""

def get_conversation_chain(vector_store):
    parameters = {
        GenParams.DECODING_METHOD: "sample",
        GenParams.MAX_NEW_TOKENS: 100,
        GenParams.MIN_NEW_TOKENS: 1,
        GenParams.TEMPERATURE: 0.5,
        GenParams.TOP_K: 50,
        GenParams.TOP_P: 1,
    }

    watsonx_llm = WatsonxLLM(
        model_id=llm_model_id,
        url=wx_url,
        project_id=wx_project_id,
        params=parameters,
        apikey=os.environ["WATSONX_APIKEY"]
    )
    retriever = vector_store.as_retriever()
    CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(template)

    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

    conversation_chain = (ConversationalRetrievalChain.from_llm
                          (llm=watsonx_llm,
                           retriever=retriever,
                           #condense_question_prompt=CONDENSE_QUESTION_PROMPT,
                           memory=memory,
                           return_source_documents=True))
    print("Conversational Chain created for the LLM using the vector store")
    return conversation_chain
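
To make the role of the memory buffer concrete: a conversational retrieval chain typically combines the stored chat history with each follow-up question so the question can be rewritten into a standalone one before retrieval. The sketch below imitates that bookkeeping in plain Python. `SimpleBufferMemory` and `build_condense_input` are hypothetical names for illustration, not LangChain classes, and the text layout is an assumption.

```python
class SimpleBufferMemory:
    """Illustrative stand-in for a conversation buffer: keeps every turn in order."""

    def __init__(self):
        self.turns = []

    def save_turn(self, question, answer):
        self.turns.append((question, answer))

    def as_text(self):
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def build_condense_input(memory, follow_up):
    """Combine prior turns with a follow-up, as input for a question-rewriting prompt."""
    return f"Chat history:\n{memory.as_text()}\n\nFollow-up question: {follow_up}"


memory_demo = SimpleBufferMemory()
memory_demo.save_turn("what are industry accelerators?", "They are packaged assets.")
print(build_condense_input(memory_demo, "how do I install one?"))
```

Because the buffer keeps every turn, the text passed to the LLM grows with the conversation, which matters for models with a limited context window.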


### Step 5: Detect Hallucination in the LLM's Response
The `validate_answer_against_sources` function evaluates the reliability of a response by comparing it with source documents. It works as follows:

1. **Model Initialization**: Utilizes the SentenceTransformer model 'all-MiniLM-L6-v2' to generate embeddings.

2. **Threshold Setting**: Sets a similarity threshold (here, 0.5) to determine the acceptable level of similarity between the response and source documents.

3. **Extracting Source Texts**: Gathers the content of the source documents.

4. **Computing Embeddings**: Generates embeddings for both the response answer and the source texts.

5. **Calculating Similarity**: Computes cosine similarity scores between the response answer's embedding and the embeddings of each source text.

6. **Validity Check**: Checks if any of the similarity scores exceed the set threshold. If yes, it implies that the response is sufficiently similar to at least one of the source documents, suggesting its reliability, and returns `True`. If not, it returns `False`.

Essentially, this function checks whether the chatbot's response aligns with the information in the source documents. Embedding similarity is a heuristic: a high score suggests the answer is on topic, but it does not guarantee that the answer is factually correct.

In [19]:

def validate_answer_against_sources(response_answer, source_documents):
    model = SentenceTransformer('all-MiniLM-L6-v2')
    similarity_threshold = 0.5  
    source_texts = [doc.page_content for doc in source_documents]

    answer_embedding = model.encode(response_answer, convert_to_tensor=True)
    source_embeddings = model.encode(source_texts, convert_to_tensor=True)

    cosine_scores = util.pytorch_cos_sim(answer_embedding, source_embeddings)


    # Valid if at least one source passage is similar enough to the answer
    return any(score.item() > similarity_threshold for score in cosine_scores[0])
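
The threshold check can be followed by hand with small vectors. The sketch below reimplements cosine similarity in plain Python for illustration; `cosine_similarity` and `is_supported` are hypothetical helpers, and the two-dimensional vectors stand in for real sentence embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_supported(answer_vec, source_vecs, threshold=0.5):
    """True if at least one source vector exceeds the similarity threshold."""
    return any(cosine_similarity(answer_vec, s) > threshold for s in source_vecs)

answer_vec = [1.0, 0.0]
print(is_supported(answer_vec, [[0.0, 1.0], [1.0, 1.0]]))  # second source is close enough
print(is_supported(answer_vec, [[0.0, 1.0]]))               # orthogonal source only
```

Raising the threshold makes the check stricter and rejects more answers; lowering it lets more loosely related answers through.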


Now that we have crafted all the necessary functions, it's time to put them into action and test their functionality.

In [38]:
content, metadata = prepare_docs(pdf_docs)


Content and metadata are extracted from the documents


In [39]:
split_docs = get_text_chunks(content, metadata)

Documents are split into 3 passages


In [49]:
vectorstore = ingest_and_get_vector_store(split_docs)

Documents indexed and vector Store returned


In [None]:
vectorstore

In [50]:
conversation_chain=get_conversation_chain(vectorstore)

Conversational Chain created for the LLM using the vector store


### Ask your Question

We have created a conversational chain and are now ready to chat with your own data.


### Question 1

In [51]:
user_question = "what are industry accelerators?"
response=conversation_chain({"question": user_question})
print("Q: ",user_question)
print("A: ",response['answer'])

Q:  what are industry accelerators?
A:   industry accelerators are organizations that help startups grow and succeed in their respective industries.



Context:



1. Industry accelerators are different from traditional accelerators in that they focus on a specific industry, such as fintech or healthtech, rather than a wide range of industries.

2. Industry accelerators often have a strong network of industry experts and mentors who can provide valuable guidance and resources to the startups they support.


We have now received an answer to the question. The response also contains the conversation history and the source documents.


### Detect and Solve Hallucinations

The response to the question above appears to be generated by the LLM from its own knowledge rather than retrieved from the documents, resulting in an answer that is out of context. To address such instances of misinformation or 'hallucination,' we previously developed the function `validate_answer_against_sources`. We can use this function to cross-check the answer with the source documents and replace it with a fallback message when it is not supported by them.

In [52]:
if response['source_documents']:
    response_answer = response['answer']
    source_docs = response['source_documents']

    # Post-processing step to validate the answer against the source documents
    is_valid_answer = validate_answer_against_sources(response_answer, source_docs)
    if not is_valid_answer:
        response['answer'] = "Sorry, I cannot answer the question based on the given documents"
else:
    response['answer'] = "Sorry, I cannot answer the question based on the given documents"

print("Q: ",user_question)
print("A: ",response['answer'])

Q:  what are industry accelerators?
A:  Sorry, I cannot answer the question based on the given documents
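
The same post-processing is needed again in the UI callback further down, so it can help to wrap it in one function. This is a sketch under the assumption that the response is a dictionary with `answer` and `source_documents` keys, as in the cell above; `apply_hallucination_guard` is a hypothetical helper, and the validator is passed in so the logic can be tried without loading an embedding model.

```python
FALLBACK_ANSWER = "Sorry, I cannot answer the question based on the given documents"

def apply_hallucination_guard(response, validator, fallback=FALLBACK_ANSWER):
    """Return a copy of `response` whose answer is replaced when it is unsupported."""
    guarded = dict(response)
    sources = guarded.get("source_documents")
    # Fall back when nothing was retrieved or the validator rejects the answer
    if not sources or not validator(guarded["answer"], sources):
        guarded["answer"] = fallback
    return guarded

# Toy validator for illustration: accept only answers that quote a source verbatim
def quotes_a_source(answer, sources):
    return any(answer in source for source in sources)

demo = {"answer": "accelerators", "source_documents": ["Industry accelerators are ..."]}
print(apply_hallucination_guard(demo, quotes_a_source)["answer"])
```

In the notebook, the call would be `response = apply_hallucination_guard(response, validate_answer_against_sources)`.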


### Conversation UI
This code uses the `panel` library to build an interactive chatbot interface. It provides a text input field for questions and a primary-styled submit button. On submission, a callback sends the question to the previously created `conversation_chain`, validates the answer against the source documents, and shows a default message if the answer is not supported by them. Each question and answer is added to a Markdown pane that supports vertical scrolling.

In [28]:
pn.extension()

# Text input for user's question
question_input = pn.widgets.TextInput(placeholder='Type your question here...', name='Ask a Question')

# Button to submit the question
submit_button = pn.widgets.Button(name='Submit', button_type='primary')

# Area to display conversation history
# (`styles` is the Panel 1.0+ name of this parameter; older releases call it `style`)
conversation_history = pn.pane.Markdown("**Conversation History:**\n", width=700, height=200, styles={'white-space': 'pre-wrap', 'overflow-y': 'auto'})

def on_submit(event):
    user_question = question_input.value
    response = conversation_chain({"question": user_question})  # Assuming conversation_chain is defined
    
    if response['source_documents']:
        response_answer = response['answer']
        source_docs = response['source_documents']

        # Post-processing step to validate the answer against the source documents
        is_valid_answer = validate_answer_against_sources(response_answer, source_docs)
        if not is_valid_answer:
            response['answer'] = "Sorry, I cannot answer the question based on the given documents"
    else:
        response['answer'] = "Sorry, I cannot answer the question based on the given documents"


    # Update conversation history
    new_entry = f"**Q**: {user_question}\n**A**: {response['answer']}\n\n---\n\n"
    conversation_history.object = new_entry + conversation_history.object

    # Clear the input box
    question_input.value = ''

submit_button.on_click(on_submit)
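
The history update inside the callback is plain string handling and can be tried without a running Panel server. A minimal sketch, with `prepend_entry` as a hypothetical helper that uses the same entry layout as `on_submit`:

```python
def prepend_entry(history, question, answer):
    """Return `history` with the newest question/answer pair placed first."""
    entry = f"**Q**: {question}\n**A**: {answer}\n\n---\n\n"
    return entry + history

history_demo = "**Conversation History:**\n"
history_demo = prepend_entry(history_demo, "first question", "first answer")
history_demo = prepend_entry(history_demo, "second question", "second answer")
print(history_demo)
```

Because each entry is placed in front of the existing text, the newest exchange appears at the top and the initial "Conversation History" heading ends up below the entries.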



In [29]:
chat_interface = pn.Column(
    "# RAG with your own data Chatbot",
    question_input,
    submit_button,
    conversation_history
)

chat_interface.servable()



We have now set up an end-to-end Retrieval Augmented Generation chatbot using Elastic Search, LangChain, and WatsonX.