In [32]:

# 1) Accept Word documents and Excel sheets as input
# 2) Read images from the documents and convert them into context; the response should be able to display or reference the image
# 3) Persist and update old context, and add new context when new docs are imported
# 4) Redesign the question before giving it to the LLM for the final response




## Building a Conversational Chatbot with Custom Data 

This notebook walks through building a chatbot over your own documents. It uses `HuggingFaceEmbeddings` and FAISS to turn documents into vectors held in a local vector store, then connects that store to a Llama model running on your local machine (the code below loads a Meta-Llama-3.1-8B-Instruct GGUF file through `LlamaCpp`). The `langchain` library handles chunking documents, indexing data in the vector database, managing conversation chains with memory buffers, and building prompt templates.


### Key Features:

- **PDF Content Processing**: The notebook extracts text from the PDF files you list, splits it into manageable chunks, and indexes these chunks in a local vector database using HuggingFaceEmbeddings and FAISS.
- **Data-Driven Query Handling**: Users can pose questions to the chatbot, which searches the indexed data for relevant answers.
- **Integrating Vector Database and LLMs**: We use `langchain` to link the vector database index with a local Llama LLM, giving a conversational experience with memory and retrieval.
- **Hallucination Check**: The notebook includes a similarity-based check that flags answers not supported by the retrieved passages and replaces them with a fallback message.

### Prerequisites for Running the Notebook:


1. **Library Requirements**: Confirm that you have installed all libraries specified in the `requirements (local rag).txt` file by running `pip install -r "requirements (local rag).txt"`. The quotes are needed because the file name contains spaces and parentheses.




The cell below imports the libraries required to run this notebook.

In [1]:
import PyPDF2

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.llms import LlamaCpp


from langchain.embeddings import HuggingFaceEmbeddings # import hf embedding
from langchain.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain


from langchain.prompts import PromptTemplate
from sentence_transformers import SentenceTransformer, util
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler





### Enter your pdf file name below


In [2]:
pdf_docs=["Mansouri-Benssassi-Ye2021_Article_GeneralisationAndRobustnessInv.pdf"]

### Step 1: Prepare above documents and their metadata
The prepare_docs function below processes a list of PDF documents by extracting text from each page and organizing it into two lists: one for the text content and another for the metadata (titles). It iterates through each page of each PDF, extracts the text, and forms a title using the PDF name and page number. The function returns these two lists, making it useful for indexing and referencing the content of multiple PDFs at a page level.

In [3]:

def prepare_docs(pdf_docs):
    """Extract page-level text and titles from a list of PDF paths."""
    content = []
    metadata = []

    for pdf in pdf_docs:
        pdf_reader = PyPDF2.PdfReader(pdf)
        # enumerate yields page objects; number pages from 1 for readable titles
        for page_number, page in enumerate(pdf_reader.pages, start=1):
            content.append(page.extract_text())
            metadata.append({"title": f"{pdf} page {page_number}"})

    print("Content and metadata are extracted from the documents")
    return content, metadata



### Step 2: Chunk the documents 
The `get_text_chunks` function takes text content and metadata as inputs and splits the content into smaller chunks. It uses a `RecursiveCharacterTextSplitter` built with `from_tiktoken_encoder`, so the chunk size (512) and overlap (256) are measured in tiktoken tokens rather than characters. The function splits the content into passages while keeping the associated metadata, prints the total number of passages created, and returns the split documents. Breaking large text into smaller, indexed segments makes later processing and retrieval easier.

In [4]:
def get_text_chunks(content, metadata):
    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=512,
        chunk_overlap=256,
    )
    split_docs = text_splitter.create_documents(content, metadatas=metadata)
    print(f"Documents are split into {len(split_docs)} passages")
    return split_docs
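To make the effect of `chunk_overlap` concrete, here is a minimal sliding-window sketch. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses (the real splitter also prefers to break on separators such as paragraphs and sentences). The helper name `chunk_with_overlap` and the use of whitespace-separated words as stand-in tokens are assumptions made for illustration.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows that share `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


# Whitespace-separated words stand in for tokens here (an assumption for illustration).
words = "a b c d e f g h i j".split()
chunks = chunk_with_overlap(words, chunk_size=4, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

With an overlap of half the chunk size, as in the notebook's 512/256 setting, every token away from the document edges appears in two chunks. This roughly doubles the number of passages to index but reduces the chance that a relevant sentence is cut in half at a chunk boundary.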


### Step 3: Ingest into Vector Database locally

The `ingest_into_vectordb` function is designed for processing and indexing a collection of documents into a vector database using FAISS (Facebook AI Similarity Search) for efficient similarity searches. It operates as follows:

1. **Embedding Creation**: It generates embeddings for the input documents (`split_docs`) using the Hugging Face model `'sentence-transformers/all-MiniLM-L6-v2'`. This model is specifically chosen for its efficiency in creating sentence-level embeddings and is set to run on the CPU.

2. **Vector Database Indexing**: Utilizes the generated embeddings to create a FAISS vector database. FAISS is used for its ability to efficiently handle large-scale similarity searches and clustering of dense vectors.

3. **Local Storage**: After creating the vector database, the function saves it locally to the path specified by `DB_FAISS_PATH`, ensuring the data can be easily accessed for future similarity searches or retrieval tasks.

The primary purpose of this function is to transform textual data into a structured, searchable vector format, facilitating efficient and scalable retrieval tasks such as document similarity searches or clustering.

In [5]:
def ingest_into_vectordb(split_docs):
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2', model_kwargs={'device': 'cpu'})
    db = FAISS.from_documents(split_docs, embeddings)

    DB_FAISS_PATH = 'vectorstore/db_faiss'
    db.save_local(DB_FAISS_PATH)
    return db
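The idea behind a vector search can be sketched without FAISS: embed the query, score it against every stored vector, and return the closest passages. The toy below does this by brute force with cosine similarity. The function names, the three-dimensional vectors, and the passage labels are all made up for illustration; a real embedding model produces vectors with hundreds of dimensions, and the distance metric FAISS actually applies depends on how the index is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k=2):
    """Return the k (score, text) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed]
    return sorted(scored, reverse=True)[:k]


# Hand-made vectors for illustration only.
toy_index = [
    ("passage about facial expressions", [0.9, 0.1, 0.0]),
    ("passage about speech signals", [0.1, 0.9, 0.0]),
    ("passage about dataset licensing", [0.0, 0.1, 0.9]),
]
results = top_k([1.0, 0.2, 0.0], toy_index, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A dedicated library matters once the collection grows, because brute-force scoring touches every stored vector on every query.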

### Step 4: Set up Conversation Chain using LLM
The `get_conversation_chain` function is designed to create and configure a conversational chain for a language model, specifically using the LLaMA model and a vector database for retrievals. Here's a summary of its main components and functionalities:

1. **Callback Manager Setup**:
   - Initializes a `CallbackManager` with `StreamingStdOutCallbackHandler()`, which prints generated tokens to standard output as the model produces them.

2. **LLaMA Model Configuration**:
   - Instantiates a `LlamaCpp` model with specified parameters such as `model_path`, `temperature`, `max_tokens`, `top_p`, and `n_ctx`. These parameters configure the behavior of the LLaMA model, including its conversational style and technical constraints.
   - Integrates the `callback_manager` with the LLaMA model, allowing for additional processing or logging during the model's operation.

3. **Retriever Initialization**:
   - Transforms the input `vectordb` into a retriever, enabling it to fetch relevant information from the vector database during conversations.

4. **Conversation Chain Creation**:
   - Sets up a `ConversationBufferMemory`, which manages the conversation history and assists in generating context-aware responses.
   - Constructs a `ConversationalRetrievalChain` using the LLaMA model (`llama_llm`), the retriever, and the conversation memory. This chain is responsible for handling the flow of the conversation, including retrieving relevant information and generating responses.

5. **Return Value**:
   - Outputs a message indicating the successful creation of the conversational chain.
   - Returns the `conversation_chain` object, which can be used to handle conversational interactions using the LLaMA model and the vector database.

This function sets up a sophisticated conversational AI system combining the LLaMA model for language generation and a vector database for information retrieval, enhanced with a callback manager for additional processing and a conversation memory buffer for context management.

In [17]:
template = """[INST]
As an AI, provide accurate and relevant information based on the provided document. Your responses should adhere to the following guidelines:
- Answer the question based on the provided documents.
- Be direct and factual, limited to 50 words and 2-3 sentences. Begin your response without using introductory phrases like yes, no etc.
- Maintain an ethical and unbiased tone, avoiding harmful or offensive content.
- If the document does not contain relevant information, state "I cannot provide an answer based on the provided document."
- Avoid using confirmatory phrases like "Yes, you are correct" or any similar validation in your responses.
- Do not fabricate information or include questions in your responses.
- do not prompt to select answers. do not ask me questions
{question}
[/INST]
"""

#template = """Given the document and the current conversation between a user and an agent, your task is as follows: Answer any user query by using information from the document. The response should be detailed."""
# Stream generated tokens to stdout as the model produces them
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])


def get_conversation_chain(vectordb):
    # Load the quantized GGUF model through llama.cpp; adjust model_path for your machine
    llama_llm = LlamaCpp(
        model_path="C:\\Users\\Admin\\Downloads\\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
        temperature=0,
        max_tokens=200,
        top_p=1,
        callback_manager=callback_manager,
        n_ctx=3000,
    )

    retriever = vectordb.as_retriever()
    # Built from the template above; unused while the argument below stays commented out
    CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(template)

    # output_key='answer' tells the memory which output to store,
    # since the chain also returns source documents
    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llama_llm,
        retriever=retriever,
        # condense_question_prompt=CONDENSE_QUESTION_PROMPT,
        memory=memory,
        return_source_documents=True,
    )
    print("Conversational Chain created for the LLM using the vector store")
    return conversation_chain
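Before retrieving, a `ConversationalRetrievalChain` first asks the LLM to rewrite the user's follow-up question, together with the chat history, into a standalone question. The sketch below shows only the prompt-building half of that step with plain string formatting. The template text, the helper names, and the example history are hypothetical and are not the prompt LangChain ships with. The point is that a condense prompt needs both a `{chat_history}` and a `{question}` placeholder.

```python
# Hypothetical template for illustration; not the prompt LangChain ships with.
CONDENSE_TEMPLATE = (
    "Given the conversation below and a follow-up question, rewrite the "
    "follow-up as a standalone question.\n\n"
    "Chat history:\n{chat_history}\n\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)


def format_history(turns):
    """Render (question, answer) pairs as alternating Human/Assistant lines."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in turns)


def build_condense_prompt(turns, question):
    """Fill the template with the rendered history and the new question."""
    return CONDENSE_TEMPLATE.format(
        chat_history=format_history(turns), question=question
    )


# Illustrative history and follow-up; both are made up for this sketch.
history = [(
    "what is emotion recognition?",
    "It is the task of identifying emotions from signals such as speech or facial expressions.",
)]
prompt = build_condense_prompt(history, "which modalities are studied most?")
print(prompt)
```

The `template` defined in the cell above contains only `{question}` and reads as answering instructions, so it may not behave as intended if passed as `condense_question_prompt`.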


### Step 5: Detect Hallucination in the LLMs Response
The `validate_answer_against_sources` function evaluates the reliability of a response by comparing it with source documents. It works as follows:

1. **Model Initialization**: Utilizes the SentenceTransformer model 'all-MiniLM-L6-v2' to generate embeddings.

2. **Threshold Setting**: Sets a similarity threshold (here, 0.5) to determine the acceptable level of similarity between the response and source documents.

3. **Extracting Source Texts**: Gathers the content of the source documents.

4. **Computing Embeddings**: Generates embeddings for both the response answer and the source texts.

5. **Calculating Similarity**: Computes cosine similarity scores between the response answer's embedding and the embeddings of each source text.

6. **Validity Check**: Checks if any of the similarity scores exceed the set threshold. If yes, it implies that the response is sufficiently similar to at least one of the source documents, suggesting its reliability, and returns `True`. If not, it returns `False`.

Essentially, this function checks whether the chatbot's response aligns with the retrieved source documents. It is a heuristic: a similarity score above the threshold suggests the answer is grounded in a source, but it does not prove that the answer is factually correct.

In [18]:

def validate_answer_against_sources(response_answer, source_documents):
    # Same embedding model as the vector store; loaded on every call for simplicity
    model = SentenceTransformer('all-MiniLM-L6-v2')
    similarity_threshold = 0.5
    source_texts = [doc.page_content for doc in source_documents]

    answer_embedding = model.encode(response_answer, convert_to_tensor=True)
    source_embeddings = model.encode(source_texts, convert_to_tensor=True)

    # Shape (1, number_of_sources): one cosine score per source passage
    cosine_scores = util.pytorch_cos_sim(answer_embedding, source_embeddings)

    # The answer counts as supported if at least one source exceeds the threshold
    return any(score.item() > similarity_threshold for score in cosine_scores[0])
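The threshold decision in steps 2 and 6 can be looked at in isolation, without loading an embedding model. The sketch below takes similarity scores that have already been computed and reports whether any exceeds the threshold, along with which source matched best. The helper name `check_scores` and the example scores are made up for illustration; they are not output from the notebook.

```python
def check_scores(scores, threshold=0.5):
    """Return (is_supported, best_index, best_score) for a list of similarity scores."""
    if not scores:
        # No sources means nothing can support the answer
        return False, None, None
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    best_score = scores[best_index]
    return best_score > threshold, best_index, best_score


# Made-up scores for illustration, one per retrieved source passage.
supported, index, score = check_scores([0.12, 0.71, 0.43])
print(supported, index, score)

unsupported, _, _ = check_scores([0.12, 0.31, 0.43])
print(unsupported)
```

Returning the best index as well as the boolean makes it easier to show the user which passage backed the answer, and to tune the threshold by inspecting borderline scores.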


Now that we have crafted all the necessary functions, it's time to put them into action and test their functionality.

In [19]:
content, metadata = prepare_docs(pdf_docs)


Content and metadata are extracted from the documents


In [20]:
split_docs = get_text_chunks(content, metadata)

Documents are split into 55 passages


In [21]:
vectordb = ingest_into_vectordb(split_docs)

In [22]:
conversation_chain=get_conversation_chain(vectordb)

llama_model_loader: loaded meta data with 33 key-value pairs and 292 tensors from C:\Users\Admin\Downloads\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Meta Llama 3.1 8B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Meta-Llama-3.1
llama_model_loader: - kv   5:                         general.size_label str              = 8B
llama_model_loader: - kv   6:                            general.license str              = llama3.1
llama_mode

Conversational Chain created for the LLM using the vector store


### Ask your Question

We have created a conversational chain and are now ready to chat with our own data.


### Question 1

In [24]:
user_question = "what is emotion recognition?"
response=conversation_chain({"question": user_question})
print("Q: ",user_question)
print("A: ",response['answer'])

Llama.generate: prefix-match hit


 What is emotion recognition?

The final answer is: What is emotion recognition?


llama_print_timings:        load time =     839.10 ms
llama_print_timings:      sample time =      15.26 ms /    16 runs   (    0.95 ms per token,  1048.29 tokens per second)
llama_print_timings: prompt eval time =   11625.40 ms /   254 tokens (   45.77 ms per token,    21.85 tokens per second)
llama_print_timings:        eval time =    4438.06 ms /    15 runs   (  295.87 ms per token,     3.38 tokens per second)
llama_print_timings:       total time =   16106.03 ms /   269 tokens
Llama.generate: prefix-match hit


 Emotion recognition represents one of the most important aspects in affective computing with a wide range of applications areas from human–computer interaction, social robotics, and behavioural analytic (Hsu et al. 2013). Emotions are expressed through various means, such as verbal, non-verbal speech, or facial expression and body language. Emotion recognition from facial expression and speech is the most studied in affective computing, either as separate or joined modality (Vinola and Vimaladevi 2015). 

Therefore, emotion recognition refers to the process of identifying and interpreting emotions expressed through various means such as verbal, non-verbal speech, or facial expression and body language. 

The final answer is: Emotion recognition represents one of the most important aspects in affective computing with a wide range of applications areas from human–computer interaction, social robotics, and behavioural analytic (Hsu et al. 2013). Emotions are expressed through various mea


llama_print_timings:        load time =     839.10 ms
llama_print_timings:      sample time =     182.61 ms /   200 runs   (    0.91 ms per token,  1095.22 tokens per second)
llama_print_timings: prompt eval time =   81050.89 ms /  1587 tokens (   51.07 ms per token,    19.58 tokens per second)
llama_print_timings:        eval time =   64846.81 ms /   199 runs   (  325.86 ms per token,     3.07 tokens per second)
llama_print_timings:       total time =  146450.11 ms /  1786 tokens


Q:  what is emotion recognition?
A:   Emotion recognition represents one of the most important aspects in affective computing with a wide range of applications areas from human–computer interaction, social robotics, and behavioural analytic (Hsu et al. 2013). Emotions are expressed through various means, such as verbal, non-verbal speech, or facial expression and body language. Emotion recognition from facial expression and speech is the most studied in affective computing, either as separate or joined modality (Vinola and Vimaladevi 2015). 

Therefore, emotion recognition refers to the process of identifying and interpreting emotions expressed through various means such as verbal, non-verbal speech, or facial expression and body language. 

The final answer is: Emotion recognition represents one of the most important aspects in affective computing with a wide range of applications areas from human–computer interaction, social robotics, and behavioural analytic (Hsu et al. 2013). Emoti

We have now received an answer for a provided question. We can also view the conversation history and source documents in the response.


### Question 2

In [13]:
# user_question = "where did he graduate?"
# response=conversation_chain({"question": user_question})
# print("Q: ",user_question)
# print("A: ",response['answer'])
# print("\nConversation Chain: \n",response)

### Detect and Solve Hallucinations

An LLM can produce an answer from its own knowledge rather than from the retrieved documents, which yields a response that is out of context for the provided documents. Question 2 above is an example of a question the document is unlikely to cover. To catch such misinformation or 'hallucination', we previously developed the function `validate_answer_against_sources`. The cell below uses it to cross-check the most recent `response` against its source documents and replaces the answer with a fallback message if the check fails.

In [25]:
fallback_answer = "Sorry, I cannot answer the question based on the given documents"

if response['source_documents']:
    # Post-processing step: keep the answer only if it is similar enough to a retrieved source
    if not validate_answer_against_sources(response['answer'], response['source_documents']):
        response['answer'] = fallback_answer
else:
    # No sources were retrieved, so there is nothing to ground the answer in
    response['answer'] = fallback_answer

print("Q: ",user_question)
print("A: ",response['answer'])

Q:  what is emotion recognition?
A:   Emotion recognition represents one of the most important aspects in affective computing with a wide range of applications areas from human–computer interaction, social robotics, and behavioural analytic (Hsu et al. 2013). Emotions are expressed through various means, such as verbal, non-verbal speech, or facial expression and body language. Emotion recognition from facial expression and speech is the most studied in affective computing, either as separate or joined modality (Vinola and Vimaladevi 2015). 

Therefore, emotion recognition refers to the process of identifying and interpreting emotions expressed through various means such as verbal, non-verbal speech, or facial expression and body language. 

The final answer is: Emotion recognition represents one of the most important aspects in affective computing with a wide range of applications areas from human–computer interaction, social robotics, and behavioural analytic (Hsu et al. 2013). Emoti

We have now set up an end-to-end Retrieval Augmented Generation chatbot using LangChain and a local Llama model.

In [27]:
#!pip install panel
import panel as pn
pn.extension()




In [28]:
# Text input for user's question
question_input = pn.widgets.TextInput(placeholder='Type your question here...', name='Ask a Question')

# Button to submit the question
submit_button = pn.widgets.Button(name='Submit', button_type='primary')

# Area to display conversation history
# Panel 1.x takes CSS through `styles`; the older `style` keyword raises a TypeError
conversation_history = pn.pane.Markdown("**Conversation History:**\n", width=500, height=200, styles={'white-space': 'pre-wrap', 'overflow-y': 'auto'})

In [29]:
def on_submit(event):
    user_question = question_input.value
    response = conversation_chain({"question": user_question})  # Assuming conversation_chain is defined

    # Update conversation history
    new_entry = f"**Q**: {user_question}\n**A**: {response['answer']}\n\n---\n\n"
    conversation_history.object = new_entry + conversation_history.object

    # Clear the input box
    question_input.value = ''

submit_button.on_click(on_submit)


Watcher(inst=Button(button_type='primary', name='Submit'), cls=<class 'panel.widgets.button.Button'>, fn=<function on_submit at 0x0000020E410DDB20>, mode='args', onlychanged=False, parameter_names=('clicks',), what='value', queued=False, precedence=0)

In [None]:
chat_interface = pn.Column(
    "# Chatbot Interface",
    question_input,
    submit_button,
    conversation_history
)

chat_interface.servable()


In [None]:
from IPython.display import clear_output

chat_history = []
def on_submit_button_clicked(b):
    # `output` and `user_question_input` are created in the widget cell below
    with output:
        clear_output()
        user_question = user_question_input.value
        response = conversation_chain({"question": user_question})

        # Append to chat history
        chat_history.append(f"Q: {user_question}")
        chat_history.append(f"A: {response['answer']}")

        # Display chat history
        for entry in chat_history:
            print(entry)



In [None]:
from ipywidgets import widgets

# Text input for the user's question
user_question_input = widgets.Text(
    placeholder='Type your question here...',
    description='Question:',
    disabled=False
)

# Button to submit the question
submit_button = widgets.Button(
    description='Ask',
    button_style='info',
    tooltip='Ask the chatbot',
)

# Output area for the chatbot's response
output = widgets.Output()


In [None]:
submit_button.on_click(on_submit_button_clicked)  # without this, clicking the button does nothing
display(user_question_input, submit_button, output)
