# **RAG APPLICATION**

<img src='https://drive.google.com/uc?id=12NLHYlnV2pl8k7QRn9Fw8d4dyXjO1qZr'>



**The main objective of this application is to develop an interactive, retrieval-augmented application designed to provide generalized responses to queries, based on historical chat interactions between the user and the agent.**

**Provide the CSV file path**

In [1]:
import os
os.environ['ASSESSMENT_CSV_PATH'] = '/content/assesment.csv'

### **Install dependencies**

In [None]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117 --upgrade
!pip install langchain einops accelerate transformers bitsandbytes scipy
!pip install xformers sentencepiece
!pip -q install chromadb
!pip install sentence-transformers

In [3]:
# Import necessary libraries for Preprocessing the data.

import json
import ast
import re
import pandas as pd

# **A. Preprocessing data**

This dataset comprises multiple columns representing data from diverse sources such as email, SMS, API, etc. Among these, we have **selected "annotations" as our metadata, "comments" to analyze the conversation between visitors and agents, and "ticket_id" as the unique identifier for our conversations**.

We cleaned the data in the following steps:

1. The "theme" key was removed from annotations because it is empty in every entry.
2. All topics and aspects were consolidated into single lists, and the most frequent sentiment score was kept for each conversation (it is converted to a string later, when the metadata is prepared for the vector database).
3. The content from both the Visitor and Agent entries in the comments was extracted and labeled as user and agent turns. **(For content originating from the API channel, only the text between "message" and "initialurl" in the visitor body was kept, because the rest of the body is frequently redundant.)**
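
As a worked illustration of steps 1 and 2, here is a minimal sketch that aggregates one annotations string. The sample values are hypothetical and only mirror the keys the notebook reads (`topic`, `topicAspects`, `sentiment`, `theme`); `collections.Counter` is used here as one way to pick the most frequent sentiment.

```python
import json
from collections import Counter

# Hypothetical annotations string, shaped like the keys the notebook reads
sample = json.dumps([
    {"topic": "missing item", "topicAspects": ["refill", "order"], "sentiment": 2, "theme": ""},
    {"topic": "amend order", "topicAspects": ["order", "dht blocker"], "sentiment": 2},
    {"topic": "feedback", "topicAspects": [], "sentiment": 1},
])

def aggregate(annotations_str):
    """Merge topics and aspects and keep the most frequent sentiment."""
    topics, aspects, sentiments = [], [], []
    for annotation in json.loads(annotations_str):
        annotation.pop("theme", None)  # step 1: drop the empty key
        if "topic" in annotation:
            topics.append(annotation["topic"])
        for aspect in annotation.get("topicAspects", []):
            if aspect not in aspects:  # keep each aspect once, in order
                aspects.append(aspect)
        if "sentiment" in annotation:
            sentiments.append(annotation["sentiment"])
    # most_common(1) yields the mode; None when no sentiment was recorded
    mode = Counter(sentiments).most_common(1)[0][0] if sentiments else None
    return {"topics": topics, "topicAspects": aspects, "sentiments": mode}

result = aggregate(sample)
print(result)
```

The duplicate aspect "order" appears once in the result, and the sentiment that occurs most often wins over the single lower score.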

In [4]:
# Define the DataProcessor class. This class handles data loading, processing, and annotation manipulation.

class DataProcessor:
    def __init__(self, file_path):
        self.file_path = file_path
        self.df = None
        self.load_data()

    def load_data(self):
        """Load the CSV file into a DataFrame."""
        self.df = pd.read_csv(self.file_path)

    def remove_theme_key(self):
        """Drop the always-empty 'theme' key from every annotation."""
        annotations_column = self.df['annotations']
        for i, annotations_str in annotations_column.items():
            try:
                annotations_list = json.loads(annotations_str.strip(" ' "))
                for annotation in annotations_list:
                    annotation.pop('theme', None)
                annotations_column.at[i] = str(annotations_list)
            except (json.JSONDecodeError, AttributeError):
                # Skip rows that are not valid JSON or not strings (e.g. NaN)
                continue

    def aggregate_annotation_data_corrected(self):
        annotations_column = self.df['annotations']
        aggregated_data = []
        for annotations_str in annotations_column:
            try:
                annotations_str_cleaned = annotations_str.replace("'", "\"")
                topics, topic_aspects, sentiments = [], [], []
                annotations_list = json.loads(annotations_str_cleaned)
                for annotation in annotations_list:
                    if 'topic' in annotation:
                        topics.append(annotation['topic'])
                    if 'topicAspects' in annotation:
                        topic_aspects.extend([aspect for aspect in annotation['topicAspects'] if aspect not in topic_aspects])
                    if 'sentiment' in annotation:
                        sentiments.append(annotation['sentiment'])
                aggregated_data.append({
                    'topics': topics,
                    'topicAspects': topic_aspects,
                    'sentiments': None if not sentiments else max(set(sentiments), key=sentiments.count)
                })
            except (json.JSONDecodeError, AttributeError):
                # Unparseable or missing annotations get the same shape as an empty result
                aggregated_data.append({'topics': [], 'topicAspects': [], 'sentiments': None})
        self.df['annotations'] = aggregated_data

    def process_comments(self):
        self.df['processed_comments'] = self.df.apply(lambda x: self._process_comment_entry(x['comments'], x['metadata.channel']), axis=1)

    def _process_comment_entry(self, comments_str, metadata_channel):
        processed_comments = []
        try:
            comments_list = json.loads(comments_str.strip("'"))
            for entry in comments_list:
                author_type = entry.get('authorType', '').lower()
                body = entry.get('body', '')
                if metadata_channel == 'api' and author_type == 'visitor':
                    match = re.search(r'message(.*?)initialurl', body, re.IGNORECASE)
                    if match:
                        body = match.group(1).strip()
                if author_type == 'visitor':
                    processed_comments.append({'user': body})
                elif author_type == 'agent':
                    processed_comments.append({'agent': body})
        except (json.JSONDecodeError, AttributeError):
            pass
        return processed_comments

    def get_processed_data(self):
        """Return the processed DataFrame."""
        return self.df

    def process(self):
        self.remove_theme_key()
        self.aggregate_annotation_data_corrected()
        self.process_comments()
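
To make the comment-extraction step concrete, the following sketch runs the same logic on a tiny hand-written payload. The payload is hypothetical (real API bodies may be laid out differently), and `extract_turns` is an illustrative standalone function, not part of the class above.

```python
import json
import re

def extract_turns(comments_str, channel):
    """Return a list of {'user': ...} / {'agent': ...} turns."""
    turns = []
    for entry in json.loads(comments_str):
        author = entry.get("authorType", "").lower()
        body = entry.get("body", "")
        if channel == "api" and author == "visitor":
            # Keep only the text between 'message' and 'initialurl'
            match = re.search(r"message(.*?)initialurl", body, re.IGNORECASE)
            if match:
                body = match.group(1).strip()
        if author == "visitor":
            turns.append({"user": body})
        elif author == "agent":
            turns.append({"agent": body})
    return turns

# Hypothetical API payload for illustration only
comments = json.dumps([
    {"authorType": "Visitor", "body": "Message Where is my order? InitialUrl https://example.com"},
    {"authorType": "Agent", "body": "It ships tomorrow."},
    {"authorType": "System", "body": "ignored"},
])
turns = extract_turns(comments, "api")
print(turns)
```

Entries whose author is neither a visitor nor an agent are dropped, and the visitor body is reduced to the message text.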



# **B. Filter Conversations and Metadata**

  Here we parse the data into the following format: **{
      "conversation_id": "123",
      "interactions": [
          {"user": "User question"},
          {"agent": "Agent follow-up answer"},
          {"user": "User question"},
          {"agent": "Agent follow-up answer"},
          // ... more interactions
      ],
      "metadata": {
          "topics": ["x", "y"],
          "topic_aspects": ["a", "b"],
          "sentiment": 1
      }
  }**

  To make the final_text, final_id, and final_metadata lists compatible with the vector database, each entry must be structured appropriately for storage: texts and IDs are plain strings, and each metadata entry is a flat dictionary whose list values are joined into strings.

  **Note: we place two special tokens, USER_TOKEN and AGENT_TOKEN, before each user question or agent answer so that the model can distinguish between the two.**
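
The flattening described above can be sketched as follows. This is a simplified illustration, not the notebook's class: `clean` and `flatten` are hypothetical helper names, and the cleaning rule here keeps only letters, digits, and whitespace.

```python
import re

USER_TOKEN = "<USER>"
AGENT_TOKEN = "<AGENT>"

def clean(text):
    """Keep letters, digits, and whitespace only."""
    return re.sub(r"[^a-zA-Z0-9\s]", "", text).strip()

def flatten(conversation):
    """Turn one conversation dict into (id, text, flat metadata)."""
    parts = []
    for turn in conversation["interactions"]:
        if "user" in turn:
            parts.append(f"{USER_TOKEN} {clean(turn['user'])}")
        if "agent" in turn:
            parts.append(f"{AGENT_TOKEN} {clean(turn['agent'])}")
    # Lists become space-joined strings so every metadata value is a scalar
    metadata = {
        key: " ".join(map(str, value)) if isinstance(value, list) else str(value)
        for key, value in conversation["metadata"].items()
    }
    return conversation["conversation_id"], " ".join(parts), metadata

example = {
    "conversation_id": "123",
    "interactions": [{"user": "Where's my order?"}, {"agent": "It ships today!"}],
    "metadata": {"topics": ["x", "y"], "topicAspects": ["a", "b"], "sentiments": 1},
}
conv_id, text, metadata = flatten(example)
print(text)
print(metadata)
```

Each conversation ends up as one string with role markers plus a dictionary of string values, which is the shape the indexing step expects.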




In [5]:
class ConversationProcessor:
    USER_TOKEN = "<USER>"
    AGENT_TOKEN = "<AGENT>"

    def __init__(self, final_data):
        # Keep a reference to the processed DataFrame instead of relying on a global
        self.final_data = final_data
        self.conversations = []
        self.final_text = []
        self.final_id = []
        self.final_metadata = []
        self.converted_metadata = []

    def process_data(self):
        self.construct_conversations()
        self.process_conversations()
        self.convert_metadata()

    def construct_conversations(self):
        for index, row in self.final_data.iterrows():
            conversation = {
                "conversation_id": str(row["_id"]),
                "interactions": ast.literal_eval(str(row["processed_comments"])),
                # "interactions": json.loads((row["processed_comments"]).replace("'")),
                "metadata": ast.literal_eval(str(row["annotations"]))
            }
            self.conversations.append(conversation)

    def clean_text(self, text):
        cleaned_text = re.sub(r"[^a-zA-Z0-9\s" + re.escape(self.USER_TOKEN + self.AGENT_TOKEN) + "]", "", text)
        return cleaned_text.strip()

    def process_conversations(self):
        for conversation in self.conversations:
            conversation_text = ""
            for turn in conversation["interactions"]:
                if 'user' in turn:
                    user_text = self.clean_text(turn['user'])
                    conversation_text += f"{self.USER_TOKEN} {user_text} "
                if 'agent' in turn:
                    agent_text = self.clean_text(turn['agent'])
                    conversation_text += f"{self.AGENT_TOKEN} {agent_text} "

            self.final_id.append(conversation['conversation_id'])
            self.final_text.append(conversation_text)
            self.final_metadata.append(conversation['metadata'])

    def value_to_string(self, value):
        if isinstance(value, list):
            return ' '.join(map(str, value))
        return str(value)

    def convert_metadata(self):
        for item in self.final_metadata:
            converted_item = {k: self.value_to_string(v) for k, v in item.items()}
            self.converted_metadata.append(converted_item)

# **C. Load data**

Using the two classes above, we parse our data by providing the CSV file path.

In [7]:
import os

# Get the file path from the environment variable
file_path = os.getenv('ASSESSMENT_CSV_PATH')

# Check that the environment variable is set and points to an existing file
if file_path and os.path.isfile(file_path):
    print("File path retrieved from environment variable:", file_path)
else:
    raise FileNotFoundError("ASSESSMENT_CSV_PATH is not set or does not point to a file.")


processor = DataProcessor(file_path)
processor.process()
final_data = processor.get_processed_data()

File path retrieved from environment variable: /content/assesment.csv


In [8]:
# Usage
Conversations = ConversationProcessor(final_data)
Conversations.process_data()
# Now you can access the processed data
print(Conversations.final_text[:3])  # Print first few entries of the processed text
print(Conversations.converted_metadata[:3])  # Print first few entries of the converted metadata

['<USER> Hi I received my order today and only received a two month supply that included a bottle with a bag of refill Joshua Lili ', '<USER> Please follow up with the customer about the feedback they provided Star rating 1 Comment Terrible customer service I would not be going out of my way to leave this message if it was not that awful Original ticket REDACTED <AGENT> This request was closed and merged into request 1566318 New ticket from dacquaah21gmail ', '<USER> Hi How do I add the DHT blocker to my order <USER> Mens Hairline Defender 3 Month Mens Hairline Defender 3 Month Hair Growth REDACTEDceutical Remove 330 280 ']
[{'topics': 'missing item', 'topicAspects': 'refill joshua lili order today', 'sentiments': '2'}, {'topics': 'support responsiveness customer support feedback', 'topicAspects': 'provided star rating comment terrible customer awful original ticket', 'sentiments': '1'}, {'topics': 'amend order', 'topicAspects': 'order add the dht dht blocker', 'sentiments': '2'}]


# **D. Data Indexing and Querying**

1. **Vector Database:** We use **ChromaDB**, an open-source vector database, through its LangChain integration. ChromaDB stores vector embeddings and supports efficient similarity queries over them.

2. **Embedding Model:** We opted for **all-mpnet-base-v2**, a Sentence Transformers model, loaded through LangChain's HuggingFaceEmbeddings wrapper. It generates embeddings for sentences or text passages. Since our chats consist of multiple sentences separated by special tokens, a sentence-transformer model is a reasonable fit for embedding whole conversations.

3. **Embedding Generation:** The chosen model generates one embedding for each flattened chat conversation.

4. **Data Storage:** The embeddings are stored in ChromaDB together with the associated metadata (conversation IDs, annotations, etc.).

5. **Querying and Retrieval:** A retriever queries ChromaDB and returns the stored conversations whose embeddings are closest to the query embedding.

With this setup, we store and retrieve embeddings for our chat conversations using ChromaDB and the chosen embedding model.
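
To show the idea behind embedding-based retrieval without downloading a model, here is a toy sketch in plain Python. It is an assumption-laden stand-in: `embed` uses word counts instead of the real embedding model, and `top_k` does a brute-force scan rather than using a vector database.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would call the embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, texts, k=2):
    """Return the k stored texts most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vec, embed(t)), reverse=True)
    return ranked[:k]

store = [
    "<USER> how can i cancel my subscription",
    "<USER> i received the wrong item in my order",
    "<USER> how do i get a refund for my order",
]
hits = top_k("how can i get refund", store, k=1)
print(hits)
```

The retriever in the notebook follows the same pattern (embed the query, rank stored items by similarity, return the top k), with learned dense embeddings in place of word counts.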








In [9]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
import torch

In [11]:
# Initialize a new Hugging Face embeddings model using the specified model_name ("all-mpnet-base-v2")
new_embedding = HuggingFaceEmbeddings(model_name="all-mpnet-base-v2")

# Create a Chroma instance and index texts, embeddings, metadata, and IDs into the database
vectordb = Chroma.from_texts(
    texts=Conversations.final_text,  # Texts to be indexed
    embedding=new_embedding,  # Embedding model used to embed the texts
    metadatas=Conversations.converted_metadata,  # Metadata associated with the texts
    ids=Conversations.final_id,  # IDs associated with the texts
    collection_name="rag_application"  # Name of the collection in the database
)

# Convert the indexed database into a retriever for search purposes, specifying search parameters
retriever = vectordb.as_retriever(search_kwargs={"k": 5})

# Retrieve relevant documents based on the query "How can i get refund?"
docs = retriever.get_relevant_documents("How can i get refund?")

# Calculate the number of relevant documents retrieved
len(docs)


5

# **E. Quantized LLM**


1. **Model and Tokenizer:** The code initializes a tokenizer and model from the pretrained checkpoint 'mistralai/Mistral-7B-Instruct-v0.1', an instruction-tuned model. Mistral 7B uses sliding-window attention with a rolling buffer cache, which limits cache memory on long sequences.

2. **Efficient Model Loading:** Loading the base model in 4-bit precision (use_4bit = True, passed as load_in_4bit) reduces the memory footprint of the weights, so the model fits on a smaller GPU.

3. **Quantization for Reduced Precision:** The code uses the "nf4" quantization type (4-bit NormalFloat) to store model weights at reduced precision, while computations run in the float16 compute dtype. This conserves memory at the cost of some numerical precision.
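
A rough back-of-the-envelope calculation shows why 4-bit loading matters. The parameter count below is an assumption (roughly 7.24 billion for this model), and the estimate covers weights only: it ignores quantization constants, any layers kept in higher precision, activations, and the attention cache.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 7.24e9  # assumed parameter count for Mistral-7B

fp16_gb = weight_memory_gb(N_PARAMS, 16)
int4_gb = weight_memory_gb(N_PARAMS, 4)
print(f"16-bit weights: {fp16_gb:.2f} GB")
print(f" 4-bit weights: {int4_gb:.2f} GB")
```

Under these assumptions the weights shrink by a factor of four, which is what lets a 7B model fit on a single consumer or free-tier GPU.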




In [12]:
from langchain.prompts import PromptTemplate
from langchain.schema.runnable import RunnablePassthrough
from langchain.llms import HuggingFacePipeline
from langchain.chains import LLMChain

In [13]:

model_name='mistralai/Mistral-7B-Instruct-v0.1'


tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

#################################################################
# bitsandbytes parameters
#################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

#################################################################
# Set up quantization config
#################################################################
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check whether the GPU also supports bfloat16 (compute capability 8.0 or higher)
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: bnb_4bit_compute_dtype could be set to 'bfloat16'")
        print("=" * 80)

#################################################################
# Load the pre-trained model with the quantization config
#################################################################
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
)


# **F. Pipeline**

**Here we use the Hugging Face pipeline to combine our model, tokenizer, and generation parameters into a single text-generation workflow.**



In [14]:
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    do_sample=False,  # greedy decoding for deterministic output
    # repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=1000,
)

mistral_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

# **G. Templates**


The code below defines several prompt templates aimed at making the model's answers more grounded and safer. Here is a breakdown of each template:

1. **Template 1:**
Instructs to answer truthfully using provided context.

2. **Template 2**:
Answer question using provided context. If context is insufficient, respond accordingly.

3. **Template 3:**
Answer question using provided context. If context is insufficient, respond accordingly. Avoid sharing personal details.

4. **Template 4:**
Answer question using context. If context is insufficient, respond accordingly. Avoid personal details and specified restricted topics.

5. **Template 5:**
Answer question using context. If context is insufficient, respond accordingly. Avoid personal details and specified restricted topics (marked as urgent).


**Note:** These templates help us provide features such as:

1. **Removing personal information**

2. **Answering only from the available data**

3. **Banning specific phrases or restricted topics from the generated answers**
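
Templates 4 and 5 embed the restricted-topic list with an f-string while still leaving `{context}` and `{question}` to be filled in later, which is why those two placeholders are written with doubled braces. The sketch below illustrates the mechanism with a shortened, hypothetical template; `str.format` stands in for the prompt-template class, which fills placeholders in a similar way.

```python
restricted_topics = ["refund", "subscription"]
formatted_topics = ", ".join(restricted_topics)

# The f-string fills {formatted_topics} immediately; {{context}} and
# {{question}} collapse to single-brace placeholders that are filled later.
template = f"""Avoid discussing the following topics: {formatted_topics}.

### CONTEXT:
{{context}}

### QUESTION:
{{question}}
"""

# str.format stands in here for the prompt-template class used in the notebook
prompt = template.format(context="<USER> Where is my order", question="Where is my order?")
print(prompt)
```

With single braces inside the f-string, Python would try to evaluate `context` and `question` as variables when the template is defined, instead of leaving them as placeholders.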

In [15]:
# Define your list of restricted topics
restricted_topics_list = ["refund", "subscription"]
formatted_restricted_topics = ", ".join(restricted_topics_list)

# Template1
prompt_template1 = """
### [INST] Instruction: "Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the context and requires some latest information to be updated, print 'Sorry Not Sufficient context to answer query'":

{context}

### QUESTION:
{question} [/INST]
"""

# Template2
prompt_template2 = """
### [INST] Instruction:
Answer the question below using the provided context. If the context contains relevant information to answer the question, use that information to provide a response. If the context does not contain the necessary information or is insufficient to form a complete answer, respond with 'Sorry, not sufficient context to answer this query.'

### CONTEXT:
{context}

### QUESTION:
{question}

### ANSWER:
"""

# Template3
prompt_template3 = """
### [INST] Instruction:
Answer the question below using the provided context. If the context contains relevant information to answer the question, use that information to provide a response. If the context does not contain the necessary information or is insufficient to form a complete answer, respond with 'Sorry, not sufficient context to answer this query.' Do not share any specific personal details such as names, addresses, or email addresses in your response.

### CONTEXT:
{context}

### QUESTION:
{question}

### ANSWER:
"""

# Template4
prompt_template4 = f"""
### [INST] Instruction:
Answer the question below using the provided context. If the context contains relevant information to answer the question, use that information to provide a response. If the context does not contain the necessary information or is insufficient to form a complete answer, respond with 'Sorry, not sufficient context to answer this query.' Do not share any specific personal details such as names, addresses, or email addresses. Additionally, avoid discussing the following topics: {formatted_restricted_topics}.

### CONTEXT:
{{context}}

### QUESTION:
{{question}}

### ANSWER:
"""

# Template5
prompt_template5 = f"""
### [URGENT INSTRUCTION]
Do not discuss any of the following topics in your response: {formatted_restricted_topics}. This is a critical requirement.

### [INST] Instruction:
Answer the question below using the provided context. If the context does not contain the necessary information or is insufficient to form a complete answer, respond with 'Sorry, not sufficient context to answer this query.' Do not share any specific personal details such as names, addresses, or email addresses.

### CONTEXT:
{{context}}

### QUESTION:
{{question}}

### ANSWER:
"""


# **H. LLM Chain**


**Here we use the LLM chain architecture, which allows easy integration of components such as retrievers, prompts, and language models.
Prompts provide structured input to the language model, guiding it toward coherent and contextually relevant responses.**

In [16]:
# Define the prompt template using PromptTemplate
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template3,  # Specify the prompt template to use
)

# Initialize the LLMChain with the specified language model (mistral_llm) and prompt template
llm_chain = LLMChain(llm=mistral_llm, prompt=prompt)

# Define the RAG (Retrieval-Augmented Generation) chain as a pipeline of components
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}  # Define the input variables and their sources
    | llm_chain  # Connect the retriever and language model chain
)

# Invoke the RAG chain with a sample question ("How can we cancel our subscription?")
rag_chain.invoke("How can we cancel our subscription?")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


{'context': [Document(page_content='<USER> Cancel subscription Hello How can I cancel my subscription ', metadata={'sentiments': '1', 'topicAspects': 'gee email michellegeehomesgmailcom chrome browser version email michellegeehomesgmailcom gatsbytoken michelle gee email cancel subscription single email michellegeehomesgmailcom phone', 'topics': 'cancel subscription switch to/from vegan'}),
  Document(page_content='<USER> I want to cancel my subscription and need help to do so Hello I do not see a place where I can cancel How do I cancel the subscription manually <AGENT> Hi Catherine Thank you for reaching out Were happy to help with your inquiry It looks like you found your way around our website and canceled your subscription online and we are confirming your cancellation now  If you change your mind down the road you can reactivate this subscription by logging into your account or requesting we restart it for you Please dont hesitate to reach out with any further questions or concern
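
The output above shows that the context reaches the prompt as a list of Document objects, metadata included, and some of that metadata contains email addresses. One possible refinement, sketched here under assumptions, is to join only the page text into a single context string before it reaches the prompt. `Doc` is a hypothetical stand-in for the retrieved document type, and `format_docs` is an illustrative helper name.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs):
    """Join the text of retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)

docs = [
    Doc("<USER> How can I cancel my subscription", {"sentiments": "1"}),
    Doc("<USER> I want to cancel <AGENT> We are confirming your cancellation", {"sentiments": "2"}),
]
context = format_docs(docs)
print(context)
```

In the chain, a helper like this would sit between the retriever and the prompt (for example as `retriever | format_docs` for the "context" entry), so the model sees clean conversation text rather than the printed form of a Python list.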

In [17]:
torch.cuda.empty_cache()

# **I. Chat Interface**


**Using Gradio to create a chat interface:**

Gradio offers a user-friendly framework for building interactive, customizable chat interfaces that connect directly to the model behind our conversational agent.
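
The function handed to Gradio only has to map a question string to an answer string. The sketch below shows one way such a handler could be built and, as an assumption, trims the echoed prompt if the generated text still contains it. `extract_answer`, `make_handler`, and `fake_chain` are hypothetical names; `fake_chain` is a stub standing in for the real chain.

```python
ANSWER_MARKER = "### ANSWER:"

def extract_answer(generated_text):
    """Return only the text after the last answer marker, if present."""
    _, found, answer = generated_text.rpartition(ANSWER_MARKER)
    return answer.strip() if found else generated_text.strip()

def make_handler(chain_invoke):
    """Build a Gradio-style handler from any callable that returns {'text': ...}."""
    def handler(user_input):
        response = chain_invoke(user_input)
        return extract_answer(response["text"])
    return handler

# Hypothetical stub standing in for the real chain
def fake_chain(question):
    return {"text": f"### QUESTION:\n{question}\n\n### ANSWER:\nPlease contact support."}

handler = make_handler(fake_chain)
print(handler("How do I cancel?"))
```

Keeping the handler independent of the chain object also makes it easy to test the interface logic without loading the model.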


In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

!pip install gradio==3.48.0

In [19]:
import gradio as gr

# Define the function to handle user inputs and return responses
def chatbot_interface(user_input):
    response = rag_chain.invoke(user_input)
    answer = response['text']
    return answer

# Create the Gradio interface
iface = gr.Interface(
    fn=chatbot_interface,
    inputs=gr.Textbox(),
    outputs=gr.Textbox(),
)

# Launch the interface and create a public share link
iface.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://7dda5ff4110a688c2f.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




# **Examples**

<img src='https://drive.google.com/uc?id=121vv-cWwPPyqcd9UIFTTu7WM8bfgdpvC'>



<img src='https://drive.google.com/uc?id=1Yfi6Vd44Iqfy4FeRTjgOcG6UYFwOXwyA'>







<img src='https://drive.google.com/uc?id=1uVV_AfwQC5wIi89cVlXTrMvv6VcfPsLN'>


<img src='https://drive.google.com/uc?id=12wBAYhz8jyBcmRyD6qEYqsXnWpcL-Tno'>
