# **Operation HelpSphere**
SDS Monthly Mission-0002

# Objectives:
1. Build an AI-powered chatbot capable of answering user queries accurately and concisely.
2. Enhance chatbot responses using Retrieval-Augmented Generation (RAG) by integrating company documents.
3. Maintain context and conversation flow with dynamic memory handling.
4. Provide a user-friendly interface for seamless interaction using Gradio.

# Approach:
1. **Utilize a Pretrained LLM (Llama 2):** Employ Llama 2's capabilities for natural language understanding and response generation.
2. **Implement a RAG Workflow:** Use vector embeddings of company documents to retrieve relevant context for each user query.
3. **Custom Prompting:** Design a specialized prompt that grounds responses in the retrieved context and discourages generic answers.
4. **Dynamic Conversational Memory:** Keep track of chat history to enable contextual responses.
5. **Interactive Interface:** Integrate the solution into a Gradio-powered chatbot for real-time user interactions.
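The RAG workflow above reduces to four steps: embed the document chunks, embed the query, rank the chunks by similarity, and paste the best matches into the prompt ahead of the question. The following is a minimal, standard-library-only sketch of that loop, assuming plain word-count vectors in place of the all-MiniLM-L6-v2 embeddings and a Python list in place of Chroma; the function names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical and not part of the notebook.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for all-MiniLM-L6-v2)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (ties keep their input order)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], history: list[str]) -> str:
    """Assemble context + history + Llama-2-style [INST] question, as the notebook's template does."""
    context = "\n".join(retrieve(query, chunks))
    history_text = "\n".join(history)
    return (
        "Answer only from the context and conversation below.\n"
        f"Context:\n{context}\n\n"
        f"Conversation history:\n{history_text}\n\n"
        f"<s>[INST] {query} [/INST]"
    )


chunks = [
    "Import duties not included in shipping cost. Buyer responsible for customs fees.",
    "Free express shipping on all orders. Priority processing.",
    "Return shipping labels provided for defective items.",
]
print(retrieve("who pays customs fees", chunks, k=1))
```

Word counts only match exact words, which is why a real embedding model is used in the notebook: it also ranks paraphrases (for example "import taxes" against "customs fees") close together.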

# Main Components:
1. **Model and Tokenizer:**
    *   Llama-2-7b-chat-hf
2. **Embedding Model:**
    *   all-MiniLM-L6-v2
3. **RAG Vector Database:**
    *   Chroma
4. **Chain:**
    *   LangChain
    *   Custom Memory
    *   Custom Prompt Template









In [2]:
!pip install accelerate peft==0.13.2 bitsandbytes transformers trl==0.12.0 langchain-experimental langchain-huggingface langchain_chroma gradio


In [3]:
import re
import warnings
from typing import Any, Dict

import torch
import bitsandbytes as bnb  # needed if the saved checkpoint is bitsandbytes-quantized
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_huggingface import HuggingFacePipeline, HuggingFaceEmbeddings
from langchain.prompts import PromptTemplate
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.memory import BaseMemory
from langchain_chroma import Chroma

warnings.filterwarnings("ignore")

In [4]:

chat_model = '/content/drive/MyDrive/SDS MM#00002/models/chat/model'          # Llama-2-7b-chat-hf model files
chat_token = '/content/drive/MyDrive/SDS MM#00002/models/chat/token'          # Llama-2-7b-chat-hf tokenizer files
company_files = '/content/drive/MyDrive/SDS MM#00002/documents'               # Company documents used for RAG
chat_vectorDB = '/content/drive/MyDrive/SDS MM#00002/models/chat/vectorDB'    # Persisted vector database of the company documents

# Classes, Functions and Template

In [5]:
# Prompt template
from pydantic import Field

init_prompt = '''You are a helpful assistant used in a chatbot. Answer accurately and concisely.
Here is the relevant information retrieved from the knowledge base:
{context}\n
Conversation history:\n
{chat_history}\n
Answer the user's question based only on the above context and conversation.
Do not include phrases like "Based on the information provided in the chatBot's knowledge base".
If you cannot find the answer to the user's question in the context or the chat history, your answer should be:
I don't have an answer to your question, as it is not part of the chat history or the knowledge base.
{question}'''

custom_prompt = PromptTemplate(
    input_variables=["context", "question", "chat_history"],
    template=init_prompt,
)

# Memory class
class CustomMemory(BaseMemory):
    """Custom memory class for storing the chat history as a list of turns."""

    # default_factory gives each instance its own dict. The history starts empty:
    # the instructions already live in custom_prompt, so they are not repeated here.
    memories: Dict[str, Any] = Field(default_factory=lambda: {"chat_history": []})

    @property
    def memory_variables(self) -> list:
        """Memory keys this class provides (BaseMemory declares this as a property)."""
        return ["chat_history"]

    def load_memory_variables(self, inputs: dict) -> dict:
        """Return the stored chat history as a list of turns."""
        return {"chat_history": self.memories["chat_history"]}

    def buffer_as_str(self) -> str:
        """Return the chat history as one newline-separated string."""
        return "\n".join(self.memories["chat_history"])

    def save_input(self, inputs: dict) -> None:
        """Append the user's question to the chat history."""
        user_input = inputs.get("question", "")
        self.memories["chat_history"].append(f"User: {user_input}")

    def save_context(self, outputs: dict) -> None:
        """Append the model's answer to the chat history.

        Note: this deliberately differs from BaseMemory.save_context(inputs, outputs);
        it is only called from prompting() below.
        """
        ai_response = outputs.get("answer", "")
        self.memories["chat_history"].append(f"AI: {ai_response}")

    def clear(self) -> None:
        """Clear the chat history."""
        self.memories["chat_history"] = []

    def update_history(self, history) -> None:
        """Replace the chat history; it must stay a list so later append() calls work."""
        if isinstance(history, str):
            history = [history]
        self.memories["chat_history"] = list(history)

# Prompting function
def prompting(pipe, memory, chain, prompt, retriever, max_tokens=4000):
    """Retrieve context, run the chain, store the turn, and trim the history."""
    history = memory.buffer_as_str()  # a string renders cleanly in the template (a list would print as ['...'])
    retrieved_docs = retriever.invoke(prompt)  # get_relevant_documents() is deprecated
    context = "\n".join(doc.page_content for doc in retrieved_docs)
    response = chain.invoke({'context': context,
                             'question': f'<s>[INST] {prompt} [/INST]',
                             'chat_history': history})
    memory.save_input({'question': prompt})

    # The pipeline echoes the whole prompt, so keep only the text after the last [/INST]
    pointer = list(re.finditer(r'\[/INST\]', response))[-1].end()
    response = response[pointer:].strip()
    memory.save_context({'answer': response})

    # Keep the history within max_tokens. Llama 2's 4096-token window is shared with the
    # template, the retrieved context and the generated answer, so a smaller limit may be safer.
    tokenized = pipe.tokenizer.encode(memory.buffer_as_str(), add_special_tokens=False)
    if len(tokenized) >= max_tokens:
        buffer = pipe.tokenizer.decode(tokenized[-max_tokens:])
        memory.update_history([buffer])  # wrap in a list; a bare string would break append()
    return response
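`prompting` depends on two pieces of plain string and list logic: cutting the model's reply out of the echoed prompt, and keeping the history under a token budget. The following is a minimal sketch of both under stated assumptions: whitespace-separated words stand in for Llama 2 tokenizer tokens, and the helper names `extract_answer` and `trim_history` are hypothetical. Unlike the notebook, which decodes the last `max_tokens` token IDs, this version drops whole turns from the front.

```python
import re
from typing import List

INST_END = re.compile(r"\[/INST\]")


def extract_answer(generated: str) -> str:
    """Return the text after the last [/INST] marker, i.e. the model's reply."""
    matches = list(INST_END.finditer(generated))
    if not matches:  # no marker: assume the pipeline returned only the new text
        return generated.strip()
    return generated[matches[-1].end():].strip()


def trim_history(history: List[str], max_tokens: int) -> List[str]:
    """Drop the oldest turns until the history fits in max_tokens.

    Whitespace-separated words stand in for real tokenizer tokens here.
    """
    trimmed = list(history)
    while trimmed and sum(len(turn.split()) for turn in trimmed) > max_tokens:
        trimmed.pop(0)
    return trimmed


print(extract_answer("...context... <s>[INST] Hi [/INST] Hello there!"))
```

Dropping whole turns means the model never sees a history that starts mid-sentence, which can happen when the token-level cut lands inside a turn.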

# Loading Model, Tokenizer, Embeddings and Creating Vector Database

In [6]:
# Model
try:
    model = AutoModelForCausalLM.from_pretrained(chat_model, device_map='auto')
    model_load = True
    print('Model Loaded Successfully')
except Exception as e:
    model_load = False
    print(f'Failed to load model. Error: {e}')

# Tokenizer (device_map only applies to the model, so it is not passed here)
try:
    tokenizor = AutoTokenizer.from_pretrained(chat_token)
    token_load = True
    print('Tokenizor Loaded Successfully')
except Exception as e:
    token_load = False
    print(f'Failed to load tokenizer. Error: {e}')

# Embeddings and vector database
try:
    # Load the company documents
    company_doc = TextLoader(company_files + '/company.txt', encoding='utf-8').load()
    products_doc = TextLoader(company_files + '/products.txt', encoding='utf-8').load()
    shipping_doc = TextLoader(company_files + '/shipping.txt', encoding='utf-8').load()

    # Split the documents into 800-character chunks and merge all chunks into one list
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,        # chunk size (characters)
        chunk_overlap=160,     # overlap between consecutive chunks (characters)
        add_start_index=True)  # store each chunk's start offset in its metadata
    all_chunks = []
    for doc in (company_doc, products_doc, shipping_doc):
        all_chunks.extend(text_splitter.split_documents(doc))

    # Initialize the vector database, add the chunks, and create the retriever.
    # Deterministic IDs (source file + start offset) let Chroma upsert on re-runs instead of
    # adding duplicate chunks. Chunks stored by earlier runs under random IDs remain, so the
    # persist directory may need to be rebuilt once.
    vector_store = Chroma(
        persist_directory=chat_vectorDB,
        embedding_function=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"))
    ids = [f"{chunk.metadata['source']}:{chunk.metadata['start_index']}" for chunk in all_chunks]
    vector_store.add_documents(all_chunks, ids=ids)
    retriever = vector_store.as_retriever()
    vector_db = True
except Exception as e:
    vector_db = False
    print(f'Vector Database Failure: {e}')

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


Model Loaded Successfully
Tokenizor Loaded Successfully
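To make `chunk_size=800` and `chunk_overlap=160` concrete: consecutive chunks start 640 characters apart, so the last 160 characters of one chunk reappear at the start of the next. The following is a simplified fixed-window sketch; it is not `RecursiveCharacterTextSplitter`, which first tries to break on separators such as blank lines, newlines and spaces, so its real boundaries differ. The hypothetical `chunk_id` helper shows the idea behind deterministic IDs: the same file, offset and text always hash to the same ID, so re-indexing can overwrite rather than duplicate. The retrieved context printed at the end of this notebook repeats the "## International Orders" section, which is consistent with the same chunks having been added more than once.

```python
import hashlib
from typing import List, Tuple


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 160) -> List[Tuple[int, str]]:
    """Split text into fixed-size character windows; returns (start_index, chunk) pairs."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this many characters after the previous one
    chunks: List[Tuple[int, str]] = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
        start += step
    return chunks


def chunk_id(source: str, start: int, chunk: str) -> str:
    """Deterministic ID: the same file, offset and content always give the same ID."""
    return hashlib.sha256(f"{source}:{start}:{chunk}".encode("utf-8")).hexdigest()[:16]


for start, chunk in chunk_text("x" * 2000):
    print(start, len(chunk), chunk_id("shipping.txt", start, chunk))
```

For a 2,000-character document this yields windows starting at 0, 640 and 1280, the last one only 720 characters long.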



# Initializing Model, Memory and Chain


In [24]:
try:
    # temperature and top_p only take effect when sampling is enabled (do_sample=True)
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizor,
                    max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
    llm = HuggingFacePipeline(pipeline=pipe)
    memory = CustomMemory()
    chain = custom_prompt | llm
    chain_load = True
except Exception as e:
    chain_load = False
    print(f'Model Initialization Failure: {e}')

Device set to use cuda:0
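`temperature=0.7` and `top_p=0.9` shape how each next token is sampled. Roughly: the logits are divided by the temperature (values below 1 sharpen the distribution), converted to probabilities, and then only the smallest set of most likely tokens whose probabilities add up to at least `top_p` is kept and renormalized (nucleus sampling). The following is a minimal sketch over a toy three-token vocabulary, assuming a dict of logits instead of real model output; `nucleus` and `sample_next_token` are hypothetical helpers, not transformers APIs.

```python
import math
import random
from typing import Dict, Optional


def nucleus(logits: Dict[str, float], temperature: float, top_p: float) -> Dict[str, float]:
    """Temperature-scaled softmax, then keep the smallest top set whose mass reaches top_p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda kv: kv[1], reverse=True)
    kept: Dict[str, float] = {}
    mass = 0.0
    for tok, p in ranked:
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    # Renormalize the surviving tokens so their probabilities sum to 1
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}


def sample_next_token(logits: Dict[str, float], temperature: float = 0.7, top_p: float = 0.9,
                      rng: Optional[random.Random] = None) -> str:
    """Draw one token from the temperature + top-p filtered distribution."""
    dist = nucleus(logits, temperature, top_p)
    rng = rng or random.Random()
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


toy_logits = {"a": 2.0, "b": 1.0, "c": -3.0}
print(nucleus(toy_logits, temperature=0.7, top_p=0.9))
print(sample_next_token(toy_logits, rng=random.Random(0)))
```

With these toy logits the unlikely token `"c"` falls outside the 0.9 nucleus and can never be sampled, while `"a"` and `"b"` remain possible.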


# Testing Chat

In [8]:
# Console test loop (disabled; the Gradio app below is used instead)
# if chain_load and vector_db and token_load and model_load:
#     while True:
#         prompt = input('How can I help you? ')
#         if prompt == 'exit':
#             break
#         print(prompting(pipe, memory, chain, prompt, retriever))
# else:
#     print('Terminating app')
#     exit()

# Gradio App

In [25]:
def chatbot(prompt, history):
    try:
        # Call your prompting function with the user's input
        response = prompting(pipe, memory, chain, prompt, retriever)
        # Add the user input and model's response to the history
        history.append((prompt, response))
        return history, history
    except Exception as e:
        # Handle any errors and return an error message
        history.append((prompt, "An error occurred: " + str(e)))
        return history, history

# Define the Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# 🤖 ChatBot Interface")

    # Chatbot display and input
    chatbot_interface = gr.Chatbot(label="AI Assistant")
    user_input = gr.Textbox(label="Your Query", placeholder="Type your question here...", lines=1)
    clear_btn = gr.Button("Clear Chat")

    # State to store chat history
    history_state = gr.State([])

    def chatbot_and_clear_input(prompt, history):
        # Call the chatbot function
        updated_history, new_history = chatbot(prompt, history)
        return updated_history, new_history, ""

    # Define button actions
    user_input.submit(
        chatbot_and_clear_input,  # Wrap chatbot to also clear input
        inputs=[user_input, history_state],  # Pass user input and history
        outputs=[chatbot_interface, history_state, user_input] ) # Clear user_input on submission

    def clear_chat():
        # Reset the model's conversation memory as well as the visible chat history
        memory.clear()
        return [], []

    clear_btn.click(
        clear_chat,
        inputs=[],
        outputs=[chatbot_interface, history_state]
    )

# Launch the Gradio app
demo.launch(debug=True)

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://c85660fff3e9c9c3f6.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
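The `chatbot` function stores the history as `(user, bot)` tuples. Gradio's `gr.Chatbot` also supports a newer `type="messages"` format in which each entry is a dict with `role` and `content` keys. A minimal pure-Python sketch of converting to that format, assuming the notebook's tuple history as input; `tuples_to_messages` and `append_turn` are hypothetical helpers, and actually switching the app would also require creating the component with `gr.Chatbot(type="messages")`, which the notebook does not do.

```python
from typing import Dict, List, Tuple


def tuples_to_messages(history: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Convert [(user_text, bot_text), ...] pairs into role/content message dicts."""
    messages: List[Dict[str, str]] = []
    for user_text, bot_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": bot_text})
    return messages


def append_turn(messages: List[Dict[str, str]], prompt: str, response: str) -> List[Dict[str, str]]:
    """Return a new history with one user/assistant exchange appended (input is not mutated)."""
    return messages + [{"role": "user", "content": prompt},
                       {"role": "assistant", "content": response}]


history = [("What are the provided services?", "Free express shipping for members.")]
print(tuples_to_messages(history))
```

Returning a new list from `append_turn` instead of mutating the stored one keeps the `gr.State` value from being changed behind Gradio's back.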








In [22]:
prompt = 'What are the provided services?'
retrieved_docs = retriever.invoke(prompt)  # get_relevant_documents() is deprecated
context = "\n".join(doc.page_content for doc in retrieved_docs)
response = chain.invoke({'context': context,
                         'question': f'<s>[INST] {prompt} [/INST]',
                         'chat_history': ''})

In [23]:
print(response)

You are a helpful assistant used in a chatBot. Answer accurately and concisely.
Here is the relevant information retrieved from the knowledge base:
## International Orders

### Customs & Duties
- Import duties not included in shipping cost
- Buyer responsible for customs fees
- Local taxes may apply
- Delivery times may vary due to customs

### International Returns
- Return shipping labels provided for defective items
- Customer pays return shipping for non-defective items
- 45-day return window for international orders

## TechStyle Plus Benefits

### Member Privileges
- Free express shipping on all orders
- Priority processing
- Exclusive holiday shipping rates
- Early access to pre-orders

### Premium Support
- Dedicated shipping support line
- Priority claim processing
- Special handling requests accepted

## Contact Information
## International Orders

### Customs & Duties
- Import duties not included in shipping cost
- Buyer responsible for customs fees
- Local taxes may apply
-