Hi, I am Angshuman Bhattacharjee. In the follwoing code, I have tried to create a LLM based Chatbot based on some HR Policy Documents. Each section is marked down with relevant details & comments.

**Installing the necessary packages**

In [6]:
# Installing the Necessary Dependencies
#!pip install openai pypdf langchain chromadb sentence-transformers bitsandbytes accelerate

**Uploading the Policy Documents**

In [4]:
# Uploading the Policy Files
from google.colab import files
uploaded = files.upload()

Saving Leave_Policy.pdf to Leave_Policy (2).pdf
Saving Performance_Appraisal_Policy.pdf to Performance_Appraisal_Policy (2).pdf
Saving Travel_and_Reimbursement_Policy.pdf to Travel_and_Reimbursement_Policy (2).pdf


**Loading the PDF Documents**

In [5]:
# Loading the PDF Files
from langchain.document_loaders import PyPDFLoader

all_docs = []

pdf_files = [
    "Leave_Policy.pdf",
    "Performance_Appraisal_Policy.pdf",
    "Travel_and_Reimbursement_Policy.pdf"
]

for file in pdf_files:
    loader = PyPDFLoader(file)
    pages = loader.load()
    all_docs.extend(pages)

**Splitting Documents into smaller Chunks**

In [6]:
# Splittig the documents into smaller chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
split_docs = splitter.split_documents(all_docs)

**Creating a Vectorstore & Storing the Embeddings**

In [7]:
# Creating a vectorstore and storing the embeddings. For Embeddings, I am using the open source sentence-transformers/all-MiniLM-L6-v2 model for this project
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(split_docs, embedding_model, persist_directory="./chroma_db")

  embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


**Fetching the most similar/relevant chunks**

In [8]:
# Fetching the top most similar chunks (Here the contents are smaller so I selected k value as 2, which means ot will return top 2 most relevant chunks)
from langchain.chains import RetrievalQA

retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 2})

**Initializing Open Source LLM**

In [1]:
# Initializing LLM ; Note - For this project, I am using Open Source 'Mistral-7B-Instruct' Model from HuggingFace. I am quantizing the model as the actual model was crashing the Collab session due to RAM unavailability.
from langchain.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
import torch

token = "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"
model_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=token)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    use_auth_token=token,
    quantization_config=bnb_config,
    device_map="auto"
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, temperature=0.02)
llm = HuggingFacePipeline(pipeline=pipe)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Device set to use cuda:0
  llm = HuggingFacePipeline(pipeline=pipe)


**Initializing Langchain QA chain with LLM & the Retrieved Context**

In [12]:
# Preparing the QA Chain with the most relevant chunks that were retrieved and the LLM
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

prompt_template = PromptTemplate(
    template="""
You are an AI assistant whose task is to answer queries based on HR policy documents.
Answer the following question based only on the context provided.
If the answer is not in the context, just say "NA".

Context:
{context}

Question: {question}
Answer:""",
    input_variables=["context", "question"],
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt_template},
)

**Getting Query as input from the user & displaying results with source documents**

In [16]:
# Taking the query as input from the user and returning the answer
import re
query = input("Please enter your query :- ")

result = qa.invoke(query)

#print("Answer: ---> \n", result['result'])
raw_output = result['result']                  # It is returning answer along with the prompt, so just extracting the actual answer
match = re.search(r"Answer:\s*(.+)", raw_output, re.DOTALL)
if match:
    # Remove any leftover instruction text above the actual answer
    answer = match.group(1).strip()
else:
    answer = raw_output.strip()
print("\n Answer: ---> \n", answer)
print("\n\n\n Sources: --> ")
for doc in result['source_documents']:
    print(" -", doc.metadata.get("source"))

Please enter your query :- How many days of leave a employee can take


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



 Answer: ---> 
 An employee can take a total of 18 days of Annual Leave, 12 days of Sick Leave, 6 days of Casual Leave, 26 weeks of Maternity Leave (if female), and 10 days of Paternity Leave (if male).



 Sources: --> 
 - Leave_Policy.pdf
 - Leave_Policy.pdf
