# RAG-Based Model for Help with HR Policies

The language used in a few clauses was ambiguous, so I wanted to parse all related policy documents for clauses relevant to my doubt. I therefore built a model that uses RAG (retrieval-augmented generation) to retrieve the top K relevant clauses, i.e. those whose similarity score is at or above a defined cutoff.

Each step is explained below. (Yes, I redacted the query information.)


!pip install pypdf      
!pip install -q transformers      
!pip install langchain      
!pip install sentence_transformers      
!pip install llama_index      

In [None]:
# Imports
# Note: these paths assume the legacy (pre-0.10) llama_index package layout,
# which is where ServiceContext and llama_index.llms.HuggingFaceLLM live.
import torch

from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    VectorStoreIndex,
    get_response_synthesizer,
)
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts.prompts import SimpleInputPrompt
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.postprocessor import SimilarityPostprocessor

# Embeddings
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import HuggingFaceEmbedding, LangchainEmbedding

## Read the Documents
`SimpleDirectoryReader` recursively reads all `.pdf` files in the input directory. The result is stored in the variable `documents`.

In [None]:
loader = SimpleDirectoryReader(
    input_dir = "./contents/",
    recursive = True,
    required_exts = [".pdf"],
)

documents = loader.load_data()
print(f"Loaded {len(documents)} docs.")
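
Before loading, it can help to check which files would be picked up. The sketch below is not how `SimpleDirectoryReader` is implemented; it only mimics the recursive, extension-filtered file discovery described above using `pathlib`. The helper name `find_files` is hypothetical.

```python
from pathlib import Path


def find_files(input_dir, required_exts=(".pdf",)):
    """List files under input_dir (including subfolders) with a matching extension.

    Hypothetical helper: a stand-in for the reader's file discovery,
    not the library's own code.
    """
    root = Path(input_dir)
    # rglob("*") walks the whole directory tree below root
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in required_exts
    )
```

Calling `find_files("./contents/")` gives a quick list of the PDFs that the loader is expected to see.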

Now we define our prompt. This is a wrapper around the query typed by the user.    
> Note that we do some role-play, i.e. we give the model context about how it should behave.

In [None]:
system_prompt="""
You are an HR Manager. Your goal is to answer questions as
accurately as possible based on the instructions and context provided.
If you do not know the answer, Say 'I do not know'.
"""

query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|MANAGER|>")
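
To see what the wrapper does to a query, here is a plain-Python sketch of the formatting step. It assumes the system prompt is simply placed in front of the wrapped query; the exact concatenation inside `HuggingFaceLLM` may differ, and `build_prompt` is a hypothetical helper.

```python
SYSTEM_PROMPT = """
You are an HR Manager. Your goal is to answer questions as
accurately as possible based on the instructions and context provided.
If you do not know the answer, Say 'I do not know'.
"""

QUERY_TEMPLATE = "<|USER|>{query_str}<|MANAGER|>"


def build_prompt(query):
    """Wrap the user query and prepend the system prompt (assumed layout)."""
    wrapped = QUERY_TEMPLATE.format(query_str=query)
    return f"{SYSTEM_PROMPT.strip()}\n\n{wrapped}"


print(build_prompt("Regulations concerning IP"))
```

The model therefore sees the role description first, then the user's question between the two markers.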

## LLM using HuggingFace CLI

Here we load a Hugging Face LLM with standard parameters. We use the 7B-parameter chat version of the [Llama 2 model](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).

In [None]:
llm = HuggingFaceLLM(
    context_window = 4096,
    max_new_tokens = 256,
    generate_kwargs = {"temperature": 0.0, "do_sample": False},
    system_prompt = system_prompt,
    query_wrapper_prompt = query_wrapper_prompt,
    tokenizer_name = "meta-llama/Llama-2-7b-chat-hf",
    model_name = "meta-llama/Llama-2-7b-chat-hf",
    device_map = "auto",
    model_kwargs = {"torch_dtype": torch.float16}
)
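
As a rough sanity check on the `torch.float16` choice, the memory needed for the weights can be estimated with simple arithmetic. This assumes a nominal 7 billion parameters (the exact count differs slightly) and ignores activations and the KV cache; `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


N_PARAMS = 7e9  # nominal size, an assumption for illustration

fp16 = weight_memory_gib(N_PARAMS, 2)  # float16 uses 2 bytes per parameter
fp32 = weight_memory_gib(N_PARAMS, 4)  # float32 uses 4 bytes per parameter
print(f"float16: {fp16:.1f} GiB, float32: {fp32:.1f} GiB")
```

Under these assumptions the weights take roughly 13 GiB in float16 versus roughly 26 GiB in float32, which is why half precision is the practical option on a single GPU.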

## Embedding
We use the small BGE model from BAAI (Beijing Academy of Artificial Intelligence) for embeddings. A BERT-based model would be worth experimenting with as well.

* Index: Embeds the documents and stores the resulting vectors.
* Retriever: Scores the similarity between the query and the indexed chunks, and outputs the top k.
* Response Synthesizer: Generates the final answer from the retrieved chunks using the LLM.
* SimilarityPostprocessor: A postprocessor that filters the retrieved chunks by similarity score. Here the cutoff is 0.8 (arbitrary).
* QueryEngine: Pipelines the retriever, the similarity postprocessor, and the synthesizer.
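
The retrieve-then-filter idea in the list above can be sketched in plain Python. This is a toy version under stated assumptions: vectors are short lists of floats rather than real embeddings, similarity is cosine similarity, and the helper names `cosine_similarity` and `retrieve` are hypothetical, not the library's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, chunk_vecs, top_k=3, cutoff=0.8):
    """Return (chunk_index, score) for the top_k chunks that pass the cutoff."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    # Step 1 (retriever): keep the top_k highest-scoring chunks
    top = sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]
    # Step 2 (postprocessor): drop anything below the similarity cutoff
    return [(i, score) for score, i in top if score >= cutoff]
```

Note the order: the cutoff is applied after the top k are selected, so a query can come back with fewer than k chunks, or none at all if the cutoff is too strict for the embedding model in use.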


Question: Why use different models for the vectors and for the other components? Why not use mpnet for both?

In [None]:
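# BGE small model, passed to the vector index below
embedding_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# mpnet model wrapped for llama_index, used later in the service context
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)

# Build the vector index over the loaded documents
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embedding_model,
)

# Retrieve the 3 most similar chunks for each query
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=3,
)

# Turns the retrieved chunks into a final answer
response_synthesizer = get_response_synthesizer()

# Chain retrieval, similarity filtering (cutoff 0.8) and answer synthesis
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.8)],
)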


In [None]:
service_context = ServiceContext.from_defaults(
    chunk_size = 1024,
    llm = llm,
    embed_model = embed_model
)
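
To give a feel for what `chunk_size` controls, here is a simplified chunking sketch. It is not the library's splitter: it counts whitespace-separated words instead of tokenizer tokens, and the `overlap` value and the helper name `chunk_words` are assumptions for illustration.

```python
def chunk_words(text, chunk_size=1024, overlap=20):
    """Split text into word-based chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a clause cut at a
    boundary still appears whole in one of them.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks
```

Smaller chunks make the retrieved clauses more focused, while larger chunks keep more surrounding context in each retrieved piece; 1024 is one point on that trade-off.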

In [None]:
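# Rebuild the index with the service context (Llama 2 + mpnet embeddings).
# Note: this replaces the `index` and `query_engine` defined earlier.
index = VectorStoreIndex.from_documents(documents, service_context = service_context)
query_engine = index.as_query_engine()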

In [None]:
resp = query_engine.query("Rules regarding <REDACTED>")
resp  # The response also carries source data such as page numbers

In [None]:
resp = query_engine.query("Regulations concerning IP")
resp 
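
To see which clauses an answer was based on, the response's sources can be listed. The sketch below assumes the response exposes `source_nodes`, each with a `score` and a `node.metadata` dictionary, and that the PDF reader stores `file_name` and `page_label` in that metadata; treat those key names as assumptions and check `metadata` on your own output. `summarize_sources` is a hypothetical helper.

```python
def summarize_sources(response):
    """Collect file, page and similarity score for each retrieved chunk."""
    rows = []
    for item in response.source_nodes:
        meta = item.node.metadata
        rows.append(
            {
                # .get() returns None if a key is missing, so this
                # does not fail on readers that store other metadata
                "file": meta.get("file_name"),
                "page": meta.get("page_label"),
                "score": item.score,
            }
        )
    return rows
```

Running `summarize_sources(resp)` after a query gives a compact list that can be checked against the original policy PDFs.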