# RAG-Based Model for HR Policy Questions Using LangChain

The language used in a few clauses was ambiguous, so I wanted to search all related policy documents for clauses relevant to my doubt. I built a model that uses RAG to retrieve the top K relevant clauses (those with a similarity score >= a defined threshold).

Each step is explained below (yes, I redacted the query information).

In [1]:
# Loading
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import PyPDFLoader

# Text splitting, embeddings, vector store
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS



## Load Data, Doc Transformation
DirectoryLoader walks through the directory, with loader_cls=PyPDFLoader since all my files are PDFs. show_progress=True because why not?

The text is then split into chunks using RecursiveCharacterTextSplitter.


In [None]:
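docs_dir = "./hr_docs/"  # folder containing the HR policy PDFs (avoids shadowing the built-in dir)

# Recursively load every PDF; PyPDFLoader yields one Document per page
loader = DirectoryLoader(path=docs_dir, glob='**/*.pdf', loader_cls=PyPDFLoader, show_progress=True)
docs = loader.load()  # load only; chunking is done explicitly with the text splitter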

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
docs = text_splitter.split_documents(docs)
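To make chunk_size and chunk_overlap concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not what RecursiveCharacterTextSplitter does internally (that splitter prefers to break on separators such as paragraphs and newlines before falling back to smaller pieces); `sliding_chunks` is a hypothetical helper, and the small sizes are chosen only so the result is easy to read.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 150) -> list[str]:
    """Cut text into windows of chunk_size characters.

    Each window shares its first chunk_overlap characters with the end of
    the previous window, so a clause cut at a boundary still appears whole
    in at least one chunk (as long as it is shorter than the overlap).
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
```

With chunk_size=1000 and chunk_overlap=150 as in the cell above, consecutive chunks would share up to 150 characters in the same way.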

## Text Embeddings
> Model used: all-MiniLM-L6-v2

In [None]:
modelPath = "sentence-transformers/all-MiniLM-L6-v2"

embeddings = HuggingFaceEmbeddings(
    model_name=modelPath,
    model_kwargs={'device':'cpu'},
    encode_kwargs={'normalize_embeddings': False}
)
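A note on normalize_embeddings: as far as I know, LangChain's FAISS store ranks by L2 (Euclidean) distance by default. For unit-length vectors, squared L2 distance equals 2 - 2 * cosine similarity, so ranking by distance and ranking by cosine agree. The pure-Python sketch below checks that identity on toy vectors; the helper names are hypothetical and the vectors are made up.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors of any length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a, b = [3.0, 4.0], [1.0, 0.0]
ua, ub = l2_normalize(a), l2_normalize(b)

# For unit vectors: ||ua - ub||^2 == 2 - 2 * cos(a, b)
identity_holds = math.isclose(squared_l2(ua, ub), 2 - 2 * cosine(a, b))
print(identity_holds)
```

If cosine-style ranking is what you want, setting 'normalize_embeddings': True is one way to get it with an L2 index.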

## Vector Stores
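FAISS serves as the vector store: FAISS.from_documents embeds every chunk and builds an in-memory index over the resulting vectors.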

In [None]:
db = FAISS.from_documents(docs, embeddings)
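Conceptually, a flat (exhaustive) index answers a query by comparing the query vector against every stored vector and returning the k closest. The sketch below shows that idea in plain Python on toy 2-D vectors. `flat_search` is a hypothetical helper, not the FAISS API; the real library does this in optimized native code.

```python
import heapq


def flat_search(query, vectors, k=4):
    """Return (index, squared L2 distance) for the k stored vectors closest to query."""
    distances = [
        (sum((q - v) ** 2 for q, v in zip(query, vec)), idx)
        for idx, vec in enumerate(vectors)
    ]
    # nsmallest sorts by distance first, then by index on ties
    return [(idx, dist) for dist, idx in heapq.nsmallest(k, distances)]


toy_vectors = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0]]
hits = flat_search([0.0, 0.1], toy_vectors, k=2)
print(hits)
```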

In [None]:
question = "Rules regarding [REDACTED]."
searchDocs = db.similarity_search(question)
print(searchDocs[0].page_content)

In [None]:
question = "Regulations Concerning IP."
searchDocs = db.similarity_search(question)
print(searchDocs[0].page_content)
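The cells above print only the single best match. To get the top K clauses that also pass a cutoff, one option is similarity_search_with_score, which returns (Document, score) pairs. A minimal sketch follows, assuming the default FAISS setup where the score is a distance (lower means more similar), so the filter keeps scores <= the cutoff rather than >=. `clauses_within_distance` is a hypothetical helper, and max_distance=1.0 is an arbitrary placeholder that would need tuning on real queries.

```python
def clauses_within_distance(db, question, k=4, max_distance=1.0):
    """Fetch the k nearest chunks, then keep those with distance <= max_distance.

    db is expected to be a vector store exposing similarity_search_with_score,
    such as the FAISS store built earlier. max_distance is an assumed cutoff.
    """
    # Each result is a (Document, score) pair; here the score is treated as
    # a distance, so smaller values mean closer matches.
    results = db.similarity_search_with_score(question, k=k)
    return [(doc, score) for doc, score in results if score <= max_distance]
```

With the store from this notebook, usage would look like `for doc, score in clauses_within_distance(db, question, k=5, max_distance=0.8): print(score, doc.page_content)`.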