# HR Assistant

## Action

This project implements an HR Assistant chatbot for Nestlé using a RAG (Retrieval-Augmented Generation) architecture. The notebook covers the following steps:

### 1. **Environment Setup**
- Import essential tools and set up OpenAI's API environment
- Load environment variables for secure API access

### 2. **Document Processing**
- Load Nestlé's HR policy using PyPDFLoader
- Split the document into manageable chunks for efficient processing
- Use RecursiveCharacterTextSplitter with a chunk size of 500 characters and an overlap of 50 characters

### 3. **Vector Database Creation**
- Embed the text chunks with OpenAI's embeddings and index the resulting vectors with FAISS
- Build a searchable knowledge base for quick document retrieval
- Save and load the vector database for future use

### 4. **RAG System Implementation**
- Build a question-answering system using the GPT-4o-mini model
- Implement a retrieval chain to find relevant document chunks
- Create a prompt template to guide the chatbot's responses

### 5. **User Interface**
- Use Gradio to build a user-friendly chatbot interface
- Enable real-time interaction and information retrieval
- Provide example questions and source citations

In [17]:
from dotenv import load_dotenv
load_dotenv()

True
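
A `True` result from `load_dotenv()` does not by itself confirm that `OPENAI_API_KEY` is defined. The sketch below is a small fail-fast check that could run right after loading the environment. The helper name `require_env` is hypothetical and not part of the original notebook, and the check only detects a missing or empty variable, not an invalid key.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper: it only checks that the variable is present and
    non-empty. It cannot tell whether the key is actually valid.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in the shell."
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` before creating any OpenAI client surfaces a missing key at setup time rather than in the middle of the embedding step.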

In [18]:
import os
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS

# LCEL building blocks
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# UI
import gradio as gr

In [19]:
PDF_PATH = "Dataset/the_nestle_hr_policy_pdf_2012.pdf"
CHAT_MODEL = "gpt-4o-mini"

In [20]:
loader = PyPDFLoader(PDF_PATH)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50, separators=["\n\n", "\n", " ", ""],
)

chunks = text_splitter.split_documents(docs)
print(f"Created {len(chunks)} chunks.")

Created 35 chunks.
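
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter. It is only an illustration of the overlap idea: `RecursiveCharacterTextSplitter` additionally tries to break on the configured separators, so its chunk boundaries will differ. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter,
    it ignores separators and cuts purely by character position.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_chunks(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each window begins with the last three characters of the previous one. The same overlap is what lets a sentence that straddles a chunk boundary still appear whole in at least one chunk.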


In [21]:

embeddings = OpenAIEmbeddings()

vectordb = FAISS.from_documents(chunks, embeddings)
vectordb.save_local("index")

AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-proj-********************************************************************************************************************************************************JZsA. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'code': 'invalid_api_key', 'param': None}}
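
Conceptually, a vector store pairs each chunk with an embedding vector and answers a query by returning the chunks whose vectors are closest to the query vector. The sketch below shows that idea with hand-written toy vectors and cosine similarity. It is not how FAISS is implemented, and the metric and index structure FAISS actually uses may differ. The names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Return the texts of the k entries most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy three-dimensional "embeddings" made up for illustration.
# Real embedding vectors have far more dimensions.
toy_index = [
    ("annual leave entitlement", [0.9, 0.1, 0.0]),
    ("performance review cycle", [0.1, 0.9, 0.0]),
    ("training and development", [0.0, 0.2, 0.9]),
]

print(top_k([1.0, 0.2, 0.0], toy_index, k=2))
```

The `k` argument plays the same role as `search_kwargs={"k": 4}` on the retriever: it caps how many chunks are handed to the model as context.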

In [None]:
# Reload the saved index from disk if needed (e.g. after a kernel restart)
vectordb = FAISS.load_local("index", embeddings, allow_dangerous_deserialization=True)
retriever = vectordb.as_retriever(search_kwargs={"k": 4})
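
To avoid re-embedding the PDF on every run, the save and load steps can be combined into a load-or-build pattern. The sketch below keeps the pattern library-agnostic by taking the load, build, and save steps as callables. The helper `load_or_build` is hypothetical and not part of the original notebook.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    path: str,
    load: Callable[[str], T],
    build: Callable[[], T],
    save: Callable[[T, str], None],
) -> T:
    """Load an index from `path` if that directory exists, else build and save it."""
    if os.path.isdir(path):
        return load(path)
    index = build()
    save(index, path)
    return index


# Possible wiring for this notebook (an assumption, left as comments because
# it needs a valid OpenAI key and the objects defined in the cells above):
#
# vectordb = load_or_build(
#     "index",
#     load=lambda p: FAISS.load_local(p, embeddings, allow_dangerous_deserialization=True),
#     build=lambda: FAISS.from_documents(chunks, embeddings),
#     save=lambda db, p: db.save_local(p),
# )
```

One caveat of this design is that a stale index is reused silently, so the `index` directory should be deleted whenever the PDF or the chunking parameters change.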

In [None]:
# Prompt: a system message with the grounding rules, plus a human turn
# that carries the question and the retrieved context.

system_msg = (
    "You are a helpful HR assistant for Nestlé. "
    "Answer ONLY using the provided context. "
    "If the answer is not in the context, say you cannot find it in the policy. "
    "Be concise and use bullet points when listing items. "
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_msg),
        ("human", "Question: {question}\n\nContext:\n{context}\n"),
    ]
)

llm = ChatOpenAI(model=CHAT_MODEL, temperature=0)
parser = StrOutputParser()

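def _format_docs(docs):
    """Join the retrieved chunks into a single context string."""
    return "\n\n".join(d.page_content for d in docs)

# The input question is sent to the retriever (to build {context})
# and passed through unchanged (to fill {question}).
rag_chain = (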
    {
        "context": retriever | _format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | parser
)
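
The data flow of the chain above can be sketched in plain Python with stand-ins for the retriever and the model. This is an illustration of the order of operations only, not of how LCEL executes runnables. It omits the system message, and the names `run_chain`, `fake_retrieve`, and `echo_model` are hypothetical.

```python
from typing import Callable


def format_docs(docs: list[str]) -> str:
    """Join retrieved chunks into one context string."""
    return "\n\n".join(docs)


def build_prompt(question: str, context: str) -> str:
    """Fill the human-turn template with the question and the context."""
    return f"Question: {question}\n\nContext:\n{context}\n"


def run_chain(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> str:
    # Step 1: the question goes to the retriever to fetch context...
    context = format_docs(retrieve(question))
    # Step 2: ...and is also kept unchanged to fill the prompt.
    prompt = build_prompt(question, context)
    # Step 3: the model turns the filled prompt into an answer string.
    return generate(prompt)


def fake_retrieve(question: str) -> list[str]:
    """Stand-in retriever that always returns the same two chunks."""
    return ["Chunk A about leave.", "Chunk B about reviews."]


def echo_model(prompt: str) -> str:
    """Stand-in model that returns the prompt so the inputs are visible."""
    return prompt


print(run_chain("What is the leave policy?", fake_retrieve, echo_model))
```

Swapping `fake_retrieve` and `echo_model` for a real retriever and chat model gives the same three steps that `rag_chain` expresses with the `|` operator.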



In [None]:

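# Small helper to create page citations.
# PyPDFLoader stores 0-based page numbers, so add 1 for display;
# chunks without page metadata are skipped rather than cited as p.1.
def pages_citation(docs):
    pages = sorted(
        {d.metadata["page"] + 1 for d in docs if d.metadata.get("page") is not None}
    )
    if pages:
        return "Sources: " + ", ".join(f"p.{p}" for p in pages)
    return "Sources: (none)"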

In [None]:
# -------- Gradio Chat Interface --------
EXAMPLE_QUESTIONS = [
    "What is the leave policy (annual, sick, parental)?",
    "How does the performance review process work?",
    "Who do I contact for benefits enrollment?",
    "What is the probation period for new employees?"
]

# Chat function compatible with gr.ChatInterface
def chat_fn(message, history):
    try:
        # Retrieve docs separately so their pages can be cited
        # (invoke replaces the deprecated get_relevant_documents)
        docs = retriever.invoke(message)
        answer = rag_chain.invoke(message)
        return answer + "\n\n" + pages_citation(docs)
    except Exception as e:
        return f"Error: {e}"

demo = gr.ChatInterface(
    fn=chat_fn,
    title="Nestlé HR Policy Assistant",
    description=(
        "Ask questions about Nestlé’s HR policy. "
        "Answers are grounded in the policy PDF loaded by this notebook. "
        "Page sources are appended below each answer."
    ),
    examples=EXAMPLE_QUESTIONS,
    textbox=gr.Textbox(placeholder="Type your HR question…", container=True, scale=7)
)


In [None]:
demo.launch()

In [None]:
!pip install nbconvert