<a href="https://colab.research.google.com/github/cwattsnogueira/nestle-hr-assistant/blob/main/Unit6FinalProjectOpenAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Essentials and Applications of Generative AI: Course End Projects

Carllos Watts-Nogueira

Due:

Crafting an AI-Powered HR Assistant: A Use Case for Nestlé’s HR Policy Documents

**Overview**

The project builds a conversational chatbot that answers user questions from the contents of a PDF document. It involves extracting text and converting it into numerical vectors, setting up a retrieval mechanism to find answers, and designing a user-friendly chatbot interface with Gradio. It also covers structuring prompts for clear communication and deploying the chatbot so that it is accessible and efficient in practical use.

# AI-Powered HR Assistant — OpenAI Version

This project builds a chatbot that answers questions based on Nestlé's HR policy document. It uses OpenAI's GPT-3.5 Turbo for generating responses, OpenAI embeddings for document vectorization, ChromaDB for retrieval, and Gradio for the user interface.

Use Case: Nestlé HR Policy Documents
Built with OpenAI GPT-3.5 Turbo, LangChain, ChromaDB, and Gradio.

In [None]:
# Install all necessary packages in one go
!pip install -U langchain-openai langchain langchain-community openai chromadb gradio pypdf --quiet

## Project Overview

This notebook builds a conversational HR assistant using Nestlé’s internal policy documents.  
It uses OpenAI’s GPT-3.5 Turbo for generation, LangChain for retrieval, and Gradio for the interface.

Key components:
- PDF ingestion and chunking
- Embedding with OpenAI
- Vector search with ChromaDB
- Conversational QA with GPT-3.5
- Gradio chatbot interface

In [None]:
from google.colab import userdata
import os

# Load your API key securely from Colab's userdata
api_key = userdata.get("OPENAI_API_KEY")
os.environ["OPENAI_API_KEY"] = api_key
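
The cell above assumes the secret already exists in Colab. As a minimal sketch, a guard like the one below could fail early with a readable message when the key is missing. `require_api_key` is a hypothetical helper (it is not part of Colab or the OpenAI SDK), and it reads only from `os.environ` so that it also runs outside Colab.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the named environment variable or fail with a clear message.

    Hypothetical helper for illustration; not part of Colab or OpenAI.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to Colab's secrets and copy it "
            "into os.environ before building the chain."
        )
    return value
```

Calling `require_api_key()` right after the assignment to `os.environ` would surface a missing key before any embedding or chat request is made.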

In [None]:
from langchain_community.document_loaders import PyPDFLoader

# Load the PDF file
loader = PyPDFLoader("/content/the_nestle_hr_policy_pdf_2012.pdf")
documents = loader.load()

In [None]:
# The splitter lives in the langchain-text-splitters package (installed with langchain)
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # Increased for better context
    chunk_overlap=100     # Reduced overlap
)
chunks = text_splitter.split_documents(documents)
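
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping character windows. It is an assumption-laden illustration, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to break on separators such as paragraphs, lines, and spaces, while the hypothetical `sliding_chunks` below cuts at fixed character offsets.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share an overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(demo)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each window repeats the last three characters of the previous one, which is what lets a sentence that straddles a boundary still appear intact in at least one chunk.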

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Create embeddings and store them in ChromaDB
embedding_model = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embedding_model)
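
Conceptually, a vector store ranks chunks by how close their embeddings are to the query embedding. The sketch below illustrates that idea with cosine similarity over made-up three-dimensional vectors. The vectors, the chunk labels, and the helper names are all hypothetical, and the distance metric ChromaDB actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index: (chunk label, made-up embedding). Real embeddings are much longer.
toy_index = [
    ("maternity leave chunk", [0.9, 0.1, 0.0]),
    ("training chunk", [0.1, 0.9, 0.0]),
    ("remuneration chunk", [0.0, 0.2, 0.9]),
]
nearest = top_k([1.0, 0.2, 0.0], toy_index, k=1)
print(nearest)  # ['maternity leave chunk']
```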

In [None]:
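from langchain.chains import RetrievalQA
# ChatOpenAI is provided by the langchain-openai package installed above;
# importing it from langchain.chat_models is deprecated
from langchain_openai import ChatOpenAI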

# Load GPT-3.5 Turbo and build the QA chain
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.3)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    chain_type="stuff"
)
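
With `chain_type="stuff"`, all retrieved chunks are placed into a single prompt that is sent to the model in one call. The following is a rough sketch of that idea only. The prompt wording and the `build_stuff_prompt` helper are hypothetical and do not reproduce LangChain's actual template.

```python
def build_stuff_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Concatenate every retrieved chunk into one prompt (the "stuff" idea)."""
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return (
        "Use only the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "Does Nestlé support breastfeeding at work?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(prompt)
```

Because every chunk goes into one prompt, the total length of the retrieved text has to fit within the model's context window, which is one reason chunk size and the number of retrieved chunks matter.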

In [None]:
# Verified fallback summary for key HR topics
fallback_text = """
Nestlé’s Maternity Protection Policy includes five pillars: employment protection, healthy work environment, flexible arrangements, breastfeeding support, and gender balance. Breastfeeding rooms are provided at sites with 50+ female employees.
"""

# Answer function with fallback logic
def answer_question(query):
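    # invoke() replaces the deprecated run(); RetrievalQA returns the answer under "result"
    result = qa_chain.invoke({"query": query})["result"]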
    if "I don't have specific information" in result or result.strip() == "":
        return fallback_text
    return result
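
One limitation of the function above is that it returns the maternity summary for any unanswered question, including unrelated ones. A possible refinement, sketched under assumptions, is to return the fallback only when the query matches the topic it covers. The keyword list, the `NO_ANSWER_MESSAGE` text, and the `answer_with_scoped_fallback` helper are hypothetical; `ask` stands in for whatever callable runs the QA chain.

```python
from typing import Callable

# Hypothetical keyword list; tune it to the topics the fallback text covers.
MATERNITY_KEYWORDS = ("maternity", "breastfeeding", "parent", "gender balance")
NO_ANSWER_MESSAGE = "I could not find this in the HR policy document."


def answer_with_scoped_fallback(
    query: str,
    ask: Callable[[str], str],
    fallback_text: str,
) -> str:
    """Return the model answer, or a fallback only when it fits the topic."""
    result = ask(query).strip()
    missing = not result or "I don't have specific information" in result
    if not missing:
        return result
    if any(word in query.lower() for word in MATERNITY_KEYWORDS):
        return fallback_text.strip()
    return NO_ANSWER_MESSAGE
```

In the notebook, `ask` could be a small wrapper around the QA chain call, and the returned function result could be passed to Gradio in the same way as `answer_question`.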

In [None]:
import gradio as gr

gr.Interface(
    fn=answer_question,
    inputs=gr.Textbox(lines=1, placeholder="Ask your HR question here..."),
    outputs=gr.Textbox(lines=10, label="Answer"),
    title="Nestlé HR Assistant",
    description="Ask any question about Nestlé’s HR policies. Powered by GPT-3.5 and LangChain.",
    examples=[
        "What is Nestlé’s maternity leave policy?",
        "Does Nestlé support breastfeeding at work?",
        "What flexible work options are available for parents?",
        "How does Nestlé promote gender balance?"
    ]
).launch()





# Final Report: Building an AI-Powered HR Assistant for Nestlé  
**Bootcamp Submission – AI/ML Engineering Track**  
**Student: Carllos Watts-Nogueira**


## Project Overview

This project focused on developing a conversational HR assistant capable of answering questions based on Nestlé’s internal HR policy documents. The assistant was built using OpenAI’s GPT-3.5 Turbo, LangChain’s retrieval-augmented generation pipeline, ChromaDB for vector search, and Gradio for the user interface.

The goal was to create a system that could ingest real-world documents, extract meaningful insights, and deliver accurate, policy-grounded responses to user queries, all within a clean, modular, and reproducible framework.

---

## What I Built

- **PDF ingestion** using `PyPDFLoader` to process Nestlé’s HR policy document  
- **Text chunking** with `RecursiveCharacterTextSplitter`, using `chunk_size=1000` and `chunk_overlap=100` to keep related policy text together  
- **Embeddings** generated via `OpenAIEmbeddings` and stored in `ChromaDB`  
- **Retrieval-based QA system** using LangChain’s `RetrievalQA` with GPT-3.5 Turbo  
- **Gradio chatbot interface** with example questions and an expanded output box for readability  
- **Fallback logic** that returns a verified maternity-policy summary when the chain produces no usable answer

---

## What I Learned

This project was a deep dive into the mechanics of document-based question answering. I learned how to:

- Tune chunking parameters to preserve semantic context  
- Securely manage API keys and environment variables  
- Build modular pipelines that separate ingestion, embedding, and generation  
- Handle retrieval failures gracefully with verified fallback summaries  
- Design user interfaces that balance clarity, usability, and conversational flow

I also gained insight into how large language models behave when grounded in real documents, and how to guide them toward factual, policy-aligned answers.

---

## Challenges & Solutions

- **Retrieval Misses**: Initially, the assistant failed to surface key policies (e.g., maternity leave). I addressed this by increasing the chunk size to 1000 characters and adding fallback logic that returns a verified summary when the chain has no answer.
  
- **Gradio Output Size**: The default output box was too small for long answers. I expanded it using `gr.Textbox(lines=10)` to improve readability.

- **API Key Management**: I used Colab's `userdata.get()` to load my OpenAI key, so the key never appears in the notebook source, in line with LMS standards.

---

## Final Outcome

The assistant now delivers accurate, policy-backed answers to questions about Nestlé’s maternity leave, breastfeeding support, flexible work arrangements, and gender balance initiatives. It’s modular, reproducible, and ready for deployment or extension.

This project not only meets the bootcamp requirements but also reflects my growth as an engineer who can build, debug, and refine real-world AI systems with clarity and purpose.

