<a href="https://colab.research.google.com/github/cwattsnogueira/nestle-hr-assistant/blob/main/Unit6FinalProject04.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Essentials and Applications of Generative AI: Course End Projects

Carllos Watts-Nogueira

Crafting an AI-Powered HR Assistant: A Use Case for Nestlé’s HR Policy Documents

**Overview**

This project builds a conversational chatbot that answers user questions from the content of a PDF document. It involves extracting text and converting it into numerical vectors (embeddings), building a retrieval mechanism that finds the passages relevant to a question, and designing a user-friendly chat interface with Gradio. It also covers structuring prompts for clear communication and deploying the chatbot so that it is accessible and responsive for end users.

# 4) AI-Powered HR Assistant

* model_id = "google/flan-t5-base"

In [None]:
!pip install PyPDF2 sentence-transformers faiss-cpu transformers gradio --quiet

In [None]:
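from google.colab import userdata
import os

# The token is optional for public models, so export it only when the secret exists.
try:
    hf_token = userdata.get("hf_token_key")
except Exception:  # secret missing, or notebook access to it not granted
    hf_token = None
if hf_token:
    os.environ["HF_TOKEN"] = hf_token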

In [None]:
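from PyPDF2 import PdfReader

reader = PdfReader("/content/the_nestle_hr_policy_pdf_2012.pdf")
# extract_text() can return None or "" for pages without a text layer, so default to "".
raw_text = "\n".join((page.extract_text() or "") for page in reader.pages)
# Keep only lines longer than 50 characters; this drops page numbers and headers,
# but also short headings and short list items.
clean_text = "\n".join(line for line in raw_text.split("\n") if len(line.strip()) > 50)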

In [None]:
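def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character windows of up to chunk_size characters."""
    if not 0 <= overlap < chunk_size:
        # Otherwise the window would never advance and the loop would not terminate.
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end]
        # Skip fragments too short to carry useful context (e.g. the tail of the text).
        if len(chunk.strip()) > 100:
            chunks.append(chunk)
        start = end - overlap
    return chunks

chunks = chunk_text(clean_text)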
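
To make the window arithmetic concrete: with the defaults, each window advances by `chunk_size - overlap = 450` characters. The sketch below lists the character spans such a loop visits. `window_spans` is a hypothetical helper, not part of the notebook, and it leaves out the notebook's 100-character minimum-length filter.

```python
def window_spans(text_len, chunk_size=500, overlap=50):
    """Return (start, end) character spans of a sliding window over text_len characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    spans = []
    start = 0
    while start < text_len:
        # Clamp the end so the last span stops at the end of the text.
        spans.append((start, min(start + chunk_size, text_len)))
        start += step
    return spans


print(window_spans(1200))  # [(0, 500), (450, 950), (900, 1200)]
```

Consecutive spans share 50 characters, so a sentence cut at one window boundary still appears whole in the neighbouring window, provided it is shorter than the overlap.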

In [None]:
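from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

embed_model = SentenceTransformer("all-MiniLM-L6-v2")
# FAISS expects a contiguous float32 matrix of shape (n_chunks, dim).
embeddings = np.asarray(embed_model.encode(chunks), dtype="float32")

# Exact (brute-force) nearest-neighbour index using L2 distance.
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)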

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

In [None]:
fallback_text = """
Nestlé’s Maternity Protection Policy promotes five pillars:
1. Employment protection and non-discrimination
2. Healthy work environment
3. Flexible work arrangements
4. Support for breastfeeding
5. Gender balance and family-friendly culture

The policy aligns with ILO Convention C183 and WHO guidelines on exclusive breastfeeding for the first six months.
"""

In [None]:
def answer_with_flan(query, k=5):
    try:
        query_embedding = np.asarray(embed_model.encode([query]), dtype="float32")
        distances, indices = index.search(query_embedding, k)
        # FAISS pads the result with -1 when the index holds fewer than k vectors.
        results = [chunks[i] for i in indices[0] if i >= 0 and len(chunks[i].strip()) > 100]

        # Cap the context at 1000 characters; use the verified summary if nothing was retrieved.
        context = "\n\n".join(results)
        context = context[:1000] if context else fallback_text[:1000]

        prompt = f"""You are an HR assistant for Nestlé. Based on the following policy excerpt, answer the user's question.

Excerpt:
{context}

Question:
{query}
"""

        # Truncate to the tokenizer's 512-token limit as a safety net.
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
        output = model.generate(**inputs, max_new_tokens=150)
        return tokenizer.decode(output[0], skip_special_tokens=True)

    except Exception as e:
        print("Error:", e)
        return f"Internal error: {e}"

In [None]:
import gradio as gr

gr.Interface(
    fn=answer_with_flan,
    inputs="text",
    outputs="text",
    title="Nestlé HR Assistant (Flan-T5)",
    description="Ask any HR-related question. This assistant uses Flan-T5 and Nestlé's HR policy."
).launch(share=True, debug=True)


# Final Report

## Project Overview

This project was part of my AI/ML engineering bootcamp. I built a document-aware HR assistant using Hugging Face models, FAISS retrieval, and a Gradio interface. The assistant answers HR-related questions from Nestlé’s HR policy document and falls back to a verified summary when retrieval returns nothing usable. My goal was a modular pipeline that fits within Google Colab’s memory limits and runs reliably there.

---

## What I Built

- **Document ingestion**: I parsed a real Nestlé HR policy PDF with `PyPDF2` and cleaned it by dropping lines of 50 characters or fewer, which removes most page headers, page numbers, and similar noise.
- **Chunking logic**: I implemented a sliding-window chunking strategy (500-character windows with a 50-character overlap) to break the document into overlapping segments for better semantic retrieval.
- **Embedding and indexing**: I used `sentence-transformers` (`all-MiniLM-L6-v2`) to embed the chunks and built a flat L2 FAISS index for fast similarity search.
- **Model selection**: After testing Phi-2 and running into memory and generation issues, I pivoted to `google/flan-t5-base`, which is instruction-tuned and lightweight enough for Colab.
- **Prompt engineering**: I crafted clear, role-based prompts to guide the model’s behavior as an HR assistant, including fallback summaries when retrieval failed.
- **Gradio interface**: I deployed the assistant with a clean UI, allowing users to ask questions and receive answers in real time.
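
As a rough mental model of the indexing step: a flat L2 index such as `faiss.IndexFlatL2` performs an exhaustive search and reports squared Euclidean distances, smallest first. The pure-Python sketch below mimics that behaviour on toy data. `l2_search` and `toy_vectors` are illustrative names, not part of the notebook, and the real index is far faster.

```python
def l2_search(vectors, query, k):
    """Return (squared_distances, indices) of the k vectors closest to query."""
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance between the stored vector and the query.
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, i))
    scored.sort()  # smallest distance first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Toy 2-D "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = l2_search(toy_vectors, [0.9, 0.1], k=2)
print(indices)  # [1, 0]
```

The returned indices are positions in the chunk list, which is why the assistant can map them straight back to text with `chunks[i]`.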

---

## What I Learned

### Technical Skills

- How to securely authenticate with Hugging Face using Colab secrets and environment variables.
- How to handle large models in constrained environments using `low_cpu_mem_usage`, prompt trimming, and token limits.
- How to build a retrieval-augmented generation (RAG) pipeline using FAISS and sentence embeddings.
- How to debug model loading errors, memory crashes, and generation stalls — and how to pivot to better-suited architectures.
- How to wrap generation in timeout-safe logic using `concurrent.futures` to prevent Gradio from hanging.
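
The generation cell in this notebook calls `model.generate` directly, so the timeout wrapper is not shown. A minimal sketch of how such a guard might look with `concurrent.futures`, assuming a thread-based approach: `run_with_timeout`, `slow_answer`, and the fallback message are hypothetical names chosen for illustration.

```python
import concurrent.futures
import time

# One shared worker thread. A call that times out keeps running in the
# background, because Python threads cannot be cancelled once started.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)


def run_with_timeout(fn, *args, timeout_s=30.0,
                     fallback="The model took too long to respond."):
    """Run fn(*args) in the worker thread; return fallback if it exceeds timeout_s."""
    future = _executor.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback


def slow_answer(query):
    """Stand-in for a slow generation call."""
    time.sleep(0.5)
    return f"answer to {query}"


print(run_with_timeout(slow_answer, "leave policy", timeout_s=0.1))
print(run_with_timeout(len, "abc", timeout_s=2.0))
```

The executor is created once rather than inside a `with` block, since leaving a `with ThreadPoolExecutor(...)` block waits for the running task and would defeat the timeout. With a single worker, later requests queue behind a stalled one, so this keeps the interface responsive but does not free the stalled model call.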

### Design Principles

- The importance of fallback logic when retrieval fails or returns irrelevant chunks.
- How prompt clarity and structure directly affect model output quality.
- Why model selection matters — not just for performance, but for reliability and user experience.
- How to modularize code for reusability, debugging, and future scaling.
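
The fallback and prompt-structure points can be combined in one small, testable function. The sketch below is a variation on the notebook's approach, not a copy of it: instead of slicing the joined context at 1000 characters, it packs whole chunks into the budget and uses the fallback only when nothing fits. `build_prompt`, `PROMPT_TEMPLATE`, and the placeholder `FALLBACK` string are assumptions made for illustration.

```python
FALLBACK = "Verified policy summary goes here."  # placeholder, not real policy text

PROMPT_TEMPLATE = (
    "You are an HR assistant for Nestlé. Based on the following policy excerpt, "
    "answer the user's question.\n\n"
    "Excerpt:\n{context}\n\n"
    "Question:\n{query}\n"
)


def build_prompt(query, retrieved_chunks, max_context_chars=1000, fallback=FALLBACK):
    """Pack whole chunks into the character budget; use the fallback if none fit."""
    picked, used = [], 0
    for chunk in retrieved_chunks:
        extra = len(chunk) + (2 if picked else 0)  # 2 = length of the "\n\n" separator
        if used + extra > max_context_chars:
            break
        picked.append(chunk)
        used += extra
    context = "\n\n".join(picked) if picked else fallback[:max_context_chars]
    return PROMPT_TEMPLATE.format(context=context, query=query)


print(build_prompt("How is maternity leave protected?", []))
```

Keeping prompt construction separate from retrieval and generation makes it possible to inspect the exact text the model receives without loading the model.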

---

## Challenges I Faced

- Phi-2 repeatedly stalled during generation in Colab, even with trimmed prompts and fallback logic.
- FAISS sometimes retrieved irrelevant chunks, which required manual filtering and fallback summaries.
- Gradio would hang if the model didn’t respond quickly, so I had to implement timeout protection.
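
One way the manual filtering of irrelevant chunks could be automated is a distance cutoff on the search results. This is a sketch under stated assumptions rather than the notebook's method: `filter_hits` is a hypothetical helper, and the `max_distance=1.0` threshold is arbitrary and would need tuning against real queries.

```python
def filter_hits(distances, indices, chunks, max_distance=1.0):
    """Keep retrieved chunks whose distance is at most max_distance."""
    kept = []
    for dist, idx in zip(distances, indices):
        if idx < 0:  # FAISS pads with -1 when fewer than k results exist
            continue
        if dist <= max_distance:
            kept.append(chunks[idx])
    return kept


docs = ["parental leave rules", "canteen opening hours", "travel expenses"]
print(filter_hits([0.4, 1.7, 0.9], [0, 1, -1], docs))  # ['parental leave rules']
```

When the filtered list comes back empty, the caller can switch to the verified fallback summary instead of passing weak context to the model.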

---

## Final Outcome

The final assistant runs smoothly in Colab, responds quickly to HR-related questions, and uses verified fallback content when needed. It’s modular, reproducible, and ready for deployment or extension. I now understand how to build document-aware assistants from scratch, and how to adapt when models or environments don’t behave as expected.