<a href="https://colab.research.google.com/github/afarabee/ai_powered_hr_assistant/blob/main/AI_Powered_HR_Assistant.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project Instructions
**Crafting an AI-Powered HR Assistant: A Use Case for Nestlé’s HR Policy Documents**

---

## 📝 Overview  
The project aims to create a conversational chatbot that responds to user inquiries using PDF document information. It requires proficiency in:
- Extracting and converting text into numerical vectors
- Establishing an answer-finding mechanism
- Designing a user-friendly chatbot interface with Gradio

Additionally, the project emphasizes:
- Structuring inquiries for clear communication
- Deploying the chatbot for practical use
- Guaranteeing the system's accessibility and efficiency in meeting user needs

---

## 📌 Instructions  
- Review the learning materials and the Gradio documentation provided for the project  
- Read the sections on **situation, task, action, and result** carefully to understand the assignment  
- Complete and submit the assignment through the Learning Management System (LMS)  
- Adhere closely to the provided guidelines, ensuring your submission contains all necessary analyses and interpretations  

---

## 🧩 Situation  
As a developer, you have received the critical task of improving the operational efficiency of the **Human Resources department at Nestlé**, a leading multinational corporation.

Your toolkit includes:
- Conversational AI technology
- Python libraries
- The powerful **GPT model from OpenAI**
- The user-friendly **Gradio UI**

Your mission is to integrate these advanced tools to **transform HR processes**, creating a more streamlined and efficient workflow within Nestlé.

---

## 🎯 Task  
Your task is to develop a **conversational chatbot** that answers queries about Nestlé's HR reports efficiently.

You must:
- Use **Python libraries**, **OpenAI's GPT model**, and **Gradio UI**
- Create an interface that extracts and processes information from documents
- Provide accurate responses to user queries through the chatbot

---

## 🛠️ Action Steps  

- ✅ Import essential tools and set up OpenAI's API environment  
- ✅ Load Nestlé's HR policy using `PyPDFLoader` and split it for easy processing  
- ✅ Create vector representations for text chunks using **ChromaDB** and **OpenAI's embeddings**  
- ✅ Build a question-answering system using the **GPT-3.5 Turbo model** to retrieve answers  
- ✅ Create a **prompt template** to guide the chatbot’s responses  
- ✅ Use **Gradio** to build a user-friendly chatbot interface for interaction and information retrieval  

---

## ✅ Result  
Upon completing this project, you will submit a `.ipynb` file demonstrating your ability to use advanced AI and machine learning technologies to develop a conversational chatbot.

Your submission must include:
- Setting up the programming environment  
- Processing text documents  
- Creating vector representations  
- Building a question-answering system  
- Designing a **Gradio interface** for effective interaction  

Ensure the interface is **clear, usable, and accurate** in retrieving relevant information from Nestlé’s HR policy.

---


# Step 1: Requirements Breakdown

You're building a smart **HR assistant** that can read the **Nestlé HR policy PDF** and answer employee questions like:
- “What is Nestlé’s policy on promotions?”
- “What does Nestlé offer for employee training?”

This involves:
- 🗃️ Extracting text from the PDF  
- 🔢 Converting text chunks into numerical format (embeddings)  
- 🧠 Storing those embeddings in a retrievable way (Chroma DB)  
- 💬 Asking GPT to find answers using those chunks  
- 🖼️ Displaying this interaction using Gradio  

---

# Step 2: Set Up Environment
- Install required packages (`openai`, `langchain`, `chromadb`, `pypdf`, `gradio`, etc.)  
- Set your OpenAI API key securely using Colab Secrets  

---

In [12]:
# Step 2: Setup environment

# 2A: Install necessary libraries
# openai → to connect to GPT-3.5 Turbo via the API
# pypdf → PDF parser used by LangChain's PyPDFLoader to extract text from Nestlé's HR policy
# chromadb → vector store that holds the OpenAI embeddings of the PDF text chunks so they can be searched
# gradio → to build a user-friendly chatbot interface, enabling interaction and information retrieval
# langchain → for handling document loading, splitting, and retrieval logic
# tiktoken → OpenAI's tokenizer library, used for counting tokens (needed in retrieval chains)
# langchain-community → hosts the Chroma integration and PyPDFLoader
# langchain-openai → hosts the OpenAI embeddings wrapper

!pip install --quiet openai pypdf chromadb gradio langchain tiktoken langchain-community langchain-openai




In [22]:
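# Import the LangChain components used in Steps 3 and 4 (PDF loading, splitting, embeddings, vector store)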
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings



In [13]:
# 2B: Access my OpenAI API key securely using Colab Secrets
# Avoids typing my key manually or exposing it in the notebook

from google.colab import userdata # access my stored keys in Colab
from openai import OpenAI         # new OpenAI client class

api_key = userdata.get("OpenAI")  # securely fetch my key from Secrets
client = OpenAI(api_key=api_key)  # pass it into the OpenAI client directly


In [14]:
# 2C: Define get_completion function using the updated SDK and secure key
# Reuses the `client` built in step 2B, so the key is fetched from Secrets only once

def get_completion(prompt, model="gpt-3.5-turbo"):  # send the prompt using the new SDK style
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content   # extract and return the assistant's reply


In [15]:
# Test Step 2 end to end: send a prompt and check that a reply comes back

get_completion("What is the purpose of a human resources policy?")


'The purpose of a human resources policy is to provide guidelines and procedures for managing employees in a fair, consistent, and legally compliant manner. These policies help to ensure that employees are treated fairly, that their rights are protected, and that the organization operates in accordance with relevant laws and regulations. Additionally, human resources policies can help to promote a positive work environment, clarify expectations for employees, and support the overall goals and objectives of the organization.'

# Step 3: Load and Split the PDF
Use LangChain’s PyPDFLoader to extract text from the Nestlé HR policy. Then, break the content into smaller chunks.
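
To make the idea of overlapping chunks concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. The function `chunk_text` and the sample string are hypothetical illustrations, not part of LangChain; `RecursiveCharacterTextSplitter` is more sophisticated because it tries to split on paragraph and sentence separators first rather than at fixed character positions.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Illustrative only: a simplified stand-in for what a text splitter does.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    result = []
    for start in range(0, len(text), step):
        result.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return result


sample = "abcdefghijklmnopqrstuvwxyz"  # hypothetical sample text
demo_chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, which is how overlap keeps a sentence that straddles a boundary visible in both chunks.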

---

In [5]:
# Optional: install additional required LangChain packages (already covered by the Step 2 install)
#!pip install --quiet langchain langchain-community


In [16]:
# Legacy import paths for LangChain's PDF loader and text splitter (superseded by the imports in Step 2, kept for reference)
#from langchain.document_loaders import PyPDFLoader
#from langchain.text_splitter import RecursiveCharacterTextSplitter

local file location: file:///C:/Users/aimee/OneDrive/Documents/AppliedGenAI/Building%20LLM%20Applications/Assessment_1/1728286846_the_nestle_hr_policy_pdf_2012.pdf

In [23]:
# Step 3A: Upload HR policy PDF to Colab environment to get the correct path
from google.colab import files
uploaded = files.upload()

Saving 1728286846_the_nestle_hr_policy_pdf_2012.pdf to 1728286846_the_nestle_hr_policy_pdf_2012 (2).pdf


In [28]:
# Step 3B: Load the Nestlé HR policy document and define the path
pdf_path = "/content/1728286846_the_nestle_hr_policy_pdf_2012 (2).pdf" ### update path if needed ###

# Use PyPDFLoader from LangChain to extract PDF content
loader = PyPDFLoader(pdf_path)

# Extract the document and return a list of LangChain Document objects
# Each page becomes a separate Document object (one per PDF page)
pages = loader.load()

# Test: Check how many pages were loaded
print(f"Loaded {len(pages)} pages from the PDF.")

Loaded 8 pages from the PDF.


In [37]:
# Test: Show characters 100-200 of the third page (pages[2]; the list is zero-indexed)
print(pages[2].page_content[100:200])


uccess and nothing can be 
achieved without their engagement. 
This document encompasses the guideli


In [39]:
# Step 3C: Define how to chunk the text
# I want chunks that are not too long, and that overlap slightly to preserve context
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # max characters per chunk (adjustable)
    chunk_overlap=100     # characters shared between neighboring chunks so context carries across boundaries
)

# Test: Text_splitter was created successfully
print(type(text_splitter))


<class 'langchain_text_splitters.character.RecursiveCharacterTextSplitter'>


In [41]:
# Step 3D: Split all pages into smaller chunks
# This will give me a list of Document chunks, ready to be embedded later
chunks = text_splitter.split_documents(pages)

# Test: Preview the third chunk to see how it looks
print(chunks[2].page_content)

The Nestlé Human Resources Policy
1
At Nestlé, we recognize that our employees 
are the key to our success and nothing can be 
achieved without their engagement. 
This document encompasses the guidelines 
which constitute a solid basis for effective Human 
Resources Management throughout the Nestlé 
Group around the world. It explains to all Nestlé 
employees the vision and mission of the Human 
Resources function and illustrates every aspect of 
the Nestlé employee lifecycle. 
The Nestlé Management and Leadership 
Principles inspire all the Nestlé employees in their 
actions and in their dealings with others. The 
Corporate Business Principles refer to all the basic 
principles which Nestlé endorses and subscribes 
to on a worldwide basis. Both these documents 
are the pillars on which the present policy has 
been built.
The implementation of this policy will be 
inspired by sound judgement, compliance with 
local market laws and common sense, taking into


# Step 4: Create Embeddings & Store Them in Chroma

Use OpenAI’s embedding model (such as `text-embedding-3-large`) to:

1. 🔢 **Convert each chunk into a numerical vector**  
   This transforms the text into a format that can be compared mathematically for similarity.

2. 🧠 **Store the vectors in ChromaDB**  
   This allows the system to **retrieve the most relevant chunks** when users ask questions, based on vector similarity.
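
As a rough illustration of what "vector similarity" means, the sketch below ranks three made-up vectors against a made-up query vector using cosine similarity. The three-dimensional vectors and their labels are hypothetical toy values (real embeddings have hundreds or thousands of dimensions), and the distance metric Chroma actually uses is configurable, so cosine appears here only as an example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" for three chunks of text
toy_vectors = {
    "promotion policy": [0.9, 0.1, 0.0],
    "training programs": [0.1, 0.9, 0.1],
    "cafeteria menu": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a question about promotions
query_vector = [0.8, 0.2, 0.0]

# Rank chunk labels from most to least similar to the query
ranked = sorted(
    toy_vectors,
    key=lambda name: cosine_similarity(query_vector, toy_vectors[name]),
    reverse=True,
)
print(ranked)
# ['promotion policy', 'training programs', 'cafeteria menu']
```

A vector store performs the same kind of ranking, but over the stored chunk embeddings and with an index that avoids comparing the query against every vector one by one.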

---

## Step 4a - Store my Chroma DB in GDrive so it persists across sessions

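Create the folder with `os.makedirs(persist_directory, exist_ok=True)` once `persist_directory` has been defined; `exist_ok=True` makes the call safe to repeat across sessions.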

In [43]:
# Mount my Google Drive so Colab can read/write to it.
import os
from google.colab import drive
drive.mount('/content/drive')

# Tell the rest of the code where to save/load the DB
persist_directory = "/content/drive/MyDrive/ai_powered_hr_assistant/chroma_nestle"

# Create the folder if it does not exist yet
os.makedirs(persist_directory, exist_ok=True)

# Validate that the path is ready.
print("Chroma DB folder set to:", persist_directory)
print("Folder exists?", os.path.isdir(persist_directory))



Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Chroma DB folder set to: /content/drive/MyDrive/ai_powered_hr_assistant/chroma_nestle
Folder exists? True


## Step 4b - Build the embeddings object

In [None]:
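# Embed + persist (OpenAI’s currently recommended models: text-embedding-3-small/large)
# Pass the key fetched from Colab Secrets in Step 2B, since the notebook never sets OPENAI_API_KEY
embeddings = OpenAIEmbeddings(model="text-embedding-3-large", openai_api_key=api_key)
# Save to the Drive folder defined in Step 4a so the DB persists across sessions
vectordb = Chroma(collection_name="sops", embedding_function=embeddings, persist_directory=persist_directory)
vectordb.add_documents(chunks)
vectordb.persist()  # explicit save for older Chroma versions; newer ones write to disk automatically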

# Step 5: Build the Q&A Pipeline

Use **LangChain’s `RetrievalQA`** (or a similar approach) to create an intelligent question-answering flow.

The pipeline should:

1. **Take the user's question**  
   Accept natural language input from the user (e.g., “What is Nestlé's policy on promotions?”)

2. **Retrieve the most relevant chunks from ChromaDB**  
   Use similarity search to find the document sections most related to the question

3. **Pass those chunks to GPT-3.5 Turbo as context**  
   Feed the retrieved text into the model so it can generate a grounded, accurate answer
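
The three steps above can be sketched without any framework, which may help when reasoning about what `RetrievalQA` does under the hood. Everything named here (`PROMPT_TEMPLATE`, `build_prompt`, `answer_question`, and the two stand-in functions) is a hypothetical illustration rather than LangChain's API, and the stand-ins return placeholder text instead of calling Chroma or OpenAI.

```python
from typing import Callable, List

# Hypothetical prompt template that keeps the model grounded in the retrieved text
PROMPT_TEMPLATE = """You are an HR assistant. Answer the question using only the context below.
If the answer is not in the context, say that you don't know.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Fill the template with the retrieved chunks and the user's question."""
    context = "\n\n".join(context_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


def answer_question(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    complete: Callable[[str], str],
    k: int = 3,
) -> str:
    """Retrieve the top-k chunks, build a prompt, and ask the model."""
    context_chunks = retrieve(question, k)
    prompt = build_prompt(question, context_chunks)
    return complete(prompt)


# Stand-ins so the sketch runs offline; both return placeholder text
def fake_retrieve(question: str, k: int) -> List[str]:
    return ["Placeholder chunk about promotions.", "Placeholder chunk about training."][:k]


def fake_complete(prompt: str) -> str:
    return f"(model reply to a prompt of {len(prompt)} characters)"


print(answer_question("What is the policy on promotions?", fake_retrieve, fake_complete))
```

In this notebook, `retrieve` could plausibly be `lambda q, k: [d.page_content for d in vectordb.similarity_search(q, k=k)]` and `complete` could be the `get_completion` function from Step 2C, assuming both have been defined in earlier cells.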


# Step 6: Create a Gradio Interface

Use **Gradio** to allow users to type in questions and receive answers from the chatbot.

Use `gr.Interface()` with the following components:

- `gr.Textbox()` as the **input**  
- `gr.Textbox()` or `gr.HTML()` as the **output**  
- Your custom **Q&A function** as the **backend logic**

This interface will let users interact with your AI assistant through a simple, user-friendly web app.
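
A minimal sketch of such an interface might look like the following. The `respond` function is a hypothetical placeholder backend that only echoes the question, and `build_demo` is an illustrative helper name; in the real notebook the backend would call the Q&A pipeline from Step 5 instead.

```python
def respond(question: str) -> str:
    """Hypothetical placeholder backend; swap in the real Q&A function."""
    question = question.strip()
    if not question:
        return "Please enter a question about the HR policy."
    return f"You asked: {question}"


def build_demo(qa_fn):
    """Wrap a question-answering function in a simple Gradio text-in, text-out app."""
    import gradio as gr  # imported here so the backend can be tested without the UI

    return gr.Interface(
        fn=qa_fn,
        inputs=gr.Textbox(lines=2, label="Your question"),
        outputs=gr.Textbox(label="Answer"),
        title="Nestlé HR Policy Assistant",
    )


# In Colab, start the app with:
# build_demo(respond).launch()
```

Keeping the backend as a plain function that takes a string and returns a string makes it easy to test on its own before wiring it into the UI.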


# Step 7: Submit

Make sure your final `.ipynb` notebook meets the following requirements:

- 💬 Has **comments explaining each step** in your workflow  
- 🔄 Shows the **full pipeline working** from document loading to answering questions  
- 🧪 Includes **example questions and answers** about Nestlé’s HR policy

Your notebook should clearly demonstrate your understanding of how to build an AI-powered Q&A assistant using OpenAI, LangChain, Chroma, and Gradio.

# Push to Git

In [None]:
# 🔐 Set up Git identity (only needs to be done once per Colab session)
#!git config --global user.name "afarabee"
#!git config --global user.email "aimee.farabee@crl.com"


In [None]:
#!git clone #################token#############

Cloning into 'ai_powered_hr_assistant'...


In [None]:
# Move your notebook into the repo folder
#!mv /content/AI_Powered_HR_Assistant.ipynb /content/ai_powered_hr_assistant/


mv: cannot stat '/content/AI_Powered_HR_Assistant.ipynb': No such file or directory


In [None]:
#!ls /content/*.ipynb


ls: cannot access '/content/*.ipynb': No such file or directory
