<a href="https://colab.research.google.com/github/afarabee/ai_powered_hr_assistant/blob/main/AI_Powered_HR_Assistant.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project Instructions
**Crafting an AI-Powered HR Assistant: A Use Case for Nestlé’s HR Policy Documents**

---

## 📝 Overview  
The project aims to create a conversational chatbot that answers user inquiries using information extracted from PDF documents. It requires proficiency in:
- Extracting and converting text into numerical vectors
- Establishing an answer-finding mechanism
- Designing a user-friendly chatbot interface with Gradio

Additionally, the project emphasizes:
- Structuring inquiries for clear communication
- Deploying the chatbot for practical use
- Guaranteeing the system's accessibility and efficiency in meeting user needs

---

## 📌 Instructions  
- Review the learning materials and the Gradio documentation provided for the project  
- Read the sections on **situation, task, action, and result** carefully to understand the assignment  
- Complete and submit the assignment through the Learning Management System (LMS)  
- Adhere closely to the provided guidelines, ensuring your submission contains all necessary analyses and interpretations  

---

## 🧩 Situation  
As a developer, you have received the critical task of improving the operational efficiency of the **Human Resources department at Nestlé**, a leading multinational corporation.

Your toolkit includes:
- Conversational AI technology
- Python libraries
- The powerful **GPT model from OpenAI**
- The user-friendly **Gradio UI**

Your mission is to integrate these advanced tools to **transform HR processes**, creating a more streamlined and efficient workflow within Nestlé.

---

## 🎯 Task  
Your task is to develop a **conversational chatbot** that answers queries about Nestlé's HR reports efficiently.

You must:
- Use **Python libraries**, **OpenAI's GPT model**, and **Gradio UI**
- Create an interface that extracts and processes information from documents
- Provide accurate responses to user queries through the chatbot

---

## 🛠️ Action Steps  

- ✅ Import essential tools and set up OpenAI's API environment  
- ✅ Load Nestlé's HR policy using `PyPDFLoader` and split it for easy processing  
- ✅ Create vector representations for text chunks using **ChromaDB** and **OpenAI's embeddings**  
- ✅ Build a question-answering system using the **GPT-3.5 Turbo model** to retrieve answers  
- ✅ Create a **prompt template** to guide the chatbot’s responses  
- ✅ Use **Gradio** to build a user-friendly chatbot interface for interaction and information retrieval  

---

## ✅ Result  
Upon completing this project, you will submit a `.ipynb` file demonstrating your ability to use advanced AI and machine learning technologies to develop a conversational chatbot.

Your submission must include:
- Setting up the programming environment  
- Processing text documents  
- Creating vector representations  
- Building a question-answering system  
- Designing a **Gradio interface** for effective interaction  

Ensure the interface is **clear, usable, and accurate** in retrieving relevant information from Nestlé’s HR policy.

---


# Step 1: Requirements Breakdown

You're building a smart **HR assistant** that can read the **Nestlé HR policy PDF** and answer employee questions like:
- “What is Nestlé’s policy on promotions?”
- “What does Nestlé offer for employee training?”

This involves:
- 🗃️ Extracting text from the PDF  
- 🔢 Converting text chunks into numerical format (embeddings)  
- 🧠 Storing those embeddings in a retrievable way (Chroma DB)  
- 💬 Asking GPT to find answers using those chunks  
- 🖼️ Displaying this interaction using Gradio  

---

# Step 2: Set Up Environment

## Import essential tools and set up OpenAI API's environment

- Install required packages (`openai`, `langchain`, `chromadb`, `pypdf` for LangChain's `PyPDFLoader`, `gradio`, etc.)  
- Set your OpenAI API key securely using Colab Secrets  
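The secret lookup can be wrapped in a small helper so the same notebook also runs outside Colab. This is a sketch under assumptions: `load_openai_key` is a hypothetical helper name, the secret is named `OpenAI` (as in Step 2B), and `OPENAI_API_KEY` is used as the fallback because it is the environment variable the OpenAI SDK reads by default.

```python
import os

def load_openai_key(secret_name: str = "OpenAI") -> str:
    """Return the OpenAI API key from Colab Secrets, falling back to OPENAI_API_KEY."""
    key = None
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(secret_name)
    except Exception:
        # Not running in Colab, the secret does not exist, or notebook access was not granted
        key = None
    key = key or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("No OpenAI API key found in Colab Secrets or the OPENAI_API_KEY variable.")
    return key
```

With this helper, the client could be built as `client = OpenAI(api_key=load_openai_key())`, and a missing key fails early with a clear message instead of a later authentication error.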

---

In [1]:
# Step 2: Setup environment

# 2A: Install necessary libraries
# openai → to connect to GPT-3.5 Turbo via the API
# pypdf → PDF text-extraction library used by LangChain's PyPDFLoader to read Nestlé's HR policy
# chromadb → vector database that stores the OpenAI embeddings of the PDF's text chunks so they can be searched
# gradio → to build the user-friendly chatbot interface for interaction and information retrieval
# langchain → for handling document loading, splitting, and retrieval logic
# tiktoken → OpenAI's tokenizer library, used by LangChain's OpenAI integrations to count tokens
# langchain-community → provides community integrations, including PyPDFLoader and a Chroma wrapper
# langchain-openai → provides the OpenAIEmbeddings and ChatOpenAI wrappers

!pip install --quiet openai pypdf chromadb gradio langchain tiktoken langchain-community langchain-openai



In [2]:
# Import LangChain components for PDF loading, text splitting, vector storage, and embeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings



In [3]:
# 2B: Access my OpenAI API key securely using Colab Secrets
# Avoids typing my key manually or exposing it in the notebook

from google.colab import userdata # access my stored keys in Colab
from openai import OpenAI         # new OpenAI client class

api_key = userdata.get("OpenAI")  # securely fetch my key from Secrets
client = OpenAI(api_key=api_key)  # pass it into the OpenAI client directly


In [4]:
# 2C: Define get_completion using the OpenAI client created in 2B
# (reuses api_key and client, so there is no need to fetch the key or rebuild the client again)

def get_completion(prompt, model="gpt-3.5-turbo"):  # send one user message using the current SDK
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content   # extract and return the assistant's reply


In [5]:
# TEST: Step 2 end to end (send a simple prompt through get_completion)

get_completion("What is the purpose of a human resources policy?")


'The purpose of a human resources policy is to provide guidelines and procedures for managing employees in a fair, consistent, and legally compliant manner. These policies help to ensure that employees are treated fairly, that their rights are protected, and that the organization operates in accordance with relevant laws and regulations. Additionally, human resources policies can help to promote a positive work environment, improve communication between employees and management, and support the overall goals and objectives of the organization.'

# Step 3: Load and Split the PDF

## Load Nestlé's HR policy using PyPDFLoader and split it for easy processing

Use LangChain’s PyPDFLoader to extract text from the Nestlé HR policy. Then, break the content into smaller chunks.
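To see what `chunk_size` and `chunk_overlap` do, here is a simplified fixed-width character splitter. It is not LangChain's `RecursiveCharacterTextSplitter` (which first tries to break on paragraph, line, and word separators before falling back to characters); `split_with_overlap` is a hypothetical helper used only to illustrate the sliding window.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-width character windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks_out = []
    for start in range(0, len(text), step):
        chunks_out.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks_out

sample = "abcdefghij" * 25  # 250 characters of dummy text
parts = split_with_overlap(sample, chunk_size=100, chunk_overlap=20)
print([len(p) for p in parts])  # [100, 100, 90]
```

With the settings used later in this notebook (`chunk_size=1000`, `chunk_overlap=100`), neighbouring chunks share 100 characters, so text that sits near a boundary appears in both chunks.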

---

In [6]:
# (Optional) Install additional required LangChain packages; already covered by the install in Step 2A
#!pip install --quiet langchain langchain-community

In [7]:
# (Optional) Import LangChain's PDF loader and text splitter; already imported in Step 2
#from langchain.document_loaders import PyPDFLoader
#from langchain.text_splitter import RecursiveCharacterTextSplitter

Local file location (author's machine): file:///C:/Users/aimee/OneDrive/Documents/AppliedGenAI/Building%20LLM%20Applications/Assessment_1/1728286846_the_nestle_hr_policy_pdf_2012.pdf

In [8]:
# Step 3A: Upload HR policy PDF to Colab environment to get the correct path
from google.colab import files
uploaded = files.upload()

Saving 1728286846_the_nestle_hr_policy_pdf_2012.pdf to 1728286846_the_nestle_hr_policy_pdf_2012 (1).pdf


In [9]:
# Step 3B: Load the Nestlé HR policy document and define the path
pdf_path = "/content/1728286846_the_nestle_hr_policy_pdf_2012.pdf" ### update path if needed ###

# Use PyPDFLoader from LangChain to extract PDF content
loader = PyPDFLoader(pdf_path)

# Extract the document as a list of LangChain Document objects
# Each PDF page becomes a separate Document object (one per page)
pages = loader.load()


In [10]:
# TEST: Check how many pages were loaded
print(f"Loaded {len(pages)} pages from the PDF.")

Loaded 8 pages from the PDF.


In [11]:
# TEST: Show characters 100-199 of the third page (pages is 0-indexed, so pages[2])
print(pages[2].page_content[100:200])


uccess and nothing can be 
achieved without their engagement. 
This document encompasses the guideli


In [12]:
# TEST: Embed the first page and print the embedding's dimension (vector length)
model_name = "text-embedding-3-large"
embeddings = OpenAIEmbeddings(
    model=model_name,   # OpenAI embedding model to use
    api_key=api_key     # key fetched from Colab Secrets in Step 2B
)
text = pages[0].page_content
embedding = embeddings.embed_query(text)
print(len(embedding))   # text-embedding-3-large returns 3072-dimensional vectors by default


3072


In [13]:
# Step 3C: Define how to chunk the text
# I want chunks that are not too long, and that overlap slightly to preserve context
text_splitter = RecursiveCharacterTextSplitter( #defines HOW to chunk the text
    chunk_size=1000,      # max characters per chunk (adjustable)
    chunk_overlap=100     # characters shared by neighbouring chunks, so context isn't lost at chunk boundaries (adjustable)
)

# TEST: Confirm text_splitter was created successfully
print(type(text_splitter))


<class 'langchain_text_splitters.character.RecursiveCharacterTextSplitter'>


In [14]:
# Step 3D: Split all pages into smaller chunks
# This will give me a list of Document chunks, ready to be embedded later
chunks = text_splitter.split_documents(pages)

# TEST: Preview the third chunk to see how it looks
# chunks is 0-indexed, so chunks[2] is the third chunk; .page_content holds its text
print(chunks[2].page_content)

The Nestlé Human Resources Policy
1
At Nestlé, we recognize that our employees 
are the key to our success and nothing can be 
achieved without their engagement. 
This document encompasses the guidelines 
which constitute a solid basis for effective Human 
Resources Management throughout the Nestlé 
Group around the world. It explains to all Nestlé 
employees the vision and mission of the Human 
Resources function and illustrates every aspect of 
the Nestlé employee lifecycle. 
The Nestlé Management and Leadership 
Principles inspire all the Nestlé employees in their 
actions and in their dealings with others. The 
Corporate Business Principles refer to all the basic 
principles which Nestlé endorses and subscribes 
to on a worldwide basis. Both these documents 
are the pillars on which the present policy has 
been built.
The implementation of this policy will be 
inspired by sound judgement, compliance with 
local market laws and common sense, taking into


In [15]:
# TEST: Print the content of the first 3 chunks
for i, chunk in enumerate(chunks[:3]):  # enumerate yields (index, chunk) pairs
    print(f"--- Chunk {i+1} ---")       # header for each chunk (1-based for readability)
    print(chunk.page_content)           # text content of the current chunk


--- Chunk 1 ---
Policy
Mandatory
September  2012
The Nestlé  
Human Resources Policy
--- Chunk 2 ---
Policy
Mandatory
September 
 20
12
Issuing departement
Hum
an Resources
Target audience 
All
 employees
Approver
Executive Board, Nestlé S.A.
Repository
All Nestlé Principles and Policies, Standards and  
Guidelines can be found in the Centre online repository at:  
http://intranet.nestle.com/nestledocs
Copyright
 and confidentiality
Al
l rights belong to Nestec Ltd., Vevey, Switzerland.
© 2012, Nestec Ltd.
Design
Nestec Ltd., Corporate Identity & Design,  
Vevey, Switzerland
Production
brain’print GmbH, Switzerland
Paper
This report is printed on BVS, a paper produced  
from well-managed forests and other controlled sources  
certified by the Forest Stewardship Council (FSC).
--- Chunk 3 ---
The Nestlé Human Resources Policy
1
At Nestlé, we recognize that our employees 
are the key to our success and nothing can be 
achieved without their engagement. 
This document encompasses the guid

# Step 4: Create Embeddings & Store Them in Chroma

## Create vector representations for text chunks using ChromaDB and OpenAI's embeddings

Use OpenAI’s embedding model (such as `text-embedding-3-large`) to:

1. 🔢 **Convert each chunk into a numerical vector**  
   This transforms the text into a format that can be compared mathematically for similarity.

2. 🧠 **Store the vectors in ChromaDB**  
   This allows the system to **retrieve the most relevant chunks** when users ask questions, based on vector similarity.
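"Compared mathematically for similarity" usually means cosine similarity: cos(a, b) = (a · b) / (|a| |b|). The sketch below ranks made-up 3-dimensional vectors with plain Python; the vectors and labels are illustrative only (real `text-embedding-3-large` vectors have 3,072 dimensions, as the test in Step 3 printed). Chroma's default distance metric is L2 rather than cosine, but because OpenAI embeddings are normalized to length 1, both produce the same ranking.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the labels of the k chunk vectors most similar to the query vector."""
    ranked = sorted(chunk_vecs, key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]), reverse=True)
    return ranked[:k]

# Made-up 3-d "embeddings" for three chunks and one question
chunk_vecs = {
    "training chunk": [0.9, 0.1, 0.0],
    "promotion chunk": [0.2, 0.9, 0.1],
    "copyright chunk": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.3, 0.0]  # pretend this encodes "What does Nestlé offer for training?"
print(top_k(query_vec, chunk_vecs))  # ['training chunk', 'promotion chunk']
```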

---

## Step 4a: Store my Chroma DB in Google Drive so it persists across sessions

I created the folder once with `import os` followed by `os.makedirs(persist_directory, exist_ok=True)` (run after `persist_directory` is defined in the cell below).

In [16]:
# Mount my Google Drive so Colab can read/write to it.
from google.colab import drive
drive.mount('/content/drive')

import os # Import the os module

# tells the rest of the code where to save/load the DB
persist_directory = "/content/drive/MyDrive/ai_powered_hr_assistant/chroma_nestle"

# Validate that the path is ready.
print("Chroma DB folder set to:", persist_directory)
print("Folder exists?", os.path.isdir(persist_directory))

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Chroma DB folder set to: /content/drive/MyDrive/ai_powered_hr_assistant/chroma_nestle
Folder exists? True


## Step 4b: Build the embeddings object

In [17]:
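# Install the standalone Chroma integration (langchain_community's Chroma class is deprecated in its favor)
!pip install --quiet -U langchain-chroma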



In [18]:
from langchain_chroma import Chroma  # overrides the langchain_community Chroma imported in Step 2

In [19]:
# Build the embeddings object (OpenAI's current embedding models are text-embedding-3-small and text-embedding-3-large)
embeddings = OpenAIEmbeddings(model="text-embedding-3-large", api_key=api_key)  # reuses the key from Step 2B


In [20]:
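# Pass the embeddings model directly during Chroma initialization.
# Docs are auto-persisted in current Chroma versions, so no vectordb.persist() call is needed.
# Stable IDs make a re-run upsert the same chunks instead of appending duplicate copies
# (copies stored by earlier runs without IDs remain until the folder is cleared).
ids = [f"chunk-{i}" for i in range(len(chunks))]
vectordb = Chroma.from_documents(documents=chunks, embedding=embeddings,
                                 ids=ids, persist_directory=persist_directory)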
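The repeated chunks in the similarity-search and source-document outputs further down are consistent with `Chroma.from_documents` having been run more than once against the same `persist_directory`: without explicit IDs, each run stores the chunks again under fresh random IDs. The toy in-memory store below (a hypothetical `ToyStore` class, not Chroma's API) shows why deterministic IDs combined with upsert semantics keep exactly one copy per chunk.

```python
import uuid
from typing import Optional

class ToyStore:
    """Hypothetical in-memory store: records are keyed by ID, like a vector DB collection."""

    def __init__(self) -> None:
        self.records: dict[str, str] = {}

    def upsert(self, texts: list[str], ids: Optional[list[str]] = None) -> None:
        # With no IDs supplied, generate random ones, so every call inserts new records
        if ids is None:
            ids = [str(uuid.uuid4()) for _ in texts]
        for doc_id, text_value in zip(ids, texts):
            self.records[doc_id] = text_value  # an existing ID is overwritten, not duplicated

toy_texts = ["HR structure chunk", "training chunk"]

random_store = ToyStore()
for _ in range(3):  # simulate re-running the cell three times
    random_store.upsert(toy_texts)

stable_store = ToyStore()
stable_ids = [f"chunk-{i}" for i in range(len(toy_texts))]
for _ in range(3):
    stable_store.upsert(toy_texts, ids=stable_ids)

print(len(random_store.records), len(stable_store.records))  # 6 2
```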

In [21]:
# Convert the vector store into a Retriever object
# The Chroma vector store object (vectordb) has a method to easily convert it into a retriever

retriever = vectordb.as_retriever()  # default similarity search returns the top 4 chunks (k=4)

In [22]:
# Test the vector database by performing a similarity search
query = "What is Nestlé's policy on employee training?"
# Use k= to retrieve only the top n most relevant documents
docs = vectordb.similarity_search(query, k=3)

# Print the content of the retrieved documents to see if they are relevant
print(f"Retrieved {len(docs)} documents:")
for i, doc in enumerate(docs):
    print(f"--- Retrieved Document {i+1} ---")
    print(doc.page_content)
    print("\n")

Retrieved 3 documents:
--- Retrieved Document 1 ---
The Nestlé Human Resources Policy
4
Learning is part of the Company culture.
Employees at all levels are systematically 
encouraged to consider how they upgrade their 
knowledge and skills.
The Company determines training and deve-
lopment priorities. The responsibility for turning 
these into actions is shared between employees, 
line managers and the Human Resources. 
Experience and on-the-job training are the 
primary source of learning. Managers are 
responsible for guiding and coaching employees 
to succeed in their current positions.  
Nestlé employees understand the importance 
of continuous improvement, as well as sharing 
knowledge and ideas freely with others. Practices 
such as lateral professional development, 
extension of responsibilities, and cross functional 
teams are encouraged to acquire additional skills, 
enrich job content and widen accountability.
Nestlé also offers a comprehensive range of 
training activities 

# Step 5: Build the Q&A Pipeline

## *Build a question-answering system using the GPT-3.5 Turbo model to retrieve answers from text chunks.*


---


Build a RetrievalQA chain that uses:
 - my prompt template (QA_CHAIN_PROMPT)
 - my retriever (built from the Chroma vectorstore)
 - an LLM (ChatOpenAI) with temperature=0 for consistent answers

----
Use **LangChain’s `RetrievalQA`** (or a similar approach) to create an intelligent question-answering flow.

The pipeline should:

1. **Take the user's question**  
   Accept natural language input from the user (e.g., “What is Nestlé's policy on promotions?”)

2. **Retrieve the most relevant chunks from ChromaDB**  
   Use similarity search to find the document sections most related to the question

3. **Pass those chunks to GPT-3.5 Turbo as context**  
   Feed the retrieved text into the model so it can generate a grounded, accurate answer
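Steps 2 and 3 above amount to what LangChain calls the "stuff" strategy: the retrieved chunks are concatenated into one context string and substituted into the prompt. The plain-Python sketch below assumes the chunks are joined with blank lines and uses a hypothetical `build_stuffed_prompt` helper with a shortened template; it only builds the prompt text and does not call the model. The two demo chunks are sentences quoted from the training section retrieved in Step 4.

```python
TEMPLATE = """Use the following pieces of context to answer the question at the end.
Use three sentences maximum.

{context}

Question: {question}
Answer:"""

def build_stuffed_prompt(question: str, retrieved_chunks: list[str], separator: str = "\n\n") -> str:
    """Join the retrieved chunks into one context block and fill the template."""
    context = separator.join(chunk_text.strip() for chunk_text in retrieved_chunks)
    return TEMPLATE.format(context=context, question=question)

demo_chunks = [
    "Learning is part of the Company culture.",
    "Experience and on-the-job training are the primary source of learning.",
]
print(build_stuffed_prompt("What is Nestlé's policy on employee training?", demo_chunks))
```

Because every retrieved chunk goes into a single prompt, "stuff" is simple but limited by the model's context window; that is one reason the retriever only returns a handful of chunks.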

In [23]:
# Step 5.1: Import the LangChain OpenAI chat wrapper
from langchain_openai import ChatOpenAI

In [24]:
#Step 5.2: Initialize the language model (GPT-3.5 Turbo); Create the LLM object
llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0,
    api_key=api_key
)

In [25]:
# TEST the LLM connectivity
print(llm.invoke("Hello, how are you?"))

content="Hello! I'm just a computer program, so I don't have feelings, but I'm here and ready to assist you. How can I help you today?" additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 33, 'prompt_tokens': 13, 'total_tokens': 46, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-C6HbW0g8oSB0tLAiGJTfCGhousAtN', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None} id='run--1895a388-2c97-4520-bc30-2ce2b37d602c-0' usage_metadata={'input_tokens': 13, 'output_tokens': 33, 'total_tokens': 46, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}


In [26]:
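# Step 5.3: Import the RetrievalQA chain class
# (newer LangChain releases deprecate RetrievalQA in favor of create_retrieval_chain, but it still works here)
from langchain.chains import RetrievalQA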


In [27]:
# TEST: Retrieve relevant documents directly from the vectordb with a similarity search (no LLM involved)

query = "What ensures the success of Nestle as a company?"

# Retrieve only the top 3 most relevant documents
docs = vectordb.similarity_search(query, k=3)

# Print the content of the retrieved documents to see if they are relevant
print(f"Retrieved {len(docs)} documents:")
for i, doc in enumerate(docs):
    print(f"--- Retrieved Document {i+1} ---")
    print(doc.page_content)
    print("\n")

Retrieved 3 documents:
--- Retrieved Document 1 ---
of a global company with the creativity and 
knowledge of a local business. As a result, people 
can have far-reaching influence every day and 
explore their full long-term potential, propelled by 
continual support and a collaborative approach by 
line managers and employees.
Corporate policy: 
Nestlé on the Move
 A flexible and dynamic organisation


--- Retrieved Document 2 ---
of a global company with the creativity and 
knowledge of a local business. As a result, people 
can have far-reaching influence every day and 
explore their full long-term potential, propelled by 
continual support and a collaborative approach by 
line managers and employees.
Corporate policy: 
Nestlé on the Move
 A flexible and dynamic organisation


--- Retrieved Document 3 ---
of a global company with the creativity and 
knowledge of a local business. As a result, people 
can have far-reaching influence every day and 
explore their full long-term potenti

In [48]:
#Step 5.4: Create a Prompt Template

# prompt_template: tell it what to use (the context), what not to do, and limit output length

prompt_template = """
Use the following pieces of context to answer the question at the end.
If you don't know the answer based on the provided context, please respond with "Sorry, I could not find an answer. Would you like to try a different question?". Do not try to make up an answer.
Use three sentences maximum.

{context}

Question: {question}
Answer:
"""



---


### Breakdown of Step 5.4 "Create a Prompt Template"

#### 1. The instructions to the LLM
```
Use the following pieces of context to answer the question at the end.
If you don't know the answer based on the provided context, please respond with "Sorry, I could not find an answer. Would you like to try a different question?". Do not try to make up an answer.
Use three sentences maximum.
```
These are system-like instructions baked into the prompt to guide how the LLM should behave. They:

- Tell it what to use (the context we’ll give it)
- Tell it what not to do (no guessing or hallucinating)
- Limit the output length (three sentences maximum)


#### 2. {context} placeholder

```
{context}

```
This is a placeholder in a Python format string.

Later, you’ll insert the actual retrieved text chunks here:


```
prompt = prompt_template.format(context="policy text...", question="What is Nestlé's policy on training?")

```
The curly braces {} get replaced with real values by .format().
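One detail worth knowing about `.format()`: braces inside the *substituted values* are inserted literally, so retrieved policy text containing `{` or `}` is safe, but literal braces in the *template itself* must be doubled (`{{` and `}}`), otherwise `.format()` treats them as placeholders. LangChain's `PromptTemplate` uses the same f-string syntax by default. The made-up strings below demonstrate both cases.

```python
template = "Context: {context}\nReturn JSON like {{\"answer\": \"...\"}}\nQuestion: {question}"

# Braces inside the values are copied as-is; they are not parsed as placeholders
filled = template.format(context="Policy section {A}", question="What is {this}?")
print(filled)

# An undoubled literal brace in the template is parsed as a placeholder name
try:
    "Return JSON like {\"answer\": 1}".format()
    undoubled_failed = False
except KeyError as err:
    print("KeyError:", err)
    undoubled_failed = True
```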

#### 3. The question and answer section


```
Question: {question}
Answer:

```
{question} is another placeholder, where you’ll insert the user’s actual question.

The Answer: line is left blank so the LLM will write the answer directly after it.


#### 4. Putting it together
When you run:
```
prompt = prompt_template.format(
    context="Nestlé provides a wide range of training programs...",
    question="What is Nestlé's policy on training?"
)
print(prompt)
```

You get:
```
Use the following pieces of context to answer the question at the end.
If you don't know the answer based on the provided context, please respond with "Sorry, I could not find an answer. Would you like to try a different question?". Do not try to make up an answer.
Use three sentences maximum.

Nestlé provides a wide range of training programs...

Question: What is Nestlé's policy on training?
Answer:
```

You can reuse the same template for every question.

The {context} gets filled dynamically with retrieved chunks from Chroma.

The {question} gets filled with the user’s query.

This keeps the LLM grounded in the HR policy document and reduces (though it cannot fully prevent) hallucination.


---



In [49]:
# Step 5.5: Turn my text template into a PromptTemplate object that LangChain can use
# LangChain will dynamically construct the final prompt sent to GPT-3.5 Turbo during the Q&A process.

from langchain.prompts import PromptTemplate
QA_CHAIN_PROMPT = PromptTemplate.from_template(prompt_template)  # converts the raw string into a PromptTemplate

# from_template parses the string and detects the {context} and {question} placeholders, so LangChain
# knows where to inject the retrieved document context and the user's question.
# This is required because LangChain chains such as RetrievalQA expect structured prompt objects.

In [50]:
#Step 5.6: Create the RetrievalQA chain & assign it to the qa_chain variable
qa_chain = RetrievalQA.from_chain_type(
    # initialized model object = GPT-3.5 Turbo with temperature=0
    llm=llm,
    # passes retriever object (created from vectordb); tells the chain where to get the relevant document chunks from when a query comes in
    retriever=retriever,
    # Optional: returns the chunks used to generate the answer
    return_source_documents=True,
    # RetrievalQA uses chain_type="stuff" by default: all retrieved chunks are "stuffed" into one prompt.
    # chain_type_kwargs passes extra parameters to that underlying chain; here, the custom QA_CHAIN_PROMPT.
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [51]:
# TEST the chain by asking a question
query = "What is the HR structure at Nestle?"
response = qa_chain.invoke({"query": query})

# Print the Question and Answer
print("Question:", query)
print("Answer:", response['result'])


Question: What is the HR structure at Nestle?
Answer: The HR structure at Nestle is based on three dedicated areas: Centres of Expertise, Business Partners, and Employee Services. Each area provides specialized services, deploys HR strategies within specific businesses, and performs transactional activities.


In [52]:
# Optional: Print the first 200 characters of each source document
print("\nSource Documents:")
for doc in response['source_documents']:
    print(doc.page_content[:200])


Source Documents:
HR has adopted a streamlined approach to 
ensuring functional leadership and the highest 
level of focus, clarity, and efficiency. Our structure 
is based on three dedicated areas which provide 
speci
HR has adopted a streamlined approach to 
ensuring functional leadership and the highest 
level of focus, clarity, and efficiency. Our structure 
is based on three dedicated areas which provide 
speci
HR has adopted a streamlined approach to 
ensuring functional leadership and the highest 
level of focus, clarity, and efficiency. Our structure 
is based on three dedicated areas which provide 
speci
HR has adopted a streamlined approach to 
ensuring functional leadership and the highest 
level of focus, clarity, and efficiency. Our structure 
is based on three dedicated areas which provide 
speci
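Because the chain was built with `return_source_documents=True`, each response also carries the chunks behind the answer. Below is a sketch of turning them into a short citation list while dropping exact duplicates such as the four identical chunks printed above. Assumptions: `format_sources` and the stand-in `FakeDoc` class are hypothetical; the code relies only on each document exposing `page_content` and a `metadata` dict, and it assumes `PyPDFLoader` stores a 0-based `page` number in that metadata. The page number in the demo is illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document: page text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_sources(source_documents, preview_chars: int = 80) -> list[str]:
    """Return one line per distinct source chunk: 1-based page number plus a text preview."""
    seen = set()
    lines = []
    for doc in source_documents:
        key = (doc.metadata.get("page"), doc.page_content)
        if key in seen:  # skip exact duplicates of a chunk already listed
            continue
        seen.add(key)
        page = doc.metadata.get("page")
        page_label = f"p. {page + 1}" if isinstance(page, int) else "p. ?"
        preview = " ".join(doc.page_content.split())[:preview_chars]  # collapse PDF line breaks
        lines.append(f"[{page_label}] {preview}")
    return lines

hr_text = "HR has adopted a streamlined approach to\nensuring functional leadership"
docs_demo = [FakeDoc(hr_text, {"page": 3}) for _ in range(4)]  # four identical hits; page is illustrative
for line in format_sources(docs_demo):
    print(line)
```

In the notebook, the same helper could be called as `format_sources(response["source_documents"])` after `qa_chain.invoke(...)`.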


In [44]:
# TEST with a different question
query = "What is the mission of HR Managers at Nestlé?"
response = qa_chain.invoke({"query": query})

# Print the Question and Answer
print("Question:", query)
print("Answer:", response['result'])

Question: What is the mission of HR Managers at Nestlé?
Answer: The mission of HR managers at Nestlé is to provide professional guidance to line managers in order to deliver superior business results by optimizing the performance of their people while ensuring exemplary working conditions. They aim to establish business needs and corresponding people requirements, empowering line managers to build and sustain an environment where employees have a sense of personal commitment to their work and give their best for the company's success. With a 'Nestlé in the Market' approach, HR ensures functional leadership and the highest standards within the organization.


# Step 6: Create a Gradio Interface

## *Use Gradio to build a user-friendly chatbot interface, enabling interaction and information retrieval.*

Allow users to type in questions and receive answers from the chatbot.



In [53]:
#Step 6.1: Install gradio (safe to re-run; does nothing if it's already there)
!pip install --quiet gradio

In [54]:
# Step 6.2: Import Gradio and smoke-test it with a minimal demo app (a name + slider greeting)
import gradio as gr

def greet(name, intensity):
    return "Hello " * intensity + name + "!"

demo = gr.Interface(
    fn=greet,
    inputs=["text", "slider"],
    outputs=["text"],
)
demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://12d32af22459f7e18e.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In [55]:
#Step 6.3: Print the version to confirm it's available in this runtime
print("Gradio version:", gr.__version__)

Gradio version: 5.42.0


In [56]:
#Step 6.4: Backend function that takes the user's question (string), sends to RetrievalQA (qa_chain) chain and returns an answer (string)
def answer_question(user_question: str) -> str:
    """
    Backend:
    - Accept a question string from the UI
    - Use the RetrievalQA chain (qa_chain) to get an answer
    - Return a clean text response for display in Gradio
    """

    # TEST: make sure qa_chain exists (built in Step 5)
    assert 'qa_chain' in globals(), "qa_chain not found. Build the RetrievalQA chain before launching Gradio."


    # Handle Empty input
    if not user_question or not user_question.strip():
        return "Please enter a question about the Nestlé HR policy."


    # Process the question using the QA chain built in Step 5 (qa_chain is a global)
    response = qa_chain.invoke({"query": user_question})


    # Extract the answer text; fall back to a default message if the key is missing or the value is empty/None
    answer_text = response.get('result') or 'Sorry, I could not find an answer. Would you like to try a different question?'


    # Return final message to the UI
    return answer_text

In [57]:
# TEST: call the backend directly (bypassing Gradio) to verify it returns an answer
test_q = "What is the Policy on Conditions of Work and Employment?"

# Call the function to print the answer
print(answer_question(test_q))

The Policy on Conditions of Work and Employment outlines the guidelines and regulations related to employment and working conditions within the organization. It serves as a framework for ensuring a fair and respectful environment for employees. HR plays a crucial role in implementing and upholding this policy to support the well-being of the workforce.


In [58]:
# Create Gradio UI to call the backend function and launch the Gradio app

import gradio as gr

# Create input + output widgets
#   Small text box to type a question
#   Larger text box to display answer
question_box = gr.Textbox(
    label="Ask a question about the Nestlé HR policy",
    placeholder="e.g., What does Nestlé say about training and development?",
    lines=2
)
answer_box = gr.Textbox(
    label="Answer",
    lines=10
)

# Wrap backend function + widgets into a simple interface
demo = gr.Interface(
    fn=answer_question,     # function defined in step 6.4
    inputs=question_box,    # input component(s)
    outputs=answer_box,     # output component(s)
    title="Nestlé HR Assistant",
    description="Type your question below and I’ll do my best to help."
)

# Launch the app (share=True gives me a temporary public link I can submit/test)
demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://afc651e1cc7a92ce0e.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




# Step 7: Submit

Make sure your final `.ipynb` notebook meets the following requirements:

- 💬 Has **comments explaining each step** in your workflow  
- 🔄 Shows the **full pipeline working** from document loading to answering questions  
- 🧪 Includes **example questions and answers** about Nestlé’s HR policy

Your notebook should clearly demonstrate your understanding of how to build an AI-powered Q&A assistant using OpenAI, LangChain, Chroma, and Gradio.

# Push to Git

In [40]:
# 🔐 Set up Git identity (only needs to be done once per Colab session)
#!git config --global user.name "afarabee"
#!git config --global user.email "aimee.farabee@crl.com"


In [41]:
#!git clone #################token#############

In [42]:
# Move your notebook into the repo folder
#!mv /content/AI_Powered_HR_Assistant.ipynb /content/ai_powered_hr_assistant/


In [43]:
#!ls /content/*.ipynb
