In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Introduction

This notebook demonstrates how to build a chatbot that answers questions about Nestlé's HR policies. It's powered by OpenAI's GPT model to understand and respond to queries in natural language, and it uses Gradio to create a clean, easy-to-use interface. The system extracts information from PDF documents, converts it into vector form, and enables intelligent question-answering.

### Project Overview
The aim is to assist Nestlé's HR team by automating responses to queries about HR reports. This involves processing PDF data, generating vector embeddings, and setting up a retrieval-based QA system with a simple and interactive chatbot interface.


## Installing Dependencies

We will install all the necessary Python packages required for the project:

-   `langchain`: A framework for developing applications powered by language models.
-   `openai`: The official OpenAI library for working with models like GPT-3.5 Turbo.
-   `PyPDF2`: Helps with reading and working with PDF files.
-   `chromadb`: A vector database used to store and search through embeddings.
-   `tiktoken`: A fast tokenizer optimized for OpenAI's models.
-   `gradio`: Makes it easy to build user interfaces for ML applications.
-   `langchain-community`: Adds community-supported integrations and tools for LangChain.
-   `pypdf`: A pure Python library for splitting, merging, and modifying PDF files.


In [3]:
# Install required packages
!pip install langchain openai PyPDF2 chromadb tiktoken gradio langchain-community pypdf



In [4]:
# Import necessary libraries
import os
from langchain.document_loaders import PyPDFLoader   # Loads PDF documents and extracts text
from langchain.text_splitter import RecursiveCharacterTextSplitter # Splits large texts into smaller chunks for processing
from langchain.embeddings.openai import OpenAIEmbeddings # Converts text into vector embeddings using OpenAI models
from langchain.vectorstores import Chroma            # Stores embeddings and supports similarity search
from langchain.chains import RetrievalQA             # Sets up a retrieval-based question-answering chain
from langchain.chat_models import ChatOpenAI         # Uses OpenAI's chat models
from langchain.prompts import PromptTemplate         # Helps format and customize prompts sent to the language model
import gradio as gr                                  # Builds the web interface for the chatbot

## OpenAI Client Initialization

We read the OpenAI API key from the `OPENAI_API_KEY` environment variable. This key authenticates requests to OpenAI's servers. I have removed my API key before submitting, in accordance with security best practices.

In [None]:
# Step 1: Load OpenAI API key securely from environment variables
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
if not OPENAI_API_KEY:
    raise ValueError("Missing OpenAI API Key. Please set the OPENAI_API_KEY environment variable.")

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

We will load the PDF document, split it into smaller chunks, and create a vector store using OpenAI embeddings.

In [None]:
# Step 2: Load and process the PDF using PyPDFLoader
loader = PyPDFLoader("/path/to/your/HR_policy_document.pdf")
documents = loader.load()


In [7]:
# Splits text into manageable chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
texts = text_splitter.split_documents(documents)
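
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification and not LangChain's implementation: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, which this sketch ignores. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 800, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 2000-character text with the settings used above
pieces = sliding_chunks("x" * 2000)
print([len(p) for p in pieces])  # [800, 800, 800]
```

Each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.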

In [8]:
# Step 3: Create embeddings and store in Chroma vector store

# Initialize OpenAI embeddings
embedding = OpenAIEmbeddings()

# Create a vector store from the text chunks
vectorstore = Chroma.from_documents(texts, embedding, persist_directory="db")

# Create a retriever interface
retriever = vectorstore.as_retriever()
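
Conceptually, the retriever embeds the question and returns the chunks whose vectors lie closest to it. The sketch below illustrates that idea with cosine similarity over hand-written toy vectors. It is not how Chroma is implemented (Chroma uses its own index and distance settings), and the names `cosine_similarity`, `top_k`, and the three-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy (text, vector) pairs; real embeddings have hundreds or thousands of dimensions
toy_store = [
    ("Annual leave policy", [0.9, 0.1, 0.0]),
    ("Code of conduct", [0.1, 0.9, 0.0]),
    ("Parental leave policy", [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], toy_store, k=2))
```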



In [9]:
# Step 4: Define GPT-3.5 Turbo model and QA chain
llm = ChatOpenAI(model_name="gpt-3.5-turbo")



Let us look into what we will do next.

- We're going to create a custom prompt template to help the chatbot understand and respond to user queries.
- The `RetrievalQA.from_chain_type` method will set up a question-answering chain using the GPT-3.5 Turbo model.
- We'll set the `chain_type` to `"stuff"`, so all the relevant documents get packed into the prompt.
- The `retriever` will pull out the documents we need.
- The `chain_type_kwargs` will let us pass in our custom prompt template to the chain.
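
As a rough picture of what the "stuff" strategy does, the sketch below joins the retrieved chunks into a single context string and fills a template with it. This is a plain-Python illustration, not LangChain's code; `build_stuffed_prompt`, the template constant, and the placeholder chunks are hypothetical.

```python
TEMPLATE = (
    "Based on the following context, answer the question below:\n\n"
    "Context:\n{context}\n\n"
    "Question:\n{question}\n\n"
    "Answer:"
)


def build_stuffed_prompt(chunks: list[str], question: str, separator: str = "\n\n") -> str:
    """Pack every retrieved chunk into one prompt, in retrieval order."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


# Placeholder chunks stand in for whatever the retriever returns
prompt = build_stuffed_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What is covered?",
)
print(prompt)
```

Because everything goes into one prompt, the number and size of retrieved chunks are limited by the model's context window.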


In [10]:
# Step 5: Define a custom prompt template
prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="""
You are an AI assistant helping employees understand Nestlé's HR policies.
Based on the following context, answer the question below:

Context:
{context}

Question:
{question}

Answer:
"""
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt_template}
)

In [1]:
!pip install gTTS



In [11]:
from gtts import gTTS   # For text-to-speech
import tempfile

In [15]:
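# --- Unified chatbot function for both text and voice --- #
def chatbot_response(query):
    # `query` must be the question as text; a microphone input with
    # type="filepath" supplies a file path, which needs transcribing first
    # Run the QA chain
    result = qa_chain.run(query)
    # Generate speech output and save it to a temporary MP3 file
    tts = gTTS(result)
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
    tts.save(temp_file.name)
    return result, temp_file.name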


In [16]:
# --- Build Gradio UI --- #
with gr.Blocks(title="Nestlé HR Assistant Bot") as demo:
    gr.Markdown("## 🤖 Ask me about Nestlé's HR policies (via text or voice)")

    with gr.Tab("💬 Text Chat"):
        text_input = gr.Textbox(label="Enter your question")
        text_output = gr.Textbox(label="Answer (Text)")
        audio_output = gr.Audio(label="Answer (Voice)")
        text_button = gr.Button("Ask")

        text_button.click(fn=chatbot_response,
                          inputs=text_input,
                          outputs=[text_output, audio_output])

    with gr.Tab("🎙️ Voice Chat"):
        mic_input = gr.Audio(sources=["microphone"], type="filepath", label="Speak your question")
        voice_text_output = gr.Textbox(label="Answer (Text)")
        voice_audio_output = gr.Audio(label="Answer (Voice)")

        mic_input.change(fn=chatbot_response,
                         inputs=mic_input,
                         outputs=[voice_text_output, voice_audio_output])


In [12]:
# Gradio functions for text and voice

def chatbot_interface(query):
    # The text tab has a single text output, so return only the answer
    result = qa_chain.run(query)
    return result

def chatbot_voice(audio):
    # With type="filepath", Gradio passes `audio` as the path to the recorded
    # file (or None if the recording was cleared); it does not transcribe speech.
    # NOTE: the path is passed to the chain as-is here, so a speech-to-text
    # step is still needed before this call for voice questions to work.
    result = qa_chain.run(audio)
    # Convert the answer to speech and save it to a temporary MP3 file
    tts = gTTS(result)
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
    tts.save(temp_file.name)
    return result, temp_file.name
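
Because `gr.Audio(type="filepath")` hands the callback a file path rather than text, a voice question needs a speech-to-text step before it reaches the QA chain. The following is a minimal sketch of one way to do that and is not part of the original notebook: it assumes OpenAI's hosted `whisper-1` transcription model through the `openai` package (1.x client), and the names `whisper_transcribe` and `voice_to_answer` are hypothetical. The transcriber is passed in as a parameter so it can be swapped for another speech-to-text tool.

```python
from typing import Callable, Optional


def whisper_transcribe(path: str) -> str:
    """Assumed backend: OpenAI's hosted Whisper model (requires OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so the sketch loads without the package

    client = OpenAI()  # reads the API key from the environment
    with open(path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
    return transcript.text


def voice_to_answer(
    audio_path: Optional[str],
    answer_fn: Callable[[str], str],
    transcribe_fn: Callable[[str], str] = whisper_transcribe,
) -> str:
    """Transcribe a recording, then pass the recognized question to answer_fn."""
    if not audio_path:
        # Gradio sends None when the recording is cleared
        return "No audio was recorded."
    question = transcribe_fn(audio_path).strip()
    if not question:
        return "Could not understand the recording."
    return answer_fn(question)
```

In this notebook the call could look like `voice_to_answer(audio, qa_chain.run)`, with the returned text then passed to `gTTS` as before.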

In [13]:
# Gradio interface for text and voice

with gr.Blocks(title="Nestlé HR Assistant Bot") as demo:
    gr.Markdown("## Ask me about Nestlé HR policies (text or voice)")

    with gr.Tab("Text Chat"):
        txt_in = gr.Textbox(label="Enter your question")
        txt_out = gr.Textbox(label="Answer")
        txt_btn = gr.Button("Ask")
        txt_btn.click(chatbot_interface, inputs=txt_in, outputs=txt_out)

    with gr.Tab("Voice Chat"):
        mic_in = gr.Audio(sources=["microphone"], type="filepath", label="Speak your question")
        voice_out_text = gr.Textbox(label="Answer (text)")
        voice_out_audio = gr.Audio(label="Answer (voice)")
        mic_in.change(chatbot_voice, inputs=mic_in, outputs=[voice_out_text, voice_out_audio])


In [17]:
demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://abd4db7f4542545288.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




We will now create a Gradio interface for the chatbot. It takes the user's input (query), runs it through the QA chain, returns the result as output, and displays it in a simple interface where users can ask questions and get answers in real-time.

In [None]:
# Step 6: Create a Gradio interface
def chatbot_interface(query):
    result = qa_chain.run(query)
    return result

gr.Interface(fn=chatbot_interface, inputs="text", outputs="text", title="Nestlé HR Assistant Bot").launch()

## Conclusion
In this project, we built a chatbot that helps answer questions about Nestlé's HR policies. By combining OpenAI's GPT model for language understanding, a Chroma vector store for retrieval, and Gradio for the interface, we created a simple, user-friendly way for employees to get quick answers straight from HR documents. This shows how AI can make HR work smoother and save time by surfacing relevant information when it is needed.

### Future Scope and Improvements
- **Expand Document Coverage:** The chatbot's knowledge base can be expanded by including more HR-related documents, such as training manuals, benefits information, and company-wide announcements.
- **Improve Response Accuracy:** Fine-tuning the GPT model and refining the prompt template can lead to more accurate and relevant responses.
- **Add Multi-Language Support:** Adding different language options would let employees from all over the world use it easily.
- **Integrate with HR Systems:** Connecting the chatbot to HR systems would let it provide personalized, real-time information, making it a more useful tool for employees' day-to-day HR needs.
- **Incorporate User Feedback:** Implement a system for collecting and incorporating user feedback to continuously improve the chatbot's performance and usability.
