# FAQ Chatbot for PDF Documents

This project is a Retrieval-Augmented Generation (RAG) chatbot that answers questions based on the content of a PDF document. It demonstrates how modern language models can transform static documents into dynamic, interactive assistants.


## Project Goal

The goal is to build an interactive chatbot that lets users ask questions and receive answers drawn directly from a PDF file. This simulates a smart assistant that understands your documents.

## Features
  *  Parses a text-based PDF document and converts it into machine-readable text.
  *  Splits the text into chunks and generates vector embeddings for semantic search.
  *  Uses FAISS to retrieve the most relevant information from the document.
  *  Utilizes OpenAI GPT to generate natural language answers.
  *  Includes a command-line Q&A loop for quick testing.

## Tech Stack
  *  PDF Parsing: **PyMuPDF** enables accurate and fast text extraction from PDF files.
  *  LLM Framework: **LangChain** orchestrates the retrieval and response generation.
  *  Embeddings: OpenAI’s **text-embedding-ada-002** is used for high-quality vector representations.
  *  Vector Store: **FAISS** allows efficient similarity search over embedded document chunks.
  *  LLM: **GPT-3.5 from OpenAI** provides the language understanding and response generation.

## Data Source

Title: "Extending Human Creativity with AI"

Authors: Katherine O’Toole & Emőke-Ágnes Horvát

Published in: Journal of Creativity, 2024

License: CC BY-NC-ND (Open Access)

This publication is available at the [following link](https://www.sciencedirect.com/science/article/pii/S2713374524000062?via%3Dihub).




## How It Works


1.  Download a PDF and extract the raw text using PyMuPDF.
2.  Split and embed the text using LangChain’s text splitter and OpenAI embeddings.
3.  Store embeddings in FAISS, which enables fast and accurate vector search.
4.  Use RetrievalQA from LangChain to match user questions with relevant document content.
5.  Answer questions using GPT, returning responses grounded in the original document.


In [1]:
# Mounting to Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
cd "your-path-here"

In [23]:
%%capture
# Install necessary packages (%%capture must be the first line of the cell)
!pip install openai langchain langchain-community PyMuPDF faiss-cpu tiktoken
!pip install -U langchain-openai

In [31]:
# Import libraries
import os
import fitz  # PyMuPDF
import textwrap
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS  # FAISS wrapper now lives in langchain-community
from langchain.chains import RetrievalQA
from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI

In [None]:
# OPENAI API Key
os.environ["OPENAI_API_KEY"] = "your-openai-api-key-here"
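
Hard-coding the key in a notebook cell makes it easy to leak when the notebook is shared. One possible alternative, sketched below, reads the key from the environment and falls back to a hidden prompt. The helper name `load_api_key` and its parameters are hypothetical and are not part of the original notebook or of any library.

```python
import os
from getpass import getpass


def load_api_key(env=os.environ, prompt=getpass, name="OPENAI_API_KEY"):
    """Return the API key from `env`, prompting without echo if it is missing.

    Hypothetical helper for illustration only.
    """
    key = env.get(name, "").strip()
    if not key:
        key = prompt(f"Enter {name}: ").strip()
        if not key:
            raise ValueError(f"{name} must not be empty")
        env[name] = key  # store it so later cells can read it from the environment
    return key
```

Calling `load_api_key()` once, before the embeddings and the LLM are created, would replace the assignment above.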

In [18]:
# Load the PDF file

pdf_path = "your-path-here/file.pdf"  # make sure you have uploaded your PDF file

# Open the document and extract the text of every page
with fitz.open(pdf_path) as doc:
    page_texts = [page.get_text() for page in doc]
full_text = "\n".join(page_texts)

# Preview the first 500 characters
print(full_text[:500])

Journal of Creativity 34 (2024) 100080
Available online 6 February 2024
2713-3745/© 2024 The Authors. Published by Elsevier Ltd on behalf of Academy of Creativity. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Extending human creativity with AI 
Katherine O’Toole *, Em˝oke-´Agnes Horv´at 
Northwestern University, Frances Searle Room 1-269, 2240 Campus Drive, Evanston, IL 60208, USA   
A R T I C L E  I N F O   
Keywords: 
Computa


In [21]:
# Split the Text into Chunks

# Create text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ".", "!", "?", " ", ""]
)

chunks = text_splitter.split_text(full_text)

# Check results
print(f"🔹 Total chunks: {len(chunks)}")
print("🔹 Sample chunk:\n", chunks[0][:300])

🔹 Total chunks: 87
🔹 Sample chunk:
 Journal of Creativity 34 (2024) 100080
Available online 6 February 2024
2713-3745/© 2024 The Authors. Published by Elsevier Ltd on behalf of Academy of Creativity. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Extending human creati
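
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch that cuts text into fixed-width, overlapping windows. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break at the listed separators, which this sketch ignores. The function name `sliding_window_chunks` is made up for illustration.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split `text` into fixed-size windows that share `chunk_overlap` characters.

    Simplified illustration only; it does not look for separators.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


demo = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each window repeats the last two characters of the previous one, which is the same idea as the 200-character overlap used above: a sentence cut at a chunk boundary still appears whole in one of the neighbouring chunks.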


In [25]:
# Create Embeddings and Vector Store

embedding_model = OpenAIEmbeddings()
vector_store = FAISS.from_texts(chunks, embedding_model)

print("Vector store created!")

Vector store created!
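
Conceptually, a similarity search compares the query vector with every stored vector and keeps the closest ones. The sketch below does this by brute force with squared Euclidean distance on toy 3-dimensional vectors. It is an illustration of the idea, not FAISS's implementation, and the distance metric an actual index uses is a configuration detail worth checking. The names `squared_l2` and `top_k` are hypothetical.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query, vectors, k=2):
    """Return the indices of the k vectors closest to `query`, nearest first."""
    return heapq.nsmallest(
        k, range(len(vectors)), key=lambda i: squared_l2(query, vectors[i])
    )


# Toy stand-ins for chunk embeddings; real embeddings have far more dimensions.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
nearest = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(nearest)  # [0, 2]
```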


In [28]:
# Build the RetrievalQA Chain

# Define the LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever()
)

print("QA system ready.")

QA system ready.
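
The `"stuff"` chain type places all retrieved chunks into a single prompt together with the question. The sketch below shows that idea in plain Python. The prompt wording is invented for illustration and is not LangChain's actual template, and `build_stuffed_prompt` is a hypothetical name.

```python
def build_stuffed_prompt(question, retrieved_chunks):
    """Concatenate the retrieved chunks and the question into one prompt string.

    Illustrative wording only; LangChain's real prompt template differs.
    """
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


example_prompt = build_stuffed_prompt(
    "What is the paper about?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(example_prompt)
```

Because every retrieved chunk is sent in one request, this approach works well while the combined chunks fit within the model's context window.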


In [32]:
# Ask Questions

# Simple loop to ask questions
while True:
    question = input("\n❓ Ask a question about the PDF (or type 'exit'): ")
    if question.strip().lower() == "exit":
        break

    result = qa_chain.invoke({"query": question})

    # Wrap text for better readability
    wrapped = textwrap.fill(result["result"], width=100)
    print("\n💡 Answer:\n", wrapped)


❓ Ask a question about the PDF (or type 'exit'): What is the central argument of the article regarding AI’s role in human creativity?

💡 Answer:
 The central argument of the article is that the development of generative AI has led to novel ways
that technology can be integrated into creative activities. It discusses how AI tools can be
designed to work with human creators, facilitating human creativity rather than replacing it. The
article explores how AI models can help shed light on elements of the creative process, build
interfaces that encourage idea exploration, and design technological affordances to support the
development of new creative practices.

❓ Ask a question about the PDF (or type 'exit'): How does the paper define or distinguish between human and artificial creativity?

💡 Answer:
 The paper discusses the development of generative AI and how it can be integrated into creative
activities. It focuses on designing AI tools that work with human creators rather than replaci

## Creating an FAQ Chatbot with Predefined Questions

In [34]:
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever()
)

print("FAQ Chatbot is ready.")

FAQ Chatbot is ready.


### Option 1: Static FAQ Chatbot (Fixed Questions)

This version loops through a list of predefined FAQ questions.

In [36]:
faq_questions = [
    "What is the central argument of the article regarding AI’s role in human creativity?",
    "How does the paper define or distinguish between human and artificial creativity?",
    "What are the key benefits and limitations of using AI tools in creative tasks, according to the authors?",
    "How does the article address the ethical implications of AI-generated creative work?",
    "What future directions or recommendations do the authors propose for integrating AI into creative professions?"
]

for question in faq_questions:
    print(f"\n❓ {question}")
    result = qa_chain.invoke({"query": question})
    wrapped = textwrap.fill(result["result"], width=100)
    print("💡 Answer:\n", wrapped)


❓ What is the central argument of the article regarding AI’s role in human creativity?
💡 Answer:
 The central argument of the article is that the development of generative AI has led to novel ways
that technology can be integrated into creative activities. It discusses how AI tools can facilitate
human creativity and allow users to engage fully and authentically in the creative process, rather
than replacing human creators. The article also raises concerns about how AI creativity can coexist
with human creativity and the potential impact on creative industries.

❓ How does the paper define or distinguish between human and artificial creativity?
💡 Answer:
 The paper "Extending human creativity with AI" by Katherine O’Toole and Em˝oke-´Agnes Horv´at from
Northwestern University focuses on how generative AI can be integrated into creative activities. It
explores the development of AI tools that work with human creators rather than replacing them. The
paper discusses leveraging AI models 

### Option 2: Selectable FAQ Interface

This version lets the user choose a question from the FAQ list interactively.

In [39]:
faq_questions = {
    "1": "What is the central argument of the article regarding AI’s role in human creativity?",
    "2": "How does the paper define or distinguish between human and artificial creativity?",
    "3": "What are the key benefits and limitations of using AI tools in creative tasks, according to the authors?",
    "4": "How does the article address the ethical implications of AI-generated creative work?",
    "5": "What future directions do the authors propose for integrating AI into creative professions?"
}

while True:
    print("\n Choose a question (type number), or 'exit':")
    for key, q in faq_questions.items():
        print(f" {key}. {q}")
    print()  # blank line before the prompt

    choice = input("Your choice: ").strip()
    if choice.lower() == "exit":
        break
    if choice not in faq_questions:
        print("⚠️ Invalid option.")
        continue

    question = faq_questions[choice]
    result = qa_chain.invoke({"query": question})
    wrapped = textwrap.fill(result["result"], width=100)
    print("\n💡 Answer:\n", wrapped)


 Choose a question (type number), or 'exit':
 1. What is the central argument of the article regarding AI’s role in human creativity?
 2. How does the paper define or distinguish between human and artificial creativity?
 3. What are the key benefits and limitations of using AI tools in creative tasks, according to the authors?
 4. How does the article address the ethical implications of AI-generated creative work?
 5. What future directions do the authors propose for integrating AI into creative professions?

Your choice: 1

💡 Answer:
 The central argument of the article is that the development of generative AI has led to novel ways
that technology can be integrated into creative activities. It discusses how AI tools can facilitate
human creativity and allow users to engage fully and authentically in the creative process, rather
than replacing human creators. The article also raises concerns about how AI creativity can coexist
with human creativity and the potential impact on creative
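
In the selectable interface, picking the same number twice sends the same query to the model again. One possible refinement, sketched here, caches the formatted answer per question. `make_cached_ask` and `StubChain` are hypothetical names; the stub only stands in for `qa_chain` so the example runs without an API key.

```python
import textwrap


def make_cached_ask(chain, width=100):
    """Wrap `chain` so each distinct question is sent to the model only once."""
    cache = {}

    def ask(question):
        if question not in cache:
            result = chain.invoke({"query": question})
            cache[question] = textwrap.fill(result["result"], width=width)
        return cache[question]

    return ask


class StubChain:
    """Stand-in for qa_chain; it counts calls and returns a canned answer."""

    def __init__(self):
        self.calls = 0

    def invoke(self, inputs):
        self.calls += 1
        return {"query": inputs["query"], "result": f"Stub answer to: {inputs['query']}"}


stub = StubChain()
ask = make_cached_ask(stub)
first = ask("What is the central argument?")
second = ask("What is the central argument?")  # served from the cache
print(stub.calls)  # 1
```

With the real chain, `ask = make_cached_ask(qa_chain)` could replace the direct `qa_chain.invoke` call inside the menu loop.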