RAG chatbot 
Version 1.0 

Import libraries

In [4]:
# Set the protobuf implementation before importing anything else. The variable
# is read when protobuf is first imported, so setting it after the imports has
# no effect. This prevents potential protobuf-related errors when running the
# RAG chatbot with LangChain and Ollama.
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# Imports
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain_classic.retrievers import MultiQueryRetriever
import chromadb

# For formatting output
from IPython.display import display, Markdown

Load PDF

In [5]:
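# Load PDF
doc_path = "TaskFlow.pdf"
if os.path.isfile(doc_path):
    # The document is German, so the language is set to "deu"
    loader = UnstructuredPDFLoader(file_path=doc_path, languages=["deu"])
    data = loader.load()
    print(f"PDF loaded successfully: {doc_path}")
else:
    # Stop here: the following cells need `data`
    raise FileNotFoundError(f"PDF not found: {doc_path}. Upload the document first.")

# TODO: detect the document language instead of hard-coding "deu"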


Could get FontBBox from font descriptor because None cannot be parsed as 4 floats
Could get FontBBox from font descriptor because None cannot be parsed as 4 floats


PDF loaded successfully: TaskFlow.pdf
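
The TODO above asks for language detection. A minimal sketch of one way to approach it, using only the standard library: count stopwords per language in a text sample and return the matching code for `languages=[...]`. The helper `guess_language`, the tiny stopword sets, and the `"eng"` fallback are assumptions for illustration; a dedicated detection library would likely be more robust. Getting a text sample first (for example from a first load with default settings) is also left open here.

```python
import re

# Small illustrative stopword sets keyed by language code; not exhaustive.
STOPWORDS = {
    "deu": {"der", "die", "das", "und", "ist", "nicht", "mit", "für", "ein", "eine"},
    "eng": {"the", "and", "is", "of", "to", "with", "for", "a", "in", "not"},
}

def guess_language(text, default="eng"):
    """Return the code whose stopwords occur most often in `text`.

    Falls back to `default` when no stopword of any language is found.
    """
    # Extract runs of letters (including umlauts), lower-cased
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        code: sum(word in stops for word in words)
        for code, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

With such a helper, the loader call could receive `languages=[guess_language(sample_text)]`, where `sample_text` is a hypothetical excerpt of the document.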


Split text into chunks

In [6]:
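# Split text into chunks of at most 1000 characters; consecutive chunks
# may share up to 200 characters so that context is not cut off at a boundary
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(data)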
print(f"Number of text chunks: {len(chunks)}")

Number of text chunks: 6
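
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is not what `RecursiveCharacterTextSplitter` does internally (that splitter prefers to break at separators such as paragraphs and lines); the function `sliding_window_chunks` is a hypothetical illustration of the two parameters only.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut `text` into windows of `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring windows share `chunk_overlap` characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

For a ten-character string with `chunk_size=4` and `chunk_overlap=2`, each chunk repeats the last two characters of the one before it.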


Create vector database

In [None]:
# Create vector database
# The collection is persisted on disk under ./chroma_data.
# Note: no IDs are passed, so re-running this cell adds the same chunks again.
client = chromadb.PersistentClient(path="./chroma_data")
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    collection_name="rag_chatbot_collection",
    client=client
)
print("Vector database created successfully.")


Vector database created successfully.
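
Because the client is persistent, running the cell above more than once can store the same chunks repeatedly. One possible remedy, sketched here under assumptions, is to derive a stable ID from each chunk's text and pass the list through the `ids` argument of `Chroma.from_documents`. The helpers `chunk_ids` and `unique_by_id` are hypothetical, and whether existing IDs are overwritten rather than duplicated should be verified against the installed Chroma version.

```python
import hashlib

def chunk_ids(texts):
    """Return one stable ID per text: the SHA-256 hex digest of its content."""
    return [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]

def unique_by_id(texts):
    """Drop repeated texts so every ID occurs once; keeps first-seen order.

    Returns a pair (ids, texts) of equal length.
    """
    seen = {}
    for text, cid in zip(texts, chunk_ids(texts)):
        seen.setdefault(cid, text)
    return list(seen.keys()), list(seen.values())

# Possible use with the notebook's `chunks` (assumption, not run here):
# ids = chunk_ids([c.page_content for c in chunks])
# Chroma.from_documents(documents=chunks, ids=ids, ...)
```

Identical chunk texts produce identical IDs, so duplicates within one batch would need to be removed first, which is what `unique_by_id` illustrates.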


Set up LLM and retrieval

In [8]:
# Set up LLM and retrieval
local_model = "llama3.2"
llm = ChatOllama(model=local_model, temperature=0.1)

# Query prompt template
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are a query rephrasing assistant for a vector database search.
    Your task is to generate **3** alternative, semantically diverse versions of the user's question. These variations should explore different terminology, structures, or angles of the original query to maximize relevant document retrieval.
    Provide only the rephrased questions, with each one separated by a newline. Do not include any introductory phrases, explanations, or the original question itself.
    Original question: {question}""",
)

# Set up retriever
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)
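
Conceptually, multi-query retrieval runs in three steps: the LLM rewrites the question into several variants (one per line, as the prompt above requests), each variant is sent to the vector store, and the results are merged without duplicates. The sketch below illustrates that flow with plain strings standing in for documents; `parse_queries`, `unique_union`, `multi_query_retrieve`, and the `search` callable are hypothetical names, not the library's internals.

```python
def parse_queries(llm_output):
    """Split newline-separated LLM output into non-empty, stripped queries."""
    return [line.strip() for line in llm_output.splitlines() if line.strip()]

def unique_union(result_lists):
    """Merge several result lists, keeping the first occurrence of each item."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

def multi_query_retrieve(llm_output, search):
    """Run `search` once per rephrased query and merge the hits."""
    return unique_union(search(query) for query in parse_queries(llm_output))
```

A chunk that matches more than one rephrased query appears only once in the merged result, which keeps the context passed to the LLM free of repeats.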

Create chain

In [None]:
# RAG prompt template
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template) 


# Create chain (pipeline): context from retriever + unchanged question + prompt + llm + output parser as string
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
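
In the chain above, `{context}` receives the retriever's list of document objects, so their string representation (including metadata) ends up in the prompt. A common pattern is to join only the text of each document first. The sketch below assumes this pattern; `Doc` is a stand-in for LangChain's document class and `format_docs` is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Stand-in for a LangChain document; only `page_content` is modelled."""
    page_content: str

def format_docs(docs):
    """Join the text of all documents, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)

# Possible use in the chain (assumption, not run here):
# {"context": retriever | format_docs, "question": RunnablePassthrough()}
```

Passing plain text keeps the prompt shorter and avoids exposing metadata fields to the model, at the cost of losing source information such as the file name.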

Chat with PDF

In [10]:
def chat_with_pdf(question):
    """
    Answer a question about the PDF with the RAG chain
    and render the answer as Markdown.
    """
    # display() returns None, so there is nothing to return
    display(Markdown(chain.invoke(question)))


In [11]:
# Example 1
chat_with_pdf("What is the main idea of this document?")

The main idea of this document appears to be a technical specification for a task management system, outlining requirements and guidelines for its implementation, functionality, and performance. The document covers various aspects such as user interface, data storage, API design, testing, and non-functional requirements like usability and performance.

In [12]:
# Example 2
chat_with_pdf("What are the two options for technical implementation of this system?")

Die beiden Optionen für die technische Implementierung des Systems sind:

1. Implementierung als Webanwendung (Spring oder ähnliche Frameworks)
2. Implementierung als Desktop-Anwendung (Swing)

In [13]:
# Example 3
chat_with_pdf("What should I do? Can you give me some steps?")

Based on the provided context from the TaskFlow.pdf document, it appears that you are tasked with managing personal tasks using a task management system called TaskFlow.

Here are some steps you can follow:

1. **Create a new task**: Go to the "Aufgabenverwaltung" section and click on the "Neue Aufgabe erstellen" button. Fill in the required fields, including title, description, priority, and optional deadline.
2. **Edit an existing task**: Locate the task you want to edit and click on the "Bearbeiten"-button. Update the relevant fields, such as title, description, or priority, and save the changes.
3. **Delete a task**: Find the task you want to delete and click on the "Löschen"-button. Confirm that you want to delete the task by clicking on the "Ja" button after being prompted for confirmation.
4. **View all tasks**: Go to the "Aufgabenansicht" section and view the list of all tasks, sorted by priority and then by deadline.
5. **Filter tasks**: Use the filter options to narrow down the list of tasks based on status, priority, deadline, or search criteria in title or description.
6. **View task details**: Click on a task to view its detailed information, including history of status changes.
7. **Check statistics**: Go to the "Statistikübersicht" section to view statistics on the number of tasks per status and overdue tasks.

Please note that these steps are based solely on the provided context and may not be an exhaustive list of all possible actions or features available in TaskFlow.