# Ingesting PDF and Excel Data

In [1]:
%pip install -q unstructured langchain langchain-community
%pip install -q "unstructured[all-docs]" ipywidgets tqdm

Note: you may need to restart the kernel to use updated packages.




In [5]:
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.document_loaders import UnstructuredExcelLoader


In [7]:
pdf_path = "data/EazyCar_Product_knowledge_for_AI_chatbot.pdf"
excel_path = "data/Eazy Car_Package.xlsx"

pdf_loader = UnstructuredPDFLoader(pdf_path)
excel_loader = UnstructuredExcelLoader(excel_path)

pdf_docs = pdf_loader.load()
excel_docs = excel_loader.load()

docs = pdf_docs + excel_docs
print(docs[0].page_content)



Product Knowledge for AI Chatbot

What is Eazy Car?

Eazy Car is a new-style car subscription service offering brand-new cars with flexible terms.

Available for both individuals and companies

• Choose any brand and model Pay one fixed monthly rate • All costs are covered throughout the contract •

Eazy Car Offers 3 Service Types:

1. Long-Term New Car Subscription (2–6 years)

•

Brand-new cars only Annual contract-based rental

2. Monthly Car Rental

Flexible term from 1 to 12 months (or annual rental) • • Used vehicles (white plates) from model years 2020–2024

3. Short-Term New Car Subscription (6, 12, or 18 months)



Brand-new cars only

How to Apply for Eazy Car

1. Choose your preferred service and vehicle 2. Apply online or contact our team 3. Receive credit approval 4. Make payment and receive your car

Required Documents (May Vary by Package)

National ID Card

Driver’s License

Can I Own the Car?

Yes, you can choose to purchase the vehicle during or at the end of the cont

# Vector Embedding

In [2]:
# !ollama pull nomic-embed-text:v1.5
!ollama list
# !pip install -q chromadb
# !pip install -q langchain-text-splitters
# !ollama pull deepseek-r1:1.5b

NAME                     ID              SIZE      MODIFIED     
deepseek-r1:8b           6995872bfe4c    5.2 GB    2 months ago    
deepseek-r1:1.5b         e0979632db5a    1.1 GB    2 months ago    
nomic-embed-text:v1.5    0a109f422b47    274 MB    2 months ago    


In [12]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the documents into smaller, overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
chunks = text_splitter.split_documents(docs)

In [13]:
# Add the chunks to the vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text:v1.5", show_progress=True),
    collection_name="Eazycar_db"
)

OllamaEmbeddings: 100%|██████████| 15/15 [00:55<00:00,  3.69s/it]
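
The splitting step above uses `chunk_size=2000` and `chunk_overlap=200`. As a rough illustration of what those two numbers mean, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, which this sketch does not do. The function name `split_with_overlap` is hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 2000, chunk_overlap: int = 200) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Small numbers make the overlap visible: each chunk repeats the last
# character of the previous one.
sample_chunks = split_with_overlap("0123456789", chunk_size=4, chunk_overlap=1)
print(sample_chunks)  # ['0123', '3456', '6789']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.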


# Retrieval

In [14]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [15]:
local_model = "deepseek-r1:1.5b"
llm = ChatOllama(model=local_model)

# Prompt that asks the LLM to rephrase the user question for multi-query retrieval
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}
    """
)


In [16]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(),
    llm,
    prompt=QUERY_PROMPT
)
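
To make the idea behind multi-query retrieval concrete, the following is a minimal sketch under stated assumptions, not the library's implementation: the question is rephrased several times, each variant is searched separately, and the results are merged without duplicates. `toy_variants`, `toy_search`, and `CORPUS` are hypothetical stand-ins for the LLM, the vector store, and the indexed chunks. The toy search ranks by shared words instead of embedding distance.

```python
from typing import Callable, List

# Hypothetical mini corpus standing in for the indexed chunks.
CORPUS = [
    "Eazy Car is a car subscription service with flexible terms.",
    "Monthly car rental runs from 1 to 12 months.",
    "Required documents include a national ID card and a driver's license.",
]


def toy_search(query: str, k: int) -> List[str]:
    """Rank corpus entries by the number of words shared with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def toy_variants(question: str) -> List[str]:
    """Stand-in for the LLM call that rephrases the question."""
    return [question, question.replace("rental", "subscription")]


def multi_query_retrieve(
    question: str,
    generate_variants: Callable[[str], List[str]],
    search: Callable[[str, int], List[str]],
    k: int = 2,
) -> List[str]:
    """Search once per question variant and return the de-duplicated union."""
    merged: List[str] = []
    seen = set()
    for variant in generate_variants(question):
        for doc in search(variant, k):
            if doc not in seen:  # keep only the first occurrence of each document
                seen.add(doc)
                merged.append(doc)
    return merged


results = multi_query_retrieve("How long is a car rental?", toy_variants, toy_search)
for doc in results:
    print(doc)
```

Because each variant triggers its own search, the number of embedding calls grows with the number of generated questions, which is consistent with the repeated `OllamaEmbeddings` progress bars printed when the chain runs.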

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
""" 
prompt = ChatPromptTemplate.from_template(template)

In [17]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
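
The first stage of the chain above feeds two values into the prompt: the retrieved context and the untouched question. A plain-Python sketch of that step, assuming the retrieved passages are joined into one string, might look like this. `build_rag_prompt` and `toy_retrieve` are hypothetical names, and the passages are illustrative placeholders.

```python
from typing import Callable, List

RAG_TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


def build_rag_prompt(question: str, retrieve: Callable[[str], List[str]]) -> str:
    """Fetch context for the question and fill both template slots."""
    passages = retrieve(question)
    context = "\n\n".join(passages)  # one blank line between passages
    return RAG_TEMPLATE.format(context=context, question=question)


def toy_retrieve(question: str) -> List[str]:
    # Hypothetical stand-in for the retriever; always returns the same passages.
    return [
        "Eazy Car is a car subscription service.",
        "Monthly rental runs from 1 to 12 months.",
    ]


prompt_text = build_rag_prompt("What is Eazy Car?", toy_retrieve)
print(prompt_text)
```

The resulting string is what the LLM stage receives, so the quality of the answer depends directly on what the retriever puts into `{context}`.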

In [None]:
question = input("Ask a question about Eazy Car products: ")
answer = chain.invoke(question)
print("answer:", answer)

OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.05s/it]


'<think>\nFirst, I need to understand the main question: "Why EazyCar is better than another service?" I\'ll compare it with two other services based on the information provided.\n\nLooking at EazyCar\'s page content:\n\n1. **Insurance and Tax Handling**: EazyCar handles all insurance (CMI), road tax, and CMI throughout the contract.\n   - Other services might require separate handling of these aspects.\n\n2. **Vehicle Ownership**: \n   - EazyCar offers 3 types of vehicles: general cars (500 THB rental + fuel), luxury cars (9800 THB rental + fuel), and high-end cars (14700 THB rental + fuel).\n   - Other services are not detailed here, so it\'s challenging to compare directly. However, if we assume the luxury and high-end options, EazyCar is more expensive.\n\n3. **Support and Features**:\n   - EazyCar supports road maintenance, insurance, and has a 40% penalty for early termination plus deposit forfeited.\n   - Other services may not cover these aspects as comprehensively.\n\nConsider
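
The answer above begins with the model's reasoning inside a `<think>` tag. If only the final answer should be shown to the user, one option is to strip that block before printing. The helper below is a minimal sketch that assumes the reasoning is closed by a matching `</think>` tag; `strip_reasoning` is a hypothetical name, not part of LangChain or Ollama.

```python
import re

# Non-greedy match so only the reasoning block is removed; DOTALL lets "." span newlines.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> blocks and surrounding whitespace from model output."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\nCompare the services first.\n</think>\n\nEazy Car covers insurance and road tax."
print(strip_reasoning(raw))  # Eazy Car covers insurance and road tax.
```

If the closing tag is missing, for example because the output was cut off, the text is returned unchanged apart from trimmed whitespace.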