In [1]:
!pip install langchain langchain-community langchain-chroma \
langchain-huggingface langchain-ollama chromadb \
sentence-transformers transformers python-dotenv tqdm





In [11]:
from pathlib import Path

PDF_PATH = Path("AI Training Document.pdf")
CHROMA_PATH = Path("vectordb2")
CHUNKS_JSON = Path("chunks/document_chunks.json")

EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
CHUNK_SIZE = 800
CHUNK_OVERLAP = 100
TOP_K = 3
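
To make the CHUNK_SIZE and CHUNK_OVERLAP settings concrete, here is a minimal sketch of a fixed-width sliding-window splitter. It is a simplification, not what the notebook runs: RecursiveCharacterTextSplitter first splits on separators (paragraphs, lines, words) and treats chunk_size as an upper bound, so real chunks are usually shorter than 800 characters. The helper name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=800, chunk_overlap=100):
    """Split text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# 2000 characters of repeating alphabet, so the overlap is visible in the data.
sample = "".join(chr(97 + i % 26) for i in range(2000))
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])        # [800, 800, 600]
print(pieces[0][700:] == pieces[1][:100])  # True: the last 100 chars are repeated
```

With a step of 700 characters (800 minus 100), a larger overlap produces more chunks for the same text, which increases both index size and the chance that a sentence cut at a boundary still appears whole in one chunk.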


In [12]:
!pip install pymupdf





In [13]:
import os, json
from langchain_community.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# Create the output folder from the configured path instead of a hard-coded name
CHUNKS_JSON.parent.mkdir(parents=True, exist_ok=True)

def save_chunks(chunks, filename=CHUNKS_JSON):
    """Dump each chunk's text and source page to JSON for inspection."""
    data = [
        {"id": f"chunk_{i}", "text": c.page_content, "page": c.metadata.get("page")}
        for i, c in enumerate(chunks, start=1)
    ]
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    print(f"📂 Chunks saved to {filename}")

loader = PyMuPDFLoader(str(PDF_PATH))
docs = loader.load()
print("Pages detected:", len(docs))

splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
chunks = splitter.split_documents(docs)
print("Chunks created:", len(chunks))
save_chunks(chunks)

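# Embed every chunk and persist the vectors to disk.
# Note: re-running this cell against an existing CHROMA_PATH adds the same chunks again.
embeddings = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
db = Chroma.from_documents(chunks, embedding=embeddings, persist_directory=str(CHROMA_PATH))
print(f"💾 Chroma DB saved at {CHROMA_PATH}")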

Pages detected: 20
Chunks created: 106
📂 Chunks saved to chunks\document_chunks.json




💾 Chroma DB saved at vectordb2
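
A quick way to sanity-check the chunk dump is to read it back and summarise it. The sketch below assumes the JSON layout that save_chunks writes (a list of records with "id", "text" and "page" keys); the helper name `summarize_chunks` and the demo records are hypothetical, and the demo runs on a temporary stand-in file rather than the real dump.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_chunks(path):
    """Return (chunk count, longest chunk length, chunks-per-page Counter)."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    lengths = [len(r["text"]) for r in records]
    per_page = Counter(r["page"] for r in records)
    return len(records), max(lengths, default=0), per_page


# Demo on a tiny stand-in file with the same keys that save_chunks writes.
demo_records = [
    {"id": "chunk_1", "text": "User Agreement", "page": 0},
    {"id": "chunk_2", "text": "1. Introduction", "page": 0},
    {"id": "chunk_3", "text": "Agreement to Arbitrate", "page": 13},
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "document_chunks.json"
    demo_path.write_text(json.dumps(demo_records), encoding="utf-8")
    total, longest, per_page = summarize_chunks(demo_path)

print(total, longest, dict(per_page))  # 3 22 {0: 2, 13: 1}
```

On the real file, `summarize_chunks(CHUNKS_JSON)` should report the 106 chunks printed above, and the longest chunk is expected to stay at or below CHUNK_SIZE when the splitter finds separators to break on.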


In [14]:
from langchain_core.prompts import PromptTemplate
from langchain_ollama import ChatOllama

PROMPT_TEMPLATE = """
You are an expert assistant helping to answer questions based on the document supplied
as context below.

Use the provided context (retrieved chunks from the document) to answer the user query.
If the context does not contain enough information, state clearly: "The document does not
provide this information."

Instructions:
- Answer in 2-4 clear sentences unless the question requires a detailed explanation.
- Quote or paraphrase relevant parts of the document when possible.
- Do not include information that is not supported by the context.

Context:
{context}

User: {question}
Answer:
"""

prompt = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["context", "question"])
llm = ChatOllama(model="llama3", base_url="http://localhost:11434")  # Ollama should be running locally

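def rag_answer(query):
    """Retrieve the TOP_K most similar chunks and ask the LLM to answer from them."""
    docs = db.similarity_search(query, k=TOP_K)  # nearest chunks by embedding similarity
    context = "\n\n".join(d.page_content for d in docs)
    final_prompt = prompt.format(context=context, question=query)
    return llm.invoke(final_prompt).content, docs  # answer text plus its source chunks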
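
The retrieval step inside rag_answer can be pictured as "score every stored vector against the query vector and keep the best k". The sketch below illustrates that idea with cosine similarity over toy 3-dimensional vectors. It is not how Chroma works internally (Chroma uses an approximate nearest-neighbour index, and its configured distance metric may differ); the names `cosine_similarity`, `top_k` and the toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=3):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
indexed = [
    ("arbitration clause", [0.9, 0.1, 0.0]),
    ("seller fees", [0.1, 0.9, 0.0]),
    ("policy violations", [0.0, 0.2, 0.9]),
]
results = top_k([0.0, 0.1, 1.0], indexed, k=2)
print([text for _, text in results])  # ['policy violations', 'seller fees']
```

Because only the k best-scoring chunks reach the prompt, a broad question can end up answered from a narrow slice of the document, so TOP_K is worth tuning alongside CHUNK_SIZE.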

In [15]:
ans, srcs = rag_answer("What entities does the User Agreement mention for different countries?")
print(ans)
for s in srcs:
    print("Page:", s.metadata.get("page"), "->", s.page_content[:200], "...")



According to the provided context, the User Agreement mentions two entities:

* For users residing in the United States: eBay Inc., located at 2025 Hamilton Ave., San Jose, CA 95125.
* For users residing in the UK: eBay (UK) Limited, located at 1 More London Place, London, SE1 2AF.

These are mentioned in section "The entity you are contracting with is:" under the "Introduction" part of the User Agreement.
Page: 6 -> to, buyers around the world. Many of our Services are accessible internationally. We offer certain 
programs, tools, and experiences of particular interest to international sellers and buyers, such as ...
Page: 18 -> to act on behalf of such business and bind the business to this User Agreement. Such account is 
owned and controlled by the business entity. No agency, partnership, joint venture, employee-
employer  ...
Page: 0 -> User Agreement 
1. Introduction 
This User Agreement, the Mobile Application Terms of Use, and all policies and additional terms 
posted on and in
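
The source-printing loop is repeated in each query cell, so it could be folded into a small helper. The sketch below assumes the page metadata is zero-based, as the "Page: 0" line for the document's opening text suggests, and reports human-friendly 1-based pages. `FakeDoc` and `format_sources` are hypothetical names; `FakeDoc` only mimics the two attributes the loop reads (page_content and metadata).

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing the two attributes the printing loop reads."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_sources(docs, preview_chars=200):
    """Build one line per retrieved chunk: 1-based page number plus a short preview."""
    lines = []
    for doc in docs:
        page = doc.metadata.get("page")
        label = page + 1 if isinstance(page, int) else "?"  # assumes 0-based loader pages
        preview = " ".join(doc.page_content[:preview_chars].split())  # collapse newlines
        lines.append(f"Page {label}: {preview} ...")
    return lines


demo = [FakeDoc("User Agreement \n1. Introduction", {"page": 0}), FakeDoc("no page info")]
for line in format_sources(demo):
    print(line)
# Page 1: User Agreement 1. Introduction ...
# Page ?: no page info ...
```

In the notebook this would be called as `print("\n".join(format_sources(srcs)))` after each rag_answer call.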

In [16]:
ans, srcs = rag_answer("What actions can eBay take if a user violates policies?")
print(ans)
for s in srcs:
    print("Page:", s.metadata.get("page"), "->", s.page_content[:200], "...")



According to the document, if eBay believes a user is violating their policies or abusing the Services in any way, they may take various actions, including:

* Limiting, restricting, suspending, or downgrading your seller account
* Limiting or terminating your user account(s) and access to our Services
* Delaying or removing hosted content
* Removing special status associated with your account(s)
* Reducing or eliminating any discounts
* Taking technical and/or legal steps to prevent you from using our Services

These actions are taken in eBay's sole discretion, without limiting other remedies.
Page: 2 -> our Services or post content that does not belong to you; 
• 
commercialize any eBay application or any information, data, or software associated with 
such application, except with the prior express  ...
Page: 1 -> are a person with whom transactions are prohibited under economic or trade sanctions; 
• 
fail to pay for items purchased by you, unless you have a valid reason as set out

In [17]:
ans, srcs = rag_answer("what is this doc about")
print(ans)
for s in srcs:
    print("Page:", s.metadata.get("page"), "->", s.page_content[:200], "...")



This document appears to be an agreement to arbitrate, specifically outlining the terms and conditions for resolving disputes through arbitration. The document describes a process called "Batch Arbitration" which involves batching together similar demands or claims, appointing arbitrators for each batch, and resolving each batch on a consolidated basis with one set of filing and administrative fees.
Page: 16 -> resolution of discovery disputes; and (2) the adoption of an expedited calendar of the 
arbitration proceedings. This Batch Arbitration provision shall in no way be interpreted as 
authorizing a class ...
Page: 16 -> after the batching described above, a final batch consisting of the remaining Demands); (2) 
appoint one arbitrator for each batch; and (3) provide for the resolution of each batch on a 
consolidated  ...
Page: 13 -> IS AVAILABLE. Subject to this Agreement to Arbitrate, the arbitrator may award declaratory 
or injunctive relief only in favor of the individual party 