In [1]:
from endee import Endee
from sentence_transformers import SentenceTransformer
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter


In [7]:
# Load the PDF; PyPDFLoader returns one Document per page
loader = PyPDFLoader(r"C:\Users\bhavy\PycharmProjects\PythonProject1\llm.pdf")
docs = loader.load()

In [8]:
# Split the pages into chunks of at most 500 characters, with 50 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)
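
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal sketch of a plain sliding-window splitter. It is a simplification, not LangChain's algorithm: RecursiveCharacterTextSplitter prefers to break on separators such as paragraph and line breaks, whereas this hypothetical sliding_chunks helper cuts at fixed character offsets.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters.

    Hypothetical helper for illustration only, not part of LangChain.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces


sample = "abcdefghij" * 120  # 1200 characters of toy text
pieces = sliding_chunks(sample)
print([len(p) for p in pieces])           # [500, 500, 300]
print(pieces[0][-50:] == pieces[1][:50])  # True: the windows overlap by 50 characters
```

The overlap means a sentence that straddles a chunk boundary is still fully present in at least one chunk, at the cost of storing some text twice.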

In [9]:
# Load the embedding model and encode every chunk into a dense vector
model = SentenceTransformer("all-MiniLM-L6-v2")
text = [doc.page_content for doc in chunks]
vectors = model.encode(text)

In [10]:
vectors

array([[ 0.0299766 , -0.05862331,  0.07544198, ...,  0.02781818,
         0.02469181,  0.01118146],
       [ 0.04924611, -0.08084214,  0.09697701, ..., -0.0180906 ,
        -0.01246059, -0.03512884],
       [ 0.03481973, -0.09109627,  0.01799844, ..., -0.02777154,
        -0.11951374, -0.00536834],
       ...,
       [-0.0452126 , -0.01945757,  0.03190787, ..., -0.00194283,
         0.02860099, -0.02823781],
       [-0.02253441, -0.00456246,  0.01028754, ...,  0.04631921,
         0.07505558, -0.05320664],
       [-0.04532268,  0.04679902, -0.06417809, ..., -0.01675811,
         0.00049432, -0.08918935]], shape=(603, 384), dtype=float32)

In [11]:
# Connect to Endee and open the "documents" index (assumed to exist already)
client = Endee()
index = client.get_index(name="documents")

In [12]:
# Build one record per chunk: a stable id, the vector, and metadata for retrieval
items = []
for i, vec in enumerate(vectors):
    items.append({
        "id": f"chunk_{i}",
        "vector": vec.tolist(),
        "meta": {
            "text": text[i],
            "page": chunks[i].metadata.get("page", 0),
            "source": "pdf",
        },
    })

In [13]:
index.upsert(items)
print("✅ PDF indexed using LangChain + Endee")

✅ PDF indexed using LangChain + Endee
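
The upsert above sends all 603 records in one call. For larger PDFs it may be safer to send them in batches. The sketch below assumes nothing about Endee beyond the index.upsert(items) call already used in this notebook. The batched helper and the batch size of 100 are illustrative choices, and whether Endee enforces any per-request limit is not something this notebook establishes.

```python
from typing import Iterator


def batched(records: list[dict], batch_size: int = 100) -> Iterator[list[dict]]:
    """Yield consecutive slices of records, each holding at most batch_size items.

    Hypothetical helper; batch_size=100 is an arbitrary illustrative default.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Stand-in records with the same count as the indexed chunks (603)
demo_items = [{"id": f"chunk_{i}"} for i in range(603)]
sizes = [len(batch) for batch in batched(demo_items)]
print(sizes)  # [100, 100, 100, 100, 100, 100, 3]

# In the notebook this would be used as:
# for batch in batched(items):
#     index.upsert(batch)
```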


In [14]:
query = "What is LLM"

SEMANTIC SEARCH

In [15]:
query_vec = model.encode([query])[0]
results = index.query(vector=query_vec.tolist(), top_k=2)
for r in results:
    print(r["meta"]["text"][:300])

LLM research, it has become considerably challenging to perceive the bigger picture of the advances in this direction. Considering
the rapidly emerging plethora of literature on LLMs, it is imperative that the research community is able to benefit from a concise
yet comprehensive overview of the rec
surveys in [54, 55, 56, 57, 58]. In contrast to these surveys, our
contribution focuses on providing a comprehensive yet concise
overview of the general direction of LLM research. This arti-
cle summarizes architectural and training details of pre-trained
LLMs and delves deeper into the details of c
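
Conceptually, a vector query ranks stored vectors by their similarity to the query vector and returns the best top_k. The sketch below shows that idea as a brute-force cosine-similarity search in plain Python. It is an illustration under assumptions: the notebook does not show which metric or index structure Endee uses, and the cosine and top_k helpers are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], stored: list[dict], k: int = 2) -> list[tuple[float, str]]:
    """Score every stored record against the query and keep the k best."""
    scored = [(cosine(query_vec, item["vector"]), item["id"]) for item in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return scored[:k]


# Toy 3-dimensional vectors standing in for the 384-dimensional embeddings
toy_index = [
    {"id": "a", "vector": [1.0, 0.0, 0.0]},
    {"id": "b", "vector": [0.9, 0.1, 0.0]},
    {"id": "c", "vector": [0.0, 1.0, 0.0]},
]
hits = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print([doc_id for _, doc_id in hits])  # ['a', 'b']
```

A real vector database avoids scanning every record, typically with an approximate nearest-neighbour index, but the ranking it approximates is the same.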


In [16]:
query = "What is Tokenization"

In [17]:
query_vec = model.encode([query])[0]
results = index.query(vector=query_vec.tolist(), top_k=2)
for r in results:
    print(r["meta"]["text"][:300])

2. Background
We provide the relevant background to understand the fun-
damentals related to LLMs in this section. We briefly discuss
necessary components in LLMs and refer the readers interested
in details to the original works.
2.1. Tokenization
Tokenization [59] is an essential pre-processing ste
Next Token Standard - BPE+ Pre-LayerLearned GeLU - 96 96 12288ERNIE 3.0 (10B) Causal-Dec Next Token Standard - WordPiece Post-LayerRelative GeLU - 48 64 4096Jurassic-1 (178B) Causal-Dec Next Token Standard 256k SentencePiece∗ Pre-LayerLearned GeLU ✓ 76 96 13824HyperCLOV A (82B)Causal-Dec Next TokenD


RAG PIPELINE

In [18]:
from langchain_ollama import ChatOllama

# Requires a running Ollama server with the gemma3:4b model available
llm = ChatOllama(model="gemma3:4b")

In [24]:
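def ask_pdf(query, top_k=5):
    """Retrieve the top_k most similar chunks and print the LLM's answer."""
    # Embed the question with the same model used for the chunks
    query_vec = model.encode([query])[0]
    results = index.query(vector=query_vec.tolist(), top_k=top_k)
    # Concatenate the retrieved chunk texts into one context block
    context = "\n\n".join(r["meta"]["text"] for r in results)
    prompt = f"""
    Answer the question using ONLY the context below.
    If the answer is not in the context, say "I don't know".
    Context:
    {context}
    Question:
    {query}
    """
    response = llm.invoke(prompt)
    # The answer is printed, not returned, so the function itself returns None
    print(response.content)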
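
The prompt assembly inside ask_pdf can also be pulled out into a small pure function, which makes it easy to inspect exactly what the model receives without calling Endee or Ollama. This is a sketch, not a change to the pipeline above: build_prompt is a hypothetical helper, and numbering the passages is an addition that the original prompt does not have.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded-answer prompt from retrieved passages.

    Hypothetical helper; the [n] passage labels are an illustrative addition.
    """
    context = "\n\n".join(
        f"[{i}] {passage.strip()}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using ONLY the context below.\n"
        'If the answer is not in the context, say "I don\'t know".\n\n'
        f"Context:\n{context}\n\n"
        f"Question:\n{question}\n"
    )


demo_prompt = build_prompt(
    "What is Tokenization",
    ["  first retrieved chunk ", "second retrieved chunk"],
)
print(demo_prompt)
```

Building the string line by line also avoids the leading indentation that a triple-quoted f-string inside a function body carries into the prompt.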


In [25]:
q = "What is Tokenization"
ask_pdf(q)  # ask_pdf prints the answer itself; wrapping it in print() would also print None

Tokenization [59] is an essential pre-processing step in LLM training that parses the text into non-decomposing units called tokens. Tokens can be characters, subwords, symbols, or words, depending on the tokenization process.


In [26]:
q = "What is Self Attention"
ask_pdf(q)

Self-attention has O(n2) time complexity which becomes infeasible for large sequences. To speed up the computation, sparse attention iteratively calculates attention in sliding windows for speed gains.


In [29]:
q = "What is LLM"
ask_pdf(q)

LLMs can play user-defined roles and behave like a specific domain expert. In multi-agent systems, each LLM is assigned a unique role, simulating human behavior and collaborating with other agents to complete a complex task.
