In [21]:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

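# PyMuPDFLoader returns one Document per PDF page
loader = PyMuPDFLoader("sample1.pdf")
pages = loader.load()

# Chunks of up to 500 characters; neighbouring chunks share up to 100 characters
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100,
)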
chunks = splitter.split_documents(pages)

for i, chunk in enumerate(chunks[:5]):
    print(f"Chunk {i+1}:\n{chunk.page_content[:300]}\n{'-'*40}")

Chunk 1:
Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.com
Noam Shazeer∗
Google Brain
noam@google.com
Niki Parm
----------------------------------------
Chunk 2:
Google Research
llion@google.com
Aidan N. Gomez∗†
University of Toronto
aidan@cs.toronto.edu
Łukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural net
----------------------------------------
Chunk 3:
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks sh
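The `chunk_size`/`chunk_overlap` pair is easiest to see on a toy string. The sketch below is a plain fixed-width sliding window (the helper `sliding_window_chunks` is hypothetical), not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to break on its separators (by default `"\n\n"`, `"\n"`, `" "`, `""`) and then merges the pieces back up to `chunk_size` characters, so its boundaries fall on paragraph, line or word breaks. The overlap idea is the same in both: text near a boundary appears in both neighbouring chunks.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for piece in sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3):
    print(piece)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each window starts 7 characters after the previous one, so consecutive pieces repeat 3 characters ("hij", "opq", "vwx").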

In [22]:
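def clean_chunks(chunks, min_chars=100):
    """Drop chunks that are too short or look like front-matter boilerplate."""
    # Keyword heuristic for author e-mails ("@"), copyright/licence and reproduction notices.
    # Caveat: "figure" also removes body text that merely cites a figure.
    markers = ["permission", "@", "license", "figure", "©", "reproduce"]
    filtered = []
    for doc in chunks:
        text = doc.page_content.strip()
        if len(text) < min_chars:
            continue  # too short to carry useful context (e.g. stray headings)
        lowered = text.lower()
        if any(marker in lowered for marker in markers):
            continue
        filtered.append(doc)
    return filtered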

cleaned_chunks = clean_chunks(chunks)

for i, chunk in enumerate(cleaned_chunks[:5]):
    print(f"Chunk {i+1}:\n{chunk.page_content[:300]}\n{'-'*40}")


Chunk 1:
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these mo
----------------------------------------
Chunk 2:
less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-
to-German translation task, improving over the existing best results, including
ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task,
our model establishes a new single-model state-of-the-art BLEU scor
----------------------------------------
Chunk 3:
best models from the literature. We show that the Transformer generalizes well to
other tasks by applying it successfully to English constituency parsing both with
large and limited training data.
∗Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attent
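Because `clean_chunks` is a keyword heuristic, it can discard real content along with front matter. The standalone sketch below reimplements the same rule on plain strings (`is_boilerplate`, `BOILERPLATE_MARKERS` and the sample strings are illustrative, not part of the notebook): the author block is dropped as intended, but so is a body sentence that merely mentions a figure. For this paper that matters, since Section 3 describes the architecture with reference to Figure 1.

```python
# Same markers and length threshold as clean_chunks, reimplemented on plain strings
BOILERPLATE_MARKERS = ["permission", "@", "license", "figure", "©", "reproduce"]
MIN_CHARS = 100


def is_boilerplate(text: str) -> bool:
    """Return True if clean_chunks would drop a chunk with this text."""
    text = text.strip()
    if len(text) < MIN_CHARS:
        return True
    lowered = text.lower()
    return any(marker in lowered for marker in BOILERPLATE_MARKERS)


# Made-up sample strings, not taken from the notebook's output
samples = {
    "author block": "Noam Shazeer, Google Brain, noam@google.com " * 3,
    "short heading": "3 Model Architecture",
    "body text": (
        "The encoder maps an input sequence of symbol representations to a sequence "
        "of continuous representations that the decoder attends over."
    ),
    "figure mention": (
        "Figure 1 shows the overall architecture: stacked self-attention and "
        "point-wise, fully connected layers for both the encoder and the decoder."
    ),
}

for name, text in samples.items():
    print(f"{name}: {'dropped' if is_boilerplate(text) else 'kept'}")
# author block: dropped
# short heading: dropped
# body text: kept
# figure mention: dropped
```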

Now that the text is split and cleaned, let us embed the chunks and store them in a FAISS index.

In [23]:
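from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import SentenceTransformerEmbeddings

# all-MiniLM-L6-v2 maps each chunk to a 384-dimensional vector
embedding_model = SentenceTransformerEmbeddings(
    model_name="all-MiniLM-L6-v2"
)

# Embed every cleaned chunk, build the index, and persist it to ./faiss_index
db = FAISS.from_documents(cleaned_chunks, embedding_model)
db.save_local("faiss_index")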
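`FAISS.from_documents` embeds each chunk and adds the vectors to an index; `similarity_search` embeds the query with the same model and returns the `k` stored chunks whose vectors are closest to it. Assuming LangChain's default flat (exact) L2 index, the lookup is conceptually the brute-force search sketched below; FAISS performs the same computation much faster and also offers approximate indexes for large collections. The helper names `l2_distance` and `exact_knn` and the 3-dimensional toy vectors are made up for illustration.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query: list[float], vectors: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Return (index, distance) for the k stored vectors closest to query, nearest first."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions
stored = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
print(exact_knn([1.0, 0.05, 0.0], stored, k=2))  # indices 0 and 2 are nearest
```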

In [24]:
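# The docstore is saved with pickle, so loading must be allowed explicitly;
# only do this for index files you created yourself.
db = FAISS.load_local("faiss_index", embedding_model, allow_dangerous_deserialization=True)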

query = "How does the self-attention mechanism work?"
docs = db.similarity_search(query, k=4)

for doc in docs:
    print(doc.page_content[:500])
    print("-" * 50)



described in section 3.2.
Self-attention, sometimes called intra-attention is an attention mechanism relating different positions
of a single sequence in order to compute a representation of the sequence. Self-attention has been
used successfully in a variety of tasks including reading comprehension, abstractive summarization,
textual entailment and learning task-independent sentence representations [4, 27, 28, 22].
--------------------------------------------------
the approach we take in our model.
As side benefit, self-attention could yield more interpretable models. We inspect attention distributions
from our models and present and discuss examples in the appendix. Not only do individual attention
heads clearly learn to perform different tasks, many appear to exhibit behavior related to the syntactic
and semantic structure of the sentences.
5 Training
This section describes the training regime for our models.
5.1 Training Data and Batching
-------------------------------------------

In [25]:
query = "How does the transformer encoder work?"
docs = db.similarity_search(query, k=4)

for doc in docs:
    print(doc.page_content[:500])
    print("-" * 50)



The Transformer uses multi-head attention in three different ways:
• In "encoder-decoder attention" layers, the queries come from the previous decoder layer,
and the memory keys and values come from the output of the encoder. This allows every
position in the decoder to attend over all positions in the input sequence. This mimics the
typical encoder-decoder attention mechanisms in sequence-to-sequence models such as
[38, 2, 9].
--------------------------------------------------
To the best of our knowledge, however, the Transformer is the first transduction model relying
entirely on self-attention to compute representations of its input and output without using sequence-
aligned RNNs or convolution. In the following sections, we will describe the Transformer, motivate
self-attention and discuss its advantages over models such as [17, 18] and [9].
3 Model Architecture
Most competitive neural sequence transduction models have an encoder-decoder structure [5, 2, 35].
----------------------
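Cells 24 and 25 repeat the same print loop. A small helper keeps each query cell to one line; `show_results` and `format_results` are hypothetical names. It relies only on `similarity_search(query, k=...)` and `page_content`, so it should work with any LangChain vector store, not just FAISS, and it prints the same layout as the loops above.

```python
def format_results(docs, max_chars=500, sep_width=50):
    """Join the first max_chars characters of each document, each followed by a dashed line."""
    blocks = []
    for doc in docs:
        blocks.append(doc.page_content[:max_chars])
        blocks.append("-" * sep_width)
    return "\n".join(blocks)


def show_results(store, query, k=4, max_chars=500):
    """Query a vector store and print the start of each retrieved chunk."""
    docs = store.similarity_search(query, k=k)
    print(format_results(docs, max_chars=max_chars))
    return docs


# Usage in this notebook (db is the FAISS store loaded above):
# show_results(db, "How does the self-attention mechanism work?")
# show_results(db, "How does the transformer encoder work?")
```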