## Import PDF

In [2]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("Attention Is All Your Need.pdf")

data = loader.load()

In [4]:
len(data)

15

## Chunking

In [5]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000)
docs = text_splitter.split_documents(data)

print("Total: ", len(docs))

Total:  52


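Splitting divides the text into smaller chunks so that retrieval can pass only the relevant passages to the model, which keeps the prompt short and the response fast. A chunk size of 1000 characters was chosen so that each chunk stays large enough to preserve its context and meaning.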
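To make the effect of `chunk_size` concrete, here is a minimal sketch of fixed-size character chunking. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks). The helper name `chunk_text` and its `overlap` parameter are assumptions for illustration only.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Hypothetical helper: consecutive windows share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "x" * 2500  # stand-in for extracted page text
print([len(c) for c in chunk_text(sample, chunk_size=1000)])               # [1000, 1000, 500]
print([len(c) for c in chunk_text(sample, chunk_size=1000, overlap=200)])  # [1000, 1000, 900, 100]
```

Smaller chunks give more precise matches but carry less surrounding context; an overlap is one common way to soften the cut at chunk boundaries.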

In [6]:
docs[2]

Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'Attention Is All Your Need.pdf', 'total_pages': 15, 'page': 0, 'page_label': '1'}, page_content='best models from the literature. We show that the Transformer generalizes well to\nother tasks by applying it successfully to English constituency parsing both with\nlarge and limited training data.\n∗Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started\nthe effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and\nhas been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head\n

## Text Embeddings (Google Generative AI)

In [7]:
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from dotenv import load_dotenv

In [8]:
load_dotenv()

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

In [9]:
vector = embeddings.embed_query("hello, world!")
vector[:5]

[0.05168594419956207,
 -0.030764883384108543,
 -0.03062233328819275,
 -0.02802734263241291,
 0.01813093200325966]

## Storing Embeddings in ChromaDB

In [11]:
vectorstore = Chroma.from_documents(documents = docs, embedding=embeddings)

In [12]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k":10})

In [13]:
query = "What is encoder?"

retrieved_docs = retriever.invoke(query)

In [14]:
len(retrieved_docs)

10
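Conceptually, a similarity retriever embeds the query, scores it against every stored chunk vector, and returns the `k` best matches. The sketch below shows that idea with cosine similarity on toy 3-dimensional vectors. The vectors, chunk labels, and helper names `cosine_similarity` and `top_k` are made up for illustration, and Chroma's actual index and distance metric may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int) -> list[str]:
    """Return the labels of the k stored vectors most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


# Toy "vector store": label -> made-up embedding.
toy_store = {
    "chunk about the encoder": [0.9, 0.1, 0.0],
    "chunk about the decoder": [0.6, 0.8, 0.0],
    "chunk about training data": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]  # made-up embedding of the query

print(top_k(query_vec, toy_store, k=2))
# ['chunk about the encoder', 'chunk about the decoder']
```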

In [15]:
print(retrieved_docs[5].page_content)

Attention Visualizations
Input-Input Layer5
[Figure token labels, printed one per line and repeated for both axes: "It is in this spirit that a majority of American governments have passed new laws since 2009 making the registration or voting process more difficult . <EOS>" followed by six <pad> tokens]
Figure 3: An example of the attention mechanism following long-distance dependencies in the
encoder self-attention in layer 5 of 6. Many of the attention heads attend to a distant dependency of
the verb ‘making’, completing the phrase ‘making...more difficult’. Attentions here shown only for
the word ‘making’. Different colors represent different heads. Best viewed in color.
13


## LLM Invoke

In [16]:
from langchain_google_genai import ChatGoogleGenerativeAI

In [17]:
llm = ChatGoogleGenerativeAI(
    model='gemini-1.5-pro',
    temperature=0.3,
    max_tokens=500
)

In [27]:
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

In [22]:
system_prompt = (
    # Adjacent string literals are joined as-is, so each sentence
    # carries its own punctuation and trailing space.
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, say that you don't know. "
    "Use three sentences maximum and keep the answer correct."
    "\n\n"
    "{context}"
)

In [26]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)
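Python joins adjacent string literals with no separator, so each sentence in a multi-line prompt needs its own trailing punctuation and space. The sketch below demonstrates this and then fills the `{context}` placeholder with plain `str.format`, which is only a stand-in for illustration: `ChatPromptTemplate` performs its own substitution, and the sample strings here are made up.

```python
# Adjacent string literals are concatenated exactly as written.
without_separators = (
    "You are an assistant"
    "Use the context"
)
with_separators = (
    "You are an assistant. "
    "Use the context."
)
print(without_separators)  # You are an assistantUse the context
print(with_separators)     # You are an assistant. Use the context.

# Stand-in for template rendering: fill the {context} placeholder.
template = with_separators + "\n\n{context}"
rendered = template.format(
    context="The encoder maps an input sequence to continuous representations."
)
print(rendered)
```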

## Create Question-Answering Chain (LLM + PROMPT)

In [28]:
question_answer_chain = create_stuff_documents_chain(llm, prompt)

## RAG Chain (Retriever + Question-Answering Chain)

In [29]:
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
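As a rough mental model, the retrieval chain fetches chunks for the question, combines them into one context string, and asks the model to answer from that context, returning the input, the retrieved context, and the answer together. The sketch below mimics that flow in plain Python. `make_rag_chain`, `fake_retrieve`, and `fake_answer` are hypothetical stand-ins, not LangChain APIs, and the real chain handles documents, prompts, and model calls in more detail.

```python
from typing import Callable


def make_rag_chain(
    retrieve: Callable[[str], list[str]],
    answer: Callable[[str, str], str],
) -> Callable[[dict], dict]:
    """Compose a retriever and an answer function into one callable."""

    def invoke(inputs: dict) -> dict:
        question = inputs["input"]
        context = retrieve(question)    # step 1: fetch relevant chunks
        stuffed = "\n\n".join(context)  # step 2: "stuff" chunks into one string
        return {
            "input": question,
            "context": context,
            "answer": answer(stuffed, question),  # step 3: generate the answer
        }

    return invoke


# Toy stand-ins for the retriever and the LLM call.
def fake_retrieve(question: str) -> list[str]:
    return ["The encoder maps an input sequence to continuous representations."]


def fake_answer(context: str, question: str) -> str:
    return f"Based on {len(context.split())} context words: (model answer here)"


chain = make_rag_chain(fake_retrieve, fake_answer)
result = chain({"input": "What is encoder?"})
print(sorted(result))  # ['answer', 'context', 'input']
```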

## Generating a response by running a user query

In [30]:
response = rag_chain.invoke({"input":"What is encoder?"})

print(response)

{'input': 'What is encoder?', 'context': [Document(id='6cdcbd44-72a0-4fbc-9494-e4e8f95b37db', metadata={'author': '', 'creationdate': '2024-04-10T21:11:43+00:00', 'creator': 'LaTeX with hyperref', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'page': 1, 'page_label': '2', 'producer': 'pdfTeX-1.40.25', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'source': 'Attention Is All Your Need.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': '/False'}, page_content='Here, the encoder maps an input sequence of symbol representations (x1, ..., xn) to a sequence\nof continuous representations z = (z1, ..., zn). Given z, the decoder then generates an output\nsequence (y1, ..., ym) of symbols one element at a time. At each step the model is auto-regressive\n[10], consuming the previously generated symbols as additional input when generating the next.\n2'), Document(id='74425e74-3689-4d5c-9f8a-7bfb7b5a21a9', metada

In [33]:
print(response["answer"])

The encoder maps an input sequence of symbol representations (x1, ..., xn) to a sequence of continuous representations z = (z1, ..., zn).  Given z, the decoder then generates an output sequence (y1, ..., ym) of symbols.  The Transformer's encoder is composed of a stack of N = 6 identical layers.
