In [2]:
# PyPDFLoader (used below) also requires the pypdf package.
# !pip install langchain-community==0.2.4 langchain==0.2.3 faiss-cpu==1.8.0 unstructured==0.14.5 unstructured[pdf]==0.14.5 transformers==4.41.2 sentence-transformers==3.0.1 pypdf

In [13]:
# !pip install -U langchain-huggingface

In [24]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

In [38]:
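# Load the LLM through Ollama (assumes a local Ollama server with llama3.1:8b already pulled).
# temperature=0 keeps the answers as deterministic as the model allows.
llm = Ollama(
    model="llama3.1:8b",
    temperature=0,
)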

In [39]:
# Load the PDF; PyPDFLoader returns one Document per page.
# Note that the file on disk really does end in ".pdf.pdf".
loader = PyPDFLoader("NIPS-2017-attention-is-all-you-need-Paper.pdf.pdf")
documents = loader.load()

In [40]:
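# Split pages into chunks of at most 1000 characters, with 200 characters shared
# between neighbouring chunks so that sentences cut at a boundary appear in both.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    separators=["\n\n", "\n", " ", ""],
)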

In [41]:
text_chunks = text_splitter.split_documents(documents)
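
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a simplification and not LangChain's implementation: `RecursiveCharacterTextSplitter` additionally tries to break on the listed separators before falling back to raw character positions. The helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Hypothetical helper
    for illustration only; it ignores separators entirely.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
print(pieces)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so a larger overlap produces more chunks for the same text.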

In [42]:
print(len(documents))
print(documents[0].page_content if documents else "No content")

15
Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.comNoam Shazeer∗
Google Brain
noam@google.comNiki Parmar∗
Google Research
nikip@google.comJakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.comAidan N. Gomez∗ †
University of Toronto
aidan@cs.toronto.eduŁukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗ ‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Exper

In [43]:
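# Load the embedding model. With no arguments, HuggingFaceEmbeddings falls back to
# its default model (sentence-transformers/all-mpnet-base-v2 at the time of writing).
embeddings = HuggingFaceEmbeddings()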



In [44]:
knowledge_base = FAISS.from_documents(text_chunks, embeddings)
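
Conceptually, the index built here stores one embedding vector per chunk and, at query time, returns the chunks whose vectors lie closest to the query's vector. The sketch below shows that idea as a brute-force search in pure Python, assuming squared Euclidean distance as the metric. The vectors are made-up three-dimensional toy values, not real embeddings, and `nearest_chunks` is a hypothetical helper, not part of FAISS or LangChain.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_chunks(query_vec, indexed, k=2):
    """Return the k chunk texts whose vectors are closest to query_vec.

    indexed is a list of (text, vector) pairs. Hypothetical helper that
    scans every vector, which is only practical for small collections.
    """
    ranked = sorted(indexed, key=lambda pair: squared_l2(query_vec, pair[1]))
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": invented values chosen only to make the ranking obvious.
toy_index = [
    ("encoder and decoder stacks", [0.9, 0.1, 0.0]),
    ("training data and batching", [0.0, 0.8, 0.3]),
    ("multi-head attention", [0.8, 0.2, 0.1]),
]
top = nearest_chunks([1.0, 0.0, 0.0], toy_index, k=2)
print(top)
```

A real index avoids the full scan for large collections, but the ranking principle is the same.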

In [45]:
# Build a retrieval QA chain: fetch the most relevant chunks, then have the LLM answer from them
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=knowledge_base.as_retriever())
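
With no `chain_type` argument, `RetrievalQA.from_chain_type` uses the "stuff" strategy: the retrieved chunks are concatenated into a single prompt together with the question. The sketch below illustrates that assembly step in plain Python. The template wording, the sample chunk strings, and the `build_stuff_prompt` helper are hypothetical; LangChain's actual prompt text differs.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate retrieved chunks and the question into one prompt string.

    Hypothetical helper; the template below is illustrative only.
    """
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Sample chunks standing in for whatever the retriever would return.
retrieved = [
    "The encoder is composed of a stack of N = 6 identical layers.",
    "The decoder is also composed of a stack of N = 6 identical layers.",
]
prompt = build_stuff_prompt("How many layers does the encoder have?", retrieved)
print(prompt)
```

Because every retrieved chunk goes into one prompt, the quality of an answer depends directly on which chunks the retriever returns for the question.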

In [46]:
question = "What is this document about?"
response = qa_chain.invoke({"query": question})
print(response["result"])

This document appears to be a research paper or article discussing the application of attention mechanisms in natural language processing (NLP) models, specifically in relation to resolving anaphora (a type of linguistic reference where a pronoun refers back to a previously mentioned noun or phrase). The text includes figures and visualizations illustrating the attention mechanism's behavior in different layers of a neural network model.


In [47]:
question = "What is the architecture discussed in the model?"
response = qa_chain.invoke({"query": question})
print(response["result"])

The architecture discussed in the model is a stacked self-attention and point-wise, fully connected layers for both the encoder and decoder, as shown in Figure 1. Specifically:

* The encoder consists of a stack of N=6 identical layers, each with two sub-layers:
	+ A multi-head self-attention mechanism
	+ A simple, position-wise fully connected feed-forward network
* The decoder also consists of a stack of N=6 identical layers, each with three sub-layers:
	+ A multi-head attention over the output of the encoder stack
	+ Two additional sub-layers similar to those in the encoder


In [49]:
question = "What are use cases for this model?"
response = qa_chain.invoke({"query": question})
print(response["result"])

According to Section 3.2.3, Applications of Attention in our Model, the Transformer uses multi-head attention in three different ways:

* In "encoder-decoder attention" layers, the queries come from the previous decoder layer, and the keys and values come from the encoder output.
* This model is used for machine translation tasks, as evidenced by the results presented in Table 3, which compares different settings of the model on a development set (newstest2013).
 

So, one use case for this model is machine translation.
