In [1]:
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


In [2]:
import os
from dotenv import load_dotenv


In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.vectorstores import FAISS
from langchain_google_genai import GoogleGenerativeAIEmbeddings



In [4]:
load_dotenv()

True

In [5]:

TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")
GOOGLE_KEY = os.getenv("GOOGLE_API_KEY")
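`os.getenv` returns `None` when a variable is missing, so a typo in `.env` only surfaces later as an authentication error. A minimal sketch of a fail-fast check; the helper `require_env` and the variable `DEMO_API_KEY` are hypothetical names used for illustration, not part of this notebook:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file.")
    return value


# Demo with a hypothetical variable so the sketch runs without real credentials.
os.environ["DEMO_API_KEY"] = "abc123"
print(require_env("DEMO_API_KEY"))
```

The same helper could wrap the `TAVILY_API_KEY` and `GOOGLE_API_KEY` lookups above.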


Get the article

In [6]:
article_path = os.path.join(os.getcwd(), "NIPS-2017-attention-is-all-you-need-Paper.pdf")

In [7]:


loader = PyPDFLoader(article_path)
docs = loader.load_and_split()

Split the article into chunks of at most 700 characters

In [8]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=700)
documents = text_splitter.split_documents(docs)
documents

[Document(page_content='Attention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention', metadata={'source': '/home/varad/Work/Simple_agent/NIPS-2017-attention-is-all-you-need-Paper.pdf', 'page': 0}),
 Document(page_content='convolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism.

In [9]:
len(documents)

67

In [12]:
print(documents[30].page_content)

Table 1: Maximum path lengths, per-layer complexity and minimum number of sequential operations
for different layer types. nis the sequence length, dis the representation dimension, kis the kernel
size of convolutions and rthe size of the neighborhood in restricted self-attention.
Layer Type Complexity per Layer Sequential Maximum Path Length
Operations
Self-Attention O(n2·d) O(1) O(1)
Recurrent O(n·d2) O(n) O(n)
Convolutional O(k·n·d2)O(1) O(logk(n))
Self-Attention (restricted) O(r·n·d)O(1) O(n/r)
bottoms of the encoder and decoder stacks. The positional encodings have the same dimension dmodel
as the embeddings, so that the two can be summed. There are many choices of positional encodings,


In [13]:
len(documents[30].page_content)

700
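The first two chunks in the output above share text, which suggests the splitter keeps an overlap between neighbouring chunks. A simplified sketch of fixed-size chunking with overlap follows. It is not LangChain's algorithm (`RecursiveCharacterTextSplitter` also tries to break on separators such as newlines), and the 200-character overlap is an assumption about the library default:

```python
def chunk_text(text: str, chunk_size: int = 700, overlap: int = 200) -> list[str]:
    """Cut text into windows of at most chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Synthetic text so the sketch runs without the PDF.
sample = "".join(chr(97 + i % 26) for i in range(1500))
pieces = chunk_text(sample)
print([len(p) for p in pieces])
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.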

Embed the chunks and save them to a FAISS vector store.

In [14]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

In [15]:
embeddings.embed_documents(["hello how are you?"])

[[0.04290894,
  0.010860758,
  -0.0695875,
  -0.014248739,
  0.07053263,
  0.004657977,
  0.012642296,
  -0.024851436,
  -0.0027408386,
  0.039466925,
  -0.028003572,
  0.018725859,
  -0.031486686,
  0.009888926,
  -0.025231062,
  -0.024300467,
  0.0059706005,
  0.009758454,
  0.0340696,
  -0.010193446,
  0.026648272,
  0.03091546,
  -0.0010330754,
  0.009947902,
  0.029554654,
  -0.032269068,
  0.005008308,
  -0.023405911,
  -0.04513908,
  -0.0037998473,
  -0.04387463,
  0.019480545,
  -0.034257423,
  0.011855205,
  0.026713785,
  -0.06557813,
  0.0029399134,
  0.02638414,
  -0.016712325,
  0.014347559,
  0.014496104,
  -0.041622728,
  -0.036522344,
  0.010547787,
  -0.001901172,
  0.0046484885,
  -0.058222197,
  0.027973374,
  0.0125011625,
  -0.060385894,
  0.03896192,
  0.03166357,
  0.0681432,
  -0.036142062,
  -0.017825482,
  -0.041675087,
  0.050051928,
  -0.0011736911,
  -0.011245955,
  0.021568865,
  -0.0043471856,
  -0.0095727425,
  0.04433018,
  -0.0011613648,
  -0.032459017

In [16]:
vector = FAISS.from_documents(documents, embeddings)

In [17]:
retriever = vector.as_retriever()

In [18]:
retriever.invoke("What is the decoder?")

[Document(page_content='Decoder: The decoder is also composed of a stack of N= 6identical layers. In addition to the two\nsub-layers in each encoder layer, the decoder inserts a third sub-layer, which performs multi-head\nattention over the output of the encoder stack. Similar to the encoder, we employ residual connections\naround each of the sub-layers, followed by layer normalization. We also modify the self-attention\nsub-layer in the decoder stack to prevent positions from attending to subsequent positions. This\nmasking, combined with fact that the output embeddings are offset by one position, ensures that the\npredictions for position ican depend only on the known outputs at positions less than i.\n3.2 Attention', metadata={'source': '/home/varad/Work/Simple_agent/NIPS-2017-attention-is-all-you-need-Paper.pdf', 'page': 2}),
 Document(page_content='Similarly to other sequence transduction models, we use learned embeddings to convert the input\ntokens and output tokens to vectors o
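Conceptually, the retriever embeds the query and returns the chunks whose vectors lie closest to it. A brute-force sketch of that idea with toy 3-dimensional vectors; the vectors and labels are made up, and the L2 metric and `k=4` default are assumptions about how the LangChain FAISS wrapper is configured here:

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, index, k=4):
    """Return the texts of the k entries closest to query_vec."""
    ranked = sorted(index, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy index of (text, embedding) pairs; real embeddings have hundreds of dimensions.
TOY_INDEX = [
    ("decoder stack", [0.9, 0.1, 0.0]),
    ("encoder stack", [0.7, 0.3, 0.0]),
    ("training data", [0.0, 0.2, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], TOY_INDEX, k=2)
print(results)
```

FAISS does the same job with optimized index structures instead of a full sort.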

In [19]:
# create tool for retrieving documents
from langchain.tools.retriever import create_retriever_tool

In [20]:
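retriever_tool = create_retriever_tool(
    retriever=retriever,
    name="transformer_search",
    # Adjacent string literals are concatenated, so no stray indentation
    # ends up inside the description (a backslash continuation would keep it).
    description=(
        "Search for information about transformers in AI and the attention mechanism. "
        "For any questions about the transformer architecture and the attention mechanism, "
        "you must use this tool!"
    ),
)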

In [21]:
# tool for connecting to the web
search = TavilySearchResults()

In [22]:
search.dict()

{'name': 'tavily_search_results_json',
 'description': 'A search engine optimized for comprehensive, accurate, and trusted results. Useful for when you need to answer questions about current events. Input should be a search query.',
 'args_schema': langchain_community.tools.tavily_search.tool.TavilyInput,
 'return_direct': False,
 'verbose': False,
 'tags': None,
 'metadata': None,
 'handle_tool_error': False,
 'handle_validation_error': False,
 'api_wrapper': {'tavily_api_key': SecretStr('**********')},
 'max_results': 5}

In [23]:
# list of tools to be used by the agent
tools = [search, retriever_tool]

In [24]:
# create the llm
from langchain_google_genai import GoogleGenerativeAI
llm = GoogleGenerativeAI(model="gemini-pro", google_api_key=GOOGLE_KEY)


Create the agent

In [25]:
from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub

In [26]:
# get the prompt for execution
prompt = hub.pull("hwchase17/react")

In [27]:
print(prompt.template)

Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}
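The placeholders `{tools}`, `{tool_names}`, `{input}` and `{agent_scratchpad}` are filled in before each LLM call. A rough sketch of that substitution with plain `str.format`; the shortened template and the tool descriptions are stand-ins, and the exact text LangChain renders for each tool may differ:

```python
# Shortened stand-in for the ReAct template printed above.
TEMPLATE = (
    "You have access to the following tools:\n\n{tools}\n\n"
    "Action: the action to take, should be one of [{tool_names}]\n\n"
    "Question: {input}\nThought:{agent_scratchpad}"
)

# Hypothetical tool table: name -> description.
TOOLS = {
    "tavily_search_results_json": "Web search for current events.",
    "transformer_search": "Search the Transformer paper.",
}


def render_prompt(question: str, scratchpad: str = "") -> str:
    """Fill the template with one 'name: description' line per tool."""
    tool_lines = "\n".join(f"{name}: {desc}" for name, desc in TOOLS.items())
    return TEMPLATE.format(
        tools=tool_lines,
        tool_names=", ".join(TOOLS),
        input=question,
        agent_scratchpad=scratchpad,
    )


rendered = render_prompt("What is the decoder?")
print(rendered)
```

The tool descriptions are all the model sees when choosing an action, which is why the wording of each description matters.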


In [28]:
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)

In [29]:
agent_executer = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [30]:
agent_executer.invoke({"input":"what is the weather in New Delhi"})



> Entering new AgentExecutor chain...
I should search for the weather in New Delhi
Action: tavily_search_results_json
Action Input: what is the weather in New Delhi
[{'url': 'https://www.accuweather.com/en/in/new-delhi/187745/weather-forecast/187745', 'content': 'New Delhi, Delhi, India Weather Forecast, with current conditions, wind, air quality, and what to expect for the next 3 days.'}, {'url': 'https://www.timeanddate.com/weather/india/new-delhi', 'content': 'Current weather in New Delhi and forecast for today, tomorrow, and next 14 days'}, {'url': 'https://www.bbc.com/weather/1261481', 'content': 'Observed at 08:30, Wednesday 20 March BBC Weather in association with MeteoGroup, external All times are India Standard Time (Asia/Kolkata, GMT+5:30) unless otherwise stated ...'}, {'url': 'https://www.timeanddate.com/weather/india/new-delhi/ext', 'content': 'New Delhi 14 Day Extended Forecast. Time/General. Weather. Time Zone. DST Changes. Sun & Mo

{'input': 'what is the weather in New Delhi',
 'output': 'The current weather in New Delhi is sunny and 83 degrees Fahrenheit.'}
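The trace above follows the Thought/Action/Observation cycle from the prompt. A much simplified sketch of such a loop, driven by a scripted stand-in for the LLM so it runs offline; `run_react`, `scripted_llm` and the `lookup` tool are hypothetical, and this is not how `AgentExecutor` is actually implemented:

```python
import re

# Matches the "Action: ...\nAction Input: ..." pair in a model reply.
ACTION_RE = re.compile(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", re.DOTALL)


def run_react(llm, tools, question, max_steps=5):
    """Alternate between model replies and tool calls until a final answer appears."""
    scratchpad = ""
    for _ in range(max_steps):
        reply = llm(question, scratchpad)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            raise ValueError(f"Could not parse an action from: {reply!r}")
        tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
        observation = tools[tool_name](tool_input)
        # The observation is appended so the next reply can build on it.
        scratchpad += f"{reply}\nObservation: {observation}\nThought:"
    raise RuntimeError("No final answer within max_steps")


def scripted_llm(question, scratchpad):
    """Stand-in model: call the tool once, then repeat its observation as the answer."""
    if "Observation:" not in scratchpad:
        return "I should look this up\nAction: lookup\nAction Input: decoder"
    observation = scratchpad.split("Observation: ", 1)[1].split("\n")[0]
    return f"I now know the final answer\nFinal Answer: {observation}"


answer = run_react(scripted_llm, {"lookup": lambda q: f"notes about {q}"}, "What is the decoder?")
print(answer)
```

A step limit like `max_steps` keeps a model that never emits `Final Answer:` from looping forever.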

In [31]:
agent_executer.invoke({"input":"What is the scaled dot-product in attention mechanism?"})



> Entering new AgentExecutor chain...
transformer_search has better information about transformer model.
Action: transformer_search
Action Input: Scaled dot-product attention
predictions for position ican depend only on the known outputs at positions less than i.
3.2 Attention
An attention function can be described as mapping a query and a set of key-value pairs to an output,
where the query, keys, values, and output are all vectors. The output is computed as a weighted sum
of the values, where the weight assigned to each value is computed by a compatibility function of the
query with the corresponding key.
3.2.1 Scaled Dot-Product Attention
We call our particular attention "Scaled Dot-Product Attention" (Figure 2). The input consists of
queries and keys of dimension dk, and values of dimension dv. We compute the dot products of the
3

between any two positions in the network. Convolutional layers are generally more expensive than
recurrent layers

{'input': 'What is the scaled dot-product in attention mechanism?',
 'output': 'Scaled dot-product attention calculates the dot products between queries and keys of dimension dk, then divides each value by √dk and finally applies a softmax function to the values.'}
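For reference, the paper defines Attention(Q, K, V) = softmax(QKᵀ / √dk) V, so the softmax weights are also used to average the values, a step the agent's summary leaves out. A minimal pure-Python sketch for small lists of vectors; it skips masking and batching, and the toy inputs are made up:

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Return one output vector per query: softmax(q . k / sqrt(d_k)) weighted sum of V."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return outputs


# One query that matches the first key more closely than the second.
out = scaled_dot_product_attention(
    Q=[[1.0, 0.0]],
    K=[[1.0, 0.0], [0.0, 1.0]],
    V=[[1.0, 0.0], [0.0, 1.0]],
)
print(out)
```

Because the query aligns with the first key, the first value receives the larger weight in the output.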

In [50]:
agent_executer.invoke({"input":"What is multi-head attention?\
                       What are some latest advancements on that topic? "})



> Entering new AgentExecutor chain...
I can use the transformer_search tool to get information about multi-head attention and the latest advancements.
Action: transformer_search
Action Input: Explain multi-head attention mechanism in detail. Also mention its latest advancements.
Multi-head attention allows the model to jointly attend to information from different representation
subspaces at different positions. With a single attention head, averaging inhibits this.
4To illustrate why the dot products get large, assume that the components of qandkare independent random
variables with mean 0and variance 1. Then their dot product, q·k=∑dk
i=1qiki, has mean 0and variance dk.
4

The Transformer uses multi-head attention in three different ways:
•In "encoder-decoder attention" layers, the queries come from the previous decoder layer,
and the memory keys and values come from the output of the encoder. This allows every
position in the decoder to attend o

{'input': 'What is multi-head attention?                       What are some latest advancements on that topic? ',
 'output': 'Multi-head attention is a mechanism that allows a model to attend to different parts of a sequence in parallel. It is implemented by using multiple attention heads, each of which is responsible for attending to a different part of the sequence. The outputs of the attention heads are then concatenated to form the final output of the multi-head attention layer.\nMulti-head attention has been shown to improve the performance of transformer models on a variety of tasks, including natural language processing and machine translation. Some of the latest advancements in multi-head attention include the use of scaled dot-product attention, which is a more efficient way to compute the attention scores, and the use of relative positional encoding, which helps the model learn about the relative positions of the elements in the sequence.'}