# LlamaParse

In [27]:
import nest_asyncio

nest_asyncio.apply()

import os

# API key for LlamaCloud (required by LlamaParse)
os.environ["LLAMA_CLOUD_API_KEY"] = ""

# OpenAI API key for the embedding model and the LLM
os.environ["OPENAI_API_KEY"] = ""
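
The cell above assigns empty strings to both keys, so any later API call would fail with an authentication error. A minimal sketch of a fail-fast check follows, assuming the keys are exported in the shell rather than written into the notebook. `require_env` is a hypothetical helper, not part of llama_index or llama_parse.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the notebook.")
    return value


# Example: fail fast instead of sending an empty key to the API.
# llama_key = require_env("LLAMA_CLOUD_API_KEY")
# openai_key = require_env("OPENAI_API_KEY")
```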

In [28]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex
from llama_index.core import Settings

embed_model = OpenAIEmbedding(model="text-embedding-3-small")
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

Settings.llm = llm
Settings.embed_model = embed_model

In [29]:
from llama_parse import LlamaParse

documents = LlamaParse(result_type="markdown").load_data("papers/attention_is_all_you_need.pdf")

Started parsing the file under job_id 7ee0b5d8-eb58-4ad0-bfab-809a3a7c1447
.

In [30]:
from llama_index.core import SimpleDirectoryReader

# Baseline: parse the same folder with the default PDF reader
naive_documents = SimpleDirectoryReader("papers").load_data()

In [31]:
# Naive PDF parser output (first document)
print(naive_documents[0].get_content())

Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.comNoam Shazeer∗
Google Brain
noam@google.comNiki Parmar∗
Google Research
nikip@google.comJakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.comAidan N. Gomez∗ †
University of Toronto
aidan@cs.toronto.eduŁukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗ ‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experime

In [32]:
# LlamaParse output (first document)
print(documents[0].get_content())

# Attention Is All You Need

|Ashish Vaswani∗|Noam Shazeer∗|Niki Parmar∗|Jakob Uszkoreit∗|
|---|---|---|---|
|Google Brain|Google Brain|Google Research|Google Research|
|avaswani@google.com|noam@google.com|nikip@google.com|usz@google.com|

|Llion Jones∗|Aidan N. Gomez∗ †|Łukasz Kaiser∗|
|---|---|---|
|Google Research|University of Toronto|Google Brain|
|llion@google.com|aidan@cs.toronto.edu|lukaszkaiser@google.com|

Illia Polosukhin∗ ‡

illia.polosukhin@gmail.com

# Abstract

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable an

In [33]:
naive_index = VectorStoreIndex.from_documents(naive_documents)
lp_index = VectorStoreIndex.from_documents(documents)

In [34]:
naive_engine = naive_index.as_query_engine()
lp_engine = lp_index.as_query_engine()

In [35]:
naive_response = naive_engine.query('What is self-attention mechanism?')
print(naive_response.response)

The self-attention mechanism connects all positions in a sequence with a constant number of sequentially executed operations. It allows each position to attend to all other positions in the sequence, calculating attention scores based on the similarity between the position's representation and the representations of other positions.


In [36]:
lp_response = lp_engine.query('What is self-attention mechanism?')
print(lp_response.response)

Self-attention mechanism is a key component in neural network architectures that allows the model to weigh the importance of different words in a sequence when processing information. It enables the model to focus on relevant parts of the input sequence by assigning different attention weights to different words, helping the model capture long-range dependencies and improve performance in tasks involving sequential data processing.


In [37]:
naive_response = naive_engine.query('where is aidan gomez from?')
print(naive_response.response)

Aidan Gomez is from an unspecified location based on the provided context information.


In [38]:
print(naive_documents[0].get_content())

Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.comNoam Shazeer∗
Google Brain
noam@google.comNiki Parmar∗
Google Research
nikip@google.comJakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.comAidan N. Gomez∗ †
University of Toronto
aidan@cs.toronto.eduŁukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗ ‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experime

In [39]:
lp_response = lp_engine.query('where is aidan gomez from?')
print(lp_response.response)

The information provided does not contain any details about Aidan Gomez's origin or nationality.


In [40]:
print(documents[0].get_content())

# Attention Is All You Need

|Ashish Vaswani∗|Noam Shazeer∗|Niki Parmar∗|Jakob Uszkoreit∗|
|---|---|---|---|
|Google Brain|Google Brain|Google Research|Google Research|
|avaswani@google.com|noam@google.com|nikip@google.com|usz@google.com|

|Llion Jones∗|Aidan N. Gomez∗ †|Łukasz Kaiser∗|
|---|---|---|
|Google Research|University of Toronto|Google Brain|
|llion@google.com|aidan@cs.toronto.edu|lukaszkaiser@google.com|

Illia Polosukhin∗ ‡

illia.polosukhin@gmail.com

# Abstract

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable an

In [41]:
naive_response = naive_engine.query('What is the training cost of Deep-Att + PosUnk on EN-FR task?')
print(naive_response.response)

2.3·10^19


In [42]:
naive_response.source_nodes

[NodeWithScore(node=TextNode(id_='d940a3ff-14d1-431b-9121-22b35336a9e2', embedding=None, metadata={'page_label': '8', 'file_name': 'attention_is_all_you_need.pdf', 'file_path': '/Users/doosolini/Downloads/streamlit_test/papers/attention_is_all_you_need.pdf', 'file_type': 'application/pdf', 'file_size': 2215244, 'creation_date': '2024-08-21', 'last_modified_date': '2024-08-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='81493d42-6698-42e2-a583-c91643970102', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '8', 'file_name': 'attention_is_all_you_need.pdf', 'file_path': '/Users/doosolini/Downloads/streamlit_test/papers/attention_is_all_you_need.pdf', 'file_type': 'application/pdf', 'file_si
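
The raw repr above is hard to scan. A minimal sketch of a more compact view follows, assuming each retrieved item exposes `.score` and a `.node` with `.metadata` and `.get_content()`, as the `NodeWithScore(node=TextNode(...))` repr suggests. `summarize_source_nodes` is a hypothetical helper name.

```python
def summarize_source_nodes(source_nodes, preview_chars=80):
    """Return one (score, page_label, preview) tuple per retrieved node."""
    rows = []
    for item in source_nodes:
        node = item.node
        # Collapse newlines and repeated spaces so the preview fits on one line.
        text = " ".join(node.get_content().split())
        rows.append((
            getattr(item, "score", None),
            node.metadata.get("page_label"),
            text[:preview_chars],
        ))
    return rows


# Example (inside the notebook):
# for score, page, preview in summarize_source_nodes(naive_response.source_nodes):
#     print(score, page, preview)
```

Printing the page label next to each preview makes it easier to confirm that the answer was drawn from the page containing Table 2.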

In [43]:
print(naive_documents[7].get_content())

Table 2: The Transformer achieves better BLEU scores than previous state-of-the-art models on the
English-to-German and English-to-French newstest2014 tests at a fraction of the training cost.
ModelBLEU Training Cost (FLOPs)
EN-DE EN-FR EN-DE EN-FR
ByteNet [18] 23.75
Deep-Att + PosUnk [39] 39.2 1.0·1020
GNMT + RL [38] 24.6 39.92 2.3·10191.4·1020
ConvS2S [9] 25.16 40.46 9.6·10181.5·1020
MoE [32] 26.03 40.56 2.0·10191.2·1020
Deep-Att + PosUnk Ensemble [39] 40.4 8.0·1020
GNMT + RL Ensemble [38] 26.30 41.16 1.8·10201.1·1021
ConvS2S Ensemble [9] 26.36 41.29 7.7·10191.2·1021
Transformer (base model) 27.3 38.1 3.3·1018
Transformer (big) 28.4 41.8 2.3·1019
Residual Dropout We apply dropout [ 33] to the output of each sub-layer, before it is added to the
sub-layer input and normalized. In addition, we apply dropout to the sums of the embeddings and the
positional encodings in both the encoder and decoder stacks. For the base model, we use a rate of
Pdrop= 0.1.
Label Smoothing During training, w

In [44]:
lp_response = lp_engine.query('What is the training cost of Deep-Att + PosUnk on EN-FR task?')
print(lp_response.response)

The training cost of Deep-Att + PosUnk on the English-to-French task is 1.0 · 10^2020.


In [45]:
print(documents[7].get_content())

# Table 2: The Transformer achieves better BLEU scores than previous state-of-the-art models on the English-to-German and English-to-French newstest2014 tests at a fraction of the training cost.

|Model| |BLEU| |Training Cost (FLOPs)|
|---|---|---|---|---|
|ByteNet [18]| |23.75| | |
|Deep-Att + PosUnk [39]| | |39.2|1.0 · 10^2020|
|GNMT + RL [38]|24.6|39.92|2.3 · 10^1819|1.4 · 10^20|
|ConvS2S [9]|25.16|40.46|9.6 · 10^19|1.5 · 10^20|
|MoE [32]|26.03|40.56|2.0 · 10|1.2 · 10^20|
|Deep-Att + PosUnk Ensemble [39]| | |40.4|8.0 · 10^21|
|GNMT + RL Ensemble [38]|26.30|41.16|1.8 · 10^1920|1.1 · 10^21|
|ConvS2S Ensemble [9]|26.36|41.29|7.7 · 10|1.2 · 10|
|Transformer (base model)|27.3|38.1|3.3 · 10^1918| |
|Transformer (big)|28.4|41.8|2.3 · 10| |

Residual Dropout We apply dropout [33] to the output of each sub-layer, before it is added to the sub-layer input and normalized. In addition, we apply dropout to the sums of the embeddings and the positional encodings in both the encoder and decoder st
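
Several training-cost cells in the parsed table above have exponents that are missing (`2.0 · 10`) or clearly garbled (`1.0 · 10^2020`). A minimal sketch of a sanity check for such cells follows. The function name `check_flops_cell` and the plausibility window of 10^15 to 10^25 are assumptions made for illustration, not values taken from the paper or from LlamaParse.

```python
import re

# Matches strings such as "1.0 · 10^20" or "2.3·10^19"; the exponent is optional
# so that truncated cells like "2.0 · 10" are still recognised.
_SCI = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*·\s*10(?:\^(\d+))?\s*$")


def check_flops_cell(cell, min_exp=15, max_exp=25):
    """Classify a cell as 'ok', 'missing_exponent', 'implausible_exponent', or 'not_a_number'.

    The [min_exp, max_exp] window is an assumed plausibility range.
    """
    match = _SCI.match(cell)
    if match is None:
        return "not_a_number"
    exponent = match.group(2)
    if exponent is None:
        return "missing_exponent"
    if not (min_exp <= int(exponent) <= max_exp):
        return "implausible_exponent"
    return "ok"


# Example:
# for cell in ["1.0 · 10^2020", "1.4 · 10^20", "2.0 · 10"]:
#     print(cell, "->", check_flops_cell(cell))
```

A check like this only catches exponents that are missing or out of range. A cell that is well-formed but carries the wrong exponent would still pass, so comparing against the original PDF remains necessary.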

In [46]:
naive_response = naive_engine.query('What is the training cost of ByteNet?')
print(naive_response.response)

The training cost of ByteNet is 1.0·10^20 FLOPs.


In [47]:
lp_response = lp_engine.query('What is the training cost of ByteNet?')
print(lp_response.response)

The training cost of ByteNet is not provided in the table.
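
In both printed versions of Table 2 the ByteNet row has no training-cost entry, so the naive engine's answer of 1.0·10^20 appears to have been taken from the neighbouring Deep-Att + PosUnk row. Side-by-side comparisons like this recur throughout the notebook. A minimal sketch of a helper that runs one question through several engines follows, assuming each engine has a `.query()` method whose result carries a `.response` attribute, as used in the cells above. `compare_engines` is a hypothetical name.

```python
def compare_engines(question, engines):
    """Run `question` through each engine and return {label: answer text}."""
    answers = {}
    for label, engine in engines.items():
        response = engine.query(question)
        answers[label] = str(response.response).strip()
    return answers


# Example (inside the notebook):
# answers = compare_engines(
#     "What is the training cost of ByteNet?",
#     {"naive": naive_engine, "llamaparse": lp_engine},
# )
# for label, answer in answers.items():
#     print(f"[{label}] {answer}")
```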
