# Query Engine and Chat Engine

### Content
- Flow: user input -> Query Engine -> Output
- Query Engine functionality
- Pros and cons of the Query Engine
- Types of engines: Query Engine and Chat Engine, and the difference between them
- When to use which engine

### Setup

In [1]:
import os

In [2]:
from dotenv import load_dotenv, find_dotenv
# Search upward from the working directory for a .env file instead of hard-coding a machine-specific path
load_dotenv(find_dotenv())

True

In [3]:
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
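`os.environ['OPENAI_API_KEY']` raises a bare `KeyError` when the `.env` file was not found or does not define the key. A small guard gives a clearer message. This is a minimal sketch; `require_env` is a hypothetical helper, not part of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a descriptive error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; check that load_dotenv() found your .env file"
        )
    return value


# Usage (assumes load_dotenv() has already run):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```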

### Download Data

In [4]:
import urllib.request
os.makedirs("data", exist_ok=True)  # portable replacement for `!mkdir data`; no error if it already exists
urllib.request.urlretrieve("https://arxiv.org/pdf/1706.03762", "data/transformers.pdf")


In [5]:
from pathlib import Path
from llama_index.readers.file import PDFReader

In [6]:
loader = PDFReader()

In [7]:
documents = loader.load_data(file=Path('./data/transformers.pdf'))

In [8]:
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)

2025-08-29 12:01:48,934 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
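`VectorStoreIndex.from_documents` splits each document (with `PDFReader`'s defaults, one document per page) into overlapping chunks, embeds them, and stores the vectors; the embeddings request in the log above is that step. The sketch below shows only the chunking idea with a character-based sliding window. LlamaIndex's default splitter works on tokens and sentence boundaries, so `chunk_text` and its sizes are illustrative assumptions.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split `text` into windows of `chunk_size` characters overlapping by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.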


In [9]:
# configure retriever
retriever = index.as_retriever()
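`index.as_retriever()` returns a retriever that embeds the query and returns the most similar chunks (two by default in current LlamaIndex versions, which matches the two sources printed later by `get_formatted_sources()`). The toy version below swaps real embeddings for bag-of-words counts so it runs offline; `embed`, `cosine`, and `top_k` are hypothetical helpers, not LlamaIndex APIs.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query as (score, chunk) pairs, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```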

In [10]:
# configure response synthesizer
from llama_index.core import get_response_synthesizer

response_synthesizer = get_response_synthesizer(response_mode="compact")
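With `response_mode="compact"`, the synthesizer packs as many retrieved chunks as fit into each prompt before calling the LLM, and only refines across several prompts when they do not all fit. This usually means fewer LLM calls than `"refine"`, which makes a separate call per chunk. The sketch below shows just the packing step, using a character budget as a stand-in for the real token-based context window; `pack_chunks` is hypothetical.

```python
def pack_chunks(chunks: list[str], budget: int, sep: str = "\n\n") -> list[str]:
    """Greedily merge consecutive chunks so each packed prompt stays within `budget` characters.

    A single chunk longer than the budget is kept as its own packed prompt
    (a real implementation would need to split it further).
    """
    packed: list[str] = []
    current = ""
    for chunk in chunks:
        candidate = chunk if not current else current + sep + chunk
        if len(candidate) <= budget:
            current = candidate
        else:
            if current:
                packed.append(current)
            current = chunk
    if current:
        packed.append(current)
    return packed
```

Each packed string would become one LLM call, so three chunks that fit into two packed prompts cost two calls instead of three.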

# Query Engine

In [11]:
query_engine = index.as_query_engine(response_synthesizer=response_synthesizer)
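Note that the `retriever` created in In [9] is not used here: `as_query_engine()` builds its own retriever from the index with default settings. `RetrieverQueryEngine(retriever=retriever, response_synthesizer=response_synthesizer)` from `llama_index.core.query_engine` is one way to reuse an explicitly configured retriever. Conceptually, a query engine is a stateless pipeline: retrieve chunks for the question, then ask the LLM to answer from them. The sketch below wires that flow with injected `retrieve` and `llm` callables (hypothetical stand-ins); the prompt wording is an assumption loosely modeled on a typical RAG QA prompt.

```python
from typing import Callable


def make_query_engine(
    retrieve: Callable[[str], list[str]],
    llm: Callable[[str], str],
) -> Callable[[str], str]:
    """Build a stateless query function: question -> retrieve context -> LLM answer."""

    def query(question: str) -> str:
        context = "\n\n".join(retrieve(question))
        prompt = (
            "Context information is below.\n"
            f"{context}\n"
            "Given the context information and not prior knowledge, "
            f"answer the query.\nQuery: {question}\nAnswer: "
        )
        return llm(prompt)

    return query
```

Because `query` keeps no state, each call is independent: a follow-up such as "Who proposed it?" has no earlier question to resolve "it" against.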

In [12]:
response = query_engine.query("Give me the authors of transformers paper")
print(response)

2025-08-29 12:02:25,248 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-08-29 12:02:26,815 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin.


In [13]:
response.source_nodes

[NodeWithScore(node=TextNode(id_='6279e4f9-e34d-4152-b2d5-5348898dad28', embedding=None, metadata={'page_label': '1', 'file_name': 'transformers.pdf'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='f202ca4a-5156-4969-8871-8b4ec4589172', node_type='4', metadata={'page_label': '1', 'file_name': 'transformers.pdf'}, hash='d1fd71a3296330e6927a5595de7704d9ed873e2962c42deb643b63aa165bdd98')}, metadata_template='{key}: {value}', metadata_separator='\n', text='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAi

In [14]:
response.source_nodes[0].dict()

{'node': {'id_': '6279e4f9-e34d-4152-b2d5-5348898dad28',
  'embedding': None,
  'metadata': {'page_label': '1', 'file_name': 'transformers.pdf'},
  'excluded_embed_metadata_keys': [],
  'excluded_llm_metadata_keys': [],
  'relationships': {'1': {'node_id': 'f202ca4a-5156-4969-8871-8b4ec4589172',
    'node_type': '4',
    'metadata': {'page_label': '1', 'file_name': 'transformers.pdf'},
    'hash': 'd1fd71a3296330e6927a5595de7704d9ed873e2962c42deb643b63aa165bdd98',
    'class_name': 'RelatedNodeInfo'}},
  'metadata_template': '{key}: {value}',
  'metadata_separator': '\n',
  'text': 'Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jo

In [15]:
response.source_nodes[0].text

'Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurren

In [16]:
print(response.source_nodes[0].get_content())

Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.com
Noam Shazeer∗
Google Brain
noam@google.com
Niki Parmar∗
Google Research
nikip@google.com
Jakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.com
Aidan N. Gomez∗ †
University of Toronto
aidan@cs.toronto.edu
Łukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗ ‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Exp

In [17]:
response = query_engine.query("What is the use of positional encoding?")
print(response)

2025-08-29 12:03:40,718 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-08-29 12:03:41,985 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Positional encoding is used in the model to provide information about the relative or absolute position of tokens in a sequence. This allows the model to understand the order of the sequence, as the Transformer architecture does not have inherent mechanisms like recurrence or convolution to capture sequential information.


In [18]:
response = query_engine.query("What is the use of positional encoding? Answer in approx 250 characters.")
print(response)

2025-08-29 12:04:06,189 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-08-29 12:04:07,317 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Positional encoding is used in transformers to provide information about the position of tokens in a sequence. It helps the model differentiate between different positions and maintain positional information, enabling the model to understand the order of tokens in the input sequence.


In [19]:
print(response.get_formatted_sources())

> Source (Doc id: 576fd079-d2e1-4951-b534-e57e63089df0): length n is smaller than the representation dimensionality d, which is most often the case with
s...

> Source (Doc id: 986d8f31-fdb3-4c8c-93ad-80174f4aa76a): output values. These are concatenated and once again projected, resulting in the final values, as...
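`get_formatted_sources()` prints each source node's id followed by the start of its text, cut off with `...`. A rough, hypothetical re-creation of that formatting over plain `(node_id, text)` pairs, useful for logging citations outside LlamaIndex, could look like this; the 100-character cutoff is an assumption chosen to resemble the output above.

```python
def format_sources(sources: list[tuple[str, str]], length: int = 100) -> str:
    """Render (node_id, text) pairs as '> Source (Doc id: ...): snippet' blocks."""
    lines = []
    for node_id, text in sources:
        # Keep short texts whole; otherwise cut so snippet plus '...' fits in `length`
        snippet = text if len(text) <= length else text[: length - 3] + "..."
        lines.append(f"> Source (Doc id: {node_id}): {snippet}")
    return "\n\n".join(lines)
```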


In [20]:
response.metadata

{'576fd079-d2e1-4951-b534-e57e63089df0': {'page_label': '7',
  'file_name': 'transformers.pdf'},
 '986d8f31-fdb3-4c8c-93ad-80174f4aa76a': {'page_label': '5',
  'file_name': 'transformers.pdf'}}
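`response.metadata` maps each source node id to that node's metadata, so page-level citations can be built without touching the node objects. A minimal sketch over the dictionary shown above (`cite_pages` is a hypothetical helper):

```python
def cite_pages(metadata: dict[str, dict[str, str]]) -> str:
    """Build a 'file, pp. x, y' citation string from a node-id -> metadata mapping."""
    by_file: dict[str, set[str]] = {}
    for meta in metadata.values():
        by_file.setdefault(meta.get("file_name", "unknown"), set()).add(meta.get("page_label", "?"))
    parts = []
    for file_name, pages in sorted(by_file.items()):
        # Sort numerically when page labels are digits, otherwise alphabetically
        ordered = sorted(pages, key=lambda p: (not p.isdigit(), int(p) if p.isdigit() else 0, p))
        parts.append(f"{file_name}, pp. {', '.join(ordered)}")
    return "; ".join(parts)


example = {
    "576fd079-d2e1-4951-b534-e57e63089df0": {"page_label": "7", "file_name": "transformers.pdf"},
    "986d8f31-fdb3-4c8c-93ad-80174f4aa76a": {"page_label": "5", "file_name": "transformers.pdf"},
}
print(cite_pages(example))  # transformers.pdf, pp. 5, 7
```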

In [21]:
len(response.response)

284
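The prompt asked for roughly 250 characters, but the answer came back at 284: a length request inside the prompt is a soft hint that the LLM may not follow exactly. If a hard limit matters, enforce it after generation. One hypothetical approach is to cut at the last word boundary within the budget:

```python
def truncate_at_word(text: str, limit: int, suffix: str = "...") -> str:
    """Shorten `text` to at most `limit` characters, cutting at a word boundary."""
    if len(text) <= limit:
        return text
    cut = text[: limit - len(suffix)]
    # Drop a trailing partial word if there is a space to cut at
    if " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip() + suffix
```

For example, `truncate_at_word(response.response, 250)` would return at most 250 characters, at the cost of possibly ending mid-sentence.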

# Chat Engine

In [22]:
chat_engine = index.as_chat_engine(response_synthesizer=response_synthesizer)
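A chat engine adds conversation memory on top of retrieval. The `Condensed question` lines in the logs below suggest that the default chat mode here first rewrites each new message into a standalone question using the chat history, then retrieves and answers; in LlamaIndex this corresponds to the `condense_plus_context` chat mode. The default may differ between versions, so passing `chat_mode` explicitly is safer. The toy class below sketches that loop with injected `condense`, `retrieve`, and `llm` callables; all names are hypothetical, not LlamaIndex APIs.

```python
from typing import Callable


class ToyChatEngine:
    """Stateful chat loop: condense follow-up -> retrieve -> answer, keeping history."""

    def __init__(
        self,
        condense: Callable[[list[tuple[str, str]], str], str],
        retrieve: Callable[[str], list[str]],
        llm: Callable[[str], str],
    ) -> None:
        self.condense = condense
        self.retrieve = retrieve
        self.llm = llm
        self.history: list[tuple[str, str]] = []  # (role, message) pairs

    def chat(self, message: str) -> str:
        # With no history there is nothing to resolve, so use the message as-is
        question = self.condense(self.history, message) if self.history else message
        context = "\n\n".join(self.retrieve(question))
        answer = self.llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
        self.history += [("user", message), ("assistant", answer)]
        return answer

    def reset(self) -> None:
        """Forget the conversation, like starting a new chat."""
        self.history.clear()
```

Skipping the condense step on the first turn mirrors the logs: the first `chat()` call logs the condensed question without a preceding `chat/completions` request, while the second call makes one extra request before retrieval.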

In [23]:
response = chat_engine.chat("Give me the authors of transformers")
print(response)

2025-08-29 12:04:27,509 - INFO - Condensed question: Give me the authors of transformers
2025-08-29 12:04:28,011 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-08-29 12:04:29,299 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The authors of the paper on transformers are:
1. Ashish Vaswani
2. Noam Shazeer
3. Niki Parmar
4. Jakob Uszkoreit
5. Llion Jones
6. Aidan N. Gomez
7. Łukasz Kaiser
8. Illia Polosukhin

These authors have contributed to the research on the Transformer model as detailed in the provided document.


In [24]:
response = chat_engine.chat("What is the use of positional encoding?")
print(response)

2025-08-29 12:04:50,967 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-08-29 12:04:50,976 - INFO - Condensed question: What is the use of positional encoding in the Transformer model?
2025-08-29 12:04:51,537 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-08-29 12:04:53,356 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Positional encoding is used in the Transformer model to provide information about the position of tokens in the input sequence. Since the Transformer model does not inherently understand the order of tokens in a sequence like recurrent neural networks do, positional encoding is added to the input embeddings to give the model information about the position of tokens within the sequence.

By adding positional encoding to the input embeddings, the Transformer model can differentiate between tokens based on their position in the sequence. This allows the model to learn dependencies and relationships between tokens that are dependent on their position in the input sequence.

In summary, positional encoding is crucial in the Transformer model to help the model understand the sequential order of tokens in the input data, enabling it to effectively capture dependencies and relationships within the sequence data.
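On the second turn, the chat engine made an extra LLM call before retrieval (the first `chat/completions` request in the log) to rewrite the question as "What is the use of positional encoding in the Transformer model?". A prompt for that rewriting step might be built as follows; the wording is an assumption, not LlamaIndex's built-in condense prompt.

```python
def build_condense_prompt(history: list[tuple[str, str]], follow_up: str) -> str:
    """Ask an LLM to rewrite `follow_up` as a standalone question using the chat history."""
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        "Given the following conversation and a follow-up message, "
        "rephrase the follow-up into a standalone question that can be "
        "understood without the conversation.\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question:"
    )
```

This extra call is the main trade-off between the two engines as run in this notebook: the query engine answered with one embeddings request and one completion per question, while the chat engine's follow-up turn needed an additional completion for condensing. A query engine fits independent, one-off questions; a chat engine fits multi-turn conversations where later messages depend on earlier ones.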
