### Load environment variables

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
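
If `OPENAI_API_KEY` is missing from the `.env` file, `os.getenv` returns `None` and the assignment above raises a `TypeError`. A minimal sketch of a clearer check, assuming a hypothetical helper named `require_env` (it is not part of `python-dotenv` or LlamaIndex):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Example with a throwaway variable so the sketch runs without a real key.
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))
```

In the notebook, the same helper could be called with `"OPENAI_API_KEY"` right after `load_dotenv()`.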


In [4]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader('data').load_data()
documents

[Document(id_='9da0aae0-5f38-449f-9806-48bd158db24a', embedding=None, metadata={'page_label': '1', 'file_name': 'attention all you need.pdf', 'file_path': 'd:\\GenerativeAI_scratch_to_advanced\\GenerativeAI_Scratch_to_Advanced\\Llama_Index\\02_RAG_LLM_App_using_LlamaIndex\\data\\attention all you need.pdf', 'file_type': 'application/pdf', 'file_size': 2215244, 'creation_date': '2024-03-21', 'last_modified_date': '2024-01-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nno

In [5]:
len(documents)

27

## Convert the documents into a vector index

In [6]:

index = VectorStoreIndex.from_documents(documents, show_progress=True)

Parsing nodes: 100%|██████████| 27/27 [00:00<00:00, 235.22it/s]
Generating embeddings: 100%|██████████| 35/35 [00:03<00:00, 10.75it/s]


In [7]:
index

<llama_index.core.indices.vector_store.base.VectorStoreIndex at 0x1ff61515ed0>

### Retrieve information from the vector index

In [8]:
query_engine = index.as_query_engine()

In [9]:
query_engine

<llama_index.core.query_engine.retriever_query_engine.RetrieverQueryEngine at 0x1ff6159cac0>

In [10]:
response = query_engine.query('what is transformer')

In [11]:
print(response)

The Transformer is a sequence transduction model based entirely on attention, which replaces the recurrent layers commonly used in encoder-decoder architectures with multi-headed self-attention. It consists of stacked self-attention and point-wise, fully connected layers for both the encoder and decoder. The encoder is made up of a stack of identical layers, each containing a multi-head self-attention mechanism and a position-wise fully connected feed-forward network. The decoder, also composed of a stack of identical layers, includes an additional sub-layer for multi-head attention over the encoder stack's output. The Transformer model architecture allows for faster training compared to architectures based on recurrent or convolutional layers and has shown state-of-the-art performance in translation tasks.


In [12]:
response2 = query_engine.query('what is gpt')
print(response2)

Generative Pre-Training


## A deeper dive into LlamaIndex

##### Which source nodes back the response?
- The query engine ranks nodes by similarity to the query and returns the top matches, which you can inspect here.
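
To make the idea of a top result by similarity concrete, here is a toy sketch of top-k retrieval using cosine similarity over hand-written 3-dimensional vectors. The vectors, node names, and the `top_k` helper are invented for illustration. LlamaIndex works with real embedding vectors and its own retriever classes, so treat this as a picture of the idea rather than its implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, nodes, k=2):
    """Rank (name, vector) pairs by similarity to the query and keep the best k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in nodes.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones have far more dimensions.
toy_nodes = {
    "node_a": [1.0, 0.0, 0.0],
    "node_b": [0.8, 0.6, 0.0],
    "node_c": [0.0, 1.0, 0.0],
    "node_d": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.2, 0.0]

for name, score in top_k(toy_query, toy_nodes, k=2):
    print(f"{name}: {score:.3f}")
```

Raising `k` returns more ranked nodes, which is the role `similarity_top_k` plays for the retriever later in this notebook.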

In [13]:
from llama_index.core.response.pprint_utils import pprint_response

In [14]:
pprint_response(response, show_source=True)
print(response)

Final Response: The Transformer is a sequence transduction model based
entirely on attention, which replaces the recurrent layers commonly
used in encoder-decoder architectures with multi-headed self-
attention. It consists of stacked self-attention and point-wise, fully
connected layers for both the encoder and decoder. The encoder is made
up of a stack of identical layers, each containing a multi-head self-
attention mechanism and a position-wise fully connected feed-forward
network. The decoder, also composed of a stack of identical layers,
includes an additional sub-layer for multi-head attention over the
encoder stack's output. The Transformer model architecture allows for
faster training compared to architectures based on recurrent or
convolutional layers and has shown state-of-the-art performance in
translation tasks.
______________________________________________________________________
Source Node 1/2
Node ID: f6d009fa-cc14-472b-899c-e049080311c6
Similarity: 0.7815099759304441

#### What if I need more than 2 source nodes for a query?

In [23]:
# the commented-out imports in this cell are deprecated versions

# from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine.retriever_query_engine import RetrieverQueryEngine
# from llama_index.core.indices.postprocessor import SimilarityPostprocessor
from llama_index.core.postprocessor import SimilarityPostprocessor

In [15]:
from llama_index.core.indices.vector_store.retrievers.retriever import VectorIndexRetriever

from llama_index.core.query_engine.retriever_query_engine import RetrieverQueryEngine

from llama_index.core.postprocessor import SimilarityPostprocessor

In [16]:
retriever = VectorIndexRetriever(index=index, similarity_top_k=4)

query_engine = RetrieverQueryEngine(retriever=retriever)


In [17]:
response = query_engine.query('what is attention all you need')
from llama_index.core.response.pprint_utils import pprint_response
pprint_response(response, show_source=True)
print(response)

Final Response: The "Attention Is All You Need" paper introduces a new
network architecture called the Transformer, which is based solely on
attention mechanisms. This architecture eliminates the need for
complex recurrent or convolutional neural networks typically used in
sequence transduction models. The Transformer model has shown superior
performance in quality, parallelizability, and training efficiency
compared to traditional models. It achieves significant improvements
in machine translation tasks and establishes state-of-the-art results
with less training time and resources.
______________________________________________________________________
Source Node 1/4
Node ID: 61ee83cf-5894-4bce-b646-19149d1c40e8
Similarity: 0.832215948238621
Text: Provided proper attribution is provided, Google hereby grants
permission to reproduce the tables and figures in this paper solely
for use in journalistic or scholarly works. Attention Is All You Need
Ashish Vaswani∗ Google Brain avaswani@goo

#### Here the retriever returned 4 source nodes ranked by similarity, with the best match listed first.

### To keep only nodes with a similarity of at least 0.80, set a threshold. This is handled by `SimilarityPostprocessor`

In [18]:
from llama_index.core.indices.vector_store.retrievers.retriever import VectorIndexRetriever
from llama_index.core.query_engine.retriever_query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.response.pprint_utils import pprint_response

retriever = VectorIndexRetriever(index=index, similarity_top_k=4)
postprocessor = SimilarityPostprocessor(similarity_cutoff=0.80)

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[postprocessor],
)


response = query_engine.query('what is attention all you need')
pprint_response(response, show_source=True)
print(response)

Final Response: The term "Attention Is All You Need" refers to a new
simple network architecture called the Transformer, which is based
solely on attention mechanisms. This architecture eliminates the need
for complex recurrent or convolutional neural networks typically used
in sequence transduction models, by connecting the encoder and decoder
through attention mechanisms. The Transformer model has shown superior
performance in quality, parallelizability, and training efficiency
compared to traditional models.
______________________________________________________________________
Source Node 1/1
Node ID: 61ee83cf-5894-4bce-b646-19149d1c40e8
Similarity: 0.8322257557492051
Text: Provided proper attribution is provided, Google hereby grants
permission to reproduce the tables and figures in this paper solely
for use in journalistic or scholarly works. Attention Is All You Need
Ashish Vaswani∗ Google Brain avaswani@google.comNoam Shazeer∗ Google
Brain noam@google.comNiki Parmar∗ Google Res
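
The effect of the cutoff can be sketched in plain Python. The scores below are made up, except that the first is rounded from the 0.832 similarity reported above. `apply_cutoff` is a hypothetical helper for illustration, not the `SimilarityPostprocessor` implementation.

```python
def apply_cutoff(scored_nodes, cutoff):
    """Keep only (node_id, score) pairs whose score is at least the cutoff."""
    return [(node_id, score) for node_id, score in scored_nodes if score >= cutoff]


# Four retrieved nodes, as with similarity_top_k=4. Scores are illustrative.
retrieved = [
    ("node_1", 0.832),
    ("node_2", 0.79),
    ("node_3", 0.77),
    ("node_4", 0.75),
]

kept = apply_cutoff(retrieved, cutoff=0.80)
print(f"{len(kept)} of {len(retrieved)} nodes kept: {kept}")
```

This is consistent with the output above: with the 0.80 cutoff, the source list shrinks from "Source Node 1/4" to "Source Node 1/1".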

### Store the index locally, then load it from storage when querying

In [20]:
import os.path
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)


# check if storage already exists
PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

# either way we can now query the index
query_engine = index.as_query_engine()
response = query_engine.query("What are transformers?")
print(response)

Transformers are a model architecture that relies entirely on an attention mechanism to establish global dependencies between input and output, without using recurrence. They allow for significantly more parallelization compared to recurrent models, enabling faster training and improved performance in tasks such as translation.
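
The build-once, load-afterwards pattern in the cell above does not depend on LlamaIndex. A minimal standard-library sketch, assuming a hypothetical `build_index` function and a JSON file standing in for the persisted index:

```python
import json
import os
import tempfile


def build_index() -> dict:
    """Stand-in for the expensive step (parsing documents, computing embeddings)."""
    return {"doc_1": [0.1, 0.2], "doc_2": [0.3, 0.4]}


def load_or_build(persist_dir: str) -> tuple[dict, bool]:
    """Return (index, was_built): load from disk if present, else build and save."""
    path = os.path.join(persist_dir, "index.json")
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f), False
    index = build_index()
    os.makedirs(persist_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index, True


with tempfile.TemporaryDirectory() as tmp:
    storage = os.path.join(tmp, "storage")
    first_index, first_built = load_or_build(storage)    # builds and persists
    second_index, second_built = load_or_build(storage)  # loads from disk
    print(first_built, second_built, first_index == second_index)
```

One caveat applies to the notebook cell as well: the check looks only at whether the storage location exists, so if the files in `data` change, delete `./storage` to force a rebuild.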


##### ********************************** finish **************************