<a href="https://colab.research.google.com/github/SamurAIGPT/LlamaIndex-course/blob/main/retrievers/Retrievers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install llama-index

# Retrievers

In LlamaIndex, retrievers are responsible for fetching the most relevant context for a user query. A retriever is built on top of an index and specifies how nodes are fetched from that index. Indices were covered in the previous lesson.

LlamaIndex supports many types of retrievers, including the following:

**Vector Store Retriever**

The Vector Store Retriever fetches the top-k most similar nodes from a vector store index. The retriever mode setting has no effect here.
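To make "top-k most similar" concrete, here is a minimal sketch in plain Python. It is not LlamaIndex's implementation: it assumes cosine similarity as the scoring function, uses made-up three-dimensional embeddings, and the names `cosine_similarity` and `top_k_nodes` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_nodes(query_embedding, node_embeddings, k=2):
    """Return the ids of the k nodes whose embeddings score highest."""
    scored = [
        (cosine_similarity(query_embedding, emb), node_id)
        for node_id, emb in node_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [node_id for _, node_id in scored[:k]]

# Toy embeddings (illustrative values only)
node_embeddings = {
    "node_a": [1.0, 0.0, 0.0],
    "node_b": [0.9, 0.1, 0.0],
    "node_c": [0.0, 1.0, 0.0],
}
top_nodes = top_k_nodes([1.0, 0.0, 0.0], node_embeddings, k=2)
print(top_nodes)  # ['node_a', 'node_b']
```

In a real vector store index, the embeddings come from an embedding model and the search is delegated to the vector store.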

**List Retriever**

You can use the List Retriever to fetch nodes from a list index. This retriever supports two modes: default and embedding.

Default mode fetches all the nodes, while embedding mode fetches the top-k nodes using embeddings.

**Tree Retriever**

The Tree Retriever, as its name suggests, extracts nodes from a hierarchical tree of nodes. This retriever supports several modes; the default is select_leaf.

**Keyword Table Retriever**

The Keyword Table Retriever extracts keywords from the query and uses them to find nodes having matching keywords.

The retriever supports three different modes: default, simple, and rake.
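The keyword lookup idea can be sketched in plain Python. This is an illustration under simplifying assumptions, not LlamaIndex's code: keywords are extracted with a regular expression and a small hand-written stopword list, and the helpers `extract_keywords`, `build_keyword_table`, and `retrieve` are hypothetical names.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not a standard list)
STOPWORDS = {"the", "a", "an", "is", "of", "in", "and", "to", "what", "how"}

def extract_keywords(text):
    """Lowercase the text, keep alphabetic words, and drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS}

def build_keyword_table(nodes):
    """Map each keyword to the set of node ids that contain it."""
    table = defaultdict(set)
    for node_id, text in nodes.items():
        for keyword in extract_keywords(text):
            table[keyword].add(node_id)
    return table

def retrieve(query, table):
    """Return node ids ordered by how many query keywords they match."""
    counts = defaultdict(int)
    for keyword in extract_keywords(query):
        for node_id in table.get(keyword, ()):
            counts[node_id] += 1
    return sorted(counts, key=lambda n: (-counts[n], n))

nodes = {
    "n1": "Bonds pay a fixed interest rate.",
    "n2": "Stocks can pay dividends.",
    "n3": "The interest rate affects bond prices.",
}
table = build_keyword_table(nodes)
matches = retrieve("What is the interest rate?", table)
print(matches)  # ['n1', 'n3']
```

Only the keyword extraction step would change between modes; the table lookup stays the same.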

**Knowledge Graph Retriever**

The Knowledge Graph Retriever fetches nodes from a knowledge graph index by finding triplets relevant to the query.

It supports keyword, embedding, and hybrid modes. Hybrid mode uses both keywords and embeddings to find relevant triplets.
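As a rough illustration of keyword-based triplet lookup, the sketch below stores (subject, predicate, object) triplets in a list and returns those whose subject or object appears in the query. The data and the `find_triplets` helper are hypothetical, and the embedding and hybrid modes are not modeled here.

```python
# Toy knowledge graph as (subject, predicate, object) triplets
triplets = [
    ("LlamaIndex", "provides", "retrievers"),
    ("retrievers", "fetch", "nodes"),
    ("postprocessors", "filter", "results"),
]

def find_triplets(query, triplets):
    """Return triplets whose subject or object occurs as a word in the query."""
    query_words = set(query.lower().replace("?", "").split())
    return [
        t for t in triplets
        if t[0].lower() in query_words or t[2].lower() in query_words
    ]

found = find_triplets("What do retrievers fetch?", triplets)
for subject, predicate, obj in found:
    print(subject, predicate, obj)
```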

Here is a code example of a retriever built on a list index:

In [6]:
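from llama_index import ListIndex
from llama_index import download_loader

# Fetch the YouTube transcript loader
YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")

# Load the video transcript as documents
loader = YoutubeTranscriptReader()
docs = loader.load_data(ytlinks=['https://www.youtube.com/watch?v=nHcbHdgVUJg&ab_channel=WintWealth'])

# Build a list index from the documents
list_index = ListIndex.from_documents(docs)

# Create a retriever that selects the top-k nodes by embedding similarity
retriever = list_index.as_retriever(
    retriever_mode='embedding',
)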

# Node postprocessor

Node postprocessors in LlamaIndex take a set of nodes and apply a transformation or filtering before returning them.

Node postprocessors are most commonly used in a query engine, after the retrieval step and before the response synthesis step.

For instance, you can require specific keywords to be absent or present in the retrieved nodes. You can also rank results based on attributes such as time.

There are many types of postprocessors. Let's discuss a few of them:

**SimilarityPostprocessor**

Allows you to require the retrieved nodes to have a minimum similarity score.

**KeywordNodePostprocessor**

Allows you to require certain keywords to be present in, or excluded from, the retrieved nodes.
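The filtering logic can be sketched in a few lines of plain Python. This is an assumption-laden illustration, not the library's implementation: it works on raw strings rather than node objects, uses case-insensitive substring matching, and `keyword_filter` is a hypothetical helper.

```python
def keyword_filter(texts, required=(), excluded=()):
    """Keep texts that contain every required keyword and no excluded one.

    Matching is a case-insensitive substring test, so "interest" would
    also match "interesting".
    """
    kept = []
    for text in texts:
        lowered = text.lower()
        has_required = all(k.lower() in lowered for k in required)
        has_excluded = any(k.lower() in lowered for k in excluded)
        if has_required and not has_excluded:
            kept.append(text)
    return kept

texts = [
    "Bonds pay fixed interest.",
    "Stocks pay dividends.",
    "Junk bonds pay high interest.",
]
kept_texts = keyword_filter(texts, required=["interest"], excluded=["junk"])
print(kept_texts)  # ['Bonds pay fixed interest.']
```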

**TimeWeightedPostprocessor**

The TimeWeightedPostprocessor ranks the nodes by their recency.

**PIINodePostprocessor**

The PIINodePostprocessor masks Personally Identifiable Information (PII) in the text using a local LLM such as StableLM. For example, it replaces a credit card number with [CREDIT_CARD_NUMBER].
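To show what the masked output looks like, here is a small sketch that swaps in a regular expression for the LLM. This is a deliberately simpler technique than the one described above: it only recognizes 16-digit numbers written as four groups of four, and `mask_credit_cards` is a hypothetical helper.

```python
import re

# Four groups of four digits, optionally separated by a space or hyphen
CARD_PATTERN = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

def mask_credit_cards(text):
    """Replace anything that looks like a 16-digit card number with a placeholder."""
    return CARD_PATTERN.sub("[CREDIT_CARD_NUMBER]", text)

masked = mask_credit_cards("My card number is 4111 1111 1111 1111.")
print(masked)  # My card number is [CREDIT_CARD_NUMBER].
```

An LLM-based approach can catch PII that has no fixed pattern, such as names and addresses, which a regex like this cannot.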

**FixedRecencyPostprocessor**

If the query involves a time factor (a temporal query), this postprocessor orders the nodes by date and returns the top-k most recent nodes to the response synthesizer.
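The ordering step amounts to a sort by date followed by a slice. The sketch below assumes each node is a plain dictionary with a `date` field; that layout and the `most_recent` helper are hypothetical and stand in for however dates are stored in node metadata.

```python
from datetime import date

def most_recent(nodes, k=1):
    """Sort nodes newest-first by date and keep the first k."""
    return sorted(nodes, key=lambda node: node["date"], reverse=True)[:k]

nodes = [
    {"text": "Q1 report", "date": date(2023, 3, 31)},
    {"text": "Q3 report", "date": date(2023, 9, 30)},
    {"text": "Q2 report", "date": date(2023, 6, 30)},
]
latest = most_recent(nodes, k=2)
print([node["text"] for node in latest])  # ['Q3 report', 'Q2 report']
```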

Let's see a code example of a SimilarityPostprocessor:

In [9]:
from llama_index.indices.postprocessor import SimilarityPostprocessor
from llama_index.schema import Node, NodeWithScore

nodes = [
  NodeWithScore(node=Node(text="text"), score=0.7),
  NodeWithScore(node=Node(text="text"), score=0.8)
]

# filter nodes below 0.75 similarity score
processor = SimilarityPostprocessor(similarity_cutoff=0.75)
filtered_nodes = processor.postprocess_nodes(nodes)

In [10]:
filtered_nodes

[NodeWithScore(node=TextNode(id_='45c4ba0f-edac-4b2b-bcaa-ae9dd3d5210a', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='8a363c3e689f3fd0f4d742d70375383ef3906cc7a90e6f9ee402405a72de759c', text='text', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.8)]

As the output shows, the node with score 0.7 is filtered out because it falls below the 0.75 cutoff.

# Response Synthesizer

A response synthesizer takes a list of nodes as input and generates output in the form of a response.

LlamaIndex supports different modes of response synthesis, including:

**Refine**

An iterative way of generating a response that uses all nodes. The initial answer is generated using the first node as context. This answer and the second node are then fed as context to a refinement prompt. This process continues until all nodes have been processed in the same way and the final response has been generated.
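The refine loop can be sketched as follows. The prompts are simplified placeholders rather than LlamaIndex's actual templates, and `counting_llm` is a hypothetical stand-in for a real LLM call so that the example runs without a model.

```python
def refine(query, chunks, llm):
    """Answer from the first chunk, then refine the answer once per remaining chunk."""
    answer = llm(f"Context: {chunks[0]}\nQuestion: {query}\nAnswer:")
    for chunk in chunks[1:]:
        answer = llm(
            f"Question: {query}\n"
            f"Existing answer: {answer}\n"
            f"New context: {chunk}\n"
            "Refine the existing answer using the new context:"
        )
    return answer

calls = []  # records every prompt sent to the stand-in LLM

def counting_llm(prompt):
    """Stand-in LLM: logs the prompt and returns a numbered dummy answer."""
    calls.append(prompt)
    return f"answer after {len(calls)} call(s)"

final_answer = refine("What is a bond?", ["chunk 1", "chunk 2", "chunk 3"], counting_llm)
print(final_answer)  # answer after 3 call(s)
print(len(calls))    # 3
```

Note that the number of LLM calls equals the number of chunks, which is why this mode can be slow and costly for large retrieval sets.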

**Compact and Refine**

Same as Refine, except the text chunks are first merged into larger chunks, which reduces the number of LLM calls and so optimizes cost and performance.
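The merging step can be sketched as a greedy packing routine. This is a simplified illustration: it measures size in characters rather than tokens, and `compact` with its `max_chars` parameter is a hypothetical helper, not the library's API.

```python
def compact(chunks, max_chars):
    """Greedily merge consecutive chunks so each merged chunk stays within max_chars.

    A single chunk longer than max_chars is kept as-is rather than split.
    """
    packed = []
    current = ""
    for chunk in chunks:
        candidate = f"{current}\n{chunk}" if current else chunk
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            packed.append(current)
            current = chunk
    if current:
        packed.append(current)
    return packed

packed_chunks = compact(["aaaa", "bbbb", "cccc", "dddd"], max_chars=9)
print(packed_chunks)  # ['aaaa\nbbbb', 'cccc\ndddd']
```

Here four chunks become two, so a refine pass over the packed chunks would need two LLM calls instead of four.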

**Tree Summarize**

Tree summarization is a bottom-up approach to constructing a response from a list of nodes. It repeatedly combines pairs of nodes into a summary stored in a parent node, until only one answer remains.
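The bottom-up combination can be sketched as a loop that halves the list at each level. In this illustration `join_summary` is a hypothetical stand-in for an LLM summarization call, chosen so the tree structure is visible in the output.

```python
def tree_summarize(chunks, summarize, fan_in=2):
    """Combine chunks in groups of fan_in, level by level, until one remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = list(chunks)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fan_in):
            group = level[i:i + fan_in]
            # A leftover single item is carried up unchanged
            next_level.append(summarize(group) if len(group) > 1 else group[0])
        level = next_level
    return level[0]

def join_summary(texts):
    """Stand-in summarizer: shows which inputs were combined."""
    return "(" + " + ".join(texts) + ")"

result = tree_summarize(["A", "B", "C", "D"], join_summary)
print(result)  # ((A + B) + (C + D))
```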

**Simple Summarize**

Combines all text chunks and makes one LLM call using the whole text as context.

In [None]:
from llama_index import (
    VectorStoreIndex,
    get_response_synthesizer,
)
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine

# build index
index = VectorStoreIndex.from_documents(docs)

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

# configure response synthesizer
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",
)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

# query
response = query_engine.query("What did the author do growing up?")
print(response)
