# Auto Merging Retriever

In this notebook, we showcase our `AutoMergingRetriever`. It looks at a set of retrieved leaf nodes and recursively "merges" them into their parent node whenever the share of that parent's children among the retrieved nodes exceeds a given threshold. This allows us to consolidate smaller, potentially disparate contexts into a larger context that might help synthesis.

You can define this hierarchy yourself over a set of documents, or you can make use of our brand-new text parser: the `HierarchicalNodeParser`, which takes in a candidate set of documents and outputs an entire hierarchy of nodes, from coarse to fine.
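
To make the merging step described above concrete, here is a minimal, self-contained sketch in plain Python. It is not the library's implementation: the function name `auto_merge`, the `parent_of` / `children_of` dictionaries, and the `threshold` value of 0.5 are all assumptions chosen for illustration.

```python
from collections import defaultdict


def auto_merge(retrieved_ids, parent_of, children_of, threshold=0.5):
    """Repeatedly replace groups of selected nodes with their parent.

    A parent replaces its selected children when the fraction of its
    children that are selected is greater than `threshold` (hypothetical).
    """
    current = set(retrieved_ids)
    changed = True
    while changed:
        changed = False
        # Group the currently selected nodes by their parent.
        by_parent = defaultdict(set)
        for node_id in current:
            parent_id = parent_of.get(node_id)
            if parent_id is not None:
                by_parent[parent_id].add(node_id)
        for parent_id, selected in by_parent.items():
            ratio = len(selected) / len(children_of[parent_id])
            if ratio > threshold:
                # Swap the selected children for the parent node.
                current -= selected
                current.add(parent_id)
                changed = True
    return current


# Toy hierarchy: one root with two mid-level nodes, each with four leaves.
children_of = {
    "root": ["mid_a", "mid_b"],
    "mid_a": ["a1", "a2", "a3", "a4"],
    "mid_b": ["b1", "b2", "b3", "b4"],
}
parent_of = {
    child: parent for parent, kids in children_of.items() for child in kids
}

merged = auto_merge(["a1", "a2", "a3", "b1"], parent_of, children_of)
print(sorted(merged))  # ['b1', 'mid_a']
```

Three of the four `mid_a` leaves were retrieved, so they are merged into `mid_a`. Only one `mid_b` leaf was retrieved, so it stays as is, and `root` is not merged because only half of its children are selected.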

## Load Data

Let's first load the Llama 2 paper: https://arxiv.org/pdf/2307.09288.pdf. This will be our test data.

In [None]:
!mkdir -p data
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"

In [2]:
from pathlib import Path
from llama_hub.file.pdf.base import PDFReader

In [3]:
loader = PDFReader()
docs0 = loader.load_data(file=Path("./data/llama2.pdf"))

By default, the PDF reader creates a separate doc for each page.
For the sake of this notebook, we stitch docs together into one doc. 
This will help us better highlight auto-merging capabilities that "stitch" chunks together later on.

In [4]:
from llama_index import Document

doc_text = "\n\n".join([d.get_content() for d in docs0])
docs = [Document(text=doc_text)]

## Parse Chunk Hierarchy from Text, Load into Storage

In this section we make use of the `HierarchicalNodeParser`. This will output a hierarchy of nodes, from top-level nodes with bigger chunk sizes to child nodes with smaller chunk sizes, where each child node has a parent node with a bigger chunk size.

By default, the hierarchy is:
- 1st level: chunk size 2048
- 2nd level: chunk size 512
- 3rd level: chunk size 128
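
The coarse-to-fine structure listed above can be illustrated with a small plain-Python sketch. This is only an illustration under stated assumptions: it splits on raw character counts and stores nodes as dictionaries, and the helper name `build_hierarchy` is hypothetical. The actual parser's splitting logic is not reproduced here.

```python
def build_hierarchy(text, chunk_sizes=(2048, 512, 128)):
    """Split `text` coarse-to-fine and return a flat list of node dicts."""
    nodes = []

    def split(segment, level, parent_id):
        size = chunk_sizes[level]
        for start in range(0, len(segment), size):
            chunk = segment[start:start + size]
            node_id = len(nodes)
            nodes.append(
                {"id": node_id, "level": level, "parent": parent_id, "text": chunk}
            )
            # Recurse until the finest chunk size is reached.
            if level + 1 < len(chunk_sizes):
                split(chunk, level + 1, node_id)

    split(text, 0, None)
    return nodes


toy_nodes = build_hierarchy("x" * 4096)
counts = [sum(1 for n in toy_nodes if n["level"] == lvl) for lvl in range(3)]
print(counts)  # [2, 8, 32]
```

Every node below the top level records the id of the larger chunk it was cut from, which is the parent link that merging relies on later.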


We then load these nodes into storage. The leaf nodes are indexed in a vector store; these are the nodes that are retrieved first, directly via similarity search. The other nodes are fetched from a docstore.
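
As a rough mental model of this storage split, consider the sketch below. It uses plain dictionaries and lists rather than the library's storage classes, and all names and node texts in it (`docstore`, `leaf_ids`, `lookup_parent`) are hypothetical.

```python
# Hypothetical node records for illustration only.
all_nodes = [
    {"id": "mid", "parent": None, "text": "safety fine-tuning and safety RLHF"},
    {"id": "leaf_1", "parent": "mid", "text": "safety fine-tuning"},
    {"id": "leaf_2", "parent": "mid", "text": "safety RLHF"},
]

# "Docstore": every node, keyed by id, so parents can be fetched later.
docstore = {n["id"]: n for n in all_nodes}

# "Vector index": only the leaf nodes are searchable.
leaf_ids = ["leaf_1", "leaf_2"]


def lookup_parent(leaf_id):
    """Fetch the parent of a retrieved leaf from the docstore, if any."""
    parent_id = docstore[leaf_id]["parent"]
    return docstore[parent_id] if parent_id is not None else None


print(lookup_parent("leaf_1")["id"])  # mid
```

Similarity search only ever sees the leaves; a parent enters the result set solely through a docstore lookup by id.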

In [5]:
from llama_index.node_parser import HierarchicalNodeParser

In [6]:
node_parser = HierarchicalNodeParser.from_defaults()

In [7]:
nodes = node_parser.get_nodes_from_documents(docs)

In [8]:
len(nodes)

999

Here we import a simple helper function for fetching "leaf" nodes within a node list. 
These are nodes that don't have children of their own.
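
Conceptually, a leaf is any node that no other node names as its parent. The sketch below shows that idea on hypothetical dictionary records; it is not how `get_leaf_nodes` is implemented, and the name `leaf_nodes_of` is made up for this example.

```python
def leaf_nodes_of(nodes):
    """Return the nodes that no other node names as its parent."""
    parent_ids = {n["parent"] for n in nodes if n["parent"] is not None}
    return [n for n in nodes if n["id"] not in parent_ids]


# Hypothetical three-level hierarchy.
example_nodes = [
    {"id": "root", "parent": None},
    {"id": "mid", "parent": "root"},
    {"id": "leaf_1", "parent": "mid"},
    {"id": "leaf_2", "parent": "mid"},
]

print([n["id"] for n in leaf_nodes_of(example_nodes)])  # ['leaf_1', 'leaf_2']
```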

In [9]:
from llama_index.node_parser import get_leaf_nodes

In [10]:
leaf_nodes = get_leaf_nodes(nodes)

In [11]:
len(leaf_nodes)

783

### Load into Storage

We define a docstore, which we load all nodes into. 

We then define a `VectorStoreIndex` containing just the leaf-level nodes.

In [12]:
# define docstore
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage import StorageContext

docstore = SimpleDocumentStore()

# insert nodes into docstore
docstore.add_documents(nodes)

# define storage context (will include vector store by default too)
storage_context = StorageContext.from_defaults(docstore=docstore)

In [13]:
# load leaf nodes into a vector index
from llama_index import VectorStoreIndex

base_index = VectorStoreIndex(leaf_nodes, storage_context=storage_context)

## Define Retriever

In [14]:
from llama_index.retrievers.auto_merging_retriever import AutoMergingRetriever

In [15]:
base_retriever = base_index.as_retriever(similarity_top_k=6)
retriever = AutoMergingRetriever(base_retriever, storage_context, verbose=True)

In [None]:
# query_str = "What were some lessons learned from red-teaming?"
query_str = "Can you tell me about the key concepts for safety finetuning"

nodes = retriever.retrieve(query_str)
base_nodes = base_retriever.retrieve(query_str)

In [17]:
len(nodes)

5

In [None]:
from llama_index.response.notebook_utils import display_source_node

for node in nodes:
    display_source_node(node, source_length=10000)

In [None]:
for node in base_nodes:
    display_source_node(node, source_length=10000)

## Plug it into Query Engine

In [20]:
from llama_index.query_engine import RetrieverQueryEngine

In [21]:
query_engine = RetrieverQueryEngine.from_args(retriever)
base_query_engine = RetrieverQueryEngine.from_args(base_retriever)

In [None]:
response = query_engine.query(query_str)

In [23]:
print(str(response))

The key concepts for safety fine-tuning include supervised safety fine-tuning and safety RLHF (Reinforcement Learning with Human Feedback). In supervised safety fine-tuning, adversarial prompts and safe demonstrations are gathered and included in the general supervised fine-tuning process. This helps the model align with safety guidelines even before RLHF. Safety RLHF involves collecting human preference data by having annotators write prompts that they believe can elicit unsafe behavior. Multiple model responses are compared, and the safest response is selected according to a set of guidelines. This data is then used to train a safety reward model. These concepts are aimed at mitigating safety risks and improving the safety of language models.


In [24]:
base_response = base_query_engine.query(query_str)

In [25]:
print(str(base_response))

The key concepts for safety fine-tuning include supervised safety fine-tuning and safety RLHF (Reinforcement Learning with Human Feedback). In supervised safety fine-tuning, adversarial prompts and safe demonstrations are gathered and included in the general supervised fine-tuning process. This helps the model align with safety guidelines even before RLHF. Safety RLHF involves observing and generalizing from safe demonstrations in supervised fine-tuning. The model learns to write detailed safe responses, address safety concerns, explain sensitive topics, and provide additional helpful information. These concepts aim to mitigate safety risks and improve the safety of language model models.
