## Auto Merging Retrieval

In this notebook, we showcase LlamaIndex's AutoMergingRetriever, which looks at a set of retrieved leaf nodes and recursively "merges" subsets of leaf nodes that reference the same parent node into that parent once the fraction of the parent's children that were retrieved exceeds a given threshold. This consolidates potentially disparate, smaller contexts into a larger context that can help synthesis.
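The merging rule described above can be sketched in plain Python. This is a simplified illustration, not LlamaIndex's implementation. The dict-based hierarchy (`parent_of`, `children_of`), the helper name `auto_merge`, and the strict `> ratio_thresh` comparison with `ratio_thresh=0.5` are all assumptions made for the example.

```python
from collections import defaultdict


def auto_merge(retrieved_ids, parent_of, children_of, ratio_thresh=0.5):
    """Replace retrieved children with their parent whenever more than
    `ratio_thresh` of that parent's children are present; repeat until stable."""
    current = list(retrieved_ids)
    while True:
        # Group the currently selected nodes by parent (roots have no parent).
        by_parent = defaultdict(list)
        for node_id in current:
            parent = parent_of.get(node_id)
            if parent is not None:
                by_parent[parent].append(node_id)

        # Parents whose share of retrieved children exceeds the threshold.
        to_merge = [
            parent
            for parent, kids in by_parent.items()
            if len(kids) / len(children_of[parent]) > ratio_thresh
        ]
        if not to_merge:
            return current

        # Drop the merged children and add each parent once.
        dropped = {kid for parent in to_merge for kid in by_parent[parent]}
        current = [n for n in current if n not in dropped]
        current += [p for p in to_merge if p not in current]


# Toy hierarchy: root R -> parents A, B -> leaves.
parent_of = {"a1": "A", "a2": "A", "a3": "A", "a4": "A",
             "b1": "B", "b2": "B", "A": "R", "B": "R"}
children_of = {"A": ["a1", "a2", "a3", "a4"], "B": ["b1", "b2"], "R": ["A", "B"]}

print(auto_merge(["a1", "a2", "a3", "b1"], parent_of, children_of))  # ['b1', 'A']
```

Here three of A's four leaves were retrieved, so they collapse into A, while B keeps its single leaf because exactly half of its children is not *more* than the threshold. Because the loop repeats, merged parents can themselves merge into their own parent on the next pass.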

You can define the node hierarchy yourself over a set of documents, or use LlamaIndex's HierarchicalNodeParser, which takes a set of documents and outputs an entire hierarchy of nodes, from coarse to fine (large parent chunks down to small leaf chunks).
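As a rough illustration of coarse-to-fine parsing, the sketch below splits text into a tree of chunks, recursing into each chunk with the next, smaller size. `build_hierarchy` is a hypothetical helper. It counts whitespace-separated words and uses no overlap, and the tiny sizes (8, 4, 2) stand in for the 2048/512/128 used later. By default the real HierarchicalNodeParser counts tokens, splits along sentence boundaries, and records parent/child links on each node rather than in a dict.

```python
def build_hierarchy(text, chunk_sizes=(8, 4, 2)):
    """Split text coarse-to-fine into a tree of chunks.

    Sizes are counted in whitespace-separated words and chunks do not overlap;
    both are simplifications for illustration.
    """
    nodes = {}  # node_id -> {"text": str, "parent": node_id or None, "children": [node_id]}

    def split(words, level, parent):
        size = chunk_sizes[level]
        for start in range(0, len(words), size):
            chunk = words[start:start + size]
            node_id = f"n{len(nodes)}"
            nodes[node_id] = {"text": " ".join(chunk), "parent": parent, "children": []}
            if parent is not None:
                nodes[parent]["children"].append(node_id)
            # Recurse into the next, finer chunk size until the leaf level.
            if level + 1 < len(chunk_sizes):
                split(chunk, level + 1, node_id)

    split(text.split(), 0, None)
    return nodes


nodes = build_hierarchy("one two three four five six seven eight nine ten")
roots = [nid for nid, n in nodes.items() if n["parent"] is None]
leaves = [nid for nid, n in nodes.items() if not n["children"]]
print(roots)   # ['n0', 'n7']
print(leaves)  # ['n2', 'n3', 'n5', 'n6', 'n9']
```

Only the leaves are small enough for precise embedding-based retrieval, while the roots preserve the wider context that merging can later recover.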

In [1]:
# !pip install llama-index
# !pip install llama-index-vector-stores-qdrant llama-index-readers-file llama-index-embeddings-fastembed llama-index-llms-openai
# !pip install -U qdrant_client fastembed
# !pip install python-dotenv
# !pip install ragas
# !pip install trulens_eval

In [6]:
import logging
import sys
import os
from dotenv import load_dotenv
from IPython.display import Markdown, display

# Qdrant official client
import qdrant_client

# LlamaIndex dependencies
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.embeddings.fastembed import FastEmbedEmbedding

# Use BAAI/bge-base-en-v1.5 as the embedding model, with FastEmbed running inference
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5")
# Alternative with an explicit maximum input length:
# embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5", max_length=1024)

# Load all environment variables
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
QDRANT_CLOUD_ENDPOINT = os.getenv("QDRANT_CLOUD_ENDPOINT")
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")

# Optional: TruLens evaluation database URL (a local SQLite database is used if unset)
EVAL_DB_URL = os.getenv("EVAL_DB_URL")

Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]

In [7]:
# Load the documents with SimpleDirectoryReader
from llama_index.core import Document

reader = SimpleDirectoryReader("./data/69_markdown_test/", recursive=True)
documents = reader.load_data(show_progress=True)

# Combine all documents into a single Document for later chunking and splitting
documents = Document(text="\n\n".join([doc.text for doc in documents]))

Loading files: 100%|██████████| 1/1 [00:00<00:00, 221.35file/s]


## Setting up Vector Database

We will use Qdrant as the vector database. There are four ways to initialize the Qdrant client:

1. In-memory
```python
client = qdrant_client.QdrantClient(location=":memory:")
```
2. On disk
```python
client = qdrant_client.QdrantClient(path="./data")
```
3. Self-hosted or Docker
```python
client = qdrant_client.QdrantClient(
    # or: url="http://<host>:<port>"
    host="localhost",
    port=6333,
)
```

4. Qdrant Cloud
```python
client = qdrant_client.QdrantClient(
    url=QDRANT_CLOUD_ENDPOINT,
    api_key=QDRANT_API_KEY,
)
```

For this notebook, we use Qdrant Cloud.
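To switch between these four modes without editing code, one option is to build the client's keyword arguments from environment variables. This is a hypothetical helper, not part of qdrant-client. `QDRANT_CLOUD_ENDPOINT` and `QDRANT_API_KEY` are the variables this notebook already reads, while `QDRANT_HOST`, `QDRANT_PORT`, and `QDRANT_PATH` are assumed names for the other modes.

```python
import os


def qdrant_client_kwargs(env=None):
    """Choose QdrantClient keyword arguments from environment variables.

    QDRANT_CLOUD_ENDPOINT and QDRANT_API_KEY match this notebook; QDRANT_HOST,
    QDRANT_PORT and QDRANT_PATH are hypothetical names for the other modes.
    """
    env = os.environ if env is None else env
    if env.get("QDRANT_CLOUD_ENDPOINT"):
        # 4. Qdrant Cloud: remote URL plus API key
        return {"url": env["QDRANT_CLOUD_ENDPOINT"], "api_key": env.get("QDRANT_API_KEY")}
    if env.get("QDRANT_HOST"):
        # 3. Self-hosted or Docker: host and port (6333 as in the example above)
        return {"host": env["QDRANT_HOST"], "port": int(env.get("QDRANT_PORT", "6333"))}
    if env.get("QDRANT_PATH"):
        # 2. Local on-disk storage
        return {"path": env["QDRANT_PATH"]}
    # 1. In-memory fallback for quick experiments
    return {"location": ":memory:"}


print(qdrant_client_kwargs({}))  # {'location': ':memory:'}
```

The result would be unpacked into the constructor, e.g. `qdrant_client.QdrantClient(**qdrant_client_kwargs())`. Checking the cloud endpoint first keeps this notebook's behavior whenever those variables are set.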

In [8]:
# Create a Qdrant client instance
client = qdrant_client.QdrantClient(
    # For fast, lightweight experiments, use in-memory mode instead
    # (no Qdrant deployment needed; requires qdrant-client >= 1.1.1):
    # location=":memory:",
    # Qdrant Cloud (used in this notebook): instance URL plus API key
    url=QDRANT_CLOUD_ENDPOINT,
    api_key=QDRANT_API_KEY,
    # Self-hosted instance via host and port:
    # host="localhost",
    # port=6333,
    # Local on-disk storage:
    # path="./db/",
)

vector_store = QdrantVectorStore(client=client, collection_name="4_Auto_merging_retrieval_RAG")

In [25]:
from llama_index.core.node_parser import HierarchicalNodeParser
from llama_index.core.node_parser import get_leaf_nodes, get_root_nodes


chunk_sizes = [2048, 512, 128]
node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=chunk_sizes)
nodes = node_parser.get_nodes_from_documents([documents])
leaf_nodes = get_leaf_nodes(nodes)
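`get_leaf_nodes` keeps the nodes that have no children, and `get_root_nodes` keeps those that have no parent. A dict-based sketch of the same idea follows, together with a per-level count that can help sanity-check a parse. The node layout and helper names are hypothetical; LlamaIndex stores these links on each node's relationships instead.

```python
from collections import Counter


def level_of(nodes, node_id):
    """Depth of a node: 0 for roots, +1 for each parent above it."""
    level = 0
    parent = nodes[node_id]["parent"]
    while parent is not None:
        level += 1
        parent = nodes[parent]["parent"]
    return level


def summarize_hierarchy(nodes):
    """Return (nodes per level, root ids, leaf ids) for a dict-based hierarchy."""
    per_level = Counter(level_of(nodes, nid) for nid in nodes)
    roots = [nid for nid, n in nodes.items() if n["parent"] is None]
    leaves = [nid for nid, n in nodes.items() if not n["children"]]
    return dict(sorted(per_level.items())), roots, leaves


# Toy three-level hierarchy standing in for the 2048 -> 512 -> 128 chunks.
toy = {
    "R": {"parent": None, "children": ["A", "B"]},
    "A": {"parent": "R", "children": ["a1", "a2"]},
    "B": {"parent": "R", "children": ["b1"]},
    "a1": {"parent": "A", "children": []},
    "a2": {"parent": "A", "children": []},
    "b1": {"parent": "B", "children": []},
}
print(summarize_hierarchy(toy))  # ({0: 1, 1: 2, 2: 3}, ['R'], ['a1', 'a2', 'b1'])
```

In the notebook, only the leaves are embedded into the vector index, while every level goes into the docstore so that parents can be looked up during merging.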

In [26]:
# Define the docstore and insert all nodes (parents and leaves) into it
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core import StorageContext

docstore = SimpleDocumentStore()
docstore.add_documents(nodes)

# Define the storage context; pass the Qdrant vector store explicitly,
# otherwise a default in-memory vector store is used instead
storage_context = StorageContext.from_defaults(docstore=docstore, vector_store=vector_store)

# Only the leaf nodes are embedded and indexed
base_index = VectorStoreIndex(
    leaf_nodes,
    storage_context=storage_context,
)

In [11]:
from llama_index.core.retrievers import AutoMergingRetriever

In [12]:
base_retriever = base_index.as_retriever(similarity_top_k=6)
retriever = AutoMergingRetriever(base_retriever, storage_context, verbose=True)
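`base_retriever` returns the 6 leaf nodes whose embeddings are most similar to the query, and `AutoMergingRetriever` then applies its merging step to those results. The ranking step can be sketched as a brute-force search. Cosine similarity, the 2-d toy vectors, and the helper names are assumptions for illustration; in this notebook Qdrant performs the search server-side.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, leaf_vecs, k=6):
    """Rank leaf embeddings by similarity to the query and keep the best k."""
    scored = [(node_id, cosine(query_vec, vec)) for node_id, vec in leaf_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


leaf_vecs = {"a1": [1.0, 0.0], "a2": [0.0, 1.0], "b1": [1.0, 1.0]}
print([node_id for node_id, _ in top_k([1.0, 0.0], leaf_vecs, k=2)])  # ['a1', 'b1']
```

A larger `similarity_top_k` gives the merging step more siblings to work with, which makes it more likely that a parent's share of retrieved children crosses the threshold.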

In [17]:
# query_str = "What were some lessons learned from red-teaming?"
# query_str = "Can you tell me about the key concepts for safety finetuning"
query_str = (
    "DALVANCE- than comparator-treated subjects with normal baseline transaminase"
)

nodes = retriever.retrieve(query_str)
base_nodes = base_retriever.retrieve(query_str)

## Modify Prompts

In [7]:
qa_prompt_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)

refine_prompt_str = (
    "We have the opportunity to refine the original answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better "
    "answer the question: {query_str}. "
    "If the context isn't useful, output the original answer again.\n"
    "Original Answer: {existing_answer}"
)

from llama_index.core import ChatPromptTemplate

# Text QA prompt
chat_text_qa_msgs = [
    (
        "system",
        "You are an AI assistant who is well versed in medical information "
        "and only answers questions pertaining to the medical domain.",
    ),
    ("user", qa_prompt_str),
]
text_qa_template = ChatPromptTemplate.from_messages(chat_text_qa_msgs)

# Refine prompt
chat_refine_msgs = [
    ("system", "Always answer the question, even if the context isn't helpful."),
    ("user", refine_prompt_str),
]
refine_template = ChatPromptTemplate.from_messages(chat_refine_msgs)
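The two templates play different roles during synthesis. `text_qa_template` produces a first answer from retrieved context, and `refine_template` revises that answer as more context arrives. The sketch below shows a simplified refine loop that handles one chunk at a time and fills the same placeholders with `str.format`. `refine_synthesize`, `fake_llm`, and the abbreviated prompts are hypothetical, and LlamaIndex's response synthesizers may pack several chunks into one prompt before refining.

```python
def refine_synthesize(chunks, query, llm, qa_prompt, refine_prompt):
    """Answer from the first chunk, then refine the answer with each later chunk."""
    answer = None
    for chunk in chunks:
        if answer is None:
            prompt = qa_prompt.format(context_str=chunk, query_str=query)
        else:
            prompt = refine_prompt.format(
                context_msg=chunk, query_str=query, existing_answer=answer
            )
        answer = llm(prompt)
    return answer


# Abbreviated versions of qa_prompt_str / refine_prompt_str with the same placeholders.
QA = "Context:\n{context_str}\nAnswer the question: {query_str}\n"
REFINE = (
    "New context:\n{context_msg}\nRefine the answer to: {query_str}\n"
    "Original Answer: {existing_answer}"
)

calls = []


def fake_llm(prompt):
    """Hypothetical stand-in for a real LLM: records the prompt and returns a label."""
    calls.append(prompt)
    return f"answer {len(calls)}"


final = refine_synthesize(["chunk A", "chunk B"], "What is X?", fake_llm, QA, REFINE)
print(final)  # answer 2
```

The second prompt carries the first answer in its `existing_answer` slot, which is why the refine system message tells the model to always answer even when the new context isn't helpful.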

### Final RAG application

In [27]:
from llama_index.core.query_engine import RetrieverQueryEngine

In [28]:
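# Pass the custom prompts defined above so they are actually used
query_engine = RetrieverQueryEngine.from_args(
    retriever, text_qa_template=text_qa_template, refine_template=refine_template
)
base_query_engine = RetrieverQueryEngine.from_args(
    base_retriever, text_qa_template=text_qa_template, refine_template=refine_template
)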

In [30]:
response = query_engine.query("How does one get asthma")
display(Markdown(str(response)))

Bronchospasm, a condition characterized by the constriction of the muscles in the airways, can be a symptom of asthma.

## Performing Evaluation using RAGAS

### Creating Synthetic Test Set

In [None]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Generator and critic LLMs plus embeddings, all using OpenAI models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# Change resulting question type distribution
distributions = {
    simple: 0.5,
    multi_context: 0.4,
    reasoning: 0.1
}

# Use generator.generate_with_llamaindex_docs because the documents were loaded with LlamaIndex.
# `documents` is the single combined Document created in the document-loading cell above.
testset = generator.generate_with_llamaindex_docs([documents], 10, distributions)
testset = testset.to_pandas()
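The `distributions` weights should sum to 1. With a test size of 10 they correspond to roughly 5 simple, 4 multi-context, and 1 reasoning question. The arithmetic is sketched below with largest-remainder rounding. `allocate` is a hypothetical helper, the string keys stand in for the Ragas evolution objects, and Ragas's own sampling may not produce exactly these counts.

```python
import math


def allocate(test_size, distributions):
    """Split test_size questions across types in proportion to their weights.

    Uses largest-remainder rounding so the counts always sum to test_size.
    """
    if not math.isclose(sum(distributions.values()), 1.0):
        raise ValueError("distribution weights should sum to 1")
    raw = {name: test_size * weight for name, weight in distributions.items()}
    counts = {name: math.floor(value) for name, value in raw.items()}
    leftover = test_size - sum(counts.values())
    # Give the remaining questions to the types with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda name: raw[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


weights = {"simple": 0.5, "multi_context": 0.4, "reasoning": 0.1}
print(allocate(10, weights))  # {'simple': 5, 'multi_context': 4, 'reasoning': 1}
```

With only 10 questions, the 0.1 weight yields a single reasoning question, so a larger test size may be needed before per-type scores become meaningful.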

In [32]:
# Save the test set so it can be reused with other RAG techniques:
# testset.to_csv('eval_data.csv', index=False)
# Subsequent runs load the saved test set instead of regenerating it
import pandas as pd
testset = pd.read_csv('./eval_data.csv')

## Run Evaluation using TruLens

In [None]:
from trulens_eval import Tru


# EVAL_DB_URL, if set, should be a PostgreSQL connection URL
if EVAL_DB_URL:
    print("Connecting to postgres database")
    tru = Tru(database_url=EVAL_DB_URL)
else:
    print("Connecting to local sqlite database")
    tru = Tru()

# In case you want to reset the database:
# tru.reset_database()

In [34]:
from trulens_eval.feedback.provider import OpenAI
from trulens_eval import Feedback
import numpy as np

# Initialize provider class
provider = OpenAI()

# Select the context to be used in feedback. The location of the context is app-specific.
from trulens_eval.app import App
context = App.select_context(query_engine)

# Define a groundedness feedback function
f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons)
    .on(context.collect()) # collect context chunks into a list
    .on_output()
)

# Question/answer relevance between overall question and answer.
f_answer_relevance = (
    Feedback(provider.relevance)
    .on_input_output()
)
# Question/statement relevance between question and each context chunk.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

✅ In groundedness_measure_with_cot_reasons, input source will be set to __record__.app.query.rets.source_nodes[:].node.text.collect() .
✅ In groundedness_measure_with_cot_reasons, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In context_relevance_with_cot_reasons, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In context_relevance_with_cot_reasons, input context will be set to __record__.app.query.rets.source_nodes[:].node.text .
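`f_context_relevance` scores each retrieved chunk separately, and `.aggregate(np.mean)` collapses those scores into one value per record. The same reduction is sketched below with `statistics.mean`. The scores are made-up placeholders, and passing `min` instead shows a stricter alternative within this sketch that penalizes any irrelevant chunk.

```python
from statistics import mean


def aggregate_scores(chunk_scores, agg=mean):
    """Collapse per-chunk scores for one record into a single score."""
    if not chunk_scores:
        return None  # no retrieved context to score
    return agg(chunk_scores)


# Hypothetical per-chunk context-relevance scores for one query (0 = irrelevant, 1 = relevant).
scores = [1.0, 0.5, 0.0]
print(aggregate_scores(scores))       # 0.5
print(aggregate_scores(scores, min))  # 0.0
```

Averaging rewards retrievals where most chunks are on topic, whereas a minimum would flag any record that pulled in even one irrelevant chunk.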


In [35]:
from trulens_eval import TruLlama

tru_query_engine_recorder = TruLlama(
    query_engine,
    app_id="4_Auto_merging_retrieval_RAG",
    feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
)

In [None]:
eval_questions = testset['question'].to_list()

with tru_query_engine_recorder as recording:
    for question in eval_questions:
        response = query_engine.query(question)

In [37]:
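# An empty app_ids list returns records for every app in the database;
# restrict the results to this notebook's app
records, feedback = tru.get_records_and_feedback(app_ids=["4_Auto_merging_retrieval_RAG"])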

In [None]:
records.head()

In [39]:
tru.run_dashboard()

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.



Dashboard started at http://192.168.1.5:8501 .


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

In [40]:
tru.stop_dashboard()