# How do we get +15% RAG hit_rate improvement for question answering on documentation?

- On most scenarios OpenAI's Ada model paired with a naive similarity search can produce satisfactory results.
- and the primary factors to consider when implementing RAGs in production settings are accuracy (recall), cost, and latency.
- For higher accuracy or recall during searches, one might need to employ advanced retrieval techniques. These methods might involve varying data chunk sizes, rewriting queries multiple times, and more, potentially increasing latency and costs.
- Activeloop's deep memory addresses the LLM's pipeline issues and introuduce a tiny neural network layer to trained to match user queries. This can increase accuracy upto 27% and remains cost effective.

## Load env variables

In [4]:
import os
from dotenv import load_dotenv

load_dotenv()

True

## Create dataset

In [1]:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def get_all_links(url):
    response = requests.get(url)
    if response.status_code != 200:
        print(f"Failed to retrieve the page: {url}")
        return []

    soup = BeautifulSoup(response.content, "html.parser")

    # Finding all 'a' tags which typically contain href attribute for links
    links = [
        urljoin(url, a["href"])
        for a in soup.find_all("a", href=True)
        if a["href"]
    ]

    return links

In [4]:
from langchain.document_loaders import AsyncHtmlLoader
from langchain.document_transformers import Html2TextTransformer
from llama_index.core import Document


def load_documents(url):
    all_links = get_all_links(url)
    loader = AsyncHtmlLoader(all_links)
    docs = loader.load()

    html2text = Html2TextTransformer()
    docs_transformed = html2text.transform_documents(docs)
    docs = [Document.from_langchain_format(doc) for doc in docs_transformed]
    return docs

docs = load_documents("https://docs.deeplake.ai/en/latest/")

Fetching pages:  94%|#########4| 116/123 [00:15<00:01,  6.69it/s]Failed to decode content from https://docs.deeplake.ai/_/downloads/en/latest/pdf/
Failed to decode content from https://docs.deeplake.ai/_/downloads/en/latest/epub/
Fetching pages: 100%|##########| 123/123 [00:19<00:00,  6.26it/s]


In [6]:
print(docs[0].text)

latest

Getting Started

  * Installation

Key Concepts

  * Datasets
  * Vector Store
  * Tensors
  * Htypes
  * Compressions
  * PyTorch and Tensorflow Support
  * Utility Functions

Integrations

  * Weights and Biases
  * MMDetection

High-Performance Features

  * Dataloader
  * Sampler
  * Tensor Query Language
  * Random Split
  * Deep Memory

API Reference

  * deeplake
  * deeplake.VectorStore
  * deeplake.core
  * deeplake.core.dataset
  * deeplake.core.tensor
  * deeplake.api
  * deeplake.auto
  * deeplake.util
  * deeplake.client.log
  * deeplake.core.transform
  * deeplake.core.vectorstore.deep_memory
  * deeplake.random.seed

__Deep Lake

  * »
  * Deep Lake API Reference
  * Edit on GitHub

* * *

# Deep Lake API Reference

Deep Lake is an open-source database for AI.

Getting Started

  * Installation

Key Concepts

  * Datasets
    * Creating Datasets
    * Loading Datasets
    * Deleting and Renaming Datasets
    * Copying Datasets
    * Dataset Operations
    * Data

In [7]:
len(docs)

123

In [5]:
from llama_index.core.evaluation import generate_question_context_pairs
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
)
from llama_index.vector_stores.deeplake import DeepLakeVectorStore
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.llms.openai import OpenAI

token = os.getenv("ACTIVELOOP_TOKEN")
username = "akshatunsubscribe"

vector_store = DeepLakeVectorStore(
    dataset_path= f"hub://{username}/deeplake_docs_deepmemory2",
    overwrite=False,  # set to True to overwrite the existing dataset
    runtime={"tensor_db": True},
    token=token,
)

Deep Lake Dataset in hub://akshatunsubscribe/deeplake_docs_deepmemory2 already exists, loading from the storage


In [7]:
def create_modules(vector_store, docs=[], populate_vector_store=True):
    if populate_vector_store:
        node_parser = SimpleNodeParser.from_defaults(chunk_size=512)
        nodes = node_parser.get_nodes_from_documents(docs)
    else:
        nodes = []

    # by default, the node ids are set to random uuids. To ensure same id's per run, we manually set them.
    for idx, node in enumerate(nodes):
        node.id_ = f"node_{idx}"

    llm = OpenAI(model="gpt-3.5-turbo")
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    return storage_context, nodes, llm

In [9]:
(
    storage_context,
    nodes,
    llm,
) = create_modules(
    vector_store=vector_store,
    docs=docs,
    # populate_vector_store=False, # uncomment this line to skip populating the vector store
)

In [10]:
vector_index = VectorStoreIndex(nodes, storage_context= storage_context)
deep_memory_retriever = vector_index.as_retriever(
    similarity_top_k= 4,
    deep_memory = True
)

## Training deep memory

To train the deep memory model we need to create the following things:

- **question**: is a text of strings, where each string represents a query.
- **relevance**: Contains links to the ground truth for each question. There might be several docs that contain an answer to the given question. Because of this, relevance is `List[List[tuple[str, float]]]`. All the list present in the first hierarchy contains their corresponding query data. The 2nd list (list of tuples) contains a str, float pair where the string represents the id of the source doc while the float contains how much the current document is related to the question.

In [11]:
NO_OF_SAMPLES = 600
TRAIN_QA_DATASET_PATH = f"./data/deeplake_docs_{NO_OF_SAMPLES}_train.json"
TEST_QA_DATASET_PATH = f"./data/deeplake_docs_{NO_OF_SAMPLES}_train.json"

In [12]:
from llama_index.core.evaluation import (
    generate_question_context_pairs,
    EmbeddingQAFinetuneDataset
)
import random

def create_train_test_datasets(
        num_of_samples= 600,
        llm= None,
        nodes= None,
        save= False
):
    random_indices = random.sample(range(len(nodes)), num_of_samples)
    
    # ratio of train=80% and test=20%
    ratio = int(len(random_indices) * 0.8)

    # random indices for train/test
    train_indices, test_indices = random_indices[:ratio], random_indices[ratio:]

    # sample random nodes for train/test
    train_nodes= [nodes[i] for i in train_indices]
    test_nodes= [nodes[i] for i in test_indices]

    # generate train question
    train_qa_dataset = generate_question_context_pairs(
        train_nodes,
        llm= llm,
        num_questions_per_chunk=1
    )

    # generate test question
    test_qa_dataset = generate_question_context_pairs(
        test_nodes,
        llm= llm,
        num_questions_per_chunk=1
    )

    if save:
        train_qa_dataset.save_json(TRAIN_QA_DATASET_PATH)
        test_qa_dataset.save_json(TEST_QA_DATASET_PATH)        

    return train_qa_dataset, test_qa_dataset

In [13]:
if not os.path.exists(TRAIN_QA_DATASET_PATH) or not os.path.exists(TEST_QA_DATASET_PATH):
    train_qa_dataset, test_qa_dataset = create_train_test_datasets(nodes= nodes, llm= llm, save=True)
else:
    train_qa_dataset = EmbeddingQAFinetuneDataset.from_json(TRAIN_QA_DATASET_PATH)
    test_qa_dataset  = EmbeddingQAFinetuneDataset.from_json(TEST_QA_DATASET_PATH)  

In [14]:
def create_query_relevance(qa_dataset: EmbeddingQAFinetuneDataset):
    """Function for converting llama-index dataset to correct format for deep memory training"""
    
    # extract the queries from the dataset
    queries = [text for _, text in qa_dataset.queries.items()]

    # extract the relevant docs from the dataset
    relevance = [ [(qa_dataset.relevant_docs[doc][0], 1)] for doc in qa_dataset.relevant_docs]

    return queries, relevance

In [15]:
train_queries, train_relevance = create_query_relevance(train_qa_dataset)
test_queries, test_relevance = create_query_relevance(test_qa_dataset)

In [9]:
print(train_queries[:3])
print("=" * 40)
print(train_relevance[:3])


['How does the `create_shape_tensor` parameter impact the reading of sample shapes in a dataset?', 'What are the exceptions to the rule that no data is loaded until a sample is read from a dataset?', 'How can credentials be added to a dataset for authentication purposes?']
[[('node_683', 1)], [('node_683', 1)], [('node_683', 1)]]


In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

job_id= vector_store._vectorstore.deep_memory.train(
    queries= train_queries,
    relevance= train_relevance,
    embedding_function= embeddings.embed_documents
)

In [16]:
job_id = "66014b4db56bbac8b7a0161a"
vector_store._vectorstore.deep_memory.status(job_id)

This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/akshatunsubscribe/deeplake_docs_deepmemory2
--------------------------------------------------------------
|                  66014b4db56bbac8b7a0161a                  |
--------------------------------------------------------------
| status                     | completed                     |
--------------------------------------------------------------
| progress                   | eta: 78.1 seconds             |
|                            | recall@10: 64.5% (+30.6%)     |
--------------------------------------------------------------
| results                    | recall@10: 64.5% (+30.6%)     |
--------------------------------------------------------------




# Evaluation

## Recall eval

In [16]:
from langchain.embeddings.openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

recalls = vector_store._vectorstore.deep_memory.evaluate(
    queries= test_queries,
    relevance= test_relevance,
    embedding_function= embeddings.embed_documents,
)

  warn_deprecated(


Embedding queries took 33.28 seconds
---- Evaluating without Deep Memory ---- 
Recall@1:	  7.5%
Recall@3:	  17.7%
Recall@5:	  24.3%
Recall@10:	  34.1%
Recall@50:	  69.6%
Recall@100:	  83.2%
---- Evaluating with Deep Memory ---- 
Recall@1:	  44.8%
Recall@3:	  77.2%
Recall@5:	  83.7%
Recall@10:	  91.2%
Recall@50:	  98.9%
Recall@100:	  99.5%


Prv results with same details

```Embedding queries took 5.03 seconds
---- Evaluating without Deep Memory ---- 
Recall@1:	  4.9%
Recall@3:	  14.5%
Recall@5:	  20.7%
Recall@10:	  32.2%
Recall@50:	  70.5%
Recall@100:	  83.7%
---- Evaluating with Deep Memory ---- 
Recall@1:	  6.4%
Recall@3:	  21.8%
Recall@5:	  28.7%
Recall@10:	  46.3%
Recall@50:	  90.7%
Recall@100:	  95.1%```

**Observation**:

- The above results show that how deep memory significantly increases the recall.
- The resutls may be different for different LLM's. 
- Here we are using `gpt-3.5-turbo` and `gpt-4` can be used to increase the recall. 

## MRR & Hit Rate

In [17]:
import pandas as pd

def display_results(eval_results):
    """Display results from evaluate"""

    hit_rates = []
    mrrs = []
    names = []

    for name, eval_result in eval_results.items():

        metric_dicts = [er.metric_vals_dict for er in eval_result]

        full_df = pd.DataFrame(metric_dicts)

        hit_rate = full_df['hit_rate'].mean()
        mrr = full_df['mrr'].mean()

        hit_rates.append(hit_rate)
        mrrs.append(mrr)
        names.append(name)

    metric_df = pd.DataFrame(
        [
            {
                "retrievers": names[i],
                "hit_rate": hit_rates[i],
                "mrr": mrrs[i] 
            } for i in range(len(names))
        ]
    )

    return metric_df

### Deep memory retriever eval

In [None]:
from llama_index.core.evaluation import RetrieverEvaluator

deep_memory_retriver = vector_index.as_retriever(
    similarity_top_k= 10,
    vector_store_kwargs={"deep_memory" : True}
)

dm_retriver_evaluator= RetrieverEvaluator.from_metric_names(["mrr", "hit_rate"], retriver= deep_memory_retriver)
dm_eval_results= await dm_retriver_evaluator.aevaluate_dataset(test_qa_dataset, retriver= dm_retriver_evaluator, show_progress=True, workers=8) 

AttributeError: 'VectorStoreIndex' object has no attribute 'as_retriver'

### Naive retriever eval

In [21]:
from llama_index.core.evaluation import RetrieverEvaluator

naive_retriever= vector_index.as_retriever(similarity_top_k= 10)
naive_retriever_evaluator= RetrieverEvaluator.from_metric_names(['mrr', 'hit_rate'], retriever= naive_retriever)

naive_eval_results= await naive_retriever_evaluator.aevaluate_dataset(test_qa_dataset, retriever= naive_retriever, show_progress=True, workers=8) 

  0%|          | 0/2639 [00:00<?, ?it/s]

 10%|█         | 274/2639 [30:43<3:56:31,  6.00s/it]

CancelledError: 

In [None]:
eval_results = {
    f"{mode} with Deep Memory top-10 eval": eval_result
    for mode, eval_result in zip(
        ["with", "without"], [dm_eval_results, naive_eval_results]
    )
}

display_results(eval_results)


# Inference

In [None]:
query_engine= vector_index.as_query_engine(
    vector_store_kwargs= {"deep_memory": True}, llm= llm
)

response = query_engine.query("How can you connect your own storage to the deeplake?")
print(response)

In [None]:
query_engine = vector_index.as_query_engine(
    vector_store_kwargs={"deep_memory" : False}, llm= llm
)

response = query_engine.query("How can you connect your own storage to the deeplake?")
print(response)