<a href="https://colab.research.google.com/github/brianMutea/Fine-tuning-RAG-with-DeepMemory/blob/main/Fine_tuning_vs_RAG_Activeloop_Deep_Memory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tuning vs RAG: Activeloop Deep Memory

In [1]:
%%capture
!pip install -q llama-index==0.9.14.post3 openai==1.3.8 deeplake tiktoken

In [30]:
import os

os.environ["OPENAI_API_KEY"] = "your_openai_api_key"
os.environ["ACTIVELOOP_TOKEN"] = "your_activeloop_token"

#### Download sample data

In [3]:
!mkdir -p './data/paul_graham/'

!curl 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -o 'data/paul_graham/paul_graham_essay.txt'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 75042  100 75042    0     0   331k      0 --:--:-- --:--:-- --:--:--  331k


In [4]:
from llama_index.node_parser import SimpleNodeParser
from llama_index import SimpleDirectoryReader
from llama_index import VectorStoreIndex, ServiceContext, StorageContext
from llama_index.vector_stores import DeepLakeVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms import OpenAI
import deeplake

#### Create LlamaIndex nodes/chunks

In [5]:
docs = SimpleDirectoryReader("./data/paul_graham").load_data()
node_parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=20)
nodes = node_parser.get_nodes_from_documents(docs)


# By default, node/chunk IDs are random UUIDs.
# Set them manually so the IDs are the same on every run.
for idx, node in enumerate(nodes):
    node.id_ = f"node_{idx}"

print(f"Number of Documents: {len(docs)}")
print(f"Number of nodes: {len(nodes)} with the current chunk size of {node_parser.chunk_size}")

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Number of Documents: 1
Number of nodes: 37 with the current chunk size of 512
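
To illustrate what `chunk_size` and `chunk_overlap` control, here is a minimal sliding-window sketch over a list of tokens. It is a simplification: the helper `chunk_tokens` is hypothetical and not part of LlamaIndex, and the real parser also tries to respect sentence boundaries, so its chunks will not match this output exactly.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window reaches the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Toy input: integers stand in for tokens
tokens = list(range(10))
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
print(chunks)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Each chunk repeats the last token of the previous one, which is the role the 20-token overlap plays in the cell above.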


### Create a local Deep Lake vector store

In [6]:
%%capture
!pip install --upgrade deeplake

In [7]:
# Create a local DeepLakeVectorStore to store the vectors

dataset_path = "./data/paul_graham/deep_lake_db" # for local DeepLakeVectorStore storage
vector_store = DeepLakeVectorStore(dataset_path = dataset_path, ingestion_batch_size=1024, overwrite=True)

# define the LLM
llm = OpenAI(model="gpt-3.5-turbo-1106")
embed_model = OpenAIEmbedding() # embeddings


service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

vector_index = VectorStoreIndex(nodes,
                                service_context=service_context,
                                storage_context=storage_context,
                                show_progress=True)




Generating embeddings:   0%|          | 0/37 [00:00<?, ?it/s]

Uploading data to deeplake dataset.


100%|██████████| 37/37 [00:00<00:00, 189.14it/s]


Dataset(path='./data/paul_graham/deep_lake_db', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype      shape      dtype  compression
  -------    -------    -------    -------  ------- 
   text       text      (37, 1)      str     None   
 metadata     json      (37, 1)      str     None   
 embedding  embedding  (37, 1536)  float32   None   
    id        text      (37, 1)      str     None   




### Upload the local vector store to Activeloop's platform and convert it into a managed database

In [8]:
local = dataset_path
hub_path = "hub://academiaarticles/optimize_RAG_paul_graham"
hub_managed_path = "hub://academiaarticles/optimize_RAG_paul_graham_managed"

# upload the local vector store
deeplake.deepcopy(local, hub_path, overwrite=True)

# create a managed vector store with different name
ds = deeplake.deepcopy(hub_path,
                       hub_managed_path,
                       overwrite=True,
                       runtime={"tensor_db": True})

Copying dataset: 96%|█████████▋| 27/28 [00:31<00:01


This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/academiaarticles/optimize_RAG_paul_graham
Your Deep Lake dataset has been successfully created!


Copying dataset: 96%|█████████▋| 27/28 [00:41<00:01


This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/academiaarticles/optimize_RAG_paul_graham_managed
Your Deep Lake dataset has been successfully created!


In [9]:
ds.visualize()

HINT: Please forward the port - 47751 to your local machine, if you are running on the cloud.
 * Serving Flask app 'dataset_visualizer'
 * Debug mode: off


Instantiate a Vector Store with the managed dataset that we just created

In [20]:
db = DeepLakeVectorStore(dataset_path=hub_managed_path,
                         overwrite=False,
                         read_only=True,
                         runtime={"tensor_db": True})

Deep Lake Dataset in hub://academiaarticles/optimize_RAG_paul_graham_managed already exists, loading from the storage


Generate a dataset of Queries and Documents

Fetch `docs` and `ids` from the vector store

In [21]:
# Fetch dataset docs and ids
docs = db.vectorstore.dataset.text.data(fetch_chunks=True, aslist=True)['value']
ids = db.vectorstore.dataset.id.data(fetch_chunks=True, aslist=True)['value']
print(len(docs))

37


### Generating a synthetic training dataset.
Training a Deep Memory model requires labeled data: pairs of a `query` and the `document_id` of the chunk that answers it. Such labels can be hard to obtain when starting from scratch, so we generate the queries/questions from our existing documents using gpt-3.5-turbo.
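
As a sketch of the label format used in this notebook, each query gets a list of `(document_id, score)` tuples, with a score of 1 marking a relevant chunk. The queries and chunk ids below are made up for illustration, and `check_labels` is a hypothetical helper, not part of Deep Lake.

```python
# Hypothetical queries and chunk ids, for illustration only
queries = [
    "What did the author work on before college?",
    "Why did the author leave graduate school?",
]
relevance = [
    [("node_0", 1)],  # query 0 is answered by chunk node_0
    [("node_5", 1)],  # query 1 is answered by chunk node_5
]


def check_labels(queries, relevance):
    """Return True if every query has a list of (doc_id, score) pairs."""
    if len(queries) != len(relevance):
        return False
    for pairs in relevance:
        for pair in pairs:
            if not (isinstance(pair, tuple) and len(pair) == 2):
                return False
            doc_id, score = pair
            if not isinstance(doc_id, str) or not isinstance(score, (int, float)):
                return False
    return True


print(check_labels(queries, relevance))  # True
```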

In [22]:
from openai import OpenAI
client = OpenAI()

def generate_question(text):
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo-1106",
            messages=[
                {"role": "system", "content": "You are a world class expert for generating questions based on provided context. \
                        You make sure the question can be answered by the text."},
                {
                    "role": "user",
                    "content": text,
                },
            ],
        )
        return response.choices[0].message.content
    except Exception:
        # Return a sentinel so the caller can skip this attempt and retry
        question_string = "No question generated"
        return question_string

In [23]:
import random
from tqdm import tqdm

def generate_queries(docs: list[str], ids: list[str], n: int):

    questions = []
    relevances = []
    pbar = tqdm(total=n)
    while len(questions) < n:
        # 1. randomly draw a piece of text and relevance id
        r = random.randint(0, len(docs)-1)
        text, label = docs[r], ids[r]

        # 2. generate a question and label it with the id of the chunk it came from
        generated_qs = [generate_question(text)]
        if generated_qs == ["No question generated"]:
            print("No question generated")
            continue

        questions.extend(generated_qs)
        relevances.extend([[(label, 1)] for _ in generated_qs])
        pbar.update(len(generated_qs))

    return questions[:n], relevances[:n]

Launch the query generation process with a desired size of 40 queries/questions

In [24]:
questions, relevances = generate_queries(docs, ids, n=40)
print(len(questions)) #40
print(questions[0])

100%|██████████| 40/40 [00:48<00:00,  1.21s/it]

40
What was the original idea for the startup, that the author and his colleague later realized was not a good one?





### Launch Deep Memory Training

In [15]:
%%capture
!pip install -U langchain langchain-openai

In [25]:
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

job_id = db.vectorstore.deep_memory.train(
    queries=questions,
    relevance=relevances,
    embedding_function=embeddings.embed_documents,
)

Starting DeepMemory training job
Your Deep Lake dataset has been successfully created!




Preparing training data for deepmemory:


Creating 40 embeddings in 1 batches of size 40:: 100%|██████████| 1/1 [00:04<00:00,  4.62s/it]


DeepMemory training job started. Job ID: 65c5c1a6b014de51622ae8e6


In [27]:
# During training you can check the status of the training run
db.vectorstore.deep_memory.status(job_id="65c5c1a6b014de51622ae8e6")

This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/academiaarticles/optimize_RAG_paul_graham_managed
--------------------------------------------------------------
|                  65c5c1a6b014de51622ae8e6                  |
--------------------------------------------------------------
| status                     | completed                     |
--------------------------------------------------------------
| progress                   | eta: 0.1 seconds              |
|                            | recall@10: 87.50% (+12.50%)   |
--------------------------------------------------------------
| results                    | recall@10: 87.50% (+12.50%)   |
--------------------------------------------------------------
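
The `recall@10` value in the status table can be read as the share of queries whose labeled chunk appears among the top 10 retrieved results. The sketch below shows that calculation on toy data; it assumes one relevant chunk per query (as in this notebook's labels), and `recall_at_k` is a hypothetical helper rather than the function Deep Lake uses internally.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of queries whose relevant chunk is in the top-k results."""
    hits = 0
    for ranking, relevant in zip(ranked_ids, relevant_ids):
        if relevant in ranking[:k]:
            hits += 1
    return hits / len(relevant_ids)


# Toy rankings: one list of retrieved chunk ids per query, best match first
ranked = [
    ["node_3", "node_1", "node_7"],
    ["node_2", "node_4", "node_0"],
    ["node_9", "node_8", "node_5"],
    ["node_6", "node_2", "node_1"],
]
# The single labeled chunk for each query
relevant = ["node_3", "node_0", "node_4", "node_2"]

print(recall_at_k(ranked, relevant, k=1))  # 0.25
print(recall_at_k(ranked, relevant, k=3))  # 0.75
```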




### Run a Deep Memory-enabled inference by setting `deep_memory=True`.

In [28]:
from llama_index.llms import OpenAI
query = "What are the main things Paul worked on before college?"

llm = OpenAI(model="gpt-3.5-turbo-1106")
embed_model = OpenAIEmbedding()

service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)
storage_context = StorageContext.from_defaults(vector_store=db)  # use the managed store, not the local one

vector_index = VectorStoreIndex.from_vector_store(db,
                                                  service_context=service_context,
                                                  storage_context=storage_context,
                                                  show_progress=True)

# The retriever forwards `vector_store_kwargs` to the vector store's query call
query_engine = vector_index.as_query_engine(similarity_top_k=3, vector_store_kwargs={"deep_memory": True})

response_vector = query_engine.query(query)
print(response_vector.response)

Paul worked on writing and programming before college. He wrote short stories and also tried writing programs on the IBM 1401 using an early version of Fortran.


In [29]:
# Generate validation queries
validation_questions, validation_relevances = generate_queries(docs, ids, n=40)

# Launch the evaluation function
recalls = db.vectorstore.deep_memory.evaluate(
    queries=validation_questions,
    relevance=validation_relevances,
    embedding_function=embeddings.embed_documents
)

100%|██████████| 40/40 [00:43<00:00,  1.08s/it]


Embedding queries took 1.83 seconds
---- Evaluating without Deep Memory ---- 
Recall@1:	  60.0%
Recall@3:	  92.5%
Recall@5:	  95.0%
Recall@10:	  100.0%
Recall@50:	  100.0%
Recall@100:	  100.0%
---- Evaluating with Deep Memory ---- 
Recall@1:	  42.5%
Recall@3:	  60.0%
Recall@5:	  75.0%
Recall@10:	  90.0%
Recall@50:	  100.0%
Recall@100:	  100.0%
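
To make the comparison between the two evaluation runs explicit, the printed values can be subtracted directly. The numbers below are copied from the output above; the dictionary names are only for illustration.

```python
# Recall values (percent) copied from the evaluation output above
without_dm = {1: 60.0, 3: 92.5, 5: 95.0, 10: 100.0, 50: 100.0, 100: 100.0}
with_dm = {1: 42.5, 3: 60.0, 5: 75.0, 10: 90.0, 50: 100.0, 100: 100.0}

# Positive values would mean Deep Memory improved recall at that k
deltas = {k: with_dm[k] - without_dm[k] for k in without_dm}

for k, delta in deltas.items():
    print(f"Recall@{k}: {delta:+.1f} points")
```

In this run the difference is negative for k up to 10 and zero at k = 50 and k = 100.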


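This notebook integrated Activeloop's Deep Memory into a RAG pipeline with the aim of improving embedding retrieval accuracy. Deep Memory is intended to outperform traditional retrieval methods such as lexical search with BM25 and vector search with cosine similarity. The training job reported a recall@10 of 87.50% (+12.50%). On the 40 freshly generated validation queries, however, recall with Deep Memory was lower than without it for k ≤ 10 (for example, Recall@1 of 42.5% versus 60.0%), so this run did not reproduce the improvement. With only 37 chunks and 40 synthetic training queries, results may vary considerably between runs. Deep Memory also aims to reduce token usage in LLM prompts compared with query reformulation or transformation.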
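
For reference, the baseline that Deep Memory is compared against, vector search with cosine similarity, can be sketched in a few lines. The two-dimensional vectors and the helper names `cosine_similarity` and `top_k` are assumptions for illustration; the notebook's real embeddings have 1536 dimensions and are searched by Deep Lake.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


# Toy embeddings keyed by chunk id
doc_vecs = {
    "node_0": [1.0, 0.0],
    "node_1": [0.0, 1.0],
    "node_2": [1.0, 1.0],
}
query_vec = [2.0, 0.1]

print(top_k(query_vec, doc_vecs, k=2))  # ['node_0', 'node_2']
```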