# Using ColBERT in-memory: Index-Free Encodings & Search

Sometimes, building an index doesn't make sense. Maybe you're working with a really small dataset, or one that is fleeting in nature and will only be relevant for the lifetime of your current instance. In these cases, it can be more efficient to skip all the time-consuming index optimisation and keep your encodings in-memory, performing ColBERT's magical MaxSim on-the-fly. This doesn't scale very well, but can be very useful in certain settings.
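If you're curious what MaxSim boils down to, here is a minimal pure-Python sketch of the scoring idea: for each query token, keep its best dot-product match among the document's tokens, then sum those maxima. The function names (`dot`, `maxsim_score`, `rank_documents`) and the toy 2-dimensional vectors are made up for illustration; the real implementation works on batched, padded tensors with masks and is not shown here.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """For each query token, take its best match in the document, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rank_documents(
    query_tokens: List[Vector], docs: List[List[Vector]], k: int = 3
) -> List[Dict]:
    """Score every document against the query and return the top-k results."""
    scored = [(maxsim_score(query_tokens, d), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"result_index": i, "score": s, "rank": r}
        for r, (s, i) in enumerate(scored[:k])
    ]


# Toy token embeddings (hypothetical values, one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # matches only the first query token

top = rank_documents(query, [doc_b, doc_a], k=1)
```

Because every query is scored against every stored document, the cost grows linearly with the number of encoded documents, which is why this approach suits small or short-lived collections.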

In this quick example, we'll use the `RAGPretrainedModel` magic class to demonstrate how to **encode documents in-memory**, before **retrieving them with `search_encoded_docs`**.

First, as usual, let's load up a pre-trained ColBERT model:

In [1]:
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

[Jan 27, 16:56:39] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...




Now that our model is loaded, we can load and preprocess some data, as in the previous tutorials:

In [2]:
from ragatouille.utils import get_wikipedia_page
from ragatouille.data import CorpusProcessor

corpus_processor = CorpusProcessor()

documents = [get_wikipedia_page("Hayao Miyazaki"), get_wikipedia_page("Studio Ghibli"), get_wikipedia_page("Princess Mononoke"), get_wikipedia_page("Shrek")]
documents = corpus_processor.process_corpus(documents, chunk_size=200)

Our documents are now fully ready to be encoded! 

One important note: `encode()` itself will not split your documents, so you must pre-process them yourself (using `CorpusProcessor` or your preferred chunking approach). However, `encode()` will dynamically set the maximum token length, calculated based on the token length distribution in your corpus, up to the maximum length supported by the model you're using.
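If you'd rather bring your own chunking, something as simple as the sketch below could work as a starting point. It is a hypothetical helper (`chunk_by_words` is not part of RAGatouille) that splits on whitespace and counts words rather than model tokens, so it does not reproduce what `CorpusProcessor` does; treat the sizes as rough.

```python
from typing import List


def chunk_by_words(text: str, chunk_size: int = 200, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most `chunk_size` whitespace-separated words.

    `overlap` words are repeated at the start of each following chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


chunks = chunk_by_words("a b c d e", chunk_size=2)
```

The resulting list of strings can then be passed to `encode()` in place of the `CorpusProcessor` output used in this example.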

Just like normal indexing, `encode()` also supports adding metadata to the encoded documents, which will be returned as part of query results:

In [3]:
RAG.encode([x['content'] for x in documents], document_metadatas=[{"about": "ghibli"} for _ in range(len(documents))])

Encoding 212 documents...


100%|██████████| 7/7 [00:20<00:00,  2.90s/it]

Shapes:
encodings: torch.Size([212, 256, 128])
doc_masks: torch.Size([212, 256])
Documents encoded!





In [4]:
RAG.search_encoded_docs(query = "What's Gihbli's famous policy?", k=3)

[{'content': 'The studio is also known for its strict "no-edits" policy in licensing their films abroad due to Nausicaä of the Valley of the Wind being heavily edited for the film\'s release in the United States as Warriors of the Wind.\n\n\n=== Independent era ===\nBetween 1999 and 2005, Studio Ghibli was a subsidiary brand of Tokuma Shoten; however, that partnership ended in April 2005, when Studio Ghibli was spun off from Tokuma Shoten and was re-established as an independent company with relocated headquarters.\nOn February 1, 2008, Toshio Suzuki stepped down from the position of Studio Ghibli president, which he had held since 2005, and Koji Hoshino (former president of Walt Disney Japan) took over. Suzuki said he wanted to improve films with his own hands as a producer, rather than demanding this from his employees.',
  'score': 15.333166122436523,
  'rank': 0,
  'result_index': 80,
  'document_metadata': {'about': 'ghibli'}},
 {'content': 'Saeko Himuro\'s novel Umi ga Kikoeru wa

And that's pretty much it for index-free encoding & querying!

But wait, what if your application needs to update dynamically, and accept new documents? Well, that's easy too! A `RAGPretrainedModel` will keep its encoded docs in-memory, and further `encode()` calls will add to it:

In [5]:
my_new_document = [
    "I'm a new document about the importance of Curry! I love curry, it's the best food! Do you like Curry too?",
    "I'm a second new document!"
]
RAG.encode(my_new_document, document_metadatas=[{"about": "new_document"} for _ in range(len(my_new_document))])
RAG.search_encoded_docs(query = "What's the best food?", k=1)

Encoding 2 documents...


100%|██████████| 1/1 [00:00<00:00, 10.43it/s]

Shapes:
encodings: torch.Size([2, 256, 128])
doc_masks: torch.Size([2, 256])
Documents encoded!





[{'content': "I'm a new document about the importance of Curry! I love curry, it's the best food! Do you like Curry too?",
  'score': 18.96149444580078,
  'rank': 0,
  'result_index': 212,
  'document_metadata': {'about': 'new_document'}}]
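Notice the `result_index` of 212 above: it is consistent with the two new documents being appended after the 212 that were already encoded (indices start at 0). The toy class below is only a mental model of that append-then-clear behaviour, written in plain Python; `ToyInMemoryStore` and its methods are hypothetical and are not how `RAGPretrainedModel` is implemented.

```python
from typing import Dict, List, Optional


class ToyInMemoryStore:
    """Hypothetical append-only in-memory collection (illustration only)."""

    def __init__(self) -> None:
        self.contents: List[str] = []
        self.metadatas: List[Optional[Dict]] = []

    def add(
        self, documents: List[str], metadatas: Optional[List[Dict]] = None
    ) -> range:
        """Append documents and return the indices they were stored at."""
        if metadatas is not None and len(metadatas) != len(documents):
            raise ValueError("need exactly one metadata dict per document")
        start = len(self.contents)
        self.contents.extend(documents)
        self.metadatas.extend(
            metadatas if metadatas is not None else [None] * len(documents)
        )
        return range(start, len(self.contents))

    def clear(self) -> None:
        """Forget everything, so the next add() starts again at index 0."""
        self.contents.clear()
        self.metadatas.clear()


store = ToyInMemoryStore()
store.add(["placeholder chunk"] * 212)  # stands in for the first batch
new_indices = store.add(
    ["A new document about curry", "A second new document"],
    [{"about": "new_document"}, {"about": "new_document"}],
)
```

In this model, earlier documents stay searchable alongside the new ones until the store is explicitly emptied.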

What if you want to keep your current `RAGPretrainedModel` loaded, but empty the in-memory encodings because the docs are expired and you need to encode new ones? You can do that easily too: just call `clear_encoded_docs()`. By default, this will wait for 10 seconds before deleting everything, but you can pass `force=True` to delete immediately:

In [6]:
RAG.clear_encoded_docs()

All in-memory encodings will be deleted in 10 seconds, interrupt now if you want to keep them!
...


And we can now encode new documents and query them, with no trace of the previous encodings:

In [7]:
RAG.encode(documents=["This a really good document about Ratatouille. Ratatouille is a French dish...",
                      "This is a document that is absolutely and utterly relevant to anything"])

Encoding 2 documents...


100%|██████████| 1/1 [00:00<00:00,  4.49it/s]

Shapes:
encodings: torch.Size([2, 256, 128])
doc_masks: torch.Size([2, 256])
Documents encoded!





In [8]:
RAG.search_encoded_docs(query = "What do you know about dishes? Curry maybe?", k=1)

[{'content': 'This a really good document about Ratatouille. Ratatouille is a French dish...',
  'score': 8.764448165893555,
  'rank': 0,
  'result_index': 0}]

Here it is! No trace of our previous, very important document about curry, but we can enjoy some Ratatouille facts instead.