
Using Colbert purely as a re-ranker #6

Closed
hiranya911 opened this issue Jan 4, 2024 · 9 comments
@hiranya911
Is there a way to use this library only as a re-ranker? Something like:

results = RAG.rerank(question = '...', docs = [...])
@okhat
Collaborator

okhat commented Jan 4, 2024

That would be so cool! I have some code for this, @bclavie, I can get it to you.

QQ for @hiranya911: do you want the docs to be pre-encoded, or supplied at query time?

@santhnm2

santhnm2 commented Jan 4, 2024

The search function in ColBERT accepts a pids argument which can be used to rank only the given documents.
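To illustrate the idea of restricting ranking to a given set of passage ids, here is a toy sketch in plain Python. This is not the real ColBERT API; `rank_subset` and the word-overlap scorer are hypothetical stand-ins for the actual `search(..., pids=...)` call.

```python
def rank_subset(score_fn, query, corpus, pids):
    """Score only the passages whose ids appear in pids, best first."""
    scored = [(pid, score_fn(query, corpus[pid])) for pid in pids]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def overlap(query, doc):
    """Trivial word-overlap scorer, just for the demo."""
    return len(set(query.split()) & set(doc.split()))

corpus = {
    0: "colbert late interaction retrieval",
    1: "cooking pasta at home",
    2: "colbert reranking documents",
}
# Only pids 0 and 2 are scored; pid 1 is never touched.
ranked = rank_subset(overlap, "colbert reranking", corpus, pids=[0, 2])
```

The point is that only the supplied ids are scored, so the rest of the index is never consulted, which is exactly what a reranking workflow needs.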

@hiranya911
Author

@okhat I think I want to pass the documents as raw text, kind of similar to how the MS MARCO cross-encoder API is set up. But I'm sure passing pre-encoded docs is a valid use case too.

@bclavie bclavie added the enhancement New feature or request label Jan 5, 2024
@bclavie
Collaborator

bclavie commented Jan 6, 2024

Hey @hiranya911, this is definitely something that I'll be adding to the roadmap (@okhat, please do share the code you have 😄), thanks for the suggestion!

@timothepearce

@bclavie When you're done, ping me here, and I'll PR weaviate/reranker-transformers to add RAGatouille there!

@bclavie
Collaborator

bclavie commented Jan 6, 2024

Will do @timothepearce

This is (probably) the last feature I'll push before spending some time on important housekeeping (setting up CI, tests, better documentation, tutorials for training on a new language, etc.), but I'm hoping to have it out next week (in beta, just like everything else in RAGatouille at the moment 😄)!

@bclavie bclavie added the ongoing Feature is currently being worked on label Jan 7, 2024
@bclavie bclavie self-assigned this Jan 7, 2024
@bclavie
Collaborator

bclavie commented Jan 10, 2024

Hey @hiranya911 @timothepearce, closing this issue as it's now available in 0.0.4a1 #31 🥳

@bclavie bclavie closed this as completed Jan 10, 2024
@bclavie bclavie removed the ongoing Feature is currently being worked on label Jan 10, 2024
@hiranya911
Author

hiranya911 commented Jan 29, 2024

This is working like a charm. Thanks for the quick turnaround 🙏

A couple of questions when you have a moment:

  1. What are the scores returned by the rerank() API? Are they logits (log probabilities) or some other scaled values?
  2. Are there any recommendations on the content length of documents passed into rerank()?

@bclavie
Collaborator

bclavie commented Jan 29, 2024

What are the scores returned by the rerank() API? Are they logits (log probabilities) or some other scaled values?

This is a good question and could do with more explaining. They're non-normalised MaxSim scores, which is how ColBERT scores documents: for each query token, compute the cosine similarity with every document token and keep only the maximum, then sum those per-query-token maxima to get the total score. (A good, slightly longer explanation can be found here.) These scores could be normalised to give a "relevance" estimate.
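For the curious, the MaxSim sum described above can be sketched in a few lines of plain Python. The 2-D embeddings below are toy values for readability, not real ColBERT token vectors, and the normalisation at the end is just one naive option, not what RAGatouille does.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def maxsim(query_embs, doc_embs):
    """ColBERT-style late interaction: for each query token embedding,
    keep the best cosine similarity over all document token embeddings,
    then sum those per-query-token maxima."""
    return sum(max(cosine(q, d) for d in doc_embs) for q in query_embs)

# Two query tokens, two document tokens (unit vectors).
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.6, 0.8]]

score = maxsim(query, doc)      # 1.0 (first token) + 0.8 (second token)
relevance = score / len(query)  # naive normalisation into [-1, 1]
```

Dividing by the number of query tokens is one simple way to squash the raw sum into a bounded "relevance" estimate, since each per-token maximum is itself a cosine in [-1, 1].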

Are there any recommendations on the content length of documents passed into rerank()?

Anything up to the maximum sequence length of your ColBERT model's underlying base model (for ColBERTv2, that's bert-base-uncased, so 512 tokens) is fine, but the longer the documents, the slower the process. I think it's mostly about finding the sweet spot between doc length and your efficiency constraints!
