
Can anyone explain how exactly the reranker is calculating the score? #197

Open
nithinreddyyyyyy opened this issue Apr 12, 2024 · 2 comments
Labels: question (Further information is requested)

@nithinreddyyyyyy

Could someone explain how the reranker calculates its scores? I'm observing scores such as 15.67 and 12.33, which are unexpected because I anticipated scores like 97 or 96. I use LangChain's get_relevant_documents to retrieve documents with relevance scores such as 98 and 96, and then send these snippets to the ColBERT reranker for reranking. The reranker then returns scores like 15.67 and 12.33. Should I consider a higher reranker score as indicative of a better snippet? Additionally, I would like to understand the technique used by the reranker to compute these scores.
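For reference, the flow looks roughly like this (a rough sketch rather than my exact code, following the pattern from the repo's examples/04-reranking.ipynb; `retriever` is a placeholder for the LangChain retriever and the result keys are my assumption):

```python
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

query = "..."                                     # placeholder query
docs = retriever.get_relevant_documents(query)    # LangChain retrieval step
snippets = [d.page_content for d in docs]

# Rerank the retrieved snippets with ColBERT
reranked = RAG.rerank(query=query, documents=snippets, k=5)
for r in reranked:
    print(r["score"], r["content"][:80])          # scores come back like 15.67, 12.33
```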

bclavie (Owner) commented Apr 12, 2024

Hey! The scores are raw MaxSim scores, as used by ColBERT. MaxSim is well explained in this Vespa blog post about ColBERT, but it's basically the sum, over your query tokens, of each query token's maximum similarity with the document's tokens. The scores we return aren't normalised, so they aren't between 0 and 1 (or 0 and 100) as you might usually expect from retrieval services that return normalised relevance scores.
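If it helps, here's a minimal numpy sketch of the idea (illustrative only, not RAGatouille's actual implementation):

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """ColBERT-style MaxSim: for each query token embedding, take the maximum
    cosine similarity over all document token embeddings, then sum those maxima."""
    # Normalise token embeddings so dot products are cosine similarities
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sim = q @ d.T                        # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # one max per query token, summed: an unbounded raw score

# Toy example: 3 query tokens, 5 document tokens, 8-dim embeddings
rng = np.random.default_rng(0)
print(maxsim_score(rng.normal(size=(3, 8)), rng.normal(size=(5, 8))))
```

Because it's a sum over query tokens, the raw value grows with query length and isn't bounded, which is why you see numbers like 15.67. Within a single model, higher is still better: the reranker orders documents by these raw scores.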

As a general rule, you cannot compare scores output by different retrieval models; they only really make sense within the context of a single model. One model's 0.978 similarity might be another model's 0.784 (completely made-up numbers!), as they aren't absolute but relative scores: they're only useful for comparing the scores a single model gives to different documents.

bclavie added the question (Further information is requested) label on Apr 12, 2024
levnikolaevich commented Apr 19, 2024

@bclavie, good afternoon!

Could you please help me with the following...

  1. You have an example where ColBERT is used for re-ranking documents retrieved from another index:
    https://github.com/bclavie/RAGatouille/blob/main/examples/04-reranking.ipynb

  2. In this post, re-ranking is applied to documents retrieved from an index that was built with ColBERT itself:
    https://til.simonwillison.net/llms/colbert-ragatouille

Question: Does it make sense to re-rank documents that were found in an index created by RAGatouille itself, or is there no point in doing that?

Thank you very much in advance!
