Could someone explain how the reranker calculates its scores? I'm observing scores such as 15.67 and 12.33, which are unexpected because I anticipated scores like 97 or 96. I use LangChain's get_relevant_docs to retrieve documents with relevance scores such as 98 and 96, and then send these snippets to the ColBERT reranker for reranking. The reranker then returns scores like 15.67 and 12.33. Should I consider a higher reranker score as indicative of a better snippet? Additionally, I would like to understand the technique used by the reranker to compute these scores.
Hey! The scores are raw MaxSim scores, as used by ColBERT. MaxSim is well explained in the Vespa blog post about ColBERT, but in short: for each query token, you take its maximum similarity against all document tokens, and then sum those maxima over the query tokens. The scores we return aren't normalised, so they don't fall between 0 and 1 (or 0 and 100) as you might expect from retrieval services that provide normalised relevance scores. Higher is still better, though: within a single reranked list, a document scoring 15.67 is a better match than one scoring 12.33.
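To make the computation concrete, here's a minimal MaxSim sketch in NumPy. The tiny 2-dimensional "embeddings" are made up for illustration; real ColBERT token embeddings are much higher-dimensional and come from the model's encoder.

```python
import numpy as np

def maxsim(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """MaxSim: for every query token, take its maximum similarity over
    all document tokens, then sum those maxima across query tokens."""
    sim = query_embs @ doc_embs.T        # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # best-matching doc token per query token

# Hand-made toy vectors: 2 query tokens, 3 document tokens, dim 2.
q = np.array([[1.0, 0.0],
              [0.0, 1.0]])
d = np.array([[1.0, 0.0],
              [0.0, 0.5],
              [0.5, 0.5]])

print(maxsim(q, d))  # 1.5 — an unbounded raw score, not a percentage
```

Because the score is a sum over query tokens, longer queries naturally produce larger raw scores, which is another reason the numbers don't look like percentages.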
As a general rule, you cannot compare scores produced by different retrieval models; they only really make sense within the context of a single model. One model's 0.978 similarity might be another model's 0.784 (completely made-up numbers!), because scores aren't absolute but relative: they're only useful for comparing the scores a single model assigns to different documents for the same query.
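If you want values in a familiar 0–1 range for display, one option (my suggestion, not something the reranker does for you) is to min-max normalise within a single reranked result set. This preserves the ranking but, as above, the normalised values are still only comparable within that one list:

```python
def minmax_normalize(scores: list[float]) -> list[float]:
    """Rescale one model's scores for one query into [0, 1].
    Only meaningful within a single result set from a single model."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                       # all documents scored identically
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

print(minmax_normalize([15.67, 12.33, 9.10]))  # top doc -> 1.0, bottom -> 0.0
```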