[ai-agents] Introduce the re-rank agent with MMR ranking #502
Conversation
Using this agent, is it possible to, say, query the vector store for 100 semantically similar results but return only the 10 most diverse from that larger set?
I had the same thought; let me add this feature and expose a "max" property.
This comment is maybe a little late, but a question I had looking at this: I haven't had time to dig into whether there's a best practice for which similarity metrics to use for relevance and diversity. However, I wonder if we want to let the user optionally choose between cosine and BM25. My thought process: given that our primary use case is genAI and most of our customers aren't doing those preprocessing steps, BM25 may not actually work as well as it should, and they may be better off using cosine similarity for relevance.
Currently for MMR we need both BM25 and cosine similarity. With this new agent you can retrieve more documents from the database and keep only a selection that is "diverse enough" but still "relevant". Additional note: with LangStream it is pretty easy to pre-process the data before inserting the documents into the vector database, and you can apply the same preprocessing to normalise the "query" in your chat-completion pipeline. Let's follow up on Slack, or you can open a "Discussion" or a GH ticket; this PR has been closed, so nobody will find it easily.
Could you explain more about why we need both? I read the original MMR paper, and it says you can use the same similarity function for both relevance and diversity.
Cosine similarity is not enough to reduce redundant documents. With IDF we can ensure that we are not passing duplicate content to the LLM, since tokens have a cost.
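A toy illustration of the point above (this is not the agent's exact scoring, just a sketch of why token-level weighting helps): an IDF-weighted token overlap scores verbatim duplicates at the top while discounting documents that only share common words, so duplicates can be filtered before they cost prompt tokens.

```python
import math
from collections import Counter

def idf_weights(docs):
    # Inverse document frequency over a small corpus of tokenised docs.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((n + 1) / (count + 1)) + 1 for tok, count in df.items()}

def weighted_overlap(a, b, idf):
    # IDF-weighted token overlap: 1.0 for exact duplicates,
    # low for documents that only share frequent (low-IDF) words.
    shared = set(a) & set(b)
    union = set(a) | set(b)
    num = sum(idf.get(t, 0.0) for t in shared)
    den = sum(idf.get(t, 0.0) for t in union)
    return num / den if den else 0.0

docs = [["the", "cat", "sat"], ["the", "cat", "sat"], ["the", "dog", "ran"]]
idf = idf_weights(docs)
dup_score = weighted_overlap(docs[0], docs[1], idf)      # exact duplicate -> 1.0
distinct_score = weighted_overlap(docs[0], docs[2], idf)  # only "the" shared -> low
```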
Summary
Description of MMR
In order to perform MMR you need two functions: one to compute "diversity" and one to compute "relevance".
We use BM25 + IDF to compute "relevance", and the average "cosine similarity" to compute "diversity".
There are a few parameters to tune the algorithm.
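The greedy selection can be sketched as follows. This is a simplified illustration, not the agent's implementation: relevance here is plain cosine-to-query for brevity (the agent uses BM25 + IDF), diversity is the average cosine similarity to the already-selected documents, and `lam` is the usual MMR trade-off weight.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query_emb, docs, lam=0.5, max_docs=10):
    """Greedy MMR over docs, a list of (text, embedding) pairs.

    At each step, pick the document maximising
    lam * relevance - (1 - lam) * diversity,
    where diversity is the average similarity to documents already picked.
    """
    selected, remaining = [], list(docs)
    while remaining and len(selected) < max_docs:
        best, best_score = None, -float("inf")
        for doc in remaining:
            relevance = cosine(query_emb, doc[1])
            if selected:
                diversity = sum(cosine(doc[1], s[1]) for s in selected) / len(selected)
            else:
                diversity = 0.0
            score = lam * relevance - (1 - lam) * diversity
            if score > best_score:
                best, best_score = doc, score
        selected.append(best)
        remaining.remove(best)
    return selected

# A duplicate of the top hit is skipped in favour of a less similar document.
docs = [("d1", [1.0, 0.0]), ("d1-dup", [1.0, 0.0]), ("d2", [0.6, 0.8])]
picked = [text for text, _ in mmr_select([1.0, 0.0], docs, lam=0.3, max_docs=2)]
```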
How do I use this feature ?
You need to provide a "query" and a set of "documents", along with the "embeddings" for the query and for each document.
This is easy in the standard chatbot pipeline: we compute the embeddings of the query before performing the vector search, and the vector search can return both the text and the embeddings stored in the database.
With the "max" parameter you can limit the number of documents returned. This is useful when you want to fetch many documents from the vector database but use only a few of them to build the prompt.
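For example, a pipeline step along these lines retrieves a large candidate set and keeps only the most diverse few. This is an illustrative sketch: the configuration keys (`field`, `query-text`, `query-embeddings`, `text-field`, `embeddings-field`, `max`) are my reading of the agent introduced here and may not match the final names exactly.

```yaml
- name: "re-rank documents with MMR"
  type: "re-rank"
  configuration:
    algorithm: "MMR"
    max: 10                                      # keep only the 10 most diverse
    field: "value.related_documents"             # candidates from the vector search
    output-field: "value.related_documents"
    query-text: "value.question"
    query-embeddings: "value.question_embeddings"
    text-field: "record.text"
    embeddings-field: "record.embeddings"
```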
Parameters
Notes
You must pre-compute embeddings for the query and for all of the documents. This is usually easy because you typically perform a vector search before this step, so you already have both the text and the embeddings vector for each document.