Description
FirstPassGroupingCollector is pretty awesome, but we are fairly restricted in what we can actually group by.
Search is evolving, and the desire to diversify results (to feed to an LLM, or even just to show users) is becoming more and more important.
I don't have a fully concrete idea, but it seems to me that Lucene should be able to support a "grouping" by some statistics or requirements of another field.
Two examples that come to mind:
- Maximal marginal relevance (MMR)
- Diversification based on clusters of vectors (e.g. k-means)
Both of these will be complicated in their own ways because the groupings end up being dynamic as more data is seen (instead of having a natural static upper limit based on cardinality).
But it seems generally useful for search.
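
To make the "dynamic" part concrete, here is a minimal sketch of greedy MMR in plain Java over in-memory candidates. It is not a Lucene API and not a proposed collector design: `MmrSketch`, `select`, the `lambda` trade-off parameter, and the choice of cosine similarity are all assumptions for illustration. The point is that each pick depends on everything already picked, so there is no fixed set of group keys known up front.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Lucene): greedy maximal marginal relevance. */
final class MmrSketch {

  /** Cosine similarity between two vectors of equal length; 0 if either is all zeros. */
  static double cosine(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0) {
      return 0;
    }
    return dot / Math.sqrt(normA * normB);
  }

  /**
   * Picks up to k candidate indices. Each round chooses the candidate maximizing
   * lambda * relevance - (1 - lambda) * (max similarity to anything already picked).
   * lambda = 1 degenerates to plain relevance ordering (an assumed convention).
   */
  static List<Integer> select(float[] relevance, float[][] vectors, int k, double lambda) {
    List<Integer> selected = new ArrayList<>();
    boolean[] taken = new boolean[relevance.length];
    while (selected.size() < k && selected.size() < relevance.length) {
      int best = -1;
      double bestScore = Double.NEGATIVE_INFINITY;
      for (int i = 0; i < relevance.length; i++) {
        if (taken[i]) {
          continue;
        }
        // No penalty for the first pick; afterwards, penalize the closest selected neighbor.
        double maxSim = selected.isEmpty() ? 0 : Double.NEGATIVE_INFINITY;
        for (int j : selected) {
          maxSim = Math.max(maxSim, cosine(vectors[i], vectors[j]));
        }
        double mmr = lambda * relevance[i] - (1 - lambda) * maxSim;
        if (mmr > bestScore) {
          bestScore = mmr;
          best = i;
        }
      }
      taken[best] = true;
      selected.add(best);
    }
    return selected;
  }
}
```

For example, with relevance `{0.9f, 0.85f, 0.5f}`, vectors `{{1, 0}, {1, 0}, {0, 1}}`, `k = 2` and `lambda = 0.5`, the sketch picks indices 0 and 2: the second-most-relevant hit is a near-duplicate of the first, so the less relevant but dissimilar hit wins. A collector-based version would presumably have to keep a bounded candidate pool and re-evaluate it as more hits arrive, which is where the complexity comes from.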