A new "Diversification" Type collector akin to FirstPassGroupingCollector #15190

@benwtrent

Description

FirstPassGroupingCollector is pretty awesome, but we are fairly restricted in what we can actually group by.

Search is evolving, and diversifying results (whether to feed them to an LLM or just to show them to users) is becoming increasingly important.

I don't have a fully concrete idea yet, but it seems to me that Lucene should be able to support "grouping" by some statistic of, or requirement on, another field.

Two examples that come to mind:

  • Maximal marginal relevance (MMR)
  • Diversification based on clusters of vectors (e.g. kmeans)
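To make the first example concrete, here is a minimal sketch of the classic greedy MMR re-ranking over an already-retrieved candidate list. It assumes the relevance scores and per-document vectors are already in hand (for example from a first-pass top-k), and it uses cosine similarity between document vectors. At each step it picks the candidate maximizing lambda * relevance(d) - (1 - lambda) * max over selected s of sim(d, s). The class and method names are hypothetical; this is not a Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of greedy MMR re-ranking; not a Lucene API. */
class MmrSketch {

    /** Cosine similarity between two equal-length vectors (0 if either is all zeros). */
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            na += (double) a[i] * a[i];
            nb += (double) b[i] * b[i];
        }
        if (na == 0 || nb == 0) {
            return 0;
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /**
     * Greedily picks up to k candidates, each time maximizing
     * lambda * relevance[i] - (1 - lambda) * (max similarity of i to already-picked docs).
     * Returns candidate indices in selection order.
     */
    static List<Integer> mmr(double[] relevance, float[][] vectors, int k, double lambda) {
        int n = relevance.length;
        List<Integer> selected = new ArrayList<>();
        boolean[] picked = new boolean[n];
        // maxSim[i]: highest similarity of candidate i to any selected doc so far.
        double[] maxSim = new double[n];
        Arrays.fill(maxSim, Double.NEGATIVE_INFINITY);
        while (selected.size() < Math.min(k, n)) {
            int best = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < n; i++) {
                if (picked[i]) {
                    continue;
                }
                // Nothing selected yet means there is no redundancy penalty.
                double redundancy = selected.isEmpty() ? 0 : maxSim[i];
                double score = lambda * relevance[i] - (1 - lambda) * redundancy;
                if (score > bestScore) {
                    bestScore = score;
                    best = i;
                }
            }
            picked[best] = true;
            selected.add(best);
            // Only the newest pick can raise a remaining candidate's redundancy.
            for (int i = 0; i < n; i++) {
                if (!picked[i]) {
                    maxSim[i] = Math.max(maxSim[i], cosine(vectors[i], vectors[best]));
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        double[] relevance = {0.95, 0.94, 0.60};
        // Docs 0 and 1 are near-duplicates; doc 2 points in a different direction.
        float[][] vectors = {{1f, 0f}, {0.99f, 0.05f}, {0f, 1f}};
        System.out.println(mmr(relevance, vectors, 2, 0.5)); // prints [0, 2]
    }
}
```

With lambda = 0.5 the near-duplicate doc 1 loses to the less relevant but dissimilar doc 2; with lambda = 1.0 the result degenerates to plain relevance order. Keeping a running maxSim per candidate means each pick costs one pass over the remaining candidates rather than a comparison against every selected doc. Note that this only works as a re-ranking step over a bounded candidate set, since each choice depends on the docs already picked.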

Both will be complicated in their own ways, because the groupings become dynamic as more data is seen, instead of having a natural, static upper limit determined by the field's cardinality.
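As an illustration of that dynamic behavior, here is a hypothetical sketch that groups hits on the fly by their vectors using leader clustering (a single-pass, threshold-based scheme), swapped in for k-means here only because it needs no precomputed centroids. Each hit joins the group whose leader vector is most similar if the cosine similarity reaches an assumed `threshold`; otherwise it starts a new group. Groups are then ranked by their best hit, loosely mirroring how a first grouping pass ranks groups. The class, fields, and threshold are all assumptions for illustration; this is not a Lucene Collector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: groups hits by leader clustering of their vectors. Not a Lucene API. */
class LeaderGroupingSketch {

    /** One dynamically created group: its leader vector and its best hit so far. */
    static final class Group {
        final float[] leader;
        int bestDoc;
        float bestScore;

        Group(float[] leader, int doc, float score) {
            this.leader = leader;
            this.bestDoc = doc;
            this.bestScore = score;
        }
    }

    private final double threshold; // assumed: min cosine similarity to join an existing group
    private final List<Group> groups = new ArrayList<>();

    LeaderGroupingSketch(double threshold) {
        this.threshold = threshold;
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            na += (double) a[i] * a[i];
            nb += (double) b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Called once per matching hit, in whatever order hits are collected. */
    void collect(int doc, float score, float[] vector) {
        Group nearest = null;
        double nearestSim = Double.NEGATIVE_INFINITY;
        for (Group g : groups) {
            double sim = cosine(vector, g.leader);
            if (sim > nearestSim) {
                nearestSim = sim;
                nearest = g;
            }
        }
        if (nearest != null && nearestSim >= threshold) {
            if (score > nearest.bestScore) {
                nearest.bestScore = score;
                nearest.bestDoc = doc;
            }
        } else {
            // No leader is close enough: this hit starts a new group, so the number
            // of groups grows with the data instead of being bounded by a field's cardinality.
            groups.add(new Group(vector, doc, score));
        }
    }

    /** Top-n groups ranked by their best hit's score. */
    List<Group> topGroups(int n) {
        List<Group> sorted = new ArrayList<>(groups);
        sorted.sort(Comparator.comparingDouble((Group g) -> g.bestScore).reversed());
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    public static void main(String[] args) {
        LeaderGroupingSketch c = new LeaderGroupingSketch(0.9);
        c.collect(0, 0.80f, new float[] {1f, 0f});
        c.collect(1, 0.95f, new float[] {0.98f, 0.1f}); // close to doc 0's leader: same group
        c.collect(2, 0.70f, new float[] {0f, 1f});      // dissimilar: new group
        for (Group g : c.topGroups(2)) {
            System.out.println("doc=" + g.bestDoc + " score=" + g.bestScore);
        }
    }
}
```

The sketch makes the complications visible: which hits become leaders depends on arrival order, and the group list is unbounded, so a real collector would need a cap or eviction policy and would have to decide how to rank groups whose membership may still change.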

But it seems generally useful for search.

Metadata

Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
