docs: Add docs for `MultiRetriever` and `TextEmbeddingRetriever` by sjrl · Pull Request #11219 · deepset-ai/haystack

sjrl · 2026-04-29T08:03:16Z

Related Issues

Follow-up to feat: Add new components TextEmbeddingRetriever and MultiRetriever #10872

Proposed Changes:

Adds docs for MultiRetriever and TextEmbeddingRetriever

How did you test it?

Notes for the reviewer

Checklist

I have read the contributors guidelines and the code of conduct.
I have updated the related issue with new insights and changes.
I have added unit tests and updated the docstrings.
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
I have documented my code.
I have added a release note file, following the contributors guidelines.
I have run pre-commit hooks and fixed any issue.

vercel · 2026-04-29T08:03:22Z

The latest updates on your projects.

Project	Deployment	Actions	Updated (UTC)
haystack-docs	Ready	Preview, Comment	Apr 29, 2026 8:22am

claude · 2026-04-29T08:36:39Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

julian-risch

Looks good to me! The only thing I'm wondering about is the deduplication, in particular the ranking. As a user, I would like to know more about how deduplication works, so adding something like "keeping the duplicate with the highest score if a score is present" to the docs could help. Related to that, I think reciprocal rank fusion (RRF) might actually make more sense here? For example, we could have a deduplicate_with_rff utility function in haystack/haystack/utils/misc.py, where the current deduplication helper lives (line 129 in 3cf91d9):

def _deduplicate_documents(documents: list["Document"]) -> list["Document"]:

WDYT?
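For reference, reciprocal rank fusion gives each document a fused score equal to the sum of 1 / (k + rank) over every ranked list it appears in, with k = 60 being the constant commonly used in the literature. The sketch below shows what a deduplicate_with_rff-style helper could compute, assuming each retriever's output is reduced to a list of document IDs in that retriever's ranking order. The function name `reciprocal_rank_fusion` and the use of plain string IDs instead of Haystack `Document` objects are hypothetical simplifications, not Haystack's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first lists of document IDs with reciprocal rank fusion.

    Hypothetical helper for illustration only. A document returned by several
    retrievers is merged into one entry whose score is the sum of 1 / (k + rank).
    """
    fused: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the rank is used, so raw retriever scores are never compared.
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the output is deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


bm25_ids = ["doc_a", "doc_b", "doc_c"]
embedding_ids = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([bm25_ids, embedding_ids])
print([doc_id for doc_id, _ in fused])  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Here doc_a (ranks 1 and 2) edges out doc_c (ranks 3 and 1), and both beat documents found by only one retriever. Because only ranks enter the formula, this sidesteps the question of which raw score to keep for a duplicate.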

julian-risch · 2026-04-29T09:09:38Z

Related to the above: I guess the ranking of results currently depends on which retriever in the ThreadPool returns first? So it's not reproducible? Or are the results always appended in the order in which the retrievers were set in self.retrievers? https://github.com/deepset-ai/haystack/pull/10872/changes#diff-ff77361ca084c4e9e679a82219a5ad716408e40fb9d61d1549235dad1c6a2393R183
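Whether the order is reproducible depends on how the thread-pool results are collected. The sketch below uses Python's standard `concurrent.futures` with hypothetical stand-in retrievers (it is not the MultiRetriever code): `as_completed` yields futures in completion order, whereas `Executor.map` returns results in the order the inputs were given, no matter which call finishes first.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_retriever(name: str, delay: float):
    """Return a hypothetical stand-in retriever that sleeps, then returns one labelled result."""
    def retrieve(query: str) -> list[str]:
        time.sleep(delay)  # Simulate retrievers with different latencies.
        return [f"{name}:{query}"]
    return retrieve


# Hypothetical retrievers in their configured order; the slow one is listed first.
retrievers = {"slow": make_retriever("slow", 0.2), "fast": make_retriever("fast", 0.0)}

with ThreadPoolExecutor() as executor:
    # Collecting futures as they finish: the order follows completion time.
    futures = [executor.submit(retriever, "q") for retriever in retrievers.values()]
    by_completion = [doc for future in as_completed(futures) for doc in future.result()]

    # Executor.map yields results in input (configured) order.
    by_config = [doc for docs in executor.map(lambda r: r("q"), retrievers.values()) for doc in docs]

print(by_completion)  # typically ['fast:q', 'slow:q'], but timing-dependent
print(by_config)      # ['slow:q', 'fast:q']
```

Both variants contain the same results; only the second gives a stable order, and that order is simply the configuration order rather than a relevance ranking.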

sjrl · 2026-04-29T09:15:31Z

> Looks good to me! The only thing I am wondering about is the deduplication, in particular the ranking. As a user, I would like to know more details about the deduplication. So adding something like "keeping the duplicate with the highest score if a score is present" to the docs could help.

I think the comment about the highest score is misleading, even if it's true. Scores coming from different retrievers can't be compared, so I don't want to imply to users that the score matters.
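To illustrate why a "keep the highest score" rule could mislead: retrievers score on different scales. The sketch below uses made-up scores (hypothetical values, not Haystack output) for a keyword retriever with unbounded BM25-style scores and an embedding retriever with cosine similarities of at most 1.0. The `ScoredDoc` class and `dedup_keep_max_score` function are illustrative assumptions, not part of Haystack.

```python
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    doc_id: str
    score: float
    source: str  # Which retriever produced this copy.


# Made-up scores for illustration only: BM25-style scores are unbounded,
# cosine similarities are at most 1.0, so the two scales are not comparable.
keyword_results = [ScoredDoc("doc_a", 7.4, "keyword"), ScoredDoc("doc_b", 3.1, "keyword")]
embedding_results = [ScoredDoc("doc_b", 0.92, "embedding"), ScoredDoc("doc_a", 0.41, "embedding")]


def dedup_keep_max_score(docs: list[ScoredDoc]) -> dict[str, ScoredDoc]:
    """Naive deduplication that keeps, per ID, the copy with the highest raw score."""
    best: dict[str, ScoredDoc] = {}
    for doc in docs:
        if doc.doc_id not in best or doc.score > best[doc.doc_id].score:
            best[doc.doc_id] = doc
    return best


kept = dedup_keep_max_score(keyword_results + embedding_results)
print({doc_id: doc.source for doc_id, doc in kept.items()})
# {'doc_a': 'keyword', 'doc_b': 'keyword'}
```

doc_b is the embedding retriever's top hit, yet the keyword copy is kept purely because of the scale difference, so the surviving score says little about relevance.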

> Related to that I thought that reciprocal rank fusion might actually make more sense here? For example, having a deduplicate_with_rff utility function in

That's fair; it could be worth adding some built-in ranking methods like RRF. I'll look into extracting a utility function that can be reused, as you suggest.

sjrl · 2026-04-29T09:19:15Z

> Related to the above, I guess the ranking of results depends right now on which retriever in the ThreadPool returns results first? So not reproducible? Or are the results appended always in the order of how the retrievers were set in self.retrievers? https://github.com/deepset-ai/haystack/pull/10872/changes#diff-ff77361ca084c4e9e679a82219a5ad716408e40fb9d61d1549235dad1c6a2393R183

That's true, the order depends on which retrievers finish first. I wasn't concerned about rank or order for this component, since users should use Rankers afterwards if they want a definitive ranking of their documents. I do agree that adding something like RRF makes sense in this component, since that ranking can't be run once all results have been merged into a single list. We can have it on by default; when it's off, I think a nondeterministic order is fine, since ordering by the dict key order of the retrievers doesn't feel that intuitive.

But for all other rankings I'd say we point users to our Rankers.

julian-risch · 2026-04-29T09:29:37Z

Okay, thanks for looking into this! Feel free to open an issue or go straight to a PR. I would argue RRF should be the default behavior of this new component, mainly because it's reproducible and easy to calculate, and no other ranking would be more meaningful.

julian-risch · 2026-04-29T09:30:07Z

> I think the comment about the highest score is misleading even if it's true. The scores coming from different retrievers can't be compared so I don't want to imply to the user that the score matters.

👍

sjrl · 2026-04-29T09:36:31Z

> Okay, thanks for looking into this! Feel free to open an issue or directly a PR. I would argue rff should be the default result of this new component. Mainly because of reproducibility, easy to calculate and any other ranking doesn't have a better meaning.

I agree setting it on by default makes sense.

Add docs for new components

63df219

sjrl self-assigned this Apr 29, 2026

vercel Bot deployed to Preview April 29, 2026 08:05

updates

6ed3fc7

vercel Bot deployed to Preview April 29, 2026 08:22

sjrl marked this pull request as ready for review April 29, 2026 08:31

sjrl requested a review from a team as a code owner April 29, 2026 08:31

sjrl requested review from anakin87 and julian-risch and removed request for a team April 29, 2026 08:31

julian-risch approved these changes Apr 29, 2026


sjrl merged commit f990a75 into main Apr 29, 2026
22 checks passed

sjrl deleted the multiretriever-docs branch April 29, 2026 09:39

sjrl mentioned this pull request Apr 29, 2026

feat: Add reciprocal rank fusion to MultiRetriever #11220

Merged

sjrl merged 2 commits into main from multiretriever-docs
