docs: Add docs for MultiRetriever and TextEmbeddingRetriever #11219

Merged
sjrl merged 2 commits into main from multiretriever-docs
Apr 29, 2026

Conversation

@sjrl
Contributor

@sjrl sjrl commented Apr 29, 2026

Related Issues

Proposed Changes:

Adds docs for MultiRetriever and TextEmbeddingRetriever

How did you test it?

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I have documented my code.
  • I have added a release note file, following the contributors guidelines.
  • I have run pre-commit hooks and fixed any issue.

@sjrl sjrl self-assigned this Apr 29, 2026
@vercel

vercel Bot commented Apr 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| haystack-docs | Ready | Preview, Comment | Apr 29, 2026 8:22am |

@sjrl sjrl marked this pull request as ready for review April 29, 2026 08:31
@sjrl sjrl requested a review from a team as a code owner April 29, 2026 08:31
@sjrl sjrl requested review from anakin87 and julian-risch and removed request for a team April 29, 2026 08:31
@claude

claude Bot commented Apr 29, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Member

@julian-risch julian-risch left a comment

Looks good to me! The only thing I am wondering about is the deduplication, in particular the ranking. As a user, I would like to know more details about the deduplication, so adding something like "keeping the duplicate with the highest score if a score is present" to the docs could help. Related to that, I thought that reciprocal rank fusion (RRF) might actually make more sense here. For example, we could add a deduplicate_with_rrf utility function next to

def _deduplicate_documents(documents: list["Document"]) -> list["Document"]:

WDYT?
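
For readers unfamiliar with reciprocal rank fusion, the suggestion above can be sketched as follows. This is a minimal illustration in plain Python, not Haystack code: the function name `rrf_merge`, the use of bare document IDs instead of `Document` objects, and the tie-break on ID are all assumptions made for the example. The constant `k=60` is the value commonly used for RRF.

```python
from collections import defaultdict


def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings of document IDs into one ranking.

    Hypothetical helper for illustration only. Each document receives
    1 / (k + rank) from every ranking it appears in, so duplicates are
    merged by summing their contributions rather than by comparing raw scores.
    """
    fused: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        seen: set[str] = set()
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in seen:
                # Ignore repeats inside a single ranking.
                continue
            seen.add(doc_id)
            fused[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID so the output is deterministic.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


bm25_ranking = ["a", "b", "c"]
embedding_ranking = ["b", "d"]
merged = rrf_merge([bm25_ranking, embedding_ranking])
```

Because only ranks are used, the result does not depend on the score scale of any individual retriever, and a document returned by several retrievers (here `"b"`) moves towards the top.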

@julian-risch
Member

julian-risch commented Apr 29, 2026

Related to the above, I guess the ranking of results depends right now on which retriever in the ThreadPool returns results first? So not reproducible? Or are the results appended always in the order of how the retrievers were set in self.retrievers? https://github.com/deepset-ai/haystack/pull/10872/changes#diff-ff77361ca084c4e9e679a82219a5ad716408e40fb9d61d1549235dad1c6a2393R183
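
The two possibilities raised in this question can be illustrated with the standard library alone. The sketch below is not the MultiRetriever implementation; `fake_retriever`, the retriever names, and the simulated latencies are assumptions made for the example. It contrasts collecting results with `as_completed` (completion order) against iterating the futures in the order they were submitted.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_retriever(name: str, delay: float) -> list[str]:
    """Stand-in for a retriever call; sleeps to simulate latency."""
    time.sleep(delay)
    return [f"{name}-doc"]


# Hypothetical retrievers mapped to simulated latencies in seconds.
retrievers = {"bm25": 0.2, "embedding": 0.0}


def merge_in_completion_order(retrievers: dict[str, float]) -> list[str]:
    """Results are appended as each future finishes, so order depends on timing."""
    merged: list[str] = []
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        futures = [pool.submit(fake_retriever, n, d) for n, d in retrievers.items()]
        for future in as_completed(futures):
            merged.extend(future.result())
    return merged


def merge_in_submission_order(retrievers: dict[str, float]) -> list[str]:
    """Results are appended in dict order, regardless of which call finishes first."""
    merged: list[str] = []
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        futures = [pool.submit(fake_retriever, n, d) for n, d in retrievers.items()]
        for future in futures:
            merged.extend(future.result())
    return merged


by_completion = merge_in_completion_order(retrievers)
by_submission = merge_in_submission_order(retrievers)
```

With these latencies, `by_completion` will typically list the faster retriever's document first, while `by_submission` always follows the order of the dict.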

@sjrl
Contributor Author

sjrl commented Apr 29, 2026

> Looks good to me! The only thing I am wondering about is the deduplication, in particular the ranking. As a user, I would like to know more details about the deduplication. So adding something like "keeping the duplicate with the highest score if a score is present" to the docs could help.

I think the comment about the highest score is misleading even if it's true. The scores coming from different retrievers can't be compared so I don't want to imply to the user that the score matters.
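
A small example may make this point concrete. The sketch below is illustrative only: the `Hit` class, the `keep_highest_score` helper, and the score values are invented for the example and are not taken from Haystack. It assumes one retriever that emits BM25-style scores, which are unbounded, and one that emits cosine similarities, which lie in [-1, 1].

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical retrieval result used only for this illustration."""
    doc_id: str
    score: float
    source: str


def keep_highest_score(hits: list[Hit]) -> list[Hit]:
    """Deduplicate by document ID, keeping the copy with the largest raw score."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.doc_id)
        if current is None or hit.score > current.score:
            best[hit.doc_id] = hit
    return list(best.values())


hits = [
    Hit("doc-1", 11.7, "bm25"),       # made-up BM25-style score
    Hit("doc-1", 0.93, "embedding"),  # made-up cosine similarity
]
kept = keep_highest_score(hits)
```

Here the BM25 copy wins simply because its scale is larger, even though 0.93 is a very strong cosine similarity. The surviving score therefore says little about which retriever found the better match.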

> Related to that I thought that reciprocal rank fusion might actually make more sense here? For example, having a deduplicate_with_rrf utility function in

That's fair; it could be worth adding some built-in ranking methods like RRF. I'll see about extracting a utility function that can be reused, as you say.

@sjrl
Contributor Author

sjrl commented Apr 29, 2026

> Related to the above, I guess the ranking of results depends right now on which retriever in the ThreadPool returns results first? So not reproducible? Or are the results appended always in the order of how the retrievers were set in self.retrievers? https://github.com/deepset-ai/haystack/pull/10872/changes#diff-ff77361ca084c4e9e679a82219a5ad716408e40fb9d61d1549235dad1c6a2393R183

That's true, the order would depend on which retrievers finish first. I wasn't concerned about a rank or an order for this component, since users should use Rankers afterwards if they want a definitive ranking of their documents. I do agree that adding something like RRF can make sense in this component, since it's not possible to run that ranking after merging all results into a single list. We can have this on by default, but if it is off I think a nondeterministic order is fine, since ranking by dict key order in retrievers doesn't feel that intuitive.

But for all other rankings I'd say we point users to our Rankers.

@julian-risch
Member

Okay, thanks for looking into this! Feel free to open an issue or directly a PR. I would argue RRF should be the default for this new component, mainly because it is reproducible and easy to calculate, and no other ranking has a better meaning.

@julian-risch
Member

> I think the comment about the highest score is misleading even if it's true. The scores coming from different retrievers can't be compared so I don't want to imply to the user that the score matters.

👍

@sjrl
Contributor Author

sjrl commented Apr 29, 2026

> Okay, thanks for looking into this! Feel free to open an issue or directly a PR. I would argue RRF should be the default for this new component, mainly because it is reproducible and easy to calculate, and no other ranking has a better meaning.

I agree setting it on by default makes sense.

@sjrl sjrl merged commit f990a75 into main Apr 29, 2026
22 checks passed
@sjrl sjrl deleted the multiretriever-docs branch April 29, 2026 09:39

2 participants