Cross Encoder Recommendation for RAG #2640

Open
yildize opened this issue May 12, 2024 · 0 comments

Comments


yildize commented May 12, 2024

In your documentation (https://www.sbert.net/docs/pretrained_cross-encoders.html) I see many different cross-encoders trained on different datasets, such as MS MARCO, SQuAD, STS, NLI, and so on.

What would be your suggestion for an asymmetric retrieval-augmented generation pipeline (retrieving and reranking passages for given queries)?

  • Would it be the MS MARCO cross-encoders? Why not the STSbenchmark or SQuAD models?
  • What should I consider when choosing a model?

Also, why not train a cross-encoder on all (or at least several) of those datasets, just as you did for the "all" bi-encoders?
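
For context, what I have in mind is a standard retrieve-then-rerank setup, roughly like the sketch below (using the sentence-transformers `CrossEncoder` class, with the `cross-encoder/ms-marco-MiniLM-L-6-v2` checkpoint purely as an example; the passages would come from any first-stage retriever):

```python
from sentence_transformers import CrossEncoder

# Example checkpoint -- one of the MS MARCO rerankers listed in the docs.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How does a cross-encoder differ from a bi-encoder?"

# Candidate passages returned by a first-stage retriever (bi-encoder, BM25, ...).
passages = [
    "A bi-encoder embeds the query and passage independently and compares the vectors.",
    "A cross-encoder scores the concatenated query-passage pair with full attention.",
    "The weather in Ankara is usually dry in the summer.",
]

# Score every (query, passage) pair and sort passages by relevance.
scores = model.predict([(query, p) for p in passages])
reranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)

for passage, score in reranked:
    print(f"{score:.3f}  {passage}")
```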

Thanks in advance.
