Feature: allow using embedding models for the sentence reranking task #280

Open
massi-ang opened this issue Dec 19, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@massi-ang
Collaborator

While cross-encoders have shown better performance than cosine similarity scores on sentence embeddings, there are no multilingual cross-encoders, which makes this solution viable only for English. Experiments show that a cross-encoder trained on English does not capture semantic meaning in other languages, only word similarity.

This feature request proposes adding support for embedding models, using a normalized vector similarity (such as cosine similarity) between the query and each passage to score the semantic relevance of passages to queries.
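A minimal sketch of the scoring step, assuming the query and passage embeddings have already been produced by some multilingual embedding model. The function names `cosine_similarity` and `rerank_by_embedding` are hypothetical and are not part of this project's code:

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank_by_embedding(
    query_vec: Sequence[float],
    passages: List[str],
    passage_vecs: List[Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (passage, score) pairs sorted by descending cosine similarity.

    Assumption: passage_vecs[i] is the embedding of passages[i], computed by
    the same embedding model that produced query_vec.
    """
    if len(passages) != len(passage_vecs):
        raise ValueError("passages and passage_vecs must have the same length")
    scored = [
        (passage, cosine_similarity(query_vec, vec))
        for passage, vec in zip(passages, passage_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the score depends only on the embeddings, the language coverage of the reranker would follow that of whichever embedding model is chosen.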

@massi-ang massi-ang added the enhancement New feature or request label Dec 19, 2023
@ystoneman

ystoneman commented Apr 15, 2024

@massi-ang, Cohere's Rerank 3 is multilingual and supports over 100 languages. It also has a 4k context length, whereas cross-encoder/ms-marco-MiniLM-L-12-v2 has a maximum context length of 512.

This project already supports a wide range of external APIs as options for the text-text models, so why not also include the option for folks to include their Cohere API key and use Rerank 3?
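A rough sketch of what such an option might look like, assuming the Cohere Python SDK's `rerank` call and the `rerank-multilingual-v3.0` model name. The wrapper `rerank_with_cohere` is hypothetical, and the client is passed in so the function does not depend on how this project would store the API key:

```python
from typing import Any, List, Tuple


def rerank_with_cohere(
    client: Any,
    query: str,
    documents: List[str],
    top_n: int = 3,
    model: str = "rerank-multilingual-v3.0",  # assumed model identifier
) -> List[Tuple[str, float]]:
    """Return up to top_n (document, relevance_score) pairs, best first.

    Assumption: `client` behaves like cohere.Client, i.e. it exposes
    rerank(model=..., query=..., documents=..., top_n=...) and the response
    has a `results` list whose items carry `index` and `relevance_score`.
    """
    if not documents:
        return []
    response = client.rerank(
        model=model,
        query=query,
        documents=documents,
        top_n=min(top_n, len(documents)),
    )
    # Each result points back into the original documents list by index.
    return [(documents[r.index], r.relevance_score) for r in response.results]


# Hypothetical usage (requires the `cohere` package and a valid API key):
#   import cohere
#   client = cohere.Client("YOUR_API_KEY")
#   rerank_with_cohere(client, "image generation regulations", passages)
```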

I already ran a quick comparison between Rerank 2 and the ms-marco MiniLM cross-encoder, and Rerank 2 ranked the results much better: for my query about image generation regulations, Rerank 2 put responses related to image generation at the top of the list, whereas none of the MiniLM results mentioned image generation.

I could not find an open-source model with as much language support as Cohere Rerank 3.

If I add support for Cohere Rerank 3, would that resolve this issue @massi-ang?
