AutoQueryEncoder not using query prefixes (for E5-like and other models) #1812

Open
orionw opened this issue Mar 11, 2024 · 1 comment
orionw commented Mar 11, 2024

Hi there!

Sorry, I would have made this a PR, but I didn't have time to get around to it. Thanks again for adding prefixes so we can use models like E5 (issue #1720).

When using an E5-like model, though, the code uses AutoQueryEncoder, which ignores the prefix for the query. It's a pretty simple fix (a few lines). See https://github.com/castorini/pyserini/blob/master/pyserini/search/faiss/_searcher.py#L371

It should be like it is for some of the other classes (like how DkrrDprQueryEncoder handles it) if I understand correctly:

query = f'{self.prefix} {query}'

and

self.prefix = prefix

If I am misunderstanding which class E5 models go to, lmk, but I think they are supposed to go to AutoQueryEncoder.
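
For illustration, here is a minimal sketch of the pattern described above: store the prefix on the encoder and prepend it to the query before encoding. It is not the pyserini code; the class name `PrefixedQueryEncoder`, the `encode_fn` parameter, and the stub `fake_model` are hypothetical stand-ins so the snippet runs without a model.

```python
from typing import Callable, List, Optional


class PrefixedQueryEncoder:
    """Hypothetical wrapper (not pyserini's AutoQueryEncoder) that
    prepends an optional prefix to each query before encoding."""

    def __init__(self, encode_fn: Callable[[str], List[float]],
                 prefix: Optional[str] = None):
        self.encode_fn = encode_fn  # assumed: any callable mapping text to a vector
        self.prefix = prefix        # e.g. 'query:' for E5-like models

    def encode(self, query: str) -> List[float]:
        # Only prepend when a prefix was actually configured.
        if self.prefix:
            query = f'{self.prefix} {query}'
        return self.encode_fn(query)


# Stub "model" that records the text it receives, for demonstration only.
received: List[str] = []


def fake_model(text: str) -> List[float]:
    received.append(text)
    return [float(len(text))]


encoder = PrefixedQueryEncoder(fake_model, prefix='query:')
encoder.encode('what is faiss')
print(received[-1])
```

With no prefix configured, the query is passed through unchanged, so models that do not expect a prefix are unaffected.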

@MXueguang
Member

Hi @orionw, in pyserini.search.faiss we currently use the AutoQueryEncoder from pyserini.encode rather than the one in pyserini.search.faiss._searcher. The AutoQueryEncoder in _searcher is deprecated and will be removed.

https://github.com/castorini/pyserini/blob/2bb342acc124c69ec4fe13ebc3be0bd5a5bf497c/pyserini/search/faiss/__main__.py#L27C61-L27C77
