Added reranker api#36

Merged
SearchSavior merged 6 commits into SearchSavior:1.0.6 from mwrothbe:1.0.6
Oct 22, 2025

Conversation

@mwrothbe
Contributor

@mwrothbe mwrothbe commented Oct 13, 2025

Adds a '/v1/rerank' reranking service. The API can be used in RAG flows to refine document retrieval ahead of the LLM. Tested with Qwen3-Reranker, but it should support other models that can be used with optimum. A few things to note:

  • The API accepts an optional PreTrainedTokenizerConfig input. Initially I thought this would provide maximum flexibility for other models that might require different config options. As it turned out, the tokenization is done in two stages with differing config options, so the options are currently hard-coded. If the hard-coded options are universal across models, we can remove the PreTrainedTokenizerConfig input, but I left it in for now in case it ends up being needed for more flexibility.
  • The API accepts optional 'prefix' and 'suffix' inputs, which I believe are model-specific instructions. The default strings (set by RerankerConfig in models\optimum.py) are for the Qwen3-Reranker model. If you don't want that as the default assumption, they could default to empty strings, with the parameters made required for the user.
  • The 'task' parameter is also optional, but the default is generic enough that it should work with most models.
  • Metric collection is not yet implemented.
  • This PR includes a CLI option to load all models in the config.
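For context, here is a minimal client-side sketch of what a request body for the new endpoint might look like. The field names (query, documents, task, prefix, suffix) and the helper function are assumptions drawn from the description above, not the actual schema:

```python
import json

def build_rerank_request(query, documents, task=None, prefix=None, suffix=None):
    """Assemble a hypothetical /v1/rerank request body.

    Optional fields are omitted when unset so the service can fall back
    to its model-specific defaults (e.g. the Qwen3-Reranker strings).
    """
    body = {"query": query, "documents": documents}
    if task is not None:
        body["task"] = task
    if prefix is not None:
        body["prefix"] = prefix
    if suffix is not None:
        body["suffix"] = suffix
    return body

payload = build_rerank_request(
    "What does the reranker endpoint do?",
    ["A passage about CPUs.", "A passage about reranking in RAG."],
)
print(json.dumps(payload))
```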

@SearchSavior
Owner

@mwrothbe

Found an issue with some discussion of how tokenizers are implemented in transformers:

huggingface/transformers#31375

Zooming out, the inheritance pattern extends into model definitions as well, which explains how some optimum model PRs can be so small:

huggingface/optimum-intel#1401

I don't think it's feasible to scope out model requirements and build them into the codebase before we have support targets using your PreTrainedTokenizerConfig approach.

Instead, I think we should just add models as we go and take care of flexibility once it becomes a problem, if it ever does. Notes in this review will reflect this.

Maybe using the Tokenizers Rust bindings directly could be a good approach?

https://github.com/huggingface/tokenizers/tree/main/bindings/python/py_src/tokenizers

Please open a discussion thread on this and give some motivation for your approach. In the meantime, AutoTokenizer takes many keyword arguments that act as overrides; maybe explore those.
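As a sketch of that suggestion: the overrides could live in a small config object whose fields are forwarded as keyword arguments to AutoTokenizer.from_pretrained. The RerankTokenizerOverrides name and the default values here are hypothetical, though padding_side, truncation_side, and model_max_length are real tokenizer arguments in transformers:

```python
from dataclasses import dataclass, asdict

@dataclass
class RerankTokenizerOverrides:
    """Hypothetical subset of tokenizer overrides; the actual RerankerConfig
    in this PR may hold different fields and defaults."""
    padding_side: str = "left"
    truncation_side: str = "left"
    model_max_length: int = 8192

    def kwargs(self):
        # Forward the fields as plain keyword arguments.
        return asdict(self)

# Usage (not executed here; requires transformers and model weights):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B",
#                                       **RerankTokenizerOverrides().kwargs())
print(RerankTokenizerOverrides().kwargs())
```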

@SearchSavior SearchSavior self-assigned this Oct 20, 2025
@SearchSavior
Owner

@mwrothbe Ok, to get this merged before the release target later this week, please:

  • use your ReRankConfig with AutoTokenizer for now. Maybe revise emb while you are in there. Target Qwen embedding and rerank for simplicity, at least for now.
  • for ease, don't commit the CLI file, and the merge should be clean.
  • came up with a solution for queuing model loads; will add notes to Auto load models at start #34 and merge tonight.

Great work on a cool feature, appreciate your time!! If you won't be able to make changes this week, let me know and I can make them so we can push 2.0 with rerank.

@mwrothbe
Contributor Author

Just looking at this now. Trying to figure out how to undo the CLI commit...

@mwrothbe
Contributor Author

Alright, well, I couldn't figure out how to undo the CLI commit, so I just added a new commit to my fork that reverted the original CLI code. Looking at the PR code changes, that seems to have the same effect. Hope this works OK for you; if not, let me know.

Copied and pasted the CLI from main 1.0.6 into my fork. Essentially, I've used a screwdriver as a chisel.

@SearchSavior
Owner

Ok, I'm making some quick changes after resolving conflicts.

@SearchSavior
Owner

@mwrothbe what is --task for rerank?

@mwrothbe
Contributor Author

> @mwrothbe what is --task for rerank?

Ah. It's an optional input that forms part of the prompt instruction given to the model. The 'task' is handed to the format_instruction function in optimum_rr.py (line 37) as 'instruction'. I probably should have kept the naming consistent there, but at the time it made sense to give the interface a more descriptive name (even if it isn't). The default is "Given a search query, retrieve relevant passages that answer the query", and the idea behind making this an input is that you might want to tweak the instruction for performance or for model-specific requirements.
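To make the flow concrete, here is a minimal sketch of how 'task' might feed the prompt. The template mimics the Qwen3-Reranker instruction format and may differ from the actual format_instruction in optimum_rr.py:

```python
# Default matches the 'task' default described above.
DEFAULT_TASK = ("Given a search query, retrieve relevant passages "
                "that answer the query")

def format_instruction(instruction, query, document):
    """Build a reranker prompt; 'instruction' is the API's 'task' input."""
    if instruction is None:
        instruction = DEFAULT_TASK
    return (f"<Instruct>: {instruction}\n"
            f"<Query>: {query}\n"
            f"<Document>: {document}")

print(format_instruction(None, "what is RAG?",
                         "RAG combines retrieval with generation."))
```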

@SearchSavior SearchSavior mentioned this pull request Oct 22, 2025
SearchSavior added a commit that referenced this pull request Oct 22, 2025
@SearchSavior SearchSavior merged commit 0839688 into SearchSavior:1.0.6 Oct 22, 2025