Found an issue with some discussion on how tokenizers are implemented in transformers: huggingface/transformers#31375. Zooming out, the inheritance pattern extends into model definitions as well, which explains how some optimum model PRs can be so small: huggingface/optimum-intel#1401. I don't think it's feasible to scope out model requirements and build them into the codebase before we have support targets using your PreTrainedTokenizer approach. Instead I think we should just add models as we go, and take care of flexibility once it becomes a problem, if it does. Notes in this review will reflect this. Maybe using the Tokenizers Rust bindings directly could be a good approach? https://github.com/huggingface/tokenizers/tree/main/bindings/python/py_src/tokenizers Please open a discussion thread on this and give some motivation for your approach. In the meantime, AutoTokenizer takes many arguments that act as overrides; maybe explore these.
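On the Rust-bindings idea, here's a minimal sketch of driving `tokenizers` directly, using a toy in-memory WordLevel vocab rather than a real model's tokenizer, just to show the shape of the API:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Toy vocab purely for illustration; a real model would instead load its
# own tokenizer.json via Tokenizer.from_file(...) or Tokenizer.from_pretrained(...).
vocab = {"[UNK]": 0, "hello": 1, "world": 2}
tok = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()

ids = tok.encode("hello world").ids  # [1, 2]
```

The upside would be sidestepping the transformers inheritance hierarchy entirely; the downside is reimplementing whatever model-specific behavior the PreTrainedTokenizer subclasses carry.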
@mwrothbe Ok, let's get this merged before the release target later this week please.
Great work on a cool feature, appreciate your time!! If you won't be able to make changes this week, lmk and I can make them so we can push 2.0 with rerank.
Just looking at this now. Trying to figure out how to undo the cli commit...
Alright, well, I couldn't figure out how to undo the cli commit, so I just added a new commit to my fork that reverts the original CLI code. Looking at the PR code changes, that seems to have the same effect. Hope this works OK for you. If not, let me know.
Copied and pasted the cli from main 1.0.6 into my fork. Essentially I've used a screwdriver as a chisel.
ok, I'm making some quick changes after resolving conflicts
@mwrothbe what is
Ah. It's an optional input that forms part of the prompt instruction given to the model. The 'task' is handed to the format_instruction function in optimum_rr.py line 37 as 'instruction'. I probably should have kept the naming consistent there, but at the time it made sense to give the interface a more descriptive name (even if it's not). The default is "Given a search query, retrieve relevant passages that answer the query", and the idea behind making this an input is that you might want to tweak the instruction for performance or for model-specific requirements.
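To make the 'task' → 'instruction' hand-off concrete, here's a rough sketch; the function name and default string come from the comment above, but the prompt template itself is a hypothetical illustration, not the actual code in optimum_rr.py:

```python
from typing import Optional

DEFAULT_TASK = "Given a search query, retrieve relevant passages that answer the query"

def format_instruction(instruction: Optional[str], query: str, doc: str) -> str:
    """Build the reranker prompt. `instruction` is the API-level 'task' field.

    The template below is an assumption for illustration only.
    """
    if instruction is None:
        instruction = DEFAULT_TASK
    return f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}"

# The optional API 'task' arrives here as 'instruction'; None falls back to the default.
prompt = format_instruction(None, "what is rerank?", "Reranking reorders retrieved docs.")
```

The naming mismatch the comment mentions is just this: the request schema calls it 'task', the internal parameter is 'instruction'.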
Adds a '/v1/rerank' reranking service. The API can be used in RAG flows to refine document retrieval ahead of the LLM. Tested with Qwen3-Reranker, but it should support other models that can be used with optimum. A few things to note:
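For context, a request to the new endpoint might be shaped like the payload below; the field names are assumptions modeled on common rerank APIs (plus the optional 'task' discussed above), not the finalized schema of this PR:

```python
import json

# Hypothetical /v1/rerank request body; field names are illustrative assumptions.
payload = {
    "model": "Qwen/Qwen3-Reranker-0.6B",
    "query": "how do I undo a git commit?",
    "documents": [
        "git revert creates a new commit that undoes an earlier one.",
        "Screwdrivers are not recommended for use as chisels.",
    ],
    "task": "Given a search query, retrieve relevant passages that answer the query",
}
body = json.dumps(payload)  # what a client would POST to /v1/rerank
```

The service would be expected to return the documents with relevance scores, ordered for the RAG pipeline to consume.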