chore: Check that we are compatible with transformers v5 #10587
Conversation
…to specify text-generation no matter what type of model is being used.
julian-risch left a comment:
Looks quite good to me already! My main change request is to mention that the default model has been changed from "google/flan-t5-base" to "Qwen/Qwen3-0.6B". Other than that, just two small questions and a simplification with the walrus operator.
```python
    huggingface_pipeline_kwargs["model"], token=huggingface_pipeline_kwargs["token"]
).pipeline_tag  # type: ignore[assignment]  # we'll check below if task is in supported tasks

if task not in PIPELINE_SUPPORTED_TASKS:
```
We could add a check here for `task == "text2text-generation"` and trigger a DeprecationWarning. However, since the deprecation comes directly from huggingface transformers, it's not strictly necessary. Still, it could be helpful for Haystack users.
Same for HuggingFaceLocalGenerator
Technically, we'd also need to check the transformers version, since it's only deprecated for v5+. So I'm not entirely sure it's worth the effort.
I'll look into whether there is an easy way to make this check.
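For reference, a minimal sketch of what such a version-gated check might look like. The helper name and its placement in the component are hypothetical; the task string and the v5 cutoff come from the discussion above:

```python
import warnings

from packaging.version import Version
from transformers import __version__ as transformers_version


def warn_if_deprecated_task(task: str) -> None:
    # Hypothetical helper: "text2text-generation" was removed in
    # transformers v5, so only warn when running under v5+.
    if task == "text2text-generation" and Version(transformers_version).major >= 5:
        warnings.warn(
            "The 'text2text-generation' task was removed in transformers v5; "
            "use 'text-generation' instead.",
            DeprecationWarning,
        )
```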
```diff
  self._model_name_or_path = model_name_or_path
- self.tokenizer: AutoTokenizer | None = None
+ self.tokenizer: TokenizersBackend | SentencePieceBackend | None = None
```
Will instances of AutoTokenizer still work as expected or is there a breaking change that we should mention in the release notes?
Hmm, this is just a type change, so nothing should break, but it's possible imports of these types would fail if they weren't present before. I'll double-check locally.
You're right, the imports aren't present, so I think I'll go with ignoring the type issues from mypy so we can support both versions.
Updated the type to Any so mypy is satisfied and the code works in both major versions of transformers.
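For illustration, a minimal sketch of the version-agnostic annotation. The class skeleton here is hypothetical; only the tokenizer attribute is the point:

```python
from typing import Any


class HuggingFaceLocalGenerator:
    # Hypothetical skeleton: the concrete tokenizer class differs between
    # transformers v4 (AutoTokenizer) and v5 (TokenizersBackend /
    # SentencePieceBackend), so the attribute is typed as Any to satisfy
    # mypy under both major versions without importing either type.
    def __init__(self, model_name_or_path: str) -> None:
        self._model_name_or_path = model_name_or_path
        self.tokenizer: Any = None
```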
Related Issues
Proposed Changes:
- `hf_device_map` is not always present anymore; it seems to only be set when mixed device loading is actually configured. Updated the code to check if `hf_device_map` is present and, if not, fall back to using the regular `device` attribute (a minimal sketch of this fallback is included after the notes for the reviewer below).
- `text2text-generation` no longer exists as a valid HF pipeline. It seems we should always use `text-generation`.
- Updated `HuggingFaceLocalGenerator` to match the changes made in the Chat variant. Some change was needed since the default task was `text2text-generation`, which fails when using transformers v5.
- `text2text-generation` was removed as a valid HF pipeline in Delete Text2TextPipeline, SummarizationPipeline and TranslationPipeline huggingface/transformers#43256, even though it still exists in the code base. It is just not as straightforward to use anymore, so I'll list it as a deprecated option in our components for users on transformers v5+.

How did you test it?
Notes for the reviewer
All of these changes are compatible with both transformers v4 and v5 (v4 was tested locally). It is not straightforward to test both versions in our CI.
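To make the first proposed change concrete, here is a minimal sketch of the device fallback. The helper name and the surrounding component are hypothetical; the `hf_device_map` / `device` attributes are the ones described above:

```python
from typing import Any


def resolve_device(model: Any) -> Any:
    # Hypothetical helper: in transformers v5, hf_device_map is only set
    # when mixed-device loading is actually configured, so fall back to
    # the model's regular `device` attribute when it is absent or empty.
    device_map = getattr(model, "hf_device_map", None)
    if device_map:
        # hf_device_map maps module names to devices; pick any entry to
        # report where (part of) the model lives.
        return next(iter(device_map.values()))
    return model.device
```

And since `text2text-generation` is gone in v5, constructing the pipeline with `text-generation` works under both major versions (shown here with the new default model mentioned in the review):

```python
from transformers import pipeline

# "text-generation" is valid in both transformers v4 and v5;
# "text2text-generation" fails under v5.
generator = pipeline(task="text-generation", model="Qwen/Qwen3-0.6B")
```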
Checklist
The PR title follows one of the conventional commit types: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`, and added `!` in case the PR includes breaking changes.