Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Bug Report: Unexpected Keyword Argument 'padding_side' in PreTrainedTokenizerFast
Description:
I encountered a TypeError when calling a tokenizer loaded with AutoTokenizer.from_pretrained(tokenizer_name) after upgrading to Transformers version 4.42.0 and tokenizers version 0.19.1. The error message is as follows:
TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'padding_side'
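Code Snippet: the original snippet was not preserved in this report, so the following is a minimal sketch of the kind of call that triggers the error. The checkpoint name and input strings are placeholders, and the sketch assumes padding_side is forwarded as a call-time keyword argument, which matches the error message above.

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; the original report loads the tokenizer from a
# user-supplied tokenizer_name that is not preserved here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# On transformers 4.42.0 + tokenizers 0.19.1 the extra padding_side kwarg is
# forwarded to PreTrainedTokenizerFast._batch_encode_plus(), which raises:
#   TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an
#   unexpected keyword argument 'padding_side'
encoded = tokenizer(
    ["a short sentence", "a somewhat longer example sentence"],
    padding=True,
    truncation=True,
    return_token_type_ids=True,
    padding_side="left",  # call-time kwarg that the fast tokenizer rejects
)
```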
Observations:
- Before upgrading to Transformers version 4.42.0, this error did not occur.
- The issue arises with tokenizers version 0.19.1 and later.
- Setting use_fast=False when initializing the tokenizer resolves the error, indicating the issue is specific to the fast tokenizer implementation (see the workaround sketch below).
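As a sketch of possible workarounds: the slow-tokenizer fallback is taken from the observations above, while setting padding_side as an instance attribute is an assumption on my part and was not tested in the original report.

```python
from transformers import AutoTokenizer

# Workaround 1 (from the observations above): fall back to the slow tokenizer.
slow_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)

# Workaround 2 (assumption, not verified in the original report): set
# padding_side on the tokenizer instance instead of passing it per call, so
# nothing unexpected is forwarded to _batch_encode_plus().
fast_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
fast_tokenizer.padding_side = "left"
encoded = fast_tokenizer(
    ["a short sentence", "a somewhat longer example sentence"],
    padding=True,
    truncation=True,
    return_token_type_ids=True,
)
```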
Request:
Please investigate this issue, as it seems to be a regression related to the handling of the padding_side argument in the fast tokenizer. Any guidance or fix would be appreciated.
Expected behavior
The AutoTokenizer should correctly handle the padding_side argument without raising a TypeError when using the fast tokenizer implementation. Specifically, when calling the tokenizer with parameters such as padding, truncation, and return_token_type_ids, it should process the input text as expected, regardless of the version of the tokenizers library being used. This behavior should be consistent across versions, ensuring backward compatibility and seamless upgrades.
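As an illustration only (same placeholder checkpoint as in the sketch above), the expected behavior is that the call below succeeds on both the fast and slow implementations and returns the usual encoding fields:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder checkpoint
encoded = tokenizer(
    ["a short sentence", "a somewhat longer example sentence"],
    padding=True,
    truncation=True,
    return_token_type_ids=True,
    padding_side="left",  # should be accepted rather than raising a TypeError
)
# The returned BatchEncoding should expose the usual fields.
assert {"input_ids", "token_type_ids", "attention_mask"} <= set(encoded.keys())
```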
System Info
absl-py==2.1.0
accelerate==1.6.0
aiohappyeyeballs==2.3.5
aiohttp==3.10.2
aiosignal==1.3.1
aniso8601==9.0.1
annotated-types==0.7.0
anyio==4.4.0
async-timeout==4.0.3
attrs==24.2.0
blinker==1.8.2
certifi==2024.7.4
charset-normalizer==3.3.2
chinesebert==0.2.1
click==8.1.7
confluent-kafka==2.5.0
datasets==1.18.3
dill==0.3.8
elastic-transport==8.15.0
elasticsearch==8.14.0
exceptiongroup==1.2.2
fastapi==0.112.0
fastcore==1.3.29
filelock==3.15.4
Flask==3.0.3
Flask-Cors==4.0.1
Flask-RESTful==0.3.10
frozenlist==1.4.1
fsspec==2024.6.1
gevent==24.10.3
greenlet==3.1.1
grpcio==1.65.4
gunicorn==23.0.0
h11==0.14.0
huggingface-hub==0.24.5
idna==3.7
importlib_metadata==8.2.0
iocextract==1.16.1
itsdangerous==2.2.0
Jinja2==3.1.4
joblib==1.4.2
kazoo==2.5.0
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
networkx==3.2.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.20
nvidia-nvtx-cu12==12.1.105
packaging==24.1
pandas==2.2.2
protobuf==4.25.4
psutil==7.0.0
pyarrow==17.0.0
pydantic==2.8.2
pydantic_core==2.20.1
Pygments==2.18.0
pykafka==2.8.0
PyMySQL==1.1.1
pypinyin==0.38.1
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.2
regex==2024.7.24
requests==2.32.3
requests-file==2.1.0
rich==13.7.1
sacremoses==0.1.1
safetensors==0.5.3
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
starlette==0.37.2
sympy==1.13.1
tabulate==0.9.0
tensorboard==2.17.0
tensorboard-data-server==0.7.2
tldextract==5.1.2
tokenizers==0.19.1
torch==2.4.0
tqdm==4.66.5
transformers==4.42.0
triton==3.0.0
typer==0.12.3
typing_extensions==4.12.2
tzdata==2024.1
urllib3==2.2.2
uvicorn==0.30.5
Werkzeug==3.0.3
xxhash==3.4.1
yarl==1.9.4
zipp==3.19.2
zope.event==5.0
zope.interface==7.1.1
Who can help?
No response