
Bug Report: Unexpected Keyword Argument 'padding_side' in PreTrainedTokenizerFast #37989


Open
1 of 4 tasks
yunqianluo opened this issue May 7, 2025 · 1 comment

@yunqianluo

System Info

absl-py==2.1.0
accelerate==1.6.0
aiohappyeyeballs==2.3.5
aiohttp==3.10.2
aiosignal==1.3.1
aniso8601==9.0.1
annotated-types==0.7.0
anyio==4.4.0
async-timeout==4.0.3
attrs==24.2.0
blinker==1.8.2
certifi==2024.7.4
charset-normalizer==3.3.2
chinesebert==0.2.1
click==8.1.7
confluent-kafka==2.5.0
datasets==1.18.3
dill==0.3.8
elastic-transport==8.15.0
elasticsearch==8.14.0
exceptiongroup==1.2.2
fastapi==0.112.0
fastcore==1.3.29
filelock==3.15.4
Flask==3.0.3
Flask-Cors==4.0.1
Flask-RESTful==0.3.10
frozenlist==1.4.1
fsspec==2024.6.1
gevent==24.10.3
greenlet==3.1.1
grpcio==1.65.4
gunicorn==23.0.0
h11==0.14.0
huggingface-hub==0.24.5
idna==3.7
importlib_metadata==8.2.0
iocextract==1.16.1
itsdangerous==2.2.0
Jinja2==3.1.4
joblib==1.4.2
kazoo==2.5.0
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
networkx==3.2.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.20
nvidia-nvtx-cu12==12.1.105
packaging==24.1
pandas==2.2.2
protobuf==4.25.4
psutil==7.0.0
pyarrow==17.0.0
pydantic==2.8.2
pydantic_core==2.20.1
Pygments==2.18.0
pykafka==2.8.0
PyMySQL==1.1.1
pypinyin==0.38.1
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.2
regex==2024.7.24
requests==2.32.3
requests-file==2.1.0
rich==13.7.1
sacremoses==0.1.1
safetensors==0.5.3
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
starlette==0.37.2
sympy==1.13.1
tabulate==0.9.0
tensorboard==2.17.0
tensorboard-data-server==0.7.2
tldextract==5.1.2
tokenizers==0.19.1
torch==2.4.0
tqdm==4.66.5
transformers==4.42.0
triton==3.0.0
typer==0.12.3
typing_extensions==4.12.2
tzdata==2024.1
urllib3==2.2.2
uvicorn==0.30.5
Werkzeug==3.0.3
xxhash==3.4.1
yarl==1.9.4
zipp==3.19.2
zope.event==5.0
zope.interface==7.1.1

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction



Description:

I encountered a TypeError when calling a tokenizer loaded with AutoTokenizer.from_pretrained(tokenizer_name) after upgrading to transformers 4.42.0 and tokenizers 0.19.1. The error message is:

TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'padding_side'
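As a minimal illustration of how this class of TypeError arises (this is a standalone sketch, not transformers internals): a subclass overrides an internal method with a fixed signature written before a new keyword existed, and the base class later starts forwarding that keyword.

```python
# Sketch only: mimics a caller forwarding a new kwarg into an override
# whose signature predates it. Class and method names are hypothetical.
class Base:
    def encode(self, text, **kwargs):
        # Forwards everything it receives, including newly added kwargs.
        return self._batch_encode(text, **kwargs)

class CustomFast(Base):
    # Signature written before 'padding_side' existed -- no **kwargs catch-all.
    def _batch_encode(self, text, padding=False, truncation=True):
        return list(text)

tok = CustomFast()
print(tok.encode("abc"))  # ['a', 'b', 'c']
try:
    tok.encode("abc", padding_side="left")
except TypeError as err:
    print(err)  # ... got an unexpected keyword argument 'padding_side'
```

A `**kwargs` catch-all in the override (or the caller not forwarding unknown keywords) avoids the crash.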

Details:

  • Code Snippet:

    from transformers import AutoTokenizer

    # tokenizer_name, text_column_name, and args.cutoff_len come from the
    # surrounding training script (placeholders here).
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

    def tokenize_and_align_train_labels(examples):
        tokenized_inputs = tokenizer(
            examples[text_column_name],
            max_length=args.cutoff_len,
            padding=False,
            truncation=True,
            return_token_type_ids=False,
        )
        return tokenized_inputs
  • Observations:

    • Before upgrading to Transformers version 4.42.0, this error did not occur.
    • The issue arises with tokenizers version 0.19.1 and later.
    • Setting use_fast=False when initializing the tokenizer resolves the error, indicating the issue is specific to the fast tokenizer implementation.
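Until the mismatch is fixed upstream, one defensive pattern for this kind of version skew (a generic sketch, not transformers API) is to drop keyword arguments the callee does not accept before forwarding:

```python
import inspect

def call_with_supported_kwargs(fn, *args, **kwargs):
    """Drop kwargs the callable does not accept; pass all through if it
    takes **kwargs. Hypothetical shim for caller/callee version skew."""
    params = inspect.signature(fn).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(*args, **kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return fn(*args, **supported)

# Stand-in for a tokenizer method whose signature lacks 'padding_side'.
def encode(text, padding=False, truncation=True):
    return {"text": text, "padding": padding}

result = call_with_supported_kwargs(encode, "hi", padding=True, padding_side="left")
print(result)  # {'text': 'hi', 'padding': True}
```

Silently dropping a kwarg changes behavior, so this is a stopgap, not a substitute for accepting padding_side properly.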

Request:

Please investigate this issue, as it seems to be a regression related to the handling of the padding_side argument in the fast tokenizer. Any guidance or fix would be appreciated.


Expected behavior

Calling a fast tokenizer with parameters such as padding, truncation, and return_token_type_ids should process the input text without raising a TypeError about padding_side, regardless of the installed tokenizers version. Behavior should be consistent across versions so that upgrades remain backward compatible.
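Since this error depends on the installed transformers/tokenizers combination, a small helper like the following (a hypothetical sketch using the standard library) can print the relevant versions when filing or triaging such reports:

```python
from importlib.metadata import PackageNotFoundError, version

def report_versions(*packages):
    """Return 'pkg==version' lines for the given installed distributions."""
    lines = []
    for pkg in packages:
        try:
            lines.append(f"{pkg}=={version(pkg)}")
        except PackageNotFoundError:
            lines.append(f"{pkg} (not installed)")
    return lines

for line in report_versions("transformers", "tokenizers"):
    print(line)
```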

@yunqianluo yunqianluo added the bug label May 7, 2025
@Rocketknight1
Member

Hi @yunqianluo, this might be specific to the tokenizer class you're using. Can you give us some code we can run that will show the issue?

cc @itazap
