Describe the bug
The HookedTransformer tokenizer has padding_side set to "right" for Gemma 2 2B, while the Hugging Face AutoTokenizer for the same model has padding_side set to "left". I'm not sure why these are inconsistent.
Code example
from transformer_lens import HookedTransformer
from transformers import AutoTokenizer
model = HookedTransformer.from_pretrained('google/gemma-2-2b')
tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-2b')
print(model.tokenizer.padding_side)
print(tokenizer.padding_side)
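For context on why the discrepancy matters, the two settings place pad tokens on opposite ends of a batch. A toy sketch of the difference, using a hypothetical pad_ids helper rather than the real tokenizers:

```python
# Toy illustration of left vs right padding.
# pad_ids is a hypothetical helper, not part of transformers or transformer_lens.
def pad_ids(ids, length, pad_id=0, side="right"):
    pad = [pad_id] * (length - len(ids))
    return ids + pad if side == "right" else pad + ids

batch = [[5, 6, 7], [8, 9]]
print([pad_ids(x, 3, side="right") for x in batch])  # [[5, 6, 7], [8, 9, 0]]
print([pad_ids(x, 3, side="left") for x in batch])   # [[5, 6, 7], [0, 8, 9]]
```

Code that assumes one convention (e.g. taking the last position's logits for generation) will silently read pad positions under the other, so the two loaders disagreeing is easy to trip over.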
Output:
right
left
System Info
Linux system: installed using pip in a Python 3.10.12 virtualenv. Package versions are:
- transformer_lens: 2.9.0
- transformers: 4.46.1
Checklist