Open
Labels: bug (Something isn't working)
Description
Describe the issue as clearly as possible:
PR #90 fixed an error affecting the Salamandra and OpenCoder tokenizers. The same error has now resurfaced, and the fix from that PR no longer appears anywhere in the code base. Was it replaced by something else?
Steps/code to reproduce the bug:
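from outlines_core.fsm.regex import reduced_vocabulary
from outlines.models.vllm import adapt_tokenizer
from transformers import AutoTokenizer
# Load the Salamandra tokenizer and adapt it for use with Outlines' vLLM integration
tokenizer = AutoTokenizer.from_pretrained("BSC-LT/salamandra-2b")
tokenizer = adapt_tokenizer(tokenizer)
# This call raises the RuntimeError shown under "Error message"
vocabulary = reduced_vocabulary(tokenizer)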
Expected result:
No error message.
Error message:
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/alex-admin/euroeval/.venv/bin/euroeval", line 8, in <module>
[rank0]: sys.exit(benchmark())
[rank0]: ^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
[rank0]: return self.main(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/click/core.py", line 1082, in main
[rank0]: rv = self.invoke(ctx)
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
[rank0]: return ctx.invoke(self.callback, **ctx.params)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/click/core.py", line 788, in invoke
[rank0]: return __callback(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/euroeval/cli.py", line 277, in benchmark
[rank0]: benchmarker.benchmark(model=models)
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/euroeval/benchmarker.py", line 461, in benchmark
[rank0]: benchmark_output_or_err = self._benchmark_single(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/euroeval/benchmarker.py", line 768, in _benchmark_single
[rank0]: scores = generate(
[rank0]: ^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/euroeval/generation.py", line 84, in generate
[rank0]: test_scores = generate_single_iteration(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/euroeval/generation.py", line 163, in generate_single_iteration
[rank0]: model_output = model.generate(inputs=batch)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/euroeval/benchmark_modules/vllm.py", line 361, in generate
[rank0]: logits_processor = JSONLogitsProcessor(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines/processors/structured.py", line 187, in __init__
[rank0]: super().__init__(regex_string=regex_string, tokenizer=tokenizer)
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines/processors/structured.py", line 151, in __init__
[rank0]: guide = RegexGuide.from_regex(regex_string, tokenizer)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines/fsm/guide.py", line 92, in from_regex
[rank0]: return super().from_regex(
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines_core/fsm/guide.py", line 212, in from_regex
[rank0]: ) = _create_states_mapping(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines/fsm/guide.py", line 76, in cached_create_states_mapping
[rank0]: return uncached_create_states_mapping(regex_string, tokenizer, *args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines_core/fsm/guide.py", line 141, in create_states_mapping
[rank0]: return create_states_mapping_from_fsm(regex_fsm, tokenizer, frozen_tokens)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines_core/fsm/guide.py", line 178, in create_states_mapping_from_fsm
[rank0]: states_to_token_maps, empty_token_ids = create_fsm_index_tokenizer(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines_core/fsm/regex.py", line 473, in create_fsm_index_tokenizer
[rank0]: tokens_to_token_ids, empty_token_ids = reduced_vocabulary(tokenizer)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines_core/fsm/regex.py", line 426, in reduced_vocabulary
[rank0]: raise RuntimeError(
[rank0]: RuntimeError: Cannot convert token `�?` (217017) to bytes: �?
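To help narrow down which vocabulary entries could trigger this error, here is a minimal diagnostic sketch. It assumes the problematic tokens are those whose text contains the Unicode replacement character (U+FFFD), as the token in the error message does. The helper name `find_replacement_char_tokens` is hypothetical and not part of Outlines or outlines_core, and the toy vocabulary is made up for illustration.

```python
from typing import Dict, List, Tuple

# U+FFFD is the Unicode replacement character that appears in the failing token.
REPLACEMENT_CHAR = "\ufffd"


def find_replacement_char_tokens(vocab: Dict[str, int]) -> List[Tuple[int, str]]:
    """Return (token_id, token) pairs whose token text contains U+FFFD, sorted by id.

    Hypothetical helper for diagnosis only; it does not reproduce the library's
    own conversion logic.
    """
    hits = [
        (token_id, token)
        for token, token_id in vocab.items()
        if REPLACEMENT_CHAR in token
    ]
    return sorted(hits)


# Toy vocabulary (made up) mimicking the shape of a token -> id mapping.
toy_vocab = {"hello": 0, "<0x0A>": 5, "ab\ufffd": 7, "\ufffd?": 217017}
for token_id, token in find_replacement_char_tokens(toy_vocab):
    print(token_id, repr(token))
```

With a real tokenizer, passing the mapping returned by the transformers method `tokenizer.get_vocab()` to this helper should show whether token 217017 from the error above is among the candidates.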
Outlines/Python version information:
Outlines version: 0.2.3
Python version: 3.12.3
Packages installed:
absl-py==2.2.2
accelerate==1.4.0
aiofiles==23.2.1
aiohappyeyeballs==2.4.8
aiohttp==3.11.13
aiosignal==1.3.2
airportsdata==20250224
annotated-types==0.7.0
anyio==4.8.0
astor==0.8.1
attrs==25.1.0
bert-score==0.3.13
bitsandbytes==0.45.3
blake3==1.0.4
cachetools==5.5.2
certifi==2025.1.31
charset-normalizer==3.4.1
chex==0.1.89
click==8.1.8
cloudpickle==3.1.1
compressed-tensors==0.9.3
contourpy==1.3.1
cupy-cuda12x==13.4.1
cycler==0.12.1
datasets==3.5.0
demjson3==3.0.6
Deprecated==1.2.18
depyf==0.18.0
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
dnspython==2.7.0
einops==0.8.1
email_validator==2.2.0
etils==1.12.2
EuroEval @ git+https://github.com/EuroEval/EuroEval@6db11af4ee14cb832065f312d78266b5c1b46a26
evaluate==0.4.3
fastapi==0.115.11
fastapi-cli==0.0.7
fastrlock==0.8.3
fbgemm_gpu==1.1.0
ffmpy==0.5.0
filelock==3.17.0
flash_attn==2.7.4.post1
flax==0.10.4
fonttools==4.56.0
frozenlist==1.5.0
fsspec==2024.12.0
genson==1.3.0
gguf==0.16.0
googleapis-common-protos==1.70.0
gradio==5.20.0
gradio_client==1.7.2
groovy==0.1.2
grpcio==1.71.0
h11==0.14.0
hf-xet==1.0.2
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.30.1
humanize==4.12.2
idna==3.10
importlib_metadata==8.0.0
importlib_resources==6.5.2
interegular==0.3.3
iso3166==2.1.1
jax==0.5.3
jaxlib==0.5.3
Jinja2==3.1.5
jiter==0.8.2
joblib==1.4.2
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
kiwisolver==1.4.8
lark==1.2.2
Levenshtein==0.27.1
litellm==1.65.1
llguidance==0.7.11
llvmlite==0.44.0
lm-format-enforcer==0.10.11
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.10.1
mdurl==0.1.2
mistral_common==1.5.4
ml_dtypes==0.5.1
more-itertools==10.6.0
mpmath==1.3.0
msgpack==1.1.0
msgspec==0.19.0
multidict==6.1.0
multiprocess==0.70.16
nanobind==2.6.1
nest-asyncio==1.6.0
networkx==3.4.2
ninja==1.11.1.4
nltk==3.9.1
numba==0.61.2
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-cusparselt-cu12==0.6.2
nvidia-ml-py==12.570.86
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
ollama==0.4.7
openai==1.70.0
opencv-python-headless==4.11.0.86
opentelemetry-api==1.26.0
opentelemetry-exporter-otlp==1.26.0
opentelemetry-exporter-otlp-proto-common==1.26.0
opentelemetry-exporter-otlp-proto-grpc==1.26.0
opentelemetry-exporter-otlp-proto-http==1.26.0
opentelemetry-proto==1.26.0
opentelemetry-sdk==1.26.0
opentelemetry-semantic-conventions==0.47b0
opentelemetry-semantic-conventions-ai==0.4.3
opt_einsum==3.4.0
optax==0.2.4
orbax-checkpoint==0.11.10
orjson==3.10.15
outlines==0.2.3
outlines_core==0.1.26
packaging==24.2
pandas==2.2.3
partial-json-parser==0.2.1.1.post5
peft==0.15.0
pillow==11.1.0
prometheus-fastapi-instrumentator==7.0.2
prometheus_client==0.21.1
propcache==0.3.0
protobuf==3.20.3
psutil==7.0.0
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==19.0.1
pycountry==24.6.1
pydantic==2.10.6
pydantic_core==2.27.2
pydub==0.25.1
Pygments==2.19.1
pyinfer==0.0.3
pyparsing==3.2.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-json-logger==3.3.0
python-multipart==0.0.20
pytz==2025.1
PyYAML==6.0.2
pyzmq==26.2.1
RapidFuzz==3.12.2
ray==2.43.0
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rich-toolkit==0.13.2
rouge_score==0.1.2
rpds-py==0.23.1
ruff==0.9.9
sacremoses==0.1.1
safehttpx==0.1.6
safetensors==0.5.3
ScandEval==14.0.0
scikit-learn==1.5.2
scipy==1.15.2
semantic-version==2.10.0
sentencepiece==0.2.0
seqeval==1.2.2
setuptools==75.8.2
shellingham==1.5.4
simplejson==3.20.1
six==1.17.0
sniffio==1.3.1
starlette==0.46.0
sympy==1.13.1
tabulate==0.9.0
tenacity==9.0.0
tensorstore==0.1.72
termcolor==2.5.0
threadpoolctl==3.5.0
tiktoken==0.9.0
tokenizers==0.21.1
tomlkit==0.13.2
toolz==1.0.0
torch==2.6.0
torchaudio==2.6.0
torchvision==0.21.0
tqdm==4.67.1
transformers==4.51.3
treescope==0.1.9
triton==3.2.0
typer==0.15.2
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
uvicorn==0.34.0
uvloop==0.21.0
vllm==0.8.5.post1
watchfiles==1.0.4
websockets==15.0
wheel==0.45.1
wrapt==1.17.2
xformers==0.0.29.post2
xgrammar==0.1.18
xxhash==3.5.0
yarl==1.18.3
zipp==3.21.0
Context for the issue:
No response