ValueError bug in `greedy_until` "too many values to unpack" #628

danielfleischer · 2023-06-29T13:05:24Z

Some tokenizers (Llama, Alpaca) return more than one token for "\n" even with add_special_tokens=False. This raises a ValueError ("too many values to unpack") in:

Line 420 in 72b7f0c

(primary_until,) = self.tok_encode(until[0])

One can replace it with

primary_until = self.tok_encode(until[0])[0]
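To make the failure concrete, here is a minimal sketch that reproduces the unpacking error without loading a real model. The `tok_encode` function below is a hypothetical stand-in for `self.tok_encode`, and the token ids are illustrative rather than taken from an actual tokenizer run.

```python
def tok_encode(text):
    # Hypothetical stand-in for self.tok_encode: pretend the tokenizer
    # splits "\n" into two ids (illustrative values, an assumption).
    fake_encodings = {"\n": [29871, 13], ".": [29889]}
    return fake_encodings[text]


until = ["\n"]

# Original pattern: single-element tuple unpacking requires exactly one token.
try:
    (primary_until,) = tok_encode(until[0])
    unpack_error = None
except ValueError as err:
    unpack_error = str(err)

print("unpacking failed with:", unpack_error)

# Suggested replacement: index the first token instead of unpacking.
primary_until = tok_encode(until[0])[0]
print("primary_until =", primary_until)
```

Indexing with `[0]` avoids the exception but keeps only the first id of the stop sequence, which may not be the token that actually represents the newline, so generation could stop earlier or later than intended.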


haileyschoelkopf · 2023-06-29T13:56:09Z

Thanks for raising this, and apologies for the bug!

In our upcoming version release, we handle multi-token stop sequences in a more principled+unified way (see here).

I've patched this and added a warning for the hf-causal model in #630, and confirmed it doesn't crop up in the hf-causal-experimental case.
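For readers who want to apply a similar guard locally, the following is a rough sketch of "take the first token, but warn when the stop sequence is multi-token". It is not the code from the patch; `first_stop_token` is a hypothetical helper name and the token ids in the usage lines are illustrative.

```python
import warnings


def first_stop_token(token_ids, stop_text):
    """Hypothetical helper: return the first token id of a stop sequence,
    warning when the sequence spans more than one token."""
    if not token_ids:
        raise ValueError(f"stop sequence {stop_text!r} encoded to no tokens")
    if len(token_ids) > 1:
        warnings.warn(
            f"stop sequence {stop_text!r} encoded to {len(token_ids)} tokens; "
            "only the first one will be used as the stop token"
        )
    return token_ids[0]


# Usage with illustrative ids: a two-token encoding triggers the warning,
# a single-token encoding does not.
print(first_stop_token([29871, 13], "\n"))
print(first_stop_token([13], "\n"))
```

The warning makes the truncation visible instead of silently changing where generation stops.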

haileyschoelkopf mentioned this issue Jun 29, 2023

Add error handling for multi-token stopseq and hf-causal model type #630

Merged

haileyschoelkopf closed this as completed in #630 Jun 29, 2023
