v1.3.4
codec_bpe.extend_tokenizer now accepts an arbitrary list of tokens to append to the existing tokenizer's additional special tokens list via the --additional_special_tokens argument.
For example:
python -m codec_bpe.extend_tokenizer \
--existing_tokenizer mistralai/Mistral-7B-v0.1 \
--codec_bpe_tokenizer output/encodec_bpe_4cb_30k \
--additional_special_tokens "<audio>" "</audio>" # optional
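
The following is a minimal sketch of how a multi-value flag like this can be parsed and appended to an existing special-token list. It is not the codec_bpe implementation: the parser, the `merge_special_tokens` helper, the skip-duplicates behavior, and the starting token list are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the flags shown in the command above.
    parser = argparse.ArgumentParser(prog="extend_tokenizer_sketch")
    parser.add_argument("--existing_tokenizer", required=True)
    parser.add_argument("--codec_bpe_tokenizer", required=True)
    # nargs="*" collects zero or more values, so the flag stays optional.
    parser.add_argument("--additional_special_tokens", nargs="*", default=[])
    return parser


def merge_special_tokens(existing: list[str], new: list[str]) -> list[str]:
    # Append new tokens after the existing ones, keeping order.
    # Skipping duplicates is an assumption, not confirmed library behavior.
    merged = list(existing)
    for token in new:
        if token not in merged:
            merged.append(token)
    return merged


args = build_parser().parse_args([
    "--existing_tokenizer", "mistralai/Mistral-7B-v0.1",
    "--codec_bpe_tokenizer", "output/encodec_bpe_4cb_30k",
    "--additional_special_tokens", "<audio>", "</audio>",
])

# Illustrative starting list; a real tokenizer's list may differ.
existing_special_tokens = ["<s>", "</s>"]
merged_tokens = merge_special_tokens(
    existing_special_tokens, args.additional_special_tokens
)
print(merged_tokens)
```

Omitting the flag in this sketch leaves the existing list unchanged, which matches the "optional" note in the command above.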