```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
tokenizer.encode(tokenizer.eos_token)  # [64790, 64792, 2893, 30917, 30994]
tokenizer.eos_token_id                 # 2
```
XTuner uses `tokenizer.encode(tokenizer.eos_token)` instead of `tokenizer.eos_token_id` when processing data. As shown above, encoding the EOS *string* re-tokenizes it into several ordinary ids rather than producing the single id 2, so the fine-tuned ChatGLM2 never learns to emit EOS and cannot stop generating.
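To make the mismatch concrete without downloading the model, here is a minimal sketch using a stub tokenizer (`StubTokenizer` is hypothetical; it only mimics the ids reported above, it is not the real ChatGLM2 tokenizer):

```python
# Stub mimicking the reported ChatGLM2 behavior: encode() re-tokenizes
# the EOS *string* into several ids instead of returning eos_token_id.
class StubTokenizer:
    eos_token = "</s>"
    eos_token_id = 2

    def encode(self, text):
        # Mimics the observed output: two special prefix ids, then the
        # literal characters of "</s>" as ordinary text tokens.
        if text == self.eos_token:
            return [64790, 64792, 2893, 30917, 30994]
        raise NotImplementedError

tokenizer = StubTokenizer()

# Buggy pattern: id 2 never appears in the training labels, so the
# model is never trained to emit EOS.
buggy_suffix = tokenizer.encode(tokenizer.eos_token)
assert tokenizer.eos_token_id not in buggy_suffix

# Fixed pattern: append the id directly when building labels.
fixed_suffix = [tokenizer.eos_token_id]
print(buggy_suffix, fixed_suffix)
```

With the fixed pattern, generation stops once the model emits id 2, because `model.generate` compares sampled ids against `eos_token_id`, not against the token's string form.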