Software environment
- paddlepaddle:2.4.0
- paddlepaddle-gpu: 2.4.0
- paddlenlp: 2.5.2

Duplicate issues
- I have searched the existing issues
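Bug description
The special tokens of the GPTChinese tokenizer and model do not match: tokenizer.bos_token_id falls outside the vocabulary range.

Steps & code to reproduce reliably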

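# Minimal reproduction with paddlenlp 2.5.2 (unused imports removed)
from paddlenlp.transformers import GPTChineseTokenizer, GPTLMHeadModel

tokenizer = GPTChineseTokenizer.from_pretrained('gpt-cpm-small-cn-distill')
model = GPTLMHeadModel.from_pretrained('gpt-cpm-small-cn-distill')

# Vocabulary size of the tokenizer; valid token ids are 0 .. vocab_size - 1
print(tokenizer.vocab_size)
# Special token ids as defined by the tokenizer
print(tokenizer.bos_token_id, tokenizer.eos_token_id)
# Special token ids as defined by the model; these should match the tokenizer's
print(model.bos_token_id, model.eos_token_id)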
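The two symptoms described here (a special token id outside the vocabulary, and tokenizer/model ids that disagree) can be checked mechanically. Below is a minimal, library-independent sketch of such a check. The helper names `out_of_range_special_ids` and `mismatched_ids` are hypothetical and not part of paddlenlp, and the numbers in the usage lines are made-up placeholders, not the actual values printed for `gpt-cpm-small-cn-distill`.

```python
from typing import Dict, List, Optional


def out_of_range_special_ids(
    vocab_size: int, special_ids: Dict[str, Optional[int]]
) -> List[str]:
    """Return names of special tokens whose id is outside [0, vocab_size)."""
    bad = []
    for name, token_id in special_ids.items():
        # An unset id (None) is skipped; it is not an out-of-range error by itself.
        if token_id is None:
            continue
        if not 0 <= token_id < vocab_size:
            bad.append(name)
    return bad


def mismatched_ids(
    tokenizer_ids: Dict[str, Optional[int]], model_ids: Dict[str, Optional[int]]
) -> List[str]:
    """Return (sorted) names present in both dicts whose ids differ."""
    shared = tokenizer_ids.keys() & model_ids.keys()
    return sorted(name for name in shared if tokenizer_ids[name] != model_ids[name])


# Hypothetical placeholder values for illustration only; substitute the numbers
# printed by the reproduction script.
tok = {"bos_token_id": 100, "eos_token_id": 7}
mdl = {"bos_token_id": 1, "eos_token_id": 7}
print(out_of_range_special_ids(100, tok))  # ['bos_token_id']
print(mismatched_ids(tok, mdl))  # ['bos_token_id']
```

With paddlenlp, the inputs would presumably come from `tokenizer.vocab_size` and the `bos_token_id`/`eos_token_id` attributes already printed in the reproduction code.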