Suspected error when fine-tuning the model: The OrderedVocab you are attempting to save contains a hole for index 12084, your vocabulary could be corrupted ! #25

Open
zjcjason opened this issue Jun 3, 2023 · 2 comments


zjcjason commented Jun 3, 2023

I looked into this, and the problem may be in uie_base_pytorch/vocab.txt. However, I have not been able to fix it and would appreciate any guidance!

@LiShaoyu5

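This is probably an issue with the ERNIE tokenizer. I recently got the same warning when using ernie-3.0, and when I checked, the tokenizer really is missing one entry (neither tokenizer.json nor vocab.txt contains a token for index 12084). That said, it should not affect the results.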
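For anyone who wants to repeat that check, here is a minimal sketch that lists the indices no token maps to. The helper name `find_vocab_holes` and the toy mapping are made up for illustration; it assumes you already have the token-to-id mapping as a plain dict.

```python
from typing import Dict, List


def find_vocab_holes(vocab: Dict[str, int]) -> List[int]:
    """Return every index in 0..max(id) that no token maps to."""
    if not vocab:
        return []
    used = set(vocab.values())
    return [i for i in range(max(used) + 1) if i not in used]


# Toy mapping (hypothetical data): index 2 is never assigned.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "你": 3, "好": 4}
print(find_vocab_holes(toy_vocab))  # prints [2]
```

With a real tokenizer, the mapping could probably be taken from `tokenizer.get_vocab()` in transformers, or from the `model.vocab` entry of tokenizer.json for a WordPiece tokenizer. I have not verified either against the uie_base_pytorch files.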

@fatty-tiger

Check whether the original vocabulary contains duplicate characters.
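A rough sketch of that duplicate check, assuming vocab.txt holds one token per line and the line number is the token id. The function name `find_duplicate_tokens` and the sample lines are hypothetical. If a token appears twice, a loader that builds a token-to-id dict can only keep one of the two ids, which would plausibly leave the other id as a hole.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_duplicate_tokens(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each repeated token to the list of line indices where it appears."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for index, line in enumerate(lines):
        # Strip only the line ending so whitespace-only tokens stay distinct.
        positions[line.rstrip("\r\n")].append(index)
    return {tok: idxs for tok, idxs in positions.items() if len(idxs) > 1}


# Toy vocabulary (hypothetical data): "好" appears on lines 2 and 4.
sample_lines = ["[PAD]\n", "[UNK]\n", "好\n", "你\n", "好\n"]
print(find_duplicate_tokens(sample_lines))  # prints {'好': [2, 4]}
```

To run it on the real file, pass an open file handle, for example `open("uie_base_pytorch/vocab.txt", encoding="utf-8")`.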


3 participants