[update] tiktoken base upgrade #191

Closed
ohotto opened this issue May 12, 2024 · 2 comments
Labels
version up Pro version features of the devolution of the open source version


ohotto commented May 12, 2024

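I integrated three relay API providers.

In the admin panel I set each price to the corresponding RMB price × 10.

However, in actual use the token count that chatnio computes is far larger than the actual value shown in the relay API dashboards, roughly 1.5 to 3.2 times as much.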

[two screenshots]

For example, in the case above the model was gpt-3.5-turbo-0125, and the relay API's prices are input: 0.0005/ktokens | output: 0.0015/ktokens

With 16 input tokens and 328 output tokens: 16/1000*0.0005+328/1000*0.0015=0.0005
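The relay-side cost above is a per-1,000-token price applied separately to the input and output token counts. A minimal sketch of that arithmetic in Python, where `request_cost` is a hypothetical helper (not part of chatnio) and the token counts are the 16 input / 328 output tokens from the formula:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_k: float, output_price_per_k: float) -> float:
    """Cost of one request when prices are quoted per 1,000 tokens."""
    # Hypothetical helper: each side is billed as tokens / 1000 * price-per-1k.
    return (input_tokens / 1000 * input_price_per_k
            + output_tokens / 1000 * output_price_per_k)


# Relay API prices for gpt-3.5-turbo-0125 from the report.
relay_cost = request_cost(16, 328, 0.0005, 0.0015)
print(round(relay_cost, 6))  # 0.0005

# The same token counts at chatnio's configured (x10) prices.
expected_points = request_cost(16, 328, 0.005, 0.015)
print(round(expected_points, 6))  # 0.005
```

If chatnio counted the same tokens as the relay, it would charge about 0.005 points for this request.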

The prices configured in the chatnio admin panel are input: 0.005/ktokens | output: 0.015/ktokens, yet the frontend reports 0.014145 points consumed.

0.014145/10 = 0.0014145 >> 0.0005

0.0014145 / 0.0005 = 2.829×

In other words, the token consumption chatnio computes is 2.829 times the actual consumption.
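Because the chatnio prices were set at 10× the relay prices, dividing the points charged by 10 converts them back to the relay's units, and the ratio to the relay cost is the over-count factor. A small sketch, with `overcount_ratio` as a hypothetical name:

```python
def overcount_ratio(points_charged: float, price_multiplier: float,
                    actual_cost: float) -> float:
    """How many times larger chatnio's charge is than the relay's actual cost."""
    # Undo the x10 markup configured in the admin panel, then compare.
    return points_charged / price_multiplier / actual_cost


ratio = overcount_ratio(0.014145, 10, 0.0005)
print(round(ratio, 3))  # 2.829
```

Since cost is linear in token counts, this factor equals the ratio of billed to actual tokens only if input and output are inflated by the same amount; otherwise it is a price-weighted blend of the two.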

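After repeated testing, the problem occurs with all of the GPT-3.5 and GPT-4 series models. The multiplier also differs from one calculation to the next: the lowest observed was 1.5× the actual consumption, the highest about 3.2×, and most cases fall between 2.5× and 2.9×. The multipliers from the most recent runs were: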
2.57, 2.71, 2.92, 2.51, 2.89, 2.77, 2.98, 2.72
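To summarize those recent runs, a short Python sketch that computes their range and mean (the data are exactly the eight reported values; nothing else is assumed):

```python
from statistics import mean

# Multipliers from the most recent runs listed in the report.
ratios = [2.57, 2.71, 2.92, 2.51, 2.89, 2.77, 2.98, 2.72]

print(f"min={min(ratios)} max={max(ratios)} mean={mean(ratios):.3f}")
# min=2.51 max=2.98 mean=2.759
```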

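The deployment runs on ubuntu-amd64 with 1Panel installed, set up with docker-compose behind an OpenResty (Nginx) reverse proxy. I have tried both the stable and latest images, and the problem reproduces on both.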

@AnnaStreeter

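The open-source edition's tokenizer currently uses Tiktoken Legacy, which does not align correctly with the newer OpenAI GPT-3 billing. The commercial edition is correct. Backporting commercial-edition work is outside my responsibilities, so when the open-source edition will be fixed is still undecided.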

zmh-program (Member) commented May 24, 2024

Not a bug; the tiktoken version just hasn't been updated.
The token counter is simply off; updating the encoding will fix it.
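A rough sketch of what "updating the encoding" could look like, assuming it means mapping each model name to the encoding that tiktoken itself assigns to that model family (cl100k_base for GPT-3.5-turbo and GPT-4, p50k_base for text-davinci-003, r50k_base for the original GPT-3 models). The table and the `encoding_name_for` function are hypothetical, not chatnio's actual code:

```python
# Hypothetical prefix table; the values mirror tiktoken's model-to-encoding
# assignments for these families, but this is not chatnio's implementation.
PREFIX_ENCODINGS = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "davinci": "r50k_base",
}


def encoding_name_for(model: str, default: str = "cl100k_base") -> str:
    """Pick an encoding by the longest model-name prefix that matches."""
    matches = [p for p in PREFIX_ENCODINGS if model.startswith(p)]
    if not matches:
        return default
    return PREFIX_ENCODINGS[max(matches, key=len)]


print(encoding_name_for("gpt-3.5-turbo-0125"))  # cl100k_base
print(encoding_name_for("text-davinci-003"))    # p50k_base
```

In the Python tiktoken library, `tiktoken.encoding_for_model("gpt-3.5-turbo")` performs this kind of lookup and returns the cl100k_base encoding; falling back to cl100k_base for unknown models is only a choice made in this sketch.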

@zmh-program zmh-program changed the title from "[bug] token calculation error" to "[update] tiktoken base upgrade" May 24, 2024
@zmh-program zmh-program added the version up Pro version features of the devolution of the open source version label May 29, 2024
Sh1n3zZ added a commit that referenced this issue Jun 21, 2024
Co-Authored-By: Minghan Zhang <112773885+zmh-program@users.noreply.github.com>
@Sh1n3zZ Sh1n3zZ assigned Sh1n3zZ and unassigned XiaomaiTX Jun 21, 2024
Sh1n3zZ added a commit that referenced this issue Jun 21, 2024
Co-Authored-By: Minghan Zhang <112773885+zmh-program@users.noreply.github.com>
Sh1n3zZ added a commit that referenced this issue Jun 21, 2024
Co-Authored-By: Minghan Zhang <112773885+zmh-program@users.noreply.github.com>
zmh-program added a commit that referenced this issue Jun 22, 2024
…ifferent device types (#204); optimize tiktoken performance (#191) and function calling fields
@Sh1n3zZ Sh1n3zZ closed this as completed Jun 22, 2024
5 participants