How much GPU memory does 6B model inference need? #20

Closed
Copilot-X opened this issue Nov 6, 2023 · 10 comments
Labels: doc, doc-complete, performance, question

Comments

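@Copilot-X

How much GPU memory does inference with the 6B model need? Loading it directly for inference runs out of memory; 24 GB is not enough. Is this different from chatglm2/3, which only need about 15 GB?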

@wangye01inf

@Copilot-X What does the code you are running look like? In theory, loading the model in bf16/fp16 needs only about 12 GB of GPU memory.
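The 12 GB figure follows from simple arithmetic: bf16 and fp16 both store each parameter in 2 bytes. The sketch below is a rough estimate for weights only, assuming "6B" means about 6e9 parameters (the exact count may differ) and ignoring activations, the KV cache, and framework overhead. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: "6B" is roughly 6e9 parameters.
fp16_gb = weight_memory_gb(6e9, bytes_per_param=2)  # bf16 and fp16: 2 bytes each
fp32_gb = weight_memory_gb(6e9, bytes_per_param=4)  # fp32: 4 bytes each

print(f"bf16/fp16: {fp16_gb:.0f} GB, fp32: {fp32_gb:.0f} GB")
```

By the same arithmetic, fp32 weights alone would take about 24 GB. That would be consistent with a 24 GB card running out of memory if the model were loaded without a half-precision dtype, though the thread does not confirm that this was the cause.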

@Liangdi

Liangdi commented Nov 6, 2023

I ran the demo: after loading the model, GPU memory usage was about 13 GB, and inference added roughly another 500 MB.

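@Copilot-X
Author

> I ran the demo: after loading the model, GPU memory usage was about 13 GB, and inference added roughly another 500 MB.

Do you have the loading and inference code? I'd like to compare.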

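@Liangdi

Liangdi commented Nov 6, 2023

> > I ran the demo: after loading the model, GPU memory usage was about 13 GB, and inference added roughly another 500 MB.
>
> Do you have the loading and inference code? I'd like to compare.

It's just the one in the repo: https://github.com/01-ai/Yi/blob/main/demo/text_generation.py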

@ZhaoFancy added the question label Nov 6, 2023
@learninmou

The model currently uses the bfloat16 data type, so the 6B model needs at least about 13 GB of GPU memory.
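A minimal sketch of loading in bfloat16 with Hugging Face Transformers is shown below. It is not the repository's demo script. The model path is supplied by the caller (for example a local checkpoint directory), `device_map="auto"` assumes the `accelerate` package is installed, and some checkpoints may also need `trust_remote_code=True`, which is left out here. The helper names are made up for illustration.

```python
def load_model_bf16(model_path: str):
    """Load a causal LM with bfloat16 weights (about 2 bytes per parameter)."""
    # Local imports: torch and transformers are only needed when this is called.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16,  # request half precision explicitly
        device_map="auto",           # assumption: `accelerate` is installed
    )
    return model, tokenizer


def report_gpu_memory() -> None:
    """Print memory currently allocated by PyTorch on the default GPU."""
    import torch

    if torch.cuda.is_available():
        print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.1f} GB")
    else:
        print("no CUDA device available")
```

Calling `report_gpu_memory()` right after loading and again after a generation call gives a rough way to compare against the 13 GB and 500 MB figures reported in this thread.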

@DumoeDss

DumoeDss commented Nov 6, 2023

How much GPU memory do the 200K-context 6B and 34B models each need?
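The thread does not answer this question. As a rough guide, long-context memory is usually dominated by the KV cache, which grows linearly with sequence length on top of the weights. The sketch below estimates that cache under stated assumptions: the layer count, number of key/value heads, and head dimension are placeholder values for illustration, not confirmed figures for either model, and should be replaced with the values from the model's `config.json`.

```python
def kv_cache_gb(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_value: int = 2,  # bf16/fp16
    batch_size: int = 1,
) -> float:
    """Estimate KV-cache size in decimal gigabytes (1 GB = 1e9 bytes)."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    total_bytes = (
        2 * num_layers * num_kv_heads * head_dim
        * seq_len * bytes_per_value * batch_size
    )
    return total_bytes / 1e9


# Placeholder architecture values, for illustration only.
example_gb = kv_cache_gb(num_layers=32, num_kv_heads=4, head_dim=128, seq_len=200_000)
print(f"KV cache for 200,000 tokens: {example_gb:.1f} GB (plus weights)")
```

The estimate covers the cache only; weights, activations, and any attention workspace come on top of it.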

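@mwmif

mwmif commented Nov 6, 2023

[image]

In Yi\demo\text_generation.py, after adding two parameters (this requires installing some dependencies; without them you get an error), it runs even on 4 GB of VRAM, but it is extremely slow.
You still need an optimization approach like llama.cpp; otherwise devices with little VRAM are basically unusable.

[image]

ChatGLM3 6B likewise only ran fast after being quantized to 4-bit with chatglm.cpp; with the official quantization scheme, it also took more than ten minutes to get a reply.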
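The two parameters mentioned above appear only in a screenshot that is not reproduced here, so they are unknown. One common way to fit a model of this size into a few gigabytes with Transformers is 4-bit loading through `bitsandbytes`; the sketch below shows that approach as an assumption, not as what the commenter actually did. It needs the `bitsandbytes` and `accelerate` packages, and the function name is made up for illustration.

```python
def load_model_4bit(model_path: str):
    """Load a causal LM with 4-bit quantized weights via bitsandbytes."""
    # Local imports: the libraries are only needed when this is called.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # store weights in 4 bits
        bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=quant_config,
        device_map="auto",  # assumption: `accelerate` is installed
    )
    return model, tokenizer
```

At 4 bits per parameter the weights of a 6B model take roughly 3 GB, which is in line with the 4 GB report above.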

@ZhaoFancy added the performance and doc labels Nov 7, 2023
@cutoutsy

cutoutsy commented Nov 7, 2023

What is the inference speed, in tokens/s?
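Throughput depends heavily on hardware, precision, and sequence length, so it is easiest to measure locally. The sketch below times any callable that returns the newly generated token ids and divides by the elapsed time. `fake_generate` is a stand-in used only to make the example runnable without a model; both names are made up for illustration.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(generate: Callable[[], Sequence[int]]) -> float:
    """Time one generation call and return new tokens per second."""
    start = time.perf_counter()
    new_tokens = generate()
    elapsed = time.perf_counter() - start
    return len(new_tokens) / elapsed


def fake_generate() -> list:
    """Stand-in for a real model call: 20 tokens after a short delay."""
    time.sleep(0.05)
    return list(range(20))


rate = tokens_per_second(fake_generate)
print(f"{rate:.1f} tokens/s")
```

With a real model, the callable should return only the new tokens (the output length minus the prompt length), otherwise the prompt inflates the figure.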

@ZhaoFancy
Contributor

This Chat release specifically added content covering this.

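@garbe-github-support

Following the code in the README, I used the 6B chat model (11 GB) with 8 GB of VRAM on a 3070 Ti.
It runs, but very, very slowly: more than 10 minutes.
On the same machine, chatglm3-6b, also an 11 GB model, is fast: output starts within a few seconds and finishes in a minute or two.
Could it be because this one outputs everything at once?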
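Whether the README code streams its output is not stated in this thread. Streaming changes when text becomes visible, not how long generation takes in total, so it can explain a delayed first token but not a much longer overall run. A minimal sketch of streamed generation with Transformers' `TextStreamer` follows; it assumes `model` and `tokenizer` are already loaded, and the function name is made up for illustration.

```python
def generate_streaming(model, tokenizer, prompt: str, max_new_tokens: int = 256):
    """Generate text and print tokens to stdout as they are produced."""
    # Local import: transformers is only needed when this is called.
    from transformers import TextStreamer

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # skip_prompt=True prints only the newly generated text.
    streamer = TextStreamer(tokenizer, skip_prompt=True)
    return model.generate(
        **inputs,
        streamer=streamer,
        max_new_tokens=max_new_tokens,
    )
```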

@Yimi81 added the doc-complete label Mar 8, 2024