How to improve efficiency when using a locally deployed model #448
Unanswered
Phoenix0809
asked this question in
Q&A
Replies: 0 comments
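I am using the locally deployed model Qwen2.5-14B-Instruct-GPTQ-Int8, and I added the following to its configuration:
"rope_scaling": {
"factor": 3.0,
"original_max_position_embeddings": 32768,
"type": "yarn"
}
to handle the long text of the reports. However, generation is extremely slow, and while a report is being generated it blocks model calls from my other projects, causing them to time out. How can I optimize or resolve this?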
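For reference, here is a minimal sketch of the context length this setting implies, assuming the usual YaRN convention that the extended window is roughly `factor` times `original_max_position_embeddings`. The helper name `extended_context_length` is hypothetical and not part of any library; the fragment is wrapped in braces only so that it parses as JSON.

```python
import json

# The rope_scaling fragment quoted above, wrapped in braces so it parses as JSON.
CONFIG_FRAGMENT = """
{
  "rope_scaling": {
    "factor": 3.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
"""


def extended_context_length(config: dict) -> int:
    """Hypothetical helper: context length implied by a YaRN rope_scaling entry.

    Assumes the extended window is factor * original_max_position_embeddings.
    """
    scaling = config["rope_scaling"]
    return int(scaling["factor"] * scaling["original_max_position_embeddings"])


config = json.loads(CONFIG_FRAGMENT)
print(extended_context_length(config))  # 98304
```

Under that assumption, a factor of 3.0 corresponds to a window of about 98K tokens, so a single report request can be roughly three times longer than the model's native 32K limit.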