Is a T4 machine with 32 GB not enough to run the 7B model? It gets killed every time #22
Comments
You can try this: https://github.com/fengyh3/llama_inference As for the problem you mentioned, I ran into it before too; it is most likely insufficient RAM. My machine has 14 GB of RAM, and after I added 24 GB of swap it worked fine.
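The swap workaround described above can be sketched as follows. This is a minimal sketch for Linux, not part of the project itself; the 24 GiB size and the `/swapfile` path are assumptions matching the comment, and the commands require root:

```shell
# Create a 24 GiB swap file (size chosen to match the comment above)
sudo fallocate -l 24G /swapfile
# Restrict permissions, as required by mkswap/swapon
sudo chmod 600 /swapfile
# Format the file as swap and enable it
sudo mkswap /swapfile
sudo swapon /swapfile
# Verify the new swap space is active
free -h
```

Note that swap this large will be slow (the model spills to disk during loading), but it can be enough to get past a one-time peak in memory use.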
Ah... my GPU machine has 32 GB of RAM... After GenerateLm finishes running it immediately eats 95%, then during load_model, the 20 GB of virtual memory I allocated gets completely used up and the process is killed... I haven't tried allocating 30 GB of virtual memory yet... Tencent Cloud T4 GPU
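For scale, a 7B-parameter model stored in fp32 needs about 26 GiB for the weights alone (4 bytes per parameter), and a naive load path that holds both the checkpoint tensors and the model in memory at the same time can briefly need roughly twice that, which would explain the kill even with 32 GB of RAM plus 20 GB of swap. A rough back-of-the-envelope check (the helper names here are hypothetical, and the `sysconf` call is Linux/macOS-only):

```python
import os

def total_ram_gb():
    """Return total physical RAM in GiB (Linux/macOS, via POSIX sysconf)."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    num_pages = os.sysconf("SC_PHYS_PAGES")  # total pages of physical RAM
    return page_size * num_pages / (1024 ** 3)

def fp32_checkpoint_gb(n_params_billion):
    """Rough RAM needed to hold an fp32 checkpoint: 4 bytes per parameter."""
    return n_params_billion * 1e9 * 4 / (1024 ** 3)

if __name__ == "__main__":
    need = fp32_checkpoint_gb(7)  # roughly 26 GiB for a 7B fp32 model
    have = total_ram_gb()
    print(f"weights alone need ~{need:.1f} GiB; machine has {have:.1f} GiB RAM")
```

If the loader duplicates the weights even briefly, the peak is near 52 GiB, so either more swap or a loading path that avoids the second copy (e.g. loading the checkpoint to CPU in chunks, or using fp16 weights) is needed.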
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
@wujianming1996 Try commenting out line 13 of llama_server.py
@fengyh3 Thanks, I'll give it a try.
I followed the [Quick Start] guide. After running the Python script, GenerateLm consumed 95% of the memory, and then load_model(model, args.load_model_path) was killed partway through (I later added 20 GB of virtual memory and it was still killed). Or is there something wrong with how I'm running it?