
Error when running the model server in GPU mode #42

Open
a652 opened this issue Oct 31, 2023 · 3 comments

Comments


a652 commented Oct 31, 2023

Following the README, I ran:
./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080

The error output is as follows:
ggml_metal_init: GPU name: Apple M1
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 5461.34 MB
ggml_metal_init: maxTransferRate = built-in GPU
llama_new_context_with_model: compute buffer total size = 558.13 MB
llama_new_context_with_model: max tensor size = 224.77 MB
ggml_metal_add_buffer: allocated 'data ' buffer, size = 4096.00 MB, offs = 0
ggml_metal_add_buffer: allocated 'data ' buffer, size = 486.91 MB, offs = 4059267072, ( 4583.53 / 5461.34)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 1346.00 MB, ( 5929.53 / 5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'alloc ' buffer, size = 552.02 MB, ( 6481.55 / 5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: ggml-metal.m:1459: false
Abort trap: 6

Machine info:
M1 MacBook Pro
macOS Sonoma 14.1
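
If I'm reading the allocation log right, this looks like a plain out-of-memory on the Metal side; the warnings already show the working set being exceeded before the first graph compute (numbers taken from the log above):

4583.53 MB (data) + 1346.00 MB (kv) + 552.02 MB (alloc) = 6481.55 MB allocated, versus a recommendedMaxWorkingSetSize of 5461.34 MB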


a652 commented Nov 1, 2023

Related issue: ggerganov/llama.cpp#2048

It would help to document the corresponding hardware requirements.


qqggwm commented Nov 2, 2023

llm_load_print_meta: format = GGUF V2 (latest)
llm_load_print_meta: arch = codeshell
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 70144
llm_load_print_meta: n_merges = 72075
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 42
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 16384
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = mostly Q4_0
llm_load_print_meta: model params = 7.98 B
llm_load_print_meta: model size = 4.25 GiB (4.58 BPW)
llm_load_print_meta: general.name = CodeShell
llm_load_print_meta: BOS token = 70000 '<|endoftext|>'
llm_load_print_meta: EOS token = 70000 '<|endoftext|>'
llm_load_print_meta: UNK token = 70000 '<|endoftext|>'
llm_load_print_meta: PAD token = 70000 '<|endoftext|>'
llm_load_print_meta: LF token = 28544 'ÄĬ'
llm_load_tensors: ggml ctx size = 0.17 MB
llm_load_tensors: mem required = 4355.64 MB
..............................................................................................
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 1344.00 MB
llama_new_context_with_model: compute buffer total size = 558.13 MB
Segmentation fault (core dumped)

OP, did you manage to solve this? I'm running into a similar problem.
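
The kv figure in the log is at least consistent with the printed meta; as a rough check (assuming f16 K/V entries, the upstream llama.cpp default):

2 (K and V) × 42 layers × 8192 ctx × 8 kv heads × 128 head dim (4096 / 32) × 2 bytes = 1,409,286,144 bytes = 1344.00 MB

which matches "kv self size = 1344.00 MB" exactly, so this looks like total memory pressure rather than a size miscalculation.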


a652 commented Nov 3, 2023

> OP, did you manage to solve this? I'm running into a similar problem.

No. My machine only has 8 GB of RAM, and I suspect that simply isn't enough.
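
In case it helps anyone else: the KV cache scales linearly with the context length, so a smaller context should shrink it from 1344 MB to roughly 336 MB at 2048 tokens. Untested here, and -c/--ctx-size is the upstream llama.cpp flag (this repo is a fork of it), so treat this as a sketch:

./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080 -c 2048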
