
Error during API inference on Ascend NPU #3796

Closed
msqp opened this issue May 17, 2024 · 4 comments
Labels
solved This problem has been already solved.

Comments


msqp commented May 17, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

On a Huawei Ascend NPU 910B, the API server starts normally, but calling the API raises an error. Error message:

05/17/2024 18:45:13 - INFO - llmtuner.model.utils.attention - Using torch SDPA for faster training and inference.
05/17/2024 18:45:13 - INFO - llmtuner.model.adapter - Adapter is not found at evaluation, load the base model.
05/17/2024 18:45:13 - INFO - llmtuner.model.loader - all params: 7721324544
Visit http://localhost:7703/docs for API document.
INFO:     Started server process [40093]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7703 (Press CTRL+C to quit)
INFO:      - "GET /docs HTTP/1.1" 200 OK
INFO:      - "GET /openapi.json HTTP/1.1" 200 OK
05/17/2024 18:46:14 - INFO - llmtuner.api.chat - ==== request ====
{
  "model": "",
  "messages": [
    {
      "role": "user",
      "content": "hello"
    }
  ]
}
[E OpParamMaker.cpp:273] call aclnnEqScalar failed, detail:EZ9999: Inner Error!
EZ9999: 2024-05-17-18:46:14.721.067  Op Equal does not has any binary.
        TraceBack (most recent call last):
        Kernel Run failed. opType: 88, Equal
        launch failed for Equal, errno:561000.

[ERROR] 2024-05-17-18:46:14 (PID:40093, Device:0, RankID:-1) ERR01005 OPS internal error
Exception raised from operator() at third_party/op-plugin/op_plugin/ops/base_ops/opapi/EqKernelNpuOpApi.cpp:60 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x68 (0xffff8e44a538 in miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x6c (0xffff8e3f78a0 in miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x952c90 (0xfffda8b83c90 in miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch_npu/lib/libtorch_npu.so)
frame #3: <unknown function> + 0xe27f0c (0xfffda9058f0c in miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch_npu/lib/libtorch_npu.so)
frame #4: <unknown function> + 0x56b960 (0xfffda879c960 in miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: <unknown function> + 0x56bd88 (0xfffda879cd88 in miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch_npu/lib/libtorch_npu.so)
frame #6: <unknown function> + 0x569d90 (0xfffda879ad90 in miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch_npu/lib/libtorch_npu.so)
frame #7: <unknown function> + 0xafe0c (0xffff8e47ce0c in miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #8: <unknown function> + 0x7a80 (0xffffa59f7a80 in /lib64/libpthread.so.0)
frame #9: <unknown function> + 0xe4d0c (0xffffa582bd0c in /lib64/libc.so.6)


INFO:     113.248.23.222:12592 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/middleware/cors.py", line 93, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/middleware/cors.py", line 148, in simple_response
    await self.app(scope, receive, send)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "codex/llama-factory/src/llmtuner/api/app.py", line 85, in create_chat_completion
    return await create_chat_completion_response(request, chat_model)
  File "codex/llama-factory/src/llmtuner/api/chat.py", line 101, in create_chat_completion_response
    responses = await chat_model.achat(
  File "codex/llama-factory/src/llmtuner/chat/chat_model.py", line 56, in achat
    return await self.engine.chat(messages, system, tools, image, **input_kwargs)
  File "codex/llama-factory/src/llmtuner/chat/hf_engine.py", line 252, in chat
    return await loop.run_in_executor(pool, self._chat, *input_args)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "codex/llama-factory/src/llmtuner/chat/hf_engine.py", line 142, in _chat
    generate_output = model.generate(**gen_kwargs)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/transformers/generation/utils.py", line 1431, in generate
    model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/transformers/generation/utils.py", line 463, in _prepare_attention_mask_for_generation
    is_pad_token_in_inputs = (pad_token_id is not None) and (pad_token_id in inputs)
  File "miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch/_tensor.py", line 1091, in __contains__
    return (element == self).any().item()  # type: ignore[union-attr]
RuntimeError: The Inner error is reported as above.
 Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2024-05-17-18:46:14 (PID:40093, Device:0, RankID:-1) ERR00100 PTA call acl api failed

Launch command:

ASCEND_RT_VISIBLE_DEVICES=6 python src/api.py --model_name_or_path Qwen/Qwen1.5-7B-Chat --template qwen
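
(For reference, a minimal sketch of the API call that triggers the error, reconstructed from the request payload and the port logged above:)

curl -s http://localhost:7703/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "", "messages": [{"role": "user", "content": "hello"}]}'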

Expected behavior

No response

System Info

torch-npu=2.2.0
torch=2.2.0
Ascend-cann-toolkit_8.0.RC1_linux-aarch64
Ascend-cann-kernels-910b_8.0.RC1_linux

Switching to a different CANN version did not help either.

Others

No response

hiyouga (Owner) commented May 17, 2024

Please set ASCEND_LAUNCH_BLOCKING=1 and post the error output again.
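
For example, the command from the report could be relaunched with the variable set (a sketch, reusing the launch command above):

ASCEND_LAUNCH_BLOCKING=1 ASCEND_RT_VISIBLE_DEVICES=6 python src/api.py --model_name_or_path Qwen/Qwen1.5-7B-Chat --template qwen

With ASCEND_LAUNCH_BLOCKING=1, NPU kernels are launched synchronously, so the stack trace points at the operator that actually failed (as the error message above notes).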

hiyouga added the pending label (This problem is yet to be addressed.) on May 17, 2024
msqp (Author) commented May 17, 2024

The error is as follows. Thanks for your support!

05/17/2024 21:03:59 - INFO - llmtuner.api.chat - ==== request ====
{
  "model": "",
  "messages": [
    {
      "role": "user",
      "content": "hello"
    }
  ]
}
INFO:      - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/middleware/cors.py", line 93, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/middleware/cors.py", line 148, in simple_response
    await self.app(scope, receive, send)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/codex/llama-factory/src/llmtuner/api/app.py", line 85, in create_chat_completion
    return await create_chat_completion_response(request, chat_model)
  File "/codex/llama-factory/src/llmtuner/api/chat.py", line 101, in create_chat_completion_response
    responses = await chat_model.achat(
  File "/codex/llama-factory/src/llmtuner/chat/chat_model.py", line 56, in achat
    return await self.engine.chat(messages, system, tools, image, **input_kwargs)
  File "/codex/llama-factory/src/llmtuner/chat/hf_engine.py", line 252, in chat
    return await loop.run_in_executor(pool, self._chat, *input_args)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/codex/llama-factory/src/llmtuner/chat/hf_engine.py", line 142, in _chat
    generate_output = model.generate(**gen_kwargs)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/transformers/generation/utils.py", line 1431, in generate
    model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/transformers/generation/utils.py", line 463, in _prepare_attention_mask_for_generation
    is_pad_token_in_inputs = (pad_token_id is not None) and (pad_token_id in inputs)
  File "/miniconda3/envs/llama_factory_py3.9/lib/python3.9/site-packages/torch/_tensor.py", line 1091, in __contains__
    return (element == self).any().item()  # type: ignore[union-attr]
RuntimeError: call aclnnEqScalar failed, detail:EZ9999: Inner Error!
EZ9999: 2024-05-17-21:03:59.298.450  Cannot parse json for config file [/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe//kernel/config/ascend910/equal.json].
        TraceBack (most recent call last):
        Failed to parse kernel in equal.json.
        AclOpKernelInit failed opType
        Op Equal does not has any binary.
        Kernel Run failed. opType: 88, Equal
        launch failed for Equal, errno:561000.

[ERROR] 2024-05-17-21:03:59 (PID:183739, Device:0, RankID:-1) ERR01005 OPS internal error

msqp (Author) commented May 21, 2024

Resolved after installing Ascend-cann-kernels-910_8.0.RC1_linux.
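
(A minimal sketch of that fix, assuming the package reported above has been downloaded as a .run installer and that the standard --install mode of CANN .run packages applies; npu-smi can be used first to confirm the chip model:)

npu-smi info                                          # check whether the device is a 910 or 910B
chmod +x Ascend-cann-kernels-910_8.0.RC1_linux.run    # filename as reported above, extension assumed
./Ascend-cann-kernels-910_8.0.RC1_linux.run --install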

hiyouga added the solved label (This problem has been already solved.) and removed the pending label (This problem is yet to be addressed.) on May 21, 2024
hiyouga closed this as completed on May 21, 2024
forest-sys commented

A similar error occurs during training. Please help take a look. Thanks for your support!
Training command:
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 llamafactory-cli train examples/full_multi_gpu/interlm2_full_sft.yaml
Ascend-cann-kernels version:
Ascend-cann-kernels-910b_8.0.RC1.alpha003_linux.run
The error message is as follows:
RuntimeError: call aclnnCast failed, detail:EZ9999: Inner Error!
EZ9999 Cannot parse json for config file [/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe//kernel/config/ascend910/cast.json].
TraceBack (most recent call last):
Failed to parse kernel in cast.json.
AclOpKernelInit failed opType
Op Cast does not has any binary.
Kernel Run failed. opType: 53, Cast
launch failed for Cast, errno:561000.
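
(A quick diagnostic sketch, based only on the path in the error message above: if the kernel config file is missing, the installed Ascend-cann-kernels package probably does not match the toolkit or the chip, similar to the issue resolved above:)

ls /usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/kernel/config/ascend910/cast.json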
