
c++推理的问题 / Problem with c++ inference #40473

Closed

AriouatI opened this issue Mar 11, 2022 · 9 comments

AriouatI commented Mar 11, 2022


Hello,
After building the Paddle C++ inference library, I tried to run PaddleOCR and PaddleDetection, but I ran into multiple issues:

PaddleOCR:
Running it on CPU works, but when I try to run it on GPU I get this:
malloc(): invalid size (unsorted) Aborted (core dumped)
I tried to locate the source of the bug, and it seems to come from paddle/fluid/inference/api/details/zero_copy_tensor.cc, this line specifically:
auto *dev_ctx = static_cast<const paddle::platform::CUDADeviceContext *>(pool.Get(gpu_place));

Any idea what might be causing this, and how to solve it?

PaddleDetection:
I get this error (on both CPU and GPU):

terminate called after throwing an instance of 'phi::enforce::EnforceNotMet'
  what():

C++ Traceback (most recent call last):
0   void paddle_infer::Tensor::CopyToCpuImpl(float*, void*, void (*)(void*), void*) const

Error Message Summary:
InvalidArgumentError: The type of data we are trying to retrieve does not match the type of data currently contained in the container.

It's coming from Paddle/paddle/phi/core/dense_tensor.cc (discussed in another issue, #4174).
Thanks.

System information

GPU: RTX 3080
OS ubuntu 20.04
GIT COMMIT ID: a40ea45
WITH_MKL: ON
WITH_MKLDNN: ON
WITH_GPU: ON
WITH_ROCM: OFF
WITH_ASCEND_CL: OFF
WITH_ASCEND_CXX11: OFF
CUDA version: 11.5
CUDNN version: v8.3
CXX compiler version: 7.5.0

@paddle-bot-old

Hi! We've received your issue; please be patient while we respond. We will arrange for technicians to answer your questions as soon as possible. Please make sure you have posted enough information to describe your request. You may also check the API docs, FAQ, GitHub issues, and the AI community to find an answer. Have a nice day!

@wangxinxin08
Contributor

  1. The error in PaddleDetection is caused by an incorrect input type; you should convert the input to float32.
  2. The PaddleOCR problem looks like the GPU cannot be found; please confirm that the library was actually compiled with GPU support and that CUDA and the related environment variables are configured correctly.
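A minimal sketch of the float32 conversion suggested in point 1, assuming an 8-bit image buffer; the mean/scale defaults here are illustrative placeholders, not PaddleDetection's actual preprocessing values:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Convert an 8-bit pixel buffer into the float32 buffer the predictor
// expects. The mean/scale defaults are placeholders; use the values from
// your model's own preprocessing config.
std::vector<float> ToFloat32(const std::vector<uint8_t> &pixels,
                             float mean = 0.0f, float scale = 1.0f / 255.0f) {
  std::vector<float> out;
  out.reserve(pixels.size());
  for (uint8_t p : pixels) {
    out.push_back((static_cast<float>(p) - mean) * scale);
  }
  return out;
}
```

The resulting buffer can then be handed to the input tensor (e.g. via CopyFromCpu on the float data) so the model never receives integer input.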

@AriouatI
Author


Hello @wangxinxin08,
Thank you for your response.

  1. The error in PaddleDetection is caused by an incorrect input type; you should convert the input to float32.

    I went a little deeper into the code, trying to see which variable wasn't a float, and I found that the problem is in this part of the prediction process, in deploy/cpp/src/object_detector.cc:
auto inference_start = std::chrono::steady_clock::now();
for (int i = 0; i < repeats; i++) {
  predictor_->Run();
  // Get output tensor
  out_tensor_list.clear();
  output_shape_list.clear();
  auto output_names = predictor_->GetOutputNames();
  for (int j = 0; j < output_names.size(); j++) {
    auto output_tensor = predictor_->GetOutputHandle(output_names[j]);
    std::vector<int> output_shape = output_tensor->shape();
    int out_num = std::accumulate(
        output_shape.begin(), output_shape.end(), 1, std::multiplies<int>());
    output_shape_list.push_back(output_shape);
    if (output_tensor->type() == paddle_infer::DataType::INT32) {
      out_bbox_num_data_.resize(out_num);
      output_tensor->CopyToCpu(out_bbox_num_data_.data());
    } else {
      std::vector<float> out_data;
      out_data.resize(out_num);
      output_tensor->CopyToCpu(out_data.data());
      out_tensor_list.push_back(out_data);
    }
  }
}

In the "else" branch, out_data is a vector of float, and in my case it does contain float values (all 0 — does that point to another problem?). Could you tell me what this variable corresponds to, and why the line output_tensor->CopyToCpu(out_data.data()); throws an error even though it's a float vector?
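One plausible cause of the mismatch error in the snippet above is that some output tensor has a dtype other than INT32 or FLOAT32 (for example INT64), so the "else" branch calls CopyToCpu on a float buffer for non-float data. The following self-contained sketch illustrates a more defensive dispatch; it uses a stand-in enum rather than the real paddle_infer::DataType so it can be compiled without Paddle:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for paddle_infer::DataType; the real enum lives in the Paddle
// inference headers and has more members.
enum class DType { FLOAT32, INT32, INT64 };

// Route each output to a buffer of the matching element type instead of
// assuming "anything that isn't INT32 is float".
std::string DispatchOutput(DType type) {
  switch (type) {
    case DType::INT32:
      return "copy into std::vector<int32_t>";
    case DType::INT64:
      return "copy into std::vector<int64_t>";
    case DType::FLOAT32:
      return "copy into std::vector<float>";
  }
  // An unhandled dtype would otherwise surface later as the
  // InvalidArgumentError seen in the traceback above.
  throw std::runtime_error("unhandled output dtype");
}
```

With the real API, the equivalent change would be an explicit INT64 branch that copies into an int64_t buffer before falling back to the float path.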

  2. The PaddleOCR problem looks like the GPU cannot be found; please confirm that the library was actually compiled with GPU support and that CUDA and the related environment variables are configured correctly.

As mentioned before, the error occurs in Paddle/fluid/inference/api/details/zero_copy_tensor.cc.
The call pool.Get(gpu_place); is what makes the program crash. I checked CUDA's path and everything seems correct; I believe it is something specific to PaddleOCR, because this line of code executes correctly when running PaddleDetection (so I doubt it's a global problem).

Thank you.

@AriouatI
Author

AriouatI commented Mar 16, 2022

@MissPenguin

Have you verified that Paddle was installed successfully: https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html#old-version-anchor-8-%E4%B8%89%E3%80%81%E9%AA%8C%E8%AF%81%E5%AE%89%E8%A3%85

Well, I successfully built the library with sudo make inference_lib_dist.

Running OCR on CPU gives the correct output, so I believe Paddle was built correctly.
Building ppocr (sudo ./tools/build.sh) also shows no errors in the terminal.

I mentioned the exact line that causes the crash earlier; here is the output I get on GPU and CPU.
PaddleOCR:

CPU :

mode: det
total images num: 1
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Fused 0 subgraphs into layer_norm op.
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- fused 0 pairs of fc gru patterns
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0316 13:17:40.582302 895653 fuse_pass_base.cc:57] --- detected 56 subgraphs
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
I0316 13:17:40.589912 895653 fuse_pass_base.cc:57] --- detected 1 subgraphs
--- Running IR pass [is_test_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0316 13:17:40.596658 895653 memory_optimize_pass.cc:216] Cluster name : x size: 12
I0316 13:17:40.596663 895653 memory_optimize_pass.cc:216] Cluster name : relu_6.tmp_0 size: 2048
I0316 13:17:40.596665 895653 memory_optimize_pass.cc:216] Cluster name : conv2d_120.tmp_0 size: 1024
I0316 13:17:40.596666 895653 memory_optimize_pass.cc:216] Cluster name : relu_12.tmp_0 size: 4096
I0316 13:17:40.596668 895653 memory_optimize_pass.cc:216] Cluster name : relu_13.tmp_0 size: 8192
I0316 13:17:40.596668 895653 memory_optimize_pass.cc:216] Cluster name : relu_2.tmp_0 size: 1024
I0316 13:17:40.596670 895653 memory_optimize_pass.cc:216] Cluster name : batch_norm_51.tmp_3 size: 8192
I0316 13:17:40.596671 895653 memory_optimize_pass.cc:216] Cluster name : conv2d_113.tmp_0 size: 8192
--- Running analysis [ir_graph_to_program_pass]
I0316 13:17:40.632854 895653 analysis_predictor.cc:1000] ======= optimize end =======
I0316 13:17:40.634990 895653 naive_executor.cc:101] --- skip [feed], feed -> x
I0316 13:17:40.636521 895653 naive_executor.cc:101] --- skip [sigmoid_0.tmp_0], fetch -> fetch
Detected boxes num: 2
The detection visualized image saved in ./ocr_vis.png

GPU:

mode: det
total images num: 1
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [conv_bn_fuse_pass]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0316 13:19:03.287709 895718 fuse_pass_base.cc:57] --- detected 56 subgraphs
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0316 13:19:03.320012 895718 fuse_pass_base.cc:57] --- detected 4 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0316 13:19:03.323369 895718 ir_params_sync_among_devices_pass.cc:79] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0316 13:19:03.350996 895718 memory_optimize_pass.cc:216] Cluster name : x size: 12
I0316 13:19:03.351004 895718 memory_optimize_pass.cc:216] Cluster name : relu_2.tmp_0 size: 1024
I0316 13:19:03.351006 895718 memory_optimize_pass.cc:216] Cluster name : batch_norm_50.tmp_4 size: 2048
I0316 13:19:03.351007 895718 memory_optimize_pass.cc:216] Cluster name : relu_6.tmp_0 size: 2048
I0316 13:19:03.351008 895718 memory_optimize_pass.cc:216] Cluster name : conv2d_120.tmp_0 size: 1024
I0316 13:19:03.351009 895718 memory_optimize_pass.cc:216] Cluster name : relu_12.tmp_0 size: 4096
I0316 13:19:03.351011 895718 memory_optimize_pass.cc:216] Cluster name : conv2d_124.tmp_0 size: 256
I0316 13:19:03.351012 895718 memory_optimize_pass.cc:216] Cluster name : batch_norm_48.tmp_3 size: 8192
I0316 13:19:03.351013 895718 memory_optimize_pass.cc:216] Cluster name : relu_13.tmp_0 size: 8192
--- Running analysis [ir_graph_to_program_pass]
I0316 13:19:03.380610 895718 analysis_predictor.cc:1000] ======= optimize end =======
I0316 13:19:03.382716 895718 naive_executor.cc:101] --- skip [feed], feed -> x
I0316 13:19:03.383641 895718 naive_executor.cc:101] --- skip [sigmoid_0.tmp_0], fetch -> fetch
W0316 13:19:03.415621 895718 gpu_context.cc:244] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.5
W0316 13:19:03.417965 895718 gpu_context.cc:272] device: 0, cuDNN Version: 8.3.
malloc(): invalid size (unsorted)
Aborted (core dumped)

@MissPenguin
Contributor

Following the verification method I provided above:
[image]
What does the output of verifying the Paddle installation look like? Could you paste it here?

@AriouatI
Author

AriouatI commented Mar 16, 2022

Following the verification method I provided above: [image] What does the output of verifying the Paddle installation look like? Could you paste it here?

Here is the output:

import paddle
paddle.utils.run_check()
Running verify PaddlePaddle program ...
W0316 15:11:13.183279 897518 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.0
W0316 15:11:13.196525 897518 device_context.cc:422] device: 0, cuDNN Version: 8.3.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

Thanks.
@MissPenguin

@litao-zhx

Has the input/output type mismatch problem you mentioned been solved?

@paddle-bot paddle-bot bot closed this as completed Jul 2, 2024

paddle-bot bot commented Jul 2, 2024

Since you haven't replied for more than a year, we have closed this issue/PR.
If the problem is not solved or there is a follow-up, please reopen it at any time and we will continue to follow up.
