This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

C++ api executor->Forward(false) is much slower than MXPredForward? #12159

Closed
gzpyy opened this issue Aug 14, 2018 · 4 comments

@gzpyy

gzpyy commented Aug 14, 2018

Description

Hi,

I am using the C++ API for prediction with MobileNet v2. The code below takes 100+ ms to run, whereas the same function implemented with the C API takes only 10 ms. Both run on the GPU.

C++ version:
executor = net.SimpleBind(global_ctx, args_map, map<string, NDArray>(),
map<string, OpReqType>(), aux_map);
executor->Forward(false); // around 110ms

C version:
MXPredCreate((const char *) json_data.GetBuffer(),
(const char *) param_data.GetBuffer(),
static_cast<size_t>(param_data.GetLength()),
dev_type,
dev_id,
num_input_nodes,
input_keys,
input_shape_indptr,
input_shape_data,
&pred_hnd);
MXPredSetInput(pred_hnd, "data", image_data.data(), image_size);

// Do Predict Forward
MXPredForward(pred_hnd); // 10ms

Is there something I am doing wrong?

@vdantu
Contributor

vdantu commented Aug 15, 2018

@XIAOJINGXIE : This is a good question. Could you also post this on discuss.mxnet.io ? The question will receive a larger audience.

@mxnet-label-bot [C++, C, Question]

@marcoabreu added the C++ and Question labels on Aug 16, 2018
@leleamol
Contributor

Hi @XIAOJINGXIE

The Executor->Forward() method does more work than MXPredForward(). Internally it invokes two C APIs,
MXExecutorForward()
and
MXExecutorOutputs(),
whereas MXPredForward() only runs the forward pass and does not retrieve the outputs.
In addition, the outputs of the forward pass are copied to the 'outputs' array in the Executor object, for faster retrieval later.

Therefore, we cannot compare MXPredForward() directly with Executor->Forward(). For a fair comparison, we should also measure the time required to invoke MXPredGetOutputShape() and MXPredGetOutput() after invoking MXPredForward().

I hope this answers the question.
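
As a rough illustration of the comparison suggested above, here is a minimal timing sketch. It is not MXNet code: `TimeMicros`, `StageTimes`, and `MeasurePrediction` are hypothetical helpers introduced only for this example, and the two callables stand in for the forward call and the output-retrieval calls.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Runs `fn` once and returns the elapsed wall-clock time in microseconds.
template <typename Fn>
std::int64_t TimeMicros(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
      .count();
}

// Hypothetical record of the two stages that together make up one prediction.
struct StageTimes {
  std::int64_t forward_us = 0;  // time spent in the forward call
  std::int64_t fetch_us = 0;    // time spent retrieving the outputs
  std::int64_t total_us() const { return forward_us + fetch_us; }
};

// Times the forward stage and the output-retrieval stage separately, so the
// total can be compared against a single call that does both.
template <typename ForwardFn, typename FetchFn>
StageTimes MeasurePrediction(ForwardFn&& forward, FetchFn&& fetch) {
  StageTimes times;
  times.forward_us = TimeMicros(std::forward<ForwardFn>(forward));
  times.fetch_us = TimeMicros(std::forward<FetchFn>(fetch));
  assert(times.forward_us >= 0 && times.fetch_us >= 0);
  return times;
}
```

With the C predict API, the first callable would wrap MXPredForward() and the second would wrap MXPredGetOutputShape() followed by MXPredGetOutput(); `total_us()` is then the number to set against the time of Executor->Forward().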
@mxnet-label-bot add [Pending Requester Info]

@gzpyy
Author

gzpyy commented Apr 7, 2019

Thanks a lot.

By the way, the method name Forward() is a bit misleading.

@gzpyy closed this as completed on Apr 7, 2019
@sheep94lion

The running time of C API MXPredForward is much shorter than the running time of MXPredGetOutput:

// Time the forward call on its own.
auto start = std::chrono::high_resolution_clock::now();
MXPredForward(pred_hnd);
auto stop = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
// count() returns a 64-bit value, so print it with %lld rather than %d.
LOGI("MXPredForward: %lld microseconds.", static_cast<long long>(duration.count()));
std::vector<float> data(size);
// Time the output retrieval separately.
start = std::chrono::high_resolution_clock::now();
MXPredGetOutput(pred_hnd, output_index, data.data(), static_cast<mx_uint>(size));
stop = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
LOGI("MXPredGetOutput: %lld microseconds.", static_cast<long long>(duration.count()));

The result is:

I/MXNET: MXPredForward: 106 microseconds.
I/MXNET: MXPredGetOutput: 3748967 microseconds.

Why? Is it something related to lazy evaluation?
The code runs on a Pixel 3 with a Snapdragon 835.
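
One likely explanation for numbers like these is that MXNet schedules operators asynchronously, so the forward call may return once the work is queued and the call that reads the output then waits for it to finish. The sketch below only models that pattern and is not MXNet code: `DeferredPredictor` is a hypothetical class, and it uses a simple deferred task queue instead of real worker threads (an assumption made to keep the example deterministic).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a predictor handle; not part of MXNet.
class DeferredPredictor {
 public:
  explicit DeferredPredictor(std::vector<float> input)
      : input_(std::move(input)) {}

  // Models a non-blocking forward call: it only queues the computation.
  void Forward() {
    pending_.push_back([this] {
      output_.clear();
      for (float x : input_) output_.push_back(2.0f * x);  // toy "network"
      ++executed_;
    });
  }

  // Models a blocking output read: it first runs all queued work, then
  // copies up to `size` values into `dst`.
  void GetOutput(float* dst, std::size_t size) {
    assert(dst != nullptr);
    for (auto& task : pending_) task();
    pending_.clear();
    for (std::size_t i = 0; i < size && i < output_.size(); ++i) {
      dst[i] = output_[i];
    }
  }

  // Number of forward computations that have actually been executed.
  int executed() const { return executed_; }

 private:
  std::vector<float> input_;
  std::vector<float> output_;
  std::vector<std::function<void()>> pending_;
  int executed_ = 0;
};
```

In this model `executed()` is still 0 right after `Forward()` returns and becomes 1 only once `GetOutput()` has been called, so a timer around `Forward()` alone would miss nearly all of the compute time. If the real library behaves similarly, the time for MXPredForward() and MXPredGetOutput() needs to be measured together.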
