C++ api executor->Forward(false) is much slower than MXPredForward? #12159

gzpyy · 2018-08-14T10:59:33Z

Description

Hi,

I am using the C++ API for prediction with MobileNet V2. The code below takes 100+ ms to run, while the same function implemented with the C API takes only 10 ms. Both run on the GPU.

C++ version:
executor = net.SimpleBind(global_ctx, args_map, map<string, NDArray>(),
                          map<string, OpReqType>(), aux_map);
executor->Forward(false); // around 110ms

C version:
MXPredCreate((const char *) json_data.GetBuffer(),
             (const char *) param_data.GetBuffer(),
             static_cast<size_t>(param_data.GetLength()),
             dev_type,
             dev_id,
             num_input_nodes,
             input_keys,
             input_shape_indptr,
             input_shape_data,
             &pred_hnd);
MXPredSetInput(pred_hnd, "data", image_data.data(), image_size);

// Do Predict Forward
MXPredForward(pred_hnd); // 10ms

Is there something that I am doing wrong?

vdantu · 2018-08-15T16:26:03Z

@XIAOJINGXIE: This is a good question. Could you also post it on discuss.mxnet.io? It will reach a larger audience there.

@mxnet-label-bot [C++, C, Question]

leleamol · 2019-03-14T23:56:15Z

Hi @XIAOJINGXIE

The Executor->Forward() method does more work than MXPredForward(). Internally it invokes the following two C APIs:
MXExecutorForward()
and
MXExecutorOutputs()
whereas MXPredForward() only runs the forward pass and does not retrieve the outputs.
In addition, the outputs of the forward pass are copied to the 'output' array in the Executor object, for faster retrieval later.

Therefore, we cannot compare MXPredForward() directly with Executor->Forward(). For a fair comparison, we should also measure the time required to invoke MXPredGetOutputShape() and MXPredGetOutput() after invoking MXPredForward().
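The comparison described above can be sketched with a small timing helper that measures a whole unit of work (forward pass plus output retrieval) rather than a single call. This is only a minimal sketch: AverageMs, FakeForward and FakeGetOutput are hypothetical names, not part of MXNet, and the sleeps are stand-ins for the real calls.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Average wall-clock time of `work` over `iterations` runs, in milliseconds.
// Averaging several runs reduces the influence of one-off warm-up costs.
template <typename F>
double AverageMs(F&& work, int iterations) {
  assert(iterations > 0);
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    work();
  }
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::milli> total = stop - start;
  return total.count() / iterations;
}

// Hypothetical stand-ins (assumptions for illustration only). In real code
// these would wrap MXPredForward(pred_hnd) and MXPredGetOutput(pred_hnd, ...).
inline void FakeForward() {
  std::this_thread::sleep_for(std::chrono::milliseconds(1));
}
inline void FakeGetOutput() {
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```

With the real API, one lambda would wrap executor->Forward(false) and the other would wrap MXPredForward() followed by MXPredGetOutput(), for example AverageMs([&] { FakeForward(); FakeGetOutput(); }, 10), so that both measurements cover the same amount of work.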

I hope this answers the question.
@mxnet-label-bot add [Pending Requester Info]

gzpyy · 2019-04-07T15:31:53Z

Thanks a lot.

By the way, the method name Forward() is a bit misleading.

sheep94lion · 2019-06-27T02:12:57Z

The running time of C API MXPredForward is much shorter than the running time of MXPredGetOutput:

// Time the forward call on its own.
auto start = std::chrono::high_resolution_clock::now();
MXPredForward(pred_hnd);
auto stop = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
// count() returns a 64-bit integer type, so cast it and print with %lld rather than %d.
LOGI("MXPredForward: %lld microseconds.", static_cast<long long>(duration.count()));
std::vector<float> data(size);
// Time the output retrieval separately.
start = std::chrono::high_resolution_clock::now();
MXPredGetOutput(pred_hnd, output_index, data.data(), static_cast<mx_uint>(size));
stop = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
LOGI("MXPredGetOutput: %lld microseconds.", static_cast<long long>(duration.count()));

The result is:

I/MXNET: MXPredForward: 106 microseconds.
I/MXNET: MXPredGetOutput: 3748967 microseconds.

Why? Is it related to lazy evaluation?
The code runs on a Pixel3 with Snapdragon 835.
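One plausible reading of these numbers: MXNet's engine schedules operators asynchronously, so a call that only enqueues work can return almost immediately, and the wait shows up in the first call that needs the result. The sketch below is an analogy built on std::async, not MXNet code; MeasureDeferredWork and Timings are hypothetical names, and the sleep stands in for the actual computation.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

struct Timings {
  double enqueue_ms;  // time spent launching the work
  double fetch_ms;    // time spent waiting for the result
  int result;
};

// Launches `work_ms` milliseconds of simulated work on another thread and
// times the launch and the result retrieval separately.
Timings MeasureDeferredWork(int work_ms) {
  assert(work_ms >= 0);
  using Clock = std::chrono::steady_clock;
  using Ms = std::chrono::duration<double, std::milli>;

  const auto t0 = Clock::now();
  std::future<int> pending = std::async(std::launch::async, [work_ms] {
    // Stand-in for the forward pass (an assumption for illustration).
    std::this_thread::sleep_for(std::chrono::milliseconds(work_ms));
    return 42;
  });
  const auto t1 = Clock::now();     // launch returned, work still running
  const int value = pending.get();  // blocks until the work has finished
  const auto t2 = Clock::now();

  return {Ms(t1 - t0).count(), Ms(t2 - t1).count(), value};
}
```

If the same effect is at play in the measurement above, timing MXPredForward() and MXPredGetOutput() together gives the meaningful figure, since the split between the two calls mostly reflects where the blocking happens.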

marcoabreu added the C++ and Question labels Aug 16, 2018

marcoabreu added the Pending Requester Info label Mar 14, 2019

gzpyy closed this as completed Apr 7, 2019