How to speed up mxnet prediction? Copying gpu->cpu takes a long time

Hi all,

Doing prediction with mxnet has two major parts: the forward pass and copying the results from GPU to CPU memory (a sketch of this loop follows the timing example below). I did a quick timing based on batch size (see below), and the second operation seems to take a lot of time as the batch size increases.

I don't understand this, because copying data from the GPU to the CPU should be really fast. For example, the following code takes only 0.2 ms to run:
```python
# speed test
import time
import mxnet as mx

a = mx.nd.random_uniform(shape=(256, 1000), ctx=mx.cpu())  # destination on CPU
b = mx.nd.random_uniform(shape=(256, 1000), ctx=mx.gpu())  # source on GPU
t0 = time.time()
b.copyto(a)
print(time.time() - t0)  # elapsed time in seconds
```
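For reference, the prediction loop is roughly the following (a minimal sketch, assuming the Module API; `mod` and `batch` are illustrative names, not my exact code):

```python
# minimal sketch: `mod` is a mx.mod.Module bound on mx.gpu(0),
# `batch` is a DataBatch (illustrative names)
mod.forward(batch, is_train=False)   # part 1: forward pass
out = mod.get_outputs()[0]           # output still lives in GPU memory
pred = out.asnumpy()                 # part 2: copy GPU -> CPU
```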
Am I doing this the wrong way? Any help is highly appreciated. Thanks.
-- John
PS: The architecture is ResNet-50.
The output dimension should be the same, but for some reason the data-copying time is reduced a lot. I cannot figure out why.
-- John
johnbroughton2017 changed the title from "How to speed up prediction run time? Copying gpu->cpu takes a long time" to "How to speed up mxnet prediction? Copying gpu->cpu takes a long time" on Feb 25, 2018.
`mod.forward` and `NDArray.copyto` are async functions, so it's not accurate to simply time the line of Python code. You need to insert `waitall()` between `forward` and `get_outputs` and measure the time difference between the beginning and the end of each code block. More precisely, you can use the MXNet built-in profiler to get the accurate execution time of each operation.
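For example, a minimal timing sketch along those lines (assuming the same `mod` and `batch` as in the question; `mx.nd.waitall()` blocks until all pending asynchronous operations have finished):

```python
import time
import mxnet as mx

# assumes `mod` is a bound Module and `batch` a DataBatch (illustrative names)
mx.nd.waitall()                        # drain any pending async work first
t0 = time.time()
mod.forward(batch, is_train=False)
mx.nd.waitall()                        # force the forward pass to complete
t1 = time.time()
pred = mod.get_outputs()[0].asnumpy()  # asnumpy() blocks until the GPU->CPU copy finishes
t2 = time.time()
print('forward: %.1f ms, copy: %.1f ms' % ((t1 - t0) * 1e3, (t2 - t1) * 1e3))
```

For the profiler route, recent MXNet versions expose `mx.profiler.set_config(...)` together with `mx.profiler.set_state('run')` / `mx.profiler.set_state('stop')`; the exact API depends on your MXNet version.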