This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Description

I'm running the script at https://github.com/dmlc/gluon-nlp/tree/master/scripts/language_model/run_glue.py. The command line to reproduce:

python run_glue.py --task MRPC --gpu 8 --batch_size 32

I have the following observations:

With model.hybridize()
With mxnet-cu100 >= 1.6.0b20191102, the time cost of the first batch is more than 283s with a single GPU and more than 1000s with 4 GPUs.
With mxnet-cu100==1.6.0b20191101, the time cost of the first batch is about 11s with a single GPU.
With model.hybridize() and os.environ['MXNET_USE_FUSION'] = '0'
For both mxnet-cu100 == 1.6.0b20191102 and 1.6.0b20191215, the time cost of the first batch is around 11s.
Without hybridize():
The performance is nearly the same.
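The observations above can be reproduced with a timing harness along these lines (a minimal sketch; `time_first_batch` is a hypothetical helper, not part of run_glue.py):

```python
import os
import time

# Workaround observed above: disabling pointwise fusion restores the ~11s
# first batch. MXNET_USE_FUSION must be set before the first hybridized
# forward pass triggers the graph optimization.
os.environ['MXNET_USE_FUSION'] = '0'

def time_first_batch(run_batch):
    """Time one call; for a hybridized Gluon model the first batch also
    includes graph optimization and (with fusion on) kernel compilation."""
    start = time.perf_counter()
    run_batch()
    return time.perf_counter() - start
```

For the actual measurement one would call, e.g., `time_first_batch(lambda: net(x).wait_to_read())` after `net.hybridize()`, so that the asynchronous forward pass is forced to complete inside the timed region.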
Error Message
(Paste the complete error message. Please also include stack trace by setting environment variable DMLC_LOG_STACK_TRACE_DEPTH=10 before running your script.)
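Setting the stack-trace depth before launching the script can be sketched as follows (the launch command is taken from the description above and left commented out as a hedged illustration):

```python
import os
import subprocess

# Collect a stack trace along with the error message by setting
# DMLC_LOG_STACK_TRACE_DEPTH in the child environment before launching
# the training script:
env = dict(os.environ, DMLC_LOG_STACK_TRACE_DEPTH="10")
# subprocess.run(["python", "run_glue.py", "--task", "MRPC",
#                 "--gpu", "8", "--batch_size", "32"], env=env, check=True)
```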
To Reproduce
(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)
Steps to reproduce
(Paste the commands you ran that produced the error.)
What have you tried to solve it?
Environment
We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
Some increase in the time of the first batch is expected with fusion enabled, but that is definitely excessive. Just to confirm - by mxnet-cu100 >= 1.6.0b20191102 you mean that you tested both builds from 11/02 and 12/15? There was a change #16783 that went to master on November 12 (and was backported to 1.6 on November 15) that sped up the compilation by caching the results.
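The caching effect described in that change can be illustrated with a toy sketch (this is not MXNet's implementation; `compile_kernel` is a hypothetical stand-in):

```python
import functools
import time

# Toy illustration (not MXNet's actual code) of caching compilation
# results: identical fused-kernel source only pays the compilation
# cost once, so repeated graph passes become cheap cache hits.
@functools.lru_cache(maxsize=None)
def compile_kernel(source):
    time.sleep(0.01)  # stand-in for an expensive runtime compilation
    return "binary:" + source

compile_kernel("add_relu")   # compiles
compile_kernel("add_relu")   # served from the cache
```

After the second call, `compile_kernel.cache_info().hits` is 1, which mirrors the intent of #16783: repeated identical compilations become cache hits instead of recompilations.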
Ok, I can reproduce it. The first observation is that it is not actually the compilation that takes most of the time (so probably the fusion graph pass?).
I will dig further into it. At least when comparing the second epoch between the fusion-enabled and fusion-disabled runs, fusion is faster (83.8s vs 87s) ;-)
I created the PR with a fix for this. Locally the graph pass now takes ~4.5 s on a single GPU, down from over 200 s. (Since you are not using multiple processes in that script, the time for multiple GPUs would still scale linearly, unfortunately; changing that would require much bigger changes to MXNet.)