This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended label(s): Bug
Description
When running MXNet in imperative (non-hybridized) mode on multiple GPUs, the GPUs do not appear to run in parallel.
(Might be related to #8884.)
Environment info (Required)
Package used: Python
Build info (Required if built from source)
N/A
Error Message:
N/A
Minimum reproducible example
Running GluonCV modelzoo cifar_resnet110_v2 on CIFAR10:
HYBRID profiler output
IMPERATIVE profiler output
IMPERATIVE profiler output zoomed
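To put a number on how much the devices overlap in a trace, one option is to compute the share of busy time during which at least two devices are active at once. The following is only a sketch: the function name `parallel_fraction` and the input layout (a dict mapping a device name to a list of `(start, end)` busy intervals) are assumptions made for illustration, and extracting those intervals from the profiler's output file is not shown.

```python
from collections import defaultdict


def parallel_fraction(busy):
    """Return the share of busy time in which two or more devices are active.

    busy: dict mapping a device name to a list of (start, end) intervals.
          This layout is hypothetical; it is not the profiler's own format.
    """
    events = []
    for device, intervals in busy.items():
        for start, end in intervals:
            events.append((start, 1, device))   # an operation starts
            events.append((end, -1, device))    # an operation ends
    events.sort(key=lambda e: (e[0], e[1]))

    open_ops = defaultdict(int)  # operations currently running per device
    active = 0                   # devices with at least one running operation
    any_busy = 0.0               # time with at least one active device
    multi_busy = 0.0             # time with at least two active devices
    prev = None
    for time, delta, device in events:
        if prev is not None:
            if active >= 1:
                any_busy += time - prev
            if active >= 2:
                multi_busy += time - prev
        was_active = open_ops[device] > 0
        open_ops[device] += delta
        is_active = open_ops[device] > 0
        active += int(is_active) - int(was_active)
        prev = time
    return multi_busy / any_busy if any_busy else 0.0


# Synthetic intervals, not measured data:
serialized = {"gpu0": [(0, 10)], "gpu1": [(10, 20)]}
overlapped = {"gpu0": [(0, 10)], "gpu1": [(0, 10)]}
print(parallel_fraction(serialized), parallel_fraction(overlapped))
```

A value near 0 would match the serialized pattern described for imperative mode, and a value near 1 would match fully parallel execution.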
Steps to reproduce
Reproduce using the train_cifar10.py script from https://gluon-cv.mxnet.io/model_zoo/classification.html#cifar10 (download link https://gluon-cv.mxnet.io/_downloads/54189a15ba652c5a2587928303cc2171/train_cifar10.py ), and add MXNet's profiler around the forward pass.
Alternatively, use the version of train_cifar10.py that already includes the profiler code: https://gist.github.com/igolan/511b61d17da0694a817a1ac3f9bd8f95
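For anyone adding the profiler by hand, wrapping the forward pass might look roughly like the sketch below. It is not the code from the gist. The helper name `profiled` is made up for illustration, and the commented usage assumes the MXNet 1.x profiler calls `mx.profiler.set_config`, `mx.profiler.set_state`, and `mx.profiler.dump`, plus `mx.nd.waitall` to wait for asynchronously queued work before the profiler is stopped. The output filename and the `net`/`data` names are placeholders.

```python
from contextlib import contextmanager


@contextmanager
def profiled(profiler, sync=None):
    """Record the enclosed block with the given profiler.

    profiler: object exposing set_state(str) and dump(), e.g. mx.profiler
    sync:     optional callable that blocks until queued work has finished,
              e.g. mx.nd.waitall (MXNet executes operators asynchronously)
    """
    profiler.set_state('run')
    try:
        yield
        if sync is not None:
            sync()  # make sure the queued operators land inside the trace
    finally:
        profiler.set_state('stop')
        profiler.dump()  # write the collected trace to the configured file


# Assumed usage inside the training loop (placeholder names):
#
#   import mxnet as mx
#   mx.profiler.set_config(profile_all=True, aggregate_stats=True,
#                          filename='profile_imperative.json')
#   with profiled(mx.profiler, sync=mx.nd.waitall):
#       outputs = [net(X) for X in data]  # one forward pass per GPU slice
```

Passing the profiler in as an argument keeps the helper importable on machines without MXNet installed.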
Run:
python train_cifar10.py --num-epochs 200 --mode hybrid --num-gpus 4 -j 2 --batch-size 128 --wd 0.0001 --lr 0.1 --lr-decay 0.1 --lr-decay-epoch 100,150 --model cifar_resnet110_v2
Vs.
python train_cifar10.py --num-epochs 200 --mode imperative --num-gpus 4 -j 2 --batch-size 128 --wd 0.0001 --lr 0.1 --lr-decay 0.1 --lr-decay-epoch 100,150 --model cifar_resnet110_v2
What have you tried to solve it?
N/A