Closed
Labels
bug (Something isn't working)
Description
Related to Model/Framework(s)
resnet50v1.5/tensorflow
Describe the bug
With mixed precision + XLA and batch size 256, 1 GPU reaches only 786 img/s.
With mixed precision + XLA and batch size 256, 4 GPUs reach 3068 img/s.
This is much slower than the benchmark of 1270 img/s (1 GPU).
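To put these numbers in perspective, here is a minimal sketch of the arithmetic. It uses only the three throughput figures quoted in this report; the function names are illustrative and not part of the repository.

```python
# Throughput figures quoted in this report (images per second).
MEASURED_1GPU = 786.0
MEASURED_4GPU = 3068.0
BENCHMARK_1GPU = 1270.0


def fraction_of_reference(measured: float, reference: float) -> float:
    """Return measured throughput as a fraction of a reference throughput."""
    return measured / reference


def scaling_efficiency(multi_gpu: float, single_gpu: float, num_gpus: int) -> float:
    """Return multi-GPU throughput relative to ideal linear scaling."""
    return multi_gpu / (single_gpu * num_gpus)


vs_benchmark = fraction_of_reference(MEASURED_1GPU, BENCHMARK_1GPU)
efficiency_4gpu = scaling_efficiency(MEASURED_4GPU, MEASURED_1GPU, 4)

print(f"1 GPU vs. benchmark: {vs_benchmark:.1%}")
print(f"4 GPU scaling efficiency: {efficiency_4gpu:.1%}")
```

By this calculation the single-GPU run reaches roughly 62% of the benchmark, while going from 1 to 4 GPUs scales at roughly 98% of linear. That suggests the gap is in per-GPU throughput rather than in multi-GPU communication, though the figures alone do not prove it.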
To Reproduce
Steps to reproduce the behavior:
We did not change the code; we start the job with the command below:
mpiexec --allow-run-as-root --bind-to none --map-by slot -np 2 python main.py \
--mode=training_benchmark \
--use_xla \
--warmup_steps 200 \
--num_iter 500 \
--iter_unit batch \
--batch_size 256 \
--data_dir=/ssd2/imagenet/tfrecord/train \
--results_dir=${work_dirs} \
--use_tf_amp \
--use_static_loss_scaling \
--loss_scale=128
We use mpiexec --allow-run-as-root --bind-to none --map-by slot -np xxx
to start the job because --bind-to socket causes a "failed to bind memory" warning and reduces speed.
When we start the job this way there is still one warning, which seems to indicate that our environment does not have openib.
I do not know whether this warning reduces speed.
Environment
Please provide at least:
- Container version (e.g. pytorch:19.05-py3): nvcr.io/nvidia/tensorflow:20.06-tf1-py3
- GPUs in the system: (e.g. 8x Tesla V100-SXM2-16GB): 4*Tesla V100-SXM2-32GB
- CUDA driver version (e.g. 418.67): 450.80.02, cuda11
- CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz * 16
- Code: NGC-20.06.5 official code
- Log: [log.txt](url)