[resnet50v1.5/tensorflow] Training performance cannot reproduce #837

Related to **Model/Framework(s)** 
resnet50v1.5/tensorflow

**Describe the bug**
With mixed precision + XLA + batch size 256, 1 GPU only reaches 786 img/s.
With mixed precision + XLA + batch size 256, 4 GPUs reach 3068 img/s.
This is much slower than the benchmark of 1270 img/s (1 GPU).
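
To put these numbers in perspective, here is a minimal sketch of the arithmetic, using only the throughputs quoted above. The variable names are just for illustration, and the comparison assumes the 1270 img/s benchmark figure was measured with a comparable configuration (mixed precision, batch size 256, 1 GPU).

```python
# Throughputs reported in this issue (img/s).
measured_1gpu = 786.0
measured_4gpu = 3068.0
benchmark_1gpu = 1270.0  # 1-GPU benchmark figure quoted above

# Fraction of the benchmark single-GPU throughput that we actually reach.
fraction_of_benchmark = measured_1gpu / benchmark_1gpu

# Scaling efficiency from 1 to 4 GPUs:
# measured 4-GPU throughput divided by ideal linear scaling of the 1-GPU result.
scaling_efficiency = measured_4gpu / (4 * measured_1gpu)

print(f"1 GPU reaches {fraction_of_benchmark:.1%} of the benchmark")
print(f"1 -> 4 GPU scaling efficiency: {scaling_efficiency:.1%}")
```

This works out to roughly 62% of the benchmark on 1 GPU and about 98% scaling efficiency from 1 to 4 GPUs, which suggests the gap is per GPU rather than in inter-GPU communication.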

**To Reproduce**
Steps to reproduce the behavior:
We did not change the code; we just used the command below to start training:
```
mpiexec --allow-run-as-root --bind-to none --map-by slot -np 2 python main.py \
    --mode=training_benchmark \
    --use_xla \
    --warmup_steps 200 \
    --num_iter 500 \
    --iter_unit batch \
    --batch_size 256 \
    --data_dir=/ssd2/imagenet/tfrecord/train \
    --results_dir=${work_dirs} \
    --use_tf_amp \
    --use_static_loss_scaling \
    --loss_scale=128
```

We launch the job with `mpiexec --allow-run-as-root --bind-to none --map-by slot -np xxx` because `--bind-to socket` causes a "failed to bind memory" warning and reduces speed.
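
For reference, a minimal Python sketch of how the same launch line can be generated for different process counts, so that only `-np` (and the results directory) changes between the 1-GPU and 4-GPU runs. The helper name `build_launch_command` and the `results_*gpu` directory names are hypothetical; every flag is copied from the command above, and the sketch only prints the command rather than running it.

```python
import shlex


def build_launch_command(num_procs, results_dir,
                         data_dir="/ssd2/imagenet/tfrecord/train"):
    """Return the mpiexec argument list for a benchmark run with num_procs processes."""
    return [
        "mpiexec", "--allow-run-as-root", "--bind-to", "none",
        "--map-by", "slot", "-np", str(num_procs),
        "python", "main.py",
        "--mode=training_benchmark",
        "--use_xla",
        "--warmup_steps", "200",
        "--num_iter", "500",
        "--iter_unit", "batch",
        "--batch_size", "256",
        f"--data_dir={data_dir}",
        f"--results_dir={results_dir}",
        "--use_tf_amp",
        "--use_static_loss_scaling",
        "--loss_scale=128",
    ]


# Print one launch line per GPU count reported in this issue.
for n in (1, 4):
    # Hypothetical results directory names, one per run.
    print(shlex.join(build_launch_command(n, results_dir=f"results_{n}gpu")))
```

Keeping the flags in one place like this makes it easier to confirm that the 1-GPU and 4-GPU measurements differ only in the number of processes.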

When we start the job with this command there is one warning; it seems our environment does not have openib.
I do not know whether this message reduces speed.
![image](https://user-images.githubusercontent.com/28926237/108584346-779ece80-737b-11eb-8e06-196dfa91c917.png)




**Environment**
Please provide at least:
* Container version (e.g. pytorch:19.05-py3):  nvcr.io/nvidia/tensorflow:20.06-tf1-py3  
* GPUs in the system (e.g. 8x Tesla V100-SXM2-16GB): 4x Tesla V100-SXM2-32GB
* CUDA driver version (e.g. 418.67): 450.80.02, CUDA 11
* CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz * 16
* Code: NGC-20.06.5 official code

**Log**
[log.txt](https://github.com/NVIDIA/DeepLearningExamples/files/6014151/log.txt)

