Performance doesn't improve with more GPUs (scalability issue) when running train_imagenet.py #7813

While training the AlexNet CNN on ImageNet data, I don't see any performance improvement as the number of GPUs increases (in fact, I see a slight degradation):

python train_imagenet.py --data-train /local/ImageNet/MXNet_data/MXNet_data.rec --data-val /local/ImageNet/MXNet_data/MXNet_data_test.rec --gpus 0,1,2,3 --network alexnet --batch-size 256  --num-epochs 1 --kv-store device
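To make the three runs below easy to reproduce, here is a small sketch that rebuilds this command for 1, 2 or 4 GPUs. It assumes, based on the "batch-size/GPU: 64" note, that `--batch-size` was scaled as 64 × (number of GPUs), so the 1-GPU and 2-GPU runs used 64 and 128. The helper name `build_cmd` is hypothetical; only the flags from the command above are used.

```python
from typing import List

# Assumption: each GPU always processes 64 images per batch, so the global
# --batch-size passed to train_imagenet.py grows with the number of GPUs.
PER_GPU_BATCH = 64

def build_cmd(num_gpus: int) -> List[str]:
    """Hypothetical helper: the command from this report, adapted to num_gpus GPUs."""
    gpus = ",".join(str(i) for i in range(num_gpus))  # e.g. "0,1,2,3"
    return [
        "python", "train_imagenet.py",
        "--data-train", "/local/ImageNet/MXNet_data/MXNet_data.rec",
        "--data-val", "/local/ImageNet/MXNet_data/MXNet_data_test.rec",
        "--gpus", gpus,
        "--network", "alexnet",
        "--batch-size", str(PER_GPU_BATCH * num_gpus),
        "--num-epochs", "1",
        "--kv-store", "device",
    ]

for n in (1, 2, 4):
    print(" ".join(build_cmd(n)))
```

With `--kv-store device`, the global batch is split evenly across the listed GPUs, which is why the global value is scaled rather than kept at 256 for every run.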

Time per epoch (batch size per GPU: 64):
1 GPU: 910 s
2 GPUs: 924 s
4 GPUs: 964 s
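To put these numbers in scaling terms, the sketch below computes speedup (1-GPU epoch time divided by N-GPU epoch time, ideally N) and parallel efficiency (speedup divided by N, ideally 100%) using only the three epoch times reported above.

```python
# Epoch times reported above, in seconds, keyed by number of GPUs.
epoch_time_s = {1: 910, 2: 924, 4: 964}

def scaling(times):
    """Return {num_gpus: (speedup, efficiency)} relative to the 1-GPU run."""
    base = times[1]
    rows = {}
    for n, t in sorted(times.items()):
        speedup = base / t        # ideal value is n
        efficiency = speedup / n  # ideal value is 1.0
        rows[n] = (speedup, efficiency)
    return rows

for n, (s, e) in scaling(epoch_time_s).items():
    print(f"{n} GPU(s): speedup {s:.2f}x, efficiency {e:.0%}")
# 1 GPU(s): speedup 1.00x, efficiency 100%
# 2 GPU(s): speedup 0.98x, efficiency 49%
# 4 GPU(s): speedup 0.94x, efficiency 24%
```

A speedup below 1.0x means the extra GPUs add overhead without raising throughput, i.e. something other than GPU compute is limiting each epoch.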

I have four Titan Xp GPUs.

However, with synthetic data (as shown in the demo at https://github.com/apache/incubator-mxnet/blob/master/example/image-classification/README.md), I see good scalability.
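Since synthetic data scales but the .rec data does not, one way to narrow this down is to time the data iterator on its own, with no training step attached. The sketch below is a generic timer under that assumption; `images_per_second` and `fake_batches` are hypothetical names, and `fake_batches` is only a stand-in for a real iterator so the example runs anywhere.

```python
import time
from typing import Iterable, Iterator

def images_per_second(batches: Iterable, batch_size: int, max_batches: int = 50) -> float:
    """Measure how many images/s an iterator yields when no training step runs."""
    it = iter(batches)
    next(it)  # skip the first batch, which often includes start-up cost
    count = 0
    start = time.perf_counter()
    for _ in range(max_batches):
        try:
            next(it)
        except StopIteration:
            break
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0:
        return 0.0
    return count * batch_size / max(elapsed, 1e-9)

def fake_batches(delay_s: float) -> Iterator[None]:
    """Hypothetical stand-in for a real data iterator: each batch takes delay_s to 'load'."""
    while True:
        time.sleep(delay_s)
        yield None

if __name__ == "__main__":
    # A loader that needs 10 ms per 64-image batch can deliver at most 6400 images/s.
    print(f"{images_per_second(fake_batches(0.01), batch_size=64, max_batches=20):.0f} images/s")
```

To apply it to this setup, one could pass an `mx.io.ImageRecordIter(path_imgrec=..., data_shape=(3, 224, 224), batch_size=...)` over the training .rec file instead of `fake_batches` (the 224×224 shape is an assumption about the AlexNet input). If the iterator alone delivers roughly the same images/s as the 1-GPU training run, the input pipeline, not the GPUs, would be the cap.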
