This repository was archived by the owner on Nov 17, 2023. It is now read-only.
Description
When using one GPU to train this example (contributed by @Answeror), it reaches about 93% accuracy, so everything works correctly. The log for the first epoch looks like this:
Node[0] Epoch[0] Batch [100] Speed: 213.04 samples/sec Train-accuracy=0.193594
Node[0] Epoch[0] Batch [150] Speed: 212.40 samples/sec Train-accuracy=0.232187
Node[0] Epoch[0] Batch [200] Speed: 213.00 samples/sec Train-accuracy=0.252500
Node[0] Epoch[0] Batch [250] Speed: 211.22 samples/sec Train-accuracy=0.276875
Node[0] Epoch[0] Batch [300] Speed: 212.90 samples/sec Train-accuracy=0.307188
Node[0] Epoch[0] Batch [350] Speed: 212.46 samples/sec Train-accuracy=0.328906
But when using multiple GPUs (>= 2), the accuracy never goes up; it stays around 10%. The log for the first two epochs looks like this:
Node[0] Epoch[0] Batch [100] Speed: 851.39 samples/sec Train-accuracy=0.107031
Node[0] Epoch[0] Batch [150] Speed: 867.40 samples/sec Train-accuracy=0.101406
Node[0] Epoch[0] Batch [200] Speed: 863.88 samples/sec Train-accuracy=0.096875
Node[0] Epoch[0] Batch [250] Speed: 833.26 samples/sec Train-accuracy=0.100469
Node[0] Epoch[0] Batch [300] Speed: 856.32 samples/sec Train-accuracy=0.098281
Node[0] Epoch[0] Batch [350] Speed: 684.60 samples/sec Train-accuracy=0.100000
Node[0] Epoch[0] Resetting Data Iterator
Node[0] Epoch[0] Time cost=64.076
Node[0] Epoch[0] Validation-accuracy=0.100277
Node[0] Epoch[1] Batch [50] Speed: 883.92 samples/sec Train-accuracy=0.094531
Node[0] Epoch[1] Batch [100] Speed: 880.32 samples/sec Train-accuracy=0.104844
Node[0] Epoch[1] Batch [150] Speed: 687.25 samples/sec Train-accuracy=0.096094
Node[0] Epoch[1] Batch [200] Speed: 870.87 samples/sec Train-accuracy=0.097500
Node[0] Epoch[1] Batch [250] Speed: 865.23 samples/sec Train-accuracy=0.094062
Node[0] Epoch[1] Batch [300] Speed: 899.02 samples/sec Train-accuracy=0.100312
Node[0] Epoch[1] Batch [350] Speed: 823.62 samples/sec Train-accuracy=0.106875
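To put a number on the difference between the two runs, the Train-accuracy values can be pulled out of the logs and compared. This is only a minimal sketch, assuming the log line format shown above; `parse_train_accuracy` and `accuracy_gain` are hypothetical helper names and are not part of MXNet. Only the first and last logged batches of each run are included here to keep it short.

```python
import re

# Matches lines such as:
# Node[0] Epoch[0] Batch [100] Speed: 213.04 samples/sec Train-accuracy=0.193594
LOG_LINE = re.compile(r"Epoch\[(\d+)\] Batch \[(\d+)\].*Train-accuracy=([0-9.]+)")


def parse_train_accuracy(log_text):
    """Return a list of (epoch, batch, accuracy) tuples found in the log text."""
    records = []
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if match:
            epoch, batch, acc = match.groups()
            records.append((int(epoch), int(batch), float(acc)))
    return records


def accuracy_gain(records):
    """Return last accuracy minus first accuracy (0.0 if fewer than two records)."""
    if len(records) < 2:
        return 0.0
    return records[-1][2] - records[0][2]


# First and last logged batches from the single-GPU run above.
single_gpu_log = """\
Node[0] Epoch[0] Batch [100] Speed: 213.04 samples/sec Train-accuracy=0.193594
Node[0] Epoch[0] Batch [350] Speed: 212.46 samples/sec Train-accuracy=0.328906
"""

# First and last logged batches from the multi-GPU run above.
multi_gpu_log = """\
Node[0] Epoch[0] Batch [100] Speed: 851.39 samples/sec Train-accuracy=0.107031
Node[0] Epoch[1] Batch [350] Speed: 823.62 samples/sec Train-accuracy=0.106875
"""

single_gain = accuracy_gain(parse_train_accuracy(single_gpu_log))
multi_gain = accuracy_gain(parse_train_accuracy(multi_gpu_log))

print("single-GPU gain: %.4f" % single_gain)
print("multi-GPU gain:  %.4f" % multi_gain)
```

A clearly positive gain for the single-GPU run and a gain near zero for the multi-GPU run matches what the logs show by eye: the multi-GPU run is not learning at all.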
I have checked the kvstore in this example but found nothing wrong. Does anybody know what's wrong? Thanks!