
Shorten training time as to Fast/Faster R-CNN without any changes on algorithms. #2354

Closed
kyoro1 opened this issue Sep 15, 2017 · 12 comments

@kyoro1
Contributor

kyoro1 commented Sep 15, 2017

We'd like to use Fast/Faster R-CNN, and training takes about 30 minutes for a set of images on an Azure NC6 environment.

As far as I checked, merely changing the VM size (NCxx) brought no major improvement in training time; i.e. the same procedure also took about 30 minutes on NC12 or NC24.

Questions:

  • [Scale-up strategy] If we want to shorten the training time on NC12 or NC24, what parameter settings are needed?
  • [Other strategies] Can the training time be shortened with other kinds of settings?
@kyoro1 kyoro1 changed the title Shorten training time as to Fast/Faster R-CNN without any changes of algorithms. Shorten training time as to Fast/Faster R-CNN without any changes on algorithms. Sep 15, 2017
@cha-zhang
Member

We know Faster R-CNN's speed can be improved by writing custom C++ layers rather than Python layers, and by using a GPU implementation for non-maximum suppression. This is ongoing work that we will integrate gradually.

@kyoro1
Contributor Author

kyoro1 commented Sep 15, 2017

@cha-zhang Thanks. Can we use multi-GPU training on the larger NCxx sizes at present, or should we wait for the implementation above?

@cha-zhang
Member

Multi-GPU would certainly help if you need it immediately. NCCL 2 is integrated in v2.2 (releasing today), so multi-machine training should work well.
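
For context, CNTK's data-parallel training launches one process per GPU (or per machine) through MPI, so a distributed-ready script is started with a command of this form (the script name here is a placeholder):

mpiexec -n 4 python some_distributed_training_script.py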

@kyoro1
Contributor Author

kyoro1 commented Sep 15, 2017

@cha-zhang Really? :) Once the release is complete, can you share the tutorial link here?

@cha-zhang
Member

We will post release notes on the main page once it's out. Or follow us on Twitter @mscntk.

@kyoro1
Contributor Author

kyoro1 commented Sep 21, 2017

@cha-zhang @pkranen I tried to train Fast R-CNN as follows:

mpiexec -n 2 python A2_RunWithPyModel.py

On NC24, two GPUs are used, but the processing time is almost the same as for the normal run:
python A2_RunWithPyModel.py
Looking at the console log, each GPU seemed to run the same computation in parallel; they are just parallel copies of the same training, which could be a waste of resources?

  1. Can we shorten the processing time with the mpiexec command for this Python module?
  2. Is the situation the same for FasterRCNN_train.py?

@cha-zhang
Member

This script is not ready for distributed learning. Check scripts like this one:
https://github.com/Microsoft/CNTK/blob/master/Examples/Image/Classification/ResNet/Python/TrainResNet_CIFAR10_Distributed.py
to see how to make things distributed.
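
For reference, the core pattern in that example is to wrap the local learner in a data-parallel distributed learner and to finalize the MPI communicator at the end of the script. A minimal sketch, assuming the CNTK v2.2 Python API; frcn_output, lr_schedule, mm_schedule, and l2_reg_weight are names from A2_RunWithPyModel.py, while ce and pe stand for the script's loss and metric nodes:

from cntk import Trainer
from cntk.learners import momentum_sgd
from cntk.train.distributed import data_parallel_distributed_learner, Communicator

## local learner, exactly as in the non-distributed script
local_learner = momentum_sgd(frcn_output.parameters, lr_schedule, mm_schedule,
                             l2_regularization_weight=l2_reg_weight)

## wrap it so gradients are aggregated across all MPI workers
## (num_quantization_bits=32 means no quantization; 1 enables 1-bit SGD)
learner = data_parallel_distributed_learner(local_learner,
                                            num_quantization_bits=32,
                                            distributed_after=0)

trainer = Trainer(frcn_output, (ce, pe), learner)
## ... training loop as before ...

## required at the end of every distributed CNTK script
Communicator.finalize()

The script is then launched with one MPI rank per GPU, e.g. mpiexec -n 4 python <script>.py.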

@kyoro1
Contributor Author

kyoro1 commented Sep 23, 2017

Following the comment above, I first tried to run Fast R-CNN in a distributed version. Here are the conditions; the related script is this

Conditions:

  • Environment: NC24 (with 4 GPUs), Windows Server on Azure
  • Original learner in Fast R-CNN (A2_RunWithPyModel.py):
    ## original learner
    learner = momentum_sgd(frcn_output.parameters, lr_schedule, mm_schedule, l2_regularization_weight=l2_reg_weight)
  • Changed learner in Fast R-CNN (A2_RunWithPyModel_distributed.py); see the sketch after this list:
    ## preparation for distributed learning
    from cntk import distributed
     :
    ## the original learner is renamed local_learner and handed to data_parallel_distributed_learner
    local_learner = momentum_sgd(frcn_output.parameters, lr_schedule, mm_schedule, l2_regularization_weight=l2_reg_weight)
    learner = distributed.data_parallel_distributed_learner(local_learner, num_quantization_bits=1, distributed_after=1)
  • Sample sizes (Grocery, the default dataset for this module):

    • training samples: 25
    • test samples: 5
  • Number of epochs: 20

  • Command: mpiexec -n 4 python A2_RunWithPyModel_distributed.py
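
As a note on the duplicated log blocks asked about below: in the distributed ResNet example, each worker reads its own partition of every minibatch, and progress is printed from rank 0 only. A minimal sketch of those two pieces, assuming the CNTK v2.2 API (minibatch_source, input_map, and minibatch_size are illustrative names, not taken from A2_RunWithPyModel.py):

from cntk.train.distributed import Communicator
from cntk.logging import ProgressPrinter

## each MPI worker reads a distinct slice of the minibatch instead of
## all four workers reading the same 25 samples
data = minibatch_source.next_minibatch(minibatch_size,
                                       input_map=input_map,
                                       num_data_partitions=Communicator.num_workers(),
                                       partition_index=Communicator.rank())

## passing rank to ProgressPrinter suppresses printing on every rank except 0,
## so the console shows one aggregated log block rather than one per worker
progress_printer = ProgressPrinter(tag='Training', num_epochs=20,
                                   rank=Communicator.rank())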

My questions:

  1. As seen in the distributed log, there are four similar blocks. Is that the usual result? I imagined that data-parallel training divides the data set across the GPUs and aggregates the results, so shouldn't the log be aggregated into one block? I wonder whether this log structure is correct.
  2. The sample size was 40 in every epoch except the first (=25). Normally it should equal the number of training samples. What causes the difference between 25 and 40 (from the 2nd epoch to the last)?
  3. The mean AP differed between the original run (Mean AP = 0.8524) and the distributed run (Mean AP = 0.8837). I assumed the only difference was the distribution, so the mean AP should be the same. Is this setting correct?
For reference, the first epochs from the distributed log:
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.268s (  2.4 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.110s ( 12.9 samples/s);
  • Original log
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ python A2_RunWithPyModel.py
--------------------------------------------------------------
2017-09-23 12:49:33
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Selected GPU[1] Tesla K80 as the process wide default device.
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Learning rate per 1 samples: 1e-05
Momentum per 1 samples: 0.9048374180359595
Finished Epoch[1 of 20]: [Training] loss = 3153.350937 * 25, metric = 21.16% * 25 10.572s (  2.4 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 251.568496 * 25, metric = 2.41% * 25 6.971s (  3.6 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 147.409863 * 25, metric = 1.94% * 25 6.982s (  3.6 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 101.552354 * 25, metric = 1.70% * 25 6.965s (  3.6 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 79.782490 * 25, metric = 1.37% * 25 6.994s (  3.6 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 68.687617 * 25, metric = 1.25% * 25 6.964s (  3.6 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 60.549863 * 25, metric = 1.11% * 25 6.986s (  3.6 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 54.716392 * 25, metric = 0.99% * 25 6.976s (  3.6 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 50.048423 * 25, metric = 0.97% * 25 7.013s (  3.6 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 40.542100 * 25, metric = 0.71% * 25 7.001s (  3.6 samples/s);
Learning rate per 1 samples: 1e-06
Finished Epoch[11 of 20]: [Training] loss = 35.926621 * 25, metric = 0.60% * 25 6.995s (  3.6 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.639031 * 25, metric = 0.56% * 25 7.010s (  3.6 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 33.962507 * 25, metric = 0.54% * 25 6.990s (  3.6 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 33.616445 * 25, metric = 0.55% * 25 7.003s (  3.6 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 33.219561 * 25, metric = 0.53% * 25 7.016s (  3.6 samples/s);
Learning rate per 1 samples: 1e-07
Finished Epoch[16 of 20]: [Training] loss = 32.881428 * 25, metric = 0.53% * 25 7.012s (  3.6 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 32.816619 * 25, metric = 0.53% * 25 7.009s (  3.6 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 32.781428 * 25, metric = 0.52% * 25 6.999s (  3.6 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 32.750801 * 25, metric = 0.52% * 25 7.003s (  3.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 32.719143 * 25, metric = 0.52% * 25 7.016s (  3.6 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.

C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ python A3_ParseAndEvaluateOutput.py
--------------------------------------------------------------
2017-09-23 12:52:19
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Parsing CNTK output for image set: test
Parsing cntk output file, image 0 of 5
Parsing cntk output file, image 1 of 5
Parsing cntk output file, image 2 of 5
Parsing cntk output file, image 3 of 5
Parsing cntk output file, image 4 of 5
test.cache ss roidb loaded from C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\proc\Grocery_2000\cntkFiles\test.cache_selective_search_roidb.pkl
   Processing image 0 of 5..
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\cntk_helpers.py:813: RuntimeWarning: overflow encountered in exp
  e = np.exp(w)
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\cntk_helpers.py:814: RuntimeWarning: invalid value encountered in true_divide
  dist = e / np.sum(e, axis=1)[:, np.newaxis]
Number of rois before non-maxima surpression: 3183
Number of rois  after non-maxima surpression: 461
Evaluating detections
AP for         avocado = 0.5556
AP for          orange = 1.0000
AP for          butter = 1.0000
AP for       champagne = 1.0000
AP for          eggBox = 0.7500
AP for          gerkin = 1.0000
AP for         joghurt = 0.6667
AP for         ketchup = 0.6667
AP for     orangeJuice = 1.0000
AP for           onion = 1.0000
AP for          pepper = 1.0000
AP for          tomato = 1.0000
AP for           water = 0.5000
AP for            milk = 1.0000
AP for         tabasco = 0.5000
AP for         mustard = 1.0000
Mean AP = 0.8524
DONE.
  • Distributed log
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ mpiexec -n 4 python A2_RunWithPyModel_distributed.py
Selected GPU[0] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
Selected GPU[2] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
Selected GPU[3] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
Selected GPU[1] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (1) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (0) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (3) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (2) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.268s (  2.4 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.110s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.065s ( 13.1 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.004s ( 13.3 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.056s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.000s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.992s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.035s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.052s ( 13.1 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.033s ( 13.2 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.042s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.069s ( 13.0 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.008s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.040s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.768s (  2.3 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.108s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.038s ( 13.2 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.033s ( 13.2 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.055s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.001s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.036s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.024s ( 13.2 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.061s ( 13.1 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.043s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.069s ( 13.0 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.007s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.041s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 11.277s (  2.2 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.016s ( 13.3 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.108s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.039s ( 13.2 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.032s ( 13.2 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.055s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.001s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.992s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.036s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.024s ( 13.2 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.060s ( 13.1 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.047s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.064s ( 13.1 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.008s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.041s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 9.772s (  2.6 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.016s ( 13.3 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.108s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.039s ( 13.2 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.032s ( 13.2 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.055s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.001s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.036s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.024s ( 13.2 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.061s ( 13.1 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.042s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.070s ( 13.0 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.008s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.041s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.

C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ python A3_ParseAndEvaluateOutput.py
--------------------------------------------------------------
2017-09-23 12:48:18
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Parsing CNTK output for image set: test
Parsing cntk output file, image 0 of 5
Parsing cntk output file, image 1 of 5
Parsing cntk output file, image 2 of 5
Parsing cntk output file, image 3 of 5
Parsing cntk output file, image 4 of 5
test.cache ss roidb loaded from C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\proc\Grocery_2000\cntkFiles\test.cache_selective_search_roidb.pkl
   Processing image 0 of 5..
Number of rois before non-maxima surpression: 3184
Number of rois  after non-maxima surpression: 487
Evaluating detections
AP for         avocado = 0.5556
AP for          orange = 1.0000
AP for          butter = 1.0000
AP for       champagne = 1.0000
AP for          eggBox = 0.7500
AP for          gerkin = 1.0000
AP for         joghurt = 0.6667
AP for         ketchup = 0.6667
AP for     orangeJuice = 1.0000
AP for           onion = 1.0000
AP for          pepper = 1.0000
AP for          tomato = 1.0000
AP for           water = 0.5000
AP for            milk = 1.0000
AP for         tabasco = 1.0000
AP for         mustard = 1.0000
Mean AP = 0.8837
DONE.

@cha-zhang
Member

@kyoro1 Thanks for the detailed info. To answer your questions:

As seen in the distributed log, there are four similar blocks. Is that the usual result? I imagined that data-parallel training divides the data set across the GPUs and aggregates the results, so shouldn't the log be aggregated into one block? I wonder whether this log structure is correct.

It is indeed a little bit strange, although your result seems OK.

The sample size was 40 in every epoch except the first (=25). Normally it should equal the number of training samples. What causes the difference between 25 and 40 (from the 2nd epoch to the last)?

In the first epoch, CNTK auto-tunes the convolution algorithms, on top of the overhead of allocating buffers, validating the model architecture, etc.

The mean AP differed between the original run (Mean AP = 0.8524) and the distributed run (Mean AP = 0.8837). I assumed the only difference was the distribution, so the mean AP should be the same. Is this setting correct?

With such a small data set, fluctuation is normal.

@kyoro1
Contributor Author

kyoro1 commented Sep 26, 2017

@cha-zhang Thanks for your comment. So the trial above is mostly as expected, except for the log structure, isn't it?
Also, do you plan to develop distributed Fast R-CNN scripts in the near future, or should I send a pull request to master?

@cha-zhang
Member

We won't be working on this in the short term. It would be great if you could send us a PR. :)

@kyoro1
Contributor Author

kyoro1 commented Nov 3, 2017

Here is the first step for Fast R-CNN with distributed learning: 1312bf8

@kyoro1 kyoro1 closed this as completed Dec 14, 2017