
Shorten training time as to Fast/Faster R-CNN without any changes on algorithms. #2354

Closed
kyoro1 opened this issue Sep 15, 2017 · 12 comments

@kyoro1
Contributor

kyoro1 commented Sep 15, 2017

We'd like to use Fast/Faster R-CNN, and training takes about 30 minutes for a set of images on an Azure NC6 environment.

As far as I checked, merely changing the VM size (NCxx) brought no major improvement in training time; i.e. the same procedure also took about 30 minutes on NC12 or NC24.

Questions:

  • [Scale-up strategy] If we want to shorten the training time on NC12 or NC24, what parameter settings are needed?
  • [Other strategies] Can the training time be shortened with other kinds of settings?
@kyoro1 kyoro1 changed the title Shorten training time as to Fast/Faster R-CNN without any changes of algorithms. Shorten training time as to Fast/Faster R-CNN without any changes on algorithms. Sep 15, 2017
@cha-zhang
Member

We know Faster R-CNN's speed can be improved by writing custom C++ layers rather than Python layers, and by using a GPU implementation for non-maximum suppression. This is ongoing work that we will integrate gradually.

@kyoro1
Contributor Author

kyoro1 commented Sep 15, 2017

@cha-zhang Thanks. Can we use multi-GPU training on the larger NCxx sizes at present, or should we wait for the implementation above?

@cha-zhang
Member

Multi-GPU would certainly help if you need it immediately. NCCL 2 is integrated in v2.2 (releasing today), so multi-machine training should work well.
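
For context, CNTK's data-parallel training launches one process per GPU (or per machine) through MPI, so a distributed-ready script is started with a command of this form (the script name here is a placeholder):

mpiexec -n 4 python some_distributed_training_script.py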

@kyoro1
Contributor Author

kyoro1 commented Sep 15, 2017

@cha-zhang Really? :) Once the release is complete, can you share the tutorial link here?

@cha-zhang
Member

We will post release notes on the main page once it's out. Or follow us on Twitter @mscntk.

@kyoro1
Contributor Author

kyoro1 commented Sep 21, 2017

@cha-zhang @pkranen I tried to train Fast R-CNN as follows:

mpiexec -n 2 python A2_RunWithPyModel.py

On NC24, two GPUs are used, but the processing time is almost the same as for the normal run:
python A2_RunWithPyModel.py
Looking at the console log, each GPU seemed to run the same computation in parallel; they are just parallel copies of the same training, which could be a waste of resources?

  1. Can we shorten the processing time with the mpiexec command for this Python module?
  2. Is the situation the same for FasterRCNN_train.py?

@cha-zhang
Member

This script is not ready for distributed learning. Check scripts like this one:
https://github.com/Microsoft/CNTK/blob/master/Examples/Image/Classification/ResNet/Python/TrainResNet_CIFAR10_Distributed.py
to see how to make things distributed.
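
For reference, the core pattern in that example is to wrap the local learner in a data-parallel distributed learner and to finalize the MPI communicator at the end of the script. A minimal sketch, assuming the CNTK v2.2 Python API; frcn_output, lr_schedule, mm_schedule, and l2_reg_weight are names from A2_RunWithPyModel.py, while ce and pe stand for the script's loss and metric nodes:

from cntk import Trainer
from cntk.learners import momentum_sgd
from cntk.train.distributed import data_parallel_distributed_learner, Communicator

## local learner, exactly as in the non-distributed script
local_learner = momentum_sgd(frcn_output.parameters, lr_schedule, mm_schedule,
                             l2_regularization_weight=l2_reg_weight)

## wrap it so gradients are aggregated across all MPI workers
## (num_quantization_bits=32 means no quantization; 1 enables 1-bit SGD)
learner = data_parallel_distributed_learner(local_learner,
                                            num_quantization_bits=32,
                                            distributed_after=0)

trainer = Trainer(frcn_output, (ce, pe), learner)
## ... training loop as before ...

## required at the end of every distributed CNTK script
Communicator.finalize()

The script is then launched with one MPI rank per GPU, e.g. mpiexec -n 4 python <script>.py.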

@kyoro1
Contributor Author

kyoro1 commented Sep 23, 2017

Following the comment above, I first tried to run Fast R-CNN in a distributed version. Here are the conditions; the related script is this

Conditions:

  • Environment: NC24 (with 4 GPUs), Windows Server on Azure
  • Original learner in Fast R-CNN (A2_RunWithPyModel.py):
    ## original learner
    learner = momentum_sgd(frcn_output.parameters, lr_schedule, mm_schedule, l2_regularization_weight=l2_reg_weight)
  • Changed learner in Fast R-CNN (A2_RunWithPyModel_distributed.py); see the sketch after this list:
    ## preparation for distributed learning
    from cntk import distributed
     :
    ## the original learner is renamed local_learner and handed to data_parallel_distributed_learner
    local_learner = momentum_sgd(frcn_output.parameters, lr_schedule, mm_schedule, l2_regularization_weight=l2_reg_weight)
    learner = distributed.data_parallel_distributed_learner(local_learner, num_quantization_bits=1, distributed_after=1)
  • Sample sizes (Grocery, the default dataset for this module):

    • training samples: 25
    • test samples: 5
  • Number of epochs: 20

  • Command: mpiexec -n 4 python A2_RunWithPyModel_distributed.py
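
As a note on the duplicated log blocks asked about below: in the distributed ResNet example, each worker reads its own partition of every minibatch, and progress is printed from rank 0 only. A minimal sketch of those two pieces, assuming the CNTK v2.2 API (minibatch_source, input_map, and minibatch_size are illustrative names, not taken from A2_RunWithPyModel.py):

from cntk.train.distributed import Communicator
from cntk.logging import ProgressPrinter

## each MPI worker reads a distinct slice of the minibatch instead of
## all four workers reading the same 25 samples
data = minibatch_source.next_minibatch(minibatch_size,
                                       input_map=input_map,
                                       num_data_partitions=Communicator.num_workers(),
                                       partition_index=Communicator.rank())

## passing rank to ProgressPrinter suppresses printing on every rank except 0,
## so the console shows one aggregated log block rather than one per worker
progress_printer = ProgressPrinter(tag='Training', num_epochs=20,
                                   rank=Communicator.rank())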

My questions:

  1. As seen in the distributed log, there are four similar blocks. Is that the usual result? I imagined that data-parallel training divides the data set across the GPUs and aggregates the results, so shouldn't the log be aggregated into one block? I wonder whether this log structure is correct.
  2. The sample size was 40 in every epoch except the first (=25). Normally it should equal the number of training samples. What causes the difference between 25 and 40 (from the 2nd epoch to the last)?
  3. The mean AP differed between the original run (Mean AP = 0.8524) and the distributed run (Mean AP = 0.8837). I assumed the only difference was the distribution, so the mean AP should be the same. Is this setting correct?
For reference, the first epochs from the distributed log:
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.268s (  2.4 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.110s ( 12.9 samples/s);
  • Original log
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ python A2_RunWithPyModel.py
--------------------------------------------------------------
2017-09-23 12:49:33
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Selected GPU[1] Tesla K80 as the process wide default device.
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Learning rate per 1 samples: 1e-05
Momentum per 1 samples: 0.9048374180359595
Finished Epoch[1 of 20]: [Training] loss = 3153.350937 * 25, metric = 21.16% * 25 10.572s (  2.4 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 251.568496 * 25, metric = 2.41% * 25 6.971s (  3.6 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 147.409863 * 25, metric = 1.94% * 25 6.982s (  3.6 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 101.552354 * 25, metric = 1.70% * 25 6.965s (  3.6 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 79.782490 * 25, metric = 1.37% * 25 6.994s (  3.6 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 68.687617 * 25, metric = 1.25% * 25 6.964s (  3.6 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 60.549863 * 25, metric = 1.11% * 25 6.986s (  3.6 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 54.716392 * 25, metric = 0.99% * 25 6.976s (  3.6 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 50.048423 * 25, metric = 0.97% * 25 7.013s (  3.6 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 40.542100 * 25, metric = 0.71% * 25 7.001s (  3.6 samples/s);
Learning rate per 1 samples: 1e-06
Finished Epoch[11 of 20]: [Training] loss = 35.926621 * 25, metric = 0.60% * 25 6.995s (  3.6 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.639031 * 25, metric = 0.56% * 25 7.010s (  3.6 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 33.962507 * 25, metric = 0.54% * 25 6.990s (  3.6 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 33.616445 * 25, metric = 0.55% * 25 7.003s (  3.6 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 33.219561 * 25, metric = 0.53% * 25 7.016s (  3.6 samples/s);
Learning rate per 1 samples: 1e-07
Finished Epoch[16 of 20]: [Training] loss = 32.881428 * 25, metric = 0.53% * 25 7.012s (  3.6 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 32.816619 * 25, metric = 0.53% * 25 7.009s (  3.6 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 32.781428 * 25, metric = 0.52% * 25 6.999s (  3.6 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 32.750801 * 25, metric = 0.52% * 25 7.003s (  3.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 32.719143 * 25, metric = 0.52% * 25 7.016s (  3.6 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.

C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ python A3_ParseAndEvaluateOutput.py
--------------------------------------------------------------
2017-09-23 12:52:19
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Parsing CNTK output for image set: test
Parsing cntk output file, image 0 of 5
Parsing cntk output file, image 1 of 5
Parsing cntk output file, image 2 of 5
Parsing cntk output file, image 3 of 5
Parsing cntk output file, image 4 of 5
test.cache ss roidb loaded from C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\proc\Grocery_2000\cntkFiles\test.cache_selective_search_roidb.pkl
   Processing image 0 of 5..
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\cntk_helpers.py:813: RuntimeWarning: overflow encountered in exp
  e = np.exp(w)
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\cntk_helpers.py:814: RuntimeWarning: invalid value encountered in true_divide
  dist = e / np.sum(e, axis=1)[:, np.newaxis]
Number of rois before non-maxima surpression: 3183
Number of rois  after non-maxima surpression: 461
Evaluating detections
AP for         avocado = 0.5556
AP for          orange = 1.0000
AP for          butter = 1.0000
AP for       champagne = 1.0000
AP for          eggBox = 0.7500
AP for          gerkin = 1.0000
AP for         joghurt = 0.6667
AP for         ketchup = 0.6667
AP for     orangeJuice = 1.0000
AP for           onion = 1.0000
AP for          pepper = 1.0000
AP for          tomato = 1.0000
AP for           water = 0.5000
AP for            milk = 1.0000
AP for         tabasco = 0.5000
AP for         mustard = 1.0000
Mean AP = 0.8524
DONE.
  • Distributed log
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ mpiexec -n 4 python A2_RunWithPyModel_distributed.py
Selected GPU[0] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
Selected GPU[2] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
Selected GPU[3] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
Selected GPU[1] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (1) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (0) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (3) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (2) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.268s (  2.4 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.110s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.065s ( 13.1 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.004s ( 13.3 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.056s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.000s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.992s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.035s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.052s ( 13.1 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.033s ( 13.2 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.042s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.069s ( 13.0 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.008s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.040s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.768s (  2.3 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.108s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.038s ( 13.2 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.033s ( 13.2 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.055s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.001s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.036s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.024s ( 13.2 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.061s ( 13.1 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.043s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.069s ( 13.0 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.007s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.041s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 11.277s (  2.2 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.016s ( 13.3 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.108s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.039s ( 13.2 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.032s ( 13.2 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.055s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.001s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.992s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.036s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.024s ( 13.2 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.060s ( 13.1 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.047s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.064s ( 13.1 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.008s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.041s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 9.772s (  2.6 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.016s ( 13.3 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.108s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.039s ( 13.2 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.032s ( 13.2 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.055s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.001s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.036s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.024s ( 13.2 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.061s ( 13.1 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.042s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.070s ( 13.0 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.008s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.041s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.

C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ python A3_ParseAndEvaluateOutput.py
--------------------------------------------------------------
2017-09-23 12:48:18
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Parsing CNTK output for image set: test
Parsing cntk output file, image 0 of 5
Parsing cntk output file, image 1 of 5
Parsing cntk output file, image 2 of 5
Parsing cntk output file, image 3 of 5
Parsing cntk output file, image 4 of 5
test.cache ss roidb loaded from C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\proc\Grocery_2000\cntkFiles\test.cache_selective_search_roidb.pkl
   Processing image 0 of 5..
Number of rois before non-maxima surpression: 3184
Number of rois  after non-maxima surpression: 487
Evaluating detections
AP for         avocado = 0.5556
AP for          orange = 1.0000
AP for          butter = 1.0000
AP for       champagne = 1.0000
AP for          eggBox = 0.7500
AP for          gerkin = 1.0000
AP for         joghurt = 0.6667
AP for         ketchup = 0.6667
AP for     orangeJuice = 1.0000
AP for           onion = 1.0000
AP for          pepper = 1.0000
AP for          tomato = 1.0000
AP for           water = 0.5000
AP for            milk = 1.0000
AP for         tabasco = 1.0000
AP for         mustard = 1.0000
Mean AP = 0.8837
DONE.

@cha-zhang
Member

@kyoro1 Thanks for the detailed info. To answer your questions:

As seen in the distributed log, there are four similar blocks. Is that the usual result? I imagined that data-parallel training divides the data set across the GPUs and aggregates the results, so shouldn't the log be aggregated into one block? I wonder whether this log structure is correct.

It is indeed a little bit strange, although your result seems OK.

The sample size was 40 in every epoch except the first (=25). Normally it should equal the number of training samples. What causes the difference between 25 and 40 (from the 2nd epoch to the last)?

In the first epoch, CNTK auto-tunes the convolution algorithms, on top of the overhead of allocating buffers, validating the model architecture, etc.

The mean AP differed between the original run (Mean AP = 0.8524) and the distributed run (Mean AP = 0.8837). I assumed the only difference was the distribution, so the mean AP should be the same. Is this setting correct?

With such a small data set, fluctuation is normal.

@kyoro1
Contributor Author

kyoro1 commented Sep 26, 2017

@cha-zhang Thanks for your comment. So the trial above is mostly as expected, except for the log structure, isn't it?
Also, do you plan to develop distributed Fast R-CNN scripts in the near future, or should I send a pull request to master?

@cha-zhang
Member

We won't be working on this in the short term. It would be great if you could send us a PR. :)

@kyoro1
Contributor Author

kyoro1 commented Nov 3, 2017

Here is the first step for Fast R-CNN with distributed learning: 1312bf8

@kyoro1 kyoro1 closed this as completed Dec 14, 2017