Optimized accuracy calculation #5734

Noiredd commented Jul 3, 2017 edited

Accuracy calculation, particularly for segmentation tasks with top_k: 1, can be unnecessarily slow: note the two nested for loops (over the batch and over image pixels) with partial_sort inside. The main culprit here is the need to copy all data to a new container so we can sort it.

My proposal is to replace that with a dynamically updated priority_queue. Instead of copying every prediction vector and sorting it afterwards, it is much faster to iterate over it just once and only copy an element when it is larger than the smallest of the k current best (kept in an automatically sorted container, a priority queue being the fastest).
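A minimal sketch of the idea (not the actual patch; `TopK` and its signature are illustrative): a min-heap of size k keeps the smallest of the current top-k at the top, so each score in the single pass needs only one comparison against it.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Return the indices of the k largest scores in a single pass.
// A std::priority_queue with std::greater acts as a min-heap, so
// heap.top() is always the smallest of the current k best.
std::vector<size_t> TopK(const std::vector<float>& scores, size_t k) {
  typedef std::pair<float, size_t> Entry;  // (score, index)
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;
  for (size_t i = 0; i < scores.size(); ++i) {
    if (heap.size() < k) {
      heap.push(Entry(scores[i], i));
    } else if (scores[i] > heap.top().first) {
      heap.pop();                      // evict the current k-th best
      heap.push(Entry(scores[i], i));
    }
  }
  std::vector<size_t> indices;         // extracted in ascending score order
  while (!heap.empty()) {
    indices.push_back(heap.top().second);
    heap.pop();
  }
  return indices;
}
```

For the accuracy layer itself only membership of the true label among the returned indices matters, so the ascending extraction order is irrelevant.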

Benchmark settings:

  • net: FCN_AlexNet, batch size 1,
  • dataset: PASCAL_VOC (21 classes, 1464 images in training set, 1449 for validation, images are about 360x550) - about 250M top_k searches per epoch,
  • solver setup: max_iter: 14640, test_interval: 1464, test_iter: 1449,
  • hardware: Titan Z, 2x Xeon E5620 (2.4 GHz),
  • software: Ubuntu 14.04, CUDA 8.0, cuDNN 5.1, ATLAS, DIGITS 5.1-dev.

Each build was run 4 times (the Titan Z has two GPUs, so I ran two nets in parallel, making it effectively 2 times 2 runs); times were measured from job initialization to completion (as reported by DIGITS).

Results:

| top_k | current master | priority_queue | optimal search* |
|-------|----------------|----------------|-----------------|
| 1     | 90m 00s        | 58m 57s        | 58m 34s         |
| 5     | 98m 49s        | 88m 42s        | N/A             |

(*) for top_k==1 we don't need any container at all, just remember the best element's value and index - as the table shows, this is only marginally faster, so for the sake of code clarity I opted against a separate code path for that case.
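The "optimal search" variant measured above amounts to a plain single-pass argmax; a sketch (the `ArgMax` name is illustrative, not from the patch):

```cpp
#include <cstddef>
#include <vector>

// Single-pass argmax: track the best value and its index directly,
// with no container or heap at all. Assumes a non-empty score vector.
size_t ArgMax(const std::vector<float>& scores) {
  size_t best_index = 0;
  float best_value = scores[0];
  for (size_t i = 1; i < scores.size(); ++i) {
    if (scores[i] > best_value) {
      best_value = scores[i];
      best_index = i;
    }
  }
  return best_index;
}
```

With k == 1 the heap version does the same number of comparisons plus per-element heap bookkeeping, which matches the small gap between the two columns in the benchmark.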

For me this is about 10% faster for top-5 accuracy, and over 30% faster for top-1.
