From 3562698afb4b1f12f51eca752740e279f85714c4 Mon Sep 17 00:00:00 2001 From: Evan Shelhamer Date: Fri, 14 Apr 2017 12:45:21 -0700 Subject: [PATCH] drop performance + hardware page and switch to sheet simpler to read and update --- docs/index.md | 9 +++--- docs/performance_hardware.md | 73 -------------------------------------------- 2 files changed, 5 insertions(+), 77 deletions(-) delete mode 100644 docs/performance_hardware.md diff --git a/docs/index.md b/docs/index.md index 302a7d56f88..bbfd91fc7b9 100644 --- a/docs/index.md +++ b/docs/index.md @@ -23,15 +23,14 @@ Thanks to these contributors the framework tracks the state-of-the-art in both c **Speed** makes Caffe perfect for research experiments and industry deployment. Caffe can process **over 60M images per day** with a single NVIDIA K40 GPU\*. -That's 1 ms/image for inference and 4 ms/image for learning. -We believe that Caffe is the fastest convnet implementation available. +That's 1 ms/image for inference and 4 ms/image for learning and more recent library versions and hardware are faster still. +We believe that Caffe is among the fastest convnet implementations available. **Community**: Caffe already powers academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia. Join our community of brewers on the [caffe-users group](https://groups.google.com/forum/#!forum/caffe-users) and [Github](https://github.com/BVLC/caffe/).

-\* With the ILSVRC2012-winning [SuperVision](http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf) model and caching IO. -Consult performance [details](/performance_hardware.html). +\* With the ILSVRC2012-winning [SuperVision](http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf) model and prefetching IO.

## Documentation @@ -50,6 +49,8 @@ BAIR suggests a standard distribution format for Caffe models, and provides trai Guidelines for development and contributing to Caffe. * [API Documentation](/doxygen/annotated.html)
Developer documentation automagically generated from code comments. +* [Benchmarking](https://docs.google.com/spreadsheets/d/1Yp4rqHpT7mKxOPbpzYeUfEFLnELDAgxSSBQKp5uKDGQ/edit#gid=0)
+Comparison of inference and learning for different networks and GPUs. ### Examples diff --git a/docs/performance_hardware.md b/docs/performance_hardware.md deleted file mode 100644 index fbf256842f1..00000000000 --- a/docs/performance_hardware.md +++ /dev/null @@ -1,73 +0,0 @@ ---- -title: Performance and Hardware Configuration ---- - -# Performance and Hardware Configuration - -To measure performance on different NVIDIA GPUs we use CaffeNet, the Caffe reference ImageNet model. - -For training, each time point is 20 iterations/minibatches of 256 images for 5,120 images total. For testing, a 50,000 image validation set is classified. - -**Acknowledgements**: BAIR members are very grateful to NVIDIA for providing several GPUs to conduct this research. - -## NVIDIA K40 - -Performance is best with ECC off and boost clock enabled. While ECC makes a negligible difference in speed, disabling it frees ~1 GB of GPU memory. - -Best settings with ECC off and maximum clock speed in standard Caffe: - -* Training is 26.5 secs / 20 iterations (5,120 images) -* Testing is 100 secs / validation set (50,000 images) - -Best settings with Caffe + [cuDNN acceleration](http://nvidia.com/cudnn): - -* Training is 19.2 secs / 20 iterations (5,120 images) -* Testing is 60.7 secs / validation set (50,000 images) - -Other settings: - -* ECC on, max speed: training 26.7 secs / 20 iterations, test 101 secs / validation set -* ECC on, default speed: training 31 secs / 20 iterations, test 117 secs / validation set -* ECC off, default speed: training 31 secs / 20 iterations, test 118 secs / validation set - -### K40 configuration tips - -For maximum K40 performance, turn off ECC and boost the clock speed (at your own risk). - -To turn off ECC, do - - sudo nvidia-smi -i 0 --ecc-config=0 # repeat with -i x for each GPU ID - -then reboot. - -Set the "persistence" mode of the GPU settings by - - sudo nvidia-smi -pm 1 - -and then set the clock speed with - - sudo nvidia-smi -i 0 -ac 3004,875 # repeat with -i x for each GPU ID - -but note that this configuration resets across driver reloading / rebooting. Include these commands in a boot script to initialize these settings. For a simple fix, add these commands to `/etc/rc.local` (on Ubuntu). - -## NVIDIA Titan - -Training: 26.26 secs / 20 iterations (5,120 images). -Testing: 100 secs / validation set (50,000 images). - -cuDNN Training: 20.25 secs / 20 iterations (5,120 images). -cuDNN Testing: 66.3 secs / validation set (50,000 images). - - -## NVIDIA K20 - -Training: 36.0 secs / 20 iterations (5,120 images). -Testing: 133 secs / validation set (50,000 images). - -## NVIDIA GTX 770 - -Training: 33.0 secs / 20 iterations (5,120 images). -Testing: 129 secs / validation set (50,000 images). - -cuDNN Training: 24.3 secs / 20 iterations (5,120 images). -cuDNN Testing: 104 secs / validation set (50,000 images).