Caffe: a fast open framework for deep learning.


Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.


NVIDIA Caffe (NVIDIA Corporation ©2017) is an NVIDIA-maintained fork of BVLC Caffe tuned for NVIDIA GPUs, particularly in multi-GPU configurations. Here are the major features:

  • 16-bit (half-precision) floating-point training and inference support.
  • Mixed-precision support. Data can be stored and/or computed in 64-, 32-, or 16-bit formats. Precision can be set per layer (with different settings for the forward and backward passes if needed) or for the whole Net.
  • Layer-wise Adaptive Rate Control (LARC) and adaptive global gradient scaler for better accuracy, especially in 16-bit training.
  • Integration with cuDNN v7.
  • Automatic selection of the best cuDNN convolution algorithm.
  • Integration with v2.2 of the NCCL library for improved multi-GPU scaling.
  • Optimized GPU memory management for data and parameter storage, I/O buffers, and workspace for convolutional layers.
  • Parallel data parser, transformer and image reader for improved I/O performance.
  • Parallel back propagation and gradient reduction on multi-GPU systems.
  • Fast solver implementations with fused CUDA kernels for weight and history updates.
  • Multi-GPU test phase for even memory load across multiple GPUs.
  • Backward compatibility with BVLC Caffe and NVCaffe 0.15 and higher.
  • Extended set of optimized models (including 16-bit floating-point examples).
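
The gradient scaler mentioned in the list above addresses a general property of 16-bit floats: very small gradient values round to zero when stored in half precision. The following is a minimal sketch of that arithmetic in plain Python, not NVCaffe's implementation. The helper names (`to_fp16`, `scaled_roundtrip`, `overflows_fp16`) and the scale factor of 1024 are assumptions chosen for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scaled_roundtrip(grad: float, scale: float) -> float:
    """Store grad * scale in half precision, then unscale in full precision."""
    return to_fp16(grad * scale) / scale


def overflows_fp16(x: float) -> bool:
    """Return True if x is too large to be represented in half precision."""
    try:
        struct.pack("<e", x)
        return False
    except OverflowError:
        return True


if __name__ == "__main__":
    tiny_grad = 1e-8  # smaller than the smallest half-precision subnormal

    # Stored directly, the gradient is lost.
    print(to_fp16(tiny_grad))  # prints 0.0

    # Scaled before storage (hypothetical factor of 1024), it survives
    # with a small rounding error.
    print(scaled_roundtrip(tiny_grad, 1024.0))

    # A scale that is too large pushes values out of range instead.
    print(overflows_fp16(1e5))
```

A fixed scale trades underflow for a risk of overflow, which is one reason a scaler might adjust the factor during training rather than keep it constant.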

License and Citation

Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

  @article{jia2014caffe,
    Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
    Journal = {arXiv preprint arXiv:1408.5093},
    Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
    Year = {2014}
  }

Useful notes

The libturbojpeg library has been used since 0.16.5. It has a packaging bug. To work around it, please run the following (required for Makefile builds, optional for CMake):

sudo apt-get install libturbojpeg
sudo ln -s /usr/lib/x86_64-linux-gnu/ /usr/lib/x86_64-linux-gnu/