This version of PyTorch is a fork of the main PyTorch repository. It adds Python APIs for users who want more direct access to the cuDNN backend.
The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers [1].
PyTorch uses cuDNN as one of its backends to accelerate deep neural network training and inference. cuDNN provides APIs and several alternative algorithms for each layer. As an example, cuDNN implements multiple algorithms for the convolution layer, each with a different execution time and memory usage. The choice of algorithm therefore affects both the performance and the energy consumption of Machine Learning (ML) applications in production.
Among all of the possible cuDNN APIs, very few are exposed to the user in PyTorch. For example, for the convolution layer, PyTorch only provides a cuDNN API that, based on the convolution parameters (such as input size, kernel size, stride, and dilation), uses either a heuristic or an exhaustive search approach to choose the best convolution algorithm (best execution time by default). In the current version of PyTorch, there is no cuDNN API that lets ML researchers and engineers select a specific convolution algorithm implemented in cuDNN.
In this version of PyTorch, we expose some cuDNN APIs in PyTorch to give ML researchers more freedom to carefully choose and fine-tune cuDNN parameters.
PyTorch provides an API to choose between the heuristic and the exhaustive search approach for convolution. By default, the heuristic approach is used. To enable the exhaustive approach, add the following line to your Python script:
torch.backends.cudnn.benchmark = True
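Conceptually, the exhaustive search mode times every candidate implementation once and then keeps the fastest for subsequent calls. The toy sketch below illustrates that idea in plain Python; it is not cuDNN, and the candidate functions and their names are made up for illustration.

```python
import time

def pick_fastest(candidates, *args):
    """Time each candidate once and return the fastest name plus all timings
    (a toy autotuner, mimicking what an exhaustive search does)."""
    timings = {}
    for name, fn in candidates.items():
        start = time.perf_counter()
        fn(*args)
        timings[name] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings

# Hypothetical 'algorithms' that compute the same result in different ways.
candidates = {
    "direct": lambda xs: sum(x * x for x in xs),
    "mapped": lambda xs: sum(map(lambda x: x * x, xs)),
}
best, timings = pick_fastest(candidates, list(range(10_000)))
print(best)  # name of whichever candidate happened to run fastest
```

cuDNN does the same per convolution configuration, which is why the first iteration with `benchmark = True` is slower than the rest.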
We expose an API to choose a convolution algorithm among the supported ones. To select the algorithm, add the following line to your Python script:
torch.backends.cudnn.conv_fwd_algo = N
N is the number of the selected algorithm. In the latest version of cuDNN (8.0.4) there are eight forward convolution algorithms (numbered 0 to 7), but not all of them are supported on every GPU architecture. To see which ones are supported on your target machine, the current version of this repo prints the supported convolution algorithms when the user performs a forward pass through their network. Below is the output for a network with one convolution layer; we selected algorithm #1:
Conv FWD algo set to: 1 // output after executing: torch.backends.cudnn.conv_fwd_algo = 1
// (printed by cuDNN)
FwdAlgorithms profile results: // result of exhaustive search algorithm (printed by cuDNN)
Algo, time, memory
0, 0.133632, 0
2, 0.16272, 7840000
5, 0.188288, 7468032
1, 0.230432, 7424
7, 0.603808, 20604480
4, 1.02371, 13928000
6, -1, 0
3, -1, 0
-----------------
Supported FwdAlgorithms: // supported algorithms sorted by exec. time (printed by cuDNN)
0, 2, 5, 1, 7, 4
-----------------
Requested fwd algorithm (1) is set. // selected convolution algorithm (printed by cuDNN)
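Since the supported algorithms only become known from this printed output, it can be convenient to parse the profile table programmatically instead of reading it by hand. A minimal sketch in plain Python (the log text is copied from the example output above; the function name is ours):

```python
def parse_fwd_profile(log):
    """Parse the 'Algo, time, memory' rows printed by the exhaustive search
    and return the algorithm numbers sorted by execution time, skipping
    unsupported entries (time == -1)."""
    rows = []
    in_table = False
    for line in log.splitlines():
        line = line.strip()
        if line.startswith("Algo,"):
            in_table = True
            continue
        if in_table:
            parts = [p.strip() for p in line.split(",")]
            if len(parts) != 3:
                break  # end of the table (e.g. the '----' separator)
            algo, t = int(parts[0]), float(parts[1])
            if t >= 0:
                rows.append((t, algo))
    return [algo for _, algo in sorted(rows)]

log = """FwdAlgorithms profile results:
Algo, time, memory
0, 0.133632, 0
2, 0.16272, 7840000
5, 0.188288, 7468032
1, 0.230432, 7424
7, 0.603808, 20604480
4, 1.02371, 13928000
6, -1, 0
3, -1, 0
-----------------"""
print(parse_fwd_profile(log))  # [0, 2, 5, 1, 7, 4]
```

The result matches the "Supported FwdAlgorithms" line that cuDNN prints, sorted by execution time.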
- The current version has been tested with CUDA v10.1 and cuDNN v8.0.4.
- I found this post easy to follow.
- Build using the instructions provided in the PyTorch repository.
1- Only the forward-propagation algorithm can be selected for now.
2- The supported algorithms are unknown before running the heuristic/exhaustive search. They are printed only after the convolution layer runs, so the user needs to run the code once, read the printed list of supported algorithms, and then choose one of them.