About

3D grouped convolutions in PyTorch are slow. This repo provides a way to tune faster grouped 3D convolution modules that can be used in exactly the same way as a standard PyTorch module.

Motivation

In most cases grouped convolutions should be faster than group-1 convolutions, since they have fewer parameters and fewer operations. In PyTorch, 2D grouped convolutions are faster than their group-1 counterparts, but for 3D convolutions this is not the case.
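To make the "fewer parameters and fewer operations" claim concrete, here is a minimal sketch that counts weights and multiply-accumulates for a convolution. The function name `conv_cost` is hypothetical, bias terms are ignored, and the output size of 48 assumes stride 1 and no padding on a size-50 input, none of which is stated in this README.

```python
def conv_cost(in_channels, out_channels, kernel_size, groups, dims, out_size):
    """Return (weights, multiply-accumulates) for one convolution layer.

    Assumes a cubic/square kernel and output, and ignores the bias term.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channels must be divisible by groups")
    # Each output channel only sees in_channels / groups input channels.
    weights = out_channels * (in_channels // groups) * kernel_size ** dims
    # Every weight is used once per output position.
    macs = weights * out_size ** dims
    return weights, macs


# 3D, 64 -> 64 channels, kernel size 3, output 48x48x48 (assumed: no padding)
dense_weights, dense_macs = conv_cost(64, 64, 3, groups=1, dims=3, out_size=48)
grouped_weights, grouped_macs = conv_cost(64, 64, 3, groups=64, dims=3, out_size=48)
print(dense_weights, grouped_weights)  # 110592 1728
print(dense_macs // grouped_macs)      # 64
```

With 64 groups the layer has 64 times fewer weights and operations than the group-1 layer, which is why a slowdown from grouping is unexpected.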

The image (source: Yani Ioannou) diagrams a 2D convolution with two groups.

The table shows how slow 3D grouped convolutions are. It gives timings for 1000 forward-backward convolution iterations on an input of size 50x50 (50x50x50 for 3D), with kernel size 3 and 64 input and output channels.

Groups	2D	3D
1	1.68339s	126.51723s
64	1.49539s	509.46911s

We can see that the grouped 2D convolution is slightly quicker than the group-1 version, but in 3D the grouped convolution is roughly four times slower than the group-1 version.
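A comparison like the one in the table could be measured with a harness along these lines. This is only a sketch, not the script used to produce the numbers above: the function name `time_conv`, the batch size of 1, the default padding and the use of `.sum()` as a stand-in loss are all assumptions.

```python
import time

import torch
import torch.nn as nn


def time_conv(dims, groups, channels=64, size=50, kernel_size=3,
              iters=1000, device=None):
    """Time `iters` forward-backward passes of a 2D or 3D convolution."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    conv_cls = {2: nn.Conv2d, 3: nn.Conv3d}[dims]
    conv = conv_cls(channels, channels, kernel_size, groups=groups).to(device)
    # Batch size 1 is an assumption; the README does not state it.
    x = torch.randn(1, channels, *([size] * dims),
                    device=device, requires_grad=True)

    def sync():
        # CUDA kernels run asynchronously, so wait before reading the clock.
        if device.startswith("cuda"):
            torch.cuda.synchronize()

    sync()
    start = time.perf_counter()
    for _ in range(iters):
        conv.zero_grad()
        x.grad = None
        conv(x).sum().backward()  # .sum() is a stand-in scalar loss
    sync()
    return time.perf_counter() - start
```

Calling `time_conv(3, groups=1)` and `time_conv(3, groups=64)` would correspond to the two rows of the 3D column, and `time_conv(2, ...)` to the 2D column. Absolute times will differ by GPU and PyTorch version.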

Tensor Comprehensions (TC)

Tensor Comprehensions is a Facebook Research library that "automatically synthesize[s] high-performance machine learning kernels". TC is integrated with PyTorch, so we can use it to create fast GPU kernels for PyTorch modules, although most modules implemented by PyTorch will be faster than any automatically generated versions.

Installation

1. Build or install the TC library first.
   - The Dockerfile in this repo, a slightly modified version of the TC build Dockerfile, is what I used to build the TC library for PyTorch 0.4.0.
2. Clone this repo: git clone https://github.com/MattPainter01/Grouped3DConvPyTorch
3. Link to Python by one of:
   - adding the repo to your Python path through a suitable export PYTHONPATH=... command, or
   - adding a path configuration file (.pth) in site-packages, or
   - pip install git+https://github.com/MattPainter01/Grouped3DConvPyTorch.git

Usage

Usage is simple:

from Group3DConvTC.tc_conv import Conv3DTC
g3d = Conv3DTC(...)
output = g3d(data)

If the from_cache flag is False then the TC will be tuned using default settings or those provided under the tuner_config keyword. If from_cache is True then a pre-tuned operation will be loaded from the file provided with the cache_file keyword.

Pretuned Operations

Unfortunately, tuned tensor comprehensions are machine specific and cannot be ported to other machines (as far as I know). They are also strongly parameter specific, so you will need to tune new TCs for different kernel sizes, input/output channels, groups, etc.

WARNING: Tuning this TC is very slow. It takes a couple of hours to tune well on my machine.
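Because a tuned TC is tied to both the machine and the layer parameters, one way to avoid loading the wrong cache is to encode both in the cache file name. The helper below is purely hypothetical (it is not part of this repo, and the naming scheme is an assumption); its result would be passed as the cache_file keyword.

```python
import socket


def cache_file_for(in_channels, out_channels, kernel_size, groups, host=None):
    """Build a cache file name unique to one machine and one layer config.

    Hypothetical helper: the naming scheme is an illustration only.
    """
    host = host or socket.gethostname()
    return (f"conv3d_tc_{host}_in{in_channels}_out{out_channels}"
            f"_k{kernel_size}_g{groups}")


print(cache_file_for(64, 64, 3, 64, host="gpu01"))
# conv3d_tc_gpu01_in64_out64_k3_g64
```

Changing any parameter, or moving to a different host, yields a different file name, so a layer never picks up a TC that was tuned for another configuration.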

Timings

The table shows the timings from tc_timings.py for 100 forward-backward iterations of a 50x50x50 image, 64 input and output channels with 64 groups and kernel size 3.

Method	Time
PyTorch	42.01592s
TC	12.33247s
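For reference, the speedup implied by the table can be worked out directly from the two timings. The snippet only restates the numbers above; the variable names are illustrative.

```python
# Timings from the table: 100 forward-backward iterations each.
timings = {"PyTorch": 42.01592, "TC": 12.33247}
iterations = 100

speedup = timings["PyTorch"] / timings["TC"]
per_iter_ms = {name: 1000 * t / iterations for name, t in timings.items()}

print(f"TC speedup over PyTorch: {speedup:.2f}x")  # 3.41x
for name, ms in per_iter_ms.items():
    print(f"{name}: {ms:.1f} ms per iteration")
```

On this machine the tuned TC is a little over three times faster than the PyTorch grouped 3D convolution for this configuration.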
