Added C++ and CUDA bindings to `tile_matmul` for 1.1.2 Release #66

coreylammie · 2021-07-07T23:13:10Z

Using an NVIDIA GeForce GTX 1080, a tile shape of (25, 25), and two tensors of size (500, 500), the runtime of `tile_matmul` without quantization support is reduced by 2.45x (CPU-bound) and 5.48x (GPU-bound) relative to the previous pure Python implementation. With an ADC resolution of 4 bits and an overflow rate of 0.0, the runtime of `tile_matmul` with quantization support is reduced by 2.31x (CPU-bound) and 105.27x (GPU-bound).
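
For readers unfamiliar with the operation being benchmarked, the following is a minimal pure Python sketch of a generic tiled (blocked) matrix product. It is an illustration under stated assumptions, not the MemTorch implementation: the function names `matmul` and `tiled_matmul` and their signatures are hypothetical, the tile shape is assumed to split the shared dimension and the output columns, and quantization is not modelled.

```python
def matmul(a, b):
    """Reference dense product of two row-major lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def tiled_matmul(a, b, tile=(25, 25)):
    """Blocked product (hypothetical sketch, not the MemTorch API).

    The second operand is split into tiles of shape `tile`. Each tile
    contributes one partial product, and the partial products are
    accumulated into the output.
    """
    n, k, m = len(a), len(b), len(b[0])
    tile_rows, tile_cols = tile
    out = [[0.0] * m for _ in range(n)]
    for k0 in range(0, k, tile_rows):        # tile rows span the shared dimension
        k1 = min(k0 + tile_rows, k)
        for m0 in range(0, m, tile_cols):    # tile columns span the output columns
            m1 = min(m0 + tile_cols, m)
            for i in range(n):
                a_slice = a[i][k0:k1]        # inputs that feed this tile
                for j in range(m0, m1):
                    out[i][j] += sum(
                        a_slice[p] * b[k0 + p][j] for p in range(k1 - k0)
                    )
    return out


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    # The tiled result should match the dense reference product.
    print(tiled_matmul(a, b, tile=(2, 1)) == matmul(a, b))
```

Because every output element is a sum of per-tile partial products, the loop over tiles is the part that the C++ and CUDA bindings can execute far faster than interpreted Python.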

| Implementation | Runtime Without Quantization Support (s) | Runtime With Quantization Support (s) |
| --- | --- | --- |
| Pure Python (Previous) | 6.917784 | 27.099764 |
| C++ (CPU-bound) | 2.822265 | 11.736974 |
| CUDA (GPU-bound) | 1.262861 | 0.2574267 |
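
The speedup factors can be recomputed from the measured runtimes as the baseline runtime divided by the new runtime. The short sketch below does this for the values in the table; the dictionary layout and the helper name `speedups` are illustrative choices, and results are rounded to two decimal places.

```python
# Runtimes in seconds, copied from the table:
# (without quantization support, with quantization support).
runtimes = {
    "Pure Python (Previous)": (6.917784, 27.099764),
    "C++ (CPU-bound)": (2.822265, 11.736974),
    "CUDA (GPU-bound)": (1.262861, 0.2574267),
}


def speedups(runtimes, baseline="Pure Python (Previous)"):
    """Return {implementation: (speedup without quantization, speedup with quantization)}."""
    base_plain, base_quant = runtimes[baseline]
    return {
        name: (base_plain / plain, base_quant / quant)
        for name, (plain, quant) in runtimes.items()
        if name != baseline
    }


for name, (plain, quant) in speedups(runtimes).items():
    print(f"{name}: {plain:.2f}x without quantization, {quant:.2f}x with quantization")
```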

codecov · 2021-07-08T00:01:35Z

Codecov Report

Merging #66 (9bd3245) into master (4e13843) will decrease coverage by 3.53%.
The diff coverage is 29.32%.

@@            Coverage Diff             @@
##           master      #66      +/-   ##
==========================================
- Coverage   90.88%   87.34%   -3.54%     
==========================================
  Files          52       53       +1     
  Lines        1963     2023      +60     
==========================================
- Hits         1784     1767      -17     
- Misses        179      256      +77

| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | `87.34% <29.32%> (-3.54%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| memtorch/mn/Conv1d.py | `98.76% <ø> (ø)` | |
| memtorch/mn/Conv2d.py | `97.64% <ø> (ø)` | |
| memtorch/mn/Conv3d.py | `97.64% <ø> (ø)` | |
| memtorch/submodules/__init__.py | `100.00% <ø> (ø)` | |
| profile_tile_matmul.py | `0.00% <0.00%> (ø)` | |
| setup.py | `0.00% <0.00%> (ø)` | |
| tests/test_cpp_extensions.py | `100.00% <ø> (+4.54%)` | ⬆️ |
| memtorch/bh/crossbar/Tile.py | `65.00% <25.00%> (-21.96%)` | ⬇️ |
| memtorch/map/Module.py | `96.66% <50.00%> (-3.34%)` | ⬇️ |
| memtorch/bh/Quantize.py | `91.30% <90.47%> (-0.70%)` | ⬇️ |

... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

coreylammie and others added 30 commits June 14, 2021 17:10

Homogenizing C++/CUDA Bindings

aed8bdb

Added linear (evenly-spaced) to quantize.cpp

d39a214

Verified functionality of log and tanh in quantize.cpp

ca32dce

Added binding for void quantize(at::Tensor tensor, int bits, at::Tensor min, at::Tensor max)

7fbe231

Removed pytorch-playground submodule

c10eca4

Integrated memtorch_bindings with memtorch.bh.Quantize

78961b3

Added parse_min_max and replaced all CPU-bound quantization instances

5add0b7

Temporarily removed all test_cpp_extensions

82ad934

Update test_cpp_extensions.py

60bf71b

Added linear (evenly-spaced) to quantize.cpp

b966174

Removed pytorch-playground submodule

4f1ff5f

Refactored CppExtensions and Added tile_matmul Binding

7d1ed6a

🎨 Enforced Python/C++/CUDA Code Formatting with Black and Clang (#51)

e7a71ad

Co-authored-by: coreylammie <coreylammie@users.noreply.github.com>

Added Minimal Working Implementation of time_matmul in C++

747062b

Refactored and Optimized tile_matmul.cpp

5619165

Further optimized tile_matmul.cpp

5d9e330

Implemented Minimal Working Test CUDA Kernel

c0bcb48

🎨 Enforced Python/C++/CUDA Code Formatting with Black and Clang (#54)

48c1e8d

Co-authored-by: coreylammie <coreylammie@users.noreply.github.com>

Added CUDA Kernel Logic and Updated Version Number

f1d03a4

Added Eigen submodule

090eb03

🎨 Enforced Python/C++/CUDA Code Formatting with Black and Clang (#56)

459e862

Co-authored-by: coreylammie <coreylammie@users.noreply.github.com>

Resolved Eigen C++ library in CUDA kernel

37d8382

Parsed tensor.data_ptr<float>() Slice to Eigen::Matrix

f2f5f77

Implemented Minimal Working Example Pre-CUDA Integration

81e7248

To Implement Kernel Indexing Logic

3a2bbc2

🎨 Enforced Python/C++/CUDA Code Formatting with Black and Clang (#59)

4cbe66e

Co-authored-by: coreylammie <coreylammie@users.noreply.github.com>

Parsed mat_a_tiles_shape and mat_b_tiles_shape

ad753a1

Debugging C++ Routine

f8e5ce3

Updated Working C++ and CUDA Implementations of tile_matmul without Quantization Support

aee2162

🎨 Enforced Python/C++/CUDA Code Formatting with Black and Clang (#61)

66cfab1

Co-authored-by: coreylammie <coreylammie@users.noreply.github.com>

github-actions bot and others added 25 commits July 8, 2021 09:08

🎨 Enforced Python/C++/CUDA Code Formatting with Black and Clang (#56)

35a23e5

Co-authored-by: coreylammie <coreylammie@users.noreply.github.com>

Resolved Eigen C++ library in CUDA kernel

eb5bbce

Parsed tensor.data_ptr<float>() Slice to Eigen::Matrix

34a7621

Implemented Minimal Working Example Pre-CUDA Integration

7be6ea1

To Implement Kernel Indexing Logic

95be136

🎨 Enforced Python/C++/CUDA Code Formatting with Black and Clang (#59)

b629a92

Co-authored-by: coreylammie <coreylammie@users.noreply.github.com>

Parsed mat_a_tiles_shape and mat_b_tiles_shape

2b16b82

Debugging C++ Routine

d8d9e38

Updated Working C++ and CUDA Implementations of tile_matmul without Quantization Support

8c9bf9c

🎨 Enforced Python/C++/CUDA Code Formatting with Black and Clang (#61)

7ab9740

Co-authored-by: coreylammie <coreylammie@users.noreply.github.com>

Added C++ Implementation of tile_matmul With Quantization Support

f549515

🎨 Enforced Python/C++/CUDA Code Formatting with Black and Clang (#62)

d2bcf08

Co-authored-by: coreylammie <coreylammie@users.noreply.github.com>

Implemented CUDA-compatible Linear Quantization

ce7dc79

🎨 Enforced Python/C++/CUDA Code Formatting with Black and Clang (#63)

cd76541

Co-authored-by: coreylammie <coreylammie@users.noreply.github.com>

Implemented tile_matmul with Linear Quantization Support

d1e8ce9

Fully Integrated C++ and CUDA matmul Bindings

2b57094

Added Legacy tile_matmul Support

6caec3a

🎨 Enforced Python/C++/CUDA Code Formatting with Black and Clang (#65)

c95ffe2

Co-authored-by: coreylammie <coreylammie@users.noreply.github.com>

Updated MANIFEST.in and setup.py

9938319

Added submodules to actions/checkout@v2 in build_release.yml

7b9f9af

Update MANIFEST.in

e350f92

Updated build_release torch Version

ec18ca7

Added tags-ignore Flag to push_pull.yml

3276dd7

Updated CHANGELOG.md

2de04e7

Merge branch 'CUDA' of https://github.com/coreylammie/MemTorch into CUDA

9bd3245

coreylammie merged commit 055e036 into master Jul 8, 2021

coreylammie deleted the CUDA branch July 8, 2021 00:05

coreylammie linked an issue Jul 8, 2021 that may be closed by this pull request

The patching procedure is too slow. #43

Closed

coreylammie mentioned this pull request Jul 8, 2021

The patching procedure is too slow. #43

Closed
