
[CUDA] FP16 support #1413

Merged · 13 commits · Jul 20, 2018
Conversation


@nishi-t nishi-t commented Jul 10, 2018

This PR changes NVRTCCompile to compile CUDA code with cuda_fp16.h, as discussed in #699. CUDA fp16 operators are not yet supported in this PR; I'll work on fp16 operator support later :)

For now, I confirmed that the following code works. The environment variable CUDA_HOME must be set to the CUDA install location in order to run it.

import tvm
import numpy as np

n = tvm.var("n")
m = tvm.var("m")
mysum = tvm.comm_reducer(lambda x, y: x+y,
    lambda t: tvm.const(0, dtype=t), name="mysum")
A = tvm.placeholder((n, m), dtype="float16", name='A')
k = tvm.reduce_axis((0, m), name='k')
B = tvm.compute((n,), lambda i: mysum(A[i, k], axis=k), name='B')

s = tvm.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=64)
s[B].bind(bx, tvm.thread_axis("blockIdx.x"))
s[B].bind(tx, tvm.thread_axis("threadIdx.x"))
fun = tvm.build(s, [A, B], "cuda")
print("build done")
print(fun.imported_modules[0].get_source())

ctx = tvm.context("cuda", 0)

a = tvm.nd.array(np.array([[1, 2], [3, 4]]).astype("float16"), ctx)
b = tvm.nd.array(np.zeros((2,), B.dtype), ctx)

fun(a, b)
print(b.asnumpy())
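As a sanity check (not part of the PR itself), the schedule above computes per-row sums, so for the 2x2 input in the example the device result can be compared against a plain numpy computation on the host:

```python
import numpy as np

# Host-side check of the reduction in the example above: B[i] = sum over k of A[i, k].
# This uses only numpy, independent of TVM, and shows the values b.asnumpy()
# should contain for the sample input.
a = np.array([[1, 2], [3, 4]], dtype="float16")
expected = a.sum(axis=1)  # per-row sums
print(expected)  # [3. 7.]
```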

@tqchen @masahi @merrymercy @abergeron Could you review and comment on this?

@nishi-t nishi-t changed the title Add half type support for CUDA Add half type support to CUDA Jul 10, 2018

if (cudaHomePath != nullptr) {
includeOption += cudaHomePath;
includeOption += "/include";
Member

Does this work on Windows? I'm not sure.

Contributor Author

Oh, I overlooked that. I'll address it. Thanks!


tqchen commented Jul 10, 2018

Please add a test case for this; the test case needs to be guarded by

if not gpu(0).exist or not have_fp16(gpu(0).compute_version):
    return

Let us enable fp16 by default when we detect that fp16 is used in the code. Directly operating on fp16, while useful, may not be the most effective approach. We will also need to test vectorized loads and vector operations (corresponding to half2 in CUDA).
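A minimal host-side sketch of the guard described above. CUDA half-precision arithmetic requires compute capability 5.3 or higher, so a have_fp16 helper can be written as below; the function bodies here are illustrative only, not the exact code that landed in this PR.

```python
def parse_compute_version(compute_version):
    """Split a compute capability string such as "6.1" into (major, minor)."""
    major, minor = compute_version.split(".")
    return int(major), int(minor)


def have_fp16(compute_version):
    """Return True if the device supports fp16 arithmetic.

    CUDA half-precision arithmetic first appeared in compute capability 5.3.
    """
    return parse_compute_version(compute_version) >= (5, 3)


print(have_fp16("6.1"))  # True
print(have_fp16("5.2"))  # False
```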

@tqchen tqchen changed the title Add half type support to CUDA [CUDA] FP16 support Jul 10, 2018
@tqchen tqchen self-requested a review July 10, 2018 15:36

nishi-t commented Jul 11, 2018

@tqchen Ok, I'm working on it. Thanks.


if (include_path) {
std::string includeOption = "--include-path=";
const char* cudaHomePath = std::getenv("CUDA_HOME");
Contributor

How about defining CUDA_HOME as a preprocessor macro in the .cmake file?

Contributor Author
@nishi-t nishi-t Jul 11, 2018

Do you mean that cudaHomePath would be defined as a preprocessor macro at the time of building tvm? If so, I'm worried that the path would depend strongly on the environment in which tvm is built. For example, it could be a problem when distributing a pre-compiled tvm in the future. Please let me know your opinion.

Contributor

Okay, never mind.

Then, I'd suggest using CUDA_PATH instead of CUDA_HOME for the environment variable, and defaulting to /usr/local/cuda. I think it's better to be consistent with how python/tvm/contrib/nvcc.py finds the CUDA path.

Contributor Author

@kazum Thank you for the suggestion. I'll address it.

@nishi-t nishi-t force-pushed the cuda_fp16 branch 2 times, most recently from 2728ebd to c7457c5 Compare July 12, 2018 08:47
@tqchen tqchen added the status: need update need update based on feedbacks label Jul 17, 2018
@tqchen tqchen left a comment

I have made some follow-up comments. @nishi-t, can you also include a simple test case that does a vectorized add?

@@ -0,0 +1,75 @@
"""Utility for CUDA backend"""
Member

move this to nvcc.py

@@ -43,6 +84,13 @@ std::string NVRTCCompile(const std::string& code) {
ptx.resize(ptx_size);
NVRTC_CALL(nvrtcGetPTX(prog, &ptx[0]));
NVRTC_CALL(nvrtcDestroyProgram(&prog));

if (include_path) {
for (int i = 0; i < numCompileOptions; i++) {
Member

Use snake_case for variable names, per the Google C++ style.

@@ -26,11 +28,50 @@ namespace codegen {
} \
}

std::string NVRTCCompile(const std::string& code) {
std::string NVRTCCompile(const std::string& code, bool include_path = false) {
char *compileParams[2];
Member

Use std::string to store strings; avoid using malloc and free.

@@ -0,0 +1,75 @@
"""Utility for CUDA backend"""

def parse_cc(compute_capability):
Member

parse_compute_version

@@ -66,6 +66,7 @@ COPY install/ubuntu_install_redis.sh /install/ubuntu_install_redis.sh
RUN bash /install/ubuntu_install_redis.sh

# Environment variables
ENV CUDA_HOME=/usr/local/cuda
Member

Ideally, we should not rely on CUDA_HOME; instead, allow some local search to happen (in contrib.nvcc) that is aware of CUDA_HOME but will also search /usr/local/cuda by default.
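The search order suggested above could be sketched as follows. The helper name find_cuda_path and the exact precedence are assumptions for illustration, not the actual code in contrib.nvcc.

```python
import os

def find_cuda_path():
    """Locate the CUDA install directory.

    Illustrative order: an explicit CUDA_PATH, then CUDA_HOME, then the
    conventional /usr/local/cuda default on Linux.
    """
    for env_var in ("CUDA_PATH", "CUDA_HOME"):
        path = os.environ.get(env_var)
        if path and os.path.isdir(path):
            return path
    if os.path.isdir("/usr/local/cuda"):
        return "/usr/local/cuda"
    raise RuntimeError("Cannot find CUDA path; set CUDA_PATH explicitly")
```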


nishi-t commented Jul 18, 2018

@tqchen Sorry for the delay, and thank you for the comments. I'll address your and the reviewers' comments soon.
By the way, I thought I had already added a simple vectorized add test case here. If you have any problem, please let me know.

@nishi-t nishi-t force-pushed the cuda_fp16 branch 5 times, most recently from f5f264f to c7784d0 Compare July 18, 2018 08:58
@tqchen tqchen left a comment

Thanks for the updates. I just have one minor comment and then it is good to go.

@@ -26,11 +30,65 @@ namespace codegen {
} \
}

std::string NVRTCCompile(const std::string& code) {

std::string find_cuda_include_path() {
Member

Functions in C++ should be in CamelCase: FindCUDAIncludePath.


tqchen commented Jul 18, 2018

major = int(split_ver[0])
minor = int(split_ver[1])
return major, minor

Contributor

I'd suggest using exceptions:

try:
    major, minor = compute_version.split('.')
    return int(major), int(minor)
except ValueError as err:
    ....

minor = int(split_ver[1])
return major, minor

raise RuntimeError("the compute capability string is unsupported format: " + cc)
Contributor

cc is not defined here.
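Combining the two review points above (use exceptions, and `cc` is undefined in the error message), a corrected sketch might look like this; it is illustrative, not the exact code that was merged.

```python
def parse_compute_version(compute_version):
    """Parse a compute capability string like "6.1" into (major, minor)."""
    try:
        major, minor = compute_version.split(".")
        return int(major), int(minor)
    except ValueError:
        # Reference the actual argument, not an undefined `cc` variable.
        raise RuntimeError(
            "unsupported compute capability string format: " + compute_version)


print(parse_compute_version("6.1"))  # (6, 1)
```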

np.testing.assert_allclose(c.asnumpy(), a.asnumpy() + 1)

check_cuda("float32", 64, 2)
if not tvm.gpu(0).exist or not have_fp16(tvm.gpu(0).compute_version):
Contributor

The check of tvm.gpu(0).exist is common to fp32 and fp16. It should be moved into check_cuda().


nishi-t commented Jul 19, 2018

@masahi Thank you for the review.
@tqchen @kazum I've addressed the comments; please review again.

@kazum kazum left a comment

Looks good to me, thanks.

@tqchen tqchen left a comment

Some final comments on cross platform handling

}

cuda_include_path = "/usr/local/cuda/include";
if (stat(cuda_include_path.c_str(), &st) == 0) {
Member

The stat function may not be available in some MSVC versions; consider using the stat query only on Linux and requiring the user to set CUDA_PATH otherwise.


Member

cf. https://msdn.microsoft.com/en-us/library/14h5k7ff.aspx Sometimes the stat function is not available and _stat must be used instead. Given that the proposed path does not work on Windows anyway, let us just skip this on Windows.

@@ -5,14 +5,18 @@
*
* \file build_cuda.cc
*/
#include <sys/stat.h>
Member

Consider including sys/stat.h only when Linux is detected.



nishi-t commented Jul 20, 2018

@tqchen Thank you for the comment. I've addressed it; please review again.

@tqchen tqchen merged commit 5f7b4d5 into apache:master Jul 20, 2018

tqchen commented Jul 20, 2018

Thanks @nishi-t @kazum @masahi, this is now merged!

@tqchen tqchen mentioned this pull request Jul 20, 2018
tqchen pushed a commit to tqchen/tvm that referenced this pull request Aug 4, 2018
sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request Aug 8, 2018