[CUDA] FP16 support #1413
Conversation
src/codegen/opt/build_cuda_on.cc (Outdated)

    if (cudaHomePath != nullptr) {
      includeOption += cudaHomePath;
      includeOption += "/include";
Does this work on Windows? I'm not sure.
Oh, I overlooked that. I'll address it. Thanks.
Please add a test case for this. The test case needs to be guarded by:

    if not tvm.gpu(0).exist or not have_fp16(tvm.gpu(0).compute_version):
        return

Let us enable fp16 by default when we detect that fp16 is used in the code. Directly operating on fp16, while useful, may not be the most effective approach. We will also need to test vectorized load and vector operations (corresponding to half2 in CUDA).
@tqchen Ok, I'm working on it. Thanks.
src/codegen/opt/build_cuda_on.cc (Outdated)

    if (include_path) {
      std::string includeOption = "--include-path=";
      const char* cudaHomePath = std::getenv("CUDA_HOME");
How about defining CUDA_HOME as a preprocessor macro in the .cmake file?
Does that mean cudaHomePath would be defined as a preprocessor macro at the time of building TVM? If so, I'm worried that the path would depend strongly on the build environment; for example, it could become a problem when distributing a pre-compiled TVM in the future. Please let me know your opinion.
Okay, never mind.
Then, I'd suggest using CUDA_PATH instead of CUDA_HOME for the environment variable, and defaulting to /usr/local/cuda. I think it's better to be consistent with how python/tvm/contrib/nvcc.py finds the cuda path.
@kazum Thank you for the suggestion. I'll address it.
(force-pushed 2728ebd to c7457c5)
I have made some followup comments. @nishi-t can you also include a simple testcase that does a vectorized add?
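A real testcase would go through TVM's compute/schedule API with a vectorized axis; purely as a numerical illustration independent of TVM, the fp16 arithmetic such a vectorized-add test would verify can be checked with NumPy's `float16` (the `(n, lanes)` shape loosely mirrors two lanes per CUDA half2 register):

```python
import numpy as np

# Lane-wise add of two half-precision arrays; conceptually the semantics
# a vectorized fp16 (half2) add kernel should reproduce.
n, lanes = 64, 2
a = np.random.uniform(size=(n, lanes)).astype("float16")
b = np.random.uniform(size=(n, lanes)).astype("float16")
c = a + b  # element-wise, computed and stored in float16

# fp16 rounding error is at most ~2^-11 relative, so rtol=1e-3 is safe.
np.testing.assert_allclose(
    c, a.astype("float32") + b.astype("float32"), rtol=1e-3)
```

This only demonstrates the expected numerics; the TVM testcase additionally needs the GPU-existence and fp16-capability guards discussed above.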
python/tvm/contrib/cuda.py (Outdated)

    @@ -0,0 +1,75 @@
    """Utilith for CUDA backend"""
Move this to nvcc.py.
src/codegen/opt/build_cuda_on.cc (Outdated)

    @@ -43,6 +84,13 @@ std::string NVRTCCompile(const std::string& code) {
      ptx.resize(ptx_size);
      NVRTC_CALL(nvrtcGetPTX(prog, &ptx[0]));
      NVRTC_CALL(nvrtcDestroyProgram(&prog));

      if (include_path) {
        for (int i = 0; i < numCompileOptions; i++) {
Use snake_case style for variable names, per the Google C++ style guide.
src/codegen/opt/build_cuda_on.cc (Outdated)

    @@ -26,11 +28,50 @@ namespace codegen {
        } \
      }

    - std::string NVRTCCompile(const std::string& code) {
    + std::string NVRTCCompile(const std::string& code, bool include_path = false) {
        char *compileParams[2];
Use std::string to store strings; avoid using malloc and free.
python/tvm/contrib/cuda.py (Outdated)

    @@ -0,0 +1,75 @@
    """Utilith for CUDA backend"""

    def parse_cc(compute_capability):
Rename this to parse_compute_version.
docker/Dockerfile.ci_gpu (Outdated)

    @@ -66,6 +66,7 @@ COPY install/ubuntu_install_redis.sh /install/ubuntu_install_redis.sh
    RUN bash /install/ubuntu_install_redis.sh

    # Environment variables
    ENV CUDA_HOME=/usr/local/cuda
Ideally, we should not rely on CUDA_HOME. Instead, allow some local search to happen (in contrib.nvcc) that is aware of CUDA_HOME but will also search /usr/local/cuda by default.
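The suggested search order (explicit environment variable first, then the conventional install prefix) could look like the following sketch. The function name and the choice of `CUDA_PATH` follow the earlier suggestion in this thread; this is not the exact code in contrib.nvcc.

```python
import os

def find_cuda_path():
    # Prefer an explicitly set CUDA_PATH ...
    cuda_path = os.environ.get("CUDA_PATH")
    if cuda_path:
        return cuda_path
    # ... then fall back to the conventional Linux install prefix.
    default = "/usr/local/cuda"
    if os.path.isdir(default):
        return default
    raise RuntimeError("Cannot find CUDA path; please set CUDA_PATH")
```

With this in the Python side, neither the Dockerfile nor pre-compiled binaries need to bake in a specific CUDA location.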
(force-pushed f5f264f to c7784d0)
Thanks for the updates. I just have one minor comment, and then it is good to go.
src/codegen/opt/build_cuda_on.cc (Outdated)

    @@ -26,11 +30,65 @@ namespace codegen {
        } \
      }

      std::string NVRTCCompile(const std::string& code) {

      std::string find_cuda_include_path() {
Functions are named in CamelCase in C++: FindCUDAIncludePath.
python/tvm/contrib/nvcc.py (Outdated)

    major = int(split_ver[0])
    minor = int(split_ver[1])
    return major, minor
I'd suggest using exceptions:

    try:
        major, minor = compute_version.split('.')
        return int(major), int(minor)
    except ValueError as err:
        ...
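A complete helper following this suggestion might read as below. This is a sketch; the real function lives in python/tvm/contrib/nvcc.py and may differ in wording and error handling.

```python
def parse_compute_version(compute_version):
    """Parse a compute capability string like "6.0" into (major, minor)."""
    try:
        # Both a missing '.' (unpacking) and non-numeric parts (int())
        # raise ValueError, so one except clause covers both cases.
        major, minor = compute_version.split(".")
        return int(major), int(minor)
    except ValueError as err:
        raise RuntimeError(
            "Unsupported compute capability string: " + compute_version
        ) from err
```

Relying on the exception keeps the happy path free of manual length and digit checks, which was the point of the suggestion.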
python/tvm/contrib/nvcc.py (Outdated)

    minor = int(split_ver[1])
    return major, minor

    raise RuntimeError("the compute capability string is unsupported format: " + cc)
cc is not defined here.
    np.testing.assert_allclose(c.asnumpy(), a.asnumpy() + 1)

    check_cuda("float32", 64, 2)
    if not tvm.gpu(0).exist or not have_fp16(tvm.gpu(0).compute_version):
The check of tvm.gpu(0).exist is common to fp32 and fp16. It should be moved into check_cuda().
Looks good to me, thanks.
Some final comments on cross-platform handling.
src/codegen/opt/build_cuda_on.cc (Outdated)

    }

    cuda_include_path = "/usr/local/cuda/include";
    if (stat(cuda_include_path.c_str(), &st) == 0) {
The stat function may not be available in some versions of MSVC. Consider using the stat query only on Linux, and requiring the user to set CUDA_PATH otherwise.
Cf. https://msdn.microsoft.com/en-us/library/14h5k7ff.aspx. Sometimes the stat function is not available and _stat must be used instead. Given that the proposed path does not work on Windows anyway, let us just skip this on Windows.
src/codegen/opt/build_cuda_on.cc (Outdated)

    @@ -5,14 +5,18 @@
     *
     * \file build_cuda.cc
     */
    #include <sys/stat.h>
Consider including sys/stat.h only when Linux is detected.
@tqchen Thank you for the comment. I addressed it. Please review again.
This PR changes NVRTCCompile to compile CUDA code with cuda_fp16.h, as discussed in #699. CUDA's fp16 operators are not yet supported in this PR; I'll work on fp16 operator support later :)
For now, I confirmed that the following code works. The environment variable CUDA_HOME must be set to the CUDA install location in order to run it.
@tqchen @masahi @merrymercy @abergeron Could you review and comment on this?