
Use cublasGemmEx in tensordot_core when CUDA11 #3719

Merged: 8 commits merged into cupy:master on Aug 17, 2020

Conversation

@anaruse (Contributor) commented Aug 4, 2020:

This PR modifies tensordot_core to use cublasGemmEx, an extension of cublas<t>gemm, as the matrix-multiply backend when running on CUDA 11. cublasGemmEx is flexible, allowing users to specify the data types of each of the matrices A, B, and C, the precision of the computation, and the matrix-multiply algorithm to be used.
https://docs.nvidia.com/cuda/cublas/index.html#cublas-GemmEx
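
For reference, a hedged transcription of the prototype from the linked docs (names as in the CUDA 11 headers; illustrative, not CuPy binding code):

# Transcribed from the linked cuBLAS docs:
#
#   cublasStatus_t cublasGemmEx(cublasHandle_t handle,
#       cublasOperation_t transa, cublasOperation_t transb,
#       int m, int n, int k,
#       const void *alpha,
#       const void *A, cudaDataType_t Atype, int lda,
#       const void *B, cudaDataType_t Btype, int ldb,
#       const void *beta,
#       void *C, cudaDataType_t Ctype, int ldc,
#       cublasComputeType_t computeType, cublasGemmAlgo_t algo)
#
# Atype/Btype/Ctype select the storage dtypes independently; computeType
# selects the accumulation precision; algo selects the GEMM algorithm.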

This is a preparatory PR; the follow-up PR will allow the use of TF32 (TensorFloat-32) as the compute precision of the matrix multiply.

This is related to #3602

@leofang (Member) left a comment:

Did a very quick first pass and left a few comments/questions.

Comment on lines 2896 to 2901
cdef struct cuComplex:
    float x, y


cdef struct cuDoubleComplex:
    double x, y
Member:

Could you do this at the top instead, for consistency?

cdef extern from '../cupy_cuComplex.h':
    ctypedef struct cuComplex 'cuComplex':
        float x, y
    ctypedef struct cuDoubleComplex 'cuDoubleComplex':
        double x, y

        double x, y


cpdef ndarray tensordot_core_v11(
@leofang (Member), Aug 4, 2020:

I feel a lot of the boilerplate code in this new function overlaps with its predecessor tensordot_core(), at least for input/output preparation. Can we defer the code-splitting point until later?

@anaruse (Contributor, Author):

Yes, code duplication is a concern of mine as well 😓
Since cublasGemmEx allows you to select a different data type for the output matrix C than for the input matrices A and B, I was thinking of using this to reduce the amount of copying after the GEMM (this is not implemented yet). That's why I branched out early, but there aren't that many opportunities for copy reduction, so I'm going to prioritize reducing code duplication first.

Comment on lines 2929 to 2934
    if m == 1 and n == 1:
        _tensordot_core_mul_sum(
            a.ravel(), b.ravel(), _manipulation._reshape(out, ()))
        if out is not ret:
            elementwise_copy(out, ret)
        return ret
Member:

As an example of the code duplication mentioned above, note that #3678 is fixing this part, so if the duplication is not avoided as much as possible, we'd need to fix it twice 😅

        return ret


cdef int _get_cuda_dtype(ndarray a):
Member:

  1. Need to propagate the exception if it's raised
  2. Compare char directly

Suggested change
cdef int _get_cuda_dtype(ndarray a):
cdef int _get_cuda_dtype(ndarray a) except -1:
    cdef str a_type = a.dtype.char
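
For context, a minimal sketch of what `except -1` buys (hypothetical helper, not from this PR): it lets a cdef function with a C int return type propagate a Python exception, with -1 serving as the sentinel the caller checks.

cdef int _parse_switch(str s) except -1:
    if s == 'on':
        return 1
    if s == 'off':
        return 0
    # Propagates: Cython returns -1 and leaves the ValueError pending,
    # instead of swallowing the exception.
    raise ValueError('unknown switch: %s' % s)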

Member:

I'm wondering if this function should go to cupy/core/_dtype.pyx instead...?

@leofang (Member), Aug 4, 2020:

Yeah, I think a little refactoring would be great; note that the very same function is also needed in cuSPARSE and cuTENSOR, for example:

cupy/cupy/cusparse.py

Lines 70 to 80 in ca79633

def _dtype_to_DataType(dtype):
    if dtype == 'f':
        return runtime.CUDA_R_32F
    elif dtype == 'd':
        return runtime.CUDA_R_64F
    elif dtype == 'F':
        return runtime.CUDA_C_32F
    elif dtype == 'D':
        return runtime.CUDA_C_64F
    else:
        raise TypeError

cupy/cupy/cutensor.py

Lines 44 to 56 in 8299e83

def get_cuda_dtype(numpy_dtype):
    if numpy_dtype == numpy.float16:
        return runtime.CUDA_R_16F
    elif numpy_dtype == numpy.float32:
        return runtime.CUDA_R_32F
    elif numpy_dtype == numpy.float64:
        return runtime.CUDA_R_64F
    elif numpy_dtype == numpy.complex64:
        return runtime.CUDA_C_32F
    elif numpy_dtype == numpy.complex128:
        return runtime.CUDA_C_64F
    else:
        raise TypeError('Dtype {} is not supported'.format(numpy_dtype))

How about modifying the signature like this:

cdef int _dtype_to_cuda_type(dtype, bint is_half_allowed=False) except -1

and reuse it everywhere in the codebase?

@anaruse (Contributor, Author):

Agreed. I also think it's better to reuse a function that converts NumPy data types to CUDA data types. I'd like to propose the following implementation; what do you think?

cpdef int dtype_to_cuda_dtype(dtype_char, available_dtype_char=None) except -1:
    if available_dtype_char is None:
        available_dtype_char = 'fdFD'
    if dtype_char not in available_dtype_char:
        raise TypeError('dtype is not available: %s' % str(dtype_char))
    if dtype_char == 'e':
        return runtime.CUDA_R_16F
    elif dtype_char == 'f':
        return runtime.CUDA_R_32F
    elif dtype_char == 'd':
        return runtime.CUDA_R_64F
    elif dtype_char == 'F':
        return runtime.CUDA_C_32F
    elif dtype_char == 'D':
        return runtime.CUDA_C_64F
    else:
        raise TypeError('dtype is not supported: %s' % str(dtype_char))

Member:

Hi @anaruse, my preference is to keep the NumPy dtype as input, because when raising an error it offers a better description than a single char. Also, we can avoid the double comparison (your `not in` first and then the ifs). Last, available_dtype_char is unnecessary because the if branches already limit the accepted dtypes.

I think this could be simpler:

cpdef int dtype_to_cuda_dtype(dtype, bint is_half_allowed=False) except -1:
    cdef str dtype_char = dtype.char

    if dtype_char == 'e' and is_half_allowed:
        return runtime.CUDA_R_16F
    elif dtype_char == 'f':
        return runtime.CUDA_R_32F
    elif dtype_char == 'd':
        return runtime.CUDA_R_64F
    elif dtype_char == 'F':
        return runtime.CUDA_C_32F
    elif dtype_char == 'D':
        return runtime.CUDA_C_64F
    else:
        raise TypeError('dtype is not supported: {}'.format(dtype))
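
For illustration, a hedged usage sketch of the helper as proposed here (assuming the helper is importable and that numpy and cupy.cuda.runtime are in scope):

import numpy
from cupy.cuda import runtime

# Hypothetical usage of the proposed helper:
assert dtype_to_cuda_dtype(numpy.dtype('f')) == runtime.CUDA_R_32F
assert dtype_to_cuda_dtype(numpy.dtype('e'), is_half_allowed=True) == runtime.CUDA_R_16F
# Without is_half_allowed=True, float16 falls through to the TypeError branch:
# dtype_to_cuda_dtype(numpy.dtype('e'))  -> TypeError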

@anaruse (Contributor, Author) commented Aug 5, 2020:

Thank you for your comments, @leofang! I've updated the branch accordingly. Could you take a look when you have time?

@@ -1,2 +1,3 @@
cpdef get_dtype(t)
cpdef tuple get_dtype_with_itemsize(t)
cpdef int dtype_to_cuda_dtype(dtype, bint is_half_allowed=?) except -1
Member:

I thought this is the correct syntax? (See the Cython doc)

Suggested change
cpdef int dtype_to_cuda_dtype(dtype, bint is_half_allowed=?) except -1
cpdef int dtype_to_cuda_dtype(dtype, bint is_half_allowed=*) except -1
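
For context, a short sketch of the convention (per the Cython docs; file names as in this PR): the .pxd declaration writes the default as `*`, and the matching .pyx implementation supplies the concrete value.

# cupy/core/_dtype.pxd (declaration): `=*` means "has a default,
# defined in the implementation file":
#     cpdef int dtype_to_cuda_dtype(dtype, bint is_half_allowed=*) except -1
#
# cupy/core/_dtype.pyx (implementation) carries the actual default:
#     cpdef int dtype_to_cuda_dtype(dtype, bint is_half_allowed=False) except -1:
#         ...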

@@ -2856,14 +2866,15 @@ cpdef ndarray tensordot_core(
            b.data.ptr, runtime.CUDA_R_16F, <int>ldb,
            a.data.ptr, runtime.CUDA_R_16F, <int>lda,
            <size_t>&zero_fp32,
            c.data.ptr, Ctype, <int>m,
            c.data.ptr, runtime.CUDA_R_16F, <int>m,
Member:

Do we know for sure c is of type float16 at this stage?

@anaruse (Contributor, Author):

Yes, I checked the original code: if the dtype of matrices a and b is float16, then the dtype of matrix c is always float16.

Member:

Thanks!

            runtime.CUDA_R_16F, <int>lda, 0, c.data.ptr, Ctype, <int>m)
            b.data.ptr, runtime.CUDA_R_16F, <int>ldb,
            a.data.ptr, runtime.CUDA_R_16F, <int>lda, 0,
            c.data.ptr, runtime.CUDA_R_16F, <int>m)
Member:

ditto

Comment on lines 2918 to 2919
    compute_capability = int(device.get_compute_capability())
    algo = cublas.CUBLAS_GEMM_DEFAULT
Member:

Suggested change
    compute_capability = int(device.get_compute_capability())
    algo = cublas.CUBLAS_GEMM_DEFAULT
    cdef int compute_capability = int(device.get_compute_capability())
    cdef int algo = cublas.CUBLAS_GEMM_DEFAULT

    cdef double one_d, zero_d
    cdef cuComplex one_F, zero_F
    cdef cuDoubleComplex one_D, zero_D

Member:

Suggested change
    cdef int compute_type

Comment on lines 2924 to 2926
    a_cuda_dtype = dtype_to_cuda_dtype(a.dtype, is_half_allowed=True)
    b_cuda_dtype = dtype_to_cuda_dtype(b.dtype, is_half_allowed=True)
    c_cuda_dtype = dtype_to_cuda_dtype(c.dtype, is_half_allowed=True)
Member:

Suggested change
    a_cuda_dtype = dtype_to_cuda_dtype(a.dtype, is_half_allowed=True)
    b_cuda_dtype = dtype_to_cuda_dtype(b.dtype, is_half_allowed=True)
    c_cuda_dtype = dtype_to_cuda_dtype(c.dtype, is_half_allowed=True)
    cdef int a_cuda_dtype = dtype_to_cuda_dtype(a.dtype, is_half_allowed=True)
    cdef int b_cuda_dtype = dtype_to_cuda_dtype(b.dtype, is_half_allowed=True)
    cdef int c_cuda_dtype = dtype_to_cuda_dtype(c.dtype, is_half_allowed=True)

    cdef cuDoubleComplex one_D, zero_D

    if c.dtype.char in 'efF':
        compute_type = cublas.CUBLAS_COMPUTE_32F
Member:

Don't we wanna use CUBLAS_COMPUTE_16F for half precision? Does it not work?

@anaruse (Contributor, Author):

You can use CUBLAS_COMPUTE_16F, but I didn't use it here for a few reasons.

Performance: On a GPU with Tensor Cores, if the data types of matrices a, b, and c are all half precision, there is little difference in matrix-multiply performance between using CUBLAS_COMPUTE_16F and CUBLAS_COMPUTE_32F as the compute type.

Accuracy: If CUBLAS_COMPUTE_32F is used as the compute type, the accumulation in the matrix multiply is performed in single precision, which reduces rounding-error accumulation compared to CUBLAS_COMPUTE_16F, resulting in more accurate results.

Code maintenance: If you specify CUBLAS_COMPUTE_16F as the compute type, the alpha and beta parameters of cublasGemmEx must be pointers to half values. However, half is not a first-class citizen in Cython, which would require somewhat complicated code. I prefer to keep the source code simple.
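
To illustrate the third point with a hedged sketch (an assumed workaround, not code from this PR): with CUBLAS_COMPUTE_16F, alpha and beta must point to half-precision values, and since Cython has no native half type one would have to smuggle in raw IEEE fp16 bit patterns:

# IEEE fp16 bit patterns: 1.0 is 0x3C00, 0.0 is 0x0000.
cdef unsigned short one_h = 0x3C00
cdef unsigned short zero_h = 0x0000
# These addresses would then be passed as the alpha/beta arguments,
# e.g. <size_t>&one_h and <size_t>&zero_h in the gemmEx call.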

Member:

Thanks, @anaruse! It makes perfect sense 👍

Comment on lines 2929 to 2948
    if compute_type == cublas.CUBLAS_COMPUTE_32F:
        one_f = 1
        zero_f = 0
        cublas.gemmEx(
            handle, <int>transa, <int>transb, <int>m, <int>n, <int>k,
            <size_t>&one_f,
            a.data.ptr, a_cuda_dtype, <int>lda,
            b.data.ptr, b_cuda_dtype, <int>ldb,
            <size_t>&zero_f, c.data.ptr, c_cuda_dtype, <int>ldc,
            compute_type, algo)
    elif compute_type == cublas.CUBLAS_COMPUTE_64F:
        one_d = 1
        zero_d = 0
        cublas.gemmEx(
            handle, <int>transa, <int>transb, <int>m, <int>n, <int>k,
            <size_t>&one_d,
            a.data.ptr, a_cuda_dtype, <int>lda,
            b.data.ptr, b_cuda_dtype, <int>ldb,
            <size_t>&zero_d, c.data.ptr, c_cuda_dtype, <int>ldc,
            compute_type, algo)
Member:

Looks like they can be combined?!

Suggested change
    if compute_type == cublas.CUBLAS_COMPUTE_32F:
        one_f = 1
        zero_f = 0
        cublas.gemmEx(
            handle, <int>transa, <int>transb, <int>m, <int>n, <int>k,
            <size_t>&one_f,
            a.data.ptr, a_cuda_dtype, <int>lda,
            b.data.ptr, b_cuda_dtype, <int>ldb,
            <size_t>&zero_f, c.data.ptr, c_cuda_dtype, <int>ldc,
            compute_type, algo)
    elif compute_type == cublas.CUBLAS_COMPUTE_64F:
        one_d = 1
        zero_d = 0
        cublas.gemmEx(
            handle, <int>transa, <int>transb, <int>m, <int>n, <int>k,
            <size_t>&one_d,
            a.data.ptr, a_cuda_dtype, <int>lda,
            b.data.ptr, b_cuda_dtype, <int>ldb,
            <size_t>&zero_d, c.data.ptr, c_cuda_dtype, <int>ldc,
            compute_type, algo)
    if compute_type in (cublas.CUBLAS_COMPUTE_32F, cublas.CUBLAS_COMPUTE_64F):
        one = 1
        zero = 0
        cublas.gemmEx(
            handle, <int>transa, <int>transb, <int>m, <int>n, <int>k,
            <size_t>&one,
            a.data.ptr, a_cuda_dtype, <int>lda,
            b.data.ptr, b_cuda_dtype, <int>ldb,
            <size_t>&zero, c.data.ptr, c_cuda_dtype, <int>ldc,
            compute_type, algo)

@anaruse (Contributor, Author):

It would be nice to be able to do so, but the dtype of the alpha and beta parameters of cublasGemmEx (in this case, one and zero) has to be a float pointer when the compute type is COMPUTE_32F and a double pointer when it is COMPUTE_64F.
https://docs.nvidia.com/cuda/cublas/index.html#cublas-GemmEx

@asi1024 (Member):

How about the following implementation?

if compute_type == cublas.CUBLAS_COMPUTE_32F:
    one_f = 1
    zero_f = 0
    one_ptr = <size_t>&one_f
    zero_ptr = <size_t>&zero_f
elif compute_type == cublas.CUBLAS_COMPUTE_64F:
    ...
else:
    ...
cublas.gemmEx(
    handle, <int>transa, <int>transb, <int>m, <int>n, <int>k,
    one_ptr, a.data.ptr, a_cuda_dtype, <int>lda,
    b.data.ptr, b_cuda_dtype, <int>ldb,
    zero_ptr, c.data.ptr, c_cuda_dtype, <int>ldc,
    compute_type, algo)

@anaruse (Contributor, Author):

Ah, certainly, there is a way to do it. Thanks @asi1024!

Comment on lines 2952 to 2973
    if compute_type == cublas.CUBLAS_COMPUTE_32F:
        one_F = cuComplex(1, 0)
        zero_F = cuComplex(0, 0)
        cublas.gemmEx(
            handle, <int>transa, <int>transb, <int>m, <int>n, <int>k,
            <size_t>&one_F,
            a.data.ptr, a_cuda_dtype, <int>lda,
            b.data.ptr, b_cuda_dtype, <int>ldb,
            <size_t>&zero_F, c.data.ptr, c_cuda_dtype, <int>ldc,
            compute_type, algo)
    elif compute_type == cublas.CUBLAS_COMPUTE_64F:
        one_D = cuDoubleComplex(1, 0)
        zero_D = cuDoubleComplex(0, 0)
        cublas.gemmEx(
            handle, <int>transa, <int>transb, <int>m, <int>n, <int>k,
            <size_t>&one_D,
            a.data.ptr, a_cuda_dtype, <int>lda,
            b.data.ptr, b_cuda_dtype, <int>ldb,
            <size_t>&zero_D, c.data.ptr, c_cuda_dtype, <int>ldc,
            compute_type, algo)
    else:
        raise ValueError('Invalid compute type: {}'.format(compute_type))
Member:

ditto, something like

Suggested change
    if compute_type == cublas.CUBLAS_COMPUTE_32F:
        one_F = cuComplex(1, 0)
        zero_F = cuComplex(0, 0)
        cublas.gemmEx(
            handle, <int>transa, <int>transb, <int>m, <int>n, <int>k,
            <size_t>&one_F,
            a.data.ptr, a_cuda_dtype, <int>lda,
            b.data.ptr, b_cuda_dtype, <int>ldb,
            <size_t>&zero_F, c.data.ptr, c_cuda_dtype, <int>ldc,
            compute_type, algo)
    elif compute_type == cublas.CUBLAS_COMPUTE_64F:
        one_D = cuDoubleComplex(1, 0)
        zero_D = cuDoubleComplex(0, 0)
        cublas.gemmEx(
            handle, <int>transa, <int>transb, <int>m, <int>n, <int>k,
            <size_t>&one_D,
            a.data.ptr, a_cuda_dtype, <int>lda,
            b.data.ptr, b_cuda_dtype, <int>ldb,
            <size_t>&zero_D, c.data.ptr, c_cuda_dtype, <int>ldc,
            compute_type, algo)
    else:
        raise ValueError('Invalid compute type: {}'.format(compute_type))
    if compute_type == cublas.CUBLAS_COMPUTE_32F:
        one = cuComplex(1, 0)
        zero = cuComplex(0, 0)
    elif compute_type == cublas.CUBLAS_COMPUTE_64F:
        one = cuDoubleComplex(1, 0)
        zero = cuDoubleComplex(0, 0)
    else:
        raise ValueError('Invalid compute type: {}'.format(compute_type))
    cublas.gemmEx(
        handle, <int>transa, <int>transb, <int>m, <int>n, <int>k,
        <size_t>&one,
        a.data.ptr, a_cuda_dtype, <int>lda,
        b.data.ptr, b_cuda_dtype, <int>ldb,
        <size_t>&zero, c.data.ptr, c_cuda_dtype, <int>ldc,
        compute_type, algo)

@anaruse (Contributor, Author):

For the same reasons as above, I'm afraid, we cannot do this either: one and zero cannot be declared as both cuComplex and cuDoubleComplex.
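
A hedged sketch of how @asi1024's pointer-based workaround above could carry over to the complex branch (variable names assumed; the typed locals must stay in scope across the call):

if compute_type == cublas.CUBLAS_COMPUTE_32F:
    one_F = cuComplex(1, 0)
    zero_F = cuComplex(0, 0)
    one_ptr = <size_t>&one_F
    zero_ptr = <size_t>&zero_F
elif compute_type == cublas.CUBLAS_COMPUTE_64F:
    one_D = cuDoubleComplex(1, 0)
    zero_D = cuDoubleComplex(0, 0)
    one_ptr = <size_t>&one_D
    zero_ptr = <size_t>&zero_D
else:
    raise ValueError('Invalid compute type: {}'.format(compute_type))
cublas.gemmEx(
    handle, <int>transa, <int>transb, <int>m, <int>n, <int>k,
    one_ptr, a.data.ptr, a_cuda_dtype, <int>lda,
    b.data.ptr, b_cuda_dtype, <int>ldb,
    zero_ptr, c.data.ptr, c_cuda_dtype, <int>ldc,
    compute_type, algo)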

            <void*>C, <runtime.DataType>Ctype, ldc,
            <runtime.DataType>computeType, <GemmAlgo>algo)
    if computeType >= CUBLAS_COMPUTE_16F:
        status = cublasGemmEx_v11(
Member:

Question: It seems there's a C++ overloaded version of cublasGemmEx that supports the old cudaDataType? https://docs.nvidia.com/cuda/cublas/index.html#cublas-GemmEx
I wonder if using that could help, or is it to be deprecated soon?

@anaruse (Contributor, Author):

That's right: even with the cublasGemmEx of CUDA 11, you can still specify the compute type with cudaDataType, as long as you're in C++. However, the old way cannot express, for example, TF32 (TensorFloat-32) as a compute type; for that you need to specify the compute type with cublasComputeType_t, which was added in CUDA 11.
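
For context, a hedged note on the new enum (values as named in the CUDA 11 cuBLAS headers; the TF32 entry is what the follow-up PR would target):

# cublasComputeType_t (CUDA 11) expresses modes cudaDataType_t cannot:
#     CUBLAS_COMPUTE_32F            # FP32 accumulation (used in this PR)
#     CUBLAS_COMPUTE_64F            # FP64 accumulation
#     CUBLAS_COMPUTE_32F_FAST_TF32  # FP32 with TF32 Tensor Core math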

Member:

Ah I see, so this is why we need the new interface...

@asi1024 added the cat:enhancement (Improvements to existing features) label on Aug 7, 2020
@asi1024 (Member) left a comment:

We will split tensordot_core and tensordot_core_v11 out into a separate file, _routines_linalg.pyx, after this PR is merged.


@asi1024 (Member) commented Aug 8, 2020:

Jenkins, test this please.

@pfn-ci-bot (Collaborator): Successfully created a job for commit 752de1b.

@asi1024 added this to the v8.0.0rc1 milestone on Aug 8, 2020
@leofang (Member) commented Aug 8, 2020:

@asi1024 @takagi @kmaehashi I think Jenkins has been dead since yesterday.

@leofang (Member) left a comment:

LGTM!

    if use_sgemmEx:
        Ctype = runtime.CUDA_R_16F if c.dtype == 'e' else runtime.CUDA_R_32F

    global _cuda_runtime_version
@leofang (Member), Aug 9, 2020:

Note: I think we no longer need to check this, as we're on CUDA 9.0+ starting with CuPy v8! I will send a PR to remove it from a few places, but for ease of backporting let's keep it here.


@leofang (Member) commented Aug 9, 2020:

Jenkins, test this please.

@pfn-ci-bot (Collaborator): Successfully created a job for commit 752de1b.

@chainer-ci (Member): Jenkins CI test (for commit 752de1b, target branch master) succeeded!

@asi1024 (Member) commented Aug 9, 2020:

I will retrigger CI after chainer/chainer-test#593 is merged.

@leofang mentioned this pull request on Aug 11, 2020
@asi1024 (Member) commented Aug 17, 2020:

Jenkins, test this please.

@chainer-ci (Member): Jenkins CI test (for commit 752de1b, target branch master) succeeded!

@asi1024 (Member) commented Aug 17, 2020:

LGTM!

@asi1024 merged commit 1713823 into cupy:master on Aug 17, 2020
Labels: cat:enhancement (Improvements to existing features)
5 participants