Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds nvcc as a RawKernel backend #1941

Merged
merged 24 commits into from Oct 13, 2019
Merged

Conversation

sjperkins
Copy link
Contributor

Adds nvcc as a backend for RawKernel (issue #1928). The nvcc.py is a cut and paste of much of the functionality in cupy_setup_build.py and install/*.py -- I've avoided refactoring this at the moment as I'd like to invite input as to whether you think this PR is useful.

@sjperkins sjperkins changed the title Adds nvcc as a RawKernel backend [WIP] Adds nvcc as a RawKernel backend Jan 14, 2019
@niboshi niboshi self-assigned this Jan 15, 2019
cupy/core/raw.pyx Outdated Show resolved Hide resolved
@sjperkins sjperkins changed the title [WIP] Adds nvcc as a RawKernel backend Adds nvcc as a RawKernel backend Jan 18, 2019
@okuta okuta added the cat:feature New features/APIs label Feb 17, 2019
@niboshi
Copy link
Member

niboshi commented Feb 28, 2019

Thank you for the PR and I'm sorry for late review.

Does it have to rely on distutils? How about implementing a class with the same interface as _NVRTCProgram (cupy/cuda/compiler.py)?

@kmaehashi
Copy link
Member

I discussed with @okuta and agreed to support NVCC in RawKernel.

cupy_setup_build.py and install/*.py has lots of clutters. Originally CuPy was using NVCC, so I think it's better to borrow some code from here, instead of using above files: https://github.com/cupy/cupy/blob/v1/cupy/cuda/compiler.py

@sjperkins
Copy link
Contributor Author

Thanks for the suggestions. I'll try find some time to incorporate them in the next while.

@sjperkins
Copy link
Contributor Author

I think this is now almost ready. One thing I'm still not quite sure how to handle with the nvcc route is the _preprocess call. compile_with_nvcc directly produces a cubin (a bytes type) while compile_with_nvrtc produces unicode from which a bytes type cubin is produced. I haven't succeeded in plugging the output of compile_using_nvcc into the preprocessing code, but this could be because I'm not experienced with unicode. I've tried running a .decode('utf-8') on the cubin but this fails on translation.

Any suggestions on how to handle this?

Copy link
Member

@niboshi niboshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late review. LGTM except this comment.

cupy/core/carray.pxi Outdated Show resolved Hide resolved
leofang added a commit to leofang/cupy that referenced this pull request Apr 22, 2019
@niboshi
Copy link
Member

niboshi commented May 7, 2019

Jenkins, test this please

@niboshi niboshi added the st:test-and-merge (deprecated) Ready to merge after test pass. label May 7, 2019
@chainer-ci
Copy link
Member

Jenkins CI test (for commit a63b2a8, target branch master) failed with status FAILURE.

@sjperkins
Copy link
Contributor Author

Will take a look at the test case failures.

@niboshi
Copy link
Member

niboshi commented May 7, 2019

Those failures in TEST=chainer-py3/py35 (such as TestBias) are irrelevant. Please ignore them. I'm sorry for inconvenience.

Those in TEST=cupy-py??? seem relevant.

@sjperkins
Copy link
Contributor Author

sjperkins commented May 9, 2019

I think this is now almost ready. One thing I'm still not quite sure how to handle with the nvcc route is the _preprocess call. compile_with_nvcc directly produces a cubin (a bytes type) while compile_with_nvrtc produces unicode from which a bytes type cubin is produced. I haven't succeeded in plugging the output of compile_using_nvcc into the preprocessing code, but this could be because I'm not experienced with unicode. I've tried running a .decode('utf-8') on the cubin but this fails on translation.

OK so the test cases are failing because of a problem that I ran into previously. It doesn't seem possible to decode the test case cubin bytes produced by nvcc into unicode, as there's some byte that causes a failure in translation. So the following

with codecs.open(cubin_path, 'rb', encoding='utf-8') as bin_file:
    return bin_file.read()

produces:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xbe in position 18: invalid start byte

Without this conversion, it's not possible for the cubin to interact with other unicode variables in compiler.py.

Could someone with more unicode experience advise on how to handle this situation?

@leofang
Copy link
Member

leofang commented May 9, 2019

I am not familiar with unicode either, but does the follow piece of code help? It describes how a unicode object is generated from the ptx:

cupy/cupy/cuda/nvrtc.pyx

Lines 111 to 127 in f85374d

cpdef unicode getPTX(size_t prog):
cdef size_t ptxSizeRet
cdef bytes ptx
cdef char* ptx_ptr
with nogil:
status = nvrtcGetPTXSize(<Program>prog, &ptxSizeRet)
check_status(status)
ptx = b' ' * ptxSizeRet
ptx_ptr = ptx
with nogil:
status = nvrtcGetPTX(<Program>prog, ptx_ptr)
check_status(status)
# Strip the trailing NULL.
assert ptx.endswith(b'\x00')
ptx = ptx[:-1]
return ptx.decode('UTF-8')

@sjperkins
Copy link
Contributor Author

sjperkins commented May 9, 2019

I am not familiar with unicode either, but does the follow piece of code help? It describes how a unicode object is generated from the ptx:

Thanks for the suggestion of removing the trailing null, but unfortunately it still fails on the byte at position 18.

@sjperkins
Copy link
Contributor Author

Having thought about this a bit more, I think the issue is that compile_using_nvrtc returns text PTX assembly, while compile_using_nvcc produces a binary cubin. I probably need to modify the nvcc command within _preprocess to just return the PTX... Let me see what I can do.

@sjperkins
Copy link
Contributor Author

OK, got it working.

cupy/cuda/compiler.py Outdated Show resolved Hide resolved
@niboshi
Copy link
Member

niboshi commented May 10, 2019

Thanks.
Jenkins, test this please

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 5b205f3, target branch master) failed with status FAILURE.

@niboshi
Copy link
Member

niboshi commented May 10, 2019

The failure seems relevant.
Could you check again?

@niboshi niboshi removed the st:test-and-merge (deprecated) Ready to merge after test pass. label May 10, 2019
@sjperkins
Copy link
Contributor Author

@niboshi I've been taking a look at the logs. Somewhat difficult to diagnose as it's failing on python2 but succeeding on python3. It also succeeds on python2 on my machine.

The error is:

CUDADriverError: CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid

This seems to be a somewhat broad range of errors. For example:

  1. It may be that the LD_LIBRARY_PATH needs to be set up to point at appropriate CUDA and NVIDIA drivers on the test machine. See NVRTC_ERROR: device kernel image is invalid fireice-uk/xmr-stak#2319 (comment) for instance.
  2. I see there's also a stackoverflow question for chainer on this issue. Are there multiple GPUs on the test system, for instance?

@niboshi
Copy link
Member

niboshi commented Oct 1, 2019

OK.
Jenkins, test this please

@pfn-ci-bot
Copy link
Collaborator

Successfully created a job for commit d4ff126:

@chainer-ci
Copy link
Member

Jenkins CI test (for commit d4ff126, target branch master) failed with status FAILURE.

@leofang
Copy link
Member

leofang commented Oct 2, 2019

@niboshi Looks good, thanks. The two build test errors are to be fixed in #2514.

@leofang
Copy link
Member

leofang commented Oct 2, 2019

In 54ad318 I fixed TypeError: Expected unicode, got str observed in Jenkins.

Jenkins confirmed this fix btw.

@leofang

This comment has been minimized.

@leofang
Copy link
Member

leofang commented Oct 5, 2019

@niboshi I did four things:

  1. First revert back to d4ff126 as the -cubin flag was already added
  2. Enforce the correct backend selection in the tests
  3. Clear the preprocessing cache in tearDown()
  4. Add backend name to CompileException

I suspect there're some subtle cubin shadowing in the PY2 failure. Not sure if the changes 2 & 3 would fix it, but they're necessary anyway.

Could you kick off Jenkins again for me please? Sorry for keeping bugging you. Thanks.

@niboshi
Copy link
Member

niboshi commented Oct 10, 2019

chainer/chainer-test#523 has been merged.

@leofang Sure, sorry for delay.
Jenkins, test this please

@pfn-ci-bot
Copy link
Collaborator

Successfully created a job for commit 45534ae:

@niboshi niboshi removed the st:blocked-by-another-pr Blocked by another pull-request label Oct 10, 2019
@chainer-ci
Copy link
Member

Jenkins CI test (for commit 45534ae, target branch master) succeeded!

@leofang
Copy link
Member

leofang commented Oct 11, 2019

Ah, too bad PY2 is disabled! I'd love to know where it was fixed...Anyway, thanks for moving forward @niboshi.

Copy link
Member

@niboshi niboshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. LGTM except some cosmetics.

cupy/cuda/compiler.py Outdated Show resolved Hide resolved
tests/cupy_tests/core_tests/test_raw.py Outdated Show resolved Hide resolved
tests/cupy_tests/core_tests/test_raw.py Outdated Show resolved Hide resolved
tests/cupy_tests/core_tests/test_raw.py Outdated Show resolved Hide resolved
tests/cupy_tests/core_tests/test_raw.py Outdated Show resolved Hide resolved
Co-Authored-By: niboshi <niboshi000@gmail.com>
@leofang
Copy link
Member

leofang commented Oct 11, 2019

Thank you, @niboshi. Just committed.

@sjperkins
Copy link
Contributor Author

@leofang Thanks for taking over the PR!

Copy link
Member

@niboshi niboshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@niboshi
Copy link
Member

niboshi commented Oct 13, 2019

Jenkins, test this please

@pfn-ci-bot
Copy link
Collaborator

Successfully created a job for commit a3f1703:

@niboshi niboshi added this to the v7.0.0rc1 milestone Oct 13, 2019
@niboshi niboshi added the st:test-and-merge (deprecated) Ready to merge after test pass. label Oct 13, 2019
@chainer-ci
Copy link
Member

Jenkins CI test (for commit a3f1703, target branch master) succeeded!

@niboshi niboshi merged commit 8bb5efc into cupy:master Oct 13, 2019
@leofang
Copy link
Member

leofang commented Oct 13, 2019

Thanks to @sjperkins for laying the groundwork and to @niboshi for your help and patience!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:feature New features/APIs st:test-and-merge (deprecated) Ready to merge after test pass.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants