fix CUDA_ERROR_ILLEGAL_ADDRESS bug #63

Open: 1374839016 wants to merge 1 commit into master
Conversation

@1374839016

I fixed the memory access bug described in #55 by forcing cupy to allocate memory on the device of the PyTorch tensors (see the sketch below).

fix CUDA_ERROR_ILLEGAL_ADDRESS
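
The idea, roughly (a minimal sketch rather than the exact diff; the helper name is mine):

import cupy
import torch

def launch_on_input_device(one, launch):
    # Illustrative helper (not the exact change): select the CUDA device
    # of the PyTorch input as cupy's current device before launching,
    # so the kernel is not launched on the default device (GPU 0).
    assert one.is_cuda
    with cupy.cuda.Device(one.device.index):
        launch()

With something like this, the kernel_Correlation_updateOutput launch runs inside the device context of one instead of whatever device cupy currently has selected.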
@sniklaus (Owner)

Huge thanks for bringing this up!

Could you provide some more technical details on how this makes a difference? Currently, all the involved tensors will be on the same device as the first input as per:

rbot0 = one.new_zeros([ one.shape[0], one.shape[2] + 8, one.shape[3] + 8, one.shape[1] ])
rbot1 = one.new_zeros([ one.shape[0], one.shape[2] + 8, one.shape[3] + 8, one.shape[1] ])
one = one.contiguous(); assert(one.is_cuda == True)
two = two.contiguous(); assert(two.is_cuda == True)
output = one.new_zeros([ one.shape[0], 81, one.shape[2], one.shape[3] ])

I am hence a little confused about what the proposed fix would change. 🤔

@1374839016 (Author)

Sorry, I am not sure, but my guess is that the kernel launch below allocates its shared memory on the default device (GPU 0); see the sketch after the snippet.

cupy_launch('kernel_Correlation_updateOutput', cupy_kernel('kernel_Correlation_updateOutput', {
    'rbot0': rbot0,
    'rbot1': rbot1,
    'top': output
}))(
    grid=tuple([ output.shape[3], output.shape[2], output.shape[0] ]),
    block=tuple([ 32, 1, 1 ]),
    shared_mem=one.shape[1] * 4,
    args=[ cupy.int32(n), rbot0.data_ptr(), rbot1.data_ptr(), output.data_ptr() ]
)
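
One quick way to check this (just a diagnostic sketch, not part of the change): compare cupy's current device with the device of the inputs right before the launch.

import cupy

def check_device(one):
    # If these differ, the raw pointers from data_ptr() refer to memory
    # on a different GPU than the one the kernel is launched on, which
    # would explain the illegal address error when not running on GPU 0.
    current = cupy.cuda.runtime.getDevice()
    if current != one.device.index:
        print('cupy device:', current, '/ tensor device:', one.device.index)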
