There is a bug in the concat CUDA kernel. #11540

qingqing01 · 2018-06-18T11:13:38Z

Add the following unit test to python/paddle/fluid/tests/unittests/test_concat_op.py to reproduce the bug:

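class TestConcatOp3(TestConcatOp):
    def init_test_data(self):
        # Three 4-D inputs that differ only along axis 1 (256, 128, 128).
        self.x0 = np.random.random((1, 256, 170, 256)).astype('float32')
        self.x1 = np.random.random((1, 128, 170, 256)).astype('float32')
        self.x2 = np.random.random((1, 128, 170, 256)).astype('float32')
        # Concatenate along axis 1.
        self.axis = 1

    def test_check_grad(self):
        # Override the inherited gradient check so that it does nothing.
        pass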
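
For reference, the result the kernel is expected to match can be sketched on the CPU with NumPy. This is only an illustration, not code from the Paddle test suite: the spatial dimensions are shrunk from (170, 256) to (4, 8) to keep it light, and the names `expected` and `offsets` are made up for this sketch.

```python
import numpy as np

# Same sizes along axis 1 as the test case (256, 128, 128); the trailing
# dimensions are reduced here purely to keep the example small.
x0 = np.random.random((1, 256, 4, 8)).astype('float32')
x1 = np.random.random((1, 128, 4, 8)).astype('float32')
x2 = np.random.random((1, 128, 4, 8)).astype('float32')
axis = 1

# CPU reference: join the inputs along axis 1.
expected = np.concatenate((x0, x1, x2), axis=axis)

# Start offset of each input along the concatenated axis, plus the total size.
offsets = np.cumsum([0] + [x.shape[axis] for x in (x0, x1, x2)])

print(expected.shape)
print(offsets.tolist())
```

Each input should land in its own slice of the output along axis 1, for example `expected[:, 256:384]` should equal `x1`. A correct GPU kernel has to write to exactly these ranges and nowhere else.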

The error is:

220: terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
220:   what():  cudaFree{Host} failed in GPUAllocator::Free.: an illegal memory access was encountered at [/paddle/Paddle/paddle/fluid/memory/detail/system_allocator.cc:130]
220: PaddlePaddle Call Stacks:
220: 0       0x7fb228be5f9cp paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 572
220: 1       0x7fb229d1aec8p paddle::memory::detail::GPUAllocator::Free(void*, unsigned long, unsigned long) + 328
220: 2       0x7fb229d178d7p paddle::memory::detail::BuddyAllocator::Free(void*) + 1191
220: 3       0x7fb229c3468bp paddle::framework::Tensor::PlaceholderImpl<paddle::platform::CUDAPlace>::~PlaceholderImpl() + 43
220: 4       0x7fb229aa3139p paddle::framework::Vector<int>::~Vector() + 217
220: 5       0x7fb229aa7f94p paddle::operators::math::ConcatFunctor<paddle::platform::CUDADeviceContext, float>::operator()(paddle::platform::CUDADeviceContext const&, std::vector<paddle::framework::Tensor, std::allocator<paddle::framework::Tensor> > const&, int, paddle::framework::Tensor*) + 2916
220: 6       0x7fb22987a2dep paddle::operators::ConcatKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 958

qingqing01 mentioned this issue Jun 18, 2018

Make the CUDA kernel of concat correct and fix unit tests. #11541

Merged

qingqing01 added the Bug label Jun 18, 2018

qingqing01 added this to In progress in Computer Vision: Face Detection Model Jun 18, 2018

qingqing01 closed this as completed in #11541 Jun 19, 2018

Computer Vision: Face Detection Model automation moved this from In progress to Done Jun 19, 2018
