
Maxpool bwd #750

Merged
zjing14 merged 27 commits into develop from max-pool-bwd
Jun 19, 2023

Conversation

Collaborator

@rocking5566 rocking5566 commented Jun 9, 2023


Maxpool backward computes dx = dy[index], which is the same as the put operation from

https://pytorch.org/docs/stable/generated/torch.Tensor.put_.html
https://numpy.org/doc/stable/reference/generated/numpy.put.html

Hence, I implemented a Put kernel and use it to implement Maxpool backward.
However, if the sliding windows overlap (e.g. window size = 3, window stride = 1), we need atomicAdd() to accumulate the gradients; in this case, Maxpool backward is dx += dy[index].
Therefore, the Put kernel lets the user specify a memory operation other than the usual set-value:
MemOp(Dx, Dy, index)
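To illustrate the non-overlapping case, here is a minimal NumPy sketch (the 1D shape, values, and variable names are made up for illustration; the actual kernel is a CK device op):

```python
import numpy as np

# Hypothetical 1D example: maxpool with window 2, stride 2 (non-overlapping).
x = np.array([1.0, 3.0, 2.0, 4.0], dtype=np.float32)
index = np.array([1, 3])                       # flat argmax indices saved by the forward pass
dy = np.array([10.0, 20.0], dtype=np.float32)  # upstream gradient, one per window

# Backward pass: dx = dy[index], i.e. a put/scatter of dy into dx.
dx = np.zeros_like(x)
np.put(dx, index, dy)  # dx becomes [0, 10, 0, 20]
```

Because every input element belongs to at most one window here, a plain set-value store is enough and no atomics are needed.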

However, fp16x1 and bf16 do not support atomicAdd. In that case, the Put kernel outputs fp32, and a separate casting kernel converts the fp32 result to fp16 or bf16.
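The overlapping case can be sketched with np.add.at, the scatter-add analogue of atomicAdd, accumulating in fp32 and casting afterwards (again an illustrative sketch, not CK API; the indices and values are made up):

```python
import numpy as np

# Hypothetical 1D example: window 3, stride 1, so windows overlap and the
# same input element can be the argmax of several windows.
index = np.array([1, 1, 3])                           # element 1 wins two windows
dy = np.array([10.0, 20.0, 30.0], dtype=np.float32)   # one gradient per window

# Accumulate gradients in fp32 (the role atomicAdd plays on the GPU):
# dx += dy[index], with repeated indices summed rather than overwritten.
dx_fp32 = np.zeros(5, dtype=np.float32)
np.add.at(dx_fp32, index, dy)  # dx_fp32 becomes [0, 30, 0, 30, 0]

# Separate cast step, mirroring the extra casting kernel used when the
# requested output type (fp16/bf16) has no atomicAdd support.
dx_fp16 = dx_fp32.astype(np.float16)
```

Note that a plain set-value put would lose one of the two gradients landing on element 1, which is why the accumulating memory operation matters when windows overlap.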

Note: This PR assumes all input and output tensors are in packed memory.

@rocking5566 rocking5566 requested a review from qianfengz June 9, 2023 09:05
Collaborator Author

rocking5566 commented Jun 12, 2023

Similar to DeviceElementwiseImpl, we cannot properly compute the grid size until #266 is solved. Hence, I hardcode the grid size for now.

Contributor

@qianfengz qianfengz left a comment


For reference_pool_fwd.hpp and the class name, I suggest including "nhwc", since the code inside shows the supported layout is NHWC.

@qianfengz qianfengz requested a review from zjing14 June 16, 2023 03:45
qianfengz previously approved these changes Jun 16, 2023
@zjing14 zjing14 merged commit 341ad95 into develop Jun 19, 2023
@rocking5566 rocking5566 deleted the max-pool-bwd branch December 14, 2023 01:10
