
Maxpool bwd #750

Merged
zjing14 merged 27 commits into develop from max-pool-bwd
Jun 19, 2023

Conversation

Collaborator

@rocking5566 rocking5566 commented Jun 9, 2023


Maxpool backward computes dx = dy[index], which is the same as the put operation from

https://pytorch.org/docs/stable/generated/torch.Tensor.put_.html
https://numpy.org/doc/stable/reference/generated/numpy.put.html

Hence, I implemented a Put kernel and use it to implement Maxpool backward.
However, if the sliding windows overlap (e.g. window size = 3, window stride = 1), we need atomicAdd() to accumulate the gradients; in this case, Maxpool backward is dx += dy[index].
Therefore, the Put kernel lets the user specify a memory operation other than the usual set-value:
MemOp(Dx, Dy, index)
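To illustrate the non-overlapping case, here is a minimal NumPy sketch (the 1D shape, values, and variable names are made up for illustration; the actual kernel is a CK device op):

```python
import numpy as np

# Hypothetical 1D example: maxpool with window 2, stride 2 (non-overlapping).
x = np.array([1.0, 3.0, 2.0, 4.0], dtype=np.float32)
index = np.array([1, 3])                       # flat argmax indices saved by the forward pass
dy = np.array([10.0, 20.0], dtype=np.float32)  # upstream gradient, one per window

# Backward pass: dx = dy[index], i.e. a put/scatter of dy into dx.
dx = np.zeros_like(x)
np.put(dx, index, dy)  # dx becomes [0, 10, 0, 20]
```

Because every input element belongs to at most one window here, a plain set-value store is enough and no atomics are needed.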

However, fp16x1 and bf16 do not support atomicAdd. In that case, the Put kernel outputs fp32, and a separate casting kernel converts the fp32 result to fp16 or bf16.
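The overlapping case can be sketched with np.add.at, the scatter-add analogue of atomicAdd, accumulating in fp32 and casting afterwards (again an illustrative sketch, not CK API; the indices and values are made up):

```python
import numpy as np

# Hypothetical 1D example: window 3, stride 1, so windows overlap and the
# same input element can be the argmax of several windows.
index = np.array([1, 1, 3])                           # element 1 wins two windows
dy = np.array([10.0, 20.0, 30.0], dtype=np.float32)   # one gradient per window

# Accumulate gradients in fp32 (the role atomicAdd plays on the GPU):
# dx += dy[index], with repeated indices summed rather than overwritten.
dx_fp32 = np.zeros(5, dtype=np.float32)
np.add.at(dx_fp32, index, dy)  # dx_fp32 becomes [0, 30, 0, 30, 0]

# Separate cast step, mirroring the extra casting kernel used when the
# requested output type (fp16/bf16) has no atomicAdd support.
dx_fp16 = dx_fp32.astype(np.float16)
```

Note that a plain set-value put would lose one of the two gradients landing on element 1, which is why the accumulating memory operation matters when windows overlap.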

Note: This PR assumes all input and output tensors are in packed memory.

@rocking5566 rocking5566 requested a review from qianfengz June 9, 2023 09:05
Collaborator Author

rocking5566 commented Jun 12, 2023

Similar to DeviceElementwiseImpl, we cannot properly compute the grid size until #266 is solved. Hence, I hardcode the grid size for now.

Contributor

@qianfengz qianfengz left a comment


For reference_pool_fwd.hpp and the class name, I suggest including "nhwc", since the code inside shows the supported layout is NHWC.

@qianfengz qianfengz requested a review from zjing14 June 16, 2023 03:45
qianfengz previously approved these changes Jun 16, 2023
@zjing14 zjing14 merged commit 341ad95 into develop Jun 19, 2023
@rocking5566 rocking5566 deleted the max-pool-bwd branch December 14, 2023 01:10
