
add general inplace transpose kernels#140

Merged
bragadeesh merged 15 commits into clMathLibraries:develop from TimmyLiu:develop_inplace_tranpose_general on Apr 6, 2016


@TimmyLiu TimmyLiu commented Apr 5, 2016

This PR enables general in-place transpose. In theory the algorithms and kernels can handle a 2D in-place transpose of any size, but for performance reasons only matrix sizes where one dimension is 2, 3, 5, or 10 times the other (or a combination of these ratios) are supported and tested.
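The size restriction above can be sketched as a small check. This is only an illustration, not code from clFFT: the function name `has_supported_nonsquare_ratio` is hypothetical, and it assumes that "a combination of the ratio" means a product of the listed factors (with 10 treated as 2 x 5). Square matrices are left out because the rule as stated only talks about one dimension being a multiple of the other.

```python
def has_supported_nonsquare_ratio(rows: int, cols: int) -> bool:
    """Hypothetical helper: True if the larger dimension equals the smaller
    one times a product of the factors 2, 3 and 5 (10 is covered as 2 * 5).

    Assumption: "a combination of the ratio" is read as a product of factors.
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("matrix dimensions must be positive")

    small, large = sorted((rows, cols))
    if large % small != 0:
        return False  # the larger side is not an integer multiple of the smaller

    ratio = large // small
    if ratio == 1:
        return False  # square case: not covered by the ratio rule sketched here

    # Strip every factor of 2, 3 and 5; anything left over is unsupported.
    for factor in (2, 3, 5):
        while ratio % factor == 0:
            ratio //= factor
    return ratio == 1
```

Under that reading, a 1024x2048 matrix (ratio 2) and a 100x3000 matrix (ratio 30 = 2 x 3 x 5) would qualify, while 100x700 (ratio 7) would not.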

257 tests are added to validate the functionality, including pre-callback and post-callback tests. The added tests are disabled by default because they launch some large FFT calculations and take a relatively long time to finish. To run the added tests:

  1. Set the environment variable CLFFT_REQUEST_LIB_NOMEMALLOC=1 to use in-place transpose.
  2. Launch gtest with disabled tests enabled, for example Test.exe --gtest_filter=*huge_1D* --gtest_also_run_disabled_tests, which should launch all 257 added tests.
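The two steps can be combined in a small launcher script. The sketch below is an assumption-laden convenience, not part of the PR: the helper name `build_disabled_test_run` is hypothetical, and the executable name `Test.exe` and the filter `*huge_1D*` are simply taken from the example command in the description. It only builds the environment and argument list; it does not start the process.

```python
import os


def build_disabled_test_run(executable="Test.exe", test_filter="*huge_1D*"):
    """Hypothetical helper: return (env, argv) for running the disabled tests.

    The environment is a copy of the current one with
    CLFFT_REQUEST_LIB_NOMEMALLOC=1 added, as in step 1 of the description.
    """
    env = dict(os.environ)
    env["CLFFT_REQUEST_LIB_NOMEMALLOC"] = "1"  # request in-place transpose

    argv = [
        executable,
        f"--gtest_filter={test_filter}",       # select the added tests
        "--gtest_also_run_disabled_tests",     # the added tests are disabled by default
    ]
    return env, argv
```

To launch the tests, the result can be passed to the standard library, for example `env, argv = build_disabled_test_run()` followed by `subprocess.run(argv, env=env, check=True)`. Passing the arguments as a list avoids shell expansion of the `*` characters in the filter.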

Both pre-callback and post-callback are supported. Twiddling is also done within the "transpose" kernels when needed.

The full test suite has passed on Hawaii and Fiji devices.

TODO items include performance validation and optimization, and fixes for any bugs not yet exposed by testing.


Timmy added 15 commits April 4, 2016 13:55
…x729x3 and 625x625x3. Need to enable pre/post callback and twiddle.
…st. For double precision, a 2D in-place transpose is sometimes required, and this is not passing yet.
…ter breaking down is bigger than 2048 (double) or 4096 (single), this cannot be done in place since we don't support 3D in-place transpose. Thus 729x729x3 can be done for single but not double.
…w. Need to add interleaved and backward.). Extended the sizes supported for in-place transpose.
… transpose can now be done in recursive layers. Still need to modify the swap-line kernel for the case where each line is bigger than what LDS can hold.
@bragadeesh bragadeesh merged commit 4e67415 into clMathLibraries:develop Apr 6, 2016