add general inplace transpose kernels #140
Merged
bragadeesh merged 15 commits into clMathLibraries:develop on Apr 6, 2016
added 15 commits
April 4, 2016 13:55
…x729x3 and 625x625x3. Need to enable pre/post callback and twiddle.
… passed unit tests.
…st. For double precision, 2D inplace transposes are sometimes required, and this is not passing yet.
…ter breaking done is bigger than 2048 (double) or 4096 (single), this cannot be done inplace since we don't support 3D inplace transpose. Thus 729x729x3 can be done for single but not double
…w. need to add interleaved and backward.). extended size supported for inplace transpose.
… transpose can now be done in recursive layers. Still need to modify the swap line kernel for the case where each line is bigger than LDS can hold.
… cases. passed all added test cases.
This PR enables general in-place transpose. In theory the algorithms and kernels can handle 2D in-place transposes of any size, but for performance reasons only matrix sizes where one dimension is 2, 3, 5, or 10 times the other dimension (or a combination of those ratios) are supported and tested.
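To make the size rule and the operation itself concrete, here is a minimal host-side C++ sketch. It is not the code from this PR: both function names are hypothetical, the ratio check assumes that "a combination of the ratio" means a product of the factors 2, 3, and 5 (10 being 2 x 5), and the transpose uses textbook cycle following on the CPU rather than the OpenCL kernels added here.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: true when the larger dimension is an exact multiple of
// the smaller one and the ratio factors completely into 2s, 3s and 5s.
// Assumption: this is one reading of "2, 3, 5, 10 times (or a combination)".
// A ratio of 1 (square matrix) also returns true, as the empty product.
bool ratio_is_product_of_2_3_5(std::size_t rows, std::size_t cols) {
    if (rows == 0 || cols == 0) return false;
    const std::size_t big = std::max(rows, cols);
    const std::size_t small = std::min(rows, cols);
    if (big % small != 0) return false;
    std::size_t ratio = big / small;
    for (std::size_t f : {2u, 3u, 5u}) {
        while (ratio % f == 0) ratio /= f;
    }
    return ratio == 1;
}

// Reference in-place transpose of a row-major rows x cols matrix using cycle
// following. The element at linear index i moves to (i * rows) mod (n - 1);
// the first and last elements stay where they are. The visited bitmap costs
// n bits of extra storage, so this is a correctness reference for small
// inputs, not a performance model. i * rows may overflow for huge matrices.
template <typename T>
void transpose_inplace_reference(std::vector<T>& a, std::size_t rows, std::size_t cols) {
    const std::size_t n = rows * cols;
    if (n < 2 || a.size() != n) return;
    std::vector<bool> visited(n, false);
    for (std::size_t start = 1; start + 1 < n; ++start) {
        if (visited[start]) continue;
        std::size_t cur = start;
        T carried = a[start];
        do {
            const std::size_t next = (cur * rows) % (n - 1);
            std::swap(carried, a[next]);  // drop carried value, pick up the displaced one
            visited[next] = true;
            cur = next;
        } while (cur != start);
    }
}
```

After the call, the buffer should be read as a cols x rows row-major matrix. A reference like this can be handy for checking kernel output on small sizes before running the long tests.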
257 tests are added to validate the functionality, including pre-callback and post-callback tests. The added tests are disabled by default because they launch some large FFT calculations and take a relatively long time to finish. To run the added tests:
CLFFT_REQUEST_LIB_NOMEMALLOC=1 Test.exe --gtest_filter=*huge_1D* --gtest_also_run_disabled_tests
This should launch all 257 added tests.
Both pre-callback and post-callback are supported. Twiddling is also done within the "transpose" kernels when needed.
The full test suite has passed on Hawaii and Fiji devices.
TODO items include performance validation and optimization, plus fixes for any bugs that the testing has not yet exposed.