[libcusmm] libcusmm_unittest_transpose test failure #75

Closed
dev-zero opened this issue Nov 2, 2018 · 4 comments
Labels
bug Something isn't working

Comments

@dev-zero
Contributor

dev-zero commented Nov 2, 2018

On our K20X system, the libcusmm_unittest_transpose test fails with the following message:

13/14 Test #13: libcusmm_unittest_transpose ...........................***Failed    2.43 sec
# Libcusmm has 295 blocksizes for transposition
Cannot transpose matrices with dimensions above 80, got (256 x 16)

@shoshijak is this CUDA arch or kernel dependent?

@dev-zero dev-zero added the bug Something isn't working label Nov 2, 2018
@alazzaro
Member

alazzaro commented Nov 2, 2018

Are you using the CUBLAS flag during compilation? Such a big kernel should not be there unless we apply densification, and in that case we should not transpose...

@dev-zero
Contributor Author

dev-zero commented Nov 2, 2018

No, I built with cmake -DUSE_MPI=OFF -DUSE_CUDA=ON -DWITH_GPU=K20X, which leaves CUBLAS turned off by default.

@alazzaro
Member

alazzaro commented Nov 2, 2018

Ah OK, I see the problem now. The file

https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/libcusmm/parameters_K20X.json

lists "huge" kernels (I think for testing purposes):

{"m": 81, "n": 9, "k": 9, "tile_m": 3, "tile_n": 3, "w": 4, "v": 6, "threads": 128, "grouping": 16, "minblocks": 8, "algorithm": "largeDB1", "perf": 189.642},
{"m": 96, "n": 96, "k": 96, "tile_m": 6, "tile_n": 3, "w": 14, "v": 48, "threads": 512, "grouping": 16, "minblocks": 1, "algorithm": "largeDB1", "perf": 614.588},
{"m": 100, "n": 10, "k": 10, "tile_m": 2, "tile_n": 2, "threads": 256, "grouping": 16, "minblocks": 4, "algorithm": "medium", "perf": 226.917},
{"m": 121, "n": 11, "k": 11, "tile_m": 5, "tile_n": 3, "threads": 128, "grouping": 16, "minblocks": 1, "algorithm": "medium", "perf": 233.211},
{"m": 144, "n": 12, "k": 12, "tile_m": 2, "tile_n": 4, "w": 6, "v": 8, "threads": 288, "grouping": 16, "minblocks": 4, "algorithm": "largeDB1", "perf": 268.209},
{"m": 169, "n": 13, "k": 13, "tile_m": 3, "tile_n": 4, "w": 6, "v": 10, "threads": 256, "grouping": 16, "minblocks": 1, "algorithm": "largeDB1", "perf": 221.427},
{"m": 196, "n": 14, "k": 14, "tile_m": 6, "tile_n": 2, "threads": 256, "grouping": 16, "minblocks": 1, "algorithm": "medium", "perf": 243.838},
{"m": 225, "n": 15, "k": 15, "tile_m": 3, "tile_n": 3, "w": 4, "v": 12, "threads": 384, "grouping": 16, "minblocks": 1, "algorithm": "largeDB1", "perf": 248.307},
{"m": 256, "n": 16, "k": 16, "tile_m": 2, "tile_n": 6, "w": 6, "v": 10, "threads": 384, "grouping": 16, "minblocks": 1, "algorithm": "largeDB1", "perf": 309.19}

@shoshijak
Those kernels are too big to run on the GPU; I wonder how they made it into the file...
I see two solutions:

  1. remove those kernels from the file
  2. replace the assert with a check on the product of dimensions (80*80*80 as maximum).

Personally, I would go for the first solution...
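
To make the difference between the two options concrete, here is a minimal Python sketch. It is not DBCSR code: the helper names are hypothetical, the (23, 23, 23) entry is made up for illustration, and reading "product of dimensions" as m*n*k compared against 80*80*80 is an assumption about what option 2 means.

```python
import json

MAX_DIM = 80  # limit quoted in the test's error message


def within_limit(kernel, max_dim=MAX_DIM):
    """Option 1: keep a kernel only if each of m, n, k is at most max_dim."""
    return all(kernel[d] <= max_dim for d in ("m", "n", "k"))


def within_product_limit(kernel, max_dim=MAX_DIM):
    """Option 2 (assumed reading): bound m*n*k by max_dim**3 instead."""
    return kernel["m"] * kernel["n"] * kernel["k"] <= max_dim ** 3


def drop_oversized(kernels, keep=within_limit):
    """Return the kernels that satisfy the given predicate."""
    return [k for k in kernels if keep(k)]


# Three entries reduced from parameters_K20X.json plus one hypothetical small one.
sample = json.loads("""[
  {"m": 81,  "n": 9,  "k": 9},
  {"m": 96,  "n": 96, "k": 96},
  {"m": 256, "n": 16, "k": 16},
  {"m": 23,  "n": 23, "k": 23}
]""")

kept_per_dim = drop_oversized(sample)
kept_product = drop_oversized(sample, keep=within_product_limit)
print([(k["m"], k["n"], k["k"]) for k in kept_per_dim])
print([(k["m"], k["n"], k["k"]) for k in kept_product])
```

Under this reading, option 1 drops every kernel listed above, while option 2 would still admit long, thin kernels such as (256, 16, 16), because their product stays below 80*80*80.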

@shoshijak
Contributor

The libcusmm_unittest_transpose test simply exercises every transposition that could arise from the kernels defined in parameters_GPU.json. I see that parameters_K20X.json (https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/libcusmm/parameters_K20X.json) contains a number of kernels with m, n, or k > 80. I'm guessing they were added to the parameter file before the "Cannot transpose matrices with dimensions above 80" limitation was introduced.

If libcusmm really should not transpose kernels with a dimension above 80, then these kernels should be removed from parameters_K20X.json.
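
As a rough illustration of how a kernel list can turn into failing transposition sizes, here is a small Python sketch. It assumes that each (m, n, k) kernel contributes the block shapes (m x k) and (k x n) to the set of transpositions; that derivation is a guess which happens to match the reported "(256 x 16)" and is not taken from the test source. The function names are hypothetical.

```python
MAX_DIM = 80  # limit quoted in the error message


def transpose_blocksizes(kernels):
    """Collect the distinct (rows, cols) shapes implied by the kernels.

    Assumption: a kernel (m, n, k) implies transposing an m x k block
    and a k x n block.
    """
    sizes = set()
    for kern in kernels:
        sizes.add((kern["m"], kern["k"]))
        sizes.add((kern["k"], kern["n"]))
    return sorted(sizes)


def oversized(sizes, max_dim=MAX_DIM):
    """Return the shapes that have a dimension above max_dim."""
    return [(r, c) for r, c in sizes if r > max_dim or c > max_dim]


# Two of the entries from parameters_K20X.json, reduced to m, n, k.
kernels = [
    {"m": 96, "n": 96, "k": 96},
    {"m": 256, "n": 16, "k": 16},
]

sizes = transpose_blocksizes(kernels)
for rows, cols in oversized(sizes):
    print(f"Cannot transpose matrices with dimensions above {MAX_DIM}, got ({rows} x {cols})")
```

Running a check like this over the whole parameter file would list every entry that needs to be removed before the test can pass.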

shoshijak added a commit to shoshijak/dbcsr that referenced this issue Nov 6, 2018
shoshijak added a commit to shoshijak/dbcsr that referenced this issue Nov 6, 2018
shoshijak added a commit that referenced this issue Nov 6, 2018
shoshijak added a commit to shoshijak/dbcsr that referenced this issue Jul 1, 2019