test-functional fails xgemm.cc #207
I compiled clBLAS cloned from git (c2.8) with no major problems, but the tests fail, for example:
I am running it on Lubuntu (upgraded straight after installation to kernel 4.2.0-22) with a Radeon R9 290.
dpkg -l fglrx fglrx-core fglrx-dev fglrx-amdcccle
cmake - mCMakeCache.txt
I tried putting some other tests in front of this one and they did pass.
I found a solution, but it is kind of strange or at least unexpected.
This does not work
but this does
and in the end
Any ideas why I need sudo to run the tests?
/var/log/apport.log had entries like these when the test crashed:
Another interesting thing is that the output differs slightly when running the same test as root and as a regular user. As root you can see a line "Invalid Size for A", which does not appear otherwise.
sudo /opt/clBLAS/bin/test-functional --gtest_filter=InvalidMemObjecttrmv
I experienced the crash of clinfo before I renamed libamdocl12cl64.so coming from AMDAPPSDK-3.0 so it would not be picked up by clinfo. After renaming it worked fine.
At the moment I have it renamed and clinfo works for both root and non-root accounts.
I thought I had solved the problem with test-functional, so I played a bit with test-correctness, compiling kernels etc. Now test-functional does not work anymore, and I have no idea why. I am back to square one. It might be interesting that the test I mentioned above, which was reporting "Invalid Size for A", does not print that line anymore under sudo, but as I said, test-functional as a whole does not go through.
In general which libraries (libOpenCL.so, libamdocl*64.so) should I link to? The ones coming from a driver or SDK?
Also does it make any difference if I use header files for OpenCL coming from khronos.org or from the SDK?
So I set up a new system, and below are all the commands I used, from a fresh installation to running test-functional. I do not know if it helps, but maybe somebody will spot a place where I did something wrong. Where I got some specific output, I put the command in bold and the output in a code block.
After installing the driver I got this message:
_dpkg -l fglrx fglrx-core fglrx-dev fglrx-amdcccle_
Add to ~/.bashrc
########## Boost 1.60
########## SANITY CHECK 1
########## AMD SDK OpenCL
########## SANITY CHECK 2
fixing problem with clinfo
check if visible libOpenCL is linked library or not
ls -la /usr/lib/libOpenCL.*
ls -la /usr/lib32/libOpenCL.*
Main parts of CMake output
That should be it, I hope I did not miss anything.
I also tried removing the existing ACML 6 and replacing it with only ACML 5.3.1, then rebuilding clBLAS, but it did not work. It is worth mentioning that I tried test-functional as both root and non-root users.
Any suggestions on which versions of the kernel and components (Boost, ACML, GTest, GFlags, drivers, SDK) it should work with?
I think I may know what's happening.
test-functional feeds an invalid command queue to clBLAS and expects it to return clblasInvalidCommandQueue, as defined in clBLAS.h.
In xgemm.cc there is an assert statement right after a call to clGetCommandQueueInfo. With an invalid queue that call of course returns an error code, so the assertion fails.
I think this is a bug: the invalid command queue is not handled properly. Instead of aborting, the clBLAS API should quietly return the error code.
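The difference between asserting and returning the error code can be sketched as follows. This is only an illustration and not the actual xgemm.cc code: `Status`, `FakeQueue`, `queryQueueDevice`, `gemmAsserting` and `gemmPropagating` are all hypothetical stand-ins, so the sketch compiles without an OpenCL runtime. The value -36 mirrors OpenCL's CL_INVALID_COMMAND_QUEUE.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT the real OpenCL or clBLAS declarations.
enum Status { kSuccess = 0, kInvalidCommandQueue = -36 };

struct FakeQueue {
    bool valid;
    int device;
};

// Stand-in for a runtime query such as clGetCommandQueueInfo: for an invalid
// queue it reports an error code and leaves the output untouched.
Status queryQueueDevice(const FakeQueue* queue, int* deviceOut) {
    if (queue == nullptr || !queue->valid) {
        return kInvalidCommandQueue;
    }
    *deviceOut = queue->device;
    return kSuccess;
}

// Pattern described above: in a debug build the assert aborts the whole
// process when the queue is invalid, so the caller never sees an error code.
Status gemmAsserting(const FakeQueue* queue) {
    int device = 0;
    Status err = queryQueueDevice(queue, &device);
    assert(err == kSuccess);
    (void)err;
    (void)device;
    return kSuccess;
}

// Suggested behaviour: hand the error code back to the caller, which is what
// a test for invalid command queues expects to receive.
Status gemmPropagating(const FakeQueue* queue) {
    int device = 0;
    Status err = queryQueueDevice(queue, &device);
    if (err != kSuccess) {
        return err;
    }
    (void)device;  // the real work would be enqueued on this device
    return kSuccess;
}
```

With an invalid queue, `gemmPropagating` returns `kInvalidCommandQueue`, while `gemmAsserting` terminates the test binary in a debug build, which would look like the crash reported here.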
Maybe for the purpose of using the library you can ignore test-functional for now. Have you tried test-short, which tests correctness (a shorter version of test-correctness)?
I commented out two tests, and now I am able to run test-functional. It still fails on a couple of tests, see below.
The tests I commented out before building clBLAS:
Summary of test-functional
Details of failed tests
In those two tests I got some additional output (Invalid Size of X), which I guess should not be there:
Regarding test-short, it was really short, as it crashed quite soon after starting.
Tests that ran and failed:
Segmentation fault (core dumped)
No problem. Interesting that when I built libblas.so myself, make failed
To see whether everything linked ok:
But when I installed libblas-dev (with dependencies libblas-common, libblas-dev, libblas3), which as I understand is Netlib's BLAS, and pointed cmake to /usr/lib/libblas/libblas.so.3.0, it built properly.
Then I ran test-short
and test-functional, still with the first two tests being commented out
Now I am running test-correctness and will update this comment with the results.
test-correctness crashed
Looks like test-functional passes the initial tests, but it crashes later on
Line 1080 is similar to the one before
Now when I build the project and run any tests, they run on the CPU instead of the GPU (GPU load is 0%, and one core of the CPU shows 100%). This did not happen before. I built the project using exactly the same parameters as before, with no issues compiling or building in general.
Could this thread safety fix be the cause?
It does run on the CPU instead of the GPU. Even on the CPU it uses only one core.
With the fix, if each CPU thread uses its own OpenCL context, then each thread has to compile its own OpenCL kernels. That probably takes a lot longer than executing the kernels themselves. That is a possible reason why CPU usage is high and GPU usage is low.
I ran the tests, see the output below. I am still surprised that during both tests the GPU usage was very low, almost always showing 0%, sometimes jumping to 3%.
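To make that argument concrete, here is a toy model that only counts kernel builds. It is a sketch under stated assumptions, not clBLAS code: `ToyContext`, `getKernel`, `compilesWithPrivateContexts` and `compilesWithSharedContext` are hypothetical names, and a counter increment stands in for the expensive program build. The workers are modelled as a plain loop rather than real threads, since only the number of builds matters here.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an OpenCL context that caches the kernels it has built.
struct ToyContext {
    std::map<std::string, int> compiled;  // kernel name -> fake handle
    int compileCount = 0;

    int getKernel(const std::string& name) {
        auto it = compiled.find(name);
        if (it != compiled.end()) {
            return it->second;  // cache hit: no build needed
        }
        ++compileCount;  // stands in for an expensive kernel build on the CPU
        int handle = static_cast<int>(compiled.size()) + 1;
        compiled[name] = handle;
        return handle;
    }
};

// Each worker owns a private context, so each one builds the kernel once.
int compilesWithPrivateContexts(int workers, int callsPerWorker) {
    std::vector<ToyContext> contexts(workers);
    int total = 0;
    for (auto& ctx : contexts) {
        for (int i = 0; i < callsPerWorker; ++i) {
            ctx.getKernel("gemm");
        }
        total += ctx.compileCount;
    }
    return total;
}

// All workers share one context, so the kernel is built once overall.
int compilesWithSharedContext(int workers, int callsPerWorker) {
    ToyContext shared;
    for (int w = 0; w < workers; ++w) {
        for (int i = 0; i < callsPerWorker; ++i) {
            shared.getKernel("gemm");
        }
    }
    return shared.compileCount;
}
```

In this model the number of builds grows with the number of private contexts and stays at one for a shared context. If a build costs far more than a kernel launch, that difference would show up as CPU time rather than GPU load.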
CPU usage was 100% on one core while running one of the larger tests
then it switched to another core, showing 100% usage there, with the GPU constantly at 0%
Regarding the tests:
Then I ran it again
It passed all of them ... I do not know why it failed the first time.