
"OpenCL error -11 on line 164 of /home/user/git/clBLAS/src/library/blas/xgemm.cc" #172

Closed
hughperkins opened this issue Oct 29, 2015 · 11 comments


@hughperkins
Contributor

OpenCL error -11 on line 164 of /home/user/git/clBLAS/src/library/blas/xgemm.cc

The following test program, adapted from the xgemm.cc sample, fails when run on an NVIDIA 940M:

/* ************************************************************************
 * Copyright 2013 Advanced Micro Devices, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 * ************************************************************************/


#include <sys/types.h>
#include <stdio.h>
#include <string.h>

/* Include the clBLAS header. It automatically includes the needed OpenCL
 * header, so an explicit inclusion of cl.h is not needed.
 */
#include <clBLAS.h>

void run() {
    #define M  4
    #define N  3
    #define K  16

    static const clblasOrder order = clblasColumnMajor;

    static const cl_float alpha = 1;

    static const clblasTranspose transA = clblasTrans;
    static const cl_float A[M*K];
    static const size_t lda = K;        /* i.e. lda = K */

    static const clblasTranspose transB = clblasNoTrans;
    static const cl_float B[K*N];
    static const size_t ldb = N;        /* i.e. ldb = N */

    static const cl_float beta = 0;

    static cl_float C[M*N];
    static const size_t ldc = N;        /* i.e. ldc = N */

    static cl_float result[M*N];

    static const size_t off  = 1;
    static const size_t offA = K + 1;   /* K + off */
    static const size_t offB = N + 1;   /* N + off */
    static const size_t offC = N + 1;   /* N + off */

    cl_int err;
    cl_platform_id platform = 0;
    cl_device_id device = 0;
    cl_context_properties props[3] = { CL_CONTEXT_PLATFORM, 0, 0 };
    cl_context ctx = 0;
    cl_command_queue queue = 0;
    cl_mem bufA, bufB, bufC;
    cl_event event = NULL;
    int ret = 0;

    /* Setup OpenCL environment. */
    err = clGetPlatformIDs(1, &platform, NULL);
    if (err != CL_SUCCESS) {
        printf( "clGetPlatformIDs() failed with %d\n", err );
        return;
    }
    printf("got platformids\n");

    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    if (err != CL_SUCCESS) {
        printf( "clGetDeviceIDs() failed with %d\n", err );
        return;
    }
    printf("got deviceids\n");

    props[1] = (cl_context_properties)platform;
    ctx = clCreateContext(props, 1, &device, NULL, NULL, &err);
    if (err != CL_SUCCESS) {
        printf( "clCreateContext() failed with %d\n", err );
        return;
    }
    printf("created context\n");

    queue = clCreateCommandQueue(ctx, device, 0, &err);
    if (err != CL_SUCCESS) {
        printf( "clCreateCommandQueue() failed with %d\n", err );
        clReleaseContext(ctx);
        return;
    }
    printf("created commandqueue\n");

    /* Setup clblas. */
    err = clblasSetup();
    if (err != CL_SUCCESS) {
        printf("clblasSetup() failed with %d\n", err);
        clReleaseCommandQueue(queue);
        clReleaseContext(ctx);
        return;
    }
    printf("setup blas ok\n");

    /* Prepare OpenCL memory objects and place matrices inside them. */
    bufA = clCreateBuffer(ctx, CL_MEM_READ_ONLY, M * K * sizeof(*A),
                          NULL, &err);
    bufB = clCreateBuffer(ctx, CL_MEM_READ_ONLY, K * N * sizeof(*B),
                          NULL, &err);
    bufC = clCreateBuffer(ctx, CL_MEM_READ_WRITE, M * N * sizeof(*C),
                          NULL, &err);

    err = clEnqueueWriteBuffer(queue, bufA, CL_TRUE, 0,
        M * K * sizeof(*A), A, 0, NULL, NULL);
    err = clEnqueueWriteBuffer(queue, bufB, CL_TRUE, 0,
        K * N * sizeof(*B), B, 0, NULL, NULL);
    err = clEnqueueWriteBuffer(queue, bufC, CL_TRUE, 0,
        M * N * sizeof(*C), C, 0, NULL, NULL);

    /* Call clBLAS SGEMM on the full matrices (all buffer offsets are 0). */
    printf("calling sgemm....\n");
    err = clblasSgemm(order, transA, transB, M, N, K,
                         alpha, bufA, 0, M,
                         bufB, 0, K, beta,
                         bufC, 0, M,
                         1, &queue, 0, NULL, &event);
    if (err != CL_SUCCESS) {
        printf("clblasSgemm() failed with %d\n", err);
        ret = 1;
    }
    else {
        /* Wait for calculations to be finished. */
        err = clWaitForEvents(1, &event);
        clReleaseEvent(event);

        /* Fetch results of calculations from GPU memory. */
        err = clEnqueueReadBuffer(queue, bufC, CL_TRUE, 0,
                                  M * N * sizeof(*result),
                                  result, 0, NULL, NULL);
    }

    /* Release OpenCL memory objects. */
    clReleaseMemObject(bufC);
    clReleaseMemObject(bufB);
    clReleaseMemObject(bufA);

    /* Finalize work with clblas. */
    clblasTeardown();

    /* Release OpenCL working objects. */
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
}

int
main(void)
{
    for(int i=0; i < 3; i++) {
        printf("i=%i\n", i);
        run();
        printf("finished ok :-)\n");
    }
    return 0;
}

Compile and run it like this:

gcc -std=c99 -I.. -o test169 clblas_issue_169.c -L../build/library -lclBLAS -lOpenCL \
  && LD_LIBRARY_PATH=../build/library ./test169

Expected output:

  • output with no error messages in it :-)

Actual output:

$ ./run_169.sh 
i=0
got platformids
got deviceids
created context
created commandqueue
setup blas ok
calling sgemm....
GemmSpecialCases<float>
columnMajor
SGEMM_BRANCH_32
M * N < 1080 * 1080 && (M%32 != 0 || N % 32 != 0) && (K&16 == 0)
special case 664
sourceBuildOptions -cl-std=CL2.0
OpenCL error -11 on line 164 of /home/user/git/clBLAS/src/library/blas/xgemm.cc
test169: /home/user/git/clBLAS/src/library/blas/xgemm.cc:164: void makeGemmKernel(_cl_kernel**, cl_command_queue, const char*, const char*, const unsigned char**, size_t*, const char*): Assertion `false' failed.
./run_169.sh: line 4:  3539 Aborted                 LD_LIBRARY_PATH=../build/library ./test169

Note that I added some extra debugging output here; the most important additions are:

  • GemmSpecialCases.cpp, line 664, insert: printf("special case 664\n");
  • xgemm.cc, line 159, insert: printf("sourceBuildOptions %s\n", sourceBuildOptions);

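It seems plausible that this problem is caused by the compile options for AutoGemmUserKernels being hard-coded as -cl-std=CL2.0, i.e. in UserGemmKernelSourceIncludes.h, lines 12-13. Error -11 is CL_BUILD_PROGRAM_FAILURE, which fits: the NVIDIA driver on this machine only supports OpenCL 1.2, so asking it to build OpenCL C 2.0 kernels fails.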

Probable future of this issue?

  • I haven't decided, but I'll probably at least make it possible to run these samples on my computer, e.g. by changing the hard-coding at lines 12-13 of UserGemmKernelSourceIncludes.h
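One way to avoid the hard-coding would be to pick the -cl-std option at runtime from the device's CL_DEVICE_OPENCL_C_VERSION string, which clGetDeviceInfo returns in the form "OpenCL C <major>.<minor> <vendor-specific information>". The following is only a minimal sketch under that assumption: the helper name pick_cl_std_option is hypothetical and not part of clBLAS, and it only parses the string, so it can be exercised without an OpenCL device.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of clBLAS): choose a -cl-std build option
 * from the string returned by clGetDeviceInfo(device,
 * CL_DEVICE_OPENCL_C_VERSION, ...), e.g. "OpenCL C 1.2 <vendor info>". */
const char *pick_cl_std_option(const char *opencl_c_version)
{
    static const char prefix[] = "OpenCL C ";
    int major = 0, minor = 0;

    assert(opencl_c_version != NULL);

    /* Reject strings that do not start with the expected prefix. */
    if (strncmp(opencl_c_version, prefix, sizeof prefix - 1) != 0)
        return "";

    /* Parse "<major>.<minor>" right after the prefix. */
    if (sscanf(opencl_c_version + sizeof prefix - 1, "%d.%d",
               &major, &minor) != 2)
        return "";

    if (major >= 2)
        return "-cl-std=CL2.0";
    if (major == 1 && minor >= 2)
        return "-cl-std=CL1.2";
    if (major == 1 && minor == 1)
        return "-cl-std=CL1.1";

    /* Unknown/older version: pass no -cl-std flag, so the compiler uses
     * its default (the highest OpenCL C 1.x version the device supports). */
    return "";
}
```

On a device reporting OpenCL C 1.2, as the 940M presumably does, this would yield -cl-std=CL1.2 instead of CL2.0. That only helps if the AutoGemm kernels themselves avoid OpenCL 2.0 features.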
@TimmyLiu
Contributor

Thanks Hugh! This makes sense. I assume clBLAS was built with "OPENCL_VERSION=1.2", since you are running on an NVIDIA system. We did most of our testing with OPENCL_VERSION=2.0 (since it gives better performance on AMD cards), and so this bug slipped through. I can probably make a fix for this.

@hughperkins
Contributor Author

Alright. By the way, note that this bug hides some other bugs, which show up once it is fixed:

  1. at the start of the kernel, there is some superfluous kernel source, which makes compilation fail
  2. also at the start of the kernel, there are some 2D arrays with 1D initializers, which make compilation fail.

I'm going through these bit by bit, and will submit a pull request once it's working.

@hughperkins
Contributor Author

Actually, it seems the superfluous kernel source is somehow an artifact of my debugging. However, the initializers do need to be fixed in order to compile on NVIDIA.
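To illustrate the kind of change involved, assuming "1D initializers" means a flat, unbraced list: C99 (and OpenCL C, which is based on it) allows brace elision, so a 2D array can be initialized from a flat list, but per this report the NVIDIA OpenCL compiler rejected the form used in the generated kernels. A fully braced initializer avoids relying on brace elision. The array and the row_sum function below are hypothetical stand-ins written as host C so they can be tested; they are not the actual clBLAS kernel source.

```c
#include <assert.h>

/* Hypothetical example, not the actual clBLAS kernel code.
 *
 * Flat ("1D") initializer for a 2D array: legal C99 through brace
 * elision, but reportedly rejected by the NVIDIA OpenCL C compiler:
 *
 *     const float flat[2][3] = { 1, 2, 3, 4, 5, 6 };
 *
 * Fully braced ("2D") initializer: one inner brace pair per row. */
static const float braced[2][3] = {
    { 1.0f, 2.0f, 3.0f },
    { 4.0f, 5.0f, 6.0f }
};

/* Sum one row, to confirm each inner brace pair filled one row. */
float row_sum(int row)
{
    float s = 0.0f;

    assert(row >= 0 && row < 2);
    for (int col = 0; col < 3; col++)
        s += braced[row][col];
    return s;
}
```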

@TimmyLiu
Copy link
Contributor

I noticed you made a few pull requests to the master branch. In the future, can you make your pull requests against the develop branch?

@hughperkins
Copy link
Contributor Author

Well... I guess I'm thinking that the 'master' branch represents 2.8.0, and 'develop' represents new features? Is that actually incorrect, and develop contains only bug fixes? Ideally, I'd prefer to use a stable version of clBLAS in my own projects.

@TimmyLiu
Copy link
Contributor

We actually only "accept" pull requests to the develop branch. Once in a while, the develop branch is merged into the master branch. It is just a workflow we like to follow.

@hughperkins
Copy link
Contributor Author

Please confirm that the develop branch contains no new or original work or features, only bug fixes.

@kknox
Copy link
Contributor

kknox commented Oct 29, 2015

The OpenCL math library projects follow the 'git flow' scheme, in which all 'new' code should be committed into 'develop'. For critical bug fixes, we can cherry-pick commits and push out a new 'patch' release (like 2.8.1). When making your commits into 'develop', it would help us if you isolate each bug fix into a single commit, so that we can cleanly cherry-pick each fix into 'master'. If your fork has many commits from working on the bug fixes, you can first squash them into a minimal set with git rebase --interactive before you issue a PR.

The problem with accepting code into 'master' branch is that code from 'master' never flows into 'develop' (it's a one-way valve). We risk losing changes the next time 'develop' merges into 'master', because if the merge is done carelessly the changes to 'master' will be overwritten by 'develop' which never had the fix.

@hughperkins
Copy link
Contributor Author

> The problem with accepting code into 'master' branch is that code from 'master' never flows into 'develop' (it's a one-way valve). We risk losing changes the next time 'develop' merges into 'master', because if the merge is done carelessly the changes to 'master' will be overwritten by 'develop' which never had the fix.

Well, you can create release branches, e.g. 2.8 and 2.9, and apply fixes to those, which are then immediately merged into develop. No master is required, and merges always go in the direction [some release branch] => develop. When you create the next release, you simply branch off develop as e.g. 2.9 or 3.0.

Edit: basically, I think your master branch should be removed, develop should be renamed to master, and each release should have its own release branch, as above.

@TimmyLiu
Copy link
Contributor

TimmyLiu commented Nov 6, 2015

Can we close this issue?

@hughperkins
Copy link
Contributor Author

Thanks!
