BLAS vector-vector dot#294
Conversation
| #endif | ||
|
|
||
| // only for vector-vector dot | ||
| GPUARRAY_PUBLIC int GpuArray_dot( GpuArray *A, GpuArray *B, |
There was a problem hiding this comment.
You declare _dot and then use _rdot. That might not work.
| ctx->err = cuModuleLoadData(&res->m, bin); | ||
| // for both info/err log | ||
| cujit_info_log = (char*)malloc(2*cujit_log_size*sizeof(char)); | ||
| if(cujit_info_log == NULL) { |
|
|
||
| ctx->err = cuModuleLoadData(&res->m, bin); | ||
| // for both info/err log | ||
| cujit_info_log = (char*)malloc(2*cujit_log_size*sizeof(char)); |
There was a problem hiding this comment.
No need to cast the return of malloc.
There was a problem hiding this comment.
I think it's more friendly to possible inclusion from C++ code, but I'll remove it anyway.
There was a problem hiding this comment.
The headers should be c++ clean, but the source files will probably not be touched by a c++ compiler.
|
This PR seems to be more about using cuModuleLoadDataEx than blas dot. The blas parts should either be completed or removed. |
|
Ok sorry, I just realized that you just reused the same branch for this PR, so the commits from the other one are in here. You should probably rebase on the current master. |
| if (!(X->flags & GA_ALIGNED) || !(Y->flags & GA_ALIGNED) || | ||
| !(Z->flags & GA_ALIGNED)) | ||
| return GA_UNALIGNED_ERROR; | ||
| if (X->dimensions[0] != n || Y->dimensions[0] != n) |
There was a problem hiding this comment.
You don't set n before this. Also you could just compare them to each other.
| Yp = ©Y; | ||
| } | ||
| } | ||
| if (Z->strides[0] < 0) { |
There was a problem hiding this comment.
Z doesn't have strides, it's a scalar.
|
|
||
| // we should store dot result on device | ||
| cublasGetPointerMode(h->h, &pmode); | ||
| cublasSetPointerMode(h->h, CUBLAS_POINTER_MODE_HOST); |
There was a problem hiding this comment.
I think you meant to set the mode to DEVICE here.
| size_t N, | ||
| gpudata *X, size_t offX, size_t incX, | ||
| gpudata *Y, size_t offY, size_t incY, | ||
| gpudata *Z) { |
There was a problem hiding this comment.
There should be an offset for Z.
|
|
||
| GA_CUDA_EXIT_ON_ERROR(ctx, cuda_wait(X, CUDA_WAIT_READ)); | ||
| GA_CUDA_EXIT_ON_ERROR(ctx, cuda_wait(Y, CUDA_WAIT_READ)); | ||
| GA_CUDA_EXIT_ON_ERROR(ctx, cuda_wait(Z, CUDA_WAIT_ALL)); |
There was a problem hiding this comment.
We should only wait for write on Z since it is not read by dot.
|
Also, please get rid of the merge commit and do a proper rebase. |
|
Just to note that you still have a merge commit. Just a different one now. After a rebase, when you want to push to github, git will tell you to run Otherwise, the code progresses nicely. Good job! |
Plus some minor changes: - did `chmod +x setup.py` - added interface for clblas
Plus some minor changes: - did `chmod +x setup.py` - added interface for clblas
- Added tests for BLAS dot - Implementation for CLBlast - modified blas tests from using nested for loops to itertools.product for parametrized tests.
|
Mostly done, just a few problems. Changes
|
| scratch_mem = clCreateBuffer( | ||
| ctx->ctx, CL_MEM_READ_WRITE, N*sizeof(float), NULL, &cl_err); | ||
| if (cl_err != CL_SUCCESS) | ||
| return GA_MEMORY_ERROR; |
There was a problem hiding this comment.
The temporary buffer allocation is ok, however please use cl_alloc() to allocate it so we can have one central location for all allocations.
| error, | ||
| hdot, /* TODO */ | ||
| sdot, /* TODO */ | ||
| ddot, /* TODO */ |
|
|
||
| int (*hdot)( size_t N, | ||
| gpudata *X, size_t offX, int incX, | ||
| gpudata *Y, size_t offY, int incY, |
There was a problem hiding this comment.
Don't use int, use ssize_t for a signed type. We don't want to limit the size of arrays at this level.
There was a problem hiding this comment.
It seems some BLAS functions are using int as stride (such as *gemv *ger) while others are using size_t. For now I just made a revert to size_t. I'm experimenting with ssize_t however getting test errors. Still need some work to finish. I think it's ready for jenkins test if no other issues. I'll leave stride problem to another PR.
| " const float *x[], size_t incx, " \ | ||
| " float *y[], size_t incy, " \ | ||
| " const float *x[], int incx, " \ | ||
| " float *y[], int incy, " \ |
There was a problem hiding this comment.
Here you changed the data type in the kernel without changing the declared data type. This will break badly.
|
jenkins test this please |
For #292.