Get the largest allocatable block size #304
Conversation
Can you temporarily change the way theano is installed in travis.yml, so that it gets installed from your updated branch?
    case GA_CTX_PROP_FREE_GMEM:
      /* There is no way to query free memory so we just return the
         largest block size */
We can get an upper bound, as we know the size we allocated and the total size of the memory on the GPU.
I'm not suggesting that we do that here: using an upper bound could cause a crash, while using a lower bound as currently done is safer, though possibly less efficient. This is mostly just a comment.
I don't know the size that was allocated here, since I don't keep track of it for OpenCL.
Also, this is neither a lower nor an upper bound. It is just the largest size that clMemAlloc will accept, not taking into account how much memory is actually free. It's really crappy, but until we handle memory allocation here in a way similar to CUDA, we can't do better.
    cuda_exit(ctx);
    /* We guess that we can allocate at least a quarter of the free size
       in a single block. This might be wrong though. */
    sz /= 4;
This should be documented in the docs for the user-facing function that queries the properties. Mostly, that this returns a quarter of the free memory on the GPU as the biggest block.
This case handles memory that hasn't been preallocated. We can't query the largest block available for cuMalloc, so I am resorting to a guess here.
Yes I understand that. My only point is to document that.
I consider this to be an implementation detail (and a bad one at that) and I would prefer not to document it since I hope to change it to something better whenever possible.
OK, the size / 4 is a detail. But there is no doc about which properties can be queried. I'll make an issue about that.
There is documentation that is generated from the headers. This should list all the defined properties.
    {
      int e = load_libcublas(major, minor);
      if (e != GA_NO_ERROR)
        return e;
Can you confirm that this causes the init of cublas before the prealloc? I think so, but I'm not 100% sure.
No it doesn't in most situations.
      MACOSX_RPATH OFF
      # This is the shared library version
    - VERSION 0.0
    + VERSION 0.1
I'll come back to the versioning. We now have a new interface. This won't trigger a recompilation and won't give a good user warning. If people update Theano but not libgpuarray, they will get compilation errors related to the convolution, as GA_CTX_PROP_LARGEST_MEMBLOCK isn't defined. Check the jenkins buildbot.
If we keep it like that, we will frequently get useless user questions. We will lose our own time and user time. If we bump the major version, I don't think it will give a good user error. We need to fix that. It could be for an 0.9rc2, but it should be before 0.9.
I don't think it is a problem that changing the minor version here will not trigger a recompilation. Everything that worked with 0.0 will work with 0.1. That is the point of this version scheme.
If we were to change the major version, it would make the currently compiled modules unloadable, possibly triggering a recompilation (I'm not 100% sure how Theano deals with C code it can't reload).
If people update Theano and it uses newly introduced symbols, they will need to update libgpuarray as well, yes. Usually this is handled with recommended versions for releases and a guideline of the style: use the latest master of libgpuarray with the latest master of Theano.
My problem with maintaining another version scheme is that it duplicates work, and we will forget to bump one or the other for some changes, which will lead to exactly the problems that you describe, except that nobody will expect them.
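For context, the scheme under discussion is the shared-library `VERSION`/`SOVERSION` pair set through CMake's `set_target_properties`. A hedged sketch of what the relevant block might look like (the property names are standard CMake; the target name `gpuarray` and exact values are assumptions, not the verbatim file contents):

```cmake
# Sketch: shared-library versioning for the gpuarray target.
# SOVERSION is the ABI version: bumping it unloads previously linked
# modules.  The minor part of VERSION can move freely for
# backward-compatible additions, such as a new context property.
set_target_properties(gpuarray PROPERTIES
  MACOSX_RPATH OFF
  SOVERSION 0   # ABI major: bump only on incompatible changes
  VERSION 0.1   # full version: 0.0 -> 0.1 for this addition
)
```

Under this convention, adding GA_CTX_PROP_LARGEST_MEMBLOCK is a minor bump: old binaries keep loading, but code compiled against the new header needs the new library.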
Force-pushed from 407e94c to c18bdaf.
     *
     * Type: `size_t`
     */
    #define GA_CTX_PROP_LARGEST_MEMBLOCK 20
It should be added to gpuarray.pxd.
I can add it for sure, but I usually only add the values that I need, and I don't need this one currently.
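For reference, exposing the value in gpuarray.pxd would be roughly a one-line addition along these lines (a sketch only; the exact `cdef extern` block layout of the .pxd file is assumed):

```cython
# Sketch of the gpuarray.pxd addition (surrounding declarations assumed).
cdef extern from "gpuarray/buffer.h":
    cdef enum:
        GA_CTX_PROP_LARGEST_MEMBLOCK
```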
It is in the C API, but not in the Python API. People should be able to use the Python API without looking at the C API, or at least have a link from the Python API to the relevant part of the C API.
…On Thu, Dec 1, 2016 at 10:42 AM, abergeron commented in src/gpuarray_buffer_cuda.c (#304):

> @@ -443,6 +443,21 @@ static void find_best(cuda_context *ctx, gpudata **best, gpudata **prev,
>    }
>  }
> +static size_t largest_size(cuda_context *ctx) {
> +  gpudata *temp;
> +  size_t sz, dummy;
> +  cuda_enter(ctx);
> +  ctx->err = cuMemGetInfo(&sz, &dummy);
> +  cuda_exit(ctx);
> +  /* We guess that we can allocate at least a quarter of the free size
> +     in a single block. This might be wrong though. */
> +  sz /= 4;

> There is documentation that is generated from the headers. This should list all the defined properties.
You want me to add a new context property that exposes this value? In that case, OK.
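Once such a property exists, user code could query it along these lines. This is a hedged pseudocode-style sketch: the entry point is assumed here to be `gpucontext_property` from `gpuarray/buffer.h`, and both the helper name `largest_block` and the exact signature should be checked against the generated header docs:

```c
#include <gpuarray/buffer.h>

/* Sketch: query the largest allocatable block on a context.
   GA_CTX_PROP_LARGEST_MEMBLOCK comes from this PR; gpucontext_property
   is an assumption about the property-query entry point. */
size_t largest_block(gpucontext *ctx) {
  size_t sz = 0;
  if (gpucontext_property(ctx, GA_CTX_PROP_LARGEST_MEMBLOCK, &sz) != GA_NO_ERROR)
    return 0;  /* property unsupported or query failed */
  return sz;
}
```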
Force-pushed from 2a8e641 to 1732459.
This also includes a location change for blas loading.