
Get the largest allocatable block size #304

Merged
nouiz merged 10 commits into Theano:master from abergeron:largest_block on Dec 1, 2016
Conversation

@abergeron (Member) commented Nov 29, 2016

This also includes a location change for blas loading.

@abergeron abergeron changed the title Largest block Get the largest allocatable block size Nov 29, 2016
@lamblin (Member) commented Nov 30, 2016

Can you temporarily change the way Theano is installed in travis.yml, so that it gets installed from your updated branch?


case GA_CTX_PROP_FREE_GMEM:
/* There is no way to query free memory so we just return the
largest block size */
Member

We can get an upper bound, since we know the size we allocated and the total size of the memory on the GPU.

I'm not suggesting that we do that here: using an upper bound could cause a crash, while using a lower bound as currently done is safer, though possibly less efficient. This is mostly just a comment.

Member Author

I don't know the size that was allocated here, since I don't keep track of it for OpenCL.

Also, this is neither a lower nor an upper bound; it's just the largest size that clMemAlloc will accept, not taking into account how much memory is actually free. It's really crappy, but until we handle memory allocation here in a way similar to cuda, we can't do better.

cuda_exit(ctx);
/* We guess that we can allocate at least a quarter of the free size
in a single block. This might be wrong though. */
sz /= 4;
Member

This should be documented in the docs about the user function that gets the properties; mostly, that this returns a quarter of the free memory on the GPU as the biggest allocatable block.

Member Author

This case handles memory that hasn't been preallocated. We can't query the largest block available for cuMalloc, so I am resorting to a guess here.

Member

Yes I understand that. My only point is to document that.

Member Author

I consider this to be an implementation detail (and a bad one at that) and I would prefer not to document it since I hope to change it to something better whenever possible.

Member

OK, the size / 4 is a detail. But there is no doc about which properties can be queried. I'll make an issue about that.

Member Author

There is documentation that is generated from the headers. This should list all the defined properties.

{
int e = load_libcublas(major, minor);
if (e != GA_NO_ERROR)
return e;
Member

Can you confirm that this causes cublas to be initialized before the preallocation? I think so, but I'm not 100% sure.

Member Author

No it doesn't in most situations.

Comment thread src/CMakeLists.txt
  MACOSX_RPATH OFF
  # This is the shared library version
- VERSION 0.0
+ VERSION 0.1
Member

I'll come back to the versioning. We now have a new interface, but this won't trigger a recompilation and won't give a good user warning. If people update Theano but not libgpuarray, they will get a compilation error related to the convolution, as GA_CTX_PROP_LARGEST_MEMBLOCK isn't defined. Check the jenkins buildbot:

http://darjeeling.iro.umontreal.ca:8080/job/Theano%20gpu/550/testReport/junit/theano.gpuarray.tests.test_dnn/TestDnnInferShapes/test_conv_time_on_shape_change_valid_conv/

If we keep it like that, we will frequently get useless user questions, and we will lose our own time and the users' time. If we bump the major version, I don't think it will give a good user error either. We need to fix that. It could be for 0.9rc2, but it should be before 0.9.

Member Author

I don't think it is a problem that changing the minor version here will not trigger a recompilation. Everything that worked with 0.0 will work with 0.1. That is the point of this version scheme.

If we were to change the major version, it would make the currently compiled modules unloadable, possibly triggering a recompilation (I'm not 100% sure how Theano deals with C code it can't reload).

If people update Theano and it uses newly introduced symbols, they will need to update libgpuarray also, yes. Usually this is handled with recommended versions for releases and a guideline of the form: use the latest master of libgpuarray for the latest master of Theano.

My problem with maintaining another version scheme is that it duplicates work, and we will forget to bump one or the other for some changes, which will lead to exactly the problems you describe, except that nobody will expect them.

@nouiz (Member) commented Nov 30, 2016

Comment thread src/gpuarray/buffer.h
*
* Type: `size_t`
*/
#define GA_CTX_PROP_LARGEST_MEMBLOCK 20
Member

It should be added to gpuarray.pxd.

Member Author

I can add it for sure, but I usually only add the values that I need, and I don't need this one currently.

@nouiz (Member) commented Dec 1, 2016 via email

@abergeron (Member Author)

You want me to add a new context property that exposes this value? In that case ok.
