
round reduction size up to nearest power of two to avoid overloading cache #65

Merged 1 commit on Aug 5, 2015

Conversation

davidweichiang (Contributor)
Hi, in reduction.py, ReductionKernel._get_basic_kernel is cached according to its arguments maxls and nd, where maxls is the smaller of the reduction size and self.init_local_size (= 1024 on my machine). If there are many small reductions, the cache fills with distinct versions of the basic kernel, more than it can hold. However, one of the first things the basic kernel does is round maxls up to the nearest power of two. This patch does the rounding before caching, so there are at most lg(self.init_local_size) different versions of the basic kernel per value of nd. (It also repeats the rounding afterwards, to avoid breaking anything.) I believe this fixes the problem with small reductions without costing anything for larger reductions.
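The effect described above can be illustrated with a minimal sketch (hypothetical helper names, not the actual libgpuarray code): keying a cache on the raw maxls admits up to init_local_size distinct entries, while rounding to a power of two first collapses them to lg(init_local_size) + 1 keys.

```python
# Minimal sketch of why rounding before caching bounds the number of
# cached kernel variants. round_up_pow2 is a hypothetical helper, not
# the actual libgpuarray function.

def round_up_pow2(n):
    """Round n up to the nearest power of two (requires n >= 1)."""
    return 1 << (n - 1).bit_length()

init_local_size = 1024  # typical max workgroup size

# Before the patch: the cache key is min(reduction_size, init_local_size),
# so every distinct small reduction size produces a new cache entry.
keys_before = {min(size, init_local_size) for size in range(1, 5000)}

# After the patch: maxls is rounded up to a power of two before caching,
# so only 1, 2, 4, ..., 1024 can appear as keys.
keys_after = {round_up_pow2(min(size, init_local_size))
              for size in range(1, 5000)}

print(len(keys_before), len(keys_after))  # prints: 1024 11
```

With 1024 as the local-size cap, the rounded key set has exactly lg(1024) + 1 = 11 members, which is why the patch keeps the kernel cache small regardless of how many different reduction sizes a program uses.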

@abergeron (Member)
Good catch, this is certainly OK.

abergeron added a commit that referenced this pull request on Aug 5, 2015:
round reduction size up to nearest power of two to avoid overloading cache
abergeron merged commit da08a6b into Theano:master on Aug 5, 2015
2 participants