Write a custom Op with C code for 'roll' #222
Comments
There was some previous discussion on this topic in this thread. I don't know much about making Theano fast but I'll contribute my (rather On Wed, Nov 23, 2011 at 3:12 PM, David Warde-Farley <
Indeed, I tend to agree. However, in the specific case of optimizing for speed, which is a major part of Theano's goal, optimizing memory access patterns plays a rather crucial role. It can be quite hard to write something that's uniformly fast without special casing.
I would call this low priority, as I don't think it is a bottleneck. So what do you mean by the nice-to-have tag? Should we create a "low prio" tag?
I meant "Nice-To-Have" as "would be nice, at some point, not necessary and certainly not critical", but yeah, a "Low Priority" tag would make this clear.
(Also, I mainly created this ticket because it might work well as an exercise for a student who wants to learn to write Ops, so that they start with something simple.)
There's a simple implementation mimicking numpy.roll contributed by @mrocklin in pull request #221. However, it uses subtensors and Join, and could probably be sped up quite a bit by writing a custom Op with C code.
This would be a pretty easy task for someone who wanted to become familiar with writing Ops, as it doesn't involve any particularly complicated logic, just permuting things and doing the inverse permutation for the gradient.
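To make the "permute, then invert the permutation for the gradient" idea concrete, here is a minimal pure-Python sketch on 1-D lists. It is not the code from pull request #221 and not a Theano Op; the names roll_by_slicing and roll_grad are hypothetical and only illustrate the slice-and-join approach and why the gradient is a roll in the opposite direction.

```python
def roll_by_slicing(xs, shift):
    """Roll a 1-D list like numpy.roll: elements move right by `shift`."""
    n = len(xs)
    if n == 0:
        return []
    k = shift % n  # normalise negative and oversized shifts
    # The last k elements wrap around to the front (two slices, then a join).
    return xs[n - k:] + xs[:n - k]


def roll_grad(output_grad, shift):
    """Gradient w.r.t. the input: apply the inverse permutation (roll back)."""
    # Hypothetical helper: since output[i] = input[(i - shift) % n],
    # each input element receives the gradient of the slot it was moved to.
    return roll_by_slicing(output_grad, -shift)


rolled = roll_by_slicing([0, 1, 2, 3, 4], 2)
restored = roll_grad(rolled, 2)
```

Because a roll only moves elements around, rolling the output gradient by the negated shift routes each gradient entry back to the input position it came from. A C implementation would replace the two slices and the join with direct index arithmetic.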
All the better if it has a flag that can operate in-place (obviously you need a temporary buffer the size of a single element on the roll axis; @nouiz may have some insight on the most efficient way to do this cache-wise).
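One way an in-place roll with a single-element temporary could work is the cycle-following ("juggling") rotation, sketched below in pure Python for the 1-D case. This is an assumption about how such a flag might be implemented, not something settled in this thread; roll_inplace is a hypothetical name, and for an N-D tensor the temporary would hold one slice along the roll axis rather than a scalar.

```python
from math import gcd


def roll_inplace(xs, shift):
    """Roll a 1-D list in place, using one temporary element per cycle."""
    n = len(xs)
    if n == 0:
        return xs
    k = shift % n
    if k == 0:
        return xs
    # The permutation i -> (i + k) % n splits into gcd(n, k) disjoint cycles.
    for start in range(gcd(n, k)):
        tmp = xs[start]  # the only temporary storage
        dst = start
        while True:
            src = (dst - k) % n  # element that must end up at dst
            if src == start:
                break
            xs[dst] = xs[src]
            dst = src
        xs[dst] = tmp  # close the cycle with the saved element
    return xs


data = [0, 1, 2, 3, 4, 5]
roll_inplace(data, 4)
```

Each cycle jumps through memory with stride k, so this sketch says nothing about cache behaviour; which access pattern is fastest in C is exactly the open question raised above.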