Framework for symbolic optimizations #122
Comments
#5 above is easy since it doesn't generalize (it only deals with 2 dimensions). How should it be called? Maybe
? Of course, a fallback definition can be provided, so implementing it is only an optimization. Same idea applies to For #2, we already parse runs of
Yes, #5 is easier, and the suggested syntax seems OK. I guess aTb() and abT() are good starts for a first release, until a more general framework can be implemented. Ideally, BLAS lets you efficiently compute A[I,J]^T*B[I,J]^T + C[I,J]^T, where I and J are ranges. I guess the indexing part can be handled through subarrays, but then the user has to use subarrays explicitly. For #2, I was only thinking of element-wise operations for a start. Think of a stencil operation: it is essentially the element-wise case, but with some indexing thrown in. It would be nice to handle these cases with indexing too. Perhaps we need to use subarrays more automatically, in more places.
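The transposed-operand case can be illustrated with a minimal Python sketch, using plain lists of lists to stand in for matrices. BLAS GEMM computes C := alpha*op(A)*op(B) + beta*C, with op choosing X or its transpose through the TRANSA/TRANSB flags. The hypothetical `gemm_like` below copies only the flag idea. It reads A[k][i] wherever A'[i][k] is needed, so no transposed copy is ever built.

```python
# Sketch only: a pure-Python stand-in for the TRANSA/TRANSB idea of BLAS GEMM.
# `gemm_like` and its keyword flags are hypothetical, not a library API.

def gemm_like(A, B, C, trans_a=False, trans_b=False):
    """Return op(A) * op(B) + C as a new list of lists, never forming A' or B'."""
    def get_a(i, k):
        # Element (i, k) of op(A): read A[k][i] instead of materializing A'.
        return A[k][i] if trans_a else A[i][k]

    def get_b(k, j):
        return B[j][k] if trans_b else B[k][j]

    m = len(A[0]) if trans_a else len(A)       # rows of op(A)
    inner = len(A) if trans_a else len(A[0])   # cols of op(A) == rows of op(B)
    n = len(B) if trans_b else len(B[0])       # cols of op(B)
    if len(C) != m or len(C[0]) != n:
        raise ValueError("C has the wrong shape")
    return [[sum(get_a(i, k) * get_b(k, j) for k in range(inner)) + C[i][j]
             for j in range(n)]
            for i in range(m)]


A = [[1, 2], [3, 4]]          # A' is [[1, 3], [2, 4]]
B = [[5, 6], [7, 8]]
C = [[0, 0], [0, 1]]
print(gemm_like(A, B, C, trans_a=True))   # [[26, 30], [38, 45]]
```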
Well, I'm saying we can already handle element-wise What special methods do we want? I guess I could also make special versions of the operators where you know that the first and/or second argument can be overwritten.
I was referring to element-wise a-b.*c+4*d?
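Here is a Python sketch of the element-wise expression a-b.*c+4*d on plain lists. Julia parses it as (a - (b.*c)) + (4*d). `naive` allocates one temporary per operator, the way a straightforward vectorized evaluation would. `fused` is the single loop that a fusing optimizer would aim to emit. Both function names are illustrative only.

```python
# Sketch: the same element-wise expression evaluated with and without temporaries.

def naive(a, b, c, d):
    t1 = [x * y for x, y in zip(b, c)]      # temporary for b .* c
    t2 = [x - y for x, y in zip(a, t1)]     # temporary for a - (b .* c)
    t3 = [4 * x for x in d]                 # temporary for 4 * d
    return [x + y for x, y in zip(t2, t3)]  # the result itself


def fused(a, b, c, d):
    # One pass and one output allocation: no intermediate arrays at all.
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] - b[i] * c[i] + 4 * d[i]
    return out


print(fused([1, 2, 3], [1, 1, 1], [2, 3, 4], [0, 1, 2]))   # [-1, 3, 7]
```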
Seems to me passing this to something at run time amounts to handling arithmetic operators with an interpreter? |
It would be cool if there were a way to do this. I guess that for dense arrays, one can just hack it into the compiler as a special case.
The Unfortunately, this presumes that you can do
I have often thought of this, but am not sure the resulting complexity is worthwhile. It also annoys the user who knows what they are doing.
It really only annoys the user who knows what they're doing in MATLAB and doesn't know what they're doing in Julia. There could be ways to force the actual transpose to be done. With the mutability vs. immutability work, this approach wouldn't apply just to transpose, but also to subarray refs (since those could safely be references into the same data as the original array, with different strides) and to many other cases too.
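The view idea can be sketched in Python with a hypothetical `StridedView` class over a flat, column-major buffer, which is Julia's storage order. Transposing swaps the shape and the strides. A subarray ref only moves the offset. Neither operation copies data.

```python
# Sketch: transpose and subarray refs as O(1) strided views over shared storage.
# `StridedView` is a hypothetical illustration, not a Julia or library type.

class StridedView:
    def __init__(self, data, shape, strides, offset=0):
        self.data = data          # shared flat storage (never copied)
        self.shape = shape        # (rows, cols)
        self.strides = strides    # step in `data` per row index / per column index
        self.offset = offset      # position in `data` of element (0, 0)

    def __getitem__(self, ij):
        i, j = ij
        return self.data[self.offset + i * self.strides[0] + j * self.strides[1]]

    def transpose(self):
        # Swap shape and strides and keep the same buffer.
        return StridedView(self.data, self.shape[::-1], self.strides[::-1], self.offset)

    def sub(self, rows, cols):
        # Block view for unit-step ranges: only the offset and shape change.
        off = self.offset + rows.start * self.strides[0] + cols.start * self.strides[1]
        return StridedView(self.data, (len(rows), len(cols)), self.strides, off)

    def to_lists(self):
        return [[self[i, j] for j in range(self.shape[1])] for i in range(self.shape[0])]


data = [1, 4, 2, 5, 3, 6]                  # 2x3 matrix stored column by column
A = StridedView(data, (2, 3), (1, 2))
print(A.to_lists())                        # [[1, 2, 3], [4, 5, 6]]
print(A.transpose().to_lists())            # [[1, 4], [2, 5], [3, 6]]
print(A.sub(range(0, 2), range(1, 3)).to_lists())   # [[2, 3], [5, 6]]
```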
Here's a suggestion for this: how about extending * to allow: For A*B: For A'*B'+C', where each of the transposes may or may not happen. We also want one more form, for x'*A'*y'. Perhaps * can take a 7th argument that says what type of operation it is: a'b'+c' or x'A'y'.
Ahh! These all go away if we deal with mutability and allow cheap transposes. |
I don't like the idea of the cheap transpose. Cheap transposes are not always desirable, as they just move the problem elsewhere. Most fast MATLAB code is written with an assumption about stride and storage, and the cheap transpose just messes up that performance model. Also, every single operation has to be made stride-aware, and will also become a bit slower from dealing with strides everywhere. I'd be OK with a cheap transpose that is evaluated lazily, but not one that just changes the stride.
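The lazily evaluated alternative could look roughly like the Python sketch below. `LazyTranspose`, `force`, and `matmul` are hypothetical names. Here the transpose is only a marker. An operation that has a fused kernel for it consumes the marker directly, and any other operation forces a contiguous copy first. Ordinary code therefore keeps its usual stride and storage assumptions.

```python
# Sketch: a lazy transpose that is either consumed by a fused kernel or materialized.
# All names here are hypothetical illustrations.

class LazyTranspose:
    def __init__(self, parent):
        self.parent = parent      # the untransposed matrix (list of rows)


def force(x):
    """Materialize a LazyTranspose into an ordinary row-major copy."""
    if isinstance(x, LazyTranspose):
        return [list(col) for col in zip(*x.parent)]
    return x


def matmul(x, y):
    # Recognized pattern X' * Y: read X column-wise, with no copy of X'
    # (a BLAS GEMM call with TRANSA='T' plays this role in practice).
    if isinstance(x, LazyTranspose) and not isinstance(y, LazyTranspose):
        X = x.parent
        return [[sum(X[k][i] * y[k][j] for k in range(len(X)))
                 for j in range(len(y[0]))]
                for i in range(len(X[0]))]
    # Fallback: force both operands to plain storage, then multiply normally.
    x, y = force(x), force(y)
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


X = [[1, 2], [3, 4]]
Y = [[5, 6], [7, 8]]
print(matmul(LazyTranspose(X), Y))   # [[26, 30], [38, 44]]
```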
We can have functions for A'*B'+C', x'*A'*y', etc., but I'd rather they not be part of
muladd() and bilinear()?
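Below is a sketch of what a bilinear() function could do. The signature is a hypothetical one, and this is Python, not Julia's Base functions. The product x'*op(A)*y is accumulated as a double sum, so it allocates neither the transpose of A nor the intermediate vector op(A)*y.

```python
# Sketch: a hypothetical bilinear(x, A, y) with an optional transpose flag on A.

def bilinear(x, A, y, trans_a=False):
    """Return the scalar x' * op(A) * y, where op(A) is A or A'."""
    total = 0
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            a_ij = A[j][i] if trans_a else A[i][j]   # read A' without copying
            total += xi * a_ij * yj
    return total


print(bilinear([1, 2], [[1, 2], [3, 4]], [1, 1]))                 # 17
print(bilinear([1, 2], [[1, 2], [3, 4]], [1, 1], trans_a=True))   # 16
```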
I tried a simple experiment on randmatstat, minimizing the array allocations and concatenation with the following change:
It gives a 10% improvement right away, and we can perhaps get a bit more. Optimizing P'*P and Q'*Q will possibly give a bit more improvement.
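P'*P is a Gram matrix, so it is symmetric. BLAS's SYRK routine with TRANS='T' computes C := alpha*A'*A + beta*C and updates only one triangle of C. The pure-Python sketch below follows the same idea using a hypothetical `gram` helper. It reads the columns of P in place and computes each off-diagonal entry once.

```python
# Sketch: P' * P without forming P', exploiting symmetry (the SYRK idea).

def gram(P):
    """Return P' * P; only the upper triangle is computed, then mirrored."""
    n = len(P[0])
    G = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = sum(row[i] * row[j] for row in P)   # dot product of columns i and j
            G[i][j] = s
            G[j][i] = s
    return G


print(gram([[1, 2], [3, 4], [5, 6]]))   # [[35, 44], [44, 56]]
```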
commit 361588d adds transposed multiplication.
Things such as
Common Lisp has compiler macros; these should be fairly easy to implement. Basically, they are just macros, except that they may choose to do nothing and are intended for optimization. If you do this, it might actually be more desirable to let the compiler macros run after type inference, so that the type information is available? Then, though, they would have to be careful not to break whatever has already been inferred, or type inference would have to run again. Of course, the disadvantage is that the macros can do anything whatsoever, so care has to be taken in writing them. (And of course you can write a macro that turns conversion rules into compiler macros.) Also, do you care about float-behavior specifics? Maybe if you want to, using the types
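Here is a rough Python sketch of compiler-macro-style rules that run after types are known. It assumes an expression tree made of tuples and a table of already-inferred types. Both encodings are hypothetical, and so is the `gemm_tn` rewrite target. Each rule either returns a rewritten node or returns None to decline, and a declined rule leaves the original code untouched.

```python
# Sketch: optional, type-aware rewrite rules that are allowed to "do nothing".

def rule_transpose_mul(node, types):
    # Match ('mul', ('transpose', X), Y) where X and Y are symbols.
    if (isinstance(node, tuple) and node[0] == "mul"
            and isinstance(node[1], tuple) and node[1][0] == "transpose"):
        x, y = node[1][1], node[2]
        if types.get(x) == "DenseMatrix" and types.get(y) == "DenseMatrix":
            return ("gemm_tn", x, y)   # hypothetical kernel: X' * Y with no transpose
    return None                        # decline: keep the node as written


RULES = [rule_transpose_mul]


def optimize(node, types):
    """Rewrite children first; the first rule that accepts a node wins."""
    if isinstance(node, tuple):
        node = (node[0],) + tuple(optimize(arg, types) for arg in node[1:])
        for rule in RULES:
            new = rule(node, types)
            if new is not None:
                return new
    return node


types = {"A": "DenseMatrix", "B": "DenseMatrix", "S": "SparseMatrix"}
print(optimize(("add", ("mul", ("transpose", "A"), "B"), "C"), types))
# ('add', ('gemm_tn', 'A', 'B'), 'C')
print(optimize(("mul", ("transpose", "A"), "S"), types))
# ('mul', ('transpose', 'A'), 'S')
```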
I find it quite comical that no matter what language feature X is, the statement "Common Lisp has X" always holds. |
I think the standard name for this operation is axpy. |
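For reference, BLAS's AXPY computes y := alpha*x + y. It updates y in place, so no temporary is allocated for alpha*x. A minimal Python version follows; the function is illustrative and not a library call.

```python
# Sketch: the BLAS level-1 AXPY update, done in place.

def axpy(alpha, x, y):
    """Overwrite y with alpha*x + y and return it."""
    if len(x) != len(y):
        raise ValueError("length mismatch")
    for i in range(len(y)):
        y[i] += alpha * x[i]
    return y


y = [1, 1, 1]
axpy(2, [1, 2, 3], y)
print(y)   # [3, 5, 7]
```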
Given that array views seem to be the way to go (see #5003), wouldn't it be consistent to also let the transpose operator return a view? I am not sure what to do when assigning the transpose to a new variable (e.g. x = y'), but this should be handled the same way as array views (which are the right choice, IMHO).
I believe there's another issue open for making transpose operations produce a new immutable Transposed type. |
FWIW, I have been exploring something along these lines in Accelereval.jl. I am considering using it to replace Devectorize.jl. Progress is not very fast, as I only work on it when I have some free time. This work is inspired by Theano, but takes advantage of Julia's multiple dispatch and metaprogramming to simplify the interface. The goal is to automatically devectorize expressions, or turn them into BLAS calls, whenever appropriate.
Having slicing return views may actually help alleviate the issue that
Neal has been implementing some of these features in R recently: http://radfordneal.wordpress.com/2014/01/01/new-version-of-pqr-now-with-task-merging/. It might be worth taking some inspiration from it.
This is well-trod ground. I'm not sure I've heard this optimization called "task merging" before; it is usually called "stream fusion". We could hack something into the compiler to recognize and replace certain patterns, but I just don't find that very satisfying. Library-defined rewrite rules are probably the best option I'm aware of, but they still feel like a hack. An easier case might be high-latency operations like on
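A library-defined rule in the stream-fusion spirit can be sketched in Python as below. The classic example is GHC's "map/map" rule, map f . map g = map (f . g). The stage-list pipeline encoding and the `fuse`/`run` helpers are hypothetical. Adjacent map stages are merged, so the data is traversed fewer times and fewer intermediate lists are built.

```python
# Sketch: a map/map fusion rule applied to a toy pipeline representation.

def fuse(stages):
    """Apply the rule map f . map g => map (f . g) to adjacent map stages."""
    out = []
    for kind, fn in stages:
        if kind == "map" and out and out[-1][0] == "map":
            g = out[-1][1]                               # earlier map, applied first
            out[-1] = ("map", lambda v, f=fn, g=g: f(g(v)))
        else:
            out.append((kind, fn))
    return out


def run(stages, xs):
    """Execute the stages in order; each stage makes one pass and one new list."""
    for kind, fn in stages:
        if kind == "map":
            xs = [fn(v) for v in xs]
        elif kind == "filter":
            xs = [v for v in xs if fn(v)]
        else:
            raise ValueError(kind)
    return xs


pipeline = [("map", lambda v: v + 1), ("map", lambda v: v * 2),
            ("filter", lambda v: v > 4), ("map", lambda v: v - 1)]
fused = fuse(pipeline)
print(len(pipeline), len(fused))                           # 4 3
print(run(pipeline, [1, 2, 3]), run(fused, [1, 2, 3]))     # [5, 7] [5, 7]
```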
What would these library-defined rewrite rules look like? Right now it's hard to implement something like this with a macro, because inferred types aren't available when macros are expanded. Solving this issue in a sufficiently general way would directly help JuMP.
@StefanKarpinski mentioned the staged function idea on the mailing list.
This might be a complicated issue: suppose the compiler managed to make the inferred types available to the "later-staged" macro; what if that macro somehow modifies the types ...
Operator overloading with lazy objects is a very intuitive way to do this "task merging". If only we could have some interface where these lazy objects can be computed once at compile time, so long as the compiler can guarantee the types of the objects from type inference (otherwise it's done at runtime). |
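Below is a Python sketch of the operator-overloading approach, built from hypothetical classes. Arithmetic on `Vec` objects only builds an expression tree. `evaluate` then computes every output element in a single loop, with no intermediate arrays. In this sketch `*` means element-wise multiplication, like Julia's `.*`, and all vectors are assumed to have the same length. Python walks the tree once per element, which is exactly the interpretive overhead raised earlier in the thread. The hope expressed here is that Julia could generate the fused loop once, at compile time, when the types are known.

```python
# Sketch: lazy expression objects via operator overloading, evaluated in one loop.

class Expr:
    """Base class: arithmetic builds a tree instead of computing anything."""
    def __add__(self, other):
        return BinOp("+", self, lift(other))
    def __radd__(self, other):
        return BinOp("+", lift(other), self)
    def __sub__(self, other):
        return BinOp("-", self, lift(other))
    def __rsub__(self, other):
        return BinOp("-", lift(other), self)
    def __mul__(self, other):          # element-wise, like Julia's .*
        return BinOp("*", self, lift(other))
    def __rmul__(self, other):
        return BinOp("*", lift(other), self)


class Vec(Expr):
    def __init__(self, data):
        self.data = list(data)
    def at(self, i):
        return self.data[i]
    def length(self):
        return len(self.data)


class Const(Expr):
    def __init__(self, value):
        self.value = value
    def at(self, i):
        return self.value              # a scalar broadcasts to every index
    def length(self):
        return None


class BinOp(Expr):
    OPS = {"+": lambda u, v: u + v, "-": lambda u, v: u - v, "*": lambda u, v: u * v}
    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs
    def at(self, i):
        return self.OPS[self.op](self.lhs.at(i), self.rhs.at(i))
    def length(self):
        n = self.lhs.length()
        return n if n is not None else self.rhs.length()


def lift(x):
    return x if isinstance(x, Expr) else Const(x)


def evaluate(expr):
    """One loop over the output; no intermediate arrays are created."""
    return [expr.at(i) for i in range(expr.length())]


a, b, c, d = Vec([1, 2, 3]), Vec([1, 1, 1]), Vec([2, 3, 4]), Vec([0, 1, 2])
lazy = a - b * c + 4 * d        # builds a tree; nothing is computed yet
print(evaluate(lazy))           # [-1, 3, 7]
```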
I keep coming back to this for JuMP. I essentially need a post-macro step that lets me write code based on the inferred types of objects in an expression tree. It's not feasible to write code that dispatches on the types of the objects or generates
Much of this issue is addressed in many different ways - with staged functions, array views, and the various mul operators. |
We need a framework to express certain mathematical optimizations in Julia itself. These may be expressed as rules that are run after types have been inferred. Examples are:
- `A' * B`, `A' \ B`: can be computed without computing the transpose, by calling BLAS/LAPACK routines.
- `A - B + C .* D`: can be computed without temporaries.
- `A[m:n, p:q] + B`: avoid computing the ref.
- `A[m:n, p:q] * B`: avoid computing the sref, and call DGEMM directly in cases where this is possible.
- `[ a b c; d f; e]`: compute the entire concatenation in one shot.
- `A[p,q] = B[r,s]`: the temporary can be avoided by calling `assign(A, sub(B, r, s), p, q)`.
In all cases, temporaries can be avoided. Either we do it with optimizations, or we have the parser pass such expressions down to the runtime. If no matching runtime implementation is found for such a case, the existing base case should certainly be used.
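The indexed-assignment example can be sketched in Python as follows. `assign_block` is a hypothetical helper, and it uses 0-based Python ranges rather than Julia's 1-based ones. Elements are copied straight from B's index ranges into A, so the slice B[r, s] is never materialized.

```python
# Sketch: A[p, q] = B[r, s] without building the temporary B[r, s].

def assign_block(A, B, p, q, r, s):
    """Copy B[r, s] into A[p, q] element by element; returns A."""
    if len(p) != len(r) or len(q) != len(s):
        raise ValueError("shape mismatch")
    # Caveat: if A and B are the same array and the blocks overlap, this loop
    # can read values it has already overwritten; a real implementation
    # would need to detect that aliasing case.
    for i_dst, i_src in zip(p, r):
        row_dst, row_src = A[i_dst], B[i_src]
        for j_dst, j_src in zip(q, s):
            row_dst[j_dst] = row_src[j_src]
    return A


A = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
B = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
assign_block(A, B, range(0, 2), range(1, 3), range(1, 3), range(0, 2))
print(A)   # [[0, 4, 5], [0, 7, 8], [0, 0, 0]]
```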