L-BFGS-(B) #521
See the discussion here about the two-loop recursion vs. other representations: http://users.iems.northwestern.edu/~nocedal/PDFfiles/representations.pdf
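For reference, here is a minimal sketch of the two-loop recursion being compared against those other representations (standard L-BFGS as in Nocedal & Wright, Alg. 7.4; the names `twoloop`, `s_hist`, and `y_hist` are made up for illustration and are not from any package):

```julia
using LinearAlgebra

# Two-loop recursion: computes r ≈ H_k * g using only dot products and vector updates.
# s_hist[i] = x_{i+1} - x_i, y_hist[i] = ∇f_{i+1} - ∇f_i, ordered oldest to newest.
function twoloop(g, s_hist, y_hist)
    m = length(s_hist)
    ρ = [1 / dot(y_hist[i], s_hist[i]) for i in 1:m]
    α = zeros(m)
    q = copy(g)
    # backward pass: newest to oldest, one dot product per stored pair
    for i in m:-1:1
        α[i] = ρ[i] * dot(s_hist[i], q)
        q .-= α[i] .* y_hist[i]
    end
    # scale by the usual initial inverse-Hessian guess γ * I
    γ = dot(s_hist[end], y_hist[end]) / dot(y_hist[end], y_hist[end])
    r = γ .* q
    # forward pass: oldest to newest, again one dot product per pair
    for i in 1:m
        β = ρ[i] * dot(y_hist[i], r)
        r .+= (α[i] - β) .* s_hist[i]
    end
    return r  # the search direction is then -r
end
```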
Ref: discussion in #520. Sooo, actually, the fastest way of doing a mat-vec with small-ish matrices (say 10x10000) is... storing the matrix as an array of arrays. Either my BLAS sucks for those matrices, or I don't understand performance at all. Gist here: https://gist.github.com/antoine-levitt/04487571690b4d69dfbcb0b671f648cd. Very interested in these results with 0.7 and a better BLAS if someone has that handy. My results:
[benchmark timings table]
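A rough sketch of the kind of comparison in the gist (not the gist's actual code; the names and sizes here are illustrative): a 10x10000 mat-vec done once through BLAS gemv on a dense `Matrix`, and once by storing the ten rows as separate vectors and taking ten dot products.

```julia
using LinearAlgebra, BenchmarkTools

m, n = 10, 10_000
A    = randn(m, n)                 # one contiguous m×n matrix → BLAS gemv
rows = [A[i, :] for i in 1:m]      # "array of arrays": one vector per row
x    = randn(n)
out  = zeros(m)

blas_mv!(out, A, x) = mul!(out, A, x)

function rows_mv!(out, rows, x)    # hand-rolled: one dot product per row
    for i in eachindex(rows)
        out[i] = dot(rows[i], x)
    end
    return out
end

@btime blas_mv!($out, $A, $x)
@btime rows_mv!($out, $rows, $x)
```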
How did you install Julia and BLAS? Also, why is it […] Edit: relevant, from http://users.iems.northwestern.edu/~nocedal/PDFfiles/representations.pdf, page 3 [quoted passage not preserved]. So they're saying it's "the same work" in the unconstrained case, but BLAS may still be faster than the hand-rolled recursion.
I get similar results. You've found some of the secret sauce of DiffEq and why I spend so much time with RecursiveArrayTools.jl 👍 . Honestly, I can't explain it half of the time, but arrays of arrays do surprisingly well in a lot of tests, shockingly so. Here, I think the issue is that you're allocating the output. It's much, much easier to allocate 10 arrays of size 10,000 than one array of size 100,000, because those 10 don't have to be contiguous, and given large enough arrays it's "contiguous enough". Note that BLAS calls don't actually have to work on contiguous memory either. @YingboMa pointed out to me that BLAS first loads things into a stack-allocated memory buffer to get contiguity even when it isn't present (which is why it can still do well on transposed operations). If we could get that kind of memory buffer in Julia, then this would have absolutely no issues! But yeah, this is why I said it's not so obvious. Why? Who knows.
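A tiny illustration of the allocation point (hypothetical, not from the gist): ten separate length-10,000 vectors need not sit next to each other in memory, while a single 10x10,000 `Matrix` has to be one contiguous 100,000-element block. Timing the in-place versions above (`mul!`/`dot` into a preallocated `out`) is one way to separate the output-allocation effect from the storage-layout effect.

```julia
using BenchmarkTools

@btime Matrix{Float64}(undef, 10, 10_000)               # one contiguous 100,000-element block
@btime [Vector{Float64}(undef, 10_000) for _ in 1:10]   # ten independent 10,000-element blocks
```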
We should experiment with other low-rank update schemes for the inverse Hessian approximation instead of the two-loop recursion. It appears that we can organize the computation differently and exploit efficient BLAS-2 calls instead of the repeated dot products; see the sketch below.
Input, experiments, and everything else are very welcome!
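One concrete candidate is the compact representation from the Byrd–Nocedal–Schnabel paper linked above. A hedged sketch, assuming S and Y hold the m stored s- and y-vectors as columns (oldest first); this is not Optim.jl code, and in practice SᵀY and YᵀY would be updated incrementally rather than recomputed. With this layout, applying the inverse Hessian approximation to a gradient uses a handful of n×m gemv calls plus small m×m triangular solves instead of 2m separate dot products.

```julia
using LinearAlgebra

# Compact inverse-BFGS representation (Byrd–Nocedal–Schnabel):
#   H_k = γI + [S  γY] * W * [Sᵀ; γYᵀ],  with R = triu(SᵀY), D = Diagonal(diag(SᵀY)).
function compact_Hg(g, S, Y, γ)
    SY = S' * Y                          # m×m, small
    R  = UpperTriangular(SY)
    D  = Diagonal(diag(SY))
    p  = S' * g                          # gemv
    q  = Y' * g                          # gemv
    a  = R \ p                           # m×m triangular solve
    b  = R' \ ((D + γ * (Y' * Y)) * a - γ * q)
    return γ * g + S * b - γ * (Y * a)   # two more gemv calls give H_k * g
end
```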