
L-BFGS-(B) #521

Closed
pkofod opened this issue Jan 27, 2018 · 4 comments
pkofod commented Jan 27, 2018

We should experiment with other low-rank update schemes for the inverse Hessian approximation instead of the two-loop recursion. It appears we can do this differently and exploit efficient BLAS-2 calls instead of the repeated dot products.

Input, experiments, and everything else are most welcome!
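For concreteness, here is a sketch of the standard two-loop recursion this refers to — a NumPy stand-in for illustration, not Optim.jl's actual code (the name `two_loop` and the argument layout are made up). Note that every step in both loops is a BLAS-1 dot product or axpy on a full-length vector, which is exactly the sequential work an alternative representation could batch into BLAS-2 calls:

```python
import numpy as np

def two_loop(grad, s_list, y_list, gamma):
    """Apply the L-BFGS inverse Hessian approximation to grad using the
    two-loop recursion over stored (s, y) pairs (oldest first),
    with initial approximation H0 = gamma * I."""
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q -= a * y
    r = gamma * q
    # Second loop: oldest pair to newest.
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * np.dot(y, r)
        r += (a - b) * s
    return r
```

With m stored pairs and dimension n this is roughly 4mn flops, all of it strictly sequential vector-vector work.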


pkofod commented Jan 27, 2018

See the discussion of the two-loop recursion vs. other representations here: http://users.iems.northwestern.edu/~nocedal/PDFfiles/representations.pdf


antoine-levitt commented Jan 27, 2018

Ref discussion in #520

Sooo, actually, the fastest way to do a mat-vec with small-ish matrices (say 10x10000) is... storing the matrix as an array of arrays. Either my BLAS sucks for those matrices, or I don't understand performance at all. Gist here: https://gist.github.com/antoine-levitt/04487571690b4d69dfbcb0b671f648cd. Very interested in these results on 0.7 and with a better BLAS, if someone has that handy.

My results:

| method          | time       |
| --------------- | ---------- |
| BLAS2           | 32.464 μs  |
| loop_mn         | 127.592 μs |
| loop_nm         | 54.093 μs  |
| views_mn        | 122.114 μs |
| views_nm        | 403.941 μs |
| noviews_mn      | 165.762 μs |
| noviews_nm      | 725.805 μs |
| array_of_arrays | 18.822 μs  |
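For what it's worth, the two extremes being compared reduce to a single BLAS-2 gemv on the full 10x10000 matrix versus ten independent BLAS-1 dot products on separately stored rows. A NumPy analogue of that comparison (illustrative only — the actual benchmark is the Julia gist above, and the variable names here are invented):

```python
import numpy as np

m, n = 10, 10_000
rng = np.random.default_rng(0)
A = rng.random((m, n))                  # one contiguous 10 x 10000 matrix
x = rng.random(n)
rows = [A[i].copy() for i in range(m)]  # "array of arrays": each row its own vector

y_gemv = A @ x                                   # single BLAS-2 gemv call
y_dots = np.array([np.dot(r, x) for r in rows])  # m independent BLAS-1 dots

assert np.allclose(y_gemv, y_dots)
```

Both layouts compute the same product; the question in the benchmark is purely which one the BLAS/runtime handles faster at this shape.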


pkofod commented Jan 27, 2018

How did you install Julia and BLAS? Also, why is it 10x10000 and not 10000x10? We would store each vector in a column, so 10000x10 seems more natural; otherwise we would be writing across columns when storing a vector.

Edit: relevant passage from http://users.iems.northwestern.edu/~nocedal/PDFfiles/representations.pdf, page 3:

[screenshot from the paper comparing the work of the two-loop recursion and the compact representation]

So they're saying it's "the same work" in the unconstrained case, but BLAS may still be faster than the hand-rolled recursion.
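The "same work" claim refers to the compact (outer-product) representation from the paper, where H is written as gamma*I plus a rank-2m correction built from S = [s_0 ... s_{m-1}] and Y = [y_0 ... y_{m-1}]. A rough NumPy transcription of applying that formula — my own sketch, not Optim.jl code, with invented names:

```python
import numpy as np

def compact_apply(grad, S, Y, gamma):
    """Apply the L-BFGS inverse Hessian via the compact representation
    H = gamma*I + [S, gamma*Y] * M * [S'; gamma*Y'], where M is built
    from R = triu(S'Y) and D = diag(S'Y).  S and Y are n x m with the
    oldest pair in the first column.  The heavy lifting is mat-vecs
    with S and Y, i.e. BLAS-2 calls, plus small m x m solves."""
    SY = S.T @ Y
    R = np.triu(SY)
    D = np.diag(np.diag(SY))
    p = S.T @ grad                  # BLAS-2
    q = gamma * (Y.T @ grad)        # BLAS-2
    Rinv_p = np.linalg.solve(R, p)  # m x m triangular solve
    t = np.linalg.solve(R.T, (D + gamma * (Y.T @ Y)) @ Rinv_p - q)
    return gamma * grad + S @ t - gamma * (Y @ Rinv_p)
```

In a real implementation the small m x m pieces (R, D, Y'Y) would be updated incrementally as pairs come and go, not recomputed on every application.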

ChrisRackauckas (Contributor) commented:

> Sooo, actually, the fastest version of doing a mat-vec with small-ish matrices (say 10x10000) is... storing the matrix as an array of arrays. Either my BLAS sucks for those matrices, or I don't understand performance at all.

```
  37.763 μs (1 allocation: 160 bytes)
  146.077 μs (1 allocation: 160 bytes)
  69.379 μs (1 allocation: 160 bytes)
  144.028 μs (11 allocations: 640 bytes)
  419.790 μs (20001 allocations: 1.98 MiB)
  201.991 μs (31 allocations: 782.34 KiB)
  742.098 μs (39490 allocations: 3.35 MiB)
  21.662 μs (1 allocation: 160 bytes)
```

I get similar results.

You've found some of the secret sauce of DiffEq and why I spend so much time with RecursiveArrayTools.jl 👍. Honestly, I can't explain it half of the time, but arrays of arrays do surprisingly well in a lot of tests, shockingly so.

Here, I think the issue is that you're allocating the output. It's much, much easier to allocate 10 arrays of size 10,000 than one array of size 100,000, because those 10 don't have to be contiguous, and given large enough arrays it's "contiguous enough".

Note that BLAS calls don't actually have to be with contiguous portions either. @YingboMa pointed out to me that it first loads things into a stack-allocated memory buffer to get contiguousness even when it isn't present (which is why it can still do well on transposed operations). If we can get that kind of memory buffer in Julia then this would have absolutely no issues!

But yeah, this is why I said it's not so obvious. Why? Who knows.
