L-BFGS-(B) #521
See the discussion here about the two-loop recursion vs. other representations: http://users.iems.northwestern.edu/~nocedal/PDFfiles/representations.pdf
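For reference, here is a minimal sketch of the two-loop recursion being compared against those other representations (standard L-BFGS as in Nocedal & Wright, Alg. 7.4; the names `twoloop`, `s_hist`, and `y_hist` are made up for illustration and are not from any package):

```julia
using LinearAlgebra

# Two-loop recursion: computes r ≈ H_k * g using only dot products and vector updates.
# s_hist[i] = x_{i+1} - x_i, y_hist[i] = ∇f_{i+1} - ∇f_i, ordered oldest to newest.
function twoloop(g, s_hist, y_hist)
    m = length(s_hist)
    ρ = [1 / dot(y_hist[i], s_hist[i]) for i in 1:m]
    α = zeros(m)
    q = copy(g)
    # backward pass: newest to oldest, one dot product per stored pair
    for i in m:-1:1
        α[i] = ρ[i] * dot(s_hist[i], q)
        q .-= α[i] .* y_hist[i]
    end
    # scale by the usual initial inverse-Hessian guess γ * I
    γ = dot(s_hist[end], y_hist[end]) / dot(y_hist[end], y_hist[end])
    r = γ .* q
    # forward pass: oldest to newest, again one dot product per pair
    for i in 1:m
        β = ρ[i] * dot(y_hist[i], r)
        r .+= (α[i] - β) .* s_hist[i]
    end
    return r  # the search direction is then -r
end
```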
Ref: discussion in #520. Sooo, actually, the fastest way of doing a mat-vec with small-ish matrices (say 10x10000) is... storing the matrix as an array of arrays. Either my BLAS sucks for those matrices, or I don't understand performance at all. Gist here: https://gist.github.com/antoine-levitt/04487571690b4d69dfbcb0b671f648cd. Very interested in these results with 0.7 and a better BLAS if someone has that handy. My results:
[benchmark timings table]
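A rough sketch of the kind of comparison in the gist (not the gist's actual code; the names and sizes here are illustrative): a 10x10000 mat-vec done once through BLAS gemv on a dense `Matrix`, and once by storing the ten rows as separate vectors and taking ten dot products.

```julia
using LinearAlgebra, BenchmarkTools

m, n = 10, 10_000
A    = randn(m, n)                 # one contiguous m×n matrix → BLAS gemv
rows = [A[i, :] for i in 1:m]      # "array of arrays": one vector per row
x    = randn(n)
out  = zeros(m)

blas_mv!(out, A, x) = mul!(out, A, x)

function rows_mv!(out, rows, x)    # hand-rolled: one dot product per row
    for i in eachindex(rows)
        out[i] = dot(rows[i], x)
    end
    return out
end

@btime blas_mv!($out, $A, $x)
@btime rows_mv!($out, $rows, $x)
```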
How did you install Julia and BLAS? Also, why is it […] Edit: relevant, from http://users.iems.northwestern.edu/~nocedal/PDFfiles/representations.pdf, page 3 [quoted passage not preserved]. So they're saying it's "the same work" in the unconstrained case, but BLAS may still be faster than the hand-rolled recursion.
I get similar results. You've found some of the secret sauce of DiffEq and why I spend so much time with RecursiveArrayTools.jl 👍 . Honestly, I can't explain it half of the time, but arrays of arrays do surprisingly well in a lot of tests, shockingly so. Here, I think the issue is that you're allocating the output. It's much, much easier to allocate 10 arrays of size 10,000 than one array of size 100,000, because those 10 don't have to be contiguous, and given large enough arrays it's "contiguous enough". Note that BLAS calls don't actually have to work on contiguous memory either. @YingboMa pointed out to me that BLAS first loads things into a stack-allocated memory buffer to get contiguity even when it isn't present (which is why it can still do well on transposed operations). If we could get that kind of memory buffer in Julia, then this would have absolutely no issues! But yeah, this is why I said it's not so obvious. Why? Who knows.
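A tiny illustration of the allocation point (hypothetical, not from the gist): ten separate length-10,000 vectors need not sit next to each other in memory, while a single 10x10,000 `Matrix` has to be one contiguous 100,000-element block. Timing the in-place versions above (`mul!`/`dot` into a preallocated `out`) is one way to separate the output-allocation effect from the storage-layout effect.

```julia
using BenchmarkTools

@btime Matrix{Float64}(undef, 10, 10_000)               # one contiguous 100,000-element block
@btime [Vector{Float64}(undef, 10_000) for _ in 1:10]   # ten independent 10,000-element blocks
```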
We should experiment with other low-rank update schemes for the inverse Hessian approximation instead of the two-loop recursion. It appears that we can organize the computation differently and exploit efficient BLAS-2 calls instead of the repeated dot products; see the sketch below.
Input, experiments, and everything else are very welcome!
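One concrete candidate is the compact representation from the Byrd–Nocedal–Schnabel paper linked above. A hedged sketch, assuming S and Y hold the m stored s- and y-vectors as columns (oldest first); this is not Optim.jl code, and in practice SᵀY and YᵀY would be updated incrementally rather than recomputed. With this layout, applying the inverse Hessian approximation to a gradient uses a handful of n×m gemv calls plus small m×m triangular solves instead of 2m separate dot products.

```julia
using LinearAlgebra

# Compact inverse-BFGS representation (Byrd–Nocedal–Schnabel):
#   H_k = γI + [S  γY] * W * [Sᵀ; γYᵀ],  with R = triu(SᵀY), D = Diagonal(diag(SᵀY)).
function compact_Hg(g, S, Y, γ)
    SY = S' * Y                          # m×m, small
    R  = UpperTriangular(SY)
    D  = Diagonal(diag(SY))
    p  = S' * g                          # gemv
    q  = Y' * g                          # gemv
    a  = R \ p                           # m×m triangular solve
    b  = R' \ ((D + γ * (Y' * Y)) * a - γ * q)
    return γ * g + S * b - γ * (Y * a)   # two more gemv calls give H_k * g
end
```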