Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance degradation related to an example problem? #292

Closed
multidis opened this issue Oct 4, 2016 · 17 comments
Closed

Performance degradation related to an example problem? #292

multidis opened this issue Oct 4, 2016 · 17 comments

Comments

@multidis
Copy link

multidis commented Oct 4, 2016

This seems quite unusual, but:

starting with this Aug 8 commit, there is a ten-fold slowdown in my code calling Optim multiple times, which is eliminated (i.e. normal run times are almost restored) if this function is taken out (found that after a fairly long experimentation).

Specifically, even if I change this like

p18(x) = x <= 3.0 ? (x-2.0)^2 : 2.0*log(x-2.0)+1.0

to

p18(x) = x <= 3.0 ? (x-2.0)^2 : 2.0

then things are much faster.

Any ideas on what could cause the slowdown?

@pkofod
Copy link
Member

pkofod commented Oct 4, 2016

So you are saying that editing one the the problems in the univariate problems module makes your (otherwise unrelated to this module) code run faster? Could you provide an example I can run and test it myself?

@multidis
Copy link
Author

multidis commented Oct 5, 2016

Yes, strange as it seems, that is the case.

I'll try to put together a mockup code reproducing this some time soon.

@pkofod
Copy link
Member

pkofod commented Oct 5, 2016

I'll try to put together a mockup code reproducing this some time soon.

That would be great! To be honest, this sounds like a JuliaLang bug, but if we are at fault, then we surely need to fix this!

@pkofod pkofod closed this as completed Jan 28, 2017
@multidis
Copy link
Author

multidis commented Feb 7, 2017

Not ready to re-open yet (still using private code), but here is a new detail: strong dependency on a particular Julia build (same computer, Haswell CPU). In all cases, commenting out specific line mentioned above reduces the time by half.

With julia-0.5.0-linux-x86_64.tar.gz, julia-3c9d75391c:

  • p18(x) not modified; 53.338824 seconds (3.01 M allocations: 53.778 MB, 0.09% gc time)
  • p18(x) modified; 24.585712 seconds (3.65 k allocations: 7.962 MB, 0.05% gc time)

Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
DEBUG build
Platform Info:
System: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

With 0.5.0-trusty8 installed from Ubuntu PPA:

  • p18(x) not modified; 20.140286 seconds (3.01 M allocations: 53.770 MB, 0.11% gc time)
  • p18(x) modified; 10.820859 seconds (3.65 k allocations: 7.955 MB, 0.05% gc time)

Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
System: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: liblapack.so.3
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

@pkofod
Copy link
Member

pkofod commented Feb 8, 2017

Sorry, maybe I should have provided an explanation (I thought I did). This should be fixed on master, but there will be no v0.5.1, so there is not really anything we can do but wait for v0.6.0 of Julia to come out.

@pkofod
Copy link
Member

pkofod commented Feb 8, 2017

For the speed difference I think it is the different blas and lapack versions kicking in.

@multidis
Copy link
Author

multidis commented Feb 8, 2017

Indeed the BLAS/LAPACK versions are probably at play here.
Thanks for clarifying regarding Julia v0.6; I'll return to this and re-run the timing when it is released.

@pkofod
Copy link
Member

pkofod commented Feb 8, 2017

If it's possible, you can share the files in private, and I can look into it. Pkm at the domain econ.ku with the danish top level domain dk.

@multidis
Copy link
Author

multidis commented Feb 8, 2017

Probably time to re-open. After cutting out pieces of my private code, it turns out that the core of this issue reproduces with a very simple example:

https://gist.github.com/multidis/ca16bed5fcdd29f88ca1e07973c9cd16

There is a dramatic increase in memory allocation somehow related to the Optim line mentioned above. Note that the @time call is repeated more times than the number of CPU cores, per previous pmap discussion.

$ julia -p 8 optim_time_issue.jl
Any[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]
4.209872 seconds (10.41 M allocations: 366.103 MB, 3.67% gc time)
0.128667 seconds (3.00 M allocations: 53.580 MB, 8.02% gc time)
0.122628 seconds (3.00 M allocations: 53.582 MB, 2.05% gc time)
0.120606 seconds (3.00 M allocations: 53.581 MB, 2.00% gc time)
0.190841 seconds (3.00 M allocations: 53.578 MB, 35.62% gc time)
0.124058 seconds (3.00 M allocations: 53.580 MB, 1.65% gc time)
0.120485 seconds (3.00 M allocations: 53.578 MB, 2.12% gc time)
0.121392 seconds (3.00 M allocations: 53.574 MB, 1.42% gc time)
0.121820 seconds (3.00 M allocations: 53.580 MB, 1.44% gc time)
0.119781 seconds (3.00 M allocations: 53.579 MB, 2.10% gc time)
0.120435 seconds (3.00 M allocations: 53.578 MB, 1.45% gc time)
0.120995 seconds (3.00 M allocations: 53.579 MB, 2.09% gc time)

After modifying the line in Optim source and recompiling the cache:

$ julia -p 8 optim_time_issue.jl
Any[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]
4.131495 seconds (7.42 M allocations: 320.447 MB, 3.74% gc time)
0.014800 seconds (1.52 k allocations: 7.779 MB)
0.014569 seconds (1.54 k allocations: 7.780 MB, 41.83% gc time)
0.008288 seconds (1.50 k allocations: 7.778 MB)
0.008502 seconds (1.50 k allocations: 7.777 MB)
0.010397 seconds (1.50 k allocations: 7.777 MB, 19.84% gc time)
0.007632 seconds (1.49 k allocations: 7.777 MB)
0.007674 seconds (1.52 k allocations: 7.778 MB)
0.009328 seconds (1.54 k allocations: 7.779 MB, 15.03% gc time)
0.007867 seconds (1.50 k allocations: 7.777 MB)
0.008401 seconds (1.49 k allocations: 7.778 MB)
0.009240 seconds (1.50 k allocations: 7.778 MB, 6.02% gc time)

@pkofod
Copy link
Member

pkofod commented Feb 9, 2017

What is the versioninfo() here?

@pkofod
Copy link
Member

pkofod commented Feb 9, 2017

I get (with p18)

julia> include("optim_time_memalloc_issue.jl")
INFO: Precompiling module Optim.
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
  2.476556 seconds (2.84 M allocations: 139.774 MiB, 2.25% gc time)
  0.012756 seconds (172 allocations: 7.640 MiB)
  0.011627 seconds (172 allocations: 7.640 MiB)
  0.016929 seconds (172 allocations: 7.640 MiB, 20.29% gc time)
  0.008118 seconds (172 allocations: 7.640 MiB)
  0.012219 seconds (172 allocations: 7.640 MiB)
  0.013740 seconds (172 allocations: 7.640 MiB, 17.04% gc time)
  0.008514 seconds (172 allocations: 7.640 MiB)
  0.009472 seconds (172 allocations: 7.640 MiB)
  0.011474 seconds (172 allocations: 7.640 MiB, 24.22% gc time)
  0.008059 seconds (172 allocations: 7.640 MiB)
  0.015001 seconds (172 allocations: 7.640 MiB)

and without p18 (commented out)

[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
  2.679532 seconds (2.84 M allocations: 139.788 MiB, 2.14% gc time)
  0.011872 seconds (172 allocations: 7.640 MiB)
  0.012183 seconds (172 allocations: 7.640 MiB)
  0.016697 seconds (172 allocations: 7.640 MiB, 23.72% gc time)
  0.008149 seconds (172 allocations: 7.640 MiB)
  0.009820 seconds (172 allocations: 7.640 MiB)
  0.013352 seconds (172 allocations: 7.640 MiB, 11.09% gc time)
  0.008580 seconds (172 allocations: 7.640 MiB)
  0.009617 seconds (172 allocations: 7.640 MiB)
  0.009496 seconds (172 allocations: 7.640 MiB, 11.00% gc time)
  0.008273 seconds (172 allocations: 7.640 MiB)
  0.009951 seconds (172 allocations: 7.640 MiB)

Seems comparable to me.

julia> versioninfo()
Julia Version 0.6.0-dev.2585
Commit 62a63ab (2017-02-04 14:17 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

@multidis
Copy link
Author

multidis commented Feb 9, 2017

My versioninfo() is the same as in the post above (actually 2 versions), the main difference being Julia 0.5.

@pkofod could you please try running on Julia 0.5 to see if you reproduce the difference? If that confirms increased memory allocation, I'll just wait for 0.6 release.

@pkofod
Copy link
Member

pkofod commented Feb 10, 2017

Oh, I'm pretty sure it will be there on v0.5 as well. Let me check.
edit: yeah, this is not something we can fix, it's a v0.5 thing

julia> include("optim_time_memalloc_issue.jl")
Any[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]
  4.514857 seconds (10.27 M allocations: 359.929 MB, 5.84% gc time)
  0.194984 seconds (3.00 M allocations: 53.442 MB, 6.66% gc time)
  0.189407 seconds (3.00 M allocations: 53.443 MB, 4.36% gc time)
  0.134296 seconds (3.00 M allocations: 53.437 MB, 3.75% gc time)
  0.134937 seconds (3.00 M allocations: 53.439 MB, 3.70% gc time)
  0.187085 seconds (3.00 M allocations: 53.440 MB, 4.12% gc time)
  0.135010 seconds (3.00 M allocations: 53.442 MB, 4.20% gc time)
  0.195116 seconds (3.00 M allocations: 53.442 MB, 2.78% gc time)
  0.187571 seconds (3.00 M allocations: 53.440 MB, 4.57% gc time)
  0.138093 seconds (3.00 M allocations: 53.440 MB, 4.42% gc time)
  0.194434 seconds (3.00 M allocations: 53.441 MB, 2.62% gc time)
  0.205953 seconds (3.00 M allocations: 53.441 MB, 4.51% gc time)
10-element Array{Any,1}:
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
julia> include("optim_time_memalloc_issue.jl")
INFO: Recompiling stale cache file /home/pkm/.julia/lib/v0.5/Optim.ji for module Optim.
Any[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]
  4.276831 seconds (7.27 M allocations: 314.151 MB, 6.32% gc time)
  0.024969 seconds (251 allocations: 7.639 MB)
  0.033728 seconds (251 allocations: 7.639 MB)
  0.041442 seconds (251 allocations: 7.639 MB, 14.50% gc time)
  0.007823 seconds (251 allocations: 7.639 MB)
  0.038794 seconds (251 allocations: 7.639 MB)
  0.040094 seconds (251 allocations: 7.639 MB, 7.33% gc time)
  0.007730 seconds (251 allocations: 7.639 MB)
  0.007869 seconds (251 allocations: 7.639 MB)
  0.013530 seconds (251 allocations: 7.639 MB, 20.92% gc time)
  0.007852 seconds (251 allocations: 7.639 MB)
  0.040834 seconds (251 allocations: 7.639 MB)
10-element Array{Any,1}:
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0
 1.0

@pkofod
Copy link
Member

pkofod commented Feb 10, 2017

At least the "fast" version (no p18) is a bit faster on v0.6 compared to v0.5 it seems :)

@multidis
Copy link
Author

Thank you @pkofod for verifying; now I know it is not something very specific to my setup. Julia v0.6 timing seems promising, I'll test it after the release with the full code.

@pkofod
Copy link
Member

pkofod commented Feb 10, 2017

Thank you @pkofod for verifying; now I know it is not something very specific to my setup. Julia v0.6 timing seems promising, I'll test it after the release with the full code.

No problem. I hope I get a chance to get to know more about how you use Optim some day!

@multidis
Copy link
Author

I hope I get a chance to get to know more about how you use Optim some day!

Sure, will move to email with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants