Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark sympy implementation #27

Closed
mmikhasenko opened this issue May 27, 2022 · 7 comments · Fixed by #29
Closed

Benchmark sympy implementation #27

mmikhasenko opened this issue May 27, 2022 · 7 comments · Fixed by #29
Assignees
Milestone

Comments

@mmikhasenko
Copy link
Contributor

Here are figures from julia implementation for the reference

  • calling the amplitude for the 25 chains of the default model on a random point in the phase space takes ~1.2ms
julia> const λσs0 = randomPoint(tbs)

DalitzPlotPoint{Invariants, Vector{Int64}}
  σs: Invariants
  two_λs: Array{Int64}((4,)) [-1, 0, 0, 1]

julia> @btime amplitude.($(Ref(λσs0)), $(model0.chains))

1.195 ms (3483 allocations: 177.17 KiB)

julia> @benchmark amplitude.($(Ref(λσs0)), $(model0.chains))

BenchmarkTools.Trial: 3894 samples with 1 evaluation.
 Range (min  max):  1.228 ms    6.857 ms  ┊ GC (min  max): 0.00%  80.44%
 Time  (median):     1.250 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.280 ms ± 284.049 μs  ┊ GC (mean ± σ):  1.15% ±  4.23%

  ▅████▇▆▅▅▄▄▄▄▃▃▂▂▂▁▁▁  ▁▁                                   ▂
  ██████████████████████████▇▆███████▇▇▅▆▆▇▅▇▇▆▆▄▆▆▄▅▆▁▄▄▄▁▄▄ █
  1.23 ms      Histogram: log(frequency) by time       1.5 ms <

 Memory estimate: 177.17 KiB, allocs estimate: 3483.
  • plotting the Dalitz on 54x54 grid takes roughly 10 sec.
  • computing alpha_x,y,z on the same grid takes 180 sec.
  • computing the amplitude (x4 helicity values) for 100_000 points on 8 threads takes 130 secs.
@redeboer
Copy link
Member

redeboer commented Jun 1, 2022

#29 implements a similar benchmark under docs/polarization.ipynb. With a "hexacore" (nproc is 12):

  • Total intensity for 12 chains (aren't there 12 only?) on a random point: 37.7 µs ± 12.7 µs
  • Computing the total intensity over a 54x54 Dalitz grid: 1.1 ms ± 15 µs
    (the entire plot procedure with matplotlib would be 72.2 ms ± 2.28 ms)
  • Computing the three $\alpha_{x,y,z}$ over the same grid: 6.88 ms ± 104 µs
  • Computing the total intensity over 100.000-event phase sample within the Dalitz region: 29.7 ms ± 2.05 ms
    • Same for the three $\alpha_{x,y,z}$ functions: 180 ms ± 3.02 ms
    • 10.000.000 events:
      - total intensity: 2.54 s ± 51.5 ms
      - $\alpha_{x,y,z}$: 15.8 s ± 325 ms

Note:

  • 'Unfolding' (calling doit()) the intensity and polarization functions takes around 1.5 minutes
  • Lambdifying them takes around 10 seconds
  • Input to the functions are (pre-computed) angles, not only the two Mandelstam variables. This conversion takes 23 ms ± 1.53 ms for 100.000 events.

UPDATE: See #69 for a correction to these numbers.

@mmikhasenko
Copy link
Contributor Author

that is ridiculously fast. I am jealous

@redeboer
Copy link
Member

redeboer commented Jun 1, 2022

that is ridiculously fast.

That's also what makes me doubt whether this is computing things correctly 😅 But creating those 20 plots (intensity, $|\vec{\alpha}|$, and $\vec\alpha$, plus three for each sub-system) over a 400x400 grid always took around 25s, so makes sense ^^

@mmikhasenko
Copy link
Contributor Author

We can cut the number of interesting plots. I think we do not need to look at the individial chains anymore for the final results.
Still, the first notebook that you created with three decay chains and the sliders was extremely useful to understand these alpha disctributions. The individual plots there are critical.

For the number of decay chains, I get 25 by counting the production couplings.

@redeboer
Copy link
Member

redeboer commented Jun 1, 2022

For the number of decay chains, I get 25 by counting the production couplings.

Aye that be true
image
(one of them is set to 1).
At any rate, the cross-checks indicate that the functions should be the same in both languages.

@mmikhasenko
Copy link
Contributor Author

I found one problem with my implementation. Actually, my customary stuctures for masses, spins and helicity were not interfaced correctly.

This piece of code caused x10 slow down

import Base:getproperty
function getproperty(two_js::ThreeBodySpins, sym::Symbol)
    (sym in fieldnames(ThreeBodySpins)) && return getfield(two_js, sym)
    (sym == :two_j1 || sym == :two_λ1) && return getfield(two_js, :two_h1)
    (sym == :two_j2 || sym == :two_λ2) && return getfield(two_js, :two_h2)
    (sym == :two_j3 || sym == :two_λ3) && return getfield(two_js, :two_h3)
    (sym == :two_j0 || sym == :two_λ0) && return getfield(two_js, :two_h0)
    error("no property $(sym)")
end

The intensity plot on my laptop takes 1s. Compare to your 1ms, haha

@redeboer
Copy link
Member

The intensity plot on my laptop takes 1s. Compare to your 1ms, haha

Nice! That agrees more with the corrected benchmarks from #69. JAX seems to do some kind of fancy nested for repeated calls, but for a first call, it's apparently also about one second.

What's weird though, is how JAX scales. According to #69, computing the intensity for 10^7 events takes around 5.3s.

@redeboer redeboer added this to the 0.0.1 milestone Sep 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants