use ArraysOfArrays to reduce allocation #65
Codecov Report
@@ Coverage Diff @@
## master #65 +/- ##
==========================================
- Coverage 83.41% 82.54% -0.88%
==========================================
Files 10 10
Lines 1182 1203 +21
==========================================
+ Hits 986 993 +7
- Misses 196 210 +14
|
Manually requesting @aminnj's review, given his many battle-tested workloads. |
Hmm, I tried this out on a 2.4M event Z->mumu tree. Uploaded here (~400MB). I get a 5x slower looping rate compared to master:
julia> const f = ROOTFile("doublemu.root") # 2.4M events
julia> const t = LazyTree(f, "t", [r"^Muon_(pt|eta|phi|mass)$","MET_pt"]);
julia> t.Muon_pt
2476431-element LazyBranch{Vector{Float32}, UnROOT.Nooffsetjagg}:
[22.402422, 18.186892]
[45.062744, 44.058678]
[8.216898]
⋮
[24.227955, 14.885574]
[16.145634, 12.987432, 6.6345634, 4.7526903]
julia> struct DummyLV{T <: AbstractFloat}
pt::T
eta::T
phi::T
mass::T
end
julia> @time for (i,evt) in enumerate(t)
length(evt.Muon_pt) < 2 && continue # at least 2 mu
((evt.Muon_pt[1] < 20) || (evt.Muon_pt[2] < 20)) && continue # leading 2 mu pt>20
lvs = DummyLV.(evt.Muon_pt,evt.Muon_eta,evt.Muon_phi,evt.Muon_mass)
end
0.199550 seconds (511.96 k allocations: 56.539 MiB) # in master
1.161238 seconds (19.17 M allocations: 721.681 MiB, 11.49% gc time, 5.18% compilation time) # in this branch |
Hmm, I can imagine that if the loop is already very tight, getting views might itself be slower than having concrete numbers with very good locality. |
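To make the view-versus-copy distinction above concrete, here is a minimal sketch in Base Julia only. It is not the code from this PR and does not use UnROOT or ArraysOfArrays; the names `event_view` and `event_copy`, the sample values, and the offsets layout are all hypothetical. It only shows the two access patterns being compared: a jagged branch stored as one flat buffer plus offsets, read either as a non-allocating view or as a freshly allocated vector per event. Which one is faster in a tight loop depends on the workload and would need to be measured.

```julia
# Hypothetical illustration; not UnROOT's or ArraysOfArrays' implementation.
# A jagged branch stored flat: all values in one buffer, with offsets marking
# where each event starts (event i spans offsets[i] : offsets[i+1]-1).
data    = Float32[22.4, 18.2, 45.1, 44.1, 8.2]
offsets = [1, 3, 5, 6]

# View-based access: no copy is made; the result is a SubArray into `data`.
event_view(data, offsets, i) = @view data[offsets[i]:offsets[i+1]-1]

# Copy-based access: allocates a new Vector for every event.
event_copy(data, offsets, i) = data[offsets[i]:offsets[i+1]-1]

nevents = length(offsets) - 1

# Same selection as in the loops in this thread: events with at least 2 muons.
n_dimuon_view = count(i -> length(event_view(data, offsets, i)) >= 2, 1:nevents)
n_dimuon_copy = count(i -> length(event_copy(data, offsets, i)) >= 2, 1:nevents)
```

Both counts agree; the only difference is whether each event materializes its own array.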
Some more data. With this simpler test:
julia> @time for (i,evt) in enumerate(t)
length(evt.Muon_pt) < 2 && continue # at least 2 mu
end
I got:
master, cold: 2.859309 seconds (24.56 M allocations: 1.072 GiB, 20.29% gc time, 0.16% compilation time)
this PR, cold: 0.575812 seconds (4.92 M allocations: 373.350 MiB, 14.94% gc time, 1.65% compilation time) |
Time and allocations are reduced for cold runs in this PR relative to master, but subsequent runs are slow. I tried reverting the 3GB->1GB cache reduction and still see the slowness, so the cache cannot be the cause. |
add0fff to d52e528
I wonder if we can pre-fetch the first basket of each |
Do we want to merge this, or do people have ideas for polishing? I think there's still some small instability causing allocation, but since even with that |
Yes, I think we should merge and then see how it performs in the wild 😜 |
3a2ba37 to cfafdcc
[skip ci]
close #64