Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iterate on baskets + an optimization #111

Merged
merged 5 commits into from
Sep 17, 2021
Merged

Conversation

aminnj
Copy link
Member

@aminnj aminnj commented Sep 17, 2021

Add some functions to iterate on baskets.

This lets one bypass the cost of many getindex()s...

julia> @time sum(t.nMuon) |> Int
  0.716353 seconds (6.31 k allocations: 470.014 MiB, 3.63% gc time)
149322456

julia> @time sum(sum,UnROOT.basketarrays(t.nMuon)) |> Int
  0.215638 seconds (6.71 k allocations: 470.039 MiB, 13.26% gc time)
149322456

well, also highlighting that getindex() can be improved @Moelf :) Also note that uproot takes 0.5s to do the above and C++ root takes 1.1s.

By the way,

julia> @profilehtml sum(t.nMuon)

shows a gap after the orange getindex bar. Not sure what that means. I would've expected the bars above getindex to add up to the full orange bar width...

image

@codecov

This comment has been minimized.

@Moelf
Copy link
Member

Moelf commented Sep 17, 2021

getindex() can be improved\

yeah, indeed but I doubt it can ever be improved 3x. Basically, when one iterates over events, especially in parallel, we have a branch:
https://github.com/tamasgal/UnROOT.jl/blob/ac66158f38d8b0fecec06595473f9879b3f693d8/src/iteration.jl#L151

that we can't avoid... I think its performance would look relatively better in the case where the physics workload is >> a few branches...

shows a gap after the orange getindex bar. Not sure what that means

I'm not sure either, from experience usually it's a mix of compilation time and or type instability (dynamic dispatch) time

@aminnj aminnj changed the title iterate on baskets iterate on baskets + an optimization Sep 17, 2021
@aminnj
Copy link
Member Author

aminnj commented Sep 17, 2021

Latest commit factors out the uncommon/slow path from getindex(). This evidently helps the compiler because now we cut down on the getindex() time by a factor of almost 2. Tested on two machines and I see similar reductions.

julia> function bar(t)
           for evt in t
               _ = evt.Muon_pt
           end
           nothing
       end

before

(note this is on a different machine wrt the original post, hence the faster times to begin with)

julia> @time bar(t)
  0.888482 seconds (14.30 k allocations: 1.818 GiB, 4.61% gc time)

julia> @time sum(t.nMuon) |> Int
  0.495250 seconds (6.08 k allocations: 469.980 MiB, 6.55% gc time)
149322456

after

julia> @time bar(t)
  0.762924 seconds (14.30 k allocations: 1.818 GiB, 5.33% gc time)

julia> @time sum(t.nMuon) |> Int
  0.224379 seconds (6.08 k allocations: 469.980 MiB, 5.02% gc time)
149322456

src/iteration.jl Outdated Show resolved Hide resolved
@aminnj
Copy link
Member Author

aminnj commented Sep 17, 2021

After the change, the gap is more or less gone.
image

@Moelf
Copy link
Member

Moelf commented Sep 17, 2021

wow, insane!! doing lord's work thanks this is great.

so now all the time are spent in copying, on the
https://github.com/tamasgal/UnROOT.jl/blob/ac66158f38d8b0fecec06595473f9879b3f693d8/src/root.jl#L272

@aminnj
Copy link
Member Author

aminnj commented Sep 17, 2021

Well to put it into perspective all these gains probably get diluted by a factor of 3 when we run on the zlib file, but better than nothing :).

We can get blazing fast speeds if we eliminate that pesky copy! 👿

@Moelf
Copy link
Member

Moelf commented Sep 17, 2021

btw I think

basketarrays
basketarray

is getting out of hands. Since basketarrays is lazy, maybe call it something iterator?

@aminnj
Copy link
Member Author

aminnj commented Sep 17, 2021

We could change it to basketarray_iter. what's the Julia convention?

@Moelf
Copy link
Member

Moelf commented Sep 17, 2021

there's not really convention, I just don't know how people are gonna use it. _iter is fine, one word is also fine.

@aminnj
Copy link
Member Author

aminnj commented Sep 17, 2021

I'll rename it to _iter since then it's more clear what it does. We're not exporting any of the basketarray* methods so these are mainly in existence for someone who knows what they're doing and want high performance (operating on chunks of a branch)

@Moelf Moelf merged commit db89380 into JuliaHEP:master Sep 17, 2021
@tamasgal
Copy link
Member

Awesome 😅 I need more popcorn 😁

Moelf pushed a commit to aminnj/UnROOT.jl that referenced this pull request Jun 23, 2022
* iterate on baskets

* 10-50% speedup for simple benchmarks

* simplify
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants