Skip to content

Conversation

@oschulz
Copy link
Member

@oschulz oschulz commented Mar 23, 2020

No description provided.

@codecov
Copy link

codecov bot commented Mar 23, 2020

Codecov Report

Merging #8 into master will increase coverage by 0.45%.
The diff coverage is 53.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #8      +/-   ##
==========================================
+ Coverage   36.07%   36.52%   +0.45%     
==========================================
  Files           8        8              
  Lines         158      167       +9     
==========================================
+ Hits           57       61       +4     
- Misses        101      106       +5
Impacted Files Coverage Δ
src/pyjlconv.jl 59.52% <0%> (-2.39%) ⬇️
src/ttree.jl 39.28% <63.63%> (+3.11%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 068375f...beb7a10. Read the comment docs.

@oschulz oschulz force-pushed the branches-of-arrays branch 2 times, most recently from 879ba8b to 76389b9 Compare March 23, 2020 20:18
@oschulz oschulz force-pushed the branches-of-arrays branch from 76389b9 to beb7a10 Compare March 23, 2020 20:47
@oschulz
Copy link
Member Author

oschulz commented Mar 23, 2020

Closes #5

@oschulz oschulz merged commit beb7a10 into master Mar 23, 2020
@oschulz oschulz deleted the branches-of-arrays branch March 23, 2020 21:25
@mmikhasenko
Copy link
Member

well done, @oschulz, it works with my trees 👍

However, still conversion to julia vectors a table takes ages (AGES).
Any ideas on how to get the following faster?

@time let
    t = TFile(filename)[treename];
    tpl = [Symbol("four_vector1_"*c) for c in ['X', 'Y', 'Z', 'E']]
    pX, pY, pZ, pE = [getproperty(t,v) for v in tpl];
    # f = x->SVector(x) 
    f = x->NamedTuple{(:px,:py,:pz,:e)}(x)
    @views f.(zip(pX[1:10000], pY[1:10000], pZ[1:10000], pE[1:10000]))
    "done"
end  # takes 50 seconds
# just 1 fourvector out of 10 on 1/30 of the full tree

here I tried with the TBranches instread of the new
getindex(tree::TTree, idxs::AbstractUnitRange, columns::Vector{String}) posted at #5

@mmikhasenko
Copy link
Member

just checked that getindex of TTree reduces a factor 10 for the same problem
once only necessary columns are processed;

function Base.getindex(tree::TTree, idxs::AbstractUnitRange, columns::Vector{String})
    @boundscheck checkbounds(tree, idxs)
    cols = pyobj(tree).arrays(entrystart = first(idxs) - 1, entrystop = last(idxs))
    #
    colnames = Tuple(Symbol.(columns))
    colvals_py = [cols[c] for c in columns]
    colvals_jl = _branch_content.(py2jl.(colvals_py))
    #
    Table(NamedTuple{colnames}(colvals_jl))
end

@oschulz
Copy link
Member Author

oschulz commented Mar 24, 2020

Could you send me the file you're benchmarking that on? I'll see what can be done ...

@mmikhasenko
Copy link
Member

thanks,
I will make some MWE on the weekend

@mmikhasenko
Copy link
Member

Here is a root script to produce the tree:

#include "TString.h"
#include "TFile.h"
#include "TTree.h"

#include <string>
#include "stdlib.h"


int main() {

  TFile *f = new TFile("test_tree.root", "recreate");
  TTree *t = new TTree("mytree", "mytree");

  auto app = "XYZE";

  const uint Np = 100;
  //
  double v[Np][4];
  for (uint n=0; n<Np; n++)
    for (uint i=0; i<4; i++)
      t->Branch(TString::Format("p%d_%c",n, app[i]), &v[n][i]);

  const int Nev = 10000;

  for (uint i=0; i <= Nev; i++) {

    for (uint n=0; n<Np; n++)
      for (uint i=0; i<4; i++)
	v[n][i] = (rand() % 100) / 100.0;
    //
    t->Fill();
  }

  t->Write();
  f->Close();

  return 0;
}

then, you can try (only *_X branches)

@time let
    f = TFile(joinpath("test_tree.root"))
    t = f["mytree"]
    t.p1_X + t.p2_X
end

will take forever (11 sec.).

Surely faster to convert the whole tree and, then, access.

@btime let
    f = TFile(joinpath("test_tree.root"))
    t = f["mytree"][1:end]
    t.p1_X + t.p2_X
end # 351.006 ms

Creating the three with only needed branches is the fastest option

@btime let
    f = TFile(joinpath("test_tree.root"))
    t = f["mytree"]
    tT = getindex(t, eachindex(t), ["p1_X", "p2_X"])
    #
    tT.p1_X + tT.p2_X
end # 328.482 ms

The difference between the last two scales with the number of branches.

@oschulz
Copy link
Member Author

oschulz commented Apr 1, 2020

Ah, right,

@time let
   f = TFile(joinpath("test_tree.root"))
   t = f["mytree"]
   t.p1_X + t.p2_X
end

is indeed dead-slow (5.9 s on my system), and it's expected to be: The type of t.p1_X is TBranch which is an AbstractArray, but it's an on-disk array. So the code above will work, but it will load the values of each branch for each entry one-by-one.

The design is is similar to how HDF5.jl does things, though there the datasets (while they support getindex) are not AbstractArrays.

Try this instead:

@time let
   f = TFile(joinpath("test_tree.root"))
   t = f["mytree"]
   t.p1_X[:] + t.p2_X[:]
end

Here, all entries for each branch are loaded in one go, it's really fast (0.09 s on my system).

@oschulz
Copy link
Member Author

oschulz commented Apr 1, 2020

Also see here (or join in), I just posted this: https://discourse.julialang.org/t/api-traits-for-chunked-out-of-core-arrays/36873

@mmikhasenko
Copy link
Member

mmikhasenko commented Apr 2, 2020

Thanks for looking into it.
I will update to #9 from my implementation 👍

@oschulz
Copy link
Member Author

oschulz commented Apr 2, 2020

Let me know if this works well for you, if so, I'll tag a new release.

@mmikhasenko
Copy link
Member

just checked. All right, my code works without any modifications. Perfect

@oschulz
Copy link
Member Author

oschulz commented Apr 4, 2020

UpROOT.jl v0.3.0 has been released now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants