Support Jagged branch from CMS NanoAOD (#22)
* handle multiple blocks in datastream

* hard code TLeafB

* add simpler jagged array root test file

* support for jagged branch of basic std types

* add 64 bit types jagged array test

* fix string length 255 parsing, skip the 0xff byte

* handle jagged-ness coming from '[]' too

* add NanoAOD test

* refurbish README

* fix 1.0 compat
Moelf committed Jul 3, 2021
1 parent 67f1e36 commit b74eb85
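
The "fix string length 255 parsing, skip the 0xff byte" item in the commit message can be illustrated with a minimal sketch. It assumes the length-prefix layout described in the uproot3 cursor code linked from `src/io.jl`: a one-byte length, where the value 255 is only a marker and the real length follows as a 4-byte big-endian integer. The function name `read_prefixed_string` is hypothetical and is not part of UnROOT.

```julia
# Hypothetical helper (not UnROOT's `readtype`): read a length-prefixed string.
# Assumption: a first byte of 0xff means "the real length follows as a
# big-endian UInt32"; the marker byte itself is not part of the length.
function read_prefixed_string(io::IO)
    len = Int(read(io, UInt8))
    if len == 255
        # The 0xff marker has already been consumed, so do not seek back.
        len = Int(ntoh(read(io, UInt32)))
    end
    return String(read(io, len))
end

# A short string: one length byte (3) followed by "foo".
short = IOBuffer(UInt8[0x03, 0x66, 0x6f, 0x6f])

# A long string: marker 0xff, then 0x0000012c (300), then 300 bytes of 'a'.
long = IOBuffer(vcat(UInt8[0xff, 0x00, 0x00, 0x01, 0x2c], fill(UInt8('a'), 300)))

short_result = read_prefixed_string(short)
long_result = read_prefixed_string(long)
```

Seeking back to the start before reading the 4-byte length would fold the marker byte into the length, which appears to be what the removed `seek(io, start)` line did.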
Showing 17 changed files with 498 additions and 131 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@
/dev/
/docs/build/
/docs/site/
__pycache__/
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "UnROOT"
uuid = "3cd96dde-e98d-4713-81e9-a4a1b0235ce9"
authors = ["Tamas Gal", "Jerry Ling"]
version = "0.1.7"
version = "0.1.8"

[deps]
CodecLz4 = "5ba52731-8f18-5e0d-9241-30f10d1ec561"
102 changes: 35 additions & 67 deletions README.md
@@ -22,81 +22,59 @@ Here is also a short discussion about the [ROOT binary format
documentation](https://github.com/scikit-hep/uproot/issues/401)

## Status
The project is in early alpha prototyping phase and contributions are very
The project is in early prototyping phase and contributions are very
welcome.

Reading of raw basket data is already working for uncompressed and
Zlib-compressed files. The raw data consists of two vectors: the bytes
We support reading all scalar branches and jagged branches of "basic" types; as
a metric, UnROOT can already read all branches of CMS' NanoAOD:

``` julia
julia> t = ROOTFile("test/samples/NanoAODv5_sample.root")
ROOTFile("test/samples/NanoAODv5_sample.root") with 2 entries and 21 streamers.

# example of a flat branch
julia> array(t, "Events/HLT_Mu3_PFJet40")
1000-element BitVector:
0
1
0
0
0

# example of a jagged branch
julia> array(t, "Events/Electron_dxy")
1000-element Vector{Vector{Float32}}:
[0.00037050247]
[-0.009819031]
[]
[-0.0015697479]
```

If you have a custom C++ struct inside your branch, reading raw data is also possible.
The raw data consists of two vectors: the bytes
and the offsets and are available using the
`UnROOT.array(f::ROOTFile, path; raw=true)` method. This data can
be reinterpreted using a custom type with the method
`UnROOT.splitup(data, offsets, T::Type; skipbytes=0)`.
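
To make the bytes-plus-offsets layout concrete, here is a minimal sketch of how a flat byte buffer might be cut into one vector per entry. It is not the implementation of `UnROOT.splitup`: the helper `split_by_offsets` is hypothetical, and it assumes 0-based byte offsets with one more offset than entries, and big-endian values on disk.

```julia
# Hypothetical helper (not part of UnROOT): split a flat byte buffer into
# per-entry vectors of values of type T.
# Assumptions: `offsets` holds 0-based byte positions, entry i spans
# offsets[i]+1 : offsets[i+1], and values are stored big-endian.
function split_by_offsets(data::Vector{UInt8}, offsets::Vector{<:Integer}, ::Type{T}) where {T}
    out = Vector{Vector{T}}()
    for i in 1:length(offsets)-1
        chunk = data[offsets[i]+1:offsets[i+1]]   # copy of this entry's bytes
        # Reinterpret the bytes as T and convert from big-endian to host order.
        push!(out, ntoh.(collect(reinterpret(T, chunk))))
    end
    return out
end

# Three entries holding 2, 0 and 1 Int32 values respectively.
data = UInt8[0x00, 0x00, 0x00, 0x01,
             0x00, 0x00, 0x00, 0x02,
             0x00, 0x00, 0x00, 0x03]
offsets = [0, 8, 8, 12]

jagged = split_by_offsets(data, offsets, Int32)
```

Two equal consecutive offsets yield an empty entry, which is how an event with no objects would show up in a jagged branch under these assumptions.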

Everything is in a very early alpha stage, as mentioned above.

Here is a quick demo of reading a simple branch containing a vector of integers
using the preliminary high-level API, which works for non-jagged branches
(simple vectors of primitive types):

```julia
julia> using UnROOT

julia> f = ROOTFile("test/samples/tree_with_histos.root")
ROOTFile("test/samples/tree_with_histos.root") with 1 entry and 4 streamers.

julia> array(f, "t1/mynum")
25-element Array{Int32,1}:
0
1
2
3
4
5
6
7
8
9
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
```

There is also a `raw` keyword which you can pass to `array()`, so it will skip
the interpretation and return the raw bytes. This is similar to `uproot.asdebug`
and can be used to read data where the streamers are not available (yet).
Here is it in action, using some data from the KM3NeT experiment:
You can then define a suitable Julia `type` and `readtype` method for parsing these data.
Here it is in action, with the help of the `type`s from `custom.jl` and some data from the KM3NeT experiment:

``` julia
julia> using UnROOT

julia> f = ROOTFile("test/samples/km3net_online.root")
ROOTFile("test/samples/km3net_online.root") with 10 entries and 41 streamers.

julia> array(f, "KM3NET_EVENT/KM3NET_EVENT/triggeredHits"; raw=true)
julia> data, offsets = array(f, "KM3NET_EVENT/KM3NET_EVENT/snapshotHits"; raw=true)
2058-element Array{UInt8,1}:
0x00
0x03
0x00
0x01
0x00
0x56
0x45
0x4e
0x54
0x00

julia> UnROOT.splitup(data, offsets, UnROOT.KM3NETDAQHit)
4-element Vector{Vector{UnROOT.KM3NETDAQHit}}:
[UnROOT.KM3NETDAQHit(1073742790, 0x00, 9, 0x60)......
```
This is what happens behind the scenes with some additional debug output:
@@ -176,16 +154,6 @@ Compressed datastream of 1317 bytes at 6180 (TKey 't1' (TTree))
10
10
10
10
10
10
10
10
10
10
10
10
10
```
## Main challenges
7 changes: 6 additions & 1 deletion src/UnROOT.jl
@@ -2,7 +2,8 @@ module UnROOT

export ROOTFile, array

import Base: keys, get, getindex, show, length, iterate, position
import Base: keys, get, getindex, show, length, iterate, position, ntoh
ntoh(b::Bool) = b

using CodecZlib, CodecLz4, CodecXz
using Mixers
@@ -18,4 +19,8 @@ include("bootstrap.jl")
include("root.jl")
include("custom.jl")

if VERSION < v"1.2"
hasproperty(x, s::Symbol) = s in fieldnames(typeof(x))
end

end # module
39 changes: 39 additions & 0 deletions src/bootstrap.jl
@@ -214,6 +214,7 @@ primitivetype(l::TLeafL) = l.fIsUnsigned ? UInt64 : Int64
fMinimum
fMaximum
end
primitivetype(l::TLeafO) = Bool

function parsefields!(io, fields, T::Type{TLeafO})
preamble = Preamble(io, T)
@@ -264,6 +265,41 @@ end

primitivetype(l::TLeafF) = Float32

# FIXME this should be generated and inherited from TLeaf
# https://root.cern/doc/master/TLeafB_8h_source.html#l00026
@with_kw struct TLeafB
# from TNamed
fName
fTitle

# from TLeaf
fLen
fLenType
fOffset
fIsRange
fIsUnsigned
fLeafCount

# own fields
fMinimum
fMaximum
end

function parsefields!(io, fields, T::Type{TLeafB})
preamble = Preamble(io, T)
parsefields!(io, fields, TLeaf)
fields[:fMinimum] = readtype(io, UInt8)
fields[:fMaximum] = readtype(io, UInt8)
endcheck(io, preamble)
end

function unpack(io, tkey::TKey, refs::Dict{Int32, Any}, T::Type{TLeafB})
@initparse
parsefields!(io, fields, T)
T(;fields...)
end

primitivetype(l::TLeafB) = UInt8
# FIXME this should be generated and inherited from TLeaf
@with_kw struct TLeafD
# from TNamed
@@ -755,3 +791,6 @@ function TTree(io, tkey::TKey, refs)
endcheck(io, preamble)
TTree(;fields...)
end

# FIXME what to do with auto.py's massive type translation?
# https://github.com/scikit-hep/uproot3/blob/54f5151fb7c686c3a161fbe44b9f299e482f346b/uproot3/interp/auto.py#L360-L365
2 changes: 1 addition & 1 deletion src/custom.jl
@@ -19,7 +19,7 @@ function Base.getproperty(hit::DAQHit, s::Symbol)
r = Ref(hit)
GC.@preserve r begin
if s === :dom_id
return bswap(unsafe_load(Ptr{Int32}(Base.unsafe_convert(Ptr{Cvoid}, r))))
return ntoh(unsafe_load(Ptr{Int32}(Base.unsafe_convert(Ptr{Cvoid}, r))))
elseif s === :channel_id
return unsafe_load(Ptr{UInt8}(Base.unsafe_convert(Ptr{Cvoid}, r)+4))
elseif s === :tdc
5 changes: 3 additions & 2 deletions src/io.jl
@@ -24,7 +24,8 @@ function readtype(io, ::Type{T}) where T<:AbstractString
length = readtype(io, UInt8)

if length == 255
seek(io, start)
# first byte 0xff is useless now
# https://github.com/scikit-hep/uproot3/blob/54f5151fb7c686c3a161fbe44b9f299e482f346b/uproot3/source/cursor.py#L91
length = readtype(io, UInt32)
end

@@ -135,7 +136,7 @@ function endcheck(io, preamble::T) where {T<:Preamble}
error("Object '$(preamble.type)' has $(observed) bytes; expected $(preamble.cnt)")
end
end
return true
nothing
end



2 comments on commit b74eb85


@Moelf Moelf commented on b74eb85 Jul 3, 2021


@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/40187

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.1.8 -m "<description of version>" b74eb8513a2566b5929550ee2f321b32c1151457
git push origin v0.1.8

Also, note the warning: Version 0.1.8 skips over 0.1.7
This can be safely ignored. However, if you want to fix this you can do so. Call register() again after making the fix. This will update the Pull request.
