-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHYSLITE schema #527
PHYSLITE schema #527
Conversation
This might be an interesting place to do unit conversions from MeV to GeV. ;-) |
Aside from the inter-experiment snark. This is awesome, I'll have a careful look through it. :-) |
class PHYSLITESchema(BaseSchema): | ||
|
||
mixins = { | ||
"Electrons": "xAODParticle", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there no further specializations to electrons/muons/etc. that you'd like to do? It's still a draft so I suppose you may be working on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are going to be more specializations - e.g. the cross references will be different for Electrons/Muons/Jets
Do you have like a 40-event DAOD file that we can put into tests? Then we can automate running test processors and such on it. |
I experimented a bit more to get cross references working (added some comments inline). I think it should work in principle - probably the way i implemented it can be improved. Let me know if you have ideas. But there is another issue now - i get the wrong integer types for my ElementLink arrays. If i ask uproot for the akward form of this branch: # file from https://cernbox.cern.ch/index.php/s/xTuPIvSJsP3QmXa/download
tree = uproot.open("DAOD_PHYSLITE.art_split99.pool.root:CollectionTree")
tree["AnalysisElectronsAuxDyn.trackParticleLinks"].interpretation.awkward_form(None) it will give me
But when i actually retrieve that array with uproot it's suddenly tree["AnalysisElectronsAuxDyn.trackParticleLinks"].array().layout.form
Any ideas how to circumvent this? |
The file from https://cernbox.cern.ch/index.php/s/xTuPIvSJsP3QmXa/download has 100 events and 2.8MB. If that's still too large i could also make a smaller one. |
That's fine we have other ~2MB files in |
About the issue i mentioned before where the Link indices and keys become For now i just hardcoded the form for the element links to have Now i can have cross references using global indices import os
from coffea.nanoevents import NanoEventsFactory, PHYSLITESchema
factory = NanoEventsFactory.from_root(
os.path.abspath("tests/samples/DAOD_PHYSLITE_21.2.108.0.art.pool.root"),
treepath="CollectionTree",
schemaclass=PHYSLITESchema
)
events = factory.events() >>> events.Electrons.trackParticles
<xAODTrackParticleArray [[], [], ... vz: 0, z0: -0.0982}]]] type='40 * var * var...'>
>>> events[30:].Electrons.trackParticles.pt.tolist()
[[], [], [], [], [], [[100500.9765625]], [], [], [[36525.61328125]], [[27814.41015625, 18714.423828125]]]
>>> events[32:].Electrons.trackParticles.pt.tolist()
[[], [], [], [[100500.9765625]], [], [], [[36525.61328125]], [[27814.41015625, 18714.423828125]]] I also experimented a bit with Links into multiple collections and made a function that creates a union array from them
which creates a union array of all the collections linked into by evaluating So i'm not quite sure if this is the best way to do it - at the moment i see it more as a proof of concept. Maybe there is in general a better way to deal with these links ... Also i don't quite know how to attach behavior to the union arrays. Finally one thing i haven't quite understood yet: the way i use the events.Electrons.trackParticles.layout there is no VirtualArray anymore |
I figured out a few more things: The behavior for union arrays works if i add >>> events.TruthBoson.children
<TruthParticleArray [[[], ... TruthParticle]]] type='40 * truthParticle'>
>>> events.TruthBoson.children.parents
<TruthParticleArray [[[], ... [TruthParticle]]]] type='40 * truthParticle'>
>>> events.TruthBoson.children.parents.children
<TruthParticleArray [[[], ... TruthParticle]]]]] type='40 * truthParticle'>
>>> events.TruthBoson.children.parents.children.ndim
5 Note: needs the fix from scikit-hep/awkward#870 However, as you can see the overridden typestring (with Also i found that adding the Note on performance: In current uproot the loading of the ElementLink branches is quite slow, so as a temporary solution for playing with larger ROOT files i wrote a small hack to hijack the # pip install git+https://gitlab.cern.ch/nihartma/physlite-experiments.git
from physlite_experiments.deserialization_hacks import patch_nanoevents
patch_nanoevents() |
While testing the element links i found scikit-hep/awkward#876 - which is already fixed. After the fix the tests should pass. |
Nice a new awkward is build built. Once that is done baking we should require |
@nikoladze Can you add the awkward array version requirement to setup.py in this PR? Thanks! |
Are you ready to switch this from a draft PR, and have it reviewed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really nice! I have some proposed simplifications, but for the most part it looks good to go.
def trackParticle(self): | ||
trackParticles = self.trackParticles | ||
return self.trackParticles[ | ||
tuple([slice(None) for i in range(trackParticles.ndim - 1)] + [0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be self.trackParticles[..., 0]
?
By the way, nice catch to guard against accessing attributes in objects that may be nested further than the default 2 dimensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That did not wok - awkward complains with ValueError: ellipsis (...) can't be used on a data structure of different depths
. Should that work? Then i can open an issue for awkward array.
Actually i noticed it for lower number of dimensions e.g. when i do events[2].Electrons.trackParticle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that as long as array.ndim
is well-defined, ellipsis is as well (provided no field indexers are used in the getitem). @jpivarski thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That error message about ellipsis is talking about an issue with unions (so, unless it's not what I'm thinking, it's not a bug).
>>> array = ak.Array([[1], [2], [3]])
>>> array[..., 0]
<Array [1, 2, 3] type='3 * int64'>
>>> array = ak.Array([[1], [2], [3], [[4], [4], [4], [4]]])
>>> array[..., 0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/awkward/highlevel.py", line 993, in __getitem__
return ak._util.wrap(self.layout[where], self._behavior)
ValueError: ellipsis (...) can't be used on a data structure of different depths
(https://github.com/scikit-hep/awkward-1.0/blob/1.2.2/src/libawkward/Content.cpp#L1491)
Arrays always have an ndim
, and this is the "purelist depth". A union like the above doesn't have an easily definable depth. I had thought that I made this return -1 in cases like that, but apparently, it gives the depth of the UnionArray object:
>>> array.ndim # high-level
0
>>> array.layout.purelist_depth # low-level
0
That's not ideal: it doesn't give more information than if it had been a RecordArray at that level. But I don't know if this addresses your question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked a bit what's going on for this particular array. Seems the issue here is not that it is a union, but that one field has a higher dimension. There is definingParametersCovarianceMatrix
which has one dimension more than the other fields (but is still zipped on the outermost level of offsets). So a small example for that situation would be
import awkward as ak
array = ak.zip(
{"a" : [[1], [2, 3]], "b" : [[[10, 10]], [[20, 20], [30, 30]]]},
depth_limit=2
)
The resulting array actually has ndim=2
- which is what i use in that code for trackParticle
.
>>> array.ndim
2
I want this:
>>> array[:, 0].tolist()
[{'a': 1, 'b': [10, 10]}, {'a': 2, 'b': [20, 20]}]
The ellipsis don't work, presumably because of the one dimension more in the field b
>>> array[..., 0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/nikolai/.local/lib/python3.9/site-packages/awkward/highlevel.py", line 993, in __getitem__
return ak._util.wrap(self.layout[where], self._behavior)
ValueError: ellipsis (...) can't be used on a data structure of different depths
(https://github.com/scikit-hep/awkward-1.0/blob/1.3.0/src/libawkward/Content.cpp#L1491)
This might be a bug, perhaps related to scikit-hep/awkward#770
This might be related to the lack of a coffea/coffea/nanoevents/methods/base.py Lines 36 to 50 in 1963fc9
collection_name is encountered then awkward._util.recursively_apply(self.layout, descend) just returns the original (but now materialized) array. Whereas if collection_name is encountered, it would return the flat collection RecordArray without materializing its constituent fields. The purpose of that function is to descend through possible extraneous array nodes like VirtualArray since in general VirtualArray nodes in layouts can disappear in certain circumstances.
|
@nikoladze if we can wrap this up today or tomorrow we can have it in the next coffea release - which would be pretty snazzy. |
Fine for me. Of course this is still work in progress, but so is DAOD_PHYSLITE, so i think it won't hurt to have some basic functionality already included in coffea. I tried to get the
So here is a TODO list for the future:
... and i'm sure more things will come up. Today i won't be able to get more work on this done, so if you think there is still significant things to do until this can be merged it is of course also fine to just have it for the next release then. Thanks a lot! |
Please make a tracking issue for these and use checkboxes or whatever you like to mark completion. Unless @nsmith- has something further that needs addressing I'd like to go ahead and merge this, we can figure out the docs stuff in time (or I'll fix it before minting a release). |
This addresses #514. I started implementing a schema for
DAOD_PHYSLITE
. So far it's just zipping together the collections and assigning some basic LorentzVector behavior which i just copied over from my prototype.So far the following works:
Besides refining the behavior and adding other stuff like EventInfo the next big thing will be the cross references. From looking over the code for NanoAOD it looks like everything is there to do basic indexing from one collection into another, even with nested indices (that result in a double-jagged array). I just need to figure out how to best create the global indices. Maybe i'll have to add or modify some of the transforms to get them since i don't get the local indices from counts, but need to pop them out of already existing awkward arrays that uproot created.
Example for a local index, resulting in a single jagged array:
>>> events.Muons["combinedTrackParticleLink.m_persIndex"] <Array [[], [0], [], [], ... 0], [0], [0, 1]] type='100 * var * uint32[parameter...'>
which should link into
events.CombinedMuonTrackParticles
, e.g. without global indices that would be something likeExample for a local (nested) index, resulting in a double jagged array:
which should link into
events.GSFTrackParticles
. I have some example for that, but i hope that this could also be bent somehow into thenestedindex
stuff that already exists.A bit more advanced would then be to link into multiple collections, resulting in a
UnionArray
. In principle, all these ElementLinks have anm_persKey
field that contains hashes of the collections this index links into. In most cases (like the examples above) this is always the same, but in some cases it can be several, e.g.which are the hashes for
TruthTaus
,TruthBoson
,TruthTop
,TruthBottom
. I think 0 should be masked out since they probably correspond to links to particles that don't exist anymore (maybe thinned out).I'll try out a few more things, but let me know if you have ideas or suggestions how to do these things.