feat: NanoEvents Delphes #584

Merged
merged 36 commits into from
Sep 30, 2021

kratsg (Contributor) commented Sep 13, 2021

This PR aims to add support in coffea for reading Delphes files.

Docstrings are added for common branch names. The branches listed below were more complicated: they first had to be handled or fixed in uproot before coffea could handle them correctly. For the branches that are TLorentzVectors (TLVs), the schema is modified in a custom way to drop in the vector.LorentzVector behavior.

  • Area (TLorentzVector)
  • Constituents (TRefArray)
  • Edges[4] (Float_t)
  • FracPt[5] (Float_t)
  • Particle (TRef)
  • Particles (TRefArray)
  • PrunedP4[5] (TLorentzVector)
  • SoftDroppedJet (TLorentzVector)
  • SoftDroppedP4[5] (TLorentzVector)
  • SoftDroppedSubJet1 (TLorentzVector)
  • SoftDroppedSubJet2 (TLorentzVector)
  • Tau[5] (Float_t)
  • TrimmedP4[5] (TLorentzVector)

We don't know why Area is a TLV.

nsmith- (Member) commented Sep 20, 2021

With scikit-hep/uproot5#441 and scikit-hep/uproot5#442 I can now do:

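import awkward as ak
from coffea.nanoevents import NanoEventsFactory, DelphesSchema

# Read the "Delphes" tree from the test sample using the new schema
events = NanoEventsFactory.from_root(
    "tests/samples/delphes.root",
    "Delphes",
    schemaclass=DelphesSchema,
).events()
# Converting to Python lists forces every branch to be read and interpreted
ak.to_list(events)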

which confirms we can at least interpret all the branches successfully.

kratsg (Contributor, Author) commented Sep 29, 2021

Using

pip install -U git+https://github.com/scikit-hep/uproot4.git@007b10014003a74d0e89e0166e21a6218de0d445

and an editable install of coffea at the current commit baa2018, I get the following:

>>> ak.type(data.Jet[0].Area)
2 * struct[["fE", "fP"], [float64, TVector3["fX": float64, "fY": float64, "fZ": float64]], parameters={"__doc__": "Area[Jet_]"}]
>>> data.Jet[0].Area
<Array [{fE: 0, fP: {fX: 0, ... fZ: 0}}] type='2 * struct[["fE", "fP"], [float64...'>

>>> ak.type(data.Jet[0]['PrunedP4[5]'])
2 * [5 * struct[["fE", "fP"], [float64, TVector3["fX": float64, "fY": float64, "fZ": float64]], parameters={"__doc__": "PrunedP4[Jet_]"}], parameters={"__doc__": "first entry (i = 0) is the total Pruned Jet 4-momenta and from i = 1 to 4 are the pruned subjets 4-momenta"}]
>>> data.Jet[0]['PrunedP4[5]']
<Array [[{fE: 0, fP: {fX: 0, ... fZ: 0}}]] type='2 * [5 * struct[["fE", "fP"], [...'>

and the others look OK:

>>> data.Jet[0].SoftDroppedSubJet1
<Array [{fE: 111, fP: {, ... fZ: 17.5}}] type='2 * struct[["fE", "fP"], [float64...'>
>>> data.Jet[0]['TrimmedP4[5]']
<Array [[{fE: 0, fP: {fX: 0, ... fZ: 0}}]] type='2 * [5 * struct[["fE", "fP"], [...'>
>>> data.Jet[0]['PrunedP4[5]']
<Array [[{fE: 0, fP: {fX: 0, ... fZ: 0}}]] type='2 * [5 * struct[["fE", "fP"], [...'>
>>> data.Jet[0]['SoftDroppedP4[5]']
<Array [[{fE: 126, fP: {, ... fZ: 0}}]] type='2 * [5 * struct[["fE", "fP"], [flo...'>

So I want to add quicker access to the branches that have brackets in their names, and it would be nice to automatically interpret some of them as TLVs. Does that require wrapping them somehow, or is uproot or awkward expected to recognize them automatically? I thought Jim mentioned something about that.

kratsg (Contributor, Author) commented Sep 29, 2021

I can confirm that everything looks OK without the TLV interpretations:

>>> data.Jet.Area
<Array [[{fE: 0, fP: {fX: 0, ... fZ: 0}}]] type='25 * var * struct[["fE", "fP"],...'>
>>> data.Jet.Constituents
<Array [[{fName: '', fSize: 12, ... 5932]}]] type='25 * var * struct[["fName", "...'>
>>> data.Jet.Particles
<Array [[{fName: '', fSize: 12, ... 1929]}]] type='25 * var * struct[["fName", "...'>
>>> data.Jet.PrunedP4_5
<Array [[[{fE: 0, fP: {fX: 0, ... fZ: 0}}]]] type='25 * var * [5 * struct[["fE",...'>
>>> data.Jet.SoftDroppedJet
<Array [[{fE: 126, fP: {, ... fZ: 44.2}}]] type='25 * var * struct[["fE", "fP"],...'>
>>> data.Jet.SoftDroppedP4_5
<Array [[[{fE: 126, fP: {, ... fZ: 0}}]]] type='25 * var * [5 * struct[["fE", "f...'>
>>> data.Jet.SoftDroppedSubJet1
<Array [[{fE: 111, fP: {, ... fZ: 39.2}}]] type='25 * var * struct[["fE", "fP"],...'>
>>> data.Jet.SoftDroppedSubJet2
<Array [[{fE: 15.2, fP: {, ... fZ: 4.98}}]] type='25 * var * struct[["fE", "fP"]...'>
>>> data.Jet.Tau_5
<Array [[[0.197, 0.148, ... 0.0485, 0.0356]]] type='25 * var * [5 * float32[para...'>
>>> data.Jet.TrimmedP4_5
<Array [[[{fE: 0, fP: {fX: 0, ... fZ: 0}}]]] type='25 * var * [5 * struct[["fE",...'>
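The listing above exposes bracketed branch names such as PrunedP4[5] and Tau[5] as PrunedP4_5 and Tau_5. A minimal sketch of such a renaming, assuming a simple regex rule that rewrites only a trailing fixed-size suffix (the helper sanitize_branch_name is hypothetical, not the function used in coffea's DelphesSchema):

```python
import re

# Hypothetical helper: matches a trailing fixed-size array suffix such as "[5]".
_ARRAY_SUFFIX = re.compile(r"\[(\d+)\]$")


def sanitize_branch_name(name: str) -> str:
    """Turn "PrunedP4[5]" into "PrunedP4_5" so it is usable as an attribute."""
    return _ARRAY_SUFFIX.sub(r"_\1", name)


branches = ["Area", "Edges[4]", "FracPt[5]", "PrunedP4[5]", "Tau[5]"]
print([sanitize_branch_name(b) for b in branches])
# ['Area', 'Edges_4', 'FracPt_5', 'PrunedP4_5', 'Tau_5']
```

Names without a [N] suffix, such as Area, pass through unchanged, so the rule can be applied to every branch name.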

nsmith- (Member) commented Sep 29, 2021

A vector interpretation of TLorentzVector directly in uproot will certainly come in some form eventually, but we will also have to move coffea vectors to the vector package anyway. For now I think it makes more sense to consistently use coffea vectors; when we move, we can adjust these as appropriate. To do that, I think the easiest approach is to define a mixin for the TLV that provides properties x, y, z, t, which access self["fP"]["fX"], ..., self["fE"].
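A minimal plain-Python sketch of the mixin idea. In coffea the mixin would be attached to awkward records as a behavior; here a dict subclass stands in for the record so the example runs on its own. The class names TLorentzVectorMixin and TLVRecord, and the extra mass property, are illustrative assumptions.

```python
import math


class TLorentzVectorMixin:
    """Hypothetical mixin: read ROOT TLorentzVector members as x, y, z, t.

    Assumes the host class supports record["field"] access, with the
    3-vector stored under "fP" (fX, fY, fZ) and the energy under "fE".
    """

    @property
    def x(self):
        return self["fP"]["fX"]

    @property
    def y(self):
        return self["fP"]["fY"]

    @property
    def z(self):
        return self["fP"]["fZ"]

    @property
    def t(self):
        return self["fE"]

    @property
    def mass(self):
        # Invariant mass; clamp small negative values from rounding to zero.
        m2 = self.t**2 - (self.x**2 + self.y**2 + self.z**2)
        return math.sqrt(max(m2, 0.0))


class TLVRecord(TLorentzVectorMixin, dict):
    """Stand-in for an awkward record: a plain dict with the mixin attached."""


jet = TLVRecord(fE=5.0, fP={"fX": 3.0, "fY": 0.0, "fZ": 0.0})
print(jet.x, jet.t, jet.mass)  # 3.0 5.0 4.0
```

Because the mixin only relies on item access, the same property definitions would work unchanged on any record type that supports record["fP"]["fX"].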

nsmith- (Member) commented Sep 29, 2021

Another option is to hack at the form to turn the record fields into x, y, z, t.

kratsg changed the title from "draft: feat: NanoEvents Delphes" to "feat: NanoEvents Delphes" on Sep 30, 2021
kratsg requested a review from nsmith- on September 30, 2021 00:30
kratsg (Contributor, Author) commented Sep 30, 2021

> Another option is to hack at the form to turn the record items into x y z t.

I went with this option, hacking at the form using a recursive function.
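One way such a recursive rewrite could look, as a sketch rather than the code merged in this PR. It assumes a simplified, awkward-1-style form dictionary (records keep their fields under "contents", list-like nodes wrap a child under "content") and assumes the coffea behavior is selected through the "__record__" parameter with the name "LorentzVector"; the function names are hypothetical.

```python
import copy


def _is_tlv_record(form):
    """True for a record whose fields are exactly the TLorentzVector members."""
    return (
        form.get("class") == "RecordArray"
        and isinstance(form.get("contents"), dict)
        and set(form["contents"]) == {"fE", "fP"}
    )


def _rewrite(form):
    """Rewrite {fE, fP: {fX, fY, fZ}} records into {x, y, z, t}, in place."""
    if _is_tlv_record(form):
        p3 = form["contents"]["fP"]["contents"]
        form["contents"] = {
            "x": p3["fX"],
            "y": p3["fY"],
            "z": p3["fZ"],
            "t": form["contents"]["fE"],
        }
        # Assumed behavior name attached via the "__record__" parameter.
        form.setdefault("parameters", {})["__record__"] = "LorentzVector"
        return
    if isinstance(form.get("contents"), dict):
        for child in form["contents"].values():
            _rewrite(child)
    if isinstance(form.get("content"), dict):
        _rewrite(form["content"])


def rename_tlv_fields(form):
    """Return a rewritten copy; the input form is left untouched."""
    new_form = copy.deepcopy(form)
    _rewrite(new_form)
    return new_form


FLOAT = {"class": "NumpyArray", "primitive": "float64"}
tlv = {
    "class": "RecordArray",
    "contents": {
        "fE": FLOAT,
        "fP": {"class": "RecordArray", "contents": {"fX": FLOAT, "fY": FLOAT, "fZ": FLOAT}},
    },
}
# var * [5 * TLV], shaped like Jet.PrunedP4[5]
pruned = {"class": "ListOffsetArray64", "content": {"class": "RegularArray", "size": 5, "content": tlv}}
leaf = rename_tlv_fields(pruned)["content"]["content"]
print(sorted(leaf["contents"]), leaf["parameters"]["__record__"])
# ['t', 'x', 'y', 'z'] LorentzVector
```

Copying once at the top and then mutating the copy in place keeps the recursion simple while leaving the original form unchanged.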

nsmith- (Member) left a comment

One last check I would suggest is to make sure _ = ak.to_list(events) runs. This ensures that all arrays materialize successfully. The output will be gigantic, so it's best not to print it :)
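If ak.to_list(events) fails, it stops at the first error. A hedged sketch that runs the same check per top-level collection so that every failing branch is named. The helper find_unmaterializable_fields is hypothetical, and the coffea call in the comment assumes events.fields lists the collections; the dummy inputs only exercise the logic.

```python
def find_unmaterializable_fields(events, fields, to_list):
    """Materialize each top-level field separately and return the failures.

    `events` must support events[name]; `to_list` is the conversion to run
    (for NanoEvents this would be ak.to_list). Results are discarded, so
    nothing gigantic is printed.
    """
    failures = {}
    for name in fields:
        try:
            _ = to_list(events[name])
        except Exception as err:  # record every failure instead of stopping
            failures[name] = f"{type(err).__name__}: {err}"
    return failures


# With coffea (assumption): find_unmaterializable_fields(events, events.fields, ak.to_list)
def fake_to_list(value):
    """Dummy stand-in for ak.to_list that fails on one marked input."""
    if value == "broken":
        raise ValueError("cannot interpret")
    return [value]


print(find_unmaterializable_fields({"Jet": "ok", "Tower": "broken"}, ["Jet", "Tower"], fake_to_list))
# {'Tower': 'ValueError: cannot interpret'}
```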

coffea/nanoevents/schemas/delphes.py (review thread resolved)
lgray merged commit 6eb35e2 into CoffeaTeam:master on Sep 30, 2021
kratsg deleted the feat/nanodelphes branch on October 1, 2021 00:36

3 participants