Skip to content

Conversation

@jedori0228
Copy link
Contributor

@jedori0228 jedori0228 commented Feb 15, 2024

Saving genie::EventRecord as a separate TTree into CAFs. By default we do not save it, and we turn this feature by adding physics.producers.cafmaker.SaveGENIEEventRecord: true in fhicls. I will create another PR on develop.
The matching between a SRTrueInteraction object from recTree and genie::EventRecord is done by the index of genie tree, genie_evtrec_idx, which is introduced in SBNSoftware/sbnanaobj#124; This PRs requires SBNSoftware/sbnanaobj#124

@jedori0228 jedori0228 marked this pull request as ready for review February 16, 2024 04:58
Copy link
Member

@PetrilloAtWork PetrilloAtWork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not look at the code too much in detail, but it's not clear to me how you avoid duplicate GENIE matching indices. Each job will start with index 0 and go on, right?

@jedori0228
Copy link
Contributor Author

I did not look at the code too much in detail, but it's not clear to me how you avoid duplicate GENIE matching indices. Each job will start with index 0 and go on, right?

Yes, fGenieEventCounter is increased everytime it fills one entry in the genie tree.

@PetrilloAtWork
Copy link
Member

In general the concern is that the newly introduced index is not enforced to be unique, which demands a specific workflow in order to make the matching between entries in the new tree ("GENIE tree") and in the rest of the data ("CAF tree").
Because of this non-uniqueness, blindly using file lists or hadd'ing files won't do.

@jedori0228 is providing a tool that can generate a reindexed tree: as long as this is used, joining the trees is possible.

Also, the index can be used for fast lookup of the needed entry. This works only if the order of the entries in the tree is never changed (if for example the input is a file list, the order of the files must not be changed, or the lookup will silently pick a wrong entry).

My recommendation to address this fragility is to introduce a unique identifier for each entry in the CAF tree, and replicate it in all the "auxiliary" trees (in this case, the GENIE tree).
In that way there is a mean to look up an entry even when the order of the input is shuffled, and it is an independent check when doing the matching in a different way.

As a candidate for the unique identifier, I suggest the couple of entries: the original input file name, and the index of the event in that file. To make this more practical, I would use instead of the actual file name its hash code (just from std::hash). The two components are already stored in the CAF tree (the hash value should instead be added), and there is code that can be used in other art modules to reproduce the values.

For fast lookup, nothing really beats the absolute index of the entry in the tree. Yet, a feature of ROOT trees is the ability to build an index that can then be used for lookup. This index can for example map the (hash ; position) ID described above to the location of the entry in the tree. The facility is started by loading the TTree/TChain and running BuildIndex() on it; any entry can then be quickly accessed with TTree::GetEntryWithIndex(hash, position).

@jedori0228
Copy link
Contributor Author

jedori0228 commented Mar 26, 2024

Thanks @PetrilloAtWork ! I pushed few more commits with updated structure;

  • In sbnanaobj, I add std::uint32_t sourceNameHash into caf::SRHeader. A 32-bit integer is chosen to make sure this value not exceeding 32-bit. This is necessary to be used as one of the ingredients of TTree::BuildIndex, where it shifts the major number by 32-bit.
  • Here in sbncode, I am now evaluating the hash using fSourceFile which is the input filename. Then I truncate this into 32-bit integer using static_cast<std::uint32_t>.
  • In the GENIE tree, I am also saving this hash
  • In the end, both CAF and GENIE tree has hash of the inputfile and GENIE tree position. I run BuildIndex on the GENIE tree so one can access to this tree using (hash, GENIE tree position) pair.
  • A hadd-ed or concat-caf-ed file should be duplicated GENIE tree position, but those entry should have different hash. User can now correct match between caf::SRTrueInteraction and genie::EventRecord on these merged files.

I tested this by running CAFMaker on two stage1 files, which has 34 and 36 true neutrinos, respectively. I merged (hadd) the two outputs, and below is a printouts from an external module:

...
[WeightUpdater::ProcessFile]     - (Truncated hash, GENIE Tree position) = (796806952, 32) -> GetEntryNumberWithIndex = 32
[WeightUpdater::ProcessFile]     - (Truncated hash, GENIE Tree position) = (796806952, 33) -> GetEntryNumberWithIndex = 33
[WeightUpdater::ProcessFile]     - (Truncated hash, GENIE Tree position) = (2840352975, 0) -> GetEntryNumberWithIndex = 34
[WeightUpdater::ProcessFile]     - (Truncated hash, GENIE Tree position) = (2840352975, 1) -> GetEntryNumberWithIndex = 35
...

You can see the first file, which has file hash of 796806952 ends at GENIE Tree position = 33 and a new file started with a new hash of 2840352975; and the GENIE Tree position is reset to 0. Using (hash, GENIE Tree position) pair correctly matches this caf::SRTrueInteraction to the next genie record, which is GENIE Tree position of 33+1=34.

@jedori0228
Copy link
Contributor Author

jedori0228 commented Mar 26, 2024

Plan is to make a copy-PR of this against develop once this PR is approved and merged.

Copy link
Member

@PetrilloAtWork PetrilloAtWork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of details to be fixed...
The PR seems, in general, good.

Comment on lines 540 to 542
//delete fGenieEvtRec;
//delete fGenieTree;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these causing double-freeing or some other trouble?
Shouldn't the flat versions also be here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit where I now delete fGenieEvtRec and fFlatGenieEvtRec. But I found fGenieTree or fFlatGenieTree is deleted when the Record is delete.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After investigation, the conclusion is that the trees are deleted when the ROOT file objects they are associated to (e.g. fFile) are deleted.
In addition, the fGenieEvtRec should be deleted by us since, according to ROOT documentation ("If addr is not zero, but the pointer addr points at is zero" paragraph), we passed an address variable that has null content at branch address setting time, the record was allocated by ROOT but it still considered owned by us, and it should be explicitly deleted with delete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! In 1c47597, I properly deleted all the GENIE objects.

@miquelnebot
Copy link
Contributor

trigger build SBNSoftware/sbnanaobj#124 LArSoft/lar*@LARSOFT_SUITE_v09_72_00_01 SBNSoftware/sbndaq-artdaq-core@v1_06_00of0 SBNSoftware/*@release/SBN2023A_NuMI

@FNALbuild
Copy link

❌ CI build for LArSoft Failed at phase build LArSoft on slf7 for e26:prof -- details available through the CI dashboard

🚨 For more details about the failed phase, check the build LArSoft phase logs

@FNALbuild
Copy link

❌ CI build for LArSoft Failed at phase build LArSoft on slf7 for c14:prof -- details available through the CI dashboard

🚨 For more details about the failed phase, check the build LArSoft phase logs

@FNALbuild
Copy link

❌ CI build for LArSoft Failed at phase build LArSoft on slf7 for e20:prof -- details available through the CI dashboard

🚨 For more details about the failed phase, check the build LArSoft phase logs

@FNALbuild
Copy link

✔️ CI build for LArSoft Succeeded on slf7 for e20:prof -- details available through the CI dashboard

@FNALbuild
Copy link

❌ CI build for SBND Failed at phase build SBND on slf7 for e20:prof -- details available through the CI dashboard

🚨 For more details about the failed phase, check the build SBND phase logs

parent CI build details are available through the CI dashboard

@FNALbuild
Copy link

❌ CI build for ICARUS Failed at phase build ICARUS on slf7 for e20:prof -- details available through the CI dashboard

🚨 For more details about the failed phase, check the build ICARUS phase logs

parent CI build details are available through the CI dashboard

@FNALbuild
Copy link

⚠️ CI build for ICARUS Warning at phase ci_tests ICARUS on slf7 for e20:prof -- details available through the CI dashboard

🚨 For more details about the warning phase, check the ci_tests ICARUS phase logs

parent CI build details are available through the CI dashboard

@FNALbuild
Copy link

✔️ CI build for LArSoft Succeeded on slf7 for e20:prof -- details available through the CI dashboard

@FNALbuild
Copy link

❌ CI build for SBND Failed at phase build SBND on slf7 for e20:prof -- details available through the CI dashboard

🚨 For more details about the failed phase, check the build SBND phase logs

parent CI build details are available through the CI dashboard

@FNALbuild
Copy link

⚠️ CI build for ICARUS Warning at phase ci_tests ICARUS on slf7 for e20:prof -- details available through the CI dashboard

🚨 For more details about the warning phase, check the ci_tests ICARUS phase logs

parent CI build details are available through the CI dashboard

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants