Output format update/discussion #2

Reporting the issue from [WenjieWu-Sci/FLArE#48](https://github.com/WenjieWu-Sci/FLArE/issues/48).

[mvicenzi](https://github.com/mvicenzi)
opened [on Aug 3, 2024](https://github.com/WenjieWu-Sci/FLArE/issues/48#issue-2446132579)
> The default output format defined in AnalysisManager is getting inadequate given that the simulation no longer contains just FLArE. For example, no information is saved from other sensitive volumes by default. This is in contrast with our push to make this toolkit available/usable by the entire FPF community.
> 
> We should revisit our plans for the output file format to be more inclusive of all detectors.
> We should revisit our use of sensitive detectors: how are we planning to implement digitization? I would push to have it in an independent/downstream package.
> We should revisit our "reconstruction" code/variables. My suggestion would be to save the recorded G4hits in the output (maybe divided according to SD?) and move every reco script outside of Geant4.

[WenjieWu-Sci](https://github.com/WenjieWu-Sci)
[on Aug 5, 2024](https://github.com/WenjieWu-Sci/FLArE/issues/48#issuecomment-2270092531)
> I completely agree. It has been on my mind for a while, but I haven't really thought through how to implement it, since there are many alternative detector configurations. The output variables would need to be very flexible.
> I thought about saving the G4hits, but there are so many of them at these high energies that the file size becomes very large. I ran some tests before, and saving all the hits seemed impossible. But maybe we could merge hits that are geometrically nearby?
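The idea of merging geometrically nearby hits could be sketched roughly as below. This is only an illustration under stated assumptions: `SimpleHit` and `mergeHitsByVoxel` are hypothetical names, not part of the FLArE code or of Geant4, and binning into fixed cubic voxels is just one possible merging strategy.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical minimal hit record (assumption): not the FLArE or Geant4 hit class.
struct SimpleHit {
  double x, y, z;  // position, e.g. in mm
  double edep;     // deposited energy, e.g. in MeV
};

// Merge all hits that fall into the same cubic voxel of side `voxelSize`.
// Each merged hit carries the summed energy and the energy-weighted centroid.
std::vector<SimpleHit> mergeHitsByVoxel(const std::vector<SimpleHit>& hits,
                                        double voxelSize) {
  assert(voxelSize > 0.0);
  typedef std::tuple<long, long, long> VoxelKey;
  std::map<VoxelKey, SimpleHit> voxels;  // holds edep-weighted sums while filling

  for (const SimpleHit& h : hits) {
    VoxelKey key = std::make_tuple(
        static_cast<long>(std::floor(h.x / voxelSize)),
        static_cast<long>(std::floor(h.y / voxelSize)),
        static_cast<long>(std::floor(h.z / voxelSize)));
    SimpleHit& acc = voxels[key];  // zero-initialized on first access
    acc.x += h.x * h.edep;
    acc.y += h.y * h.edep;
    acc.z += h.z * h.edep;
    acc.edep += h.edep;
  }

  std::vector<SimpleHit> merged;
  merged.reserve(voxels.size());
  for (auto& entry : voxels) {
    SimpleHit acc = entry.second;
    if (acc.edep > 0.0) {
      // Turn the weighted sums into a centroid.
      acc.x /= acc.edep;
      acc.y /= acc.edep;
      acc.z /= acc.edep;
    }
    merged.push_back(acc);
  }
  return merged;
}
```

The voxel size sets the trade-off between file size and spatial detail, so it would presumably need to be chosen per detector.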

[mvicenzi](https://github.com/mvicenzi)
[on Aug 22, 2024](https://github.com/WenjieWu-Sci/FLArE/issues/48#issuecomment-2305462721)
> Yes, it's not easy to create a flexible system given all the configurations and their possible output variables. My feeling is that it's better to keep things simple by outputting low-level info (G4hits), so that all the hit merging and hit digitization steps can be performed afterwards by detector-specific tools. However, it's true that it may not be possible to save everything... From your tests, which detector is the most challenging? For example, the FLArE output could end up just being the pixelated projections (time and charge for each pixel) instead of the full 3D set of hits.
> 
> Maybe we can start by adding some infrastructure and options to save hits only from specific sensitive volumes?
> It might end up being impossible to dump all of them, but if someone is interested in a specific subdetector, they can do that more easily?
> 
> I was looking at what the edep-sim output format looks like from [here](https://github.com/ClarkMcGrew/edep-sim?tab=readme-ov-file#the-tg4event-class). For each event, there are three objects:
> 
> - Primaries: The GEANT4 primary particles (A vector of TG4PrimaryVertex)
> - Trajectories: The GEANT4 particle trajectories (A vector of TG4Trajectory)
> - SegmentDetectors: The energy deposition information (A map keyed by sensitive detector name, containing a vector of TG4HitSegments).
> 
> We already have in place a vector of primaries, although I'm not sure how it would handle multiple vertices. The hits are saved as a map using the sensitive detectors as keys, so we could potentially replicate something similar (specifying which ones to save in a macro parameter?). Regarding trajectories, they're most likely not needed unless we want to do some fancy visualization.
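A per-event record along these lines, with hits kept in a map keyed by sensitive detector name and filtered by a user-provided selection, might look like the sketch below. All type and function names (`PrimaryVertex`, `HitSegment`, `EventRecord`, `addHits`) are hypothetical and only loosely modelled on the edep-sim description above; the detector names in the usage note are placeholders.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins (assumption) for the primary and hit-segment classes.
struct PrimaryVertex {
  double x, y, z, t;          // vertex position and time
  std::vector<int> pdgCodes;  // PDG codes of the primaries at this vertex
};

struct HitSegment {
  double x0, y0, z0;  // segment start
  double x1, y1, z1;  // segment end
  double edep;        // energy deposited along the segment
  int trackId;        // track that produced the deposit
};

// One event: a vector of primary vertices plus hits grouped by detector name.
struct EventRecord {
  std::vector<PrimaryVertex> primaries;
  std::map<std::string, std::vector<HitSegment> > segments;
};

// Append hits for one sensitive detector, but only if its name is in the
// `enabled` selection (e.g. filled from a macro parameter).
// Returns true if the hits were stored, false if they were skipped.
bool addHits(EventRecord& event, const std::set<std::string>& enabled,
             const std::string& detectorName,
             const std::vector<HitSegment>& hits) {
  if (enabled.count(detectorName) == 0) {
    return false;  // detector switched off: nothing is written
  }
  std::vector<HitSegment>& stored = event.segments[detectorName];
  stored.insert(stored.end(), hits.begin(), hits.end());
  return true;
}
```

With `enabled = {"FLArE"}`, a call for `"FLArE"` stores its hits while a call for any other name is dropped, so the output only grows with the detectors someone actually asked for. Storing a vector of vertices also leaves room for events with more than one primary vertex.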

[OlivierSalin](https://github.com/OlivierSalin)
[on Feb 20](https://github.com/WenjieWu-Sci/FLArE/issues/48#issuecomment-2671733279)
> Hi Matteo,
> 
> As we discussed in the FLArE meeting, the ACTS input for the hits might be a good reference to format the G4 output. We can find the exact format that is compatible with ACTS in those functions:
> https://github.com/acts-project/acts/blob/main/Examples/Io/Root/src/RootSimHitWriter.cpp
> 
> Here you can find an example of a hits input file: https://cernbox.cern.ch/s/YorN7RqA0yWerdC
> 
> Do you think it would be straightforward to match this output?
> 
> Best,
> Olivier

[benw22022](https://github.com/benw22022)
[2 weeks ago](https://github.com/WenjieWu-Sci/FLArE/issues/48#issuecomment-2729937921)
> Hi All,
> 
> I've started working on some changes to the code to make it more useful from a FASER2 perspective on my [fork](https://github.com/benw22022/FLArE) (I'll make a PR to this repo when it's ready). For us, I think what we need is a HepMC parser (this is the format that our LLP generator produces) and to change the FASER2 tracking elements from the scintillating bar-like design to simple layers of plastic. I think @mvicenzi's suggestion of just keeping the hits info and keeping the digitisation and reco components separate is a good one. I can also try to look at implementing the ACTS SimHit format as Olivier suggested.
> Cheers,
> 
> Ben

[mvicenzi](https://github.com/mvicenzi)
[2 weeks ago](https://github.com/WenjieWu-Sci/FLArE/issues/48#issuecomment-2730365559)
> Hi @benw22022,
> 
> Thank you for your input. Feel free to open PRs as you get things done! It would be great if you could split things into separate PRs (HepMC parser, geometry changes, etc.) so that they are a bit easier to review.
> 
> I was planning to put together a new output structure, where hits are grouped by sensitive volume and each group can be saved or not depending on a list of detector names in the configuration file. This probably doesn't solve the size problem, but at least you can switch off detectors you are not interested in. The ACTS SimHit should be easily reproducible, but it's unclear to me what to make of all these ids:
> 
>   m_outputTree->Branch("volume_id", &m_volumeId);
>   m_outputTree->Branch("boundary_id", &m_boundaryId);
>   m_outputTree->Branch("layer_id", &m_layerId);
>   m_outputTree->Branch("approach_id", &m_approachId);
>   m_outputTree->Branch("sensitive_id", &m_sensitiveId);
>
> We can easily assign a sensitive volume ID, but what's all the rest? Are all of them necessary?

[benw22022](https://github.com/benw22022)
[2 weeks ago](https://github.com/WenjieWu-Sci/FLArE/issues/48#issuecomment-2730383141)
> Hi @mvicenzi ,
> 
> Thanks, will do! I think I have the HepMC parser mostly sorted - will try and open a PR for that soon.
> Honestly, I've not got much of a clue what all the IDs are. From what I understand, they're related to how ACTS labels its geometry elements and indicates which volumes are connected to others. We can probably try setting them to zero for now and see if that works? Otherwise, I think it might take some trial and error and perhaps some expert help. It might be challenging, since they'll need to match up with the ACTS implementation of the detector geometry.
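As a starting point for the "set them to zero for now" approach, one could keep the five ID fields in a small struct that defaults to zero and fill only the sensitive ID from a name lookup. This is a sketch under assumptions: `HitIds`, `SensitiveIdRegistry` and `makeIds` are hypothetical names, the field types are a guess, and whether ACTS accepts zeroed IDs is exactly the open question raised above.

```cpp
#include <cstdint>
#include <map>
#include <string>

// ID fields mirroring the branch names quoted above. Types are an assumption.
// Everything defaults to zero until the ACTS geometry mapping is understood.
struct HitIds {
  std::uint32_t volumeId = 0;
  std::uint32_t boundaryId = 0;
  std::uint32_t layerId = 0;
  std::uint32_t approachId = 0;
  std::uint32_t sensitiveId = 0;
};

// Hands out a stable, 1-based integer ID per sensitive volume name.
// The numbering is arbitrary and is NOT guaranteed to match any ACTS geometry.
class SensitiveIdRegistry {
 public:
  std::uint32_t idFor(const std::string& name) {
    std::map<std::string, std::uint32_t>::const_iterator it = ids_.find(name);
    if (it != ids_.end()) {
      return it->second;  // already registered: reuse the same ID
    }
    const std::uint32_t id = static_cast<std::uint32_t>(ids_.size()) + 1;
    ids_[name] = id;
    return id;
  }

 private:
  std::map<std::string, std::uint32_t> ids_;
};

// Build the IDs for a hit: only the sensitive ID is set, the rest stay zero.
HitIds makeIds(SensitiveIdRegistry& registry, const std::string& sensitiveName) {
  HitIds ids;
  ids.sensitiveId = registry.idFor(sensitiveName);
  return ids;
}
```

Keeping the assignment in one place means that, once the expected ACTS numbering is known, only the registry would need to change rather than every place that writes a hit.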
