Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Generating ./geometry/*.root during build #335

Open
klenze opened this issue Mar 4, 2020 · 14 comments
Open

Proposal: Generating ./geometry/*.root during build #335

klenze opened this issue Mar 4, 2020 · 14 comments

Comments

@klenze
Copy link
Contributor

klenze commented Mar 4, 2020

From what I understand, the geometry root files are generated from macros/geo/create_${DETECTOR}.C.

Still, they are added to the repository. This could cause some possible problems:

  • Incremental changes become opaque. From gits point of view, a binary blob is just replaced by another binary blob.
  • There is always the exciting possibility that the generating macros and the binary root geometry file diverge, because someone added the binary change in the main repository but not the macro change in the macros repository, or vice versa.
  • Furthermore, at least the floating point operations used to generate the geometries are implementation dependent. So "let's run the create script and do a diff" is right out. One would have to do a semantic comparision.
  • Finally, this seems to encourage duplication. (See [rant] Code duplication #317 .) At the moment, we have 14 geometries for califa, and 9 different create_califa_*.C scripts. Given that I had to re-add support for the 2019/s444 geometry, I would wager that most versions will fail to work with the current code base.

The largest file in geometries is some 150kBytes. I think it is both feasible and desirable to auto-generate the supported versions of these files (e.g. 2019, 2020) during the cmake build process.

What do you think?

@hapol
Copy link
Contributor

hapol commented Mar 5, 2020

To my understanding the files in macros/geo/ are under the code control and evolve with time, while the existing root files should not be modified, as they correspond to particular situations, and therefore they do not evolve. If a root file is modified, then it is stored under other name. If this cause a problem (space, info, ...), it could be modified.
So, regarding the problems:

  • There should be no incremental changes in the root files, they represent some stable geometries.
  • It's true that they can diverge, and it would be great to have a kind of meta-file or README specifying how each useful root file is created from a (git-labelled) macro.
  • Is this true? is there any case where you have seen a difference in a root file?
  • As you know, the new CALIFA code is only supporting CALIFA_2020 geometry. Other legacy geometries are still not supported or require an older R3BRoot version. So, it would be necessary to clean...

@janmayer
Copy link
Contributor

janmayer commented Mar 6, 2020

I see this issue as well, and would like to raise some further points:

There should be no incremental changes in the root files, they represent some stable geometries.

  • I agree that they do represent a stable geometry, but they are produced from a macro - thus that macro they were created from is the actual stable configuration. If this is macro, or the way in which it was invoked, is lost (which might have already happened), it is hard to reproduce the root file or analyze what it entails / what changed.

There is always the exciting possibility that the generating macros and the binary root geometry file diverge, because someone added the binary change in the main repository but not the macro change in the macros repository, or vice versa.
Other legacy geometries are still not supported or require an older R3BRoot version. So, it would be necessary to clean...

  • In addition, the detector class might develop as well and not work with the older geometries - all combinations should be tested (wishful thinking).

I think it is both feasible and desirable to auto-generate the supported versions of these files (e.g. 2019, 2020) during the cmake build process.

  • I would go even further and propose that the detector simulation classes could create the geometry on-demand. For example, with NeuLAND, I already have an option to ask the Simulation class to get the right geometry for a number of double planes and move it into position without having to worry about the file name or version. It would not be a large step to have it produce the geometry from scratch on-the-fly.
  • I think it is semantically not correct to have the geometry-creating-macros under /marcos/geo. I would say that they (or most of them) belong closely to the individual detector and probably should be under /<detector>/geometry oder /<detector>/simulation. *

*) Tangent topic: As there are so many folders in the root dir, it has become quite cluttered. In general, I would propose moving all detectors to /detectors/ or at least put all the fibers under /fibers/<fiXX>. What is the difference between all those fiber classes anyway? Would a template help?

@vadimr3b
Copy link
Contributor

vadimr3b commented Mar 6, 2020

I would agree with Jan and produce the geometries on the fly for the simulation.
If we have different states for the detector, we should rather solve this by git branchs (e.g. s444).
Like this you can do a simulation with the setup matching your experiment by simply checking out the branch and everything should work.

I also agree to Jans * ;)

@janmayer
Copy link
Contributor

janmayer commented Mar 6, 2020

If we have different states for the detector, we should rather solve this by git branchs (e.g. s444).

Or/And individual repositories for each experiment instead of that big macro repo.

@klenze
Copy link
Contributor Author

klenze commented Mar 10, 2020

The idea for this issue came to me when I read this code:

int R3BCalifaGeometry::GetCrystalId(const char* volumePath)
{
[...]
    Int_t alveolusCopy = atoi(volumePath + 34); 
    // converting to int the alveolus copy
    if (alveolusCopy < 10)
        cryType = atoi(volumePath + 76); // converting to int the crystal type
    else
        cryType = atoi(volumePath + 77); // converting to int the crystal type

What happens here is that we try to extract numbers from a string in a geometry file. As the string was created using the format specifier %d in one place instead of e.g. %02d, we now have to work around this and catch the case where the first number is larger than 9.

The lamentable need to do string parsing in the inner loop of the simulation non-withstanding, a better solution would have been to update the califa 2020 geometry to use %02d. If geometry files were though of as easily generated, transient files (such as the cxx.o files are), this update would have been easy. If they are instead thought of as fixed, essential components of our project, we will always need workarounds like that.

I would furthermore propose to move both the generation macros (and the macros used in test cases) to the main repository, so that there is a clear correspondence between macro versions and detector code versions. This would take a lot of detective work (e.g. "this main repo commit was followed by that macros commit by the same author, so the two versions are probably intended to work with each other") out of the job of people wanting to run old versions. (We can still keep the macros subrepository as a Schmuddelecke for code which is neither compiled nor tested against the latest version of the detector classes and thus will probably not work.)

While talking about hypothetical changes, I would also propose making "dev" the default branch of R3BRoot, imposing a function length limit of 30 lines, reject any PRs containing the string "sprintf" with prejudice and providing all code authors with fusion powered jetpacks. :-)

@hapol
Copy link
Contributor

hapol commented Mar 10, 2020

I guess you are at the wrong forum making your suggestion: your comment "using the format specifier %d in one place instead of e.g. %02d" and "The lamentable need to do string parsing in the inner loop of the simulation non-withstanding, a better solution would have been to update the califa 2020 geometry to use %02d" correspond to features of the TGeoVolume::AddNode() function in ROOT VMC, and not to the macros that we made for the CALIFA geometry. https://root.cern.ch/doc/master/TGeoVolume_8cxx_source.html#l00930

As the only way to retrieve the object is using the volume path, we should perform the string parsing. As I told you previously, if you know another method to access to the volume information, please, let us know.

@janmayer
Copy link
Contributor

Yeah, the ROOT geometry stuff is extremely bad - and to think this terrible interface is at the core of our simulations... Can we just drop ROOT Geo and just use pure Geant4?

On Topic: Could you use gMC->CurrentVolOffID(1, fPaddleID); like here?

@klenze
Copy link
Contributor Author

klenze commented Mar 11, 2020

I guess you are at the wrong forum making your suggestion: your comment "using the format specifier %d in one place instead of e.g. %02d" and "The lamentable need to do string parsing in the inner loop of the simulation non-withstanding, a better solution would have been to update the califa 2020 geometry to use %02d" correspond to features of the TGeoVolume::AddNode() function in ROOT VMC, and not to the macros that we made for the CALIFA geometry. https://root.cern.ch/doc/master/TGeoVolume_8cxx_source.html#l00930

Ooops. I stand corrected. Sorry.

As the only way to retrieve the object is using the volume path, we should perform the string parsing. As I told you previously, if you know another method to access to the volume information, please, let us know.

I do not have a method which does not break the TVirtualMC abstraction. :-(
(Creating a new class extending TGeoVolume to include a CrystalID field might be an option, but I would not count on TGeoWhatever supporting this).

That being said, I noticed a speedup by several orders of magnitudes when I stopped simulating 4.4GeV photons and started with 4.4MeV photons instead, so my complaints about speed are mostly a thing of the past now. :-)

@inkdot7
Copy link
Contributor

inkdot7 commented Mar 15, 2020

If we have different states for the detector, we should rather solve this by git branchs (e.g. s444).

Or/And individual repositories for each experiment instead of that big macro repo.

General remark: branches are useful for code development, not history tracking of a setup.

Please do not use git branches to try to handle things that code-wise shall survive for a long time. If there were two different geometries in use at different experiments, both (generating macros!) should be in the code, in (all later) branches. (Even better: not separate macros, but options within the macro to handle the differences.)

If needed, then separate repositories for different experiments. (With only the experiment-specific stuff of course, not duplicating/forking the big project.)

@klenze
Copy link
Contributor Author

klenze commented Mar 19, 2020

Yeah, the ROOT geometry stuff is extremely bad - and to think this terrible interface is at the core of our simulations... Can we just drop ROOT Geo and just use pure Geant4?

This, 100%.

Today, while waiting for a 10M event simulation to finish, I tried to get rid of the ">>> Event " ... lines our simulation includes once per event. While there is obviously a simple way to do that which I am totally missing, I had to:

0.) break abstraction by casting TVirtualMC to TGeant4
1.) break access specifiers to get to TGeant4::fRunManager
2.) break access specifiers to TG4RunManager::fRunManager, which is the actual G4RunManager
3.) dynamic_cast the result of G4RunManager::GetUserEventAction() to const TG4EventAction*
4.) const_cast that to TG4EventAction*
5.) break access specifiers to call TG4EventAction::VerboseLevel()

So, all in all, it just took five evil things (excluding step 3) to get rid of the spam. Needless to say, by the time I was done, so was my simulation.

Also notice that @janmayer, who proposed moving to raw Geant4 is a NeuLAND person. From my point of view, this means that the argument "We need to support Geant3 because it handles neutrons better" is probably no longer true.

\begin{soapbox}
In general, my attitude towards using 'the ROOT way of doing something' is rather dissimilar to that of the ${EXP}Root projects. Personally, I would crawl through miles of desert to get from TList to std::list, TVirtualMC to Geant4, TFormula to lambda expressions and so on. I can pragmatically accept that for some tasks (plotting histograms, handling event data), ROOT is a viable solution, but I am not really in the "ROOT macros can do anything, so let's use them to store our simulation configuration" camp.
\end{soapbox}

@janmayer
Copy link
Contributor

Also notice that @janmayer, who proposed moving to raw Geant4 is a NeuLAND person. From my point of view, this means that the argument "We need to support Geant3 because it handles neutrons better" is probably no longer true.

Thank you for the confidence in my expertise, but that's maybe a bit too much.

I simply don't use Geant3 anymore - its super annoying to install, as the "modern" fortran compilers don't want to compile it anymore, i.e. I have to change my devtoolset during the compilation of FairSoft. It often simply refuses to work for me.

It does give different results from Geant4, depending on the physics lists, but I can't really tell if they are better - but I can tell that Geant4 is updated and benchmarked regularly by a large user base.

Overall I simply think Geant3 is simply obsolete, and with it so is TVirtualMC and the whole ROOT geometry module.

@inkdot7
Copy link
Contributor

inkdot7 commented Feb 12, 2023

@klenze - some text to describe to the reader about when / with which commits this issue was resolved?

@klenze
Copy link
Contributor Author

klenze commented Feb 13, 2023

@inkdot7: Sorry, I was just cleaning up my stuff.

In the meantime, @jose-luis-rs has implemented automatic generation of ${CMAKE_SOURCE_DIR}/geometry/ geometry root files during the cmake invocation.

At least for CALIFA, the correctness of the result very much seems to depend on the exact platform used. So it seems that without a way to check if a geometry file is good, auto-creating geometry files might not be a good idea.

If you have an idea of how to check if two compressed, opaque root files contain compatible floating point values, that would be helpful. Personally, I find the syntax to extract even the center of a crystal a bit byzantine compared to plain geant4.

Jan's suggestion to move to plain Geant4 would be my preferred solution as well, but this is very much a minority view.

@klenze klenze reopened this Feb 13, 2023
@inkdot7
Copy link
Contributor

inkdot7 commented Feb 14, 2023

At least for CALIFA, the correctness of the result very much seems to depend on the exact platform used.

Does this mean that actual results (e.g. simulated spectra) vary a lot (beyond random fluctuations) (= bug), or 'just' that the generated geometry files have slightly different values due to rounding differences?

I do not have anything to compare generic .root files for compatibility, but if spectra can be generated, this fuzzy spectra CI helper may be of some use:

https://git.chalmers.se/expsubphys/fuzzyspectra

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants