
DetectorChecksum memory consumption scales with complexity (and maxes out...) #1143

Closed
wdconinc opened this issue Jul 14, 2023 · 5 comments
Labels
bug Fixed Problem is solved Testing

Comments

@wdconinc
Contributor

  • OS version: ubuntu 23.04
  • Compiler version: GCC 12.3.0
  • Package version: master 75e88dd
  • Reproduced by: (in the EIC ePIC geometry; still working on minimal example)
$ /usr/bin/time -v geoPluginRun -input $DETECTOR_PATH/epic_lfhcal_only.xml -plugin DD4hepDetectorChecksum
  • Input: still working on minimal example in the DD4hep source tree
  • Goal: The detector checksum currently works by determining the string representation for each detector, and then hashing the string. This potentially takes up a lot of memory for detectors with many replicated identical volumes (11+ GB below), before the hashing starts. A possible way to avoid this could be to determine the hash 'on the fly'. Since the hash64 function works on a byte-by-byte stream, this could maybe be a streambuf sink that calculates the hash on the fly instead of a stringstream, at least when no dump options are engaged. That would run in constant memory.
Output: full build and run log output
$ /usr/bin/time -v geoPluginRun -input $DETECTOR_PATH/epic_lfhcal_only.xml -plugin DD4hepDetectorChecksum
PersistencyIO    INFO  +++ Set Streamer to dd4hep::OpaqueDataBlock
Info in : Geometry default, Detector Geometry created
Info in : --- Maximum geometry depth set to 100
CompactLoader    INFO  +++ Processing compact file: /opt/local/share/epic/epic_lfhcal_only.xml with flag BUILD_DEFAULT
Compact          INFO  ++ Created successfully world volume 'world_volume'. shape: TGeoBBox material:Air.
Info in : Top volume is world_volume. Master volume is world_volume
Detector         INFO  *********** Use Top Node from manager as world volume [TGeoBBox].  Material: Air BBox: 3000 3000 10000
Compact          INFO  ++ Converted subdetector:LFHCAL of type epic_LFHCAL [calorimeter]
Info in : Fixing runtime shapes...
Info in : ...Nothing to fix
Info in : Counting nodes...
Info in : Voxelizing...
Info in : Building cache...
Info in : max level = 4, max placements = 1160
Info in : 1196442 nodes/ 301 volume UID's in Detector Geometry
Info in : ----------------modeler ready----------------
Detector         INFO  +++ Patching names of anonymous shapes....
DetectorChecksum WARN  +++ No Detector header information availible from the geometry description.
RunPlugin        ERROR ++ Exception while executing plugin DD4hepDetectorChecksum:
                std::bad_alloc
dd4hep: with plugin:DD4hepDetectorChecksum
geoPluginRun: Got uncaught exception: RunPlugin: ++ Exception while executing plugin DD4hepDetectorChecksum:
                std::bad_alloc
dd4hep: with plugin:DD4hepDetectorChecksum
Command exited with non-zero status 22
        Command being timed: "geoPluginRun -input /opt/local/share/epic/epic_lfhcal_only.xml -plugin DD4hepDetectorChecksum"
        User time (seconds): 202.22
        System time (seconds): 18.67
        Percent of CPU this job got: 85%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:18.84
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 11864228
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 348789
        Minor (reclaiming a frame) page faults: 9429144
        Voluntary context switches: 349234
        Involuntary context switches: 4080
        Swaps: 0
        File system inputs: 12980840
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 22
@wdconinc wdconinc added the bug label Jul 14, 2023
@MarkusFrankATcernch
Contributor

Hmm. I was not really aware it would grow that much.
I will have a look. Normally, if one disables the in-memory text dumps
used for debugging, the actual usage should go down dramatically.

@MarkusFrankATcernch
Contributor

@wdconinc @andresailer
This issue unfortunately cannot be handled in a backwards compatible way due to constructs like this:

      log << "<" << str_oper << nam
          << " lunit=\"" << m_len_unit_nam << "\""
          << " aunit=\"" << m_ang_unit_nam << "\">" << newline
          << " <first ref=\"" << (void*)ent_left.hash << "\""  << ">" << newline
          << "  " << handlePosition(mat_left).data << newline
          << "  " << handleRotation(mat_left).data << newline
          << " </first>" << newline
          << " <second ref=\"" << (void*)ent_right.hash << "\"" << ">" << newline
          << "  " << handlePosition(mat_right).data << newline
          << "  " << handleRotation(mat_right).data << newline
          << " </second>" << newline
          << "</" << str_oper << ">";

Whenever the hash string is used to define an aggregate (like here for a boolean solid), the string cannot be dropped easily. One could instead either

  • re-create the hash string on demand, or
  • use the hash value instead (a 64-bit integer), i.e. use handleRotation(mat_right).hash instead of handleRotation(mat_right).data, which would be not so nice for debugging.

I still find one of these two possibilities somehow more attractive than constantly updating a checksum,
which in turn would make the debugging of sub-units impossible.
What would you prefer?

@MarkusFrankATcernch
Contributor

@wdconinc Hi Wouter, can we close this issue?

@wdconinc
Contributor Author

Yes, I think this has been resolved, right?

@MarkusFrankATcernch
Contributor

I think so. You now have to specify a flag if you want to keep the hash strings.
