Improve DD4hep workflow perf, step 1: All paths from SpecPar blocks with same name are stacked together #32505
…lysis/data/trackingMaterialGroups_ForPhaseI.xml, whose SpecPar blocks had names identical to those in trackerRecoMaterial.xml (!!). This file was added a few months ago to all geometry scenarios (why is that? It is only used for testing materials anyway). It has a very big impact on DD4hep Run3 workflow perf. The underlying issue is that in DetectorDescription/DDCMS, when processing XML files for DD4hep, all SpecPar sections with the same name are merged together. As a result, the (extremely heavy!) paths of trackingMaterialGroups_ForPhaseI.xml are looked up while searching for TrackerRadLength, even though TrackerRadLength is not defined there anyway!
…d4hep (why is there a duplicated file??)
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32505/20438
A new Pull Request was created by @ghugo83 for master. It involves the following packages: SimTracker/TrackerMaterialAnalysis. @cmsbuild, @civanch, @mdhildreth can you please review it and eventually sign? Thanks.
It looks like the names in the paths have some redundancy. I'd think, we could do better. It's not a request, but a suggestion that may help us to drop the namespaces :-)
<PartSelector path="//pixfwd:Phase2PixelEndcap/pixel:Disc2/pixel:Ring1Disc2/pixel:EModule1Disc2/pixel:EModule1Disc2InnerPixelwafer/pixel:EModule1Disc2InnerPixelActive"/>
<PartSelector path="//pixfwd:Phase2PixelEndcap/pixel:Disc2/pixel:Ring2Disc2/pixel:EModule2Disc2/pixel:EModule2Disc2InnerPixelwafer/pixel:EModule2Disc2InnerPixelActive"/>
<PartSelector path="//pixfwd:Phase2PixelEndcap/pixel:Disc2/pixel:Ring3Disc2/pixel:EModule3Disc2/pixel:EModule3Disc2InnerPixelwafer/pixel:EModule3Disc2InnerPixelActive"/>
<PartSelector path="//pixfwd:Phase2PixelEndcap/pixel:Disc2/pixel:Ring4Disc2/pixel:EModule4Disc2/pixel:EModule4Disc2InnerPixelwafer/pixel:EModule4Disc2InnerPixelActive"/>
<Parameter name="TrackingMaterialGroup" value="TrackerRecMaterialPhase2PixelForwardDisk2" />
</SpecPar>

<SpecPar name="TrackerRecMaterialPhase2PixelForwardDisk3">
<SpecPar name="TrackingMaterialGroupPhase2PixelForwardDisk3">
<PartSelector path="//pixfwd:Phase2PixelEndcap/pixel:Disc3/pixel:Ring1Disc3/pixel:EModule1Disc3/pixel:EModule1Disc3InnerPixelwafer/pixel:EModule1Disc3InnerPixelActive"/>
@ghugo83 - it seems to me that the names could be simplified as well.
For example:
//pixfwd:Phase2PixelEndcap/pixel:Disc2/pixel:Ring1Disc2/pixel:EModule1Disc2/pixel:EModule1Disc2InnerPixelwafer/pixel:EModule1Disc2InnerPixelActive
could be simplified as:
//PixelEndcapPhase2/PixelDisk2/PixelRing1/PixelEModule1/PixelEInnerWafer/PixelEInnerActive
Note, it should also be Disk, not Disc, I think :-)
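To make the redundancy in these paths concrete, here is a minimal sketch that only drops the namespace prefix from each volume name of a PartSelector path. The helper `strip_namespaces` is hypothetical and not part of CMSSW. It does not perform the further renaming suggested above (e.g. Disc to Disk), which would require renaming the volumes themselves, and it says nothing about whether the geometry lookup still resolves the shortened path.

```python
def strip_namespaces(path: str) -> str:
    """Drop the 'namespace:' prefix from every volume name in a path.

    Hypothetical helper for illustration only; a leading '//' is preserved.
    """
    prefix = "//" if path.startswith("//") else ""
    parts = path[len(prefix):].split("/")
    # Keep only what follows the first ':' in each path element (if any).
    return prefix + "/".join(p.split(":", 1)[-1] for p in parts)


original = (
    "//pixfwd:Phase2PixelEndcap/pixel:Disc2/pixel:Ring1Disc2/pixel:EModule1Disc2"
    "/pixel:EModule1Disc2InnerPixelwafer/pixel:EModule1Disc2InnerPixelActive"
)
stripped = strip_namespaces(original)
print(stripped)
```

Even without namespaces, every element still repeats the disc and module identifiers of its parents, which is the redundancy the suggested renaming would remove.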
Yes exactly, this is what I have done in the subsequent PR (gonna push it now for Phase 1)
Ah, it was not the namespaces but the head volumes that I had done. Will also add the namespaces in the cases where it leads to no regression.
But the biggest effect, by several orders of magnitude, turned out to come from removing the regex from trackerPhase1RecoMaterial.xml (the Phase 2 RecoMaterial file has no regex).
@ianna Phase 1: when I try to remove a namespace I get issues with old DD:
Qualify the name with a regexp for the namespace, i.e ".*:name-regexp" !
Will see whether it is different for Phase 2.
I have just removed the parent volumes in the Phase 1 trackerRecoMaterial.xml to see, but it had no visible impact (the regex removal, by contrast, brought the full DD4hep-based step 1, up to event generator start, from ~140 s to ~85 s, while after removing the parent volumes I still get ~85 s).
I can still push the removal of the parent volumes to the other PR though, it does no harm.
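As a rough illustration of why removing regex from the selectors can matter, the sketch below compares a selector to a volume name with plain string equality whenever the selector contains no regex metacharacters, and only falls back to a regex match otherwise. This is an assumption-laden sketch, not how DD4hep or DDCMS implement their matching; `is_regex` and `matches` are hypothetical names.

```python
import re

# Characters that make a selector a regex rather than a literal name.
# Hypothetical choice for this sketch; ':' (namespace separator) is not one.
REGEX_CHARS = set(".*+?[](){}|^$\\")


def is_regex(selector: str) -> bool:
    """Return True if the selector contains any regex metacharacter."""
    return any(c in REGEX_CHARS for c in selector)


def matches(selector: str, volume_name: str) -> bool:
    """Compare a selector to a volume name, using regex only when needed."""
    if is_regex(selector):
        return re.fullmatch(selector, volume_name) is not None
    # Literal selector: a plain string comparison is enough.
    return selector == volume_name


print(matches("EModule1Disc2InnerPixelActive", "EModule1Disc2InnerPixelActive"))
print(matches("EModule.*InnerPixelActive", "EModule1Disc2InnerPixelActive"))
```

With fully spelled-out names in the XML, every comparison takes the cheap branch.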
please test
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7e1120/11748/summary.html
All Run 3 XML files must be given a new version (or "v1" if there isn't an existing "v" version), since there are already DB payloads with the previous XML.
+1 It is Phase-2, I believe we can change XML in that case if we do not change actual geometry.
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
+1
cc @kpedro88
@ghugo83 Isn't this file for Run 3?
@ghugo83 In the master, due to this PR,
Have you pushed the commit yet? I don't see it in the #32511.
This commit is the third from last in #32511, and the last 2 commits do not touch that file. Looking at the file after the merge of #32511, things look fine as expected (i.e., SimTracker/TrackerMaterialAnalysis/data/trackingMaterialGroups_ForPhaseI.xml is restored to default, and a v1 file is used instead).
Summary: Reduce the number of unnecessary regex comparisons by simply renaming SpecPar blocks (magic).
At the XML parsing stage (https://github.com/cms-sw/cmssw/blob/master/DetectorDescription/DDCMS/plugins/dd4hep/DDDefinitions2Objects.cc#L974), all the paths of different SpecPar blocks with the same block name are gathered together.
It turns out that many different XMLs use SpecPar sections with the same name to define different parameters.
This happens in several places, listed in [1].
This results, when doing a lookup for a specific parameter from XML, in looping over paths which are not of interest (because these paths are not associated with this parameter anyway).
For example: when looking for TrackerRadLength, all paths in Geometry/TrackerRecoData/data/PhaseI/trackerRecoMaterial.xml (expected) and SimTracker/TrackerMaterialAnalysis/data/trackingMaterialGroups_ForPhaseI.xml (not expected) are looked up. Indeed, the SpecPar blocks in these different XMLs have the same name.
A fix could be to simply add the namespace at https://github.com/cms-sw/cmssw/blob/master/DetectorDescription/DDCMS/plugins/dd4hep/DDDefinitions2Objects.cc#L974 , to directly distinguish the different SpecPar blocks. But this results in longer strings (negligible effect in that case) and, more importantly, in potential regressions.
A simple fix is to just rename the SpecPar blocks defining different parameters, so that their paths end up stored in different vectors.
This leads to a big perf improvement: a x1.4 speedup of the overall step1 initialization (XML parsing and geometry construction, up to event generator start) in the DD4hep workflow on my local machine.
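The effect of the renaming can be sketched with a toy registry keyed by SpecPar block name. This is only an illustration of the merging behaviour described above, not the DDCMS implementation: `build_registry` and `paths_scanned_for` are hypothetical helpers, the paths are placeholders, the TrackerRadLength value is made up, and it is assumed that the reco file contains a block with the same name as the old name of the material-groups block.

```python
from collections import defaultdict


def build_registry(blocks):
    """Merge SpecPar blocks by name, as happens at the XML parsing stage.

    blocks: iterable of (block_name, paths, params) tuples.
    """
    registry = defaultdict(lambda: {"paths": [], "params": {}})
    for name, paths, params in blocks:
        registry[name]["paths"].extend(paths)   # same name: paths are stacked
        registry[name]["params"].update(params)
    return registry


def paths_scanned_for(registry, parameter):
    """Count the paths visited when looking up one parameter."""
    return sum(len(entry["paths"])
               for entry in registry.values()
               if parameter in entry["params"])


# Placeholder content (hypothetical paths and values).
reco = ("TrackerRecMaterialPhase2PixelForwardDisk3",
        ["//reco/path0"], {"TrackerRadLength": "0.01"})
groups_paths = ["//groups/path%d" % i for i in range(4)]
groups_old = ("TrackerRecMaterialPhase2PixelForwardDisk3",
              groups_paths, {"TrackingMaterialGroup": "Disk3"})
groups_new = ("TrackingMaterialGroupPhase2PixelForwardDisk3",
              groups_paths, {"TrackingMaterialGroup": "Disk3"})

before = paths_scanned_for(build_registry([reco, groups_old]), "TrackerRadLength")
after = paths_scanned_for(build_registry([reco, groups_new]), "TrackerRadLength")
print(before, after)
```

Before the rename, the lookup for TrackerRadLength walks the material-group paths as well; after it, only the reco paths remain in the vector being scanned.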
NB: This is reducing the 'number of calls'.
In a subsequent PR, I propose to remove all regex from trackerRecoMaterial.xml and trackingMaterialGroups_ForPhaseI.xml (this also has an effect on perf, even if a smaller one).
NB 2: Why has trackingMaterialGroups_ForPhaseI.xml been (recently) included in the geometry scenarios? Is it really needed?
NB 3: Why is there a duplicated XML file for DD4hep (SimTracker/TrackerMaterialAnalysis/data/dd4hep_trackingMaterialGroups_ForPhaseII.xml) ?
[1] XMLs where this situation occurs:
Since the trackingMaterialGroups_ForPhaseI.xml paths are very heavy, this is where the effect is the biggest, though this issue also appears for Zdc and Hcal.
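One way to locate such clashes is to scan the XML files for SpecPar block names that appear in more than one file. The sketch below uses only the Python standard library; `specpar_params` and `find_clashes` are hypothetical helpers, and the two in-memory documents are simplified placeholders rather than real CMSSW geometry files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def specpar_params(xml_text: str) -> dict:
    """Map each SpecPar block name to the set of parameter names it defines."""
    result = {}
    for el in ET.fromstring(xml_text).iter():
        if local(el.tag) != "SpecPar":
            continue
        params = {p.get("name") for p in el.iter() if local(p.tag) == "Parameter"}
        result.setdefault(el.get("name"), set()).update(params)
    return result


def find_clashes(files: dict) -> dict:
    """Return {block_name: {filename: params}} for names used in several files."""
    seen = defaultdict(dict)
    for fname, text in files.items():
        for name, params in specpar_params(text).items():
            seen[name][fname] = params
    return {name: per_file for name, per_file in seen.items() if len(per_file) > 1}


# Simplified placeholder documents (hypothetical content).
reco_xml = """<DDDefinition><SpecParSection>
<SpecPar name="TrackerRecMaterialPhase2PixelForwardDisk3">
<PartSelector path="//a"/><Parameter name="TrackerRadLength" value="0.01"/>
</SpecPar></SpecParSection></DDDefinition>"""
groups_xml = """<DDDefinition><SpecParSection>
<SpecPar name="TrackerRecMaterialPhase2PixelForwardDisk3">
<PartSelector path="//b"/><Parameter name="TrackingMaterialGroup" value="x"/>
</SpecPar></SpecParSection></DDDefinition>"""

clashes = find_clashes({"reco.xml": reco_xml, "groups.xml": groups_xml})
for name, per_file in clashes.items():
    print(name, per_file)
```

A block name reported with different parameter sets in different files is a candidate for renaming.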
@ianna @civanch @cvuosalo