Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve DD4hep workflow perf, step 1: All paths from SpecPar blocks with same name are stacked together #32505

Merged
merged 3 commits into from Dec 17, 2020

Conversation

ghugo83
Copy link
Contributor

@ghugo83 ghugo83 commented Dec 16, 2020

Summary: Reduce number of unnecessary calls to regex comparisons, by simply renaming SpecPar blocks (magic).

At XML parsing stage, https://github.com/cms-sw/cmssw/blob/master/DetectorDescription/DDCMS/plugins/dd4hep/DDDefinitions2Objects.cc#L974 : all the paths of different SpecPar blocks, with the same block name, are gathered.

It occurs that many different XMLs use SpecPar sections with the same name, to define different parameters.
This has several occurrences, mentioned in [1].

This results, when doing a lookup for a specific parameter from XML, in looping over paths which are not of interest (because these paths are not associated with this parameter anyway).

For example: when looking for TrackerRadLength, all paths in Geometry/TrackerRecoData/data/PhaseI/trackerRecoMaterial.xml (expected) and SimTracker/TrackerMaterialAnalysis/data/trackingMaterialGroups_ForPhaseI.xml (not expected) are looked up. Indeed, the SpecPar blocks in these different XMLs have the same name.

A fix could be to simply add namespace at https://github.com/cms-sw/cmssw/blob/master/DetectorDescription/DDCMS/plugins/dd4hep/DDDefinitions2Objects.cc#L974 , to directly distinguish the different SpecPar blocks. But this results in longer strings (negligible effect in that case), and more importantly, to potential regressions.

A simple fix is to just rename the SpecPar blocks defining different parameters, so that their paths end up stored in different vectors.
This leads to a big perf improvement: x1.4 speedup of overall step1 initialization (XML parsing and geo construction, up to event generator start) on DD4hep workflow on my local.

NB: This is reducing the 'number of calls'.
In a subsequent PR, I propose to remove all regex from trackerRecoMaterial.xml and trackingMaterialGroups_ForPhaseI.xml (this also has an effect on perf, even if less consequent).

NB 2: Why has trackingMaterialGroups_ForPhaseI.xml been (recently) included in the geometry scenarios? Is it really needed?

NB 3: Why is there a duplicated XML file for DD4hep (SimTracker/TrackerMaterialAnalysis/data/dd4hep_trackingMaterialGroups_ForPhaseII.xml) ?

[1] XMLs where this situation occurs:

  • zdcsens.xml and zdcProdCuts.xml (blocks named zdc)
  • hcalRecNumbering.xml,HcalProdCuts.xml, hcalSimNumbering.xml, and hcalsenspmf.wml (blocks named hcal)
  • All blocks in trackerRecoMaterial.xml and trackingMaterialGroups_ForPhaseI.xml
    Since trackingMaterialGroups_ForPhaseI.xm paths are very heavy, this is where the effect is the biggest, though this issue also appears for Zdc and Hcal.

@ianna @civanch @cvuosalo

…lysis/data/trackingMaterialGroups_ForPhaseI.xml, which had identical names as the SpecPar blocks in trackerRecoMaterial.xml (!!). This file was added few months ago to all geometry scenarios (why is that? as it is only used for testing materials anyway). It has a very big impact on DD4hep Run3 workflow perf. The underlying issue is that in DetectorDescription/DDCMS, when processing XMLs files for DD4hep, all SpecPar sections with same name are merged together. This results in lookups in (extremely heavy !) paths of trackingMaterialGroups_ForPhaseI.xml being made while searching for TrackerRadLength, despite TrackerRadLength not being defined in those files anyway!
@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32505/20438

  • This PR adds an extra 24KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @ghugo83 for master.

It involves the following packages:

SimTracker/TrackerMaterialAnalysis

@cmsbuild, @civanch, @mdhildreth can you please review it and eventually sign? Thanks.
@vargasa, @makortel, @mtosi, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @apsallid, @ebrondol, @threus, @dgulhan this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

Copy link
Contributor

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the names in the paths have some redundancy. I'd think, we could do better. It's not a request, but a suggestion that may help us to drop the namespaces :-)

<PartSelector path="//pixfwd:Phase2PixelEndcap/pixel:Disc2/pixel:Ring1Disc2/pixel:EModule1Disc2/pixel:EModule1Disc2InnerPixelwafer/pixel:EModule1Disc2InnerPixelActive"/>
<PartSelector path="//pixfwd:Phase2PixelEndcap/pixel:Disc2/pixel:Ring2Disc2/pixel:EModule2Disc2/pixel:EModule2Disc2InnerPixelwafer/pixel:EModule2Disc2InnerPixelActive"/>
<PartSelector path="//pixfwd:Phase2PixelEndcap/pixel:Disc2/pixel:Ring3Disc2/pixel:EModule3Disc2/pixel:EModule3Disc2InnerPixelwafer/pixel:EModule3Disc2InnerPixelActive"/>
<PartSelector path="//pixfwd:Phase2PixelEndcap/pixel:Disc2/pixel:Ring4Disc2/pixel:EModule4Disc2/pixel:EModule4Disc2InnerPixelwafer/pixel:EModule4Disc2InnerPixelActive"/>
<Parameter name="TrackingMaterialGroup" value="TrackerRecMaterialPhase2PixelForwardDisk2" />
</SpecPar>

<SpecPar name="TrackerRecMaterialPhase2PixelForwardDisk3">
<SpecPar name="TrackingMaterialGroupPhase2PixelForwardDisk3">
<PartSelector path="//pixfwd:Phase2PixelEndcap/pixel:Disc3/pixel:Ring1Disc3/pixel:EModule1Disc3/pixel:EModule1Disc3InnerPixelwafer/pixel:EModule1Disc3InnerPixelActive"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghugo83 - it seems to me that the names could be simplified as well.

For example:

//pixfwd:Phase2PixelEndcap/pixel:Disc2/pixel:Ring1Disc2/pixel:EModule1Disc2/pixel:EModule1Disc2InnerPixelwafer/pixel:EModule1Disc2InnerPixelActive

could be simplified as:

//PixelEndcapPhase2/PixelDisk2/PixelRing1/PixelEModule1/PixelEInnerWafer/PixelEInnerActive

note, it's also should be Disk not Disc, I think :-)

Copy link
Contributor Author

@ghugo83 ghugo83 Dec 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly, this is what I have done in the subsequent PR (gonna push it now for Phase 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha was not the namespace but the head volumes that I had done. Will also add the namespaces in the cases it leads to no regression

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the biggest effect from several orders of magnitude ended up to remove the regex from trackerPhase1RecoMaterial.xml (the Phase 2 RecoMaterial file has no regex)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ianna Phase 1: when I try to remove a namespace I get issues with old DD:
Qualify the name with a regexp for the namespace, i.e ".*:name-regexp" !
Will see whether it is different for Phase 2.

I have just removed the parent volumes in Phase 1 trackerRecoMaterial.xml to see, but it had no visible impact (regex removal instead, made the full DD4hep-based step 1 till event generator time from ~140 s to ~85s, while after removing the parent volumes, I still get ~85s).
I can push the cut of the parents volumes to the other PR though, does not harm.

@ianna
Copy link
Contributor

ianna commented Dec 16, 2020

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7e1120/11748/summary.html
CMSSW: CMSSW_11_3_X_2020-12-16-1100/slc7_amd64_gcc900

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 36
  • DQMHistoTests: Total histograms compared: 2747311
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2747288
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 35 files compared)
  • Checked 153 log files, 37 edm output root files, 36 DQM output files

@cvuosalo
Copy link
Contributor

All Run 3 XML files must be given a new version (or "v1" if there isn't an existing "v" version), since there are already DB payloads with the previous XML.

@civanch
Copy link
Contributor

civanch commented Dec 16, 2020

+1

It is Phase-2, I believe we can change XML in that case if we do not change actual geometry.

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Copy link
Contributor

+1

@silviodonato
Copy link
Contributor

+1

It is Phase-2, I believe we can change XML in that case if we do not change actual geometry.

cc @kpedro88

@cmsbuild cmsbuild merged commit fd1d7de into cms-sw:master Dec 17, 2020
@cvuosalo
Copy link
Contributor

@ghugo83 Isn't this file for Run 3?
SimTracker/TrackerMaterialAnalysis/data/trackingMaterialGroups_ForPhaseI.xml.
If so, it requires a "v1" version.

@cvuosalo
Copy link
Contributor

@ghugo83 In the master, due to this PR,
SimTracker/TrackerMaterialAnalysis/data/trackingMaterialGroups_ForPhaseI.xml has been changed. It needs to be reverted to its previous version.

@ghugo83
Copy link
Contributor Author

ghugo83 commented Dec 18, 2020

It is fine, I did it in #32511
At this commit: c2f90bb

@cvuosalo
Copy link
Contributor

Have you pushed the commit yet? I don't see it in the #32511.

@ghugo83
Copy link
Contributor Author

ghugo83 commented Dec 21, 2020

This commit is the second before-last in #32511, and the 2 last commits do not touch that file.
But yes, 'git checkout commitHash file' sometimes leads to weird overall diff.

Looking at the file after merge of #32511, things look fine as expected (ie, SimTracker/TrackerMaterialAnalysis/data/trackingMaterialGroups_ForPhaseI.xml is restored to default, and a v1 file is used instead).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants