Try building with -O3 by default ? #44931
Comments
assign core |
New categories assigned: core @Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks |
cms-bot internal usage |
A new Issue was created by @fwyzard. @smuzaffar, @makortel, @Dr15Jones, @sextonkennedy, @antoniovilela, @rappoccio can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
The reference, using
The same rebuilt with
|
@missirol FYI |
I wonder what level of physics validation will be needed. I guess the tests in cms-sw/cmsdist#9183 at least will tell us something. |
In principle we should expect the same results with |
cms-sw/cmsdist#9183 (comment) indeed suggests there would be no changes in results (or, any potential change would be more rare than probed by the PR tests) |
One could hope. At |
And indeed there are differences :-/ Running the HLT over 58k events, I find 51 events with at least one difference:
The results are reproducible:
|
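For scale, 51 out of 58k events is roughly 0.09% of the sample. A minimal sketch of the kind of event-by-event comparison described above is given below. It assumes the trigger decisions of each build have already been extracted into a mapping from event id to the set of accepted paths; the function name, the path names and the toy input are all hypothetical and do not come from the actual HLT jobs.

```python
def events_with_differences(reference, candidate):
    """Return the events whose accepted trigger paths differ between two runs.

    Both arguments map an event id, e.g. (run, lumi, event), to the set of
    trigger path names that accepted that event.
    """
    differing = {}
    for event_id in sorted(set(reference) | set(candidate)):
        ref_paths = set(reference.get(event_id, ()))
        new_paths = set(candidate.get(event_id, ()))
        if ref_paths != new_paths:
            # Symmetric difference: paths that fired in only one of the two builds.
            differing[event_id] = sorted(ref_paths ^ new_paths)
    return differing


# Hypothetical toy input, not taken from the actual HLT jobs.
reference = {
    (1, 1, 101): {"HLT_PathA", "HLT_PathB"},
    (1, 1, 102): {"HLT_PathA"},
    (1, 1, 103): set(),
}
candidate = {
    (1, 1, 101): {"HLT_PathA", "HLT_PathB"},
    (1, 1, 102): {"HLT_PathA", "HLT_PathC"},
    (1, 1, 103): set(),
}

diffs = events_with_differences(reference, candidate)
print(f"{len(diffs)} of {len(reference)} events differ")
for event_id, paths in diffs.items():
    print(event_id, paths)
```

Running the same comparison twice on independent jobs is one way to check that the differences are reproducible rather than random.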
@cms-sw/reconstruction-l2 do you have any suggestions how we should proceed, to identify the differences in the reconstruction due to the more aggressive compiler optimisations ? |
I've used the build from cms-sw/cmsdist#9183 to compare the performance of the With
With
for a 1.6% speed up. |
@cms-sw/reconstruction-l2 FYI |
Repeating the measurement with more jobs and higher statistics, the results look more stable, with a 1.4% speed-up.
|
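Since the quoted speed-up (1.4% to 1.6%) is small, it helps to attach an uncertainty to it. The sketch below assumes the benchmark reports one throughput value (events/s) per job and propagates the standard error of the two means to their ratio. The function name and the numbers are hypothetical and are not the measurements from this thread.

```python
from math import sqrt
from statistics import mean, stdev


def relative_speedup(reference, candidate):
    """Return (speed-up, uncertainty) of candidate over reference.

    Each argument is a list of per-job throughputs in events/s. The
    uncertainty combines the relative standard errors of the two means
    in quadrature, assuming the jobs are independent.
    """
    ref_mean, new_mean = mean(reference), mean(candidate)
    ref_err = stdev(reference) / sqrt(len(reference))
    new_err = stdev(candidate) / sqrt(len(candidate))
    ratio = new_mean / ref_mean
    ratio_err = ratio * sqrt((ref_err / ref_mean) ** 2 + (new_err / new_mean) ** 2)
    return ratio - 1.0, ratio_err


# Hypothetical per-job throughputs (events/s), for illustration only.
reference_jobs = [100.0, 101.0, 99.0, 100.0]
candidate_jobs = [101.5, 102.0, 101.0, 101.5]

speedup, error = relative_speedup(reference_jobs, candidate_jobs)
print(f"speed-up: {100 * speedup:.2f}% +/- {100 * error:.2f}%")
```

With more jobs the standard errors shrink, which is consistent with the results looking more stable at higher statistics.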
I believe the best approach would be to produce the usual Reco Timing pie chart in order to identify which RECO modules are responsible for the gain. I have to figure out how to plug in cms-sw/cmsdist#9183 in a standalone way; would you have a recipe? From https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-be9632/39349/summary.html it looks like, in principle, there are no salient physics RECO changes arising from the change. Thanks |
If you can generate the pie chart plots from a local release, you can run
to set up a local build of |
I've got some results based on lxplus8, but I don't trust them since they fluctuate somewhat, I guess due to the lack of a dedicated machine. Since el8 is needed, could you advise on where to perform this test? For RECO profiling we used to launch tests on vocms011, but it is cc7 and the rebuild with -O3 requires el8. |
You can use a container (e.g. by running |
I finally have some sensible results on a dedicated machine (cmsdev40), but they are based on 100-event jobs, so the uncertainty is still large. I have added a Run4 PU profile to the wf that Andrea used (wf 12834.0), trying to enlarge the gain over the MkFit and Pixel/Track seeding modules. That's why an event takes ~15s to be reconstructed (single thread and single stream, FastTimer does not allow running multistream [1]).
Base CMSSW_14_1_X_2024-05-12-0000:
Base+cms-sw/cmsdist#9183 + Rebuild:
The gain seems general at the 1% level, especially in RecoTracker, RecoParticleFlow and RecoVertex. I am now running on more events to reduce the uncertainty.
[1] The framework is configured to use at least two streams, but the following modules |
The real cause is that process.options.numberOfConcurrentLuminosityBlocks = 1 after which any number of streams should work, at the cost of LuminosityBlock transition becoming more costly (which probably is not that important for this test) The workaround should actually have been part of the exception message cmssw/FWCore/Framework/src/EventProcessor.cc Lines 2477 to 2479 in e95546f
Is there some way we could improve the exception message to make it more clear? |
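The setting quoted above can be applied as a small customisation function. This is only a sketch: the function name customise_single_lumi is hypothetical, and the stand-in object at the bottom exists only so that the snippet runs outside CMSSW. In a real job the process argument would be the cms.Process built by the configuration.

```python
from types import SimpleNamespace


def customise_single_lumi(process):
    """Limit the job to one concurrent LuminosityBlock.

    This is the setting quoted in the comment above; according to that
    comment, any number of streams should work afterwards, at the cost of
    more expensive LuminosityBlock transitions.
    """
    process.options.numberOfConcurrentLuminosityBlocks = 1
    return process


# Stand-in for a cms.Process, only to make the sketch runnable outside CMSSW.
fake_process = SimpleNamespace(options=SimpleNamespace())
customise_single_lumi(fake_process)
print(fake_process.options.numberOfConcurrentLuminosityBlocks)
```

The same assignment could presumably also be passed as a one-liner through --customise_commands, in the same way as the FastTimerService settings used elsewhere in this thread.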
The |
I have removed the DQM and Validation parts of the step3 chunk that I am testing, but it seems that the setting from Matti is still needed |
This is my cmsDriver config: cmsDriver.py step3 -s RAW2DIGI,L1Reco,RECO,RECOSIM --conditions auto:phase1_2022_realistic --datatier AODSIM -n 100 --eventcontent AODSIM --geometry DB:Extended --era Run3 --pileup Run3_Flat55To75_PoissonOOTPU --pileup_input das:/RelValMinBias_14TeV/CMSSW_13_1_0_pre1-130X_mcRun3_2022_realistic_withNewBSFromEOY2022Data_v2_RV186-v1/GEN-SIM --filein file:step2.root --fileout file:step3.root --customise HLTrigger/Timer/FastTimer.customise_timer_service_singlejob --customise_commands 'process.FastTimerService.writeJSONSummary=True;process.FastTimerService.jsonFileName="step3_circles.json"' &> step3_TTbar_14TeV+2021PU_ProdLike.log |
Do you actually care about the timing DQM plots ? If you don't (for example, if you are interested only in the text log and/or in the JSON output), it would be better not to enable the DQM part at all: cmsDriver.py step3 ... --customise HLTrigger/Timer/FastTimer.addService --customise_commands 'process.FastTimerService.enableDQM=False;process.FastTimerService.writeJSONSummary=True;process.FastTimerService.jsonFileName="step3_circles.json"' |
OK, thanks a lot, that explains why I could not run multithreading |
Probably not, it was my fault since I misread the workaround because there is a new empty line between the problem definition and the solution. I am sorry |
Results with 1k events, which again confirm that the gain is spread across all Reco packages without a clear winner: Base+cms-sw/cmsdist#9183 + Rebuild: https://www.hep.uniovi.es/jfernan/circles/web/piechart.php?local=false&dataset=pr9183cmsdist_11834.21_withPU_1kevents&resource=time_real&colours=default&groups=packages&show_labels=true&threshold=0 |
IMHO we could simply merge it, and use the next pre-release validation to spot any problems |
The plan from ORP+Core today is to
|
Asan Ib also has similar error [a] [a]
|
@smuzaffar it appears that CMSSW_14_1_ASAN_X_2024-05-27-2300 is not on cvmfs. Is that on purpose or is there some delay? |
@Dr15Jones , it is installed on cvmfs
|
I used the same command at FNAL
|
looks like issue with cvmfs stratum 1 at FNAL. https://cvmfs-monitor-frontend.web.cern.ch/cms-ib.cern.ch shows that FNAL Stratum 1 is at revision |
In this case there is a code path where |
Probably, there's several of these where I don't see how the compiler is inferring that there could be a problem. This one, however, is extra-weird--if I add just:
it fixes all three instances of the compilation error. So somehow the flow analysis of the three loops is correlated by the optimizer? |
The sauce thickens. If I put that assertion in all three places where we were getting the error, the error returns in all three spots. I think what may be happening is that the optimizer is seeing very similar operations in all three loops and is merging the analysis of those parts, and somehow inferring that somewhere in the combined analysis there's a possible path that gives a nullptr this. If I break the symmetry by inserting an extra branch in one of the three, then the interference pattern disappears. Restoring the symmetry by putting identical polarizers on all three branches restores the interference pattern. |
CTTPS problem was worked around in #45304. @smuzaffar Could we close this issue? |
@makortel , yes we can close it now |
+core |
@cmsbuild, please close |
This issue is fully signed and ready to be closed. |
Now it thickened even more: CMSSW_14_1_ASAN_X_2024-08-21-2300 had a build failure with exactly the same symptoms in another loop in the same file. The earlier ASAN build was successful, and there were no relevant code changes in between. I opened #45786 that applies the same workaround in the reported place. Let's see if we'll eventually get to patch the third place too... |
While testing local PGO builds, I noticed [1] that the optimisation flags enabled by -fprofile-use have a large overlap with those enabled by -O3:
enabled only by -O3
enabled either by -O3 or -fprofile-use
enabled only by -fprofile-use
While working to make PGO easier to use, would it make sense to try (again?) the impact of -O3 on a full CMSSW build ?
This might be even more interesting in conjunction with -march=x86-64-v3 or -march=x86-64-v4.
[1] https://gcc.gnu.org/onlinedocs/gcc-12.3.0/gcc/Optimize-Options.html
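The three groups above can also be derived from the compiler itself rather than from the documentation. The sketch below relies on GCC's -Q --help=optimizers option, which prints each optimisation option with an [enabled] or [disabled] marker. It assumes that the effect of -fprofile-use is reflected in that listing, which should be checked against the GCC version in use. The helper names are hypothetical, and options that take a value (printed without a marker) are ignored.

```python
import subprocess


def parse_enabled(help_text):
    """Extract the -f options marked [enabled] in `gcc -Q --help=optimizers` output."""
    enabled = set()
    for line in help_text.splitlines():
        fields = line.split()
        # Options with a value (e.g. "-fvect-cost-model=... dynamic") carry no
        # [enabled]/[disabled] marker and are skipped here.
        if len(fields) >= 2 and fields[0].startswith("-f") and fields[-1] == "[enabled]":
            enabled.add(fields[0])
    return enabled


def query_enabled(flags, compiler="gcc"):
    """Ask the compiler which optimisation options a list of flags enables."""
    result = subprocess.run(
        [compiler, "-Q", "--help=optimizers", *flags],
        capture_output=True, text=True, check=True,
    )
    return parse_enabled(result.stdout)


def partition(first, second):
    """Split two sets into (only in first, in both, only in second)."""
    return first - second, first & second, second - first
```

Assuming -O2 as the baseline, the comparison could then look like this:

```python
baseline = query_enabled(["-O2"])
o3_extra = query_enabled(["-O3"]) - baseline
pgo_extra = query_enabled(["-O2", "-fprofile-use"]) - baseline
only_o3, both, only_pgo = partition(o3_extra, pgo_extra)
```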