Try building with `-O3` by default ? #44931

fwyzard · 2024-05-08T16:38:59Z

While testing local PGO builds, I noticed1 that the optimisation flags enabled by -fprofile-use have a large overlap with those enabled by -O3:

enabled only by `-O3`

-fsplit-paths
-ftree-partial-pre

enabled either by `-O3` or `-fprofile-use`

-fgcse-after-reload
-fipa-cp-clone
-floop-interchange
-floop-unroll-and-jam
-fpeel-loops
-fpredictive-commoning
-fsplit-loops
-ftree-loop-distribution
-ftree-loop-distribute-patterns
-funswitch-loops
-fvect-cost-model=dynamic
-fversion-loops-for-strides

enabled only by `-fprofile-use`

-fbranch-probabilities
-fprofile-reorder-functions
-fprofile-values
-ftracer
-funroll-loops
-fvpt

While working to make PGO easier to use, would it make sense to try (again?) the impact of -O3 on a full CMSSW build ?

This might be even more interesting in conjunction with -march=x86-63-v3 or -march=x86-63-v4.

[1] https://gcc.gnu.org/onlinedocs/gcc-12.3.0/gcc/Optimize-Options.html

The text was updated successfully, but these errors were encountered:

fwyzard · 2024-05-08T16:39:07Z

assign core

cmsbuild · 2024-05-08T16:39:35Z

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild · 2024-05-08T16:39:36Z

cms-bot internal usage

cmsbuild · 2024-05-08T16:39:36Z

A new Issue was created by @fwyzard.

@smuzaffar, @makortel, @Dr15Jones, @sextonkennedy, @antoniovilela, @rappoccio can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

fwyzard · 2024-05-09T23:49:37Z

-O3 seems to have a non-negligible impact on the HLT performance.

The reference, using CMSSW_14_0_6_MULTIARCHS in x86-64-v3 mode, is

Running 4 times over 10300 events with 8 jobs, each with 32 threads, 24 streams and 1 GPUs
   691.1 ±   0.2 ev/s (10000 events, 98.6% overlap)
   693.9 ±   0.2 ev/s (10000 events, 98.4% overlap)
   695.3 ±   0.2 ev/s (10000 events, 99.4% overlap)
   693.4 ±   0.2 ev/s (10000 events, 99.1% overlap)
 --------------------
   693.5 ±   1.7 ev/s

The same rebuilt with -O3 gives a 2% speed up:

Running 4 times over 10300 events with 8 jobs, each with 32 threads, 24 streams and 1 GPUs
   710.6 ±   0.2 ev/s (10000 events, 99.3% overlap)
   709.1 ±   0.2 ev/s (10000 events, 98.9% overlap)
   711.9 ±   0.2 ev/s (10000 events, 99.1% overlap)
   710.3 ±   0.2 ev/s (10000 events, 98.9% overlap)
 --------------------
   710.5 ±   1.2 ev/s

fwyzard · 2024-05-09T23:49:54Z

@missirol FYI

makortel · 2024-05-10T14:45:45Z

I wonder what level of physics validation will be needed. I guess the tests in cms-sw/cmsdist#9183 at least will tell us something.

fwyzard · 2024-05-10T15:33:54Z

In principle we should expect the same results with -O2 and -O3 ?

makortel · 2024-05-10T19:28:11Z

cms-sw/cmsdist#9183 (comment) indeed suggests there would be no changes in results (or, any potential change would be more rare than probed by the PR tests)

dan131riley · 2024-05-11T13:12:34Z

In principle we should expect the same results with -O2 and -O3 ?

One could hope. At -O3 the compiler will be more aggressive about taking advantage of undefined behavior, so different results are likely bugs.

fwyzard · 2024-05-13T14:12:55Z

An indeed there are differences :-/

Running the HLT over 58k events, I find 51 events with at least one difference:

      Events    Accepted      Gained        Lost       Other  Trigger
       58275           1           -           -         ~10  HLT_DoubleEle8_CaloIdM_TrackIdM_Mass8_DZ_PFHT350_v28
       58275           2           -           -         ~10  HLT_DoubleEle8_CaloIdM_TrackIdM_Mass8_PFHT350_v28
       58275         144           -          -2          ~1  HLT_Ele30_WPTight_Gsf_v9
       58275         137           -          -2          ~1  HLT_Ele32_WPTight_Gsf_v23
       58275         113           -          -2          ~1  HLT_Ele35_WPTight_Gsf_v17
       58275          86           -          -1          ~1  HLT_Ele38_WPTight_Gsf_v17
       58275          74           -           -          ~1  HLT_Ele40_WPTight_Gsf_v17
       58275         139           -          -2          ~1  HLT_Ele32_WPTight_Gsf_L1DoubleEG_v17
       58275           2           -           -          ~1  HLT_Mu8_Ele8_CaloIdM_TrackIdM_Mass8_PFHT350_DZ_v27
       58275           5           -           -          ~1  HLT_Mu8_Ele8_CaloIdM_TrackIdM_Mass8_PFHT350_v27
       58275          13           -           -          ~3  HLT_Ele23_Ele12_CaloIdL_TrackIdL_IsoVL_DZ_v27
       58275          23          +1          -1          ~1  HLT_Ele23_Ele12_CaloIdL_TrackIdL_IsoVL_v27
       58275          49           -          -1          ~1  HLT_Ele30_eta2p1_WPTight_Gsf_CentralPFJet35_EleCleaned_v21
       58275          27           -           -          ~2  HLT_Ele28_eta2p1_WPTight_Gsf_HT150_v21
       58275           7           -           -         ~16  HLT_Ele28_HighEta_SC20_Mass55_v21
       58275           6           -           -          ~1  HLT_DiMu4_Ele9_CaloIdL_TrackIdL_DZ_Mass3p8_v25
       58275          11           -           -          ~2  HLT_Ele14_eta2p5_IsoVVVL_Gsf_PFHT200_PNetBTag0p53_v1
       58275          17           -           -          ~3  HLT_Ele50_CaloIdVT_GsfTrkIdT_PFJet165_v26
       58275           9           -           -          ~1  HLT_Ele24_eta2p1_WPTight_Gsf_LooseDeepTauPFTauHPS30_eta2p1_CrossL1_v9
       58275           5           -           -          ~1  HLT_Ele24_eta2p1_WPTight_Gsf_PNetTauhPFJet30_Tight_eta2p3_CrossL1_v2
       58275           7           -           -          ~1  HLT_Ele24_eta2p1_WPTight_Gsf_PNetTauhPFJet30_Medium_eta2p3_CrossL1_v2
       58275           8           -           -          ~1  HLT_Ele24_eta2p1_WPTight_Gsf_PNetTauhPFJet30_Loose_eta2p3_CrossL1_v2
       58275          27           -           -          ~1  HLT_QuadPFJet103_88_75_15_PNetBTag_0p4_VBF2_v2
       58275          23           -           -          ~1  HLT_QuadPFJet103_88_75_15_PNet2BTag_0p4_0p12_VBF1_v2
       58275          25           -           -          ~1  HLT_QuadPFJet105_88_76_15_PNetBTag_0p4_VBF2_v2
       58275          22           -           -          ~1  HLT_QuadPFJet105_88_76_15_PNet2BTag_0p4_0p12_VBF1_v2
       58275          16           -           -          ~1  HLT_QuadPFJet111_90_80_15_PNetBTag_0p4_VBF2_v2
       58275          20           -           -          ~1  HLT_QuadPFJet111_90_80_15_PNet2BTag_0p4_0p12_VBF1_v2
       58275         718           -           -          ~1  HLT_QuadPFJet100_88_70_30_v6
       58275         602           -           -          ~1  HLT_QuadPFJet105_88_75_30_v5
       58275         485           -           -          ~1  HLT_QuadPFJet111_90_80_30_v5
       58275          17           -           -          ~1  HLT_QuadPFJet100_88_70_30_PNet1CvsAll0p5_VBF3Tight_v6
       58275          14           -           -          ~1  HLT_QuadPFJet105_88_75_30_PNet1CvsAll0p5_VBF3Tight_v6
       58275           4           -           -          ~1  HLT_QuadPFJet111_90_80_30_PNet1CvsAll0p6_VBF3Tight_v6
       58275           2           -           -          ~3  HLT_Ele50_CaloIdVT_GsfTrkIdT_AK8PFJet220_SoftDropMass40_v8
       58275           0           -           -          ~3  HLT_Ele50_CaloIdVT_GsfTrkIdT_AK8PFJet220_SoftDropMass40_PNetBB0p06_v5
       58275           2           -           -          ~3  HLT_Ele50_CaloIdVT_GsfTrkIdT_AK8PFJet230_SoftDropMass40_v8
       58275           0           -           -          ~3  HLT_Ele50_CaloIdVT_GsfTrkIdT_AK8PFJet230_SoftDropMass40_PNetBB0p06_v5
       58275           0           -           -          ~3  HLT_Ele50_CaloIdVT_GsfTrkIdT_AK8PFJet230_SoftDropMass40_PNetBB0p10_v5
       58275          36          +1           -          ~2  HLT_DoubleEle6p5_eta1p22_mMax6_v8
       58275          28           -           -          ~2  HLT_DoubleEle8_eta1p22_mMax6_v8
       58275          14           -           -          ~1  HLT_DoubleEle10_eta1p22_mMax6_v8
       58275         227          +5          -4          ~1  HLT_SingleEle8_v7
       58275         196           -           -          ~1  HLT_VBF_DiPFJet75_45_Mjj800_DiPFJet60_v2
       58275         177           -           -          ~1  HLT_VBF_DiPFJet75_45_Mjj850_DiPFJet60_v2

The results are reproducible:

2 runs with the baseline release (-O2) show no differences between them
2 runs with the rebuilt release (-O3) show no differences between them
-O3 vs -O2 gives the differences above

fwyzard · 2024-05-13T14:30:33Z

@cms-sw/reconstruction-l2 do you have any suggestions how we should proceed, to identify the differences in the reconstruction due to the more aggressive compiler optimisations ?

fwyzard · 2024-05-15T16:32:37Z

-O3 seems to have a non-negligible impact also on the performance of the offline reconstruction.

I've used the build from cms-sw/cmsdist#9183 to compare the performance of the step3 of the workflow 12834.0_TTbar_14TeV+2024, running over 100 TTbar events with pileup from /RelValTTbar_14TeV/CMSSW_14_1_0_pre2-PU_140X_mcRun3_2024_realistic_v7_STD_2024_PU-v1/GEN-SIM-DIGI-RAW.

With CMSSW_14_1_X_2024-05-12-0000 I get

Running 3 times over 130 events with 1 jobs, each with 8 threads, 8 streams and 0 GPUs
   0.440 ± 0.007 ev/s (100 events)
   0.447 ± 0.005 ev/s (100 events)
   0.448 ± 0.005 ev/s (100 events)
 --------------------
   0.445 ± 0.004 ev/s

With CMSSW_14_1_X_2024-05-12-0000 plus cms-sw/cmsdist#9183 I get

Running 3 times over 130 events with 1 jobs, each with 8 threads, 8 streams and 0 GPUs
   0.449 ± 0.007 ev/s (100 events)
   0.455 ± 0.009 ev/s (100 events)
   0.453 ± 0.008 ev/s (100 events)
 --------------------
   0.452 ± 0.003 ev/s

for a 1.6% speed up.

fwyzard · 2024-05-15T16:37:16Z

@cms-sw/reconstruction-l2 FYI

fwyzard · 2024-05-16T05:26:20Z

Repeating the measurement with more jobs and higher statistics, the results look more stable, with a 1.4% speed-up.

`CMSSW_14_1_X_2024-05-12-0000` (`-O2`)

Running 4 times over 930 events with 16 jobs, each with 8 threads, 8 streams and 0 GPUs
   6.203 ± 0.002 ev/s (900 events, 99.0% overlap)
   6.202 ± 0.003 ev/s (900 events, 99.2% overlap)
   6.213 ± 0.002 ev/s (900 events, 99.3% overlap)
   6.193 ± 0.002 ev/s (900 events, 99.3% overlap)
 --------------------
   6.203 ± 0.008 ev/s

cms-sw/cmsdist#9183 (`-O3`)

Running 4 times over 930 events with 16 jobs, each with 8 threads, 8 streams and 0 GPUs
   6.291 ± 0.002 ev/s (900 events, 99.4% overlap)
   6.297 ± 0.002 ev/s (900 events, 99.1% overlap)
   6.290 ± 0.003 ev/s (900 events, 98.8% overlap)
   6.288 ± 0.002 ev/s (900 events, 99.1% overlap)
 --------------------
   6.291 ± 0.004 ev/s

jfernan2 · 2024-05-16T07:44:36Z

I believe that the best would be to produce the usual Reco Timing pie chart in order to identify which RECO modules are responsible of the gain. I have to figure out how to plug cms-sw/cmsdist#9183 in a stand alone way, would you have a recipe?

From https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-be9632/39349/summary.html it looks like, in principle, there are no salient physics RECO changes arising from the change.

Thanks

fwyzard · 2024-05-16T09:36:12Z

I have to figure out how to plug cms-sw/cmsdist#9183 in a stand alone way, would you have a recipe?

If you can generate the pie chart plots from a local release, you can run

/cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9183/39349/install.sh

to set up a local build of CMSSW_14_1_X_2024-05-12-0000 rebuilt with -O3.

jfernan2 · 2024-05-17T15:09:55Z

I've got some results based on lxplus8 but I don't trust them since they are somewhat fluctuating, I guess due to the lack of a dedicated machine. Since el8 is needed, could you advise on where to perform this test? For RECO profiling we used to launch tests on vocms011 but it is cc7 and the rebuilt with -O3 asks for el8.
Thank you

makortel · 2024-05-17T15:14:38Z

You can use a container (e.g. by running cmssw-el8 first, see https://cms-sw.github.io/singularity.html for more information) to run el8 binaries on slc7 nodes, or you can use one of the cmsdev nodes (see e.g. #44374).

jfernan2 · 2024-05-20T15:13:48Z

I have finally some sensible results on a dedicated machine (cmsdev40), but based on 100 event jobs, so uncertainty is still large. I have added a Run4 PU profile to the wf that Andrea used (wf 12834.0) trying to enlarge the gain over MkFit and Pixel/Track seeding modules. That's why you see that an event takes ~15s to be reconstructed (single thread and single stream, FastTimer does not allow to run multistream[1])

Base CMSSW_14_1_X_2024-05-12-0000:
https://www.hep.uniovi.es/jfernan/circles/web/piechart.php?local=false&dataset=base_11834.21_withPU&resource=time_real&colours=default&groups=packages&show_labels=true&threshold=0

Base+cms-sw/cmsdist#9183 + Rebuild:
https://www.hep.uniovi.es/jfernan/circles/web/piechart.php?local=false&dataset=pr9183cmsdist_11834.21_withPU_v2&resource=time_real&colours=default&groups=packages&show_labels=true&threshold=0

The gain seems general at the level 1% specially in RecoTracker, RecoParticleFlow and recoVertex.

I am running now on more events trying to reduce the uncertainty

[1] The framework is configured to use at least two streams, but the following modules
require synchronizing on LuminosityBlock boundaries:
FastTimerServiceClient fastTimerServiceClient

makortel · 2024-05-20T15:22:26Z

(single thread and single stream, FastTimer does not allow to run multistream[1])

[1] The framework is configured to use at least two streams, but the following modules
require synchronizing on LuminosityBlock boundaries:
FastTimerServiceClient fastTimerServiceClient

The real cause is that FastTimerServiceClient does not support concurrent LuminosityBlocks. You can work around the exception by setting the number of concurrent lumis explicitly along

process.options.numberOfConcurrentLuminosityBlocks = 1

after which any number of streams should work, at the cost of LuminosityBlock transition becoming more costly (which probably is not that important for this test)

The workaround should actually have been part of the exception message

cmssw/FWCore/Framework/src/EventProcessor.cc

Lines 2477 to 2479 in e95546f

    
           ex << "\n\nThe situation can be fixed by either\n" 
        
              << " * modifying the modules to support concurrent LuminosityBlocks (preferred), or\n" 
        
              << " * setting 'process.options.numberOfConcurrentLuminosityBlocks = 1' in the configuration file";

Is there some way we could improve the exception message to make it more clear?

fwyzard · 2024-05-20T15:23:01Z

[1] The framework is configured to use at least two streams, but the following modules
require synchronizing on LuminosityBlock boundaries:
FastTimerServiceClient fastTimerServiceClient

The FastTimerServiceClient is supposed to run during the DQM harvesting, not during the reconstruction step.

jfernan2 · 2024-05-20T15:25:48Z

I have removed the DQM and Validation parts of the step3 chunck that I am testing, but it seems that still the setting from Matti is needed

jfernan2 · 2024-05-20T15:26:59Z

This is my cmsDriver config:

cmsDriver.py step3 -s RAW2DIGI,L1Reco,RECO,RECOSIM --conditions auto:phase1_2022_realistic --datatier AODSIM -n 100 --eventcontent AODSIM --geometry DB:Extended --era Run3 --pileup Run3_Flat55To75_PoissonOOTPU --pileup_input das:/RelValMinBias_14TeV/CMSSW_13_1_0_pre1-130X_mcRun3_2022_realistic_withNewBSFromEOY2022Data_v2_RV186-v1/GEN-SIM --filein file:step2.root --fileout file:step3.root --customise HLTrigger/Timer/FastTimer.customise_timer_service_singlejob --customise_commands "process.FastTimerService.writeJSONSummary=True;process.FastTimerService.jsonFileName="step3_circles.json"" &> step3_TTbar_14TeV+2021PU_ProdLike.log

fwyzard · 2024-05-20T15:41:38Z

This is my cmsDriver config:

cmsDriver.py step3 -s RAW2DIGI,L1Reco,RECO,RECOSIM --conditions auto:phase1_2022_realistic --datatier AODSIM -n 100 --eventcontent AODSIM --geometry DB:Extended --era Run3 --pileup Run3_Flat55To75_PoissonOOTPU --pileup_input das:/RelValMinBias_14TeV/CMSSW_13_1_0_pre1-130X_mcRun3_2022_realistic_withNewBSFromEOY2022Data_v2_RV186-v1/GEN-SIM --filein file:step2.root --fileout file:step3.root --customise HLTrigger/Timer/FastTimer.customise_timer_service_singlejob --customise_commands "process.FastTimerService.writeJSONSummary=True;process.FastTimerService.jsonFileName="step3_circles.json"" &> step3_TTbar_14TeV+2021PU_ProdLike.log

Do you actually care about the timing DQM plots ?

If you don't (for example, if you are interested only in the text log and or in the JSON output), it would be better not to enable the DQM part at all:

cmsDriver.py step3 ...  --customise HLTrigger/Timer/FastTimer.addService --customise_commands 'process.FastTimerService.enableDQM=False;process.FastTimerService.writeJSONSummary=True;process.FastTimerService.jsonFileName="step3_circles.json"'

jfernan2 · 2024-05-20T15:44:27Z

OK, thanks a lot, that explains why I could not run multithreading

jfernan2 · 2024-05-20T16:07:43Z

Is there some way we could improve the exception message to make it more clear?

Probably not, it was my fault since I misread the workaround because there is a new empty line between the problem definition and the solution. I am sorry

jfernan2 · 2024-05-21T12:56:13Z

Results with 1k events which confirm again the gain spread across all Reco packages without a clear winner:

Base: https://www.hep.uniovi.es/jfernan/circles/web/piechart.php?local=false&dataset=base_11834.21_withPU_1kevents&resource=time_real&colours=default&groups=packages&show_labels=true&threshold=0

Base+cms-sw/cmsdist#9183 + Rebuild: https://www.hep.uniovi.es/jfernan/circles/web/piechart.php?local=false&dataset=pr9183cmsdist_11834.21_withPU_1kevents&resource=time_real&colours=default&groups=packages&show_labels=true&threshold=0

smuzaffar · 2024-05-21T13:09:32Z

@makortel @fwyzard , how about testing this in a dedicated CMSSW_14_1_O3_X IB?

fwyzard · 2024-05-21T13:19:53Z

IMHO we could simply merge it, and use the next pre-release validation to spot any problems

makortel · 2024-05-21T18:29:10Z

The plan from ORP+Core today is to

first merge the -O3 in the Geant4 IB, and increase its frequency
If no problems, deploy in all IB flavors
Validate as part of 14_1_0_pre5 (i.e. not in the next pre-release that will be built this week, but in the following)

smuzaffar · 2024-05-28T10:14:42Z

O3 is now default in CMSSW_14_1_X. For non-lto IB we have compilation errors in Geometry/HGCalCommonData [a]. With -O2 we do not get this error, any idea how to avoid it for non-lto?

Asan Ib also has similar error [a]

[a]

In file included from /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/string:50,
                 from src/Geometry/HGCalCommonData/plugins/dd4hep/DDHGCalPassive.cc:8:
In static member function 'static _Tp* std::__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m(const _Tp*, const _Tp*, _Tp*) [with _Tp = double; bool _IsMove = false]',
    inlined from '_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = const double*; _OI = double*]' at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_algobase.h:495:30,
    inlined from '_OI std::__copy_move_a1(_II, _II, _OI) [with bool _IsMove = false; _II = const double*; _OI = double*]' at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_algobase.h:522:42,
    inlined from '_OI std::__copy_move_a(_II, _II, _OI) [with bool _IsMove = false; _II = const double*; _OI = double*]' at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_algobase.h:529:31,
    inlined from '_OI std::copy(_II, _II, _OI) [with _II = const double*; _OI = double*]' at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_algobase.h:620:7,
    inlined from 'void std::vector<_Tp, _Alloc>::_M_assign_aux(_ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = const double*; _Tp = double; _Alloc = std::allocator<double>]' at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/vector.tcc:330:19:
  /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_algobase.h:431:30: error: argument 1 null where non-null expected [-Werror=nonnull]
   431 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[b] https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/el8_amd64_gcc12/CMSSW_14_1_ASAN_X_2024-05-27-2300/Validation/CTPPS

In file included from src/DataFormats/Common/interface/RefVector.h:18,
                 from src/DataFormats/ProtonReco/interface/ForwardProton.h:14,
                 from src/Validation/CTPPS/plugins/CTPPSProtonReconstructionPlotter.cc:20:
In member function 'edm::RefVectorIterator<C, T, F>::reference edm::RefVectorIterator<C, T, F>::operator*() const [with C = std::vector<CTPPSLocalTrackLite>; T = CTPPSLocalTrackLite; F = edm::refhelper::FindUsingAdvance<std::vector<CTPPSLocalTrackLite>, CTPPSLocalTrackLite>]',
    inlined from 'virtual void CTPPSProtonReconstructionPlotter::analyze(const edm::Event&, const edm::EventSetup&)' at src/Validation/CTPPS/plugins/CTPPSProtonReconstructionPlotter.cc:608:22:
  src/DataFormats/Common/interface/RefVectorIterator.h:49:39: error: 'this' pointer is null [-Werror=nonnull]
    49 |       return (*nestedRefVector_)[iter_];

Dr15Jones · 2024-05-28T13:32:21Z

@smuzaffar it appears that CMSSW_14_1_ASAN_X_2024-05-27-2300 is not on cvmfs. It that on purpose or is there some delay?

smuzaffar · 2024-05-28T13:42:49Z

@Dr15Jones , it is installed on cvmfs

[muzaffar@cmsdev42 ~]$ scram l -c ASAN | tail -1
CMSSW           CMSSW_14_1_ASAN_X_2024-05-27-2300 /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-27-2300

Dr15Jones · 2024-05-28T13:45:13Z

I used the same command at FNAL

[cdj@cmslpc-el8-heavy01 fix]$ scram l -c ASAN | tail -1
CMSSW           CMSSW_14_1_ASAN_X_2024-05-24-2300 /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-24-2300

smuzaffar · 2024-05-28T13:50:27Z

looks like issue with cvmfs stratum 1 at FNAL. https://cvmfs-monitor-frontend.web.cern.ch/cms-ib.cern.ch shows that FNAL Stratum 1 is at revision 279148 while CERN has 279307 . I do not know who maintains the FNAL CVMFS stratum 1

makortel · 2024-05-31T19:57:32Z

[b] https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/el8_amd64_gcc12/CMSSW_14_1_ASAN_X_2024-05-27-2300/Validation/CTPPS

In file included from src/DataFormats/Common/interface/RefVector.h:18,
                 from src/DataFormats/ProtonReco/interface/ForwardProton.h:14,
                 from src/Validation/CTPPS/plugins/CTPPSProtonReconstructionPlotter.cc:20:
In member function 'edm::RefVectorIterator<C, T, F>::reference edm::RefVectorIterator<C, T, F>::operator*() const [with C = std::vector<CTPPSLocalTrackLite>; T = CTPPSLocalTrackLite; F = edm::refhelper::FindUsingAdvance<std::vector<CTPPSLocalTrackLite>, CTPPSLocalTrackLite>]',
    inlined from 'virtual void CTPPSProtonReconstructionPlotter::analyze(const edm::Event&, const edm::EventSetup&)' at src/Validation/CTPPS/plugins/CTPPSProtonReconstructionPlotter.cc:608:22:
  src/DataFormats/Common/interface/RefVectorIterator.h:49:39: error: 'this' pointer is null [-Werror=nonnull]
    49 |       return (*nestedRefVector_)[iter_];

In this case there is a code path where nestedRefVector_ may be null (two of the three constructors of edm::RefVector). But I can't see how the compiler could statically figure out it being null in this case (the objects containing the RefVector are being read from the edm::Event). This makes me wonder if the compiler would be throwing out the error too eagerly.

dan131riley · 2024-06-18T14:02:31Z

But I can't see how the compiler could statically figure out it being null in this case (the objects containing the RefVector are being read from the edm::Event). This makes me wonder if the compiler would be throwing out the error too eagerly.

Probably, there's several of these where I don't see how the compiler is inferring that there could be a problem. This one, however, is extra-weird--if I add just:

diff --git a/Validation/CTPPS/plugins/CTPPSProtonReconstructionPlotter.cc b/Validation/CTPPS/plugins/CTPPSProtonReconstructionPlotter.cc
index e1d7de5d7b9..7547133e58a 100644
--- a/Validation/CTPPS/plugins/CTPPSProtonReconstructionPlotter.cc
+++ b/Validation/CTPPS/plugins/CTPPSProtonReconstructionPlotter.cc
@@ -605,7 +605,9 @@ void CTPPSProtonReconstructionPlotter::analyze(const edm::Event &event, const ed

   // make single-RP-reco plots
   for (const auto &proton : *hRecoProtonsSingleRP) {
-    CTPPSDetId rpId((*proton.contributingLocalTracks().begin())->rpId());
+    const auto& pcLTiter = *proton.contributingLocalTracks().begin();
+    assert(pcLTiter.isNonnull());
+    CTPPSDetId rpId(pcLTiter->rpId());
     unsigned int decRPId = rpId.arm() * 100 + rpId.station() * 10 + rpId.rp();

     const bool n1f1 = (armTrackCounter_N[rpId.arm()] == 1 && armTrackCounter_F[rpId.arm()] == 1);

it fixes all three instances of the compilation error. So somehow the flow analysis of the three loops are correlated by the optimizer?

dan131riley · 2024-06-20T19:37:00Z

The sauce thickens. If I put that assertion in all three places where we were getting the error, the error returns in all three spots.

I think what may be happening is that the optimizer is seeing very similar operations in all three loops and is merging the analysis of those parts, and somehow inferring that somewhere in the combined analysis there's a possible path that gives a nullptr this. If I break the symmetry by inserting an extra branch in one of the three, then the interference pattern disappears. Restoring the symmetry by putting identical polarizers on all three branches restores the interference pattern.

makortel · 2024-07-19T20:35:53Z

CTTPS problem was worked around in #45304. @smuzaffar Could we close this issue?

smuzaffar · 2024-07-22T08:47:08Z

@makortel , yes we can close it now

makortel · 2024-07-22T13:32:20Z

+core

makortel · 2024-07-22T13:32:27Z

@cmsbuild, please close

cmsbuild · 2024-07-22T13:32:46Z

This issue is fully signed and ready to be closed.

makortel · 2024-08-23T20:18:38Z

The sauce thickens. If I put that assertion in all three places where we were getting the error, the error returns in all three spots.

Now it thickened even more: CMSSW_14_1_ASAN_X_2024-08-21-2300 had a build failure with exactly same symptoms in another loop in the same file. The earlier ASAN build was successful, and there were no relevant code changes in between. I opened #45786 that applies the same workaround in the reported place. Let's see if we'll eventually get to patch the third place too...

cmsbuild added core-pending pending-signatures labels May 8, 2024

This was referenced May 10, 2024

Change the default optimisation level to -O3 cms-sw/cmsdist#9183

Merged

Change the default optimisation level to -O3 [14.0.x] cms-sw/cmsdist#9184

Open

dan131riley mentioned this issue Jun 13, 2024

workaround for -O3 compilation issues in Geometry/HGCalCommonData #45219

Merged

dan131riley mentioned this issue Jun 25, 2024

CTPPS validation: workaround for -O3 ASAN compilation issue #45304

Merged

cmsbuild closed this as completed Jul 22, 2024

cmsbuild added core-approved fully-signed and removed core-pending pending-signatures labels Jul 22, 2024

makortel mentioned this issue Aug 23, 2024

Another workaround for -O3 ASAN compilation issue in CTPPSProtonReconstructionPlotter #45786

Merged

makortel mentioned this issue Sep 12, 2024

[ASAN_X] Compilation failure in Validation/CTPPS: 'this' pointer is null #45991

Closed

Try building with -O3 by default ? #44931

Try building with -O3 by default ? #44931

Comments

fwyzard commented May 8, 2024

enabled only by -O3

enabled either by -O3 or -fprofile-use

enabled only by -fprofile-use

fwyzard commented May 8, 2024

cmsbuild commented May 8, 2024

cmsbuild commented May 8, 2024 • edited Loading

cmsbuild commented May 8, 2024

fwyzard commented May 9, 2024 • edited Loading

fwyzard commented May 9, 2024

makortel commented May 10, 2024

fwyzard commented May 10, 2024

makortel commented May 10, 2024

dan131riley commented May 11, 2024

fwyzard commented May 13, 2024

fwyzard commented May 13, 2024

fwyzard commented May 15, 2024

fwyzard commented May 15, 2024

fwyzard commented May 16, 2024

CMSSW_14_1_X_2024-05-12-0000 (-O2)

cms-sw/cmsdist#9183 (-O3)

jfernan2 commented May 16, 2024 • edited Loading

fwyzard commented May 16, 2024

jfernan2 commented May 17, 2024 • edited Loading

makortel commented May 17, 2024 • edited Loading

jfernan2 commented May 20, 2024 • edited Loading

makortel commented May 20, 2024 • edited Loading

fwyzard commented May 20, 2024

jfernan2 commented May 20, 2024

jfernan2 commented May 20, 2024 • edited Loading

fwyzard commented May 20, 2024

jfernan2 commented May 20, 2024

jfernan2 commented May 20, 2024

jfernan2 commented May 21, 2024

smuzaffar commented May 21, 2024 • edited Loading

fwyzard commented May 21, 2024

makortel commented May 21, 2024

smuzaffar commented May 28, 2024 • edited Loading

Dr15Jones commented May 28, 2024

smuzaffar commented May 28, 2024

Dr15Jones commented May 28, 2024

smuzaffar commented May 28, 2024

makortel commented May 31, 2024

dan131riley commented Jun 18, 2024

dan131riley commented Jun 20, 2024

makortel commented Jul 19, 2024

smuzaffar commented Jul 22, 2024

makortel commented Jul 22, 2024

makortel commented Jul 22, 2024

cmsbuild commented Jul 22, 2024

makortel commented Aug 23, 2024

Try building with `-O3` by default ? #44931

Try building with `-O3` by default ? #44931

enabled only by `-O3`

enabled either by `-O3` or `-fprofile-use`

enabled only by `-fprofile-use`

cmsbuild commented May 8, 2024 •

edited

Loading

fwyzard commented May 9, 2024 •

edited

Loading

`CMSSW_14_1_X_2024-05-12-0000` (`-O2`)

cms-sw/cmsdist#9183 (`-O3`)

jfernan2 commented May 16, 2024 •

edited

Loading

jfernan2 commented May 17, 2024 •

edited

Loading

makortel commented May 17, 2024 •

edited

Loading

jfernan2 commented May 20, 2024 •

edited

Loading

makortel commented May 20, 2024 •

edited

Loading

jfernan2 commented May 20, 2024 •

edited

Loading

smuzaffar commented May 21, 2024 •

edited

Loading

smuzaffar commented May 28, 2024 •

edited

Loading