[AArch64] build cmssw aarch64 with -mno-outline-atomics file #7582

Closed
wants to merge 7 commits into from

Conversation

smuzaffar
Contributor

@cmsbuild
Contributor

A new Pull Request was created by @smuzaffar (Malik Shahzad Muzaffar) for branch IB/CMSSW_12_3_X/master.

@cmsbuild, @smuzaffar, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @qliphy you are the release manager for this.
cms-bot commands are listed here

@smuzaffar
Contributor Author

test parameters:

  • full_cmssw = true
  • enable_test = threading
  • workflow_threading = 4.44,13.0,11634.24,140.56

@smuzaffar
Contributor Author

please test for slc7_aarch64_gcc11

@cmsbuild
Contributor

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3a3bc3/21960/summary.html
COMMIT: 4b4fa81
CMSSW: CMSSW_12_3_X_2022-01-23-2300/slc7_aarch64_gcc11
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7582/21960/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3a3bc3/21960/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3a3bc3/21960/git-merge-result

Build

I found a compilation error when building:

Entering library rule at src/FWCore/Integration/test
>> Compiling edm plugin /data/cmsbuild/jenkins_b/workspace/ib-run-pr-tests/CMSSW_12_3_X_2022-01-23-2300/src/FWCore/Integration/test/TestHistoryKeeping.cc
>> Building  edm plugin tmp/slc7_aarch64_gcc11/src/FWCore/Integration/test/TestHistoryKeeping/libTestHistoryKeeping.so
Leaving library rule at src/FWCore/Integration/test
/bin/sh: line 1: 2956474 Illegal instruction     (core dumped) /usr/bin/touch /data/cmsbuild/jenkins_b/workspace/ib-run-pr-tests/CMSSW_12_3_X_2022-01-23-2300/tmp/slc7_aarch64_gcc11/cache/edm_edmPluginRefresh
gmake: *** [lib/slc7_aarch64_gcc11/TestHistoryKeeping.edmplugin] Error 132
Entering library rule at src/FWCore/Integration/test
>> Compiling edm plugin /data/cmsbuild/jenkins_b/workspace/ib-run-pr-tests/CMSSW_12_3_X_2022-01-23-2300/src/FWCore/Integration/test/TestInterProcessProd.cc
>> Building  edm plugin tmp/slc7_aarch64_gcc11/src/FWCore/Integration/test/TestInterProcessProd/libTestInterProcessProd.so
Leaving library rule at src/FWCore/Integration/test
@@@@ Running edmWriteConfigs for TestInterProcessProd


@fwyzard
Contributor

fwyzard commented Jan 24, 2022

Building with -mno-outline-atomics will require hardware support for the LSE instructions from the ARMv8.1 ISA.
If our build machines support only the ARMv8.0 ISA, the build is expected to fail at run time :-/

@dan131riley

Building with -mno-outline-atomics will require hardware support for the LSE instructions from the ARMv8.1 ISA.

I thought the point of the outline atomics was to enable use of LSE at runtime, so I expected -mno-outline-atomics to revert to the pre-10.1 behavior that didn't use LSE instructions.

If our build machines support only the ARMv8.0 ISA, the build is expected to fail at run time :-/

Our build machines (at least the ones I'm aware of) have been upgraded to ThunderX2 that support the ARMv8.1 ISA, and since gcc 10.1 have been using the LSE instruction set at runtime. I was wondering if the use of LSE was causing the mysterious failures we've been seeing, so I was hoping that flag would disable LSE.
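
As an aside, one way to confirm what a given node supports is the `Features` line in `/proc/cpuinfo`: on AArch64 Linux the kernel reports LSE as the `atomics` flag. A minimal sketch, assuming that flag name and a Linux host; `has_lse` and the sample excerpts are hypothetical and not part of the CMS tooling:

```python
def has_lse(cpuinfo_text):
    """Return True if any 'Features' line lists the 'atomics' (LSE) flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            if "atomics" in value.split():
                return True
    return False


# Illustrative, made-up excerpts; not captured from the build nodes.
SAMPLE_V8_0 = "processor : 0\nFeatures : fp asimd evtstrm crc32\n"
SAMPLE_V8_1 = "processor : 0\nFeatures : fp asimd evtstrm crc32 atomics\n"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("LSE reported by kernel:", has_lse(f.read()))
    except OSError:
        print("no /proc/cpuinfo on this host")
```

Matching the whole token rather than a substring avoids false positives from other feature names that merely contain the same letters.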

@dan131riley

/bin/sh: line 1: 2956474 Illegal instruction (core dumped) /usr/bin/touch /data/cmsbuild/jenkins_b/workspace/ib-run-pr-tests/CMSSW_12_3_X_2022-01-23-2300/tmp/slc7_aarch64_gcc11/cache/edm_edmPluginRefresh

This is really weird--/usr/bin/touch crashed? I have no idea how adding that flag could cause that to crash.
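
For reference, the `Error 132` reported by gmake is consistent with the shell's "Illegal instruction" message: by shell convention an exit status above 128 means the child was killed by signal `status - 128`, and signal 4 is SIGILL on Linux. A small sketch of that decoding (the helper name is made up for illustration):

```python
import signal


def signal_from_exit_status(status):
    """Map a shell exit status above 128 to a signal name, else return None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # Not a signal number known on this platform.
        return None


print(signal_from_exit_status(132))
```

So the failure really is an illegal-instruction trap in the process that was run, not a generic non-zero return code.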

@fwyzard
Contributor

fwyzard commented Jan 28, 2022

Our build machines (at least the ones I'm aware of) have been upgraded to ThunderX2 that support the ARMv8.1 ISA

Ah, OK.
Do we have interactive access to any of those nodes?

@smuzaffar
Contributor Author

@fwyzard, you can log in to those nodes. I am going to send you the instructions via email.

@fwyzard
Contributor

fwyzard commented Jan 28, 2022

Building with -mno-outline-atomics will require hardware support for the LSE instructions from the ARMv8.1 ISA.

I thought the point of the outline atomics was to enable use of LSE at runtime, so I expected -mno-outline-atomics to revert to the pre-10.1 behavior that didn't use LSE instructions.

OK, I guess I misread the GCC documentation:

-moutline-atomics
-mno-outline-atomics

Enable or disable calls to out-of-line helpers to implement atomic operations. These helpers will, at runtime, determine if the LSE instructions from ARMv8.1-A can be used; if not, they will use the load/store-exclusive instructions that are present in the base ARMv8.0 ISA.

This option is only applicable when compiling for the base ARMv8.0 instruction set. If using a later revision, e.g. -march=armv8.1-a or -march=armv8-a+lse, the ARMv8.1-Atomics instructions will be used directly. The same applies when using -mcpu= when the selected cpu supports the lse feature. This option is on by default.

So

  • with -march=armv8.1-a or later, -moutline-atomics/-mno-outline-atomics has no effect, and LSE are always used
  • with -march=armv8-a, -moutline-atomics uses helpers to check if LSE are available and use them, or fall back to the old instructions otherwise
  • with -march=armv8-a, -mno-outline-atomics always uses the old instructions

?
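
The three cases above can be written out as a lookup. This is only a sketch of that reading of the documentation, not of GCC's actual option handling; the function name and the returned labels are invented for illustration, and the `+lse` check is a simplification that ignores `-mcpu` and negated extensions.

```python
def atomics_strategy(march="armv8-a", outline_atomics=True):
    """Return the atomics code path predicted by the three cases above."""
    base, *extensions = march.split("+")
    # Any revision after the base ISA, or an explicit +lse, implies LSE.
    if base != "armv8-a" or "lse" in extensions:
        return "inline LSE"
    if outline_atomics:
        return "out-of-line helpers (runtime LSE check)"
    return "inline load/store-exclusive"


for march, flag in [("armv8.1-a", False), ("armv8-a", True), ("armv8-a", False)]:
    print(march, flag, "->", atomics_strategy(march, flag))
```

Under this reading, a build that sets neither `-march` nor `-mcpu` and passes `-mno-outline-atomics` lands in the last case.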

@smuzaffar
Contributor Author

please test for slc7_aarch64_gcc11

let's re-run the tests

@dan131riley

  • with -march=armv8.1-a or later, -moutline-atomics/-mno-outline-atomics has no effect, and LSE are always used
  • with -march=armv8-a, -moutline-atomics uses helpers to check if LSE are available and use them, or fall back to the old instructions otherwise
  • with -march=armv8-a, -mno-outline-atomics always uses the old instructions

I think that's correct. We aren't specifying -march or -mcpu, so I'd expect it to use the base ARMv8 instructions with -mno-outline-atomics (in fact, I got started on this line of investigation due to a handful of crashes in the outline LSE routines).
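
To see which libraries actually call the outline routines, one option is to filter a symbol listing (for example the output of `nm -D` on a shared library) for the libgcc helper names. A rough sketch, assuming the helpers follow the pattern `__aarch64_<op><size>_<memory order>`, as in `__aarch64_cas4_acq_rel`; that pattern is an assumption not stated in this thread, and the sample symbols are made up:

```python
import re

# Assumed naming scheme: __aarch64_<op><size>_<memory order>
OUTLINE_HELPER = re.compile(
    r"^__aarch64_(cas|swp|ldadd|ldclr|ldeor|ldset)(1|2|4|8|16)"
    r"_(relax|acq|rel|acq_rel|sync)$"
)


def outline_atomic_helpers(symbols):
    """Return the symbols that look like outline-atomics helper calls."""
    return [s for s in symbols if OUTLINE_HELPER.match(s)]


# Made-up symbol list for illustration.
sample = ["__aarch64_cas4_acq_rel", "memcpy", "__aarch64_ldadd8_relax", "__aarch64_foo"]
print(outline_atomic_helpers(sample))
```

An empty result for a library built with -mno-outline-atomics would suggest the flag took effect for that library.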

@smuzaffar
Contributor Author

Correct, we are not setting -mcpu/-march for aarch64. We have two arm64 nodes:

  • thunderx2: olarm-202
  • thunderx: olarm-102

but we are currently using olarm-202 only. Let me build all externals and cmssw with -mcpu=armv8-a on olarm-202 and see if that helps.

@cmsbuild
Contributor

-1

Failed Tests: UnitTests RelVals RelVals-THREADING
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3a3bc3/22060/summary.html
COMMIT: 4b4fa81
CMSSW: CMSSW_12_3_X_2022-01-27-2300/slc7_aarch64_gcc11
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7582/22060/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3a3bc3/22060/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3a3bc3/22060/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test TestFWCoreServicesDriver had ERRORS
---> test testFWCoreUtilities had ERRORS

RelVals

----- Begin Fatal Exception 28-Jan-2022 21:02:45 CET-----------------------
An exception of category 'Vertex' occurred while
   [0] Processing  Event run: 194533 lumi: 329 event: 462355458 stream: 0
   [1] Running path 'dqmofflineOnPAT_1_step'
   [2] Prefetching for module SingleTopTChannelLeptonDQM_miniAOD/'singleTopElectronMediumDQM_miniAOD'
   [3] Prefetching for module PATMuonSlimmer/'slimmedMuons'
   [4] Prefetching for module PATMuonSelector/'selectedPatMuons'
   [5] Prefetching for module PATMuonProducer/'patMuons'
   [6] Prefetching for module MuonProducer/'muons'
   [7] Prefetching for module PFProducer/'particleFlowTmp'
   [8] Prefetching for module PFBlockProducer/'particleFlowBlock'
   [9] Prefetching for module PFElecTkProducer/'pfTrackElec'
   [10] Prefetching for module PFConversionProducer/'pfConversions'
   [11] Calling method for module ConversionProducer/'allConversions'
Exception Message:
Refitted track not found in list
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 28-Jan-2022 21:11:38 CET-----------------------
An exception of category 'Vertex' occurred while
   [0] Processing  Event run: 326479 lumi: 7 event: 1579493 stream: 0
   [1] Running path 'dqmoffline_8_step'
   [2] Prefetching for module SMPDQM/'SMPDQM'
   [3] Prefetching for module MuonProducer/'muons'
   [4] Prefetching for module PFProducer/'particleFlowTmp'
   [5] Prefetching for module PFBlockProducer/'particleFlowBlock'
   [6] Prefetching for module PFElecTkProducer/'pfTrackElec'
   [7] Prefetching for module PFConversionProducer/'pfConversions'
   [8] Calling method for module ConversionProducer/'allConversions'
Exception Message:
Refitted track not found in list
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 28-Jan-2022 21:25:32 CET-----------------------
An exception of category 'Vertex' occurred while
   [0] Processing  Event run: 319450 lumi: 76 event: 106007323 stream: 0
   [1] Running path 'dqmoffline_10_step'
   [2] Prefetching for module SMPDQM/'SMPDQM'
   [3] Prefetching for module MuonProducer/'muons'
   [4] Prefetching for module PFProducer/'particleFlowTmp'
   [5] Prefetching for module PFBlockProducer/'particleFlowBlock'
   [6] Prefetching for module PFElecTkProducer/'pfTrackElec'
   [7] Prefetching for module PFConversionProducer/'pfConversions'
   [8] Calling method for module ConversionProducer/'allConversions'
Exception Message:
Refitted track not found in list
----- End Fatal Exception -------------------------------------------------

RelVals-THREADING

  • 13.0 13.0_QCD_Pt_3000_3500+QCD_Pt_3000_3500INPUT+DIGI+RECO+HARVEST/step3_QCD_Pt_3000_3500+QCD_Pt_3000_3500INPUT+DIGI+RECO+HARVEST.log
  • 4.44 4.44_RunElectron2012A+RunElectron2012A+HLTD+RECODR1reHLT+HARVESTDR1reHLT/step3_RunElectron2012A+RunElectron2012A+HLTD+RECODR1reHLT+HARVESTDR1reHLT.log
  • 4.53 4.53_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT/step3_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT.log
Expand to see more relval errors ...

@dan131riley

Refitted track not found in list
----- End Fatal Exception -------------------------------------------------

Well, that kills that theory. Not only does the error reproduce with LSE disabled, it also reproduces in a single-threaded test (which I had not noticed before), which rules out synchronization primitives.

@smuzaffar
Contributor Author

smuzaffar commented Jan 30, 2022

@dan131riley, yes, it looks like outline-atomics is not the issue. I have built gcc with outline atomics disabled (#7593) and still see the same error (#7593 (comment)).

@cmsbuild
Contributor

cmsbuild commented Feb 3, 2022

Pull request #7582 was updated.

@smuzaffar smuzaffar closed this Feb 3, 2022
@smuzaffar smuzaffar deleted the smuzaffar-patch-5 branch February 3, 2022 07:19