Backport OpenBLAS fixes [10_2_X] #5289

Merged
3 commits merged into cms-sw:IB/CMSSW_10_2_X/gcc700 on Nov 13, 2019

Conversation

kpedro88
Contributor

This is a combined backport of #5063 and #5091. The goal is to propagate the DeepAK8 speedup to the current analysis release. (Since ultra legacy production in 10_6_X will not finish for quite a while, 10_2_X will continue to see active use by analyzers.)

attn: @hqucms @smuzaffar @slava77

A question for @davidlange6: would we gain anything by also backporting the OpenBLAS version update from #4897?

@kpedro88
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Oct 21, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3061/console Started: 2019/10/21 19:25

@cmsbuild
Contributor

A new Pull Request was created by @kpedro88 (Kevin Pedro) for branch IB/CMSSW_10_2_X/gcc700.

@cmsbuild, @smuzaffar, @mrodozov can you please review it and eventually sign? Thanks.
cms-bot commands are listed here

@cmsbuild
Contributor

-1

Tested at: c94a4b9

  • Build:

I found a compilation error when building:

/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/include/c++/7.3.1/bits/unique_ptr.h:51:28: note: declared here
template class auto_ptr;
^~~~~~~~
/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/bin/g++ -pthread -shared -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/expat/2.1.0-omkpbe2/lib -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/expat/2.1.0-omkpbe2/lib64 -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/bz2lib/1.0.6-omkpbe2/lib -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/bz2lib/1.0.6-omkpbe2/lib64 -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/db6/6.0.30-omkpbe2/lib -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/db6/6.0.30-omkpbe2/lib64 -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/gdbm/1.10-omkpbe2/lib -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/gdbm/1.10-omkpbe2/lib64 -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/openssl/1.0.2d-omkpbe2/lib -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/openssl/1.0.2d-omkpbe2/lib64 -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/libffi/3.2.1-omkpbe2/lib -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/libffi/3.2.1-omkpbe2/lib64 -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/zlib-x86_64/1.2.11-omkpbe2/lib -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/zlib-x86_64/1.2.11-omkpbe2/lib64 -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/sqlite/3.22.0-omkpbe/lib -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/sqlite/3.22.0-omkpbe/lib64 build/temp.linux-x86_64-2.7/rivet/core.o -L/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc6_amd64_gcc700/external/rivet/2.5.4-d3edaa/Rivet-2.5.4/src/.libs -L/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc6_amd64_gcc700/external/hepmc/2.06.07-omkpbe2/lib -L/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc6_amd64_gcc700/external/fastjet/3.3.0-omkpbe/lib 
-L/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc6_amd64_gcc700/external/yoda/1.6.7-d3edaa/lib -L/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc6_amd64_gcc700/external/OpenBLAS/0.2.20-d3edaa/lib -L/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/python/2.7.14-omkpbe4/lib -Wl,-R/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc6_amd64_gcc700/external/hepmc/2.06.07-omkpbe2/lib -Wl,-R/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc6_amd64_gcc700/external/fastjet/3.3.0-omkpbe/lib -Wl,-R/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc6_amd64_gcc700/external/yoda/1.6.7-d3edaa/lib -Wl,-R/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc6_amd64_gcc700/external/OpenBLAS/0.2.20-d3edaa/lib -lgsl -lHepMC -lfastjet -lYODA -lRivet -lpython2.7 -o build/lib.linux-x86_64-2.7/rivet/core.so
/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/bin/../lib/gcc/x86_64-unknown-linux-gnu/7.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: cannot find -lgsl
collect2: error: ld returned 1 exit status
error: command '/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/bin/g++' failed with exit status 1
make[2]: *** [all-local] Error 1
make[2]: Leaving directory `/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc6_amd64_gcc700/external/rivet/2.5.4-d3edaa/Rivet-2.5.4/pyext'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc6_amd64_gcc700/external/rivet/2.5.4-d3edaa/Rivet-2.5.4/pyext'


You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d3edaa/3061/summary.html

@kpedro88
Contributor Author

The excerpt cut out an important part of the log, which specifies that the issue occurred when building rivet:

* The action "build-external+rivet+2.5.4-d3edaa" was not completed successfully because Failed to build rivet. Log file in /build/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc6_amd64_gcc700/external/rivet/2.5.4-d3edaa/log. Final lines of the log file:
^~~~~~~~
In file included from /cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/include/c++/7.3.1/memory:80:0,
from rivet/core.cpp:455:
/cvmfs/cms-ib.cern.ch/nweek-02599/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/include/c++/7.3.1/bits/unique_ptr.h:51:28: note: declared here

There are a lot of changes between https://github.com/cms-sw/cmsdist/commits/IB/CMSSW_10_2_X/gcc700/rivet.spec and https://github.com/cms-sw/cmsdist/commits/IB/CMSSW_11_0_X/master/rivet.spec (note that this PR does not directly modify rivet.spec). @intrepid42 can you comment on whether a specific change would solve this problem?

@davidlange6
Contributor

davidlange6 commented Oct 21, 2019 via email

@mseidel42
Contributor

mseidel42 commented Oct 21, 2019

Hi, Rivet 2.5.4 still (optionally?) depends on GSL. The easiest solution is probably a small version bump to 2.6.1 (#4427) or 2.7.2 (#5005); no RivetInterface changes required.

@mseidel42
Contributor

Correction: small CMSSW changes are needed for both 2.6.1 -> cms-sw/cmssw#25817 and 2.7.2 -> cms-sw/cmssw#26936

You can also try to fix (like in Herwig++/ThePEG) or remove the GSL dependency.

@cmsbuild
Contributor

Pull request #5289 was updated.

@kpedro88
Contributor Author

Well, that was a fun rabbit hole.

I reran this build with and without my changes. The successful command (in the clean build) has these flags:

-Wl,-R/data/pedrok/phase2/clean/CMSSW_10_2_17/pkgs/slc7_amd64_gcc700/external/gsl/2.2.1-cms/lib -lgsl

whereas the failing command (in the changed build) has these flags:

-Wl,-R/data/pedrok/phase2/CMSSW_10_2_17/pkgs/slc7_amd64_gcc700/external/OpenBLAS/0.2.20-cms/lib -lgsl

Since libgsl is not located in the OpenBLAS lib directory, the command naturally fails. One might wonder why these arguments change when rivet does not even depend on OpenBLAS.

Here's the relevant output from gsl-config, before:

configure: GSL LDFLAGS is -L/data/pedrok/phase2/clean/CMSSW_10_2_17/pkgs/slc7_amd64_gcc700/external/gsl/2.2.1-cms/lib -lgsl -lgslcblas -lm

and after:

configure: GSL LDFLAGS is -L/data/pedrok/phase2/CMSSW_10_2_17/pkgs/slc7_amd64_gcc700/external/gsl/2.2.1-cms/lib -lgsl -L/data/pedrok/phase2/CMSSW_10_2_17/pkgs/slc7_amd64_gcc700/external/OpenBLAS/0.2.20-cms/lib -lopenblas -lm

So rivet does not use GSL_LDFLAGS directly, but information derived from it. The derivation uses a Python regex in pyext/setup.py, which is itself generated by make from pyext/setup.py.in. The regex does not correctly handle the new GSL_LDFLAGS value, which specifies two -L arguments, and incorrectly picks the OpenBLAS one.

I have fixed the regex to handle this case correctly, picking the gsl lib directory, and propagated this fix to the rivet.spec file. This is the least intrusive option, as far as I can tell.
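
To illustrate the failure mode, here is a minimal sketch. It is not the actual regex from pyext/setup.py.in (which is not quoted in this thread); `naive_lib_dir` and `gsl_lib_dir` are hypothetical names, and the greedy pattern is only one assumed way a regex could end up selecting the last -L argument. The two flag strings are the GSL LDFLAGS values reported above.

```python
import re
import shlex

# GSL LDFLAGS values as printed by configure, before and after the OpenBLAS change.
BEFORE = ("-L/data/pedrok/phase2/clean/CMSSW_10_2_17/pkgs/slc7_amd64_gcc700/"
          "external/gsl/2.2.1-cms/lib -lgsl -lgslcblas -lm")
AFTER = ("-L/data/pedrok/phase2/CMSSW_10_2_17/pkgs/slc7_amd64_gcc700/"
         "external/gsl/2.2.1-cms/lib -lgsl "
         "-L/data/pedrok/phase2/CMSSW_10_2_17/pkgs/slc7_amd64_gcc700/"
         "external/OpenBLAS/0.2.20-cms/lib -lopenblas -lm")


def naive_lib_dir(ldflags):
    """Hypothetical pattern: the greedy '.*' makes it capture the LAST -L directory."""
    match = re.match(r".*-L(\S+)", ldflags)
    return match.group(1) if match else None


def gsl_lib_dir(ldflags):
    """Return the most recent -L directory seen before -lgsl, or None."""
    last_dir = None
    for token in shlex.split(ldflags):
        if token.startswith("-L") and len(token) > 2:
            last_dir = token[2:]
        elif token == "-lgsl":
            return last_dir
    return None


if __name__ == "__main__":
    for label, flags in (("before", BEFORE), ("after", AFTER)):
        print(label, "naive:", naive_lib_dir(flags))
        print(label, "fixed:", gsl_lib_dir(flags))
```

With a single -L argument both functions agree; with two, the greedy pattern returns the OpenBLAS directory while the token-based lookup still returns the gsl one. Tying the directory to the position of -lgsl, rather than to the first or last -L, is a design choice made for this sketch and may differ from the fix that went into rivet.spec.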

@kpedro88
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Oct 22, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3107/console Started: 2019/10/23 00:39

@cmsbuild
Contributor

-1

Tested at: 20cb24f

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d3edaa/3107/summary.html

I found the following errors while testing this PR:

Failed tests: UnitTests

  • Unit Tests:

I found errors in the following unit tests:

---> test EcalDAQ_O2O_test had ERRORS

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d3edaa/3175/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 31
  • DQMHistoTests: Total histograms compared: 3007491
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3007298
  • DQMHistoTests: Total skipped: 190
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (30 files compared)
  • Checked 129 log files, 14 edm output root files, 31 DQM output files

@fabiocos
Contributor

fabiocos commented Nov 7, 2019

please test

@cmsbuild
Contributor

cmsbuild commented Nov 7, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3375/console Started: 2019/11/07 10:50

@cmsbuild
Contributor

cmsbuild commented Nov 7, 2019

-1

Tested at: 20cb24f

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d3edaa/3375/summary.html

I found the following errors while testing this PR:

Failed tests: UnitTests

  • Unit Tests:

I found errors in the following unit tests:

---> test EcalDAQ_O2O_test had ERRORS

@cmsbuild
Contributor

cmsbuild commented Nov 7, 2019

Comparison job queued.

@cmsbuild
Contributor

cmsbuild commented Nov 7, 2019

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d3edaa/3375/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 31
  • DQMHistoTests: Total histograms compared: 3007491
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3007299
  • DQMHistoTests: Total skipped: 190
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (30 files compared)
  • Checked 129 log files, 14 edm output root files, 31 DQM output files

@fabiocos
Contributor

+1

@fabiocos
Contributor

@smuzaffar the test failure is unrelated to this PR; it is due to an update of the conditions test by Giacomo (all the 10_2_X IBs are affected). So I would just merge this in view of the next build.

@smuzaffar
Contributor

+externals

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_10_2_X/gcc700 IBs (but tests are reportedly failing).

@fabiocos
Contributor

merge

As the test failed, the bot did not act on our approvals.

@cmsbuild cmsbuild merged commit 90a8cfe into cms-sw:IB/CMSSW_10_2_X/gcc700 Nov 13, 2019
7 participants