[TF] Update TF to version 2.1 using externals from cms distribution #5525

Merged: 20 commits merged into IB/CMSSW_11_1_X/master on Feb 26, 2020

smuzaffar
Contributor

@smuzaffar smuzaffar commented Feb 6, 2020

To build TensorFlow, the externals listed below are now taken from the cms distribution. We could also use Eigen from the cms externals, but the Eigen version currently in cmsdist is slightly newer than the one TF needs, so the TF build failed because of some missing headers. We need to sync these versions so that TF can use our version of Eigen.

  • png
  • jpeg
  • zlib
  • curl
  • protobuf
  • pcre
  • gif
  • sqlite3
  • swig
  • cython
  • py2-functools32
  • py2-enum34
  • py2-astor
  • py2-six
  • py2-absl_py
  • py2-termcolor
  • py2-keras_applications
  • py2-pasta
  • py2-wrapt
  • py2-gast
  • py2-backports_weakref
  • py2-opt_einsum
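
One way a list like this could be handed to the TensorFlow build is the `TF_SYSTEM_LIBS` environment variable, which TensorFlow's Bazel build reads as a comma-separated list of libraries to take from the system instead of its bundled copies. The sketch below is only an illustration under that assumption: the PR description does not say which mechanism is used, the names are the cmsdist package names from the list above (TensorFlow's own identifiers for these libraries may differ and would need to be checked), and `system_libs_value` and `DEFERRED` are hypothetical helpers, not part of cmsdist.

```python
# Package names exactly as listed in the PR description. These are cmsdist
# names; the identifiers TensorFlow expects may be spelled differently.
CMSDIST_EXTERNALS = [
    "png", "jpeg", "zlib", "curl", "protobuf", "pcre", "gif", "sqlite3",
    "swig", "cython", "py2-functools32", "py2-enum34", "py2-astor",
    "py2-six", "py2-absl_py", "py2-termcolor", "py2-keras_applications",
    "py2-pasta", "py2-wrapt", "py2-gast", "py2-backports_weakref",
    "py2-opt_einsum",
]

# Externals that stay bundled for now (Eigen, until the versions are synced).
DEFERRED = {"eigen"}


def system_libs_value(candidates, deferred):
    """Join candidates into a comma-separated string, skipping deferred
    names and duplicates while preserving the original order."""
    seen = set()
    kept = []
    for name in candidates:
        if name in deferred or name in seen:
            continue
        seen.add(name)
        kept.append(name)
    return ",".join(kept)


# Assumption: the build is driven through this environment variable.
env = {"TF_SYSTEM_LIBS": system_libs_value(CMSDIST_EXTERNALS + ["eigen"], DEFERRED)}
```

Keeping Eigen in an explicit deferred set makes it a one-line change to switch it over once the cmsdist and TF versions agree.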

@cmsbuild
Contributor

cmsbuild commented Feb 6, 2020

A new Pull Request was created by @smuzaffar (Malik Shahzad Muzaffar) for branch IB/CMSSW_11_1_X/master.

@cmsbuild, @smuzaffar, @mrodozov, @tulamor can you please review it and eventually sign? Thanks.
cms-bot commands are listed here

@smuzaffar
Contributor Author

smuzaffar commented Feb 6, 2020

test parameters

@smuzaffar
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Feb 6, 2020

The tests are being triggered in jenkins.
Tested with other pull request(s) cms-externals/tensorflow#6

@mrodozov
Contributor

please test

@cmsbuild
Contributor

cmsbuild commented Feb 25, 2020

The tests are being triggered in jenkins.
Tested with other pull request(s) cms-sw/cmssw#28711,cms-data/RecoTauTag-TrainingFiles#4
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/4855/console Started: 2020/02/25 10:19

@cmsbuild
Contributor

+1
Tested at: 2ec64b2
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3c700e/4855/summary.html
CMSSW: CMSSW_11_1_X_2020-02-24-2300
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3c700e/4855/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 6 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2695371
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2695049
  • DQMHistoTests: Total skipped: 319
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@silviodonato
Contributor

unhold

@silviodonato
Contributor

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_11_1_X/master IBs (tests are also fine). This pull request will be automatically merged.

@cmsbuild cmsbuild merged commit 46124bb into IB/CMSSW_11_1_X/master on Feb 26, 2020

@slava77
Contributor

slava77 commented Mar 5, 2020

I'm running miniAOD with this on my old AMD machine (AMD Opteron 6128), and the baseline (CMSSW_11_1_X_2020-02-26-1100) crashes with
A fatal system signal has occurred: illegal instruction

The offending symbols appear to be in the TensorFlow library tensorflow/2.1.0/lib/libtensorflow_framework.so.2
gdb points to
=> 0x00007f6fde0c84b2 <+194>: pshufb %xmm1,%xmm0
which is an SSSE3 instruction according to https://www.felixcloutier.com/x86/pshufb .
Curiously, SSSE3 is part of core2 (see https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html), but it is not supported by this AMD Opteron. I guess we were lucky so far that the compiler had not emitted this SSSE3-specific instruction.
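
To tell quickly whether a given Linux host would hit this crash, its CPU feature flags can be compared against what the binary needs. The following is a minimal sketch, assuming the usual `/proc/cpuinfo` layout with a `flags` line; `cpu_flags` and `missing_features` are hypothetical helper names. Note that `/proc/cpuinfo` reports SSE3 as `pni`, while SSSE3 appears as `ssse3`.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of
    /proc/cpuinfo-style text, or an empty set if there is none."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("pni", "ssse3")):
    """List the required flags that the CPU does not advertise.
    'pni' is the /proc/cpuinfo name for SSE3."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in required if name not in flags]


# Made-up sample for illustration, not real output from the machine above:
# a CPU that advertises SSE3 (pni) but not SSSE3.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 pni sse4a\n"
print(missing_features(sample))
```

On a real host the text would come from `open("/proc/cpuinfo").read()`; a non-empty result means the library as built can raise an illegal-instruction signal there.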

@makortel @smuzaffar
should we start abandoning old/existing hardware support
or instead make an effort to make it run?

@slava77
Contributor

slava77 commented Mar 5, 2020

It looks like I'm rediscovering a similar issue as in #5220
There we changed to -msse3

@slava77
Contributor

slava77 commented Mar 5, 2020

@makortel @smuzaffar @mrodozov
should I just submit a PR similar to #5220 to enforce -msse3?
Is tensorflow-sources.file the right (and only) file for the TF build flag parameters?
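
After rebuilding with a restricted flag such as -msse3, one way to confirm the result is to scan a disassembly of the library for SSSE3 mnemonics. The sketch below assumes the disassembly is available as text (for example from `objdump -d` or from gdb, as in the line quoted earlier); `count_ssse3` is a hypothetical helper, and the mnemonic set is the 16 legacy SSSE3 instructions. VEX-encoded forms such as `vpshufb` belong to AVX and are deliberately not counted.

```python
import re
from collections import Counter

# The legacy (non-VEX) SSSE3 instruction mnemonics.
SSSE3_MNEMONICS = frozenset({
    "pshufb", "phaddw", "phaddd", "phaddsw", "phsubw", "phsubd", "phsubsw",
    "pmaddubsw", "pmulhrsw", "psignb", "psignw", "psignd",
    "pabsb", "pabsw", "pabsd", "palignr",
})

_SYMBOL = re.compile(r"<[^>]*>")            # annotations like <+194> or <func+0x10>
_TOKEN = re.compile(r"[a-z_][a-z0-9_]*")    # identifier-like words


def count_ssse3(disassembly):
    """Count disassembly lines whose instruction is an SSSE3 mnemonic."""
    counts = Counter()
    for line in disassembly.splitlines():
        # Drop symbol annotations so symbol names cannot be mistaken
        # for instructions.
        cleaned = _SYMBOL.sub("", line.lower())
        for token in _TOKEN.findall(cleaned):
            if token in SSSE3_MNEMONICS:
                counts[token] += 1
                break  # at most one instruction per line
    return counts


# The first line is the gdb output quoted in this thread; the second is an
# illustrative non-SSSE3 instruction.
listing = (
    "=> 0x00007f6fde0c84b2 <+194>: pshufb %xmm1,%xmm0\n"
    "   0x00007f6fde0c84b7 <+199>: movdqa %xmm0,(%rdi)\n"
)
print(dict(count_ssse3(listing)))
```

An empty result for the rebuilt library would suggest the flag change took effect, although it says nothing about other instruction-set extensions.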

@smuzaffar
Contributor Author

@slava77, yes, tensorflow-sources.file is the correct file. Please go ahead and submit the change.
