Update Tensorflow 2.4.1 #6674

Merged
merged 11 commits into from Mar 8, 2021

smuzaffar
Contributor

@smuzaffar smuzaffar commented Feb 25, 2021

  • Tensorflow: 2.4.1 only for python3
    • Dropped python2 support for tensorflow and tensorboard
    • Tensorflow 2.4.1 only builds for py3. There are changes in the TF python code which do not work/run with python2.
    • Use CMS GRPC and typing_extensions
  • GRPC: 1.35.0, which builds with C++17. This also needs protobuf 3.12+.
    • It builds fine with the system OpenSSL but needs patches for CMS OpenSSL.
  • Protobuf: 3.15.1
    • Built with C++17
  • OpenCV: 4.5.1
  • Eigen: uses commit 011e0db31d1bed8b7f73662be6d57d9f30fa457a from the master branch
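The protobuf requirement above is a numeric version comparison. A minimal Python sketch, assuming versions are plain dotted integers (the helpers `parse_version` and `satisfies` are hypothetical, not part of the CMS build tooling), shows that 3.15.1 meets the 3.12+ requirement and why versions should be compared as integer tuples rather than strings:

```python
# Stated in this PR: GRPC 1.35.0 needs protobuf 3.12+, and protobuf 3.15.1 is shipped.
GRPC_MIN_PROTOBUF = (3, 12)
PROTOBUF_VERSION = "3.15.1"


def parse_version(text):
    """Turn a dotted version string such as '3.15.1' into a tuple of ints (hypothetical helper)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, minimum):
    """True if `version` is at least `minimum`.

    Tuples compare element by element, so (3, 15, 1) >= (3, 12) holds, and a
    longer tuple with an equal prefix, e.g. (3, 12, 0), also counts as >= (3, 12).
    """
    return parse_version(version) >= minimum


if __name__ == "__main__":
    ok = satisfies(PROTOBUF_VERSION, GRPC_MIN_PROTOBUF)
    print(PROTOBUF_VERSION, "ok" if ok else "too old")
```

Comparing the raw strings instead would go wrong: as strings, "3.9.2" sorts above "3.12" even though it is the older release.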

This still needs corresponding updates to the TF code in CMSSW.
FYI @riga

@cmsbuild
Contributor

A new Pull Request was created by @smuzaffar (Malik Shahzad Muzaffar) for branch IB/CMSSW_11_3_X/master.

@cmsbuild, @smuzaffar, @mrodozov can you please review it and eventually sign? Thanks.
cms-bot commands are listed here

@smuzaffar
Contributor Author

@fwyzard, we need to update Eigen to go with this TF update. TF uses Eigen commit 011e0db31d1bed8b7f73662be6d57d9f30fa457a. I have created the cms/master/011e0db31d1bed8b7f73662be6d57d9f30fa457a branch for https://github.com/cms-externals/eigen-git-mirror . I tried to apply your CUDA fixes (eigenteam/eigen-git-mirror@d812f41...cms-externals:cms/master/d812f411c3f9) on top of 011e0db31d1bed8b7f73662be6d57d9f30fa457a, but there are too many conflicts. Can you please check whether these fixes are still needed and, if so, provide a PR on top of the cms/master/011e0db31d1bed8b7f73662be6d57d9f30fa457a branch?

@fwyzard
Contributor

fwyzard commented Feb 25, 2021

I will be able to have a look only late next week (or later).

@cmsbuild
Contributor

Pull request #6674 was updated.

@smuzaffar
Contributor Author

please test with cms-sw/cmssw#32993

@cmsbuild
Contributor

Pull request #6674 was updated.

@smuzaffar
Contributor Author

please test with cms-sw/cmssw#32993

@cms-sw cms-sw deleted a comment from cmsbuild Mar 4, 2021
@cmsbuild
Contributor

cmsbuild commented Mar 5, 2021

Pull request #6674 was updated.

@smuzaffar
Contributor Author

please test

@smuzaffar
Contributor Author

please test for slc7_aarch64_gcc9

@smuzaffar
Contributor Author

please test for slc7_ppc64le_gcc9

@cms-sw cms-sw deleted a comment from cmsbuild Mar 6, 2021
@cms-sw cms-sw deleted a comment from cmsbuild Mar 6, 2021
@cms-sw cms-sw deleted a comment from cmsbuild Mar 6, 2021
@cms-sw cms-sw deleted a comment from cmsbuild Mar 6, 2021
@cmsbuild
Contributor

cmsbuild commented Mar 6, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a654e7/13306/summary.html
COMMIT: 8fe2db9
CMSSW: CMSSW_11_3_X_2021-03-05-1200/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6674/13306/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-a654e7/34634.0_TTbar_14TeV+2026D76+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTrigger+RecoGlobal+HARVESTGlobal
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-a654e7/34834.999_TTbar_14TeV+2026D76PU_PMXS1S2PR+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+PREMIX_PremixHLBeamSpot14PU+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 38
  • DQMHistoTests: Total histograms compared: 2849195
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2849170
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (37 files compared)
  • Checked 160 log files, 37 edm output root files, 38 DQM output files
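The DQM histogram counts in this summary can be cross-checked with simple arithmetic. A minimal sketch, assuming the four outcome categories (failures, nulls, successes, skipped) are mutually exclusive, which the reported numbers are consistent with since they add up exactly to the total; the helper names are hypothetical:

```python
# DQM histogram counts copied from the summary above.
SUMMARY = {
    "total": 2849195,
    "failures": 3,
    "nulls": 0,
    "successes": 2849170,
    "skipped": 22,
}


def unaccounted(summary):
    """Histograms not covered by any outcome category (0 if the counts add up)."""
    outcomes = ("failures", "nulls", "successes", "skipped")
    return summary["total"] - sum(summary[key] for key in outcomes)


def failure_rate(summary):
    """Fraction of compared histograms that failed."""
    return summary["failures"] / summary["total"]


if __name__ == "__main__":
    print("unaccounted:", unaccounted(SUMMARY))
    print(f"failure rate: {failure_rate(SUMMARY):.2e}")
```

With these numbers the 3 failing histograms amount to roughly one in a million of those compared.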

@cmsbuild
Contributor

cmsbuild commented Mar 6, 2021

-1

Failed Tests: UnitTests RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a654e7/13309/summary.html
COMMIT: 8fe2db9
CMSSW: CMSSW_11_3_X_2021-03-05-2300/slc7_ppc64le_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6674/13309/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test test_PrepareInputDb had ERRORS
---> test test_MpsWorkFlow had ERRORS
---> test testUnits had ERRORS
---> test GeometryDTGeometryBuilderTestDriver had ERRORS
and more ...

RelVals

  • 11634.911: 11634.911_TTbar_14TeV+2021_DD4hep+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA/step1_TTbar_14TeV+2021_DD4hep+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA.log

@fwyzard
Contributor

fwyzard commented Mar 6, 2021

OK, I was finally able to validate all GPU-related workflows with the new Eigen version, and didn't find any regression in the performance:

| Workflow  | Old Eigen (ev/s) | New Eigen (ev/s) |
|-----------|------------------|------------------|
| I/O only  | 347.2 ± 0.3      | 346.7 ± 0.8      |
| 11634.501 | 327.6 ± 0.5      | 327.3 ± 0.5      |
| 11634.502 | 344.9 ± 0.4      | 346.0 ± 0.3      |
| 11634.505 | 311.2 ± 0.8      | 309.8 ± 1.0      |
| 11634.506 | 316.9 ± 1.0      | 317.7 ± 0.4      |
| 11634.511 | 322.0 ± 0.4      | 322.3 ± 0.4      |
| 11634.512 | 345.7 ± 1.5      | 346.1 ± 0.8      |
| 11634.521 | 325.3 ± 1.0      | 324.2 ± 0.1      |
| 11634.522 | 346.9 ± 0.4      | 345.7 ± 0.8      |
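One way to quantify the "no regression" conclusion from these throughput numbers is to compute, per workflow, the relative change and the change in units of the combined uncertainty. This is an illustrative sketch, not the method used in this thread; it assumes the ± values are independent statistical errors, so they are added in quadrature:

```python
import math

# Throughput in events/s copied from the table above:
# (workflow, old mean, old error, new mean, new error)
MEASUREMENTS = [
    ("I/O only", 347.2, 0.3, 346.7, 0.8),
    ("11634.501", 327.6, 0.5, 327.3, 0.5),
    ("11634.502", 344.9, 0.4, 346.0, 0.3),
    ("11634.505", 311.2, 0.8, 309.8, 1.0),
    ("11634.506", 316.9, 1.0, 317.7, 0.4),
    ("11634.511", 322.0, 0.4, 322.3, 0.4),
    ("11634.512", 345.7, 1.5, 346.1, 0.8),
    ("11634.521", 325.3, 1.0, 324.2, 0.1),
    ("11634.522", 346.9, 0.4, 345.7, 0.8),
]


def compare(old, old_err, new, new_err):
    """Return (relative change in percent, change in units of the combined error)."""
    diff = new - old
    # Assumption: the two errors are independent, so they add in quadrature.
    combined = math.hypot(old_err, new_err)
    return 100.0 * diff / old, diff / combined


def summarize(rows):
    """Map each workflow name to its (relative change %, significance) pair."""
    return {name: compare(old, oe, new, ne) for name, old, oe, new, ne in rows}


if __name__ == "__main__":
    for name, (rel, sig) in summarize(MEASUREMENTS).items():
        print(f"{name:>10}: {rel:+.2f}%  ({sig:+.1f} sigma)")
```

Under these assumptions every relative change stays below 0.5% in magnitude; the largest relative slowdown is 11634.505 at about -0.45%, and no slowdown exceeds roughly 1.4 combined standard deviations.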

@cmsbuild
Contributor

cmsbuild commented Mar 7, 2021

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a654e7/13308/summary.html
COMMIT: 8fe2db9
CMSSW: CMSSW_11_3_X_2021-03-05-2300/slc7_aarch64_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6674/13308/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test PhiTest had ERRORS

@smuzaffar
Contributor Author

+externals
All looks good for this PR to go in. @riga, do you want to do any final performance tests before we merge it?

@cmsbuild
Contributor

cmsbuild commented Mar 7, 2021

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_11_3_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@smuzaffar
Contributor Author

Let's get this into the IBs.

@smuzaffar smuzaffar merged commit d517e6b into IB/CMSSW_11_3_X/master Mar 8, 2021
@smuzaffar smuzaffar deleted the tf-241 branch March 10, 2021 11:47
4 participants