Update ML related software #6649

Merged
merged 7 commits into from Feb 17, 2021

@riga
Contributor

riga commented Feb 15, 2021

This PR updates software related to ML workflows. Details:

  • Update ONNXRuntime from 1.3.0 to 1.6.0. The current fork is at this branch and could be included in cms-externals/onnxruntime. Once 1.6.0 is working properly, the race conditions mentioned in "Apparent data race in onnxruntime on aarch64" (cmssw#32899) should be checked again.
  • Add a standalone XGBoost library for use from C++. The Python3 bindings are linked against that version. The legacy Python2 bindings are pinned to an older XGBoost version and thus bring their own library, as before.
  • Add the cmsml Python package, which contains useful tools for working with ML in a CMS-specific context. These tools are meant to work independently of CMSSW, hence the externalization. Note: similar yet older versions of some of the tools are currently living in PhysicsTools/TensorFlow/python and should be removed (or properly deprecated); see "Adapt python code to updated ML software" (cmssw#32942).
  • Add onnxmltools, which contains several converters to the ONNX model format. I only added the Python3 tools as they dropped Python2 quite a while ago, and we should have the latest version available to support more models.
  • Minor version updates of other ML-related software, plus the addition of the luigi/law workflow tools (and 2 dependencies), as they are used in some groups.

I tested with workflow 23424.0 (TTbar_13+2026D49PU): runTheMatrix.py -w upgrade -l 23424.0.
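
For anyone who wants to script the same check, the command above could be wrapped roughly as follows. This is only a sketch: the helper names `build_relval_command` and `run_relval` are made up for illustration, and the flags are exactly the ones quoted in the test command, nothing more.

```python
import shlex
import subprocess


def build_relval_command(workflow_id, workflow_set="upgrade"):
    """Build the argv list for the runTheMatrix.py call quoted above."""
    return ["runTheMatrix.py", "-w", workflow_set, "-l", str(workflow_id)]


def run_relval(workflow_id, dry_run=True):
    """Print the command (dry run) or execute it and return its exit code."""
    cmd = build_relval_command(workflow_id)
    if dry_run:
        # Only show what would be executed; nothing is run.
        print(shlex.join(cmd))
        return 0
    # Assumes a CMSSW environment where runTheMatrix.py is on PATH.
    return subprocess.run(cmd, check=False).returncode


cmd = build_relval_command("23424.0")
run_relval("23424.0")  # dry run by default
```

With `dry_run=False` the helper needs a working CMSSW area; the dry run only prints the command line.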

@mialiu149

@cmsbuild
Contributor

A new Pull Request was created by @riga (Marcel R.) for branch IB/CMSSW_11_3_X/master.

@cmsbuild, @smuzaffar, @mrodozov can you please review it and eventually sign? Thanks.
cms-bot commands are listed here

@cmsbuild
Contributor

Pull request #6649 was updated.

@smuzaffar
Contributor

thanks @riga for the updates. I have included your changes in cms-externals/onnxruntime repo now

@smuzaffar
Contributor

please test

@smuzaffar
Contributor

please test for slc7_aarch64_gcc9

@smuzaffar
Contributor

please test for slc7_ppc64le_gcc9

@smuzaffar
Contributor

please test for slc7_amd64_gcc10
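
The architecture strings in these test requests appear to follow an `<os>_<cpu>_<compiler>` pattern (the build log further down defines `cmsos slc7_amd64` and `cmscompilerv 10` for `slc7_amd64_gcc10`). A minimal sketch of splitting such a string, assuming that three-part layout holds; `BuildArch` and `parse_arch` are hypothetical names, not part of cms-bot or cmsBuild:

```python
from typing import NamedTuple


class BuildArch(NamedTuple):
    """Three components of an architecture string (assumed layout)."""
    os: str
    cpu: str
    compiler: str


def parse_arch(arch: str) -> BuildArch:
    """Split e.g. 'slc7_amd64_gcc10' into its three underscore-separated parts."""
    parts = arch.split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected architecture string: {arch!r}")
    return BuildArch(*parts)


archs = [parse_arch(a) for a in (
    "slc7_aarch64_gcc9",
    "slc7_ppc64le_gcc9",
    "slc7_amd64_gcc10",
)]
for a in archs:
    print(a.os, a.cpu, a.compiler)
```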

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ae302a/12902/summary.html
COMMIT: 2e927cd
CMSSW: CMSSW_11_3_X_2021-02-14-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6649/12902/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

FATAL: malformed spec found while quering it. Command: 
source /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc10/rpm-env.sh ;  rpm -q --specfile /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/tmpspec-python_tools --info --define "cmsdist_directory /data/cmsbld/jenkins/workspace/ib-run-pr-tests/cmsdist" --define "compilerv 1020" --define "cmscompilerv 10" --define "cmsos slc7_amd64" --define "package_vectorization %{nil}" --define 'buildroot /foo'
Resulted in:

warning: Macro %rpmbuild_libdir defined but not used within scope
error: line 369: Unknown tag: <<<<<<< HEAD
error: query of specfile /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/tmpspec-python_tools failed, can't parse
Traceback (most recent call last):
  File "./pkgtools/cmsBuild", line 4396, in 
    build(opts, args[1:], PKGFactory)
  File "./pkgtools/cmsBuild", line 3695, in build


@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ae302a/12899/summary.html
COMMIT: 2e927cd
CMSSW: CMSSW_11_3_X_2021-02-15-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6649/12899/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

call_subprocess(
File "/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/py3-pip/20.3.3/lib/python3.8/site-packages/pip/_internal/utils/subprocess.py", line 240, in call_subprocess
raise InstallationError(exc_msg)
pip._internal.exceptions.InstallationError: Command errored out with exit status 1: /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/slc7_amd64_gcc900/external/python3/3.8.2-bcolbf/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/cmsdist-tmp/pip-req-build-o8alg2k5/setup.py'"'"'; __file__='"'"'/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/cmsdist-tmp/pip-req-build-o8alg2k5/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/cmsdist-tmp/pip-record-1_72s80u/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/BUILDROOT/532d7cb9a6d277c956517be10bbbf61e/opt/cmssw/slc7_amd64_gcc900/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/include/python3.8/xgboost --use-system-libxgboost Check the logs for full command output.
Removed build tracker: '/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc900/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/cmsdist-tmp/pip-req-tracker-pd1s0gv2'
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.LCH162 (%build)


RPM build errors:
Macro %rpmbuild_libdir defined but not used within scope
Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.LCH162 (%build)


@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ae302a/12901/summary.html
COMMIT: 2e927cd
CMSSW: CMSSW_11_3_X_2021-02-14-2300/slc7_ppc64le_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6649/12901/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

call_subprocess(
File "/scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/slc7_ppc64le_gcc9/external/py3-pip/20.3.3/lib/python3.8/site-packages/pip/_internal/utils/subprocess.py", line 240, in call_subprocess
raise InstallationError(exc_msg)
pip._internal.exceptions.InstallationError: Command errored out with exit status 1: /scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/slc7_ppc64le_gcc9/external/python3/3.8.2-0923d45facc02c507f248eced2e49422/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_ppc64le_gcc9/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/cmsdist-tmp/pip-req-build-sdz7_9oo/setup.py'"'"'; __file__='"'"'/scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_ppc64le_gcc9/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/cmsdist-tmp/pip-req-build-sdz7_9oo/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_ppc64le_gcc9/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/cmsdist-tmp/pip-record-9vjbksf9/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/BUILDROOT/532d7cb9a6d277c956517be10bbbf61e/opt/cmssw/slc7_ppc64le_gcc9/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/include/python3.8/xgboost --use-system-libxgboost Check the logs for full command output.
Removed build tracker: '/scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_ppc64le_gcc9/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/cmsdist-tmp/pip-req-tracker-s_z3162t'
error: Bad exit status from /scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.u8vSVa (%build)


RPM build errors:
Macro %rpmbuild_libdir defined but not used within scope
Bad exit status from /scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.u8vSVa (%build)


@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ae302a/12900/summary.html
COMMIT: 2e927cd
CMSSW: CMSSW_11_3_X_2021-02-15-1100/slc7_aarch64_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6649/12900/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

call_subprocess(
File "/home/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/slc7_aarch64_gcc9/external/py3-pip/20.3.3/lib/python3.8/site-packages/pip/_internal/utils/subprocess.py", line 240, in call_subprocess
raise InstallationError(exc_msg)
pip._internal.exceptions.InstallationError: Command errored out with exit status 1: /home/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/slc7_aarch64_gcc9/external/python3/3.8.2/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_aarch64_gcc9/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/cmsdist-tmp/pip-req-build-4plvzbys/setup.py'"'"'; __file__='"'"'/home/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_aarch64_gcc9/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/cmsdist-tmp/pip-req-build-4plvzbys/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /home/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_aarch64_gcc9/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/cmsdist-tmp/pip-record-nd4vkf49/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/BUILDROOT/532d7cb9a6d277c956517be10bbbf61e/opt/cmssw/slc7_aarch64_gcc9/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/include/python3.8/xgboost --use-system-libxgboost Check the logs for full command output.
Removed build tracker: '/home/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_aarch64_gcc9/external/py3-xgboost/1.3.3-532d7cb9a6d277c956517be10bbbf61e/cmsdist-tmp/pip-req-tracker-mpeouj1_'
error: Bad exit status from /home/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.COk8NR (%build)


RPM build errors:
Macro %rpmbuild_libdir defined but not used within scope
Bad exit status from /home/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.COk8NR (%build)


@smuzaffar
Contributor

please test

@cmsbuild
Contributor

Pull request #6649 was updated.

@smuzaffar
Contributor

please test for slc7_ppc64le_gcc9

@smuzaffar
Contributor

please test for slc7_aarch64_gcc9

@riga
Contributor Author

riga commented Feb 16, 2021

The error in cms/6649/slc7_amd64_gcc10 looks like an unresolved merge conflict but I can't see where it comes from.

The failed cms/6649/slc7_ppc64le_gcc9/relvals test is due to a segfault in OscarMTProducer:g4SimHits; however, I can't yet see the connection to the changes in this PR.

@smuzaffar
Contributor

11634.911 for ppc64le is already failing in the Power IBs

@smuzaffar
Contributor

please test for CMSSW_11_3_PY3_X
let's hope there are no merge conflicts

@mrodozov
Contributor

It comes from python_tools, which seems to be unmergeable into gcc10 (gcc900 to gcc10): the merge fails with a conflict, the conflict lines are left in the .spec file, and the result is a malformed spec.
I think we can get the rest in, then fix python_tools for gcc10 by hand and test it.
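
A leftover conflict of this kind can be caught before the spec ever reaches `rpm -q --specfile`. The sketch below scans text for Git's standard conflict markers (`<<<<<<<`, `=======`, `>>>>>>>` at the start of a line). It is an illustration only, not something cmsBuild or cms-bot is known to do; `find_conflict_markers` and the sample spec fragment are hypothetical.

```python
import re

# Git writes conflict markers as exactly seven '<', '=' or '>' characters
# at the start of a line, optionally followed by a space and a label.
CONFLICT_RE = re.compile(r"^(<{7}|={7}|>{7})( |$)")


def find_conflict_markers(text):
    """Return (line_number, line) pairs for lines that look like conflict markers."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if CONFLICT_RE.match(line):
            hits.append((lineno, line))
    return hits


# Hypothetical spec fragment with an unresolved conflict.
sample = (
    "Requires: package-a\n"
    "<<<<<<< HEAD\n"
    "Requires: package-b\n"
    "=======\n"
    "Requires: package-c\n"
    ">>>>>>> other-branch\n"
)

for lineno, line in find_conflict_markers(sample):
    print(f"line {lineno}: {line}")
```

The first reported line corresponds to the kind of entry rpm rejected above as `Unknown tag: <<<<<<< HEAD`.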

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ae302a/12926/summary.html
COMMIT: 0a5f9e1
CMSSW: CMSSW_11_3_PY3_X_2021-02-15-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6649/12926/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test testhep_ml had ERRORS

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-ae302a/11634.911_TTbar_14TeV+2021_DD4hep+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4822 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2752246
  • DQMHistoTests: Total failures: 777
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2751469
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 36 files compared)
  • Checked 156 log files, 37 edm output root files, 37 DQM output files
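
The DQMHistoTests counts above are internally consistent, which can be checked with a few lines. This is just arithmetic on the numbers reported in the summary; the variable names are illustrative.

```python
# Counts copied from the DQMHistoTests summary above.
total = 2752246
failures = 777
successes = 2751469
nulls = 0
skipped = 0

# Every compared histogram should fall into exactly one bucket.
accounted = failures + successes + nulls + skipped
consistent = accounted == total

# Fraction of compared histograms that differ from the reference.
failure_rate = failures / total

print(f"consistent: {consistent}")
print(f"failure rate: {failure_rate:.4%}")
```

The failing fraction comes out at about 0.028% of all compared histograms.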

@riga
Contributor Author

riga commented Feb 17, 2021

The failed unit test in PhysicsTools/PythonAnalysis/test/testhep_ml.py uses a wrong import in Python3 and is easy to fix. I will open a PR to CMSSW to change that. I can use the same PR to make the changes to the tools in PhysicsTools/TensorFlow/python/tools.py as mentioned above:

Note: similar yet older versions of some of the tools are currently living in PhysicsTools/TensorFlow/python and should be removed (or properly deprecated).
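
The comment does not say which import in testhep_ml.py is wrong, so the following is not the actual fix. It only illustrates the general class of problem, using a module that was renamed between Python 2 and Python 3 (`StringIO`, chosen purely as an example), and a common fallback pattern that works under both.

```python
# Illustrative only: StringIO is NOT known to be the import at fault here.
try:
    from StringIO import StringIO  # Python 2 module name
except ImportError:
    from io import StringIO  # Python 3 location

buf = StringIO()
buf.write(u"ok")
text = buf.getvalue()
print(text)
```

Once Python 2 support is dropped entirely, the fallback can be reduced to the Python 3 import alone.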

@smuzaffar
Contributor

please test

@davidlange6
Contributor

davidlange6 commented Feb 17, 2021 via email

@smuzaffar
Contributor

please test for slc7_amd64_gcc10
The gcc10 branch was missing a newline in python_tools. It should be fine now.

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ae302a/12952/summary.html
COMMIT: 0a5f9e1
CMSSW: CMSSW_11_3_X_2021-02-16-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6649/12952/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation warning when building: See details on the summary page.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ae302a/12953/summary.html
COMMIT: 0a5f9e1
CMSSW: CMSSW_11_3_X_2021-02-16-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6649/12953/install.sh to create a dev area with all the needed externals and cmssw changes.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ae302a/12951/summary.html
COMMIT: 0a5f9e1
CMSSW: CMSSW_11_3_X_2021-02-16-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6649/12951/install.sh to create a dev area with all the needed externals and cmssw changes.

@smuzaffar
Contributor

smuzaffar commented Feb 17, 2021

+externals
Looks good to go. The PY3 unit tests need a cmssw PR.

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_11_3_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
