Add GPU support for ONNXRuntime #6776

Merged

hqucms
Contributor

@hqucms hqucms commented Mar 31, 2021

This PR adds GPU support for ONNXRuntime. The built library still runs on the CPU by default, so current applications in CMSSW are unaffected, while GPU inference can be enabled if needed (see example).

A few changes are needed:

  • a small modification to cuda is needed (namely, keeping libcudart_static.a) for cmake to detect nvcc correctly
  • cudnn is added as a dependency. Note that downloading cudnn generally requires NVIDIA Developer Program membership, though a direct download link without authentication exists (and is used here). Experts should probably double-check whether the way we distribute it complies with its SLA.

Also, I am not sure whether this will compile on ppc64le or aarch64 (though cudnn exists for them).

FYI @riga @mialiu149

@cmsbuild
Contributor

A new Pull Request was created by @hqucms (Huilin Qu) for branch IB/CMSSW_11_3_X/master.

@cmsbuild, @smuzaffar, @mrodozov can you please review it and eventually sign? Thanks.
cms-bot commands are listed here

cuda.spec Outdated
@@ -34,6 +34,7 @@ mkdir -p %{i}/lib64/stubs

# package only the runtime static library
mv %_builddir/build/lib64/libcudadevrt.a %{i}/lib64/
mv %_builddir/build/lib64/libcudart_static.a %{i}/lib64/
Contributor

any objections/concern @fwyzard ?

Contributor

Mhm... I don't think that mixing the shared and static versions of libcudart is a good idea.
Can ONNX not use the dynamic version ?

Contributor Author

libcudart_static.a is needed; otherwise enable_language(CUDA) in cmake crashes. I don't think onnxruntime really uses it.

Contributor

@fwyzard fwyzard Mar 31, 2021

why do we need cmake ?
it looks like cuDNN is not built, it's simply unpacked.

Contributor

can you set the CUDA_USE_STATIC_CUDA_RUNTIME cmake option to OFF ?

Contributor Author

> why do we need cmake ?
> it looks like cuDNN is not built, it's simply unpacked.

We use cmake for ONNXRuntime. For cuDNN indeed it's simply unpacked.

> can you set the CUDA_USE_STATIC_CUDA_RUNTIME cmake option to OFF ?

Sure, I can try that.

Contributor Author

@fwyzard Unfortunately CUDA_USE_STATIC_CUDA_RUNTIME does not make a difference. It seems it's superseded by CMAKE_CUDA_RUNTIME_LIBRARY now?

In fact I managed to work around this with the following lines:

   -DCMAKE_CUDA_FLAGS="-cudart shared" \
   -DCMAKE_CUDA_RUNTIME_LIBRARY=Shared \
   -DCMAKE_TRY_COMPILE_PLATFORM_VARIABLES="CMAKE_CUDA_RUNTIME_LIBRARY" \

Now we don't need libcudart_static.a anymore. I think the only option that really matters is -DCMAKE_CUDA_FLAGS="-cudart shared", which is sufficient on its own in a newer cmake version, but somehow I need all three lines in the cmake version we use...

Also, the problem is purely in cmake: when enable_language(CUDA) is called, cmake compiles a test program with nvcc and parses the output to set up various CUDA paths and flags. At that stage it links to libcudart_static.a, because the -cudart option of nvcc defaults to static. After that stage, whether cudart is linked statically or dynamically is controlled by CMAKE_CUDA_RUNTIME_LIBRARY, which is set to Shared in onnxruntime.
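
One way to confirm the outcome described above is to inspect the dynamic section of the built library. The sketch below is not part of this PR: `cudart_linkage` is a hypothetical helper that reads `readelf -d` output on stdin and reports whether a shared libcudart appears among the NEEDED entries. The library path in the usage comment is an assumption and would need adjusting to the actual install area.

```shell
# Hypothetical helper (not part of the PR): reads `readelf -d` output on stdin
# and reports how the CUDA runtime appears to be linked.
cudart_linkage() {
  if grep -q 'NEEDED.*libcudart\.so'; then
    # A NEEDED entry means the runtime is resolved dynamically at load time.
    echo "shared"
  else
    # No NEEDED entry: either linked statically or not used at all.
    echo "static or absent"
  fi
}

# Usage, assuming binutils is installed and the path is adjusted:
#   readelf -d lib/libonnxruntime.so | cudart_linkage
```

If this prints `shared` for the ONNXRuntime library, the test program built during enable_language(CUDA) was the only place where the static runtime was involved.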

cudnn.spec Outdated
### RPM external cudnn 8.1.1.33
## INITENV +PATH LD_LIBRARY_PATH %i/lib64

%define cudaver_maj 11.2
Contributor

Probably just nitpicking, but this is not the "major" CUDA version.
Can you just use cudaver ?

Also, @smuzaffar , is there a way to get this directly from the CUDA spec file ?

Contributor

Not really. This is parsed and used by cmsBuild (even before installing dependencies), so at that point cmsBuild does not know the actual value. We could use a common file (just like https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_11_3_X/master/cuda-flags.file ) where one defines this version and then include it in both cuda and cudnn, but that looks like overkill for this purpose.

I would suggest that the %prep section just checks whether $CUDA_VERSION and %{cudaver_maj} are the same (some sed/cut/grep is needed).

Contributor

OK, if it cannot be extracted from the CUDA spec file, better to leave it hard-coded here then. This way we don't need to rebuild cuDNN for minor updates to CUDA (i.e. 11.2.0 -> 11.2.1 -> 11.2.2).

I assume we'll notice soon enough if we update CUDA and fail to update cuDNN.

Contributor Author

OK, I renamed cudaver_maj to cudaver and added a check in the %prep section.
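
For readers following the thread, a consistency check of this kind could look roughly like the sketch below. It is not the code that was committed: the function names are hypothetical, and it assumes that $CUDA_VERSION starts with a dotted version such as 11.2.2, so that `cut` can reduce it to the first two components.

```shell
# Hypothetical sketch of a %prep consistency check; names are illustrative.

# Reduce a full version such as 11.2.2 to its first two components (11.2).
cuda_major_minor() {
  echo "$1" | cut -d. -f1,2
}

# Succeed only if the CUDA version in use ($1) matches the hard-coded one ($2).
check_cudaver() {
  found=$(cuda_major_minor "$1")
  if [ "$found" != "$2" ]; then
    echo "cudnn expects CUDA $2 but found CUDA $1" >&2
    return 1
  fi
}

# In the spec file this might be invoked as (assumption):
#   check_cudaver "$CUDA_VERSION" "%{cudaver}" || exit 1
```

Comparing only the first two components keeps the property discussed above: an update such as 11.2.1 to 11.2.2 passes the check, while a move to 11.3 fails the build early.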

cuda.spec Outdated
@@ -101,6 +102,7 @@ ln -sf libnvidia-ptxjitcompiler.so.1 %{i}
sed \
-e"/^TOP *=/s|= .*|= $CMS_INSTALL_PREFIX/%{pkgrel}|" \
-e's|$(_HERE_)|$(TOP)/bin|g' \
-e's|$(TOP)/lib|$(TOP)/lib64|g' \
Contributor

this looks correct, but should it not be unnecessary once scram has set up the environment ?

Contributor Author

Indeed it's not really needed. I have now reverted all the changes to cuda.spec.

@hqucms hqucms force-pushed the dev/CMSSW_11_3_X/onnx-gpu branch from 6dc4f7a to ee8a694 Compare April 1, 2021 12:21
@cmsbuild
Contributor

cmsbuild commented Apr 1, 2021

Pull request #6776 was updated.

@fwyzard
Contributor

fwyzard commented Apr 1, 2021

nice!

@fwyzard
Contributor

fwyzard commented Apr 1, 2021

please test

@fwyzard
Contributor

fwyzard commented Apr 1, 2021

please test for slc7_aarch64_gcc9

@fwyzard
Contributor

fwyzard commented Apr 1, 2021

please test for slc7_ppc64le_gcc9

@cmsbuild
Contributor

cmsbuild commented Apr 1, 2021

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e46e0/13926/summary.html
COMMIT: ee8a694
CMSSW: CMSSW_11_3_X_2021-03-31-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6776/13926/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

/cvmfs/cms-ib.cern.ch/nweek-02674/slc7_amd64_gcc900/external/gcc/9.3.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/9.3.0/../../../../x86_64-unknown-linux-gnu/bin/ld: /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_11_3_X_2021-03-31-2300/external/slc7_amd64_gcc900/lib/libonnxruntime.so: undefined reference to `cudnnFindConvolutionForwardAlgorithmEx@libcudnn.so.8'
/cvmfs/cms-ib.cern.ch/nweek-02674/slc7_amd64_gcc900/external/gcc/9.3.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/9.3.0/../../../../x86_64-unknown-linux-gnu/bin/ld: /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_11_3_X_2021-03-31-2300/external/slc7_amd64_gcc900/lib/libonnxruntime.so: undefined reference to `cudnnBatchNormalizationForwardInference@libcudnn.so.8'
/cvmfs/cms-ib.cern.ch/nweek-02674/slc7_amd64_gcc900/external/gcc/9.3.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/9.3.0/../../../../x86_64-unknown-linux-gnu/bin/ld: /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_11_3_X_2021-03-31-2300/external/slc7_amd64_gcc900/lib/libonnxruntime.so: undefined reference to `cudnnDestroyRNNDataDescriptor@libcudnn.so.8'
/cvmfs/cms-ib.cern.ch/nweek-02674/slc7_amd64_gcc900/external/gcc/9.3.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/9.3.0/../../../../x86_64-unknown-linux-gnu/bin/ld: /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_11_3_X_2021-03-31-2300/external/slc7_amd64_gcc900/lib/libonnxruntime.so: undefined reference to `cudnnDestroyDropoutDescriptor@libcudnn.so.8'
/cvmfs/cms-ib.cern.ch/nweek-02674/slc7_amd64_gcc900/external/gcc/9.3.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/9.3.0/../../../../x86_64-unknown-linux-gnu/bin/ld: /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_11_3_X_2021-03-31-2300/external/slc7_amd64_gcc900/lib/libonnxruntime.so: undefined reference to `cudnnDeriveBNTensorDescriptor@libcudnn.so.8'
collect2: error: ld returned 1 exit status
>> Deleted: tmp/slc7_amd64_gcc900/src/PhysicsTools/ONNXRuntime/test/testONNXRuntime/testONNXRuntime
gmake: *** [tmp/slc7_amd64_gcc900/src/PhysicsTools/ONNXRuntime/test/testONNXRuntime/testONNXRuntime] Error 1
>> Leaving Package PhysicsTools/ONNXRuntime
>> Package PhysicsTools/ONNXRuntime built
>> Subsystem PhysicsTools built


@cmsbuild
Contributor

cmsbuild commented Apr 1, 2021

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e46e0/13927/summary.html
COMMIT: ee8a694
CMSSW: CMSSW_11_3_X_2021-03-31-2300/slc7_aarch64_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6776/13927/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

File "./pkgtools/cmsBuild", line 3487, in installPackage
installRpm(pkg, pkg.options.bootstrap)
File "./pkgtools/cmsBuild", line 3235, in installRpm
raise RpmInstallFailed(pkg, output)
RpmInstallFailed: Failed to install package cudnn. Reason:
error: Failed dependencies:
	libm.so.6(GLIBC_2.27)(64bit) is needed by external+cudnn+8.1.1.33-1fe5d615f0e3571e760119e066121081-1-1.aarch64

* The action "build-external+python_tools+2.0-8e466f93932071702fa843dee44853e2" was not completed successfully because The following dependencies could not complete:
install-external+onnxruntime+1.6.0-b578716d6932c1ae9ed96cefdf913fea
* The action "final-job" was not completed successfully because The following dependencies could not complete:
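
The failed dependency above suggests that the aarch64 cuDNN binaries require glibc 2.27 or newer, which the build host apparently does not provide. A minimal sketch for checking this on a given machine follows; `glibc_satisfies` is a hypothetical helper, and it relies on the `-V` (version sort) option of GNU sort.

```shell
# Hypothetical helper: succeed if the available glibc version ($2) is at least
# the required one ($1). Relies on GNU sort's -V (version sort).
glibc_satisfies() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$1" ]
}

# Usage, taking the host version from getconf (which prints e.g. "glibc 2.17"):
#   host=$(getconf GNU_LIBC_VERSION | cut -d' ' -f2)
#   glibc_satisfies 2.27 "$host" || echo "host glibc $host is too old"
```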


@cmsbuild
Contributor

cmsbuild commented Apr 1, 2021

Pull request #6776 was updated.

@fwyzard
Contributor

fwyzard commented Apr 1, 2021

please test

@fwyzard
Contributor

fwyzard commented Apr 1, 2021

@hqucms with these changes

  • does ONNX still run fine without a GPU ?
  • does it always use a GPU if it detects one, or does it need some runtime configuration ?

@hqucms
Contributor Author

hqucms commented Apr 8, 2021

@smuzaffar The unit test failures look unrelated to this PR?

@smuzaffar
Contributor

test parameters:

  • enable_test = threading,gpu

@smuzaffar
Contributor

please test
yes @hqucms , the unit test failures are not related to this PR. Let me re-run it with threading on for the production arch.

@hqucms
Contributor Author

hqucms commented Apr 8, 2021

> please test
> yes @hqucms , the unit test failures are not related to this PR. Let me re-run it with threading on for the production arch.

Thank you for confirming @smuzaffar !
One thing I think would be useful to add is a unit test that exercises ONNXRuntime on GPU. But I think it can be added after this PR is finalized?

@smuzaffar
Contributor

yes, unit tests for ONNXRuntime on GPU should be added to make sure the functionality is working.
If you already have something to test, then please go ahead and open a cmssw PR.

@cmsbuild
Contributor

cmsbuild commented Apr 8, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e46e0/14106/summary.html
COMMIT: 7cc7fb0
CMSSW: CMSSW_11_3_X_2021-04-08-1100/slc7_amd64_gcc900
Additional Tests: THREADING,GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/6776/14106/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 9575
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 9575
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor

+externals
looks good to go in

@cmsbuild
Contributor

cmsbuild commented Apr 8, 2021

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_11_3_X/master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Contributor

+1

@cmsbuild cmsbuild merged commit f6c50ee into cms-sw:IB/CMSSW_11_3_X/master Apr 9, 2021