Performance problems using tensorflow_probability #893
Comments
Adding some more results here that also include CPU. Turning on
With |
Here are a few more benchmarks for a more realistic (not toy data) model from my own research. The likelihood function (pure tensorflow) for Model 2, in particular, has a lot of Once the
XLA compilation failure might be the root cause of the performance issues outlined above. As I outline in #908, there seems to be a bug in
The benchmark timings should be ignored here. Instead, please see #954.
System information
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes.
OS Platform and Distribution: Linux Ubuntu 18.04 using the upstream radeon kernel driver and launching ROCM scripts in the latest docker container using TF_ROCM_FUSION_ENABLE=1
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy): N/A
TensorFlow installed from (source or binary): installed from docker as rocm/tensorflow:latest
TensorFlow version (use command below): v2.1.0-15-g5466af3 2.1.0
Python version: 3.5.2
Bazel version (if compiling from source): Build label: 0.29.1
GCC/Compiler version (if compiling from source): gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
CUDA/cuDNN version:
GPU model and memory: Radeon VII gfx906
Note: I installed tensorflow_probability in the ROCM docker container using the --no-deps option. After that I had to install an additional dependency or two.
Describe the current behavior
TensorFlow Probability methods (e.g. MCMC) are slower on a ROCM GPU than on the tf-gpu stack on NVIDIA hardware. The test code shows that TensorFlow-only functions on the ROCM stack run at speeds comparable to the CUDA versions, but once tensorflow-probability routines are called using these functions, the ROCM stack is much slower. In the results below, NUTS uses the NUTS step method with adaptation (see this), and Function represents the execution of the pure TensorFlow log-likelihood function with gradients. The Function timing excludes any JIT compile time, since the function is invoked once and then invoked again for timing purposes. The likelihood uses some linalg and reduce operations. The numbers in the table are runtimes in seconds, computed as time.time() differences.
Eyeballing watch -n .1 rocm-smi as the script executes shows GPU usage at 100% most of the time for the GPU tests.
Describe the expected behavior
Given that the TensorFlow log-likelihood functions are comparable in execution time, I would expect the MCMC sampling times to be much closer. Instead, the ROCM stack takes nearly 3x longer to complete.
Standalone code to reproduce the issue
The script generating these results can be found at this gist
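For readers who do not open the gist, the warm-up-then-time approach described above (invoke the function once so that any compilation cost is paid, then time a second invocation with time.time() differences) can be sketched roughly as follows. This is not the code from the gist: the helper name time_after_warmup and the stand-in function toy_log_likelihood are hypothetical, and the sketch uses plain Python so it runs without TensorFlow.

```python
import time


def time_after_warmup(fn, *args, repeats=1):
    """Return (mean seconds per call, last result), excluding a warm-up call.

    Hypothetical helper for illustration; not taken from the gist.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    # Warm-up call: for a JIT-compiled function, tracing and compilation
    # would be triggered here, so they are excluded from the timing.
    fn(*args)

    start = time.time()
    for _ in range(repeats):
        result = fn(*args)
    elapsed = time.time() - start

    return elapsed / repeats, result


def toy_log_likelihood(xs):
    # Hypothetical stand-in for the real log-likelihood in the issue.
    return -0.5 * sum(x * x for x in xs)


seconds, value = time_after_warmup(toy_log_likelihood, [0.5, -1.0, 2.0])
print("runtime (s):", seconds)
print("log-likelihood:", value)
```

When the timed function returns values that live on a GPU, it may also be necessary to copy the result back to the host before reading the clock, so that asynchronously launched work is included in the measurement.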
Other info / logs