Slowness on Fashion MNIST and RNN sample programs on MI100 (gfx908) rocm 5.3.3 #2025

Closed
Epliz opened this issue Mar 23, 2023 · 2 comments

Comments

@Epliz

Epliz commented Mar 23, 2023

Issue Type

Performance

Have you reproduced the bug with TF nightly?

No

Source

binary

Tensorflow Version

v2.11.0-3797-gfe65ef3bbcf 2.11.0

Custom Code

No

OS Platform and Distribution

Ubuntu 22.04.1 LTS

Mobile device

No response

Python version

3.10.6

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

rocm-5.3.3

GPU model and memory

MI100 (gfx908)

Current Behaviour?

I ran the Fashion MNIST sample mentioned at https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.3/page/Deep_Learning_Training.html and the TensorFlow Keras text generation sample at https://www.tensorflow.org/text/tutorials/text_generation .

In both cases, training on my MI100 system is slow; it is actually slower than on my laptop CPU, an AMD Ryzen 7 5800U. On Fashion MNIST, the MI100 takes 4 s per epoch while the laptop CPU takes 1 s per epoch, and the RNN example shows a similar slowdown.

Could you please indicate what performance I should expect on Fashion MNIST, so that I can determine whether the problem is on my side or this is the best I can hope for?
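
For anyone wanting to reproduce the per-epoch comparison, here is a minimal timing sketch. It is not part of the linked samples: `time_epochs` and `slowdown` are hypothetical helper names, and `run_epoch` is assumed to be any zero-argument callable that trains for exactly one epoch.

```python
import time
from typing import Callable, List


def time_epochs(run_epoch: Callable[[], None], epochs: int = 5, warmup: int = 1) -> List[float]:
    """Return wall-clock seconds for each timed epoch.

    `run_epoch` is a hypothetical zero-argument callable that trains one epoch.
    The first `warmup` calls are not timed, since the first epoch often
    includes one-time setup cost.
    """
    for _ in range(warmup):
        run_epoch()
    durations = []
    for _ in range(epochs):
        start = time.perf_counter()
        run_epoch()
        durations.append(time.perf_counter() - start)
    return durations


def slowdown(gpu_seconds: float, cpu_seconds: float) -> float:
    """How many times slower the GPU run is compared to the CPU run."""
    return gpu_seconds / cpu_seconds


# Assumed usage with a compiled Keras model (names are illustrative):
#   with tf.device("/CPU:0"):
#       cpu_times = time_epochs(lambda: model.fit(x, y, epochs=1, verbose=0))
#   with tf.device("/GPU:0"):
#       gpu_times = time_epochs(lambda: model.fit(x, y, epochs=1, verbose=0))
```

With the figures reported above, `slowdown(4.0, 1.0)` gives a factor of 4.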

Standalone code to reproduce the issue

Run the code at https://github.com/ROCmSoftwarePlatform/tensorflow_fashionmnist/blob/main/fashion_mnist.py

Or the code at https://github.com/tensorflow/text/blob/master/docs/tutorials/text_generation.ipynb

Relevant log output

No response

@Epliz
Author

Epliz commented Mar 25, 2023

After upgrading to ROCm 5.4.3, I can no longer run the RNN text generation sample at all: I first hit the issue described in ROCm/rocBLAS#1267, and after solving it I get what appears to be an XLA issue.
I will open a separate issue for that one.

@Epliz changed the title from "Slowness on Fashion MNIST and RNN sample programs on MI100 (gfx908)" to "Slowness on Fashion MNIST and RNN sample programs on MI100 (gfx908) rocm 5.3.3" on Mar 25, 2023
@Epliz
Author

Epliz commented Mar 25, 2023

The slowness is resolved after upgrading and fixing the Tensile library issue and the XLA issue (#2026).

@Epliz closed this as completed on Mar 25, 2023