I'm getting the following error:
```
2023-03-25 10:08:12.600064: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x7fa7f8102fa0 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
2023-03-25 10:08:12.600101: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (0): AMD Instinct MI100, AMDGPU ISA version: gfx908:sramecc+:xnack-
2023-03-25 10:08:12.623830: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-03-25 10:08:12.724333: E tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:289] bitcode module is required by this HLO module but was not found at ./opencl.bc
2023-03-25 10:08:12.725250: I tensorflow/compiler/jit/xla_compilation_cache.cc:477] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
2023-03-25 10:08:12.725363: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: bitcode module not found at ./opencl.bc
2023-03-25 10:08:12.749220: E tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:289] bitcode module is required by this HLO module but was not found at ./opencl.bc
2023-03-25 10:08:12.749567: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: bitcode module not found at ./opencl.bc
2023-03-25 10:08:12.782255: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
Traceback (most recent call last):
  File "/home/me/git/ml/textgen_rnn/./rnn.py", line 148, in <module>
    history = model.fit(dataset, epochs=EPOCHS, batch_size=BATCH_SIZE)
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:
Detected at node 'StatefulPartitionedCall_5' defined at (most recent call last):
  File "/home/me/git/ml/textgen_rnn/./rnn.py", line 148, in <module>
    history = model.fit(dataset, epochs=EPOCHS, batch_size=BATCH_SIZE)
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
    tmp_logs = self.train_function(iterator)
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
    return step_function(self, iterator)
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
    outputs = model.train_step(data)
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
    self.apply_gradients(grads_and_vars)
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
    return super().apply_gradients(grads_and_vars, name=name)
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
    iteration = self._internal_apply_gradients(grads_and_vars)
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
    return tf.__internal__.distribute.interim.maybe_merge_call(
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
    distribution.extended.update(
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
    return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_5'
bitcode module not found at ./opencl.bc
	 [[{{node StatefulPartitionedCall_5}}]] [Op:__inference_train_function_2592]
```
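For context: the relative path `./opencl.bc` in the error suggests XLA could not resolve the ROCm device-library directory and fell back to the current working directory. A quick sanity check of where the bitcode should live, assuming the ROCm 5.x directory layout under the default `/opt/rocm` prefix (adjust for your install):

```shell
# Resolve the directory where XLA's ROCm backend looks for opencl.bc
# once ROCM_PATH is set (ROCm 5.x layout assumed; /opt/rocm is the
# default install prefix).
BITCODE_DIR="${ROCM_PATH:-/opt/rocm}/amdgcn/bitcode"
echo "expecting bitcode at: $BITCODE_DIR/opencl.bc"
```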
### Standalone code to reproduce the issue

```shell
code from https://www.tensorflow.org/text/tutorials/text_generation
```
Using the solution from there to set ROCM_PATH worked.
Please set the environment variable automatically instead of making users figure it out by themselves.
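For reference, a minimal sketch of the workaround, assuming ROCm is installed at the default `/opt/rocm` prefix (adjust if your installation differs):

```shell
# Tell TensorFlow/XLA where the ROCm installation lives so it can find
# the device-library bitcode (opencl.bc and friends) instead of looking
# in the current directory. /opt/rocm is an assumed default prefix.
export ROCM_PATH=/opt/rocm
# ...then launch training as usual, e.g.:
# python rnn.py
```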
Issue Type
Bug
Have you reproduced the bug with TF nightly?
No
Source
binary
Tensorflow Version
tf-rocm 2.11
Custom Code
No
OS Platform and Distribution
Ubuntu 22.04.2 LTS
Mobile device
No response
Python version
3.10.6
Bazel version
No response
GCC/Compiler version
No response
CUDA/cuDNN version
rocm 5.4.3
GPU model and memory
MI100
Current Behaviour?
Relevant log output
No response