AMD gpu update #168

Open
stevelcb opened this issue Jun 15, 2024 · 6 comments

Comments

@stevelcb

stevelcb commented Jun 15, 2024

Ubuntu 22.04

Hi everyone
I thought I'd post an update, having tried to get ROCm GPU acceleration recognised via onnxruntime.

We created the environment for building GraXpert as described here:
https://github.com/Steffenhir/GraXpert

Then we set up onnxruntime with AMD's ROCm as described here:
https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/install-onnx.html

That works fine and AMD's ROCm is indeed available via onnxruntime:


>>> import onnxruntime as ort
>>> ort.get_available_providers()
['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']

We then built...
However, GraX still sees only the CPU:


2024-06-15 12:45:13,787 MainProcess root INFO     Starting denoising
2024-06-15 12:45:15,548 MainProcess root INFO     Available inference providers : ['CPUExecutionProvider']
2024-06-15 12:45:15,548 MainProcess root INFO     Used inference providers : ['CPUExecutionProvider']
2024-06-15 12:45:17,962 MainProcess root INFO     Progress: 1%
2024-06-15 12:45:19,893 MainProcess root INFO     Progress: 2%

Reading to the end of the AMD document, I see that it works with:
Radeon: RX 7900 XTX, RX 7900 XT, RX 7900 GRE, PRO W7900 and PRO W7800

I have a gfx90, so I'm not sure whether the GPU will be visible to GraXpert. It is visible to other programs, such as StarTools, but that's via OpenCL.

Still thinking... Any ideas anyone?
Cheers and TIA
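
One possible explanation for the mismatch above (an assumption, not something confirmed in this thread) is that GraXpert runs against a different onnxruntime installation than the interactive check, for example a CPU-only `onnxruntime` wheel sitting in the same environment as the ROCm build. A minimal diagnostic sketch, meant to be run with the same interpreter that starts GraXpert; the helper name `describe_onnxruntime` is made up for illustration:

```python
import importlib.metadata as metadata


def _dist_name(dist):
    """Return the distribution name, or an empty string if metadata is missing."""
    meta = dist.metadata
    return (meta.get("Name") if meta is not None else None) or ""


def describe_onnxruntime():
    """Report which onnxruntime packages are installed and which one is imported."""
    # Several wheels (CPU-only, ROCm, ...) can install into the same
    # "onnxruntime" package directory, so list every matching distribution.
    installed = sorted(
        f"{_dist_name(dist)}=={dist.version}"
        for dist in metadata.distributions()
        if _dist_name(dist).lower().startswith("onnxruntime")
    )
    report = {"installed_distributions": installed}
    try:
        import onnxruntime as ort
    except ImportError:
        # onnxruntime is not importable from this interpreter at all.
        report["imported_from"] = None
        report["available_providers"] = []
    else:
        report["imported_from"] = ort.__file__
        report["available_providers"] = ort.get_available_providers()
    return report


if __name__ == "__main__":
    for key, value in describe_onnxruntime().items():
        print(f"{key}: {value}")
```

If more than one distribution is listed, or `imported_from` points somewhere other than the environment where the ROCm build was installed, that would be worth ruling out before looking at the GPU itself.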

@schmelly
Collaborator

Hi,

I believe you have to include the ROCMExecutionProvider in graxpert/ai_model_handling.py, cf.:

def get_execution_providers_ordered(gpu_acceleration=True):

CS, David
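
As a rough illustration of that suggestion, the sketch below shows one way an ordered provider list could be built with ROCm included. It is not GraXpert's actual implementation: the function name `order_execution_providers`, its `available` parameter and the preference order are assumptions for the example; only the provider name strings are real onnxruntime identifiers.

```python
# Hypothetical preference order; GraXpert's real list may differ.
GPU_PROVIDERS = (
    "CUDAExecutionProvider",
    "ROCMExecutionProvider",
    "DmlExecutionProvider",
)
CPU_PROVIDER = "CPUExecutionProvider"


def order_execution_providers(available, gpu_acceleration=True):
    """Return the preferred providers that are actually available, in order."""
    preferred = list(GPU_PROVIDERS) if gpu_acceleration else []
    preferred.append(CPU_PROVIDER)  # CPU always stays as the last fallback
    return [name for name in preferred if name in available]


# Provider list as reported by ort.get_available_providers() earlier in the thread.
available = [
    "MIGraphXExecutionProvider",
    "ROCMExecutionProvider",
    "CPUExecutionProvider",
]
print(order_execution_providers(available))
# ['ROCMExecutionProvider', 'CPUExecutionProvider']
print(order_execution_providers(available, gpu_acceleration=False))
# ['CPUExecutionProvider']
```

In a real session, `available` would come from `onnxruntime.get_available_providers()`, and the result would be passed as the `providers` argument of `onnxruntime.InferenceSession`.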

@stevelcb
Author

Thanks David
Unfortunately:


 python -m graxpert.main
2024-06-18 22:13:02,761 MainProcess root WARNING  Could not check for newest version
2024-06-18 22:13:11,367 ForkProcess-2 root INFO     stretch.stretch_channel started
2024-06-18 22:13:11,367 ForkProcess-3 root INFO     stretch.stretch_channel started
2024-06-18 22:13:11,367 ForkProcess-4 root INFO     stretch.stretch_channel started
2024-06-18 22:13:11,367 ForkProcess-2 root INFO     stretch.stretch_channel started
2024-06-18 22:13:11,367 ForkProcess-3 root INFO     stretch.stretch_channel started
2024-06-18 22:13:11,367 ForkProcess-4 root INFO     stretch.stretch_channel started
2024-06-18 22:13:11,822 ForkProcess-2 root INFO     stretch.stretch_channel finished
2024-06-18 22:13:11,822 ForkProcess-2 root INFO     stretch.stretch_channel finished
2024-06-18 22:13:11,853 ForkProcess-3 root INFO     stretch.stretch_channel finished
2024-06-18 22:13:11,853 ForkProcess-4 root INFO     stretch.stretch_channel finished
2024-06-18 22:13:11,853 ForkProcess-3 root INFO     stretch.stretch_channel finished
2024-06-18 22:13:11,853 ForkProcess-4 root INFO     stretch.stretch_channel finished
2024-06-18 22:13:24,273 MainProcess root INFO     Progress: 8%
2024-06-18 22:13:24,278 MainProcess root INFO     Progress: 16%
2024-06-18 22:13:24,280 MainProcess root INFO     Progress: 24%
2024-06-18 22:13:24,280 MainProcess root INFO     Progress: 32%
2024-06-18 22:13:25,119 MainProcess root INFO     Providers : ['ROCMExecutionProvider', 'CPUExecutionProvider']
2024-06-18 22:13:25,119 MainProcess root INFO     Used providers : ['ROCMExecutionProvider', 'CPUExecutionProvider']
rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98
2024-06-18 22:13:25.122326547 [E:onnxruntime:Default, rocm_call.cc:119 RocmCall] ROCBLAS failure 6: rocblas_status_internal_error ; GPU=0 ; hostname=cocina ; file=/onnxruntime/build/Linux/Release/amdgpu/onnxruntime/core/providers/rocm/tensor/transpose.cc ; line=65 ; expr=rocblasTransposeHelper(stream, rocblas_handle, rocblas_operation_transpose, rocblas_operation_transpose, M, N, &one, input_data, N, &zero, input_data, N, output_data, M); 
2024-06-18 22:13:25.122341807 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Transpose node. Name:'StatefulPartitionedCall/model/sequential/conv2d/Conv2D__6' Status Message: ROCBLAS failure 6: rocblas_status_internal_error ; GPU=0 ; hostname=cocina ; file=/onnxruntime/build/Linux/Release/amdgpu/onnxruntime/core/providers/rocm/tensor/transpose.cc ; line=65 ; expr=rocblasTransposeHelper(stream, rocblas_handle, rocblas_operation_transpose, rocblas_operation_transpose, M, N, &one, input_data, N, &zero, input_data, N, output_data, M); 
2024-06-18 22:13:25,177 MainProcess root ERROR    [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Transpose node. Name:'StatefulPartitionedCall/model/sequential/conv2d/Conv2D__6' Status Message: ROCBLAS failure 6: rocblas_status_internal_error ; GPU=0 ; hostname=cocina ; file=/onnxruntime/build/Linux/Release/amdgpu/onnxruntime/core/providers/rocm/tensor/transpose.cc ; line=65 ; expr=rocblasTransposeHelper(stream, rocblas_handle, rocblas_operation_transpose, rocblas_operation_transpose, M, N, &one, input_data, N, &zero, input_data, N, output_data, M); 
Traceback (most recent call last):
  File "/home/steve/GraXpert/graxpert/application/app.py", line 149, in on_calculate_request
    extract_background(
  File "/home/steve/GraXpert/graxpert/background_extraction.py", line 80, in extract_background
    background = session.run(None, {"gen_input_image": np.expand_dims(imarray_shrink, axis=0)})[0][0]
  File "/home/steve/GraXpert/graxpert-env/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Transpose node. Name:'StatefulPartitionedCall/model/sequential/conv2d/Conv2D__6' Status Message: ROCBLAS failure 6: rocblas_status_internal_error ; GPU=0 ; hostname=cocina ; file=/onnxruntime/build/Linux/Release/amdgpu/onnxruntime/core/providers/rocm/tensor/transpose.cc ; line=65 ; expr=rocblasTransposeHelper(stream, rocblas_handle, rocblas_operation_transpose, rocblas_operation_transpose, M, N, &one, input_data, N, &zero, input_data, N, output_data, M); 
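
The traceback shows the failure at `session.run`, not when the session is created, so the ROCm provider is accepted and only breaks once a kernel is launched. A `hipErrorInvalidDeviceFunction` commonly indicates that the library was not built for the GPU's architecture, though that is not confirmed here. One way to keep working while that is investigated is to retry on the CPU. A minimal sketch under those assumptions; `run_with_fallback` and `make_session` are hypothetical names, not part of GraXpert or onnxruntime:

```python
from typing import Any, Callable, Mapping, Sequence


def run_with_fallback(
    make_session: Callable[[Sequence[str]], Any],
    feed: Mapping[str, Any],
    provider_chains: Sequence[Sequence[str]],
):
    """Try each provider chain in turn and return (outputs, chain_used)."""
    last_error = None
    for chain in provider_chains:
        try:
            session = make_session(chain)
            # Errors such as the rocBLAS failure above surface here, at run time.
            return session.run(None, dict(feed)), list(chain)
        except Exception as exc:  # onnxruntime raises its own exception types
            last_error = exc
    raise RuntimeError("no provider chain could run the model") from last_error


# With onnxruntime, make_session could be (model_path is a placeholder):
#   lambda chain: onnxruntime.InferenceSession(model_path, providers=list(chain))
# and provider_chains could be:
#   [["ROCMExecutionProvider", "CPUExecutionProvider"], ["CPUExecutionProvider"]]
```

Building a fresh session for the CPU-only retry is deliberate: the failing node was already assigned to ROCm in the first session, so listing `CPUExecutionProvider` second in that session did not help.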

@stevelcb
Author

log attached
graxpert.log.5.txt

@Rikyf3
Contributor

Rikyf3 commented Aug 17, 2024

Have you installed rocBLAS?

@stevelcb
Author

stevelcb commented Aug 20, 2024

Thanks. It's not packaged for Ubuntu at the moment. It should be as of 24.04, at the end of August.

@stevelcb
Author

stevelcb commented Sep 3, 2024

OK. This is with rocBLAS installed on Ubuntu 24.04. It now finds only the CPUExecutionProvider:


python -m graxpert.main
2024-09-05 08:18:46,909 MainProcess root WARNING  Could not check for newest version
2024-09-05 08:18:58,739 ForkProcess-2 root INFO     stretch.stretch_channel started
2024-09-05 08:18:58,740 ForkProcess-3 root INFO     stretch.stretch_channel started
2024-09-05 08:18:58,740 ForkProcess-4 root INFO     stretch.stretch_channel started
2024-09-05 08:18:58,739 ForkProcess-2 root INFO     stretch.stretch_channel started
2024-09-05 08:18:58,740 ForkProcess-3 root INFO     stretch.stretch_channel started
2024-09-05 08:18:58,740 ForkProcess-4 root INFO     stretch.stretch_channel started
2024-09-05 08:18:59,221 ForkProcess-2 root INFO     stretch.stretch_channel finished
2024-09-05 08:18:59,221 ForkProcess-2 root INFO     stretch.stretch_channel finished
2024-09-05 08:18:59,267 ForkProcess-3 root INFO     stretch.stretch_channel finished
2024-09-05 08:18:59,267 ForkProcess-4 root INFO     stretch.stretch_channel finished
2024-09-05 08:18:59,267 ForkProcess-4 root INFO     stretch.stretch_channel finished
2024-09-05 08:18:59,267 ForkProcess-3 root INFO     stretch.stretch_channel finished
2024-09-05 08:19:18,560 MainProcess root INFO     Starting denoising
2024-09-05 08:19:20,280 MainProcess root INFO     Available inference providers : ['CPUExecutionProvider']
2024-09-05 08:19:20,280 MainProcess root INFO     Used inference providers : ['CPUExecutionProvider']
2024-09-05 08:19:22,459 MainProcess root INFO     Progress: 1%
2024-09-05 08:19:24,157 MainProcess root INFO     Progress: 2%
2024-09-05 08:19:26,205 MainProcess root INFO     Progress: 3%
2024-09-05 08:19:27,916 MainProcess root INFO     Progress: 4%
