ROCm fails to install from APT repository in 22.04 #1713

Closed
erkinalp opened this issue Mar 24, 2022 · 101 comments

Comments

@erkinalp

erkinalp commented Mar 24, 2022

Ubuntu 22.04's feature freeze has already passed and the version in the APT repository is not installable due to missing dependencies (in APT-based distributions, feature freeze is also the minor version freeze).

The main issues are:

  • Python 3.8 is required, whereas the earliest version available is 3.9
  • libstdc++ and libgcc version 5 are required, but the earliest version available is 9
@Bengt

Bengt commented Mar 29, 2022

Hi, @erkinalp!

Thanks for this bug report. I am glad that users are already testing ROCm in the Ubuntu pre-release version.

For more context, see also the other issues about Ubuntu 22.04.

As a workaround for Python 3.8 not being part of Ubuntu 22.04, one should soon be able to install Python 3.8 from the commonly used deadsnakes PPA. However, it does not yet support Ubuntu 22.04 (Jammy Jellyfish): https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa

@xuhuisheng
Contributor

xuhuisheng commented Apr 22, 2022

I am moving my test results to this issue.

Because of gcc-11, I cannot install rocm-dev and rocm-libs in the ubuntu-22.04 Docker image.

work@7cead9071756:/var/spool/apt-mirror/mirror$ sudo apt install -y rocm-dev
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 openmp-extras : Depends: libstdc++-5-dev but it is not installable or
                          libstdc++-7-dev but it is not installable
                 Depends: libgcc-5-dev but it is not installable or
                          libgcc-7-dev but it is not installable
                 Recommends: gcc but it is not going to be installed
                 Recommends: g++ but it is not going to be installed
 rocm-gdb : Depends: libpython3.8 but it is not installable
 rocm-llvm : Depends: python but it is not installable
             Depends: libstdc++-5-dev but it is not installable or
                      libstdc++-7-dev but it is not installable
             Depends: libgcc-5-dev but it is not installable or
                      libgcc-7-dev but it is not installable
             Recommends: gcc but it is not going to be installed
             Recommends: g++ but it is not going to be installed
             Recommends: gcc-multilib but it is not going to be installed
             Recommends: g++-multilib but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
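
When comparing failures like the one above across machines, it can help to reduce apt's message to the bare list of packages it cannot find. The following is a minimal sketch, not part of the original report; the function name `unmet_deps` is made up for illustration, and it assumes apt's English wording ("but it is not installable"), so run apt with `LANG=C` if your locale differs.

```shell
# List the package names that apt reports as "not installable".
# Reads apt's output on stdin and prints one package name per line.
# (unmet_deps is a hypothetical helper name, not an apt feature.)
unmet_deps() {
  grep -oE '[A-Za-z0-9.+-]+ but it is not installable' \
    | awk '{print $1}' \
    | LC_ALL=C sort -u
}

# Possible usage (needs a real apt system, so it is left commented out):
#   sudo LANG=C apt install -y rocm-dev 2>&1 | unmet_deps
```

For the output above this would be expected to list libgcc-5-dev, libgcc-7-dev, libpython3.8, libstdc++-5-dev, libstdc++-7-dev and python.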

I also tried to compile ROCm-5.1.1 in the ubuntu-22.04 Docker image. boost-1.72.0 and rocsolver had some compile errors.
PyTorch needs to be compiled with python-3.10, and there seem to be some configuration errors. I will keep digging.

@Bengt

Bengt commented Apr 23, 2022

DeadSnakes PPA now has support for Ubuntu 22.04 "Jammy Jellyfish":

https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa?field.series_filter=jammy

So, to fix the issue of Python 3.8 not being installed, you can install it like so:

sudo add-apt-repository --yes ppa:deadsnakes/ppa && \
sudo apt-get update && \
sudo apt install --yes python3.8
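
Note that the apt error earlier in this thread names libpython3.8 (for rocm-gdb), not only the interpreter, so it may be worth checking both packages after adding the PPA. The sketch below parses the output of `apt-cache policy`; the helper name `has_candidate` is invented for illustration, and whether the PPA actually provides libpython3.8 is an assumption to verify, not something this thread confirms.

```shell
# Succeeds if `apt-cache policy <pkg>` output (read from stdin) names an
# installable candidate, i.e. a "Candidate:" line other than "(none)".
# (has_candidate is a hypothetical helper name.)
has_candidate() {
  awk '$1 == "Candidate:" && $2 != "(none)" { found = 1 } END { exit !found }'
}

# Possible usage (needs a real apt system, so it is left commented out):
#   for pkg in python3.8 libpython3.8; do
#     if apt-cache policy "$pkg" | has_candidate; then
#       echo "$pkg: installable"
#     else
#       echo "$pkg: no candidate"
#     fi
#   done
```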

@erkinalp
Author

@Bengt Python 3.8 is not supported for the entire lifetime of 22.04.

@Bengt

Bengt commented Apr 23, 2022

True. I didn't mean to suggest that installing Python from the PPA should be the permanent solution for end users. I meant it as a workaround for people who want to debug the installation issues, like @xuhuisheng.

@xuhuisheng
Contributor

Thank you, @Bengt.
I understand that you wanted to show me a way to test PyTorch on ubuntu-22.04 with python-3.8.
But I think the major issue is gcc-11.

I built a pytorch-1.11.0-py310.whl with a small patch to breakpad, but then I hit a compile error when running mnist. It looks like MIOpen wants to compile naive_conv.cpp for torchvision, and this kernel cannot be compiled properly with gcc-11.

@xuhuisheng
Contributor

xuhuisheng commented Apr 24, 2022

update: 2022-04-24

I commented out a lot of code to get past the compile step on ubuntu-22.04.
The main work was skipping log_bench() in rocSOLVER and skipping the inclusion of some headers, such as thread.h in hip_runtime.h.

Now pytorch-1.11.0 and tensorflow_rocm-2.8.1 run properly with ROCm-5.1.1 on ubuntu-22.04.
These were just some small samples; I only tested mnist today.

I think it won't be a problem for the ROCm team to support ubuntu-22.04; we just need to wait for them a while.

I will stay on ubuntu-20.04 for now. Maybe next year, maybe with ROCm-6.0, official support for ubuntu-22.04 will be added, and then I can upgrade to the new LTS.

@Bengt

Bengt commented Apr 24, 2022

Hi, @xuhuisheng!

Yes, I meant installing Python from the PPA only as a workaround. Maybe that helped you isolate the core issue, GCC-11.

Thanks for testing ROCm installation and compilation so thoroughly. I hope your information can help the @qROCmSupport team in their effort to make ROCm work under Ubuntu 22.04.

To my eyes, the next most obvious target would be Ubuntu 22.04.1. This will be the first Linux kernel update that needs to be considered for building the lower-level libraries. However, supporting only the first point release of Ubuntu's LTS versions is not what I would consider solid support.

@erkinalp
Author

Indeed, as @Bengt has always been saying, this should have been ready much earlier, with all the bugs ironed out by now.

@Laitaps

Laitaps commented Apr 25, 2022

This makes it really hard to support AMD. On my laptop with an RTX 3080, everything just works. My desktop with Radeon VII, not so much. I am very surprised. If AMD does not take steps to improve support, they will never have parity with Nvidia in ML. This was an opportunity to build some goodwill and you blew it.

@L3P3

L3P3 commented May 11, 2022

@xuhuisheng could you please provide your changes as a fork so I may test it also?

@xuhuisheng
Contributor

@L3P3 rocSOLVER has fixed the fmt issue. We can wait for the next version, 5.2.

ROCm/rocSOLVER@2bbfb89

@SciPyPanda

Is this issue still being solved? Any updates?

@xuhuisheng
Contributor

@L3P3
Since ROCm-5.2 was not released in May, I have uploaded a document for patching ROCm-5.1.3 on ubuntu-22.04.

https://github.com/xuhuisheng/rocm-build/tree/develop/ubuntu2204

@L3P3

L3P3 commented Jun 27, 2022

@xuhuisheng sorry, I tried to follow your instructions, but it is too hard to understand what I need to do.
The instructions are incomplete; not all steps are explained. I hope they will release it officially soon. Thanks for the help.

@L3P3

L3P3 commented Jul 2, 2022

ROCm was released. Is this issue closable then?

@SciPyPanda

Still erroring on old dependencies in my case.

sudo apt install rocm-dev5.2.0

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 rocm-gdb5.2.0 : Depends: libpython3.8 but it is not installable
 rocm-llvm5.2.0 : Depends: python but it is not installable
                  Depends: libstdc++-5-dev but it is not installable or
                           libstdc++-7-dev but it is not installable
                  Depends: libgcc-5-dev but it is not installable or
                           libgcc-7-dev but it is not installable
                  Recommends: gcc-multilib but it is not going to be installed
                  Recommends: g++-multilib but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

@xuhuisheng
Contributor

The good news is that the roctracer and rocsolver issues have been solved.
https://github.com/xuhuisheng/rocm-build/tree/master/ubuntu2204

@arinc9

arinc9 commented Jul 12, 2022

This makes it really hard to support AMD. On my laptop with an RTX 3080, everything just works. My desktop with Radeon VII, not so much. I am very surprised. If AMD does not take steps to improve support, they will never have parity with Nvidia in ML. This was an opportunity to build some goodwill and you blew it.

@B0tBuilder I have a Radeon VII on my desktop too. Do you have multiple displays? I experience Gnome freezing for about 30 seconds when I turn off my second display. Waking up from suspend is also problematic. It takes about 30 seconds to wake up and the "Join Displays" option changes to "Single Display" and the primary display changes to the second display.

I assumed there might be something wrong with the GPU so I wanted to install the amdgpu driver but I see the same dependency issue as everyone else after running amdgpu-install.

@Laitaps

Laitaps commented Jul 12, 2022

This makes it really hard to support AMD. On my laptop with an RTX 3080, everything just works. My desktop with Radeon VII, not so much. I am very surprised. If AMD does not take steps to improve support, they will never have parity with Nvidia in ML. This was an opportunity to build some goodwill and you blew it.

@B0tBuilder I have a Radeon VII on my desktop too. Do you have multiple displays? I experience Gnome freezing for about 30 seconds when I turn off my second display. Waking up from suspend is also problematic. It takes about 30 seconds to wake up and the "Join Displays" option changes to "Single Display" and the primary display changes to the second display.

I assumed there might be something wrong with the GPU so I wanted to install the amdgpu driver but I see the same dependency issue as everyone else after running amdgpu-install.

I experience no such issue.

@mchaker

mchaker commented Jul 18, 2022

Attempting to follow the ROCm installation docs, even with accepting the proprietary EULA:

sudo amdgpu-install --usecase="dkms,workstation,rocm" --opencl=rocr,legacy --vulkan=amdvlk,pro --accept-eula --no-32

results in:

The following packages have unmet dependencies:
 rocm-llvm : Depends: python but it is not installable
             Depends: libstdc++-5-dev but it is not installable or
                      libstdc++-7-dev but it is not installable
             Depends: libgcc-5-dev but it is not installable or
                      libgcc-7-dev but it is not installable
             Recommends: gcc-multilib but it is not going to be installed
             Recommends: g++-multilib but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

@Rmalavally
Contributor

Thank you for reaching out with your query. Please note that ROCm v5.2 does not support Ubuntu v22.04.

Support for Ubuntu v22.04 will be made available in a future release.

ROCm Documentation Team

@Laitaps

Laitaps commented Jul 18, 2022

What future release and when?

@Rmalavally
Contributor

As a best practice, we do not commit to fixes in specific releases. Please continue to review our release documentation on our new portal at https://docs.amd.com.

ROCm Documentation Team

@Laitaps

Laitaps commented Jul 18, 2022

It has been quite a while since 22.04 was released. Does anyone find this situation acceptable?

@L3P3

L3P3 commented Jul 19, 2022

It has been quite a while since 22.04 was released. Does anyone find this situation acceptable?

I agree that this is a very annoying way to communicate. The AMD folks can do better, I think. This is just bad organisation.

@Martinc4321

Still no fix? @Rmalavally
I am still getting an error on my Ubuntu 22.04 when running "amdgpu-install --usecase=rocm".
All my friends are having fun with Stable Diffusion, while I cannot get the drivers :/

The following packages have unmet dependencies:
 openmp-extras : Depends: libstdc++-5-dev but it is not installable or
                          libstdc++-7-dev but it is not installable
                 Depends: libgcc-5-dev but it is not installable or
                          libgcc-7-dev but it is not installable
 rocm-llvm : Depends: python but it is not installable
             Depends: libstdc++-5-dev but it is not installable or
                      libstdc++-7-dev but it is not installable
             Depends: libgcc-5-dev but it is not installable or
                      libgcc-7-dev but it is not installable

@Martinc4321

@zhang2amd ?

@Martinc4321

I removed amdgpu-install (with --purge) and then reinstalled it following the installation guide for the latest version:
https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.3/page/How_to_Install_ROCm.html
Now I am getting this:


amdgpu-install --usecase=rocm
Hit:1 http://sk.archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://sk.archive.ubuntu.com/ubuntu jammy-updates InRelease                                                              
Hit:3 http://sk.archive.ubuntu.com/ubuntu jammy-backports InRelease                                                            
Hit:4 http://security.ubuntu.com/ubuntu jammy-security InRelease                                                               
Hit:5 https://apt.repos.intel.com/mkl all InRelease                                                                            
Hit:6 https://ppa.launchpadcontent.net/kisak/kisak-mesa/ubuntu jammy InRelease                                                 
Hit:7 https://ppa.launchpadcontent.net/oibaf/graphics-drivers/ubuntu jammy InRelease                     
Hit:8 https://repo.radeon.com/amdgpu/5.3/ubuntu focal InRelease       
Hit:9 https://repo.radeon.com/rocm/apt/5.3 focal InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
linux-headers-5.15.0-50-generic is already the newest version (5.15.0-50.56).
linux-headers-5.15.0-50-generic set to manually installed.
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 rocm-dev : Depends: rocm-cmake (= 0.8.0.50300-63~20.04) but 5.0.0-1 is to be installed
            Depends: rocm-device-libs (= 1.0.0.50300-63~20.04) but 5.0.0-1 is to be installed
            Depends: rocm-utils (= 5.3.0.50300-63~20.04) but it is not going to be installed

@Rmalavally
Contributor

Thank you for reaching out. Let me discuss this internally and get back to you. Appreciate your letting us know.

Regards,
ROCm Documentation Team

@Martinc4321

example
I tried to install an older version of ROCm using
"sudo amdgpu-install --usecase=rocm --rocmrelease=4.5.0 --no-dkms"

Got this:

sudo amdgpu-install --usecase=rocm --rocmrelease=4.5.0 --no-dkms
Hit:1 http://sk.archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://sk.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:3 http://sk.archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:4 https://ppa.launchpadcontent.net/kisak/kisak-mesa/ubuntu jammy InRelease
Hit:5 https://apt.repos.intel.com/mkl all InRelease
Hit:6 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:7 https://ppa.launchpadcontent.net/oibaf/graphics-drivers/ubuntu jammy InRelease
Hit:8 https://repo.radeon.com/amdgpu/5.3/ubuntu focal InRelease
Hit:9 https://repo.radeon.com/rocm/apt/5.3 focal InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package rocm-dev4.5.0
E: Couldn't find any package by glob 'rocm-dev4.5.0'
E: Couldn't find any package by regex 'rocm-dev4.5.0'
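
Two details in the log above stand out: the Ubuntu archives are on jammy while both repo.radeon.com sources are on focal, and only the 5.3 ROCm repository is configured, which would be consistent with rocm-dev4.5.0 not being found. A small sketch for pulling exactly those lines out of an `apt-get update` log follows; `radeon_sources` is a made-up helper name, and the interpretation of the result is an assumption, not a confirmed diagnosis.

```shell
# Print "<url> <suite>" for every repo.radeon.com source that appears in
# `apt-get update` output read from stdin.
# (radeon_sources is a hypothetical helper name.)
radeon_sources() {
  awk '/^(Hit|Get|Ign):[0-9]+ / && $2 ~ /repo\.radeon\.com/ { print $2, $3 }'
}

# Possible usage (needs a real apt system, so it is left commented out):
#   sudo LANG=C apt-get update 2>&1 | radeon_sources
```

Comparing the printed suite with the codename of the running system shows at a glance whether the AMD sources match the distribution release.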

@Martinc4321

My gpu info:

sudo lshw -C display
  *-display                 
       description: VGA compatible controller
       product: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:28:00.0
       logical name: /dev/fb0
       version: c4
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom fb
       configuration: depth=32 driver=amdgpu latency=0 resolution=2560,1080
       resources: irq:105 memory:d0000000-dfffffff memory:e0000000-e01fffff ioport:e000(size=256) memory:fcd00000-fcd7ffff memory:c0000-dffff

@Martinc4321

Here is rocminfo output:

(base) conto@conto-MS-7B85:~$ rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 3700X 8-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 3700X 8-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3600                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    16302516(0xf8c1b4) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16302516(0xf8c1b4) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16302516(0xf8c1b4) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1010                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 5700                 
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      4096(0x1000) KB                    
  Chip ID:                 29471(0x731f)                      
  ASIC Revision:           2(0x2)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1750                               
  BDFID:                   10240                              
  Internal Node ID:        1                                  
  Compute Unit:            36                                 
  SIMDs per CU:            2                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    1280(0x500)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1010:xnack-  
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             

@xuhuisheng
Contributor

@Martinc4321
Unfortunately, gfx1010 is not on ROCm's official support list.

I also tested gfx1010 on ROCm-5.3.0 with the export HSA_OVERRIDE_GFX_VERSION=10.3.0 hack, but this time it broke on ROCm-5.3.0.
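
For readers unfamiliar with the override: as far as I understand it, the value is the dotted major.minor.stepping form of a gfx target name, so 10.3.0 asks the runtime to treat the card as gfx1030 instead of its real gfx1010 (which would be 10.1.0). The sketch below shows that mapping and how to read the real target from `rocminfo` output. Both function names are invented for illustration, and reading the last two characters of the name as hex digits is my assumption about the naming scheme.

```shell
# Extract the distinct gfx target names from `rocminfo` output on stdin.
# (gpu_targets is a hypothetical helper name.)
gpu_targets() {
  grep -oE 'gfx[0-9a-f]+' | LC_ALL=C sort -u
}

# Convert a gfx target name to dotted major.minor.stepping form,
# e.g. gfx1030 -> 10.3.0. The last two characters are read as hex digits
# (an assumption about the naming scheme, not taken from this thread).
gfx_to_version() {
  name=${1#gfx}
  stepping=${name#"${name%?}"}   # last character
  rest=${name%?}                 # everything before it
  minor=${rest#"${rest%?}"}      # second-to-last character
  major=${rest%?}                # remaining leading digits
  printf '%d.%d.%d\n' "$major" "0x$minor" "0x$stepping"
}

# Possible usage (needs ROCm installed, so it is left commented out):
#   rocminfo | gpu_targets
#   gfx_to_version gfx1030
```

Because the override makes the runtime load kernels built for a different target, a crash such as a memory access fault is a plausible outcome when the real hardware differs too much from the claimed one.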

@njzjz

njzjz commented Oct 26, 2022

The following packages have unmet dependencies:
 rocm-dev : Depends: rocm-cmake (= 0.8.0.50300-63~20.04) but 5.0.0-1 is to be installed
            Depends: rocm-device-libs (= 1.0.0.50300-63~20.04) but 5.0.0-1 is to be installed
            Depends: rocm-utils (= 5.3.0.50300-63~20.04) but it is not going to be installed

@Martinc4321 I figured out how to resolve this error. Add a pin-priority file to /etc/apt/preferences.d before executing apt-get:

printf 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | tee /etc/apt/preferences.d/rocm-pin-600 
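
Some background on why a pin can help here: apt gives ordinary sources a default priority of 500, and at equal priority the highest version number wins. In the error above, a rocm-cmake 5.0.0-1 (presumably from another configured source such as the Ubuntu archive) sorts higher than AMD's 0.8.0.50300-63~20.04, so apt selects it and the exact-version dependency of rocm-dev fails. A priority of 600 for origin repo.radeon.com puts AMD's packages first. The sketch below writes the same three lines as the command above, with a trailing newline, into a directory passed as an argument so it can be tried without root; `write_rocm_pin` is a made-up helper name.

```shell
# Write an apt pin that prefers packages whose origin is repo.radeon.com.
# $1 is the preferences directory (normally /etc/apt/preferences.d,
# which requires root). write_rocm_pin is a hypothetical helper name.
write_rocm_pin() {
  printf 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600\n' \
    > "$1/rocm-pin-600"
}

# Possible usage (left commented out because it changes system config):
#   sudo sh -c '. ./this-file.sh; write_rocm_pin /etc/apt/preferences.d'
#   apt-cache policy rocm-cmake   # the repo.radeon.com version should now be the candidate
```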

@martadinata666

I just followed the guide and it installed correctly. I saw from the apt logs above that many people are still using the focal repo instead of updating it to jammy:

deb [arch=amd64] https://repo.radeon.com/amdgpu/5.3/ubuntu jammy main
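
To avoid hard-coding the wrong suite, the source line can be built from the codename of the running system. This is only a sketch: both function names are invented, the URL layout is copied from the line above, and VERSION_CODENAME is the field Ubuntu provides in /etc/os-release.

```shell
# Print the amdgpu apt source line for a release version ($1) and an
# Ubuntu codename ($2). URL layout copied from the line quoted above.
# (amdgpu_source_line is a hypothetical helper name.)
amdgpu_source_line() {
  printf 'deb [arch=amd64] https://repo.radeon.com/amdgpu/%s/ubuntu %s main\n' "$1" "$2"
}

# Read VERSION_CODENAME (e.g. "jammy") from an os-release style file.
# Runs in a subshell so the sourced variables do not leak.
os_codename() {
  ( . "${1:-/etc/os-release}" && printf '%s\n' "$VERSION_CODENAME" )
}

# Possible usage:
#   amdgpu_source_line 5.3 "$(os_codename)"
```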

@erkinalp
Author

Resolved in 5.3

@mchaker

mchaker commented Nov 10, 2022

Thank you so much Radeon/ROCm team!

image

@hostingnuggets

hostingnuggets commented Nov 11, 2022

So is this also finally fixed inside the AMD GPU installer from AMD for Ubuntu 22.04 LTS?

@Martinc4321

I can confirm. Finally I can use my hardware. And use it I will.
Many thanks ;)

There was just one small error with the Python libs, but I was able to solve it pretty quickly.
"ImportError: libamdhip64.so.5
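
The comment does not say how the import error was solved. One common way to investigate a missing shared library is to locate it on disk and check whether its directory is on the loader's search path. The sketch below only does the locating; `find_hip_libdirs` is a made-up helper name, and /opt as the default search root is an assumption about where ROCm is installed.

```shell
# Print the directories under $1 (default: /opt, an assumed install root)
# that contain a libamdhip64.so* file.
# (find_hip_libdirs is a hypothetical helper name.)
find_hip_libdirs() {
  find "${1:-/opt}" -name 'libamdhip64.so*' 2>/dev/null \
    | sed 's|/[^/]*$||' \
    | LC_ALL=C sort -u
}

# Possible usage: add the first hit to the loader path for this shell.
#   export LD_LIBRARY_PATH="$(find_hip_libdirs | head -n 1):$LD_LIBRARY_PATH"
```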

@Martinc4321

Martinc4321 commented Nov 22, 2022

@Martinc4321 Unfortunately, gfx1010 is not on ROCm's official support list.

I also tested gfx1010 on ROCm-5.3.0 with the export HSA_OVERRIDE_GFX_VERSION=10.3.0 hack, but this time it broke on ROCm-5.3.0.

As you said, I can now see a message about an unsupported device.
The export you mentioned suppresses that error, but then it dies with:

2022-11-22 14:22:53.461610: I tensorflow/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-22 14:22:53.535615: I tensorflow/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-22 14:22:53.535677: I tensorflow/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-22 14:22:53.535991: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-22 14:22:53.536651: I tensorflow/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-22 14:22:53.536716: I tensorflow/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-22 14:22:53.536752: I tensorflow/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-22 14:22:53.536858: I tensorflow/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-22 14:22:53.536908: I tensorflow/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-22 14:22:53.536946: I tensorflow/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-22 14:22:53.536969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7676 MB memory:  -> device: 0, name: AMD Radeon RX 5700, pci bus id: 0000:28:00.0
2022-11-22 14:22:53.746371: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-11-22 14:23:00.310785: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
Transforming target 0
Transforming target 1
Transforming target 2
Transforming target 3
2022-11-22 14:23:08.376620: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-11-22 14:23:08.714553: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
TF dataset transform done
Creating TF input data structure
2022-11-22 14:23:20.345216: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-11-22 14:23:20.345847: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-11-22 14:23:20.351685: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-11-22 14:23:20.358503: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-11-22 14:23:20.359383: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-11-22 14:23:20.378206: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-11-22 14:23:20.380787: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-11-22 14:23:20.383370: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
Memory access fault by GPU node-1 (Agent handle: 0x557dd7244580) on address 0x7fee70b23000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

EDIT:
A different override version seems to work:
export HSA_OVERRIDE_GFX_VERSION=10.0.0

EDIT:
I found out that it runs on the CPU with that env var set.
So yes, it is not working, as you said.
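
For reference, HSA_OVERRIDE_GFX_VERSION takes a dotted major.minor.stepping value, not a gfx target name. Below is a minimal sketch of the conversion. It assumes an all-decimal target name such as gfx1030, where the last two digits are the minor version and the stepping. gfx_to_override is a hypothetical helper for illustration, not part of ROCm, and it rejects names containing letters (such as gfx90a) rather than guessing.

```shell
# Convert a gfx target name (e.g. gfx1030) into the dotted
# major.minor.stepping form used for HSA_OVERRIDE_GFX_VERSION.
# Assumption: the name is all decimal digits after the "gfx" prefix.
gfx_to_override() {
    name="${1#gfx}"
    case "$name" in
        ''|*[!0-9]*)
            echo "unrecognised gfx target: $1" >&2
            return 1
            ;;
    esac
    if [ "${#name}" -lt 3 ]; then
        echo "unrecognised gfx target: $1" >&2
        return 1
    fi
    step="${name#"${name%?}"}"    # last digit
    rest="${name%?}"              # everything before the last digit
    minor="${rest#"${rest%?}"}"   # second-to-last digit
    major="${rest%?}"             # remaining leading digits
    printf '%s.%s.%s\n' "$major" "$minor" "$step"
}
```

For example, export HSA_OVERRIDE_GFX_VERSION="$(gfx_to_override gfx1030)" sets the value 10.3.0. Whether a given override actually runs work on the GPU, rather than falling back to the CPU, still has to be checked separately.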

@Martinc4321

OK, the combination of Ubuntu 20.04 and rocm-5.2.0 in the official TensorFlow Docker image works (from Docker Hub, following the instructions in its description).
Inside the running container, I set export HSA_OVERRIDE_GFX_VERSION=10.3.0
and ran the basic setup such as "apt update" and installing pip plus the packages you use.

It uses my GPU, Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]! (I checked with radeontop that its resources are being used.)
Sadly, it is slower than using the CPU outside Docker. Someone might build an OS image from that container and run it natively, but I don't plan to do that.

@karim789

karim789 commented Dec 12, 2022

It doesn't work on Ubuntu 22.04.1 with the installer amdgpu-install_22.20.50200-1_all.deb:

$ sudo LANG=C amdgpu-install
Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [114 kB]
Hit:5 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:7 https://repo.radeon.com/amdgpu/22.20/ubuntu jammy InRelease
Hit:8 https://repo.radeon.com/rocm/apt/5.2 ubuntu InRelease
Fetched 224 kB in 1s (371 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
linux-headers-5.15.0-56-generic is already the newest version (5.15.0-56.62).
linux-headers-5.15.0-56-generic set to manually installed.
linux-modules-extra-5.15.0-56-generic is already the newest version (5.15.0-56.62).
linux-modules-extra-5.15.0-56-generic set to manually installed.
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 rocm-llvm : Depends: python but it is not installable
             Depends: libstdc++-5-dev but it is not installable or
                      libstdc++-7-dev but it is not installable
             Depends: libgcc-5-dev but it is not installable or
                      libgcc-7-dev but it is not installable
             Recommends: gcc-multilib but it is not going to be installed
             Recommends: g++-multilib but it is not going to be installed
E: Unable to correct problems, you have held broken packages

@xuhuisheng
Contributor

@karim789
Please try the latest ROCm-5.4:
https://repo.radeon.com/amdgpu/5.4/ubuntu/

@karim789

@karim789 Please try latest ROCm-5.4 https://repo.radeon.com/amdgpu/5.4/ubuntu/

The issue is rather that the link to the amdgpu installer on the AMD driver page points to an obsolete driver .deb file.

Anyway, it appears that my R9 280X isn't supported by the amdgpu driver.
I wanted to try the proprietary driver, hoping it would fix an issue where Atmos audio does not work below a screen refresh rate of 30 Hz, which means no 24p and is annoying when watching movies.
The issue occurs with both the R9 280X and the Intel HD4600, which share the same intel_hda_audio driver, and it does not occur with an Nvidia card, so it is probably this driver that has the issue.

@Apisteftos

I installed the latest version, rocm-5.4.3, and it is not compatible with my hardware (gfx 8.3.0). I am using Ubuntu 22.04.1 LTS with a Radeon RX 480; the VGA compatible controller is Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev c7). So I uninstalled rocm-5.4.3 and am trying to install rocm-4.5.x instead. When I run amdgpu-install -y --usecase=rocm for this version, it reports several unmet dependencies:

The following packages have unmet dependencies:
 openmp-extras : Depends: libstdc++-5-dev but it is not installable or
                          libstdc++-7-dev but it is not installable
                 Depends: libgcc-5-dev but it is not installable or
                          libgcc-7-dev but it is not installable
 rocm-llvm : Depends: python but it is not installable
             Depends: libstdc++-5-dev but it is not installable or
                      libstdc++-7-dev but it is not installable
             Depends: libgcc-5-dev but it is not installable or
                      libgcc-7-dev but it is not installable
E: Unable to correct problems, you have held broken packages.

I then came to this thread and, after reading all the comments above, tried to install these dependencies. I installed Python 3.8 with these commands:

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.8

Downloading the package with sudo apt-get download rocm-llvm only gave me a broken package. In the Synaptic Package Manager, searching for rocm-llvm, right-clicking it and marking it for installation shows this message:

Package rocm-llvm has no available version, but exists in the database.
This typically means that the package was mentioned in a dependency and never uploaded, has been obsoleted or is not available with the contents of sources.list

So it's not possible to install rocm-4.5.x.
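
The unmet-dependency output above can be reduced to the list of packages apt cannot find at all. Below is a minimal sketch, assuming apt's English wording "but it is not installable"; list_uninstallable is a hypothetical helper, not an apt tool.

```shell
# Read apt's "unmet dependencies" text on stdin and print every package
# reported as "not installable", one per line, without duplicates.
# Assumption: apt runs with English messages (e.g. LANG=C).
list_uninstallable() {
    grep -oE '[A-Za-z0-9+.-]+ but it is not installable' \
        | awk '{print $1}' \
        | LC_ALL=C sort -u
}
```

One way to use it is LANG=C apt-get install -s rocm-llvm 2>&1 | list_uninstallable, where -s only simulates the installation. For the output quoted above it prints libgcc-5-dev, libgcc-7-dev, libstdc++-5-dev, libstdc++-7-dev and python, which are the package names that rocm-llvm and openmp-extras ask for but that apt cannot provide from the configured repositories.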

@stackcverflow

For those who want to do it quickly, here's a summary of the procedure for Ubuntu 22.04 (Jammy) from the ROCm documentation:
https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.4.3/page/How_to_Install_ROCm.html#d23e2091

Installation

sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/5.3/ubuntu/jammy/amdgpu-install_5.3.50300-1_all.deb
sudo apt-get install ./amdgpu-install_5.3.50300-1_all.deb

# Add repositories
echo 'deb [arch=amd64] https://repo.radeon.com/amdgpu/latest/ubuntu jammy main' | sudo tee /etc/apt/sources.list.d/amdgpu.list
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ jammy main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt-get update

# Install the kernel-mode driver (it may already be installed by the commands above)
sudo apt install amdgpu-dkms

# Reboot
sudo reboot

Verification after reboot

### Kernel mode check
dkms status
>>> Response : "amdgpu/5.18.13-1538762.22.04....."

### ROCm installation check
sudo /opt/rocm-5.4.3/bin/rocminfo
>>> Response : "ROCk module is loaded"

Note: It works for my old RX 480, so it should work if you have a newer GPU.

You can find the list of GPUs supported by ROCm here: https://github.com/ROCm/ROCm.github.io/blob/master/hardware.md
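
The two verification steps above can be scripted. Below is a minimal sketch with two hypothetical helpers that read command output on stdin. It assumes the dkms status line starts with "amdgpu/" or "amdgpu," and contains the word "installed"; the exact format can differ between DKMS versions.

```shell
# Succeeds if `dkms status` output on stdin lists an installed amdgpu module.
# Assumption: the line begins with "amdgpu/" or "amdgpu," and says "installed".
amdgpu_dkms_installed() {
    grep -Eq '^amdgpu[/,].*installed'
}

# Succeeds if rocminfo output on stdin contains the "ROCk module is loaded" line.
rock_module_loaded() {
    grep -q 'ROCk module is loaded'
}
```

Usage after the reboot, with the rocminfo path taken from the check above:

dkms status | amdgpu_dkms_installed && echo "amdgpu DKMS module is installed"
sudo /opt/rocm-5.4.3/bin/rocminfo | rock_module_loaded && echo "ROCk module is loaded"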

@chriamue

chriamue commented Oct 2, 2023

The following packages have unmet dependencies:
 rocm-dev : Depends: rocm-cmake (= 0.8.0.50300-63~20.04) but 5.0.0-1 is to be installed
            Depends: rocm-device-libs (= 1.0.0.50300-63~20.04) but 5.0.0-1 is to be installed
            Depends: rocm-utils (= 5.3.0.50300-63~20.04) but it is not going to be installed

@Martinc4321 I figured out how to resolve this error. Add a pin-priority file to /etc/apt/preferences.d before running apt-get:

printf 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | tee /etc/apt/preferences.d/rocm-pin-600 

Pin-Priority alone didn't help me, but running amdgpu-install with --rocmrelease=5.7.0 worked:

sudo amdgpu-install --usecase=dkms,graphics,rocm,hip,hiplibsdk --rocmrelease=5.7.0
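
For anyone who still wants to try the pin file, here is a minimal sketch of that step. write_rocm_pin is a hypothetical helper. Unlike the quoted one-liner, it ends the file with a newline and takes the target directory as an argument, so it can be tried in a scratch directory without root.

```shell
# Write the pin file from the quoted comment into DIR
# (default: /etc/apt/preferences.d, which needs root to write to).
write_rocm_pin() {
    dir="${1:-/etc/apt/preferences.d}"
    printf 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600\n' \
        > "$dir/rocm-pin-600"
}
```

Running write_rocm_pin "$(mktemp -d)" writes the file to a temporary directory for inspection. After writing it to the real location as root and running apt-get update, apt-cache policy rocm-dev shows which repository and version apt now prefers.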

@FlorianHeigl

As a best practice, we do not commit to fixes in specific releases. Please continue to review our release documentation on our new portal at https://docs.amd.com.

ROCm Documentation Team

It should be noted in the documentation. They're presented as equal choices.
