
How to allocate more memory to my Ryzen APU's GPU? #2014

Closed
winstonma opened this issue Apr 3, 2023 · 52 comments

Comments

@winstonma

winstonma commented Apr 3, 2023

I am running an AMD Ryzen 7 6800U on Ubuntu 22.04 and I installed the AMD driver. I checked that by default the system allocates 512 MB of RAM as VRAM for the GPU.

I followed instructions from another GitHub issue to create a rocm/pytorch Docker image. It has no problem detecting my GPU, but it fails to run the sample program due to an OutOfMemoryError.

torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 2.00 MiB (GPU 0; 512.00 MiB total capacity; 150.39 MiB already allocated; 312.00 MiB free; 168.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF
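
(For reference, the allocator option the error mentions is set through an environment variable, roughly as below; the script name is illustrative. It only mitigates fragmentation, so it cannot raise the 512 MiB cap:)

# Tune the HIP caching allocator as the error message suggests.
# This reduces fragmentation only; it does not add capacity.
$ PYTORCH_HIP_ALLOC_CONF=max_split_size_mb:128 python sample.py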

So my guess is that ROCm supports APUs and I just need to allocate more system memory to the GPU before entering the Docker environment. Does anyone know how to modify the memory allocation of an AMD APU? Thanks in advance.

@ye-luo

ye-luo commented Apr 3, 2023

An alternative route is to change the DMA buffer to 2GB in your BIOS.

@winstonma

winstonma commented Apr 3, 2023

Thanks. I tried to find this setting in the BIOS, with no luck.

It seems that the AMD APU memory allocation can be modified from within Windows, but I don't know of a corresponding configuration in Linux.

@Ristovski

It appears ROCm does not take into account dynamic VRAM GTT allocation on APUs (handled by amdkfd?).

For example on my system:

[    3.524465] [drm] amdgpu: 64M of VRAM memory ready
[    3.524466] [drm] amdgpu: 15916M of GTT memory ready.

This means the iGPU could allocate up to ~16 GB via GTT; however, because only a small amount of dedicated VRAM (64 MB) is reserved for the iGPU, ROCm processes fail to allocate memory.

This also seems to propagate to OpenCL:

$ clinfo |grep -i memory
  Global memory size                              67108864 (64MiB)
  Global free memory (AMD)                        65536 (64MiB) 65536 (64MiB)
  Global memory channels (AMD)                    4
  Global memory banks per channel (AMD)           4
  Global memory bank width (AMD)                  256 bytes
  Max memory allocation                           57042528 (54.4MiB)
  Unified memory for Host and Device              No
  Shared Virtual Memory (SVM) capabilities        (core)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Local memory size per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32

This is NOT the case for OpenGL though (snippet of glxinfo -B):

Memory info (GL_ATI_meminfo):
    VBO free memory - total: 206 MB, largest block: 206 MB
    VBO free aux. memory - total: 15138 MB, largest block: 15138 MB
    Texture free memory - total: 206 MB, largest block: 206 MB
    Texture free aux. memory - total: 15138 MB, largest block: 15138 MB
    Renderbuffer free memory - total: 206 MB, largest block: 206 MB
    Renderbuffer free aux. memory - total: 15138 MB, largest block: 15138 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 256 MB
    Total available memory: 16172 MB
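
For reference, the same VRAM/GTT totals can be read straight from sysfs (a quick check, assuming the iGPU is card0):

$ cat /sys/class/drm/card0/device/mem_info_vram_total   # dedicated VRAM, in bytes
$ cat /sys/class/drm/card0/device/mem_info_gtt_total    # GTT limit, in bytes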

@winstonma

@Ristovski Thanks for the answer.

I think this is a needed feature (otherwise all APUs would be ROCm-compatible on paper but fail in practice due to the memory allocation problem).

I am not sure if this should be fixed on the driver level or the ROCm level (or both).

@asbachb

asbachb commented Jun 12, 2023

This bug is indeed super annoying. Most laptop models only provide limited settings regarding UMA buffer size. So ROCm is quite limited or useless for APUs on Linux.

@winstonma

winstonma commented Jun 12, 2023

Thanks @asbachb. I plan to use my laptop just for some primitive testing with a subset of the data; I wasn't planning to do any fancy calculation on the full dataset. But since I am using a Ryzen CPU, I think I could still get GPU acceleration for that subset.

Currently, I guess everything is working except the memory allocation.

@asbachb

asbachb commented Jun 12, 2023

Just for reference, someone got quite good results (compared to CPU-only): https://www.gabriel.urdhr.fr/2022/08/28/trying-to-run-stable-diffusion-on-amd-ryzen-5-5600g/

But he was able to set his VRAM allocation to 16 GB.

@winstonma

May I share what I see in the Windows environment?

My laptop has 16 GB of memory. This is what is shown inside Task Manager:
[screenshot: Windows Task Manager GPU memory readout]

It shows that:

  • Dedicated GPU memory is 512 MB
  • Shared GPU memory is 7.6 GB
  • GPU memory is 8.1 GB (i.e. 7.6 GB + 512 MB)

While I play GTA it shows that I have 8 GB of GPU memory. I wonder if ROCm could behave in a similar fashion, "thinking" that my GPU has 8 GB of GPU memory.

But when I reboot back into Ubuntu, it shows that my GPU has 512 MB of memory, which I guess is the dedicated GPU memory you can view in Windows. From the application's perspective in Windows, however, the GPU has 8 GB of memory, and a ROCm application should likewise be able to see 8 GB of GPU memory on my system, similar to the GTA result in Windows.

It seems to me there is a missing piece in between.

@winstonma

@asbachb Sorry, I forgot to answer the question in your post. First of all, thank you very much.

I read the post; the author said he modified the memory-sharing behavior in the BIOS, but I don't have this setting in my laptop. Also, as in my previous answer, a Windows application sees the total GPU memory, not just the dedicated memory.

@randomstuff

randomstuff commented Jun 12, 2023

@winstonma, I'm the author of the "Stable diffusion on an AMD Ryzen 5 5600G" post. Yes, sadly, I did not find any way to increase the VRAM allocated to the iGPU without going to the BIOS/motherboard firmware.

@winstonma

@randomstuff Thanks. Sadly the BIOS of my laptop doesn't have the VRAM allocation option, so I wonder if there is a way to increase the dedicated memory at the system level (similar to Regedit in Windows).

@pomoke

pomoke commented Jul 27, 2023

PyTorch will work with a replacement allocator that uses hipHostMalloc. I have a working snippet at https://github.com/pomoke/torch-apu-helper.
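
For context, the PyTorch side of that idea looks roughly like the sketch below, assuming the hipHostMalloc wrapper has been compiled to alloc.so; the exported symbol names here are illustrative, see the repo for the real ones:

import torch

# Register the custom allocator before the first device allocation.
# The shared library routes allocations through hipHostMalloc, so tensors
# are backed by host (GTT-reachable) memory instead of the small VRAM carve-out.
new_alloc = torch.cuda.memory.CUDAPluggableAllocator(
    "./alloc.so",  # library built from the snippet
    "gtt_alloc",   # malloc-style symbol (illustrative name)
    "gtt_free",    # free-style symbol (illustrative name)
)
torch.cuda.memory.change_current_allocator(new_alloc)

x = torch.ones(1024, 1024, device="cuda")  # now served by the pluggable allocator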

@winstonma

For anyone interested in increasing the dedicated memory at the system level: I wrote an article, Unlocking GPU Memory Allocation on AMD Ryzen™ APU, and with that method I could adjust the amount of dedicated GPU memory even though the BIOS doesn't provide the option.

@winstonma

winstonma commented Aug 14, 2023

> PyTorch will work with a replacement allocator that uses hipHostMalloc. I have a working snippet at https://github.com/pomoke/torch-apu-helper.

Thanks.

I am currently using an alternative way to unlock the BIOS and modify the amount of dedicated GPU memory to 8 GB. After that I could run stable-diffusion-webui using the above method without any problem.

I also cloned your repo, created alloc.so, and added your code snippet to the launch.py of stable-diffusion-webui. Stable Diffusion then starts with a warning (but it does start):

Warning: caught exception 'CUDAPluggableAllocator does not yet support getDeviceStats. If you need it, please file an issue describing your use case.', memory monitor disabled

However, when I start generating an image I get the following error:

RuntimeError: CUDAPluggableAllocator does not yet support getDeviceStats. If you need it, please file an issue describing your use case.

I guess the CUDAPluggableAllocator doesn't yet provide all the APIs the application requires, and thus no image is generated. I am not sure if I missed anything. @pomoke, if you have time, could you try running stable-diffusion-webui on your APU and see if it works? Thanks.

Finally, I guess memory auto-allocation is the right direction. In Windows the system auto-allocates system memory to the GPU based on the software in use; PyTorch/ROCm should likewise be able to ask the system to allocate a share of system memory to the GPU on demand. I just filed a feature request with PyTorch; I really hope they can request GPU memory from AMD APUs this way.

@winstonma winstonma changed the title How could I allocate more memory to my APU's GPU? How to allocate more memory to my APU's GPU? Aug 20, 2023
@winstonma winstonma changed the title How to allocate more memory to my APU's GPU? How to allocate more memory to my Ryzen APU's GPU? Aug 20, 2023
@winstonma

winstonma commented Nov 9, 2023

In conclusion, there are several solutions for other AMD APU users:

  • Override the BIOS settings to allocate more memory. This method is like the old days when you set the dedicated video memory in the BIOS. More VRAM means less system memory.
  • torch-apu-helper uses the Unified Memory Architecture (UMA), so the APU can allocate memory from the system dynamically. It is a good demo, but not every API works this way (e.g. getDeviceStats), so an application based on PyTorch will likely not work. I filed an issue with PyTorch; hopefully they can add native AMD APU support.

@pomoke

pomoke commented Nov 11, 2023

Generally, we still need something like what we get on Tegra (such as the Jetson Developer Kit): CUDA on Tegra is capable of allocating GTT properly.

@winstonma

Yes, I think torch-apu-helper shows that PyTorch can grab system memory as video memory. However, it still needs more PyTorch API porting before we can call this approach PyTorch-ready. Not sure whether AMD or PyTorch will make this possible in the future.

@gwyllion92

> Yes, I think torch-apu-helper shows that PyTorch can grab system memory as video memory. However, it still needs more PyTorch API porting before we can call this approach PyTorch-ready. Not sure whether AMD or PyTorch will make this possible in the future.

Hi, I'm trying to follow your guide https://winstonhyypia.medium.com/amd-apu-how-to-modify-the-dedicated-gpu-memory-e27b75905056 but when I run this command SA_OVERRIDE_GFX_VERSION=10.3.0 python test-rocm.py

I get this message:
Command 'python' not found. Maybe you meant:
  the 'python3' command from the deb package 'python3'
  the 'python' command from the deb package 'python-is-python3'

So I changed the command to HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 test-rocm.py

But I get this message now:
Traceback (most recent call last):
  File "/home/myUser/test-rocm.py", line 1, in <module>
    import torch, grp, pwd, os, subprocess
ModuleNotFoundError: No module named 'torch'

I can't find any solution, and I'm not a programmer.

Thanks for your time!

@winstonma

winstonma commented Jan 2, 2024

> Hi, I'm trying to follow your guide https://winstonhyypia.medium.com/amd-apu-how-to-modify-the-dedicated-gpu-memory-e27b75905056 [...] I can't find any solution, and I'm not a programmer. Thanks for your time!

First of all, you need to make sure that:

  1. You are running Linux (I personally prefer Ubuntu LTS, but any major Linux release will do)
  2. You are running a Ryzen APU; you can check the AMD APU website and see if your CPU is listed

For the PyTorch installation you need to check the PyTorch official website. Here are the steps:

  1. Go to the PyTorch official website
  2. Scroll down a bit. You will see a selector widget like the one below
    [screenshot: PyTorch install selector]
    Just choose the options as in the image above
  3. You will get the command you need at the end (see the example after this list). Just copy and paste it into your terminal, then wait until the installation completes
  4. After PyTorch is installed, you can rerun the test script
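
For a ROCm build, the generated command looks something like this (the exact index URL depends on the ROCm version the widget shows):

$ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7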

@gwyllion92

> First of all, you need to make sure that you are running Linux and a Ryzen APU [...] After PyTorch is installed, you can rerun the test script

Hi, thanks for your fast response.

I ran the command and I get this message:
Checking ROCM support...
Cannot find rocminfo command information. Unable to determine if AMDGPU drivers with ROCM support were installed.

So does that mean I'm stuck with 512 MB of VRAM? (My BIOS doesn't have the option to modify it.)

@winstonma

winstonma commented Jan 2, 2024

rocminfo exists only after the AMD graphics driver is installed. You have two options to install the AMD driver:

Option 1

Here is the official installation guide from AMD. I think it should work for everybody

Option 2

# Install AMD Driver
TEMP_FOLDER="/tmp"
DISTRO_CODENAME=$(lsb_release --codename --short)

# Find the driver package URL on the AMD website
AMD_DRIVER_URL="https://www.amd.com/en/support/linux-drivers"
URL_RESPONSE=$(wget -U 'Mozilla/5.0' -qO- ${AMD_DRIVER_URL})
AMD_DEB_URL=$(echo $URL_RESPONSE | grep -o 'https://[^ "<]*.deb' | grep $DISTRO_CODENAME | head -1)
FILENAME=$(basename $AMD_DEB_URL)

# Download and install the amdgpu-install package
wget -P $TEMP_FOLDER $AMD_DEB_URL
sudo dpkg -i $TEMP_FOLDER/$FILENAME
rm $TEMP_FOLDER/$FILENAME

# Install the ROCm stack (requires root)
sudo amdgpu-install -y --usecase=rocm

After the AMD graphics driver is installed, you can run the test script again. You can also find this snippet inside the document that I wrote.

By the way, I suggest running the following command first; I would like to make sure your APU is supported:

$ lscpu | grep "Model name"

@edt-xx

edt-xx commented Jan 2, 2024

> It appears ROCm does not take into account dynamic VRAM GTT allocation on APUs (handled by amdkfd?). [...] This also seems to propagate to OpenCL:

If you use the Mesa rusticl OpenCL driver you will find that it uses the GTT memory as well as the VRAM.

@gwyllion92

> rocminfo exists only after the AMD graphics driver is installed. [...] After the AMD graphics driver is installed, you can run the test script again.

Hi, after I run amdgpu-install -y --usecase=rocm, I get this: WARNING: amdgpu dkms failed for running kernel

@winstonma

winstonma commented Jan 4, 2024

I guess the AMD graphics driver doesn't officially support your kernel: either you are not running the default Ubuntu kernel, or your kernel is too old. I think you should run the following commands to install the latest default Ubuntu kernel:

sudo apt update
sudo apt install linux-generic-hwe-22.04

Then reboot your system and wait for the GRUB boot menu to come up. Once in the GRUB menu, select Advanced options for Ubuntu using the arrow keys and press Enter. Then select the default Ubuntu kernel (on my system it is Ubuntu, with Linux 6.2.0-39-generic).

Then go back to the terminal and check which kernels are installed on your system (this is my output, so you know what to expect):

$ awk -F\' '/menuentry / {print $2}' /boot/grub/grub.cfg 
Ubuntu
Ubuntu, with Linux 6.6.8-zabbly+
Ubuntu, with Linux 6.6.8-zabbly+ (recovery mode)
Ubuntu, with Linux 6.2.0-39-generic
Ubuntu, with Linux 6.2.0-39-generic (recovery mode)

Memory test (memtest86+x64.efi, serial console)
Windows Boot Manager (on /dev/nvme0n1p1)
UEFI Firmware Settings

The installed kernels are listed between Ubuntu and the blank line (on my system that is Linux 6.6.8-zabbly+ and Linux 6.2.0-39-generic; ignore the recovery mode entries). If kernels other than Linux 6.2.0-39-generic are installed, remove them (please check this guide for reference).
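
(On Ubuntu, removing an extra kernel is typically something like the following; the package name below is illustrative and must match the version you want to remove:)

$ dpkg --list 'linux-image*' | grep ^ii       # list installed kernel images
$ sudo apt remove linux-image-6.6.8-zabbly+   # illustrative package name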

A system ready for the AMD graphics driver installation should have output similar to this:

$ awk -F\' '/menuentry / {print $2}' /boot/grub/grub.cfg 
Ubuntu
Ubuntu, with Linux 6.2.0-39-generic
Ubuntu, with Linux 6.2.0-39-generic (recovery mode)

Memory test (memtest86+x64.efi, serial console)
Windows Boot Manager (on /dev/nvme0n1p1)
UEFI Firmware Settings

Only one default Ubuntu kernel.

After you make sure everything is fine, reinstall the AMD graphics driver:

sudo amdgpu-uninstall
sudo amdgpu-install -y --usecase=rocm

After the AMD graphics driver is installed, you can install the kernel of your choice again (or stick with the default kernel, which is fine).

AMD's driver developers have always stated that the driver is tested (only) on the default system (a specific Ubuntu version running the default kernel). Therefore installing or upgrading the AMD driver is a real pain for me, because I am not using the default kernel: I need to boot into the default kernel, remove the custom kernel, remove the old graphics driver, install the new graphics driver, reinstall the custom kernel, and reboot back into it. It's really painful.

@winstonma

winstonma commented Jan 4, 2024

> If you use the Mesa rusticl OpenCL driver you will find that it uses the GTT memory as well as the VRAM.

Just wondering whether PyTorch would use GTT if I used the Mesa rusticl OpenCL driver. If rusticl doesn't work with PyTorch, or PyTorch still doesn't use GTT, then I would still have to use the BIOS modification method to make PyTorch (or Stable Diffusion in my case) work.

@DocMAX

DocMAX commented Feb 25, 2024

Doesn't work on ollama :-(

@pappacena

@segurac great job! I've done some tests, and your memory allocator worked quite well!

To make it easier for future travellers, I've created this PyPI package with your code, so anyone can overcome the memory limitation by running this:

!pip install pytorch_rocm_gtt
import pytorch_rocm_gtt
pytorch_rocm_gtt.patch()

After that, I'm able to allocate way more than the 512MB dedicated to my Radeon 760M iGPU with something like this:

>>> torch.rand(1024 * 1024 * 600, dtype=torch.float32, device="cuda") * 10000
tensor([5880.2114, 1243.6473, 3648.3640,  ..., 3309.8118, 6534.8770,
        6562.5479], device='cuda:0')

@DerRehberg

DerRehberg commented Apr 13, 2024

Can someone give me a command-line tutorial for using more than 2 GB with Stable Diffusion? I managed to run it and it almost generated the picture, but I only have 2 GB of VRAM, and it doesn't have enough video memory to finish: it stops right before I can see the finished image. I'm already thinking about installing Windows to allocate more, if that's possible. Newbie on Arch here; I'm happy that I can run Ollama with APU support and would love Stable Diffusion too. My system seems to use almost 1 GB of it, even after a clean restart.

@winstonma

@DerRehberg Which CPU are you running? Could you post the result of the following command? This is the output from my system:

$ glxinfo -B
...
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 512 MB
    Total available memory: 8126 MB
    Currently available dedicated video memory: 57 MB
...

@DerRehberg

DerRehberg commented Apr 13, 2024

@winstonma
Memory info (GL_NVX_gpu_memory_info):
Dedicated video memory: 2048 MB
Total available memory: 12977 MB
Currently available dedicated video memory: 1531 MB

AMD Ryzen 7 5825U

Currently testing generation in CPU mode, but it's 30 times slower, or at least it feels like it.

@winstonma

@DerRehberg Based on your output, I think you dedicated 2 GB of memory in the BIOS. Could you check one more thing? I would like to see the GTT memory allowed:

$ sudo dmesg | grep amdgpu | grep drm
[    3.929856] [drm] amdgpu kernel modesetting enabled.
[    3.954417] [drm] amdgpu: 512M of VRAM memory ready
[    3.954419] [drm] amdgpu: 7614M of GTT memory ready.
[    4.271639] [drm] Initialized amdgpu 3.57.0 20150101 for 0000:03:00.0 on minor 0
[    4.283886] fbcon: amdgpudrmfb (fb0) is primary device
[    5.698795] amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device

I think you first need to disable the dedicated memory setting. Please feel free to read Unlocking GPU Memory Allocation on AMD Ryzen™ APU? - Prepare the bootable USB drive, then go to the next section and set it back to Auto:

  • Go to Device Manager→AMD CBS→NBIO Common Option→GFX Configuration
  • In Integrated Graphics Controller, select Auto (I am not sure of the exact wording, but don't use Force)
  • In UMA Mode, select Auto (I am not sure of the exact wording, but don't use UMA_SPECIFIED)

After setting it back to Auto, you can use force-host-alloction-APU with PyTorch to run Stable Diffusion on your graphics card.

Please leave a message if you still have questions.

@DerRehberg

DerRehberg commented Apr 13, 2024

@winstonma Can you give me detailed instructions on how to use force-host-alloction-APU on Arch with Stable Diffusion?

[ 5.278774] [drm] amdgpu kernel modesetting enabled.
[ 5.392349] [drm] amdgpu: 2048M of VRAM memory ready
[ 5.392352] [drm] amdgpu: 10929M of GTT memory ready.
[ 6.617177] [drm] Initialized amdgpu 3.57.0 20150101 for 0000:03:00.0 on minor 1
[ 6.624152] fbcon: amdgpudrmfb (fb0) is primary device
[ 6.667498] amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[ 1557.803120] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[ 1557.803120] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]

I wouldn't even know how to compile it.

@winstonma

winstonma commented Apr 13, 2024

@DerRehberg I think you should have about 10 GB of GTT memory available. But you'd better do the following:

# Replace your PyTorch CPU version with PyTorch ROCm version
$ pip uninstall torch torchvision torchaudio
$ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7

# Go to your Stable Diffusion folder

# Compile the library
$ git clone https://github.com/segurac/force-host-alloction-APU.git
$ CUDA_PATH=/usr/ HIP_PLATFORM="amd" hipcc force-host-alloction-APU/forcegttalloc.c -o force-host-alloction-APU/libforcegttalloc.so  -shared -fPIC
$ sudo mv force-host-alloction-APU/libforcegttalloc.so /usr/local/lib
$ rm -rf force-host-alloction-APU

# Check your HSA_OVERRIDE_GFX_VERSION value
$ rocminfo | grep gfx
  Name:                    gfx1030                            
      Name:                    amdgcn-amd-amdhsa--gfx1030    

# Run stable diffusion (update HSA_OVERRIDE_GFX_VERSION based on the previous output;
# if ld.so cannot find the bare library name, use the full path as below)
$ LD_PRELOAD=/usr/local/lib/libforcegttalloc.so HSA_OVERRIDE_GFX_VERSION=10.3.0 python launch.py

@DerRehberg

DerRehberg commented Apr 13, 2024

@winstonma pip uninstall says I should use the AUR xd
Also, I've already set my GFX version in /etc/environment.
Also:

LD_PRELOAD=libforcegttalloc.so python launch.py
ERROR: ld.so: object 'libforcegttalloc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object 'libforcegttalloc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object 'libforcegttalloc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
Python 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801]
Version: v1.9.0
Commit hash: adadb4e3c7382bf3e4f7519126cd6c70f4f8557b
Installing clip
Traceback (most recent call last):
  File "/home/vacanickel/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/home/vacanickel/stable-diffusion-webui/launch.py", line 39, in main
    prepare_environment()
  File "/home/vacanickel/stable-diffusion-webui/modules/launch_utils.py", line 393, in prepare_environment
    run_pip(f"install {clip_package}", "clip")
  File "/home/vacanickel/stable-diffusion-webui/modules/launch_utils.py", line 143, in run_pip
    return run(f'"{python}" -m pip {command} --prefer-binary{index_url_line}', desc=f"Installing {desc}", errdesc=f"Couldn't install {desc}", live=live)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vacanickel/stable-diffusion-webui/modules/launch_utils.py", line 115, in run
    raise RuntimeError("\n".join(error_bits))
RuntimeError: Couldn't install clip.
Command: "/usr/bin/python" -m pip install https://github.com/openai/CLIP/archive/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip --prefer-binary
Error code: 1
stderr: ERROR: ld.so: object 'libforcegttalloc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object 'libforcegttalloc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try 'pacman -S
python-xyz', where xyz is the package you are trying to
install.

If you wish to install a non-Arch-packaged Python package,
create a virtual environment using 'python -m venv path/to/venv'.
Then use path/to/venv/bin/python and path/to/venv/bin/pip.

If you wish to install a non-Arch packaged Python application,
it may be easiest to use 'pipx install xyz', which will manage a
virtual environment for you. Make sure you have python-pipx
installed via pacman.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.

@DerRehberg

DerRehberg commented Apr 13, 2024

@winstonma Why can't it load libforcegttalloc?

EDIT: Got it working by using the full path, but it crashes my whole APU, my KDE desktop, and Stable Diffusion:
HW Exception by GPU node-1 (Agent handle: 0x632d71637310) reason :GPU Hang

Any fix?

@pappacena

@DerRehberg I also get this exact random "GPU Hang" while testing mistral-7b (including my KDE crashing). One thing that helped a little was increasing the dedicated memory in the BIOS (I can only increase it to 2 GB, and mistral-7b uses way more than that, but libforcegttalloc allows the extra allocation).

I still get the crash sometimes, but it appears to be less frequent this way.

@winstonma

@DerRehberg @pappacena Do you think replacing /lib/firmware with the latest version would help? (instructions) At least my laptop got smooth VP9 decoding, and the sleep-of-death problem is gone, after updating the firmware folder. So I personally think it's worth a try.
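
(Roughly what I mean, as a sketch for Ubuntu; this assumes you copy the amdgpu blobs from the upstream linux-firmware tree, and you should back up /lib/firmware/amdgpu first:)

$ git clone --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
$ sudo cp linux-firmware/amdgpu/* /lib/firmware/amdgpu/
$ sudo update-initramfs -u    # rebuild the initramfs so the new blobs are used
$ sudo reboot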

@DerRehberg

@winstonma I think my firmware is actually up to date.
@pappacena My Stable Diffusion crashes, but Ollama works with it.

@winstonma

The system firmware (including wifi/bluetooth/chipset) is different from the motherboard firmware, so I would recommend taking a look. Several problems with my laptop were fixed after I stopped using the distro's linux-firmware (which is two years old on Ubuntu Jammy) and switched to the git version. I think there are advantages to using the latest linux-firmware and kernel, but it is up to you.

@DerRehberg

I use Arch @winstonma

@pomoke

pomoke commented Apr 14, 2024

Use ROCm 5.7 for Vega-based iGPUs. Versions from ROCm 6.0 onwards will break things, as they dropped support for GCN without an explicit error message.

@DerRehberg

DerRehberg commented Apr 27, 2024

Well, ROCm works with any software I can imagine on my APU with version 6. But Stable Diffusion just plainly gives me an out-of-memory error @pomoke

@winstonma

winstonma commented Apr 28, 2024

@DerRehberg Just a side note.

I just installed Ubuntu 24.04 and found out that I don't need to install the AMD ROCm driver to get everything working.

I also checked your system, and it seems you don't need to install the AMD ROCm driver either. It looks like you only need to install hip-runtime-amd and compile force-host-alloction-APU, and everything should work without the ROCm driver.

@DerRehberg

@winstonma Well, I have that package installed. It still crashes on me.

@winstonma

@DerRehberg Yeah, this was just a side note. As I guessed, it wouldn't fix your problem, but it makes the installation easier since you no longer need to install the graphics driver.

BTW I found this article in which the user runs Stable Diffusion on a Ryzen 5 5600G. I think Google Translate would be your friend.

@DerRehberg

@winstonma The Google Translate page doesn't support Japanese. And honestly, I just generate with my fucking CPU because I don't care anymore. If I use forcegtt my whole system crashes with Stable Diffusion.

@DerRehberg

DerRehberg commented May 1, 2024

@winstonma I found an English article and would love to try it, but the script that checks whether ROCm works gives me a libtorch_cpu error. When updating torchvision-rocm I also get an error: ImportError: /usr/lib/libtorch_cpu.so: undefined symbol: cblas_gemm_f16f16f32

Idk if anything is fucked up rn

@DerRehberg

@winstonma Figured it out. I only needed to downgrade hip-runtime-amd to 5.7.1, and Stable Diffusion works with libforcegtt even though the rest is at 6.

@winstonma

Great to hear. That matches my experience on Ubuntu 24.04 too: no need to install the ROCm packages and DKMS modules. To me the new way doesn't provide any performance benefit, just an easier installation.

Just wondering, does it fix the problem you faced?

@DerRehberg

@winstonma Well, I tried downgrading all packages to 5.7.1 and it crashed on me. Updating everything except the runtime worked for me, idk why though.

@winstonma

winstonma commented May 26, 2024

Just saw Linux 6.10 Improves AMD ROCm Compute Support For "Small" Ryzen APUs. Although @segurac's patch works flawlessly, I still think AMD should provide native support. Hopefully 6.10 will no longer require the extra library.

EDIT: Just ran Linux kernel 6.10-rc1 on my 6800U laptop. As expected, I can run Stable Diffusion without custom mods like force-host-alloction-APU or a BIOS modification. AMD eventually fixed this problem!
