How to allocate more memory to my Ryzen APU's GPU? #2014
Comments
An alternative route is to change the DMA buffer to 2GB in your BIOS.
Thanks. I tried to find this setting in my BIOS with no luck. It seems the AMD APU memory setting can be modified inside Windows, but I don't know the corresponding configuration in Linux.
It appears ROCm does not take dynamic VRAM/GTT allocation on APUs into account (handled by amdkfd?). For example, on my system:
This means the iGPU could allocate up to 16GB of VRAM; however, due to the small dedicated VRAM (64MB) reserved for the iGPU, ROCm processes fail to allocate memory. This also seems to propagate to OpenCL:
This is NOT the case for OpenGL, though (snippet of
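For readers who want to reproduce this check, the kernel exposes the two memory pools of an amdgpu device through sysfs. This is a sketch, not the exact command used above: the `card0` index is an assumption, and the example numbers simply mirror the 64MB-VRAM/16GB-GTT situation described in this comment.

```shell
# bytes_to_mib: integer-divide a byte count down to MiB
bytes_to_mib() { echo $(( $1 / 1024 / 1024 )); }

# On a live system the two pools would be read like this (card index assumed):
#   VRAM=$(cat /sys/class/drm/card0/device/mem_info_vram_total)
#   GTT=$(cat /sys/class/drm/card0/device/mem_info_gtt_total)
# Example values matching the situation described above:
VRAM=$((64 * 1024 * 1024))          # 64 MiB dedicated VRAM
GTT=$((16 * 1024 * 1024 * 1024))    # 16 GiB GTT (shared system memory)
echo "VRAM: $(bytes_to_mib $VRAM) MiB"
echo "GTT:  $(bytes_to_mib $GTT) MiB"
```

A large gap between the two numbers is exactly the symptom discussed here: plenty of GTT, but ROCm only sees the tiny VRAM pool.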
@Ristovski Thanks for the answer. I think this is a needed feature (otherwise all APUs would be ROCm-compatible, but it doesn't work due to the memory allocation problem). I am not sure if this should be fixed at the driver level or the ROCm level (or both).
This bug is indeed super annoying. Most laptop models only provide limited settings for the UMA buffer size, so ROCm is quite limited or useless for APUs on Linux.
Thanks @asbachb. I plan to use my laptop just for some primitive testing with a subset of data; I didn't plan to do any fancy calculation with all the data. But since I am using a Ryzen CPU, I think I could still get GPU acceleration with the subset. Currently, I guess everything is working except the memory allocation.
Just for reference, someone got quite good results (compared to CPU only): https://www.gabriel.urdhr.fr/2022/08/28/trying-to-run-stable-diffusion-on-amd-ryzen-5-5600g/ But he was able to set his VRAM to 16GB.
@asbachb Sorry, I forgot to answer the question in your post. First of all, thank you very much. I read the post, and the author said he modified the memory-sharing behavior in the BIOS, a setting I don't have in my laptop. Also, as in the previous answer, the Windows application sees the total GPU memory, not just the dedicated memory.
@winstonma, I'm the author of the "Stable diffusion on an AMD Ryzen 5 5600G" post. Yes, sadly, I did not find any way to increase the VRAM allocated to the iGPU without going to the BIOS/motherboard firmware. |
@randomstuff Thanks. But sadly the BIOS of my laptop doesn't have the VRAM allocation option, so I wonder if there are ways to increase the dedicated memory at the system level (similar to Regedit on Windows).
PyTorch will work with a replaced allocator which uses
For everyone interested in increasing the dedicated memory at the system level: I wrote an article, Unlocking GPU Memory Allocation on AMD Ryzen™ APU, and with that method I could adjust the amount of dedicated GPU memory even though the BIOS doesn't provide that option.
Thanks. I currently use an alternative way to unlock the BIOS and modify the amount of dedicated GPU memory to 8GB. After that I could run stable-diffusion-webui using the above method without any problem. Also, I cloned your repo and created
However when I start generating image I got the following error:
I guess it's because CUDAPluggableAllocator doesn't yet provide all the APIs required by the application, and thus no image is generated. I am not sure if I missed anything. @pomoke, if you have time, feel free to try running stable-diffusion-webui on your APU and see if it works. Thanks. Finally, I guess memory auto-allocation should be the right direction: on Windows, the system auto-allocates system memory to the GPU based on the software in use. PyTorch/ROCm should likewise be able to ask the system for a share of system memory for the GPU on demand. I just filed a feature request in PyTorch; I really wish they could request GPU memory from AMD APUs.
In conclusion, there are several solutions for other AMD APU users:
Generally, we still need something like what we get on Tegra (such as the Jetson Developer Kit): CUDA on Tegra is capable of allocating GTT properly.
Yes, I think the situation now is that torch-apu-helper shows PyTorch can grab system memory as video memory. However, it still needs more PyTorch API porting before we can call this approach PyTorch-ready. I'm not sure if AMD or PyTorch will make this possible in the future.
Hi, I'm trying to follow your guide https://winstonhyypia.medium.com/amd-apu-how-to-modify-the-dedicated-gpu-memory-e27b75905056 but when I run the command SA_OVERRIDE_GFX_VERSION=10.3.0 python test-rocm.py I get this message: So I changed the command to HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 test-rocm.py but I get this message now: I can't find any solution, and I'm not a programmer. Thanks for your time!
First of all, you need to make sure that you are
For the PyTorch installation you need to check the official PyTorch website. Here are the steps:
Hi, thanks for your fast response. I ran the command and got this message: So does that mean I'm stuck with 512MB of VRAM? (My BIOS doesn't have the option to modify it.)
Option 1
Here is the official installation guide from AMD. I think it should work for everybody.
Option 2
# Install AMD Driver
TEMP_FOLDER="/tmp"
TEMP_DRIVER_HTML="amd-driver.html"
DISTRO_CODENAME=$(lsb_release --codename --short)
# Find the package URL from AMD website
AMD_DRIVER_URL="https://www.amd.com/en/support/linux-drivers"
URL_RESPONSE=$(wget -U 'Mozilla/5.0' -qO- ${AMD_DRIVER_URL})
AMD_DEB_URL=$(echo "$URL_RESPONSE" | grep -o 'https://[^ "<]*\.deb' | grep "$DISTRO_CODENAME" | head -1)
FILENAME=$(basename $AMD_DEB_URL)
# Download and install the driver package
wget -P $TEMP_FOLDER $AMD_DEB_URL
sudo dpkg -i $TEMP_FOLDER/$FILENAME
rm $TEMP_FOLDER/$FILENAME
amdgpu-install -y --usecase=rocm

After the AMD graphics driver is installed, you can run the test script again. You can also find this snippet inside the document that I wrote. By the way, I suggest you run the following command first; I would like to make sure your APU is supported:

$ lscpu | grep "Model name"
If you use the Mesa rusticl OpenCL driver, you will find that it uses GTT memory as well as VRAM.
Hi, after I run amdgpu-install -y --usecase=rocm, I get this: WARNING: amdgpu dkms failed for running kernel
I guess the reason is that the AMD graphics driver doesn't officially support your kernel, or your kernel is too old. I think you should run the following steps to install the latest default Ubuntu kernel:

sudo apt update
sudo apt install linux-generic-hwe-22.04

Then reboot your system and wait for the GRUB boot menu to come up. Once in the GRUB menu, select the
Then go back to the terminal and check which kernels are on your system (this is my output, so you know what to expect):

$ awk -F\' '/menuentry / {print $2}' /boot/grub/grub.cfg
Ubuntu
Ubuntu, with Linux 6.6.8-zabbly+
Ubuntu, with Linux 6.6.8-zabbly+ (recovery mode)
Ubuntu, with Linux 6.2.0-39-generic
Ubuntu, with Linux 6.2.0-39-generic (recovery mode)
Memory test (memtest86+x64.efi, serial console)
Windows Boot Manager (on /dev/nvme0n1p1)
UEFI Firmware Settings

The installed kernel is listed between Ubuntu and the extra line (so in my system it is
A default system for installing the AMD graphics driver should have output similar to this:

$ awk -F\' '/menuentry / {print $2}' /boot/grub/grub.cfg
Ubuntu
Ubuntu, with Linux 6.2.0-39-generic
Ubuntu, with Linux 6.2.0-39-generic (recovery mode)
Memory test (memtest86+x64.efi, serial console)
Windows Boot Manager (on /dev/nvme0n1p1)
UEFI Firmware Settings

Only one default Ubuntu kernel. After you make sure everything is fine, reinstall the AMD graphics driver:

amdgpu-uninstall
amdgpu-install -y --usecase=rocm

After the AMD graphics driver is installed, you can reinstall the kernel of your choice (or stick with the default kernel, which is fine). The AMD driver developers have always stated that their driver is tested (only) on the default system (a specific Linux version running the default kernel). Therefore installing or upgrading the AMD driver is a pain for me because I am not using the default kernel: I need to boot into the default kernel, remove the custom kernel, remove the old graphics driver, install the new graphics driver, reinstall the custom kernel, and reboot back into it. It's really painful.
I just wonder whether PyTorch uses GTT when I use the Mesa rusticl OpenCL driver. If rusticl doesn't work with PyTorch, or PyTorch doesn't use GTT, then I still have to use this BIOS modification method to make PyTorch (or Stable Diffusion in my case) work.
Doesn't work on ollama :-(
@segurac great job! I've done some tests, and your memory allocator worked quite well! To make it easier for future travellers, I've created this PyPI package with your code, so anyone can overcome the memory limitation by running this:
After that, I'm able to allocate way more than the 512MB dedicated to my Radeon 760M iGPU with something like this:
Can someone give me a command-line tutorial to use more than 2GB with Stable Diffusion? I managed to run it and it almost generated the picture, but I only got 2GB of VRAM, which isn't enough video memory to finish. I'm already thinking about installing Windows to allocate more, if possible. It stops right before I can see the finished images. Newbie on Arch here; I'm happy that I can run Ollama with APU support and would love Stable Diffusion too. My system seems to use almost 1GB of it, even after a clean restart.
@DerRehberg Which CPU are you running? Could you print the result of the following command? This is the output from my system:
@winstonma AMD Ryzen 7 5825U. Currently testing generation in CPU mode, but it's 30 times slower, at least it feels like it.
@DerRehberg Based on your output, I think you dedicated 2GB of memory in the BIOS. Could you check one more thing? I would like to see the allowed GTT memory.
I think you need to first disable the dedicated memory. Please feel free to read Unlocking GPU Memory Allocation on AMD Ryzen™ APU? - Prepare the bootable USB drive, then go to the next section and set it back to AUTO.
After setting it back to AUTO, you can use force-host-alloction-APU with PyTorch to run Stable Diffusion on your graphics card. Please leave a message if you still have questions.
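As a possibly relevant aside (an assumption on my part, not something from the guide above): the amdgpu kernel module has a `gttsize` parameter (in MiB) that caps the GTT pool, so the pool that force-host-alloction-APU draws from can in principle be enlarged from the kernel command line instead of the BIOS. A commented sketch of what that would look like:

```shell
# Sketch (untested assumption): the amdgpu module's gttsize parameter (MiB)
# caps the GTT pool. To raise it to 16 GiB, edit /etc/default/grub so that:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.gttsize=16384"
# then regenerate the GRUB config and reboot:
#   sudo update-grub && sudo reboot
# After reboot the new limit should show up in (card index assumed):
#   cat /sys/class/drm/card0/device/mem_info_gtt_total
```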
@winstonma

[ 5.278774] [drm] amdgpu kernel modesetting enabled.

I wouldn't even know how to compile
@DerRehberg I think you should have 10GB of memory. But you better

# Replace your PyTorch CPU version with the PyTorch ROCm version
$ pip uninstall torch torchvision torchaudio
$ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
# Go to your Stable Diffusion folder
# Compile the library
$ git clone https://github.com/segurac/force-host-alloction-APU.git
$ CUDA_PATH=/usr/ HIP_PLATFORM="amd" hipcc force-host-alloction-APU/forcegttalloc.c -o force-host-alloction-APU/libforcegttalloc.so -shared -fPIC
$ sudo mv force-host-alloction-APU/libforcegttalloc.so /usr/local/lib
$ sudo ldconfig   # refresh the linker cache so LD_PRELOAD can find the library by name
$ rm -rf force-host-alloction-APU
# Check your HSA_OVERRIDE_GFX_VERSION value
$ rocminfo | grep gfx
Name: gfx1030
Name: amdgcn-amd-amdhsa--gfx1030
# Run stable diffusion (Please update HSA_OVERRIDE_GFX_VERSION based on the previous output)
$ LD_PRELOAD=libforcegttalloc.so HSA_OVERRIDE_GFX_VERSION=10.3.0 python launch.py
@winstonma Pip uninstall says I should use the AUR xd

LD_PRELOAD=libforcegttalloc.so python launch.py
× This environment is externally managed
note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
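For what it's worth, the usual way around that "externally managed environment" error is to install the ROCm wheels into a virtual environment rather than the system site-packages. A sketch — the `sd-venv` name is arbitrary, and it assumes stable-diffusion-webui tolerates running from a venv (it normally does, since it creates one itself):

```shell
# Create an isolated environment so pip doesn't touch system packages
# ("sd-venv" is an arbitrary name, not something the webui requires).
python3 -m venv sd-venv
. sd-venv/bin/activate
# Inside the venv, pip installs without --break-system-packages, e.g.:
#   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
# and launch.py then runs with the venv's interpreter:
#   LD_PRELOAD=/usr/local/lib/libforcegttalloc.so python launch.py
[ -x sd-venv/bin/pip ] && echo "venv ready"
```

This avoids --break-system-packages entirely, which is what that pip note is warning about.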
@winstonma Why can't it load libforcegttalloc? EDIT: Got it working by using the full path, but it crashes my whole APU, my KDE desktop, and Stable Diffusion. Any fix?
@DerRehberg I also have this exact random "GPU hang" while testing mistral-7b (including my KDE crashing). One thing that helped a little was increasing the dedicated memory in the BIOS (I can only increase it to 2GB, and mistral-7b uses way more than that, but libforcegttalloc allows the extra allocation). I still get the crash sometimes, but it appears to be less frequent this way.
@DerRehberg @pappacena Do you think replacing the
@winstonma I think my firmware is actually up to date
The system firmware (including wifi/bluetooth/chipset) is different from the motherboard firmware; I would recommend you take a look. It fixed several problems on my laptop after I stopped using the linux-firmware that comes with the system (which is two years old on Ubuntu Jammy) and switched to the git version. I think there are advantages to using the latest linux-firmware and kernel, but it is up to you.
I use Arch @winstonma
Use ROCm 5.7 for Vega-based iGPUs. Versions from ROCm 6.0 onward will break things, as they dropped support for GCN without an explicit error message.
Well, ROCm works with any software I can imagine on my APU with version 6. But Stable Diffusion just plainly gives me an out-of-memory error @pomoke
@DerRehberg This is just a side note. I just installed Ubuntu 24.04 and found out I don't need to install the AMD ROCm driver to get everything working. I also checked your system, and it seems you don't need to install the AMD ROCm driver either. It seems you only need to install hip-runtime-amd, compile force-host-alloction-APU, and everything should work without the ROCm driver.
@winstonma Well, I got that package installed. It still crashes on me.
@DerRehberg Yeah, that was a side note. As I guessed, it wouldn't fix your problem, but it makes the installation easier since the graphics driver no longer needs to be installed. BTW, I found this article in which a user runs Stable Diffusion on a Ryzen 5600G. I think Google Translate would be your friend.
@winstonma The Google Translate page doesn't support Japanese. And honestly, I just generate with my fucking CPU because I don't care anymore. If I use forcegtt, my whole system crashes with Stable Diffusion.
@winstonma I found an English article and would love to try it, but at the script that checks whether ROCm works, I get a libtorch_cpu error. When updating torchvision-rocm I also get an error: ImportError: /usr/lib/libtorch_cpu.so: undefined symbol: cblas_gemm_f16f16f32 I don't know if anything is messed up right now.
@winstonma Found it out: I only need to downgrade hip-runtime-amd to 5.7.1, and Stable Diffusion works with libforcegtt even though the rest is at 6.
Great to hear. That matches my experience on Ubuntu 24.04 too: no need to install the ROCm package and DKMS modules. For me the new way doesn't provide any performance benefit, just easier installation. Just wondering: did it fix the problem you faced?
@winstonma Well, I tried downgrading all packages to 5.7.1 and it crashed on me. Updating everything except the runtime worked for me; I don't know why, though.
Just saw that Linux 6.10 improves AMD ROCm compute support for "small" Ryzen APUs. Although @segurac's patch works flawlessly, I still think AMD should provide native support. Hopefully 6.10 will no longer require the additional library. EDIT: Just ran Linux kernel 6.10-rc1 on my 6800U laptop. As expected, I can run Stable Diffusion without a custom mod like force-host-alloction-APU or a BIOS modification. AMD eventually fixed this problem!
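For anyone landing here later and wondering which path applies to them, a small check against `uname -r` is enough to tell whether the running kernel predates the 6.10 fix mentioned above. A sketch (the 6.10 threshold comes from the linked article; the version parsing assumes the usual `MAJOR.MINOR.PATCH-extra` format of `uname -r`):

```shell
# kernel_at_least MAJOR MINOR: succeed if the running kernel is >= MAJOR.MINOR
kernel_at_least() {
  ver=$(uname -r)            # e.g. "6.10.0-rc1" or "6.2.0-39-generic"
  major=${ver%%.*}           # text before the first dot
  rest=${ver#*.}
  minor=${rest%%.*}          # text between the first and second dots
  [ "$major" -gt "$1" ] || { [ "$major" -eq "$1" ] && [ "$minor" -ge "$2" ]; }
}

if kernel_at_least 6 10; then
  echo "kernel should expose GTT to ROCm on small APUs natively"
else
  echo "older kernel: force-host-alloction-APU (or a BIOS mod) is still needed"
fi
```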
I am running an AMD 6800U on Ubuntu 22.04 and I installed the AMD driver. I checked that by default the system allocates 512MB of RAM as VRAM to the GPU.
I followed instructions from another GitHub issue to create a rocm/pytorch docker image. It has no problem detecting my GPU, but it fails to run the sample program due to OutOfMemoryError. So my guess is that ROCm supports APUs and I just need to allocate more system memory to my GPU before going into the docker environment. Does anyone know how to modify the memory allocation of an AMD APU? Thanks in advance.