When will ROCm be available with standard linux kernels? #404
Changing the title because I understand what ROCm is a bit more now. My big problem is that the reliance on kernel 4.13 is tough for me to deal with right now. When will ROCm be a little more kernel-version agnostic and support kernel 4.16 or 4.17, or when will the rock-dkms bits be mainlined into the Linux kernel so that ROCm works with a standard Linux kernel?
The 4.18 Linux kernel is the first Linux kernel where all the AMDGPU and KFD bits are upstream to support the ROCm userland and make it kernel-agnostic, as you say. From that point forward you have a standard Linux kernel that can support Vega10, Fiji, and Polaris GPUs. One thing to remember: before the DKMS solution we are doing today, we bootlegged a Linux kernel under Ubuntu; that is how my team is internally running Ubuntu 18.04 right now with the ROCm userland. Right now I think you will find the ROCm userland compiles on OpenSuSE; you will want to use the 4.18 Linux kernel.
One thing: DKMS is really not a ROCm thing, nor is the KCL; it is how the Linux driver team targets OSes when they may need to backpatch a kernel. The KCL (Kernel Compatibility Layer) is what limits which kernels are supported. Note that AMDGPU-PRO has the same issue, since it is DKMS-based and uses the KCL. That is why I am looking forward to upstream Linux support for the base driver components needed for the ROCm userland. Upstreaming is a much longer process to get all the core foundation in place, but we are close to closing that chapter in ROCm history.
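(A rough way to check whether a given kernel already carries the upstream KFD bits; this is a sketch, not an official procedure, and it assumes a typical distro kernel that exposes its config via /proc/config.gz or /boot.)

```bash
# Sketch: check whether the running kernel was built with the amdkfd bits.
# CONFIG_HSA_AMD is the upstream Kconfig symbol for the KFD driver.
zgrep -i 'CONFIG_HSA_AMD' /proc/config.gz 2>/dev/null \
  || grep -i 'CONFIG_HSA_AMD' "/boot/config-$(uname -r)"

# If the KFD initialized against your GPU, the compute device node exists.
ls -l /dev/kfd
```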
@gstoner I have an RX 550 Polaris card. It sounds like kernel 4.17 could possibly support it. How would I determine this? Is there a specific message emitted by amdkfd to look for, or do I just need to wait for 4.18? If 4.17 will do the job for me right now, then as soon as it's available in Tumbleweed I'll compile the ROCm 1.8 userspace and get testing.
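(For what it's worth, a quick sketch of how to look for those amdkfd messages; the exact wording varies between kernel versions, so treat the patterns as assumptions.)

```bash
# amdkfd logs a message when it adds a GPU as a compute device, and a
# "skipped device" style message when it rejects one; grep for both.
dmesg | grep -iE 'kfd|amdkfd'
```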
@gstoner so no way to test that yet? I compiled https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.18 and installed all the ROCm packages, but unless I set up the DKMS, I get nothing from clinfo.
@gstoner I've tested with kernel 4.17 and installed rocm-smi, and unfortunately I see no evidence that I can use ROCm with kernel 4.17 without the rock-dkms package (which doesn't work with anything but kernel 4.13).
@markehammons that's the conclusion I've come to also. I am running 6 Vegas on amdgpu-pro 18.20-579836 without DKMS, however.
The 4.18 Linux kernel has the key features you guys want: amdgpu: … amdkfd: …
@gstoner I built the drm-next-4.19-wip kernel, and there is no clinfo output without the ROCm DKMS installed. So are those pieces not in drm-next-4.19-wip yet?
@rhlug We have to look at the Thunk; it is most likely where the mismatch is happening.
Can that make ROCm 1.9, or is it too late?
I will talk to the Linux driver team
Greg
Or if rocm-dkms in 1.9 works and doesn't break powerplay, I don't mind using it at all. But disconnecting from DKMS would be ideal.
@rhlug Yes, I am not a DKMS fan. Let me see what we can get out of the team for 1.9.
A ROCm kernel maybe? A 4.17 variant like the 4.11 one would be 💯
@shimmervoid that doesn't necessarily fix any problems with powerplay if you still have to load a DKMS module with issues against that ROCm 4.17 kernel. The DKMS module overrides the kernel's powerplay with the DKMS powerplay (i.e. amdgpu-1.8-151/amd/powerplay/hwmgr/vega10_*). So you potentially replace a good powerplay (e.g. 4.17rc2) with a broken one (e.g. rocm-dkms 1.8-151). And if you don't load a DKMS module, and you don't have an updated Thunk like Greg mentions, you have no OpenCL devices detected at all. And when I say powerplay, I actually mean just pp_table, because most of it works (sclk/mclk levels, overdrive, etc.), but the only way to undervolt is via pp_table, and that's where it currently has issues. @gstoner I'll be happy to test any beta if you need.
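(For reference, the pp_table being discussed is exposed by amdgpu through sysfs, so it can be dumped and compared between the in-tree and DKMS modules; a sketch that assumes a single GPU at card0.)

```bash
# Save the powerplay table exposed by the currently loaded amdgpu module.
CARD=/sys/class/drm/card0/device
sudo cp "$CARD/pp_table" /tmp/pp_table.in-tree

# After rebooting into the DKMS-built module, dump it again and compare,
# to see whether the override actually changes the table your board gets.
sudo cp "$CARD/pp_table" /tmp/pp_table.dkms
cmp /tmp/pp_table.in-tree /tmp/pp_table.dkms || echo "pp_table differs"
```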
When is rocm-dkms 1.9 being released?
With the mainline kernel 4.18.0-rc2, WIP Thunk branch ("fxkamd/drm-next-wip") and 1.8.x branches of ROCR-Runtime and ROCm-OpenCL-Runtime I get the following results:
Examples:
Tested on Gentoo Linux (Ryzen 7 1800X, Radeon 560).
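(For anyone trying to reproduce this, a rough sketch of the build order under the setup described above. The branch names come from this thread, but the exact 1.8.x branch name, the CMake invocations, and the install prefix are assumptions, so check each repo's README. The OpenCL runtime has its own, more involved build process documented in its repository.)

```bash
# 1. Thunk (libhsakmt) from the WIP branch that matches the upstream IOCTLs.
git clone -b fxkamd/drm-next-wip \
    https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface.git
cd ROCT-Thunk-Interface && mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/opt/rocm .. && make && sudo make install
cd ../../..

# 2. ROCR runtime from its 1.8.x branch, built against the Thunk above
#    (point the build at the installed libhsakmt; the exact variable names
#    are in the ROCR CMakeLists and may differ between branches).
git clone -b roc-1.8.x https://github.com/RadeonOpenCompute/ROCR-Runtime.git
cd ROCR-Runtime/src && mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/opt/rocm .. && make && sudo make install
```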
@justxi: Thanks for your hints. I followed your steps on Manjaro (kernel 4.18.0-rc2) as well as on Solus (kernel 4.17.2): checkout of the ROCT-Thunk-Interface branch ("fxkamd/drm-next-wip") and the 1.8.x branch of the ROCR runtime. When compiling the ROCR runtime I got an error that HSA_ENGINE_VERSION uCodeEngineVersions in hsakmttypes.h was not defined. I did some ugly merges of hsakmttypes.h and topology.c with master of ROCT-Thunk-Interface to get this typedef, and finally I got ROCR compiled. I tested the vector_copy sample with this result: Did you face similar problems? My plan is to get at least the ROCm-docker container to run. Do you know what else besides ROCT-Thunk-Interface and ROCR is needed? When I start the container and run rocminfo I get:
Is there no way to patch the ROCm 1.8 code and build it locally so we can replace it with a patched, fixed version? I don't mind if that's the case. If so, has anyone got the patched code, or how do I find it?
I put up an 18.04 Bionic image with a version of ROCm 1.8.2 that I have working with 4.16 and the KFD. I replaced the DKMS package in the ROCm installer with a dummy package, and it is working on the live CD. I also patched CodeXL's pwrdriver so it works as well.
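(For anyone who wants to try the same dummy-package trick on a Debian/Ubuntu base, equivs can generate an empty placeholder; a sketch, assuming the installer's dependency is literally named rock-dkms and that a high dummy version is acceptable.)

```bash
sudo apt install equivs

# Create a control-file template and turn it into an empty "rock-dkms" package.
equivs-control rock-dkms-dummy.ctl
sed -i 's/^Package: .*/Package: rock-dkms/' rock-dkms-dummy.ctl
echo 'Version: 99.0' >> rock-dkms-dummy.ctl
equivs-build rock-dkms-dummy.ctl

# Installing the placeholder satisfies anything that depends on rock-dkms.
sudo dpkg -i rock-dkms_99.0_all.deb
```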
+1 for support on standard kernels. Looking forward to seeing Fedora 28 and above working with ROCm for OpenCL acceleration of darktable and GIMP. Still resorting to manually extracting the OpenCL bits of amdgpu-pro to get OpenCL to work with my RX 560 (luckily, at least this method still works).
You will be looking forward to that for a very, very long time. Here is a quote of mine to the CEO of Red Hat from a while ago: https://fishbowl.pastiche.org/2003/10/27/redhat_can_bite_my_shiny_metal_ass/
So, I will happily provide a tar.gz for you to test. There will be no RPMs from me.
lol. No RPMs would be fine. Building ROCm should be OK, as long as there is no need to build a custom kernel and assuming other dependencies can be met by installing standard Fedora packages.
The 550 will work fine. In fact it can work more than fine: it can CrossFire, at the current time, with other 550s and even across to the radeon driver. I have both the amdgpu and the radeon drivers running at the same time, virtualized :)
Here is a nice shot of the BIOS editor running: https://www.techpowerup.com/download/techpowerup-radeon-bios-editor/
I'm a little lost. So it will work fine with Polaris 12 cards? In that case, what am I doing wrong that KFD is rejecting my card? Is there anything different between an RX 550 and a mobile RX 550?
@saitam757 No, I did not have such problems.
I'll check the code, but my recollection was that the RX 550 was not supported in the ROCm stack. AFAIK the bigger issue you are running into re: running upstream kernels is that the user/kernel interface for upstream is a bit different, so you need a modified Thunk as well. Felix published a tree at the same time he pushed the kernel code; I'll see if I can find it and if not check with Felix on Monday.
gfx804 for the 550
Felix's WIP Thunk stack is here: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/tree/fxkamd/drm-next-wip I confirmed that the commits to align with upstream IOCTLs are there; however, I have not tested this against upstream kernel code myself.
FWIW, there is some activity on the Fedora/Rawhide front: the rocm-runtime package was accepted for Fedora 29: https://src.fedoraproject.org/rpms/rocm-runtime I guess that implies that Fedora's kernel now has the required bits. I do notice, though, that the release of ROCm they are currently building is 1.6 rather than 1.8.
Tested some HC/C++AMP samples on a ~stock kernel (4.18.5-arch1-1-ARCH) with the fxkamd/drm-next-wip branch on an RX 480. They seem to work and pass their CPU verify step. The OpenCL runtime, on the other hand, is segfaulting somewhere between …
Hi! I am trying to use the amd-staging-drm-next branch to work with amdkfd (built into amdgpu) for the AMD Instinct MI25 device. As a first step I compiled libhsakmt 1.8.x and tried to run kfdtest, but it produces lots of failures (see below). ... Does that mean the current amdkfd from the kernel can't be used with libhsakmt 1.8.x, or am I doing something wrong? Best,
We are planning on releasing ROCm 1.9 very soon (today, I think), which will update all of this. The Thunk and amdkfd are tightly coupled, so changes in the KFD can affect how the Thunk interacts. I would not expect the Thunk for 1.8 to work with the latest KFD (roughly the one for 1.9). There is a roc-1.9.x branch in the ROCT (Thunk -- libhsakmt) GitHub repo; you can try building that, perhaps.
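(A minimal sketch of trying that branch, assuming the same CMake layout as the other Thunk branches; adjust if the branch still uses a plain Makefile.)

```bash
git clone -b roc-1.9.x \
    https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface.git
cd ROCT-Thunk-Interface && mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/opt/rocm .. && make -j"$(nproc)"
sudo make install
# kfdtest lives in the same repository and can be rebuilt against this library.
```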
ROCm 1.9.0 was just released and should allow ROCm to work with the upstream kernel. See README.md for more information.
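(For completeness, the 1.9-era install on a Debian-based distro looked roughly like this; a sketch from memory of the README of the time, so verify the repository URL and package names against the current README, and skip the DKMS piece if you plan to stay on an upstream >= 4.17 kernel.)

```bash
# Add AMD's ROCm apt repository and its signing key.
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' \
    | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update

# Full stack, including the rock-dkms driver ...
sudo apt install rocm-dkms
# ... or only the userland on an upstream kernel that already has the KFD.
sudo apt install rocm-dev rocm-opencl-dev
```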
This is great, but apparently I still can't test it because my card is not supported: Lexa cards have no KFD support yet.
Polaris 12 (Lexa) is not currently on our list of supported GPUs. We are focusing most of our official development efforts on our more powerful GPUs, such as Polaris 10, Vega 10, and upcoming chips. While I appreciate the desire to use GPUs such as Polaris 12 with the ROCm stack, our team must focus our limited resources where we think they will have the most impact. That isn't to say that we don't want to get Polaris 12 working in ROCm, only that at this time we do not offer official support for it. I can't tell you whether such support will be added in the future; your request has been heard, however. That said, I'm going to close this specific issue because I believe that, with the release of ROCm 1.9.0, this software should work with standard (upstream) Linux kernels.
Question for those using Ubuntu 18.04 LTS (stock kernel 4.15): does installing ROCm 1.9 require replacing the distro kernel with a patched one, as in prior ROCm releases?
I may be misunderstanding your question, but I believe our current release just rebuilds a custom amdgpu/amdkfd/amdkcl module using DKMS. We do not require a custom kernel anymore, like we did back in the ROCm 1.6 days. This was also true for ROCm 1.7 and 1.8, so I hope I'm understanding your question correctly. You can choose to use the ROCm user-level code with upstream kernels 4.17 and above. In this case, you don't need to install any kernel-level changes. However, if you want to install rock-dkms (our custom modules, described in the paragraph above), then you will get more features that you may find interesting. Basically, ROCK includes our most up-to-date changes, most of which we are trying to get upstreamed into the Linux kernel. Because upstreaming takes time, we release these new features into ROCK while we wait. This is especially useful for folks who don't want to run bleeding-edge kernels that are outside of their distro's stock kernel list. So yes, you should be able to run ROCm 1.9 with a stock Ubuntu kernel. For example, I'm running it on Ubuntu 18.04.1 LTS with 4.15.0-34-generic right now.
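(A quick, hedged way to tell which of the two setups a machine ended up with; module and package names are as I would expect them, so adjust if yours differ.)

```bash
# If rock-dkms is installed, the rebuilt amdgpu module is registered with DKMS.
dkms status | grep -i amdgpu

# The DKMS build usually installs under .../updates/, while the in-tree module
# lives under .../kernel/drivers/gpu/drm/amd/amdgpu/.
modinfo -F filename amdgpu
modinfo -F version amdgpu
```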
@jlgreathouse thanks for the fast response. Indeed you got the question right; I see I was wrongly assuming that the patched kernel was still needed as of 1.8. Question: I see that via DKMS you had a new amdgpu driver built. Does this change OpenGL (in addition to enabling OpenCL) in any way with respect to the distro-provided amdgpu driver (i.e. any risk of app compatibility issues)? If the answer is yes, would doing this for an OpenCL-only install:
... eliminate that risk?
Hi @arigit I can't speak much towards app compatibility issues with respect to OpenGL. I suppose I should ping @kentrussell to get the driver team's feedback on this question. However, it is the case that ROCK is essentially a replacement for the in-box amdgpu/amdkfd driver stack. Doing the commands you listed would not eliminate this risk, as …
@jlgreathouse understood 100%, thanks for the clarity. For the production environment I will wait a few more months for the next distro release and jump to ROCm at that point.
Hi, I want to install ROCm on Ubuntu 18.04 running the latest (upstream) kernel. Reading the documentation, I see that I should not install the rock-dkms package. How can I install ROCm without installing that package?
This post may help you do that.
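(In short, the approach amounts to installing only the userland packages instead of the rocm-dkms metapackage; a sketch, with package names as I remember them from the 1.9 repository, so check apt-cache search rocm for the exact set.)

```bash
# Userland only: no rock-dkms, relying on the distro's >= 4.17 kernel for KFD.
sudo apt update
sudo apt install rocm-dev rocm-utils rocm-opencl-dev rocminfo

# Make sure your user can open /dev/kfd (typically via the video group).
sudo usermod -aG video "$USER"
```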
Great, thanks!
@markehammons I've switched up the tags because, as of the 2.0 release, "Polaris 12" should be enabled in ROCm. I just sat down and tested a batch of OpenCL, HCC, and HIP applications with a Polaris 12 board, and things appear to be working as expected. Note that you will need to be on a distro that supports our rock-dkms driver to have this support, since the last bit that needed to be in place was a driver change. Support for this is also in the amd-staging-next drivers, but will not hit upstream Linux until post-4.20. OpenSUSE support is not yet official. However, you might try to keep an eye on the new Experimental ROC repo, which includes build and install scripts for a variety of distros, including ones that AMD does not officially support. I would like to get OpenSUSE working (or get the community's help to get OpenSUSE working).
@markehammons I'm eager to get ROCm to run on OpenSUSE; maybe I can give it a try. Currently I'm using the compute stack inside a Docker container, but "native" support would be nice.
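(For the container route, the usual invocation passes the KFD and DRM device nodes straight through to the container; a sketch in which the image tag is an assumption, and the host still needs a kernel with the KFD or the rock-dkms module loaded.)

```bash
sudo docker run -it --rm \
    --device=/dev/kfd \
    --device=/dev/dri \
    --group-add video \
    --security-opt seccomp=unconfined \
    rocm/rocm-terminal \
    rocminfo
```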
@jlgreathouse Sorry, I wrote my previous post to the wrong person. (-: Please see my post here:
Just a short heads-up due to the lack of a better place: Tom Stellard is currently packaging ROCm 2.0 for Fedora, but it will take some more time (packaging the whole ecosystem is quite a challenge; some hard-coded paths need to be removed, etc.).
Or at the very least, when will amdgpu-pro drivers be more widely available? I'd love to test ROCm on my RX 550 with an i7-8550U on openSUSE Tumbleweed, but it looks like my system wouldn't be compatible, and I'm lost on how to build the toolchain or whether I could even get things set up without a custom kernel.