Error in build step: Inconsistency detected by ld.so #68
Dear @jpsamaroo,

I want to know what the best source is for setting up this library. I tried to follow the basic setup mentioned in the docs, which is a little confusing for me, and I don't know whether I missed something or not, but I have been trying hard for an hour already without success. 😞

Would it be possible to ask for a cleaner list of install instructions?

I would be glad if I could use it; it looks damn promising!

Comments
I'm happy to help you figure this out, but I need you to post what you tried to do, where it failed, and the errors and stacktrace you got. Otherwise I don't know what doesn't work.
I went through this documentation again. The code I tried to run:
The results:
That is why I tried:
Which I guess is somehow affected by LLVM? I tried to change the ld.lld package, but I just don't understand what is going on, so I don't know whether that was necessary. I also changed the permissions on /dev/kfd and tried to set LD_LIBRARY_PATH, without success.
Ahh yes, I recall you showing me this error. This is almost definitely not an issue with Julia (as far as I can tell), but an issue with one of your ROCm libraries. You can try adding some of the code from these lines: Lines 138 to 180 in f681252
Please retry this with AMDGPU#master; we've merged support for ROCT and ROCR artifacts, which might help alleviate this error.
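For readers following along, a minimal sketch of switching to the master branch with the standard package manager (plain Pkg usage, nothing specific to this thread):

```julia
using Pkg

# Track the development branch instead of a registered release
Pkg.add(PackageSpec(name="AMDGPU", rev="master"))
# or, from the Pkg REPL:  pkg> add AMDGPU#master
```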
Hey! I will try it tomorrow! Thank you for your efforts! :) (I check the package and follow its updates, btw; I just didn't know whether I could try again yet, so this was a great notification!) :)
I reinstalled everything by following these steps:
So it definitely improved; I can allocate arrays now! For this code:
I get the following error message:
Is this error message helpful to you?
Can you pull AMDGPU again? I don't know how you ended up in this situation (that version of AMDGPU shouldn't allow you to use GPUCompiler 0.10), but the latest master of AMDGPU has explicit support for GPUCompiler 0.10.
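A quick sketch of re-resolving and then checking what actually got installed, using only standard Pkg calls:

```julia
using Pkg

Pkg.update("AMDGPU")       # re-resolve against the latest tracked commit
Pkg.status("AMDGPU")       # confirm the tracked branch/revision
Pkg.status("GPUCompiler")  # confirm which GPUCompiler version resolved
```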
I pulled AMDGPU again, and now:
After running the command I don't get the arithmetic to perform, so:
Running:
So you should always do
I didn't select any GPU, and yeah, it hangs forever.
You don't necessarily need to set a GPU; AMDGPU selects the first available GPU for you, which you can see with:

```julia
agents = AMDGPU.agents()
AMDGPU.DEFAULT_AGENT[] = agents[2]  # Make the 2nd GPU the default
```

The reason I asked what GPU you're using (or really, what GPU AMDGPU selects for you) is that you could be using an unsupported GPU, which can possibly hang when you try to use it. I've had that happen with my Raven Ridge integrated GPU.
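As a usage note, a small sketch of inspecting what was selected, built from only the two calls shown above plus plain printing:

```julia
agents = AMDGPU.agents()
for (i, agent) in enumerate(agents)
    println(i, ": ", agent)     # print each agent so you can pick a supported one
end
AMDGPU.DEFAULT_AGENT[]          # the agent kernels launch on unless told otherwise
```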
Hey,
I selected GPU 3 to test another one, as you described, and I think it worked. Running this:
Interesting error; now I don't know what the problem could be.
Hmm, I've never personally tested any of the gfx906 chips, although they should probably work. You might consider updating your Linux kernel and making sure you're on the latest ROCm packages (currently we distribute ROCm 3.8; we should probably get those updated to 4.0). What value of N are you using?
I used N=32, but yeah, I got a strange error at bigger values. :D I can't believe different chips can make this big a difference; it would be a hell of a lot of work if there is no common interface for them. :o Do you think updating to 4.0 would solve it, then?
Yeah, that one's on me; it really should have been an error. Just remember, ROCm is basically beta software right now (even though they're on version >4.0); bugs and broken configurations are easy to stumble upon. I never asked you: what Linux kernel version are you using?
I think:
I will try to install 4.0.1, but I see this is not a one-click install for now, and I don't want to disrupt my whole workspace given a lot of other work. I will do it at the end of the week, maybe! :)
That kernel is probably recent enough for most cards, but the VII might be too new to work on that kernel. I'd consider upgrading to something newer, if you can.
Hey, I am really surprised, since I don't know how I installed 4.0; I didn't find any description for installing specific versions. But now everything seems to be working after another fresh install. Weeeellll, I guess installing the master branch and the 4.0 rocm-dkms solved it. :) Well done! ;) OK, let's check speeds!! :D
I'll just write down my ideas for improvement as I run into problems while writing some basic speed-testing code. Sorry to use this thread:
but of course this isn't as nice as it could be, and it could be a little more convenient if possible, so that it would be easier to discover for anyone. Also, this is a function that will be used a lot in the beginning, so it could improve the start for every single developer. :)
Sadly, I couldn't make `c .= a .+ b` work in parallel with the kernel function. But I made this... It works, but slowly, because I couldn't make it work in parallel.
I guess it does this somehow wrong; I didn't get how it works yet. :) But really great work all in all. I just added some notes so you can see how a beginner fails based on the documentation.
So that's my fault: I should have documented that
I don't recommend using
Fixed, thanks!
As pointed out in the docs (near the end of https://juliagpu.github.io/AMDGPU.jl/stable/quickstart/#Running-a-simple-kernel), you need to `wait` on the event returned by `@roc` before using the results.
Can you elaborate on what you mean by this? Are you thinking that we should have a function that lets you see all the useful information about a thread's location in a kernel, which can then be printed? If so, I agree, and I'd be happy to accept a PR that implements this 🙂
I appreciate the enthusiasm! I think what would be best is the ability to pass
Issue filed; feel free to help add these docs if you know how grids and groups work (they're the same as OpenCL workgroups and grids).
That's no longer the case on v0.2.3; maybe you need to update AMDGPU.jl?
Try this:

```julia
function vadd!(c, a, b)
    idx = (workgroupDim().x * (workgroupIdx().x - 1)) + workitemIdx().x
    c[idx] = a[idx] + b[idx]
    nothing
end

@time wait(@roc groupsize=min(1024, length(c_d)) gridsize=length(c_d) vadd!(c_d, a_d, b_d))
```

I get:
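For anyone reproducing the snippet above, a hedged sketch of how the device arrays `a_d`, `b_d`, and `c_d` might be set up first; the size `N` is illustrative:

```julia
using AMDGPU

N = 2^20                 # illustrative problem size
a = rand(Float32, N)
b = rand(Float32, N)
a_d = ROCArray(a)        # copy the host arrays to the GPU
b_d = ROCArray(b)
c_d = similar(a_d)       # uninitialized output buffer on the GPU
```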
Thanks for reporting all of these! If you get the chance to help fix some of these issues, I would greatly appreciate it 😄
Oh I see, so gridsize is the size of the arithmetic operation, WOW! Very cool! That sounds really effective! I realised I had used @rocprintf, so sorry for the typo. groupsize=auto is great; maybe it would be nice to consider making it the default. Also, if you have ever used @everywhere [workers list] fn(), it would be nice to be able to specify the device, maybe like "worker", but I know this is harder than I make it sound.

I am working at a company, and I think it is most beneficial if I try to build a team and adapt AMDGPU into our open packages. Also, I think we have the best machine learning library on the way, and adding AMDGPU support would be amazing for that. I know there are Flux, Zygote, and many more out there, but hey, they all have a serious hard time and limitations because of the core.

On the figuring-out-the-environment topic, what you described is really nice; I would just be satisfied with a ten-liner that shows all the information I have during a kernel run. Just a whole example that shows everything from my runtime environment, so that by running the code and reading the doc I could learn the whole of kernel programming in 15 seconds and understand the details. :) I know this is just an idea of what could be the best way to make the whole of AMDGPU simple and easy to learn.

Btw, I would be glad if you could tell me whether you think it is possible to do 10 or 100 kernel operations in a row with a syntax like this: (Sent from mobile)
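On the "show everything during a kernel run" idea, here is a hedged ten-ish-line sketch in that spirit, built only from the intrinsics already shown in this thread and assuming `@rocprintf` takes a C-style format string; the launch sizes are illustrative:

```julia
using AMDGPU

function whereami()
    gi = workgroupIdx().x       # which workgroup this item is in (1-based)
    li = workitemIdx().x        # index within the workgroup (1-based)
    gd = workgroupDim().x       # workgroup size
    idx = (gi - 1) * gd + li    # flattened global index, as in vadd! above
    @rocprintf("group %d, item %d of %d -> global index %d\n", gi, li, gd, idx)
    nothing
end

wait(@roc groupsize=4 gridsize=8 whereami())
```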
Hey, it is interesting to see that the timing of the example code you wrote is similar in my case too. If everything is measuring right, then this is a 10x speedup at the moment. I think it would be a good idea to update the basic example to the one you wrote here; it explains a lot and shows how this whole system works. Also, what do you think about the "CuArray" approach of defining the arithmetic operations between ROCArrays? Is that possible, like CUDA did it? I feel like broadcasting could allow the gridsize to work in the case of redefined arithmetic. I am just asking whether all of this is possible in the future, because that would allow ROCArray to be a one-to-one replacement for Array.
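For reference on the broadcasting question: ROCArray is built on the GPUArrays.jl broadcasting machinery, so elementwise arithmetic can be written without a hand-rolled kernel. A minimal sketch, assuming the `a_d`, `b_d`, and `c_d` arrays from earlier in the thread:

```julia
c_d .= a_d .+ b_d   # fused elementwise add, launched as a GPU kernel
c = Array(c_d)      # copy the result back to the host for inspection
```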
Closing, since this issue meandered over too many unrelated things; further discussion can continue on Discourse, or specific issues should be filed separately.