GPU Support Needs Clarifying #123

Closed
ghost opened this issue May 17, 2017 · 15 comments

Comments

ghost commented May 17, 2017

It's not clear from the docs whether some recent but not cutting-edge cards, such as my R9 390, can use ROCm. The docs leave it unclear whether these cards simply don't work, or whether they just don't play well with the CPU or with one another.

I'm currently trying to use ROCm and hipCaffe, and anything CL-related, even clinfo, seems to freeze early and occupy a full CPU core but negligible RAM until killed. I don't know if this is because my hardware isn't supported, or if I'm missing some key component (e.g., I uninstalled AMDGPU-pro, did I need to reinstall another OpenCL driver, or did that come with the ROCm package?).

Could do with some clarification, and I'd appreciate some quick answers also.

gstoner commented May 17, 2017 via email

Author

ghost commented May 17, 2017

Sorry, that wasn't clear; I uninstalled AMDGPU-pro when installing ROCm. Should I take your message to mean that ROCm does include an OpenCL-compatible driver? So, that's not my problem.

Should ROCm work with an R9 390? I bought the card to do deep learning on, based on early support in Torch7 and Caffe. I'd love to use the hipCaffe stack, because compared to Tensorflow it uses less proprietary garbage, and compared to Torch7 it's not Lua. :)

gstoner commented May 17, 2017 via email

Author

ghost commented May 17, 2017

> Let me check the status on R390 and ROCm 1.5

Thanks a million @gstoner!

> Post ROCm install, to get OpenCL on the system you need to use this command:
> sudo apt-get install rocm-opencl-dev

Have done, but with no success.

I may try booting from a persistent live USB or something, and will attempt a clean install on an unaltered Ubuntu 16.04. That would rule out difficulties with a pre-existing CL platform.

I did just discover that I was using the clinfo binary previously installed by whatever OpenCL procedure I followed with AMDGPU-pro. I ran the clinfo that came with ROCm, and got this result:

cathal@thinkum:~/Projects/scrapers/altreich/misesorg$ /opt/rocm/opencl/bin/x86_64/clinfo -v
terminate called after throwing an instance of 'cl::Error'
  what():  clGetPlatformIDs
Aborted (core dumped)
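
The mix-up described above, where an older clinfo left on the PATH was being run instead of the one shipped with ROCm, can be checked with a short script. This is a minimal sketch: the ROCm path is the one quoted in this comment, `classify_clinfo` is a hypothetical helper introduced only for illustration, and symlinks are not resolved.

```shell
# Path of the clinfo binary shipped with ROCm (as quoted in the comment above).
ROCM_CLINFO=/opt/rocm/opencl/bin/x86_64/clinfo

# Hypothetical helper: label a clinfo path as "rocm", "other", or "none".
# Note: it only looks at the path string and does not resolve symlinks.
classify_clinfo() {
  case "$1" in
    "") echo "none" ;;
    /opt/rocm/*) echo "rocm" ;;
    *) echo "other" ;;
  esac
}

# Whatever clinfo the shell would run by default (empty if none is on PATH).
path_clinfo=$(command -v clinfo || true)
echo "clinfo on PATH: ${path_clinfo:-<none>} ($(classify_clinfo "$path_clinfo"))"

# Report whether the ROCm copy exists at the expected location.
if [ -x "$ROCM_CLINFO" ]; then
  echo "ROCm clinfo found at $ROCM_CLINFO"
else
  echo "ROCm clinfo not found at $ROCM_CLINFO"
fi
```

If the first line reports "other", the default clinfo belongs to a different OpenCL installation, and the ROCm one has to be invoked by its full path.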

gstoner commented May 17, 2017 via email

Author

ghost commented May 17, 2017 via email

Collaborator

scchan commented May 17, 2017

Could you run uname -a to make sure the correct kernel is loaded? You should see compute-rocm-rel-1.5 in the name. I'd also suggest running /opt/rocm/bin/rocm-smi -a to see whether the tool sees the GPU.
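
The two checks suggested here could be scripted roughly as follows. This is only a sketch: `is_rocm_kernel` is a hypothetical helper, the marker string is the one quoted above, and the rocm-smi call is skipped when the tool is not installed.

```shell
# Hypothetical helper: succeeds if the given kernel string carries the
# ROCm 1.5 marker mentioned above.
is_rocm_kernel() {
  case "$1" in
    *compute-rocm-rel-1.5*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the kernel that is actually running.
if is_rocm_kernel "$(uname -a)"; then
  echo "ROCm kernel is running"
else
  echo "ROCm kernel is NOT running: $(uname -r)"
fi

# Ask rocm-smi whether it sees the GPU, but only if the tool is present.
if [ -x /opt/rocm/bin/rocm-smi ]; then
  /opt/rocm/bin/rocm-smi -a
else
  echo "rocm-smi not found at /opt/rocm/bin/rocm-smi"
fi
```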

Author

ghost commented May 17, 2017 via email

Collaborator

scchan commented May 17, 2017

It looks like you didn't pass the -a switch to rocm-smi :)
I'd suggest that you start with /opt/rocm/hsa/sample/vector_copy first to make sure it works reliably.

For HIP/HCC, the compiler generates ISA code for Fiji by default, so you'll have to recompile your programs and specify the architecture for Hawaii (i.e. the R9 390).

You could override the default architecture for your card by doing

export HCC_AMDGPU_TARGET=gfx701

more details: https://github.com/RadeonOpenCompute/hcc/wiki#compiling-for-different-gpu-architectures
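
As a rough illustration of the override, the snippet below picks the target string from a card family name and exports it before any rebuild. `target_for_family` is a hypothetical helper, not part of HCC; the Hawaii value is the one given above, and the gfx803 entry for Fiji and Polaris is my assumption based on the HCC wiki page linked above.

```shell
# Hypothetical helper: map a GPU family name to an HCC target string.
# gfx701 for Hawaii comes from the comment above; gfx803 for Fiji/Polaris
# is an assumption based on the linked HCC wiki.
target_for_family() {
  case "$1" in
    hawaii) echo "gfx701" ;;
    fiji|polaris) echo "gfx803" ;;
    *) echo "unknown family: $1" >&2; return 1 ;;
  esac
}

# Override the default (Fiji) target for an R9 390 in the current shell.
export HCC_AMDGPU_TARGET="$(target_for_family hawaii)"
echo "HCC_AMDGPU_TARGET=$HCC_AMDGPU_TARGET"
```

The variable only affects builds started from the same shell session, so programs need to be recompiled after it is set.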

Author

ghost commented May 17, 2017 via email

Author

ghost commented May 17, 2017 via email

Author

ghost commented May 18, 2017

OK, breakthrough, kinda:

  1. Set up new hard drive with fresh Ubuntu 16.04 LTS install.
  2. Follow installation instructions here as far as checking vector_copy.

With the envvar you suggested, this now builds and runs. And the ROCm clinfo program runs. So that's great!

Problem: This is all done in TTY1, because after installing rocm and restarting, the login screen won't display. I get a white screen with pretty pastel specks all over it. TTYs will load, but if I try to go to the login screen, I just get an unchanging white/cream field with pastel speckles. :/

I'll build Caffe later and see how that goes; meanwhile, I need to boot back into my actual hard drive. The AMDGPU-pro -> ROCm transition seems to have destabilised that install, though, so I think I'll be reinstalling soon!

Author

ghost commented May 21, 2017

Giving up on this, at least for now. It looks like ROCm is being developed with very limited card support right now, and I need a stable system more than I need Tensorflow.

Thanks anyway..

@ghost ghost closed this as completed May 21, 2017

gstoner commented May 22, 2017

@cathalgarvey

Sorry for the trouble with the R9 390. ROCm is more optimized for GFX8 and newer GPUs, which support atomics. GFX7 had a number of limitations relative to newer hardware. For your purpose of running Tensorflow, an RX 480 would be a better GPU. If you let me know your contact info privately, we can get one out to you for testing.

Author

ghost commented May 22, 2017

Wow, that's really generous @gstoner - I couldn't turn down an offer like that. :)

I'll email your HSA email, very happy to be a guinea pig for Open Source GPU deep learning!

This issue was closed.