This repository has been archived by the owner on May 3, 2024. It is now read-only.

Running caffe on AMD GPUs #2

Closed
gsedej opened this issue May 12, 2017 · 12 comments

Comments

@gsedej

gsedej commented May 12, 2017

Hello!
I am opening this issue here because I couldn't find a better place to discuss running Caffe on AMD GPUs. If there is a better place for this discussion, please say so.

So I was able to install ROCm 1.5 on my Ubuntu 16.04 running on i7 6700 and Radeon RX 480 8GB.

I managed to build hipCaffe and run the MNIST example on the AMD GPU. I also tried the same example on the CPU (using multithreaded OpenBLAS), and the results were not that impressive.

I am testing with the command time ./examples/mnist/train_lenet.sh
When using the CPU only (8 threads), the result is:

real	7m38.942s
user	22m0.692s
sys	34m6.000s

When using GPU (rx 480 + ROCm 1.5):

real	5m46.945s
user	5m27.120s
sys	0m7.256s

The GPU run is faster, and the CPU is left mostly free.
For comparison, on our "gpu grid" server with 1x NVIDIA Titan X:

real	2m0.855s
user	1m34.332s
sys	0m43.948s

I do understand that the Titan X is much faster than the RX 480 and that NVIDIA has put MUCH more resources into deep learning optimisation.
Are these results expected, or should I be getting better ones? Has anyone else tried hipCaffe on an AMD GPU?
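
To put the three "real" times above side by side, here is a minimal Python sketch that converts them to seconds and prints each run's speedup relative to the CPU run. The helper name parse_time and the dictionary keys are made up for illustration; only the timings come from this post.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as '7m38.942s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times reported above for the MNIST LeNet run.
# The keys are illustrative labels, not names used by any tool.
real_times = {
    "cpu_8_threads": "7m38.942s",
    "rx480_rocm_1.5": "5m46.945s",
    "titan_x": "2m0.855s",
}

seconds = {name: parse_time(value) for name, value in real_times.items()}
baseline = seconds["cpu_8_threads"]

for name, value in seconds.items():
    # Speedup is the CPU wall-clock time divided by this run's wall-clock time.
    print(f"{name}: {value:.1f} s, {baseline / value:.2f}x vs CPU")
```

This only compares wall-clock time for a single run of each configuration, so it says nothing about run-to-run variance.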

Your system configuration

Operating system: Ubuntu 16.04 + ROCm 1.5 (kernel 4.9-kfd) + RX 480 8GB
BLAS: OpenBLAS (and amd hip BLAS variant)
Python or MATLAB version (for pycaffe and matcaffe respectively): no python/matlab

@gsedej
Author

gsedej commented May 12, 2017

I also replied to an issue in "ROCm" github tracker: ROCm/ROCm#86

@gstoner
Contributor

gstoner commented May 12, 2017

For the Titan X you are comparing against a foundation that leverages cuDNN, which is an optimized set of solvers for running deep learning. Currently our Caffe framework is just using IM2COL and GEMM. So it's not an apples-to-apples test.
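
For readers unfamiliar with the IM2COL + GEMM approach mentioned here, the following is a small pure-Python sketch of the general idea: unroll every image patch into a column, then compute the convolution as one matrix multiply. It is an illustration under simplifying assumptions (single channel, stride 1, no padding, no kernel flip) and is not the hipCaffe implementation; all function names are made up for this example.

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2-D image into one column.

    Returns (matrix, out_h, out_w), where the matrix has k*k rows and one
    column per output position (stride 1, no padding).
    """
    height, width = len(image), len(image[0])
    out_h, out_w = height - k + 1, width - k + 1
    patches = []
    for i in range(out_h):
        for j in range(out_w):
            patches.append(
                [image[i + di][j + dj] for di in range(k) for dj in range(k)]
            )
    # Transpose so that each patch becomes a column.
    return [list(row) for row in zip(*patches)], out_h, out_w


def gemm(a, b):
    """Plain matrix multiply: (m x n) times (n x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def conv2d_im2col(image, kernels):
    """Apply several k x k kernels to one image via im2col followed by GEMM."""
    k = len(kernels[0])
    cols, out_h, out_w = im2col(image, k)
    # One flattened kernel per row of the weight matrix.
    weights = [[v for row in kernel for v in row] for kernel in kernels]
    flat = gemm(weights, cols)
    # Reshape each output row back into an out_h x out_w feature map.
    return [
        [row[r * out_w:(r + 1) * out_w] for r in range(out_h)] for row in flat
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [[[1, 0], [0, 1]]]  # a single 2 x 2 kernel
result = conv2d_im2col(image, kernels)
print(result)  # [[[6, 8], [12, 14]]]
```

The appeal of this scheme is that all the arithmetic lands in one GEMM call, so it is only as fast as the BLAS library behind it; the cost is the extra memory for the unrolled patch matrix.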

We have a piece of the puzzle that attacks the critical performance bottleneck in deep learning: optimized solvers. This will remove the performance gap. Our solution, MIOpen, is how we are approaching deep learning solvers; it will have all of the core functionality at launch, and we will be building from there. We just have not released it into the market yet. We are not far from this happening.

The Caffe library was released not to show how fast it is, but so that we can start the conversation about upstreaming the libraries. But I am glad you could build it and run it; this is the kind of feedback we are looking for from developers.

@vicproon

@gstoner looking forward to the release of MIOpen!

@gsedej
Author

gsedej commented May 15, 2017

@gstoner thanks for the reply.
I do understand the difference between the RX 480 and the Titan X, both in performance and in drivers.

What I am interested in is whether anyone gets better results on AMD GPUs than with CPU-only training. If I set my i7 6700 (8 threads) to use only 4 threads (OPENBLAS_NUM_THREADS=4), I get better results than with HIP + RX 480:

export OPENBLAS_NUM_THREADS=4
time ./examples/mnist/train_lenet.sh
real	4m54.193s
user	9m47.540s
sys	4m49.708s
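
A thread-count comparison like this one can be scripted. The sketch below, in Python, runs a command with OPENBLAS_NUM_THREADS set and measures wall-clock time. The helper timed_run is hypothetical, and the command used here is a harmless stand-in; to reproduce the measurement above you would pass ["./examples/mnist/train_lenet.sh"] instead.

```python
import os
import subprocess
import sys
import time


def timed_run(command, num_threads):
    """Run `command` with OPENBLAS_NUM_THREADS set; return (seconds, returncode)."""
    # Copy the current environment and override only the thread count.
    env = dict(os.environ, OPENBLAS_NUM_THREADS=str(num_threads))
    start = time.perf_counter()
    completed = subprocess.run(command, env=env)
    return time.perf_counter() - start, completed.returncode


# Stand-in command that just echoes the variable it received.
stand_in = [
    sys.executable,
    "-c",
    "import os; print('threads =', os.environ['OPENBLAS_NUM_THREADS'])",
]

for threads in (4, 8):
    elapsed, code = timed_run(stand_in, threads)
    print(f"{threads} threads: {elapsed:.3f} s (exit code {code})")
```

Unlike the shell's time, this reports only wall-clock time, not the user/sys split.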

Does the release of the OpenCL source code by AMD have any influence on hipCaffe?

If this "issue" is not meant to be open, please close it

@gsedej
Author

gsedej commented May 18, 2017

I just wanted to report again that when training my own network, I noticed a MUCH better speedup on the Radeon vs. the CPU (I just updated to ROCm 1.5.80).
This is with the CPU (i7 6700, all 8 threads):

real	9m11.601s
user	23m6.120s
sys	47m11.336s

With the RX 480:

real	3m37.433s
user	3m5.044s
sys	0m6.564s

Notice that CPU usage (sys) was small.
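
One way to quantify the CPU-usage observation is to divide total CPU time (user + sys) by wall-clock time (real), which gives the average number of cores kept busy. The Python sketch below does this for the two runs in this comment; the labels and helper names are illustrative, and only the timings come from the post.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair from `time` output to seconds."""
    return minutes * 60 + seconds


# Timings copied from the two runs reported in this comment.
runs = {
    "cpu_i7_6700": {
        "real": to_seconds(9, 11.601),
        "user": to_seconds(23, 6.120),
        "sys": to_seconds(47, 11.336),
    },
    "rx480": {
        "real": to_seconds(3, 37.433),
        "user": to_seconds(3, 5.044),
        "sys": to_seconds(0, 6.564),
    },
}


def busy_cores(run):
    """Average number of CPU cores kept busy: (user + sys) / real."""
    return (run["user"] + run["sys"]) / run["real"]


for name, run in runs.items():
    print(f"{name}: {busy_cores(run):.2f} cores busy on average")

speedup = runs["cpu_i7_6700"]["real"] / runs["rx480"]["real"]
print(f"wall-clock speedup of the RX 480 run: {speedup:.2f}x")
```

By this measure the CPU-only run keeps most of the 8 hardware threads occupied, while the GPU run uses less than one core on average.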

@bensander
Contributor

bensander commented May 18, 2017 via email

@daveselinger

@gsedej Do you mind sharing how you got ROCm up and running? I've been trying for about a week and I'm having just about zero luck. I've put some gists up to document what I'm doing, and I'm not quite sure what is going wrong. It's probably something pretty stupid, but I'm new to the AMD side.

  • Do I use the amdgpu pro driver?
  • Do I need to install the APP-SDK as well?
  • I am installing the ROCM kernel using apt and the instructions on ROCM website...

https://gist.github.com/daveselinger/9504a8496bef102a5b60613106255621
https://gist.github.com/daveselinger/8cba6d41eaa70b220725091390ff52c1

@gsedej
Author

gsedej commented Aug 30, 2017

Hi @daveselinger! I haven't used hip-caffe for some time, since I am using SegNet (Caffe-based, for segmentation), which has its own layers that (probably) don't work with hip-caffe. (Somebody would need to rewrite the added layers to .hpp.)

Are you familiar with ordinary Caffe? I was working with ordinary Caffe and just copied the data and prototxt files over to hip-caffe, and it worked.

I only have Ubuntu 16.04 and ROCm, which I installed following the instructions. hipCaffe needs to be compiled.

No fglrx or amdgpu-pro.

I will check whether it's still running and report back.

@gsedej
Author

gsedej commented Aug 30, 2017

So the old compiled hipCaffe does not work. I tried compiling from source, but the build breaks because I have updated the Mesa 3D OpenGL drivers, which rely on LLVM 5.0 (clang), and ROCm/HIP probably does not support LLVM 5.0 yet.

Anyway, you need to have these libraries installed: hip_base hip_hcc miopen-hip hipblas
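
A quick way to check whether those packages are present on a Debian/Ubuntu system is to ask dpkg-query for each one. The Python sketch below assumes the package names are exactly as written in this comment, which may differ between ROCm releases; the helper is_installed is made up for this example and returns None when dpkg-query is not available.

```python
import shutil
import subprocess

# Package names as listed in the comment above (assumed, not verified
# against any particular ROCm release).
REQUIRED = ["hip_base", "hip_hcc", "miopen-hip", "hipblas"]


def is_installed(package):
    """Return True/False from dpkg-query, or None if dpkg-query is missing."""
    if shutil.which("dpkg-query") is None:
        return None
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Status}", package],
        capture_output=True,
        text=True,
    )
    # dpkg-query exits non-zero for unknown packages.
    return result.returncode == 0 and "install ok installed" in result.stdout


status = {package: is_installed(package) for package in REQUIRED}
for package, state in status.items():
    label = {True: "installed", False: "missing", None: "unknown (no dpkg-query)"}[state]
    print(f"{package}: {label}")
```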

What kind of error do you get?

@gsedej
Author

gsedej commented Aug 30, 2017

So I did manage to compile and run the MNIST example. But I had to disable OpenCV in Makefile.config (uncomment USE_OPENCV := 0), probably due to the CLANG/LLVM and OpenCV versions.
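
For anyone scripting this step, the edit amounts to removing the leading "#" from the USE_OPENCV line. Here is a minimal Python sketch of that text transformation; the helper uncomment_setting is hypothetical, and the sample text is a stand-in, not the real contents of the build configuration file.

```python
import re


def uncomment_setting(config_text, name):
    """Remove the leading '#' from a commented-out `name := value` line."""
    pattern = re.compile(
        rf"^#[ \t]*({re.escape(name)}[ \t]*:=.*)$", re.MULTILINE
    )
    return pattern.sub(r"\1", config_text)


# Stand-in excerpt for illustration only.
sample = "# CPU_ONLY := 1\n# USE_OPENCV := 0\n# USE_LEVELDB := 0\n"
print(uncomment_setting(sample, "USE_OPENCV"))
```

Only the named setting is touched; other commented-out lines are left as they are.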

@daveselinger

@gsedej THANKS! So I'm obviously over-thinking the software part. I'm now going through the process of testing other MBs and other GPUs. I'll keep you posted. THANK YOU SO MUCH for the quick response!

@daveselinger

@gsedej OK, so switching cards apparently makes it work. For whatever reason, the RX550 does not work, but the RX580 works fine. I was pretty surprised by this, but I'm pretty amazed at how easy it is once you make that switch! :) I've also tried the OpenCL driver on the 550 with Theano and that works about as well as my concrete shoes are good at swimming...
