This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Support for other Device Types, OpenCL AMD GPU #621

Open
philtomson opened this issue Nov 18, 2015 · 43 comments

Comments

@philtomson

It would be nice to eventually have OpenCL support for those of us with GPUs that don't do CUDA.

@jermainewang
Contributor

Hi,

We are considering this as well! The thing is that companies like AMD are actually developing environments that will be compatible with CUDA in the future: http://www.anandtech.com/show/9792/amd-sc15-boltzmann-initiative-announced-c-and-cuda-compilers-for-amd-gpus. So we are still deciding whether to put our limited human resources into supporting OpenCL. If you are interested, you are more than welcome to help us enhance the project in this direction.

Thank you,
Minjie

@gujunli

gujunli commented Nov 18, 2015

Hi Minjie,

Maybe we can collaborate on extending MXNet with OpenCL support. We have
OpenCL Caffe open sourced; I guess we can reuse the core kernels?

thanks!
Junli

@jermainewang
Contributor

Ha, it's great to hear voices from AMD people here :p. I heard that the main problem with integrating OpenCL is its lack of support for templates, which are widely used in mshadow. @tqchen may know more details about this.

Minjie

@mli
Member

mli commented Nov 18, 2015

@gujunli it's very nice to see you here (we met at ICML Beijing last year). We are definitely interested in OpenCL and hope to support it ASAP, but the current issue is that we use C++ templates while OpenCL doesn't support them; see dmlc/mshadow#71

@gujunli

gujunli commented Nov 18, 2015

Templates are an issue. On AMD devices we have a special keyword to support
them, no problem. The problem is that the same keyword does not work on NV
GPUs. We are also figuring out a general solution; I would like to hear your
thoughts on this. @limu @minjie

junli


@mli
Member

mli commented Nov 18, 2015

NVIDIA GPUs should be fine with CUDA. Our main motivation for supporting OpenCL is AMD GPUs and other devices, such as FPGAs; for example, Altera also contacted us about making MXNet run on their devices.

@tqchen
Member

tqchen commented Nov 18, 2015

This won't pose a problem as long as AMD's compiler supports something similar to what nvcc does, i.e. template programming and integration of host and device code.

What can be done is to have something like tensor_gpu-inl.amd.cc to specialize for AMD's version of the keyword. As long as the extra keywords are minimal and the compiler can be detected by macro, it should be fine.
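The macro-based specialization described above happens in the C preprocessor, but the dispatch idea can be sketched as a table keyed by backend. This is a Python analogy only, not mshadow's actual code; all names and kernel strings are illustrative:

```python
# Sketch of per-backend kernel specialization, analogous to selecting
# tensor_gpu-inl.cu vs. a hypothetical tensor_gpu-inl.amd.cc at build
# time via compiler-detection macros. All names are illustrative.

KERNEL_SOURCES = {
    # Each backend supplies its own flavor of the same kernel.
    "cuda":   "__global__ void axpy(float a, float *x, float *y) { /* ... */ }",
    "hip":    "__global__ void axpy(float a, float *x, float *y) { /* ... */ }",
    "opencl": "__kernel void axpy(float a, __global float *x, __global float *y) { /* ... */ }",
}

def select_kernel(backend: str) -> str:
    """Pick the backend-specific kernel source, mirroring what an
    #ifdef __CUDACC__ / #ifdef __HIP__ cascade does in C++."""
    try:
        return KERNEL_SOURCES[backend]
    except KeyError:
        raise ValueError(f"no kernel specialization for backend {backend!r}")

print(select_kernel("opencl").split()[0])  # -> __kernel
```

The point of keeping the differences confined to one lookup (or, in C++, one specialized source file) is that the rest of the framework never needs to know which backend it is running on.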

@tqchen tqchen changed the title OpenCL support? Support for other Device Types Nov 18, 2015
@philtomson
Author

It would be nice to be able to target FPGAs and OpenCL would allow that to be done much more easily through the Altera tool chain.

Also: I'm not sure I understand the templates issue; isn't there a C API that could be used to get around it?

@philtomson
Author

I'll also add that OpenCL would allow targeting Intel integrated graphics, which is pretty common on a lot of laptops as well as desktops these days.

@vchuravy
Contributor

@philtomson The problem is more with the kernel code. OpenCL uses C as the language for its kernels, while MXNet uses C++ for both its CUDA and CPU kernels and can generate both from the same template, which is nice because you don't need to maintain two or three different versions of everything.
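Because OpenCL 1.2 kernels are plain C, the usual workaround is to monomorphize a templated kernel into one C kernel per element type by text substitution. A minimal sketch of that workaround (the kernel is a hypothetical example, not MXNet code):

```python
from string import Template

# A kernel written once and instantiated per dtype by textual
# substitution: the standard workaround for OpenCL C's lack of
# templates. The kernel itself is a hypothetical example.
KERNEL_TEMPLATE = Template("""
__kernel void add_$dtype(__global const $dtype *a,
                         __global const $dtype *b,
                         __global $dtype *out) {
    size_t i = get_global_id(0);
    out[i] = a[i] + b[i];
}
""")

def instantiate(dtype: str) -> str:
    """Generate the OpenCL C source for one element type."""
    return KERNEL_TEMPLATE.substitute(dtype=dtype)

# One copy per type, where CUDA C++ would need only a single template:
sources = [instantiate(t) for t in ("float", "double", "int")]
```

This is exactly the maintenance burden the comment above describes: every operator exists once as a C++ template for CUDA/CPU, but would need a generated (or hand-written) C variant per type for OpenCL.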

@tqchen tqchen changed the title Support for other Device Types Support for other Device Types, OpenCL AMD GPU Nov 21, 2015
@ieee8023

+1 I want to experiment on my laptop which does not have cuda support!

@liangfu
Member

liangfu commented Nov 10, 2016

@vchuravy Speaking of portability, maybe the problem is the use of templates in mxnet itself, because a neural network implementation doesn't really need templates for different data types. In a typical neural network implementation, single-precision floating point is most commonly used: double precision is unnecessary and much more computationally expensive, and half-precision computation is not natively supported on many devices. Using fixed-point data types is another matter entirely, one of performance optimization. What people really want is a single efficient, flexible, minimal and yet portable neural network implementation that can be ported to multiple CPUs, GPUs and FPGAs. The design of mxnet meets almost all of these goals except the last one.

@mz24cn

mz24cn commented Dec 16, 2016

Is there anyone who tried AMD HIP tools on MXNet?

@kernel8liang
Contributor

+1

@skirdey

skirdey commented Jan 17, 2017

+1

@ghost

ghost commented Jan 18, 2017

Really want to see this happen someday for a major Python framework besides Tensorflow (and without using a limited, experimental, proprietary compiler framework). Competition!

@mz24cn

mz24cn commented Jan 19, 2017

https://www.khronos.org/registry/OpenCL/ OpenCL 2.2, whose C++ kernel language includes template support, is now in provisional status. Of course, so far no manufacturer has released 2.2 drivers.

@delijati

This could help convert the CUDA kernels to OpenCL: https://github.com/hughperkins/cuda-on-cl

@windywinter
Contributor

Hi all,

I've been trying to tackle this problem for some time. From my investigation, cocl does not work very well because mshadow is built on Thrust, which uses a lot of CUDA host-side APIs that cocl does not support. @delijati
Therefore, what we found promising is to use VexCL as the vector expression library (instead of mshadow) for the GPU device. Currently I have most arithmetic operators on NDArray working, but I still need to fill in a huge number of symbolic operators before the whole framework works. Proof-of-concept code is here: https://github.com/windywinter/mxnet

@viper7882

Hi all,

I'm looking at PyOpenCL, and it could be a solution for MXNet. The challenge I've observed so far is that PyOpenCL requires installing the Intel OpenCL SDK on the user's machine (if they are running Intel graphics).

In an example shared in Easy OpenCL with Python, Gaston Hillar demonstrates building and deploying a kernel with PyOpenCL in only 12 steps. I've tested his code and it works for me.

I wonder if MXNet would consider supporting PyOpenCL?
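For reference, a minimal PyOpenCL vector-add sketch along the lines of the 12-step example mentioned above. This assumes PyOpenCL and a working OpenCL driver are installed; the kernel and all names here are illustrative, not MXNet code:

```python
# Minimal PyOpenCL sketch: build and run an elementwise-add kernel.
# Assumes the pyopencl package and an OpenCL runtime are available.

def vector_add_source() -> str:
    """OpenCL C source for an elementwise float add."""
    return """
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *out) {
        int i = get_global_id(0);
        out[i] = a[i] + b[i];
    }
    """

def run_vector_add(a, b):
    """Run vadd on float32 numpy arrays a and b; requires an OpenCL device."""
    import numpy as np
    import pyopencl as cl  # deferred so vector_add_source() works without a driver

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
    prog = cl.Program(ctx, vector_add_source()).build()
    prog.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)
    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)
    return out

# Usage (requires an OpenCL device):
#   import numpy as np
#   a = np.random.rand(1024).astype(np.float32)
#   b = np.random.rand(1024).astype(np.float32)
#   out = run_vector_add(a, b)  # elementwise a + b
```

Note the contrast with the template discussion earlier in the thread: in PyOpenCL the kernel is a C source string compiled at runtime, so there is no C++ template machinery involved at all.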

@viper7882

Update: I've tested Hugh Perkins's DeepCL on an Intel graphics card running Q-learning, and it runs perfectly in Python 2.7: https://github.com/viper7882/DeepCL.

Hugh Perkins has also created EasyCL to access OpenCL-based GPUs: https://github.com/hughperkins/EasyCL. I'm evaluating whether it is possible to merge DeepCL with MXNet. Merging the two looks challenging to me because of the differences in underlying structure. Any help is appreciated.

@viper7882

Hi @jermainewang ,

Hugh Perkins has provided an NVIDIA® CUDA™ cuDNN API for Coriander (OpenCL 1.2), which ideally should be able to interface with MXNet's existing NVIDIA® CUDA™ cuDNN API.

Could you take a look at whether it makes sense to connect MXNet with OpenCL through this interface?

@ghost

ghost commented Jun 20, 2017 via email

@tqchen
Member

tqchen commented Aug 17, 2017

I'd like to update on this: it can now be done via https://github.com/dmlc/tvm

@springishere

@tqchen do you mean that TVM supports OpenCL? I would like to use MXNet with OpenCL on an ARM GPU (Mali).

@tqchen
Member

tqchen commented Oct 5, 2017

Yes, TVM supports OpenCL, Metal, CUDA, ARM, x86, and JavaScript.

@kpot

kpot commented Oct 18, 2017

Hi all,

Guys, can anyone explain why mxnet still doesn't support OpenCL out of the box, even though it is now based on nnvm (and through it on tvm) and should be able to perform all the necessary computations on OpenCL devices? I checked nnvm recently and it looks fully up to the task.
But even in the upcoming mxnet 0.12, the context mxnet.gpu() still means only CUDA and has no association with tvm.gpu() or tvm.cl(). Why?

Perhaps more than 30% of consumer GPUs out there are AMD/Intel devices supporting OpenCL >= 1.2. This is often great, inexpensive, and less restricted hardware, and making it available for training would greatly benefit the ML community.

@welnaseth

Any updates on this? @kpot makes a good point above: tvm (and nnvm, which is built on top of it) supports OpenCL, so it seems like it shouldn't be too hard to offer OpenCL as an option. It would be nice to have a timeline for when this could be implemented, and if there is none, to know what is blocking it.

@conradwt

conradwt commented Mar 24, 2018

Hi all, are there any updates on this topic? I would like to see OpenCL become the default for MXNet, as well as for other ML libraries and frameworks, instead of GPU compute being restricted to Nvidia hardware and CUDA.

@itsergiu

itsergiu commented Jun 19, 2018

Do you already provide an installation kit for the AMD RX 550 GPU?
Does it work with Windows 10?
Does it work with Jupyter, Anaconda, and Keras on top of TensorFlow?

@edmondja

edmondja commented Jul 3, 2018

+1 waiting for it

@dmidge8

dmidge8 commented Jul 16, 2018

Also hoping to have it!

@imkow

imkow commented Jul 21, 2018

waiting for this...

@aenikata

At the moment there are cloud providers like gpueater pushing the AMD option, which naturally leads towards Keras+PlaidML rather than MXNet. My ideal would be to take one of the (almost universally AMD-based) cryptocurrency rigs you can pick up for a reasonable price and see what deep learning you can do with it.

@ammarRajabA

Can anybody update us about this?

@mz24cn

mz24cn commented May 7, 2019

@metal3d

metal3d commented Feb 3, 2020

TensorFlow, PyTorch, MXNet... none of them listen to the users about this need.
I've got an Intel card on 3 laptops; using the NEO OpenCL driver with LuxRender, for example, computation is 7x to 20x faster. But for ML, I can't use it.

OpenCL is not restrictive, is open, and works on a large variety of cards; even a Raspberry Pi can use OpenCL, cf. Pi OpenCL.

Please consider SYCL, for example. We are not all able to pay for Thunderbolt hardware...

@leezu
Contributor

leezu commented Feb 3, 2020

@metal3d contributions welcome. Also see TVM

@metal3d

metal3d commented Feb 4, 2020

@leezu excuse me, but your remark does not seem serious. "Contributions welcome" is like saying "do it yourself if you're so strong".

The core of kernel compilation for machine learning in this kind of framework is central: it is chosen at the beginning and shaped throughout the development process.

A contribution by "one guy external to the project" is not possible for that. If I wanted to work on it:

  • I would need to learn OpenCL in detail
  • I would need to onboard onto MXNet framework development
  • I would need a team
  • and a lot of time

The problem is that we have been asking for OpenCL in a lot of frameworks for months, even years, and there are rarely any answers about:

  • Why choose CUDA, which is closed, proprietary, carries a hardware cost, runs on a closed driver, with libraries you can only download after registering on the NVidia website
  • Why not give any news on OpenCL integration possibilities, beyond occasional "news"

We don't force the authors to use OpenCL; we only wonder why nothing is done in that direction.
Look at this issue: it has been open for 4 years.

4 years!

Look at the question on TF: tensorflow/tensorflow#22

4 years too.

As for MXNet, we never had a "clear" answer. No trace of anything that explains why, and/or how to address this need, or whether someone is working on it...

Worse: TF has only one active project to compile it with SYCL. You need to register an account and attempt a long compilation that fails 90% of the time.

So, sorry if my comment, questions, and answers seem "aggressive", but 4 years is a bit long without any clear answer like "we won't do that", "we cannot", or "we will try", and/or why it is not on the way.

So, "contributions welcome"... please... it's as if you asked someone in the street "sorry, what time is it?" for 5 hours... and after all that the man answers "go buy a watch".

@leezu
Contributor

leezu commented Feb 4, 2020

I don't see any blocker to adding the feature you're requesting; there is just no one willing to work on it. You pointed out the constraints correctly: a lot of resources are required. Thus my comment is serious.
TVM will solve the problem in the not-too-distant future, so there is no strong incentive to invest resources now in manually writing code targeting OpenCL. Did you take a look at https://docs.tvm.ai/tutorials/get_started.html#generate-opencl-code ?

@metal3d

metal3d commented Feb 4, 2020

At first, thanks for your answer.

I don't see any blocker to add the feature you're requesting, just there's noone willing to work on it.

That's the problem we are pointing out.
The problem that I (and others) see is that the major frameworks seem to be trying to make things "faster and easier" before making the framework more widely usable. That's all we're saying... It's cool that CUDA is supported and that AWS or Google offer GPUs on demand. But in reality, OpenCL could help make ML more accessible to owners of modest hardware.

And the problem has now persisted for 4 or 5 years. I hope you understand the frustration.

More than that, this gives NVidia a large monopoly that no one seems to want to stop...

As explained in the article https://towardsdatascience.com/on-the-state-of-deep-learning-outside-of-cudas-walled-garden-d88c8bbb4342:

Open source code that targets only a proprietary platform is not exactly open source. We can do better!

And I agree with that.

You said:

Thus my comment is serious.

Excuse me, it could be a translation problem (I'm not a native English speaker, excuse my bad English BTW), but in French it sounds like "do it yourself". That's probably why I answered a bit aggressively.

As for TVM: no, sorry, I didn't know that project, and I will take a look. I'm not sure it will resolve the issue, but the page you pointed to seems interesting. Thanks for that.

I hope you don't take my comment too severely.

@joaomamede

joaomamede commented Nov 26, 2020

@metal3d It's the tradition of where the coders are. Some projects opt to cut resources down to the minimum working objective, meaning that integration with a wide variety of devices gets left behind, while spending money on other things (like Nvidia hardware) doesn't seem to be a problem.
Why do you think people still code these things on Windows although it's a terrible platform for it? Tradition.
It's the sad reality of resource limitations and, mostly, tradition in training.
ROCm now works with mxnet, apparently... by using nvcc code, lol. I also think OpenCL should be the way to go, since Intel, AMD, Nvidia, etc. are all supported.
And I guess for work I'll be forced to buy an Nvidia GPU at 3x the price (instead of 3 GPUs of the same performance) to run my software, because most toolkits I use are CUDA-only.
AMD is a rich company and they lagged behind, and now they are forced to adapt ROCm to CUDA... instead of having something more general.

@samurai815

+1 waiting for it
