GPU Support? #63

Closed
alexbw opened this issue Mar 27, 2016 · 78 comments · Fixed by #2038

Comments

@alexbw

alexbw commented Mar 27, 2016

I've got a couple packages I'm preparing for upload that rely on GPUs. I'm not up to speed on what open-source CI solutions offer, but would building against a VM w/ GPUs be supported? If it would require pitching in or donating to the project, I'm pretty sure I can figure some way to help.

@pelson
Member

pelson commented Mar 27, 2016

I'm completely out of my depth on this one. @msarahan any knowledge on the subject?

@jakirkham
Member

I'm also interested in this. The trick is most CIs do not provide GPUs. However, if you are willing to work with OpenCL, which works with CPUs and GPUs, then we can work together on getting that support on CIs.

@alexbw
Author

alexbw commented Mar 27, 2016

Unfortunately, CUDA really is the de facto standard for machine learning. I think NVIDIA is generally interested in helping out the open-source community, so it might be worth starting a conversation with them about helping out. I'll test out conda-forge with non-GPU packages, and if things seem to work smoothly, then I can start talking with them.

Question: are conda and conda-build updated regularly on this system? My packages are in Lua, and support for them was only added recently.


@jakirkham
Member

I thought you might say that. Unfortunately, without the CI support, we are kind of in a bind on this one. If NVIDIA is willing to work on CI support with GPUs, that would be great.

To be completely honest with you, I don't think we should be holding our breath on this. The real problem is that most CI services are leasing time on other infrastructure, primarily Google's and Amazon's. Unless someone has infrastructure with GPUs that they are willing to lease to a CI service for this purpose, we are kind of stuck. I think we can all imagine what they would prefer to do with that infrastructure, right? However, if you figure out something on this, please let us know and we can work on something.

I'm guessing you are using Torch then? At a bare minimum, let's work on getting Torch's dependencies in here. At least, that will make your job a little easier, right? For that matter any run-of-the-mill Lua packages that you have would be good to try to get in, as well. It should help you and others looking for more Lua support in conda. How does that sound?

@jakirkham
Member

This repo seems to use NVIDIA components for their CI.

@jakirkham
Member

Did you see the link above, @alexbw?

This might not be totally impossible after all, but I think we should do some research on how this works. What platforms are you trying to support? Just Linux? Mac also? I'm totally out of my depth on Windows. So we may need someone else to show us the ropes there.

@alexbw
Author

alexbw commented Mar 30, 2016

Saw the link. Looking more into this, but on the timescale of a few weeks.


@jakirkham
Member

So, I looked a little bit more closely at this, and it looks like one could add GPU libs to CentOS 6 (what we are currently using). There is Mac and Windows support too, but IMHO this is secondary to getting Linux up and working. However, I am not seeing any support for CentOS 5 (a platform we were debating switching to), which is something to keep in mind.

@msarahan
Member

msarahan commented Apr 5, 2016

Good to know. We are collecting data points on whether continuing with CentOS 5 is a good idea. If anyone knows of definitive reasons to stay with CentOS 5, please share them; it is currently preventing:

  • Qt5
  • a complete LLVM
  • now, GPU libs

@jakirkham
Member

Glad you saw this, @msarahan. Was debating cross-referencing, but didn't want to have a mess of links. Are there still many customers using a CentOS 5-equivalent Linux? Could we maybe build the compiler on CentOS 5 and somehow add it to CentOS 6?

@msarahan
Member

msarahan commented Apr 5, 2016

Building the compiler on the older architecture doesn't help. What matters is the glibc version present on the build system when packages are built.

We don't have hard data on customers, outside of HTTP headers for downloads of packages. We're digging through that to see how many people have kernels older than the one corresponding to CentOS 6.

@jakirkham
Member

Right. I was just hoping there was a way we could somehow have both. I guess in the worst case some things could be CentOS 6 as needed. Will that have any consequences if we mix the two? Is that already done to some extent (noting that Qt5 was mentioned)?

Yeah, it seems like it would be good to give a survey. Might need an incentive to make sure it actually gets filled out.

@jakirkham
Member

Also, an interesting footnote (though I would appreciate it if other people check that I am reading this right, as IANAL): it appears that at least some of the CUDA libraries can be shipped. This means we could create a CUDA package that simply makes sure CUDA is installed in the CI and moves the redistributable libraries into the conda build prefix for packaging. The resulting package could then be added as a dependency of anything that requires them (e.g. Torch, Caffe, etc.). This would avoid us having to add these hacks in multiple places and risk losing them when we re-render. Furthermore, we would be able to guarantee that the libraries we used to build would be installed on the user's system.
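A repackaging step along these lines might look roughly like the following. This is only a sketch: the `REDISTRIBUTABLE` whitelist and the `lib64` layout are assumptions for illustration; the real list of shippable files would have to come from NVIDIA's EULA, not from this snippet.

```python
import glob
import os
import shutil

# Hypothetical whitelist of redistributable runtime libraries. The real list
# must be taken from NVIDIA's EULA (Attachment A), not from this sketch.
REDISTRIBUTABLE = ["libcudart.so*", "libcublas.so*", "libcurand.so*"]

def repackage_cuda_libs(cuda_home, prefix):
    """Copy whitelisted CUDA runtime libraries from a system CUDA install
    into the conda build prefix, returning the names of the files copied."""
    src_dir = os.path.join(cuda_home, "lib64")
    dst_dir = os.path.join(prefix, "lib")
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for pattern in REDISTRIBUTABLE:
        for path in glob.glob(os.path.join(src_dir, pattern)):
            shutil.copy2(path, dst_dir)  # preserve permissions and timestamps
            copied.append(os.path.basename(path))
    return copied
```

In a recipe, `build.sh` could invoke this with the CI's CUDA location and conda-build's `$PREFIX`, so only the whitelisted files end up in the package.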

@jakirkham
Member

We should verify whether one version of CUDA built for one Linux distro and version can be used on other Linux distros and versions easily, or whether we need multiple flavors. This investigation will have to be extended to other OSes at some point, but starting with Linux makes the most sense to me.

@jakirkham
Member

So, I am trying something out in this PR ( conda-forge/caffe-feedstock#1 ). In it, I am installing CUDA libraries into the Docker container and attempting to have Caffe build against them. It is possible some tests will fail if we don't have access to an NVIDIA GPU, so we will have to play with that. Also, we don't have cuDNN, as that appears to require a registration process that I have not looked into yet and may be a pain to download in batch mode.

In the long run, I expect the CUDA libraries will be wrapped in their own package for installation, and packages needing them will simply depend on that package. We may need to use features to differentiate the GPU variants (CUDA/OpenCL). However, that CUDA package will probably need to hack the CI script in a similar way.

Definitely am interested in feedback. So, please feel free to share.

@jakirkham
Member

Another thought might be that we don't ship the CUDA libraries. Instead, we have a package that merely checks, via a pre- or post-link step, that they are installed. If it fails to find them, the install fails. This would avoid figuring out which libraries can or cannot be distributed safely. Hopefully, since we are linking against the CUDA API, all that will matter is that an acceptable version of the CUDA libraries is present at runtime, regardless of which Linux distribution the package was built on.
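A post-link check of this kind could be sketched as below. This is an illustration, not an existing package: the library names probed (`cuda`, `cudart`) are assumptions and would need to match what the downstream packages actually link against.

```python
import sys
from ctypes.util import find_library

def missing_cuda_libraries(names=("cuda", "cudart")):
    """Return the libraries from `names` that the dynamic linker cannot find.
    find_library() searches the same locations the loader would consult."""
    return [name for name in names if find_library(name) is None]

if __name__ == "__main__":
    missing = missing_cuda_libraries()
    if missing:
        # A conda post-link script signals failure via a nonzero exit code,
        # which aborts the install of the package being linked.
        print("Missing CUDA libraries: %s" % ", ".join(missing))
        sys.exit(1)
```

The same check could run in a pre-link step instead; the trade-off is whether the failure happens before or after files are placed into the environment.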

@jakirkham
Member

It appears Circle CI does provide GPU support, or at least that is what my testing suggests.

@jakirkham
Member

Also, as an FYI in case you didn't already know, @msarahan: CentOS 5 maintenance support ends in March 2017, i.e. less than a year from now. That sounds like a pretty big negative to me. Given how many recipes have been added from conda-recipes and how many remain to be added at this point, trying to switch to CentOS 5 before then sounds challenging. Not to mention, we may find ourselves needing to migrate back to CentOS 6 by that point. Maybe it is just me, but I'm starting to feel a lot of friction around switching to CentOS 5. Is it reasonable to consider just accepting CentOS 6 as part of this transition?

@kyleabeauchamp

FWIW, we have GPU support on Omnia. Might be worth reading over.

https://github.com/omnia-md/conda-recipes

https://github.com/omnia-md/omnia-build-box

@jakirkham
Member

Thanks for these links @kyleabeauchamp. I'll certainly try to brush up on this.

Do you have any thoughts on this PR ( conda-forge/caffe-feedstock#1 )? Also, how do you handle the GPU lib dependency? Is it packaged somehow, used from the system (possibly with some sort of check), or handled some other way?

@kyleabeauchamp

So, AFAIK our main use of GPUs was building the simulation engine OpenMM (openmm.org). OpenMM is a C++ library and can dynamically detect the presence of CUDA support (via shared libraries) at runtime. This means that we did not package or ship anything related to CUDA: we basically just needed CUDA on the build box to build the conda package, then let OpenMM handle things dynamically later.
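The runtime-detection pattern described here (not OpenMM's actual C++ implementation) can be sketched in Python: try to load the CUDA runtime shared library, and fall back to a CPU code path if it is absent or unloadable.

```python
import ctypes
import ctypes.util

def select_backend():
    """Pick 'cuda' if the CUDA runtime shared library can be loaded,
    otherwise fall back to 'cpu'. Mirrors dlopen()-style detection."""
    libname = ctypes.util.find_library("cudart")
    if libname is not None:
        try:
            ctypes.CDLL(libname)  # equivalent to dlopen() on Linux
            return "cuda"
        except OSError:
            pass  # library present on disk but not loadable (e.g. no driver)
    return "cpu"
```

Because detection happens at import/run time rather than build time, the same binary package works on machines with and without a GPU, which is exactly what lets the build box avoid needing GPU hardware.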

@kyleabeauchamp

Looks like our Dockerfile is somewhat similar to your CUDA handling:

https://github.com/omnia-md/omnia-build-box/blob/master/Dockerfile#L25

@jakirkham
Member

Ah, ok, thanks for clarifying.

The trick with Caffe, in particular, is that it can use the CPU, CUDA, or OpenCL. CPU support is always present; however, a BLAS is required, which involves a CPU choice (OpenBLAS, ATLAS, MKL, or possibly some hack to add other options) and, optionally, a GPU choice (cuBLAS or ViennaCL). Thus, having this determined dynamically ends up not being as nice as it could be. Allowing actual selection will require feature support and possibly multiple rebuilds of Caffe.
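For illustration, the rebuild matrix being described could eventually be expressed as build variants; in today's conda-build that would be a `conda_build_config.yaml` along these lines (a sketch only; the `blas_impl`/`gpu_impl` key names are hypothetical, not from an existing Caffe recipe):

```yaml
# Hypothetical variant matrix: one Caffe build per (BLAS, GPU) combination.
blas_impl:
  - openblas
  - mkl
gpu_impl:
  - none      # CPU-only build
  - cuda      # links cuBLAS
  - opencl    # links ViennaCL
```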

One simple route might be to just always use ViennaCL, which can abstract over the difference between the OpenCL and CUDA options. Also, it can always fall back to the CPU if no GPU support is present. Though I expect this layer of abstraction comes at some penalty; the question is how severe that penalty is. Would a solution like this work with OpenMM? I don't know whether its GPU support goes primarily through a GPU BLAS or some other mechanism. For instance, is it using FFTs?

If you have deep learning interests, this may be relevant. In the case of Caffe, it can optionally support cuDNN, and researchers will want this support out of the box. Not only is this tricky because it may not be available for hardware or software reasons, it is tricky because downloading cuDNN requires a registration step with unclear licensing restrictions. One way we might resolve this is to request that cuDNN be loaded in an appropriate Docker container. NVIDIA does do this with Ubuntu 14. However, I don't see a similar container for CentOS 6 and am unclear on whether it would be a supported platform. Ultimately, we will need to communicate with NVIDIA at some point to see what we need to do here to stay above board while providing users state-of-the-art support.

Fortunately, NVIDIA is very clear, down to the file level, about which parts of the CUDA libraries can and cannot be distributed. So, the concerns with cuDNN do not affect this.

@jakirkham
Member

Another thought for more versatile support would be to use the clMath libraries.

@kyleabeauchamp

OpenMM dynamically chooses the best platform at runtime, with options including CPU (SSE), CUDA, OpenCL, and CPU (no SSE / reference). It does use FFTs. The idea with OpenMM is to build and ship binaries that support all possible platforms, then select at runtime.

@hmaarrfk
Contributor

hmaarrfk commented Mar 22, 2019

@scopatz did the legal counsel state which clause makes the CUDA toolkit non-redistributable?

Are we allowed to link to CUDA stuff so long as we pull in the dependency from defaults?

Ref: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
xref: https://github.com/conda-forge/pytorch-cpu-feedstock

@scopatz
Member

scopatz commented Mar 22, 2019

Basically, counsel has said that the EULA - even attachment A - only refers to applications (it does), and conda-forge cannot be considered an application under any reasonable definition.

Also, it looks like they have added the following to attachment A, which we don't meet:

The NVIDIA CUDA Driver Libraries are only distributable in applications that meet this criteria:

1. The application was developed starting from a NVIDIA CUDA container obtained from Docker Hub or the NVIDIA GPU Cloud, and
2. The resulting application is packaged as a Docker container and distributed to users on Docker Hub or the NVIDIA GPU Cloud only.

@hmaarrfk
Contributor

I guess we can't distribute the CUDA driver. Many people are OK with installing that themselves :/.

A second question would be: is linking to the libraries on defaults OK by conda-forge standards?

@scopatz
Member

scopatz commented Mar 22, 2019

Linking to the defaults libraries is fine for us.

@hadim
Member

hadim commented Mar 22, 2019

Then we can say goodbye to reproducible workflows involving CUDA. It really makes no sense that we can't ship it in conda-forge.

I understand the legal aspect of it. What I don't understand is why they took that decision... (unless it's a misunderstanding and we can actually ship it via conda?)

@scopatz
Member

scopatz commented Mar 22, 2019

Then we can say goodbye to reproducible workflows involving CUDA. It really makes no sense that we can't ship it in conda-forge.

I understand the legal aspect of it. What I don't understand is why they took that decision.

I agree on all counts.

(unless it's a misunderstanding and we can actually ship it via conda?)

From speaking with NVIDIA, I am reasonably sure that this is not a misunderstanding on their part. The folks who work on ML and CUDA and PyData stuff at NVIDIA are different from those in their legal department. I don't know the justification for the license as it stands, but we have to live with it until:

  1. they change the EULA, or
  2. they give us a written exception, which I have asked for but they haven't responded yes or no to

@hadim
Member

hadim commented Mar 22, 2019

Thank you @scopatz for working on all those things.

In the meantime, for those who want a reproducible CUDA installation (Linux only, unfortunately), see the following gist: https://gist.github.com/hadim/a4fe638587ef0d7baea01e005523e425

@ericpre
Member

ericpre commented Apr 28, 2020

Now that building packages with CUDA works fine on Linux (in the case of https://github.com/conda-forge/prismatic_split-feedstock, it went smoothly), how would it be possible to add Windows support? Is it worth trying to add a cudatoolkit-dev package for Windows in order to build GPU packages on Windows?

@isuruf
Member

isuruf commented Apr 28, 2020

PRs welcome. We could use cudatoolkit-dev on Linux too, to avoid using /usr/include, because the NVIDIA images have been polluting that folder with their includes.

@ericpre
Member

ericpre commented Apr 28, 2020

OK, I will give it a try.

@jakirkham
Member

@ericpre, for an example of how to build GPU packages, take a look at nvidia-apex as a simple case. Sorry, there are no docs atm.

@isuruf
Member

isuruf commented Apr 28, 2020

@jakirkham, does nvidia-apex-feedstock build windows cuda packages?

@znmeb

znmeb commented Oct 18, 2020

I've got an NVIDIA Jetson AGX Xavier (8-core Tegra aarch64 CPU plus a 512-core Volta GPU) that I can test on. I just discovered Miniforge a couple of days ago - it appears to be working out of the box on the Tegra CPU.

I also have a laptop with a GTX 1050 Ti GPU / x86_64 CPU, but there are plenty of other ways to get at the GPU there.

@tyler274

Why isn't OpenCL support included in OpenCV already? CI support and license issues aren't a problem there, so why is it blocked on CUDA?
