Do not bundle cuDNN / NCCL for all wheel packages starting in CuPy v9 #4850

Closed
kmaehashi opened this issue Mar 9, 2021 · 13 comments · Fixed by #4932

Comments

@kmaehashi
Member

We have been distributing binary packages (cupy-cudaXXX) via PyPI. Recently, however, the size of GPU-related packages (including CuPy) has started to cause problems on PyPI (see the discussions on the Python forum).

We take this problem seriously, and to help keep the PyPI ecosystem healthy, we are planning to stop bundling the cuDNN / NCCL shared libraries in all wheels, starting with the v9 releases.
Note that cuDNN / NCCL are still supported and enabled in wheels, but users who want these features need to install the libraries with the python -m cupyx.tools.install_library command.


FYI, here are our past efforts to reduce the package size on PyPI:

@kmaehashi kmaehashi pinned this issue Mar 9, 2021
@pentschev
Member

As we discussed earlier today on our call, could we extend this effort to conda packages, effectively making cuDNN an optional install rather than a requirement?

@leofang
Member

leofang commented Mar 9, 2021

> As we discussed earlier today on our call, could we extend this effort to conda packages, effectively making cuDNN an optional install rather than a requirement?

@pentschev This is totally doable (cc: @jakirkham). Do you mind opening up an issue in https://github.com/conda-forge/cupy-feedstock so we can track this need there?

@leofang
Member

leofang commented Mar 9, 2021

> This is totally doable

Sorry, I spoke too fast. There is a tiny bit of work that needs to be done in CuPy, namely how to detect the absence of optional dependencies (cudnn/nccl/cutensor) and ask users to install them through Conda. We already have a mechanism to recommend doing python -m cupyx.tools.install_library as needed, so the best case scenario is we modify it to accommodate Conda.
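
A minimal sketch of such a detection step, assuming an optional library can be probed by trying to load its shared object. The helper names `is_library_available` and `missing_library_hint`, the message wording, and the conda-forge channel are hypothetical and are not CuPy's actual implementation.

```python
import ctypes


def is_library_available(soname):
    """Return True if the shared library `soname` can be loaded."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library cannot be found or loaded.
        return False
    return True


def missing_library_hint(name, soname):
    """Return an install hint if the library is missing, else None."""
    if is_library_available(soname):
        return None
    # Hypothetical message; the conda package name is assumed to be
    # the lowercase library name.
    return '{} is not available. Try: conda install -c conda-forge {}'.format(
        name, name.lower())
```

For example, `missing_library_hint('cuDNN', 'libcudnn.so.8')` would return a hint only on a system where that shared object cannot be loaded.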

@jakirkham
Member

So what happens when NCCL is not present? Does it raise an error and ask the user to install it?

@pentschev
Member

I filed conda-forge/cupy-feedstock#109 for that @leofang

@kmaehashi
Member Author

Currently, wheels use library metadata (in JSON) which is generated and bundled during the wheel build process.
https://github.com/cupy/cupy/blob/v9.0.0b3/cupy/_environment.py#L22-L48

Warnings are displayed using the metadata as follows:
https://github.com/cupy/cupy/blob/v9.0.0b3/cupy/cuda/cudnn.py#L9-L14
https://github.com/cupy/cupy/blob/v9.0.0b3/cupy/_environment.py#L330-L341

We can extend the metadata format to support conda packages. The conda build process can then generate the metadata and bundle it with CuPy distributions. (FYI, to build wheels we use python setup.py bdist_wheel ... --cupy-wheel-metadata metadata.json: https://github.com/cupy/cupy/blob/v9.0.0b3/cupy_setup_build.py#L698-L700)

How about adding 'packaging': 'pip' or 'packaging': 'conda' at the top level of the metadata? Then we can show the appropriate message in _preload_warning.
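
A rough sketch of how a top-level 'packaging' key could select the message. The function name `install_hint`, the 'cuda' metadata key, and the exact command strings are assumptions for illustration; this is not the code in _preload_warning.

```python
def install_hint(metadata, library):
    """Return an install command suited to how CuPy was packaged.

    `metadata` is assumed to be a dict with a top-level 'packaging'
    key ('pip' or 'conda') and a 'cuda' version string.
    """
    # Treat metadata without the key as a pip wheel (assumption).
    packaging = metadata.get('packaging', 'pip')
    if packaging == 'pip':
        return (
            'python -m cupyx.tools.install_library '
            '--library {} --cuda {}'.format(library, metadata['cuda']))
    if packaging == 'conda':
        # The channel name is an assumption.
        return 'conda install -c conda-forge {}'.format(library)
    raise ValueError('unknown packaging: {!r}'.format(packaging))
```

Keeping the branch in one place means the warning text only needs the metadata and the library name, regardless of who built the package.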

@jakirkham
Member

Yeah that makes a lot of sense (though I think Leo and yourself are more familiar with the internal details in CuPy)

When Conda-Build is running, it sets the CONDA_BUILD environment variable that we can check for when deciding whether to include this info in the metadata
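
A small sketch of that check, assuming the build script only needs to know whether CONDA_BUILD is set to a non-empty value. The helper name `detect_packaging` and the resulting metadata layout are hypothetical.

```python
import os


def detect_packaging(environ=None):
    """Return 'conda' when running under Conda-Build, else 'pip'."""
    if environ is None:
        environ = os.environ
    # Any non-empty CONDA_BUILD value is treated as "under Conda-Build".
    return 'conda' if environ.get('CONDA_BUILD') else 'pip'


# Hypothetical use while generating the metadata at build time.
metadata = {'packaging': detect_packaging()}
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.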

@leofang
Member

leofang commented Mar 10, 2021

We can just add the metadata to the conda recipe by hand, as I tried once (conda-forge/cupy-feedstock#75); it's not difficult.

What could potentially be concerning is runtime version pinning. How restrictive is it if CuPy is built with library vX.Y, but later on Conda-Forge we release vX.Z (with Z > Y)? I suppose all optional CUDA libraries (cuDNN, NCCL, cuTENSOR) are stable enough to follow semantic versioning, so this is not a big concern in practice? But then we need to ignore the version spec in the JSON metadata, am I right @kmaehashi? (If I understand the metadata correctly, https://github.com/cupy/cupy/blob/v9.0.0b3/cupy/_environment.py#L22-L48, it pins a particular version.)

@leofang
Member

leofang commented Mar 10, 2021

FYI the pinning is done here for cuDNN:

```python
# Latest cuDNN versions: https://developer.nvidia.com/rdp/cudnn-download
_cudnn_records.append(_make_cudnn_record(
    '11.2', '8.1.1',
    'cudnn-11.2-linux-x64-v8.1.1.33.tgz',
    'cudnn-11.2-windows-x64-v8.1.1.33.zip'))
_cudnn_records.append(_make_cudnn_record(
    '11.1', '8.1.1',
    'cudnn-11.2-linux-x64-v8.1.1.33.tgz',
    'cudnn-11.2-windows-x64-v8.1.1.33.zip'))
_cudnn_records.append(_make_cudnn_record(
    '11.0', '8.1.1',
    'cudnn-11.2-linux-x64-v8.1.1.33.tgz',
    'cudnn-11.2-windows-x64-v8.1.1.33.zip'))
_cudnn_records.append(_make_cudnn_record(
    '10.2', '8.1.1',
    'cudnn-10.2-linux-x64-v8.1.1.33.tgz',
    'cudnn-10.2-windows10-x64-v8.1.1.33.zip'))
_cudnn_records.append(_make_cudnn_record(
    '10.1', '8.0.5',
    'cudnn-10.1-linux-x64-v8.0.5.39.tgz',
    'cudnn-10.1-windows10-x64-v8.0.5.39.zip'))
_cudnn_records.append(_make_cudnn_record(
    '10.0', '7.6.5',
    'cudnn-10.0-linux-x64-v7.6.5.32.tgz',
    'cudnn-10.0-windows10-x64-v7.6.5.32.zip'))
_cudnn_records.append(_make_cudnn_record(
    '9.2', '7.6.5',
    'cudnn-9.2-linux-x64-v7.6.5.32.tgz',
    'cudnn-9.2-windows10-x64-v7.6.5.32.zip'))
```

cuTENSOR and NCCL (#4848) are done in the same file, just scroll down a bit.
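
To illustrate how such a table pins one library version per CUDA version, here is a simplified stand-in. The `make_record` and `find_record` helpers and the record layout are hypothetical and do not reproduce CuPy's `_make_cudnn_record`; only the version strings and file names are taken from the snippet above.

```python
def make_record(cuda_version, cudnn_version, filename_linux, filename_windows):
    """Build a simplified record (hypothetical layout)."""
    return {
        'cuda': cuda_version,
        'cudnn': cudnn_version,
        'filenames': {'Linux': filename_linux, 'Windows': filename_windows},
    }


records = [
    make_record('11.2', '8.1.1',
                'cudnn-11.2-linux-x64-v8.1.1.33.tgz',
                'cudnn-11.2-windows-x64-v8.1.1.33.zip'),
    make_record('10.1', '8.0.5',
                'cudnn-10.1-linux-x64-v8.0.5.39.tgz',
                'cudnn-10.1-windows10-x64-v8.0.5.39.zip'),
]


def find_record(records, cuda_version):
    """Return the record pinned for `cuda_version`."""
    for record in records:
        if record['cuda'] == cuda_version:
            return record
    raise KeyError('no record for CUDA {}'.format(cuda_version))
```

Each CUDA version maps to exactly one cuDNN version, which is the pinning behavior under discussion.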

@kmaehashi
Member Author

> How restrictive is it if CuPy builds with library vX.Y, but later on Conda-Forge we release vX.Z (with Z > Y)?

I think there's no problem at the API/ABI level. cuDNN won't remove old APIs within the same major version (docs).
The only case I can come up with is a regression like this one: #4081
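
As a sketch of the rule implied here, assuming compatibility holds whenever the runtime library has the same major version as the build-time library and is not older. The helper names are hypothetical, and a behavioral regression such as the one referenced above would not be caught by a check like this.

```python
def parse_version(version):
    """Turn '8.1.1' into (8, 1, 1)."""
    return tuple(int(part) for part in version.split('.'))


def is_compatible(build_version, runtime_version):
    """Return True if the runtime library is assumed usable.

    Assumes compatibility within a major version, and that the
    runtime must not be older than the build-time version.
    """
    build = parse_version(build_version)
    runtime = parse_version(runtime_version)
    return runtime[0] == build[0] and runtime >= build
```

Under these assumptions, a build against 8.0.5 accepts a runtime 8.1.1 but rejects 7.6.5 and any 9.x release.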

That being said, it seems other projects are pinning at vX.Y level (but not vX.Y.0 level)?
https://github.com/AnacondaRecipes/tensorflow_recipes/blob/5406148ff587642ce66b17f587039d9d9b6738f4/tensorflow-base-gpu/conda_build_config.yaml
https://github.com/AnacondaRecipes/pytorch-feedstock/blob/5e28586fefaceca3148254dfb4c28f46de732638/recipe/conda_build_config.yaml

> But then we need to ignore the version spec in the json metadata, am I right @kmaehashi?

For conda packages, no preloading (ctypes.CDLL('libcudnn.so.8')) is necessary on the CuPy side, because conda installs shared libraries to a location on the library search path.
So the JSON metadata is only used to generate a warning message.
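
A minimal sketch of that distinction, assuming preloading is only attempted for pip wheels and that failures are collected so a warning can be shown later. The function name `preload_libraries` and its arguments are hypothetical.

```python
import ctypes


def preload_libraries(packaging, paths):
    """Preload shared libraries for pip wheels; skip for conda.

    Returns the list of paths that could not be loaded.
    """
    if packaging == 'conda':
        # conda places libraries on the search path, so the dynamic
        # loader finds them without any help.
        return []
    failed = []
    for path in paths:
        try:
            ctypes.CDLL(path)
        except OSError:
            failed.append(path)
    return failed
```

The returned list could then feed whatever warning message the metadata is used to generate.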

@leofang
Member

leofang commented Mar 12, 2021

> That being said, it seems other projects are pinning at vX.Y level (but not vX.Y.0 level)?

I am less certain about how Anaconda pins these dependencies, but on Conda-Forge we pin cuDNN at the major version for CUDA 10.2+:

The same applies to cuTENSOR and NCCL:

> But then we need to ignore the version spec in the json metadata, am I right @kmaehashi?
>
> For conda packages, no preloading (ctypes.CDLL('libcudnn.so.8')) is necessary on the CuPy side, because conda installs shared libraries to a location on the library search path.
> So the JSON metadata is only used to generate a warning message.

Ah I see, thanks @kmaehashi. I took a closer look and you are right. This makes things much simpler! See #4873, in which I followed your earlier suggestion:

> How about adding 'packaging': 'pip' or 'packaging': 'conda' to the metadata top-level? Then we can show appropriate messages in _preload_warning.

@twmht
Contributor

twmht commented Aug 6, 2022

@kmaehashi

How did you build the CuPy wheel without cuDNN support?

From the docs:

> When installing CuPy from source, features provided by additional CUDA libraries will be disabled if these libraries are not available at the build time.

In other words, if the cuDNN library is present, it will be enabled when building the wheel. How did you build a CUDA-only wheel?

@kmaehashi
Member Author

Currently, there's no way to programmatically force building without cuDNN when it is available.

You can comment out this line:

Or, you can create a deliberately broken dummy header so that CuPy does not recognize cuDNN as available:

```shell
echo "#error dummy" > cudnn.h
CFLAGS="-I${PWD}" pip install -v .
```
