Cannot find libdevice in TF 2.11 + compilation fails without ptxas #296

Open
1 task done
drasmuss opened this issue Jan 12, 2023 · 27 comments

@drasmuss

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

TensorFlow 2.11 broke how it locates the libdevice library when CUDA is installed through conda. See tensorflow/tensorflow#56927 or tensorflow/tensorflow#59013.

Here is a simple repro script:

mamba create -n tmp python=3.9 tensorflow=2.11
mamba activate tmp
python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"

Which gives the error:

    ...
    File ".../mambaforge/envs/tmp/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File ".../mambaforge/envs/tmp/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File ".../mambaforge/envs/tmp/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_1'
libdevice not found at ./libdevice.10.bc
         [[{{node StatefulPartitionedCall_1}}]] [Op:__inference_train_function_401]

I suspect that this is a bug on TensorFlow's end, not something you are really responsible for. But the only fixes in the issues linked above involve hacky workarounds, manually copying the libdevice file to some other location where TensorFlow is expecting to find it. So I'm wondering if it'd be possible to fix it more robustly in the conda-forge package, so that we don't have to manually copy files around every time we create a new environment.
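
Before copying anything around, it can help to check where libdevice actually ended up in the environment. The following is only a minimal sketch: the helper names `find_libdevice` and `in_nvvm_layout` are made up for illustration, and the assumption that TensorFlow wants an `nvvm/libdevice/` directory is taken from the workarounds in the linked issues, not from TensorFlow's documentation.

```python
from pathlib import Path


def find_libdevice(prefix):
    """Return every libdevice*.bc file found anywhere under `prefix`, sorted."""
    return sorted(Path(prefix).rglob("libdevice*.bc"))


def in_nvvm_layout(path):
    """True if the file sits directly inside a .../nvvm/libdevice/ directory."""
    parts = Path(path).parts
    return len(parts) >= 3 and parts[-3:-1] == ("nvvm", "libdevice")


# Example use inside an activated environment (hypothetical session):
#   import os
#   for hit in find_libdevice(os.environ["CONDA_PREFIX"]):
#       print(hit, "nvvm layout" if in_nvvm_layout(hit) else "flat layout")
```

If the only hit is reported as "flat layout", that would be consistent with the error above, where the file exists in the environment but not in the place TensorFlow searches.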

Installed packages

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   1.4.0              pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.3            py39hb9d737c_1    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     22.2.0             pyh71513ae_0    conda-forge
blinker                   1.5                pyhd8ed1ab_0    conda-forge
brotlipy                  0.7.0           py39hb9d737c_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.12.7            ha878542_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.2.0              pyhd8ed1ab_0    conda-forge
certifi                   2022.12.7          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py39he91dace_3    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
cryptography              39.0.0           py39h079d5ae_0    conda-forge
cudatoolkit               11.8.0              h37601d7_11    conda-forge
cudnn                     8.4.1.50             hed8a83a_0    conda-forge
flatbuffers               22.12.06             hcb278e6_2    conda-forge
frozenlist                1.3.3            py39hb9d737c_0    conda-forge
gast                      0.4.0              pyh9f0ad1d_0    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
google-auth               2.15.0             pyh1a96a4e_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-pasta              0.2.0              pyh8c360ce_0    conda-forge
grpcio                    1.51.1           py39h8c60046_0    conda-forge
h5py                      3.7.0           nompi_py39h817c9c5_102    conda-forge
hdf5                      1.12.2          nompi_h4df4325_101    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        6.0.0              pyha770c72_0    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
keras                     2.11.0             pyhd8ed1ab_0    conda-forge
keras-preprocessing       1.1.2              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
ld_impl_linux-64          2.39                 hcc3a1bd_1    conda-forge
libabseil                 20220623.0      cxx17_h05df665_6    conda-forge
libaec                    1.0.6                h9c3ff4c_0    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcurl                   7.87.0               hdc1c0ab_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libgrpc                   1.51.1               h30feacc_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libnghttp2                1.51.0               hff17c54_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libprotobuf               3.21.12              h3eb15da_0    conda-forge
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
markdown                  3.4.1              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.1            py39hb9d737c_2    conda-forge
multidict                 6.0.4            py39h72bdee0_0    conda-forge
nccl                      2.14.3.1             h0800d71_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
numpy                     1.24.1           py39h223a676_0    conda-forge
oauthlib                  3.2.2              pyhd8ed1ab_0    conda-forge
openssl                   3.0.7                h0b41bf4_1    conda-forge
opt_einsum                3.3.0              pyhd8ed1ab_1    conda-forge
packaging                 23.0               pyhd8ed1ab_0    conda-forge
pip                       22.3.1             pyhd8ed1ab_0    conda-forge
pooch                     1.6.0              pyhd8ed1ab_0    conda-forge
protobuf                  4.21.12          py39h227be39_0    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyjwt                     2.6.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 23.0.0             pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.9.15          hba424b6_0_cpython    conda-forge
python-flatbuffers        23.1.4             pyhd8ed1ab_0    conda-forge
python_abi                3.9                      3_cp39    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
re2                       2022.06.01           h27087fc_1    conda-forge
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_1    conda-forge
requests-oauthlib         1.3.1              pyhd8ed1ab_0    conda-forge
rsa                       4.9                pyhd8ed1ab_0    conda-forge
scipy                     1.10.0           py39h7360e5f_0    conda-forge
setuptools                65.6.3             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.9                hbd366e4_2    conda-forge
tensorboard               2.11.0             pyhd8ed1ab_0    conda-forge
tensorboard-data-server   0.6.1            py39h3ccb8fc_4    conda-forge
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tensorflow                2.11.0          cuda112py39h01bd6f0_0    conda-forge
tensorflow-base           2.11.0          cuda112py39haa5674d_0    conda-forge
tensorflow-estimator      2.11.0          cuda112py39h11d7a3b_0    conda-forge
termcolor                 2.2.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
typing-extensions         4.4.0                hd8ed1ab_0    conda-forge
typing_extensions         4.4.0              pyha770c72_0    conda-forge
tzdata                    2022g                h191b570_0    conda-forge
urllib3                   1.26.14            pyhd8ed1ab_0    conda-forge
werkzeug                  2.2.2              pyhd8ed1ab_0    conda-forge
wheel                     0.38.4             pyhd8ed1ab_0    conda-forge
wrapt                     1.14.1           py39hb9d737c_1    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yarl                      1.8.2            py39hb9d737c_0    conda-forge
zipp                      3.11.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge

Environment info

active environment : tmp
    active env location : /home/drasmuss/mambaforge/envs/tmp
            shell level : 9
       user config file : /home/drasmuss/.condarc
 populated config files : /home/drasmuss/mambaforge/.condarc
                          /home/drasmuss/.condarc
          conda version : 22.9.0
    conda-build version : not installed
         python version : 3.10.6.final.0
       virtual packages : __cuda=12.0=0
                          __linux=5.15.79.1=0
                          __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/drasmuss/mambaforge  (writable)
      conda av data dir : /home/drasmuss/mambaforge/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/drasmuss/mambaforge/pkgs
                          /home/drasmuss/.conda/pkgs
       envs directories : /home/drasmuss/mambaforge/envs
                          /home/drasmuss/.conda/envs
               platform : linux-64
             user-agent : conda/22.9.0 requests/2.28.1 CPython/3.10.6 Linux/5.15.79.1-microsoft-standard-WSL2 ubuntu/20.04.5 glibc/2.31
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False
drasmuss added the bug label Jan 12, 2023
@hmaarrfk
Contributor

Can you point to the specific fix?

@drasmuss
Author

This is the clearest set of instructions I found: tensorflow/tensorflow#56927 (comment)

@drasmuss
Author

Specifically, if I do these steps, the error goes away:

mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice
cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/libdevice.10.bc
XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"

There's a new error about ptxas compilation, but I'm not sure if that's related to this or a separate issue.
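
For anyone who would rather script those three steps than retype them per environment, here is a rough Python sketch of the same workaround. The function names `apply_libdevice_workaround` and `merged_xla_flags` are hypothetical. It assumes the file lives at `<prefix>/lib/libdevice.10.bc` as in the commands above, and that flags in `XLA_FLAGS` are separated by spaces.

```python
import shutil
from pathlib import Path


def apply_libdevice_workaround(prefix):
    """Copy lib/libdevice.10.bc into lib/nvvm/libdevice/ and return the XLA flag.

    Mirrors the mkdir/cp steps above; safe to call more than once.
    """
    lib = Path(prefix) / "lib"
    src = lib / "libdevice.10.bc"
    if not src.is_file():
        raise FileNotFoundError(src)
    dst_dir = lib / "nvvm" / "libdevice"
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    if not dst.exists():
        shutil.copy2(src, dst)
    return f"--xla_gpu_cuda_data_dir={lib}"


def merged_xla_flags(new_flag, existing=""):
    """Add `new_flag` to an existing XLA_FLAGS string, replacing any flag of the same name."""
    name = new_flag.split("=", 1)[0]
    kept = [f for f in existing.split() if f.split("=", 1)[0] != name]
    return " ".join(kept + [new_flag])


# Example use (hypothetical), run before importing tensorflow to be safe:
#   import os
#   flag = apply_libdevice_workaround(os.environ["CONDA_PREFIX"])
#   os.environ["XLA_FLAGS"] = merged_xla_flags(flag, os.environ.get("XLA_FLAGS", ""))
```

This only automates the copy; it does not address the ptxas error.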

@hmaarrfk
Contributor

By new error, do you mean Aborted (core dumped)?

@hmaarrfk
Contributor

Yeah, with TensorFlow 2.10 I get:

python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"
2023-01-12 20:01:09.730449: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-12 20:01:09.814895: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-01-12 20:01:10.972323: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-12 20:01:11.375475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5294 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:17:00.0, compute capability: 6.1
1/1 [==============================] - 0s 328ms/step - loss: 1.9309

I wonder if we just have to disable xla.

@hmaarrfk
Contributor

For internal reference, this is the pull request that moved that code last.
tensorflow/tensorflow@e7ec37f

That said, I just don't get what the problem is. Maybe we have to disable XLA?

@drasmuss
Author

> By new error, do you mean Aborted (core dumped)?

Yes, here's the error printout I get after applying the first "fix":

2023-01-12 18:09:15.667594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 8887 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
2023-01-12 18:09:16.442301: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x7f930d167e50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-01-12 18:09:16.442334: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (0): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2023-01-12 18:09:16.444935: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-01-12 18:09:16.491617: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-12 18:09:16.509586: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-12 18:09:16.509637: W tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:85] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2023-01-12 18:09:16.528680: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-12 18:09:16.528775: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:454] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas'  If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.    
Aborted

@hmaarrfk
Contributor

I uploaded some packages built with the following patch:

diff --git a/recipe/build.sh b/recipe/build.sh
index 95db01e..a71c8c6 100644
--- a/recipe/build.sh
+++ b/recipe/build.sh
@@ -105,7 +105,7 @@ if [[ "${target_platform}" == "osx-arm64" ]]; then
   # See https://conda-forge.org/docs/maintainer/knowledge_base.html#newer-c-features-with-old-sdk
   export CXXFLAGS="${CXXFLAGS} -D_LIBCPP_DISABLE_AVAILABILITY"
 fi
-export TF_ENABLE_XLA=1
+export TF_ENABLE_XLA=0
 export BUILD_TARGET="//tensorflow/tools/pip_package:build_pip_package //tensorflow/tools/lib_package:libtensorflow //tensorflow:libtensorflow_cc${SHLIB_EXT}"

 # Python settings
diff --git a/recipe/meta.yaml b/recipe/meta.yaml
index 7fb9b6b..b31eb19 100644
--- a/recipe/meta.yaml
+++ b/recipe/meta.yaml
@@ -16,7 +16,7 @@ source:
     folder: tensorflow-estimator

 build:
-  number: 0
+  number: 1
   skip: true  # [win]
   skip: true  # [python_impl == 'pypy']
   skip: true  # [libabseil != '20220623.0']
The packages are at https://anaconda.org/mark.harfouche/ but they somehow work worse. I can't get past this error:
Node: 'StatefulPartitionedCall_1'
Could not find compiler for platform CUDA: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in?
         [[{{node StatefulPartitionedCall_1}}]] [Op:__inference_train_function_401]

@drasmuss
Author

Not sure if this helps, but I found that this is specifically triggered by the new optimizers that they made the default in TF 2.11.

If you use

model.compile(loss=tf.losses.mse, optimizer=tf.keras.optimizers.SGD())

you get the error, but if you use

model.compile(loss=tf.losses.mse, optimizer=tf.keras.optimizers.legacy.SGD())

no error.
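
As a stopgap, the choice between the two optimizers can be put behind a small helper so the rest of the training code stays unchanged. This is just a sketch: `make_sgd` is a made-up name, and it takes the `tf.keras.optimizers` module as an argument so the helper itself does not import TensorFlow.

```python
def make_sgd(optimizers, use_legacy=True):
    """Return an SGD optimizer from `optimizers` (expected: tf.keras.optimizers).

    With use_legacy=True, prefer optimizers.legacy.SGD when a `legacy`
    namespace exists; otherwise fall back to the default optimizers.SGD.
    """
    legacy = getattr(optimizers, "legacy", None)
    if use_legacy and legacy is not None:
        return legacy.SGD()
    return optimizers.SGD()


# Example use (hypothetical), matching the compile calls above:
#   import tensorflow as tf
#   model.compile(loss=tf.losses.mse, optimizer=make_sgd(tf.keras.optimizers))
```

This sidesteps the error rather than fixing it, since it simply avoids the new optimizer that triggers it.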

@drasmuss
Author

I opened an issue with the Keras team here keras-team/tf-keras#62, in case that yields any results.

@hmaarrfk
Contributor

Do you get the same results if you install from their conda packages and not ours? Typically people don't like to debug conda-forge stuff.

@drasmuss
Author

Following TensorFlow's recommended installation steps, i.e.

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
pip install tensorflow

produces the same error.

The anaconda tensorflow package is still on 2.10, so can't test that.

@hmaarrfk
Contributor

Great, thank you for confirming.

@ngam
Contributor

ngam commented Feb 13, 2023

This is a long-standing problem with XLA needing ptxas. If you get ptxas from somewhere else, e.g., conda install -c nvidia cuda-nvcc, does your issue go away? It's the same issue with JAX.
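
A quick way to see whether a ptxas binary is visible at all is to look it up on PATH and ask for its version, which is the same `ptxas --version` call the log above failed on. The sketch below uses hypothetical helper names (`find_ptxas`, `ptxas_version`) and only checks PATH; XLA may look in other places as well.

```python
import shutil
import subprocess


def find_ptxas(path=None):
    """Return the full path of ptxas on PATH (or on `path`), or None if absent."""
    return shutil.which("ptxas", path=path)


def ptxas_version(exe):
    """Run `<exe> --version` and return its stdout, or None if the call fails."""
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() if result.returncode == 0 else None


# Example use (hypothetical):
#   exe = find_ptxas()
#   print(ptxas_version(exe) if exe else "ptxas not found on PATH")
```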

@ngam
Contributor

ngam commented Feb 13, 2023

I've been tracking this for a while. I think we don't get reports of this "bug" because people who use CUDA usually have more than one installation, so our tensorflow somehow picks up whatever it needs from elsewhere when it is not available in conda-forge. In my experience, this is only ptxas, but it could be other things. For example, people on HPCs usually have native installations of CUDA, and ptxas is often part of that (not always, but one could always request it from admins).

The good news: a whole new way of dealing with cuda is coming to conda-forge (great!)
The bad news: it will likely take a long-ish time before that comes to fruition and there is a tendency for the nvidia team to work internally (e.g., qc, testing, etc.) before releasing stuff to the public (conda-forge)

ngam changed the title from "Cannot find libdevice in TF 2.11" to "Cannot find libdevice in TF 2.11 + compilation fails without ptxas" Feb 13, 2023
@drasmuss
Author

> If you get ptxas from somewhere else, e.g., conda install -c nvidia cuda-nvcc, does your issue go away?

Installing cuda-nvcc doesn't make the initial libdevice error go away, but if you also apply the hacky fix from #296 (comment), you no longer get the secondary ptxas-related Aborted error.

@ngam
Contributor

ngam commented Feb 13, 2023

Yeah, we will need to fix the libdevice issue separately.

@hameer-spire

I can confirm that installing cudatoolkit-dev from conda-forge and following #296 (comment) fixes the issue for me as well.

@sh-shahrokhi

sh-shahrokhi commented Mar 24, 2023

I have this libdevice issue too. A fix would be appreciated.

@Esokrates

At least we see

error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice

with tensorflow-gpu 2.10 from conda-forge as well. The workaround was to create lib/nvvm/libdevice and copy lib/libdevice.10.bc into it, run mamba install -y -c nvidia cuda-nvcc, and export the XLA_FLAGS variable accordingly.

@hmaarrfk
Contributor

Hmm, I just hit this again. I was unable to "fix" it, so I had to downgrade to tensorflow 2.13 for the moment. Will revisit "soon".

@jakirkham
Member

Thanks Mark for drawing my attention to this! 🙏

Think there is a structuring issue with NVVM in the cudatoolkit package. Have tried to outline this in issue: conda-forge/cudatoolkit-feedstock#96

Idk if just restructuring the NVVM contents is enough to fix the issue, but it is at least a required step

The CUDA 12 packages are better structured (and more complete). So it is possible using CUDA 12 will also fix the issue

@njzjz
Member

njzjz commented Dec 1, 2023

I found a workaround for TF 2.14:

pip install nvidia-cuda-nvcc-cu11

This PyPI package contains libdevice.10.bc, and TensorFlow can find it correctly.
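
To confirm what an installed copy of that package actually ships, its file list can be read from the package metadata. This is a minimal sketch using only the standard library; `libdevice_files` is a made-up helper name, and the sketch makes no assumption about where inside the wheel the file lives.

```python
from importlib import metadata


def libdevice_files(dist_name="nvidia-cuda-nvcc-cu11"):
    """List installed files of `dist_name` whose name starts with 'libdevice'.

    Returns an empty list if the distribution is not installed or has no
    recorded file list.
    """
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return []
    return [str(f.locate()) for f in (files or []) if f.name.startswith("libdevice")]


# Example use (hypothetical):
#   for path in libdevice_files():
#       print(path)
```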

@njzjz
Member

njzjz commented Dec 1, 2023

> The CUDA 12 packages are better structured (and more complete). So it is possible using CUDA 12 will also fix the issue

@jakirkham Do you know what package includes NVVM files? It may need to be added to #353

@jakirkham
Member

cuda-nvcc-tools contains part of NVVM. The rest is in cuda-nvcc-impl.

Though I think TensorFlow hasn't been rebuilt for CUDA 12 yet ( #354 )

@njzjz
Member

njzjz commented Dec 1, 2023

> Though I think TensorFlow hasn't been rebuilt for CUDA 12 yet ( #354 )

CUDA 12 migration was manually added by @xhochy in #353, in 21664ce.

@link89

link89 commented Feb 2, 2024

> pip install nvidia-cuda-nvcc-cu11

This workaround works!
