DALI compatibility with TF2.11.0 #4529

Closed
pietroorlandi opened this issue Dec 20, 2022 · 3 comments
Labels
question Further information is requested TensorFlow TensorFlow compatibility

Comments

@pietroorlandi

Hi, I wanted to ask whether DALI is currently compatible with the latest TensorFlow release, 2.11.0. I installed TensorFlow via pip in a conda environment (see here for more information).
With TensorFlow 2.10.0, NVIDIA DALI works perfectly, but with 2.11.0 I get this error on import:

NotFoundError                             Traceback (most recent call last)
Cell In [1], line 6
      4 import os
      5 from nvidia.dali import pipeline_def
----> 6 import nvidia.dali.plugin.tf as dali_tf
      7 import tensorflow.compat.v1 as tf_v1
      8 import logging

File ~/anaconda3/envs/tf-2.11.0/lib/python3.9/site-packages/nvidia/dali/plugin/tf.py:36
     32 from nvidia.dali_tf_plugin import dali_tf_plugin
     34 from collections.abc import Mapping, Iterable
---> 36 _dali_tf_module = dali_tf_plugin.load_dali_tf_plugin()
     37 _dali_tf = _dali_tf_module.dali
     38 _dali_tf.__doc__ = _dali_tf.__doc__ + """
     39 
     40     Please keep in mind that TensorFlow allocates almost all available device memory by default.
     41     This might cause errors in DALI due to insufficient memory. On how to change this behaviour
     42     please look into the TensorFlow documentation, as it may differ based on your use case.
     43 """

File ~/anaconda3/envs/tf-2.11.0/lib/python3.9/site-packages/nvidia/dali_tf_plugin/dali_tf_plugin.py:52, in load_dali_tf_plugin()
     50             first_error = error
     51 else:
---> 52     raise first_error or Exception(
     53         'No matching DALI plugin found for installed TensorFlow version')
     55 return _dali_tf_module

File ~/anaconda3/envs/tf-2.11.0/lib/python3.9/site-packages/nvidia/dali_tf_plugin/dali_tf_plugin.py:45, in load_dali_tf_plugin()
     43 for libdali_tf in processed_tf_plugins:
     44     try:
---> 45         _dali_tf_module = tf.load_op_library(libdali_tf)
     46         break
     47     # if plugin is not compatible skip it

File ~/anaconda3/envs/tf-2.11.0/lib/python3.9/site-packages/tensorflow/python/framework/load_library.py:54, in load_op_library(library_filename)
     31 @tf_export('load_op_library')
     32 def load_op_library(library_filename):
     33   """Loads a TensorFlow plugin, containing custom ops and kernels.
     34 
     35   Pass "library_filename" to a platform-specific mechanism for dynamically
   (...)
     52     RuntimeError: when unable to load the library or get the python wrappers.
     53   """
---> 54   lib_handle = py_tf.TF_LoadLibrary(library_filename)
     55   try:
     56     wrappers = _pywrap_python_op_gen.GetPythonWrappers(
     57         py_tf.TF_GetOpList(lib_handle))

NotFoundError: /home/pietro/anaconda3/envs/tf-2.11.0/lib/python3.9/site-packages/nvidia/dali_tf_plugin/libdali_tf_2_10.so: undefined symbol: _ZNK10tensorflow4data11DatasetBase8FinalizeEPNS_15OpKernelContextESt8functionIFNS_8StatusOrISt10unique_ptrIS1_NS_4core15RefCountDeleterEEEEvEE

Is this error because NVIDIA DALI is not yet compatible with TF 2.11.0, or is there some other reason?
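
The traceback shows the loader falling back to `libdali_tf_2_10.so` under TensorFlow 2.11.0, so it can help to see which prebuilt plugin binaries are actually present. The following is a minimal sketch, not part of DALI: the function names are hypothetical, and the `libdali_tf_<major>_<minor>.so` naming is an assumption taken from the single filename in the traceback.

```python
import re
from pathlib import Path

# Assumption: prebuilt plugin binaries follow the naming seen in the traceback,
# e.g. libdali_tf_2_10.so for TensorFlow 2.10. Other files are ignored.
_PLUGIN_RE = re.compile(r"^libdali_tf_(\d+)_(\d+)\.so$")


def prebuilt_tf_versions(plugin_dir):
    """Return the sorted (major, minor) TF versions with a prebuilt .so in plugin_dir."""
    versions = set()
    for path in Path(plugin_dir).glob("libdali_tf_*.so"):
        match = _PLUGIN_RE.match(path.name)
        if match:
            versions.add((int(match.group(1)), int(match.group(2))))
    return sorted(versions)


def has_prebuilt_plugin(tf_version, plugin_dir):
    """True if a prebuilt binary matches the major.minor of tf_version, e.g. '2.11.0'."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return (major, minor) in prebuilt_tf_versions(plugin_dir)


def describe_install():
    """Print the check for the current environment (needs TensorFlow and the plugin)."""
    import tensorflow as tf
    from nvidia.dali_tf_plugin import dali_tf_plugin

    plugin_dir = Path(dali_tf_plugin.__file__).parent
    print("TensorFlow version:", tf.__version__)
    print("Prebuilt plugins for:", prebuilt_tf_versions(plugin_dir))
    print("Matching prebuilt plugin:", has_prebuilt_plugin(tf.__version__, plugin_dir))
```

Calling `describe_install()` in the affected environment prints the TensorFlow version next to the versions that have a prebuilt binary. If there is no match, the plugin has to be compiled for the installed TensorFlow.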

@JanuszL JanuszL added the question Further information is requested label Dec 20, 2022
@JanuszL
Contributor

JanuszL commented Dec 20, 2022

Hi @pietroorlandi,

DALI tries to support the two latest TF versions. DALI 1.20 supports 2.9 and 2.10 out of the box; for other versions, we try to compile the necessary TF plugin during installation. So when you change the TF version, it is best to reinstall the DALI TF plugin.
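
A plain `pip install --upgrade` can report the requirement as already satisfied, or reuse a cached wheel, without rebuilding anything. Below is a minimal sketch of a clean reinstall. The helper names are hypothetical, the package name and index URL are the ones used in this thread, and whether a stale cached wheel is involved in any given setup is an assumption.

```python
import subprocess
import sys

# Index URL and default package name are taken from the pip commands in this thread.
NVIDIA_INDEX = "https://developer.download.nvidia.com/compute/redist"


def reinstall_commands(package="nvidia-dali-tf-plugin-cuda110", python=sys.executable):
    """Build the pip invocations for a clean reinstall of the DALI TF plugin."""
    pip = [python, "-m", "pip"]
    return [
        # Remove the existing install so pip cannot report "already satisfied".
        pip + ["uninstall", "-y", package],
        # --no-cache-dir avoids reusing a wheel built against the previous TensorFlow.
        pip + ["install", "--no-cache-dir", "--extra-index-url", NVIDIA_INDEX,
               "--upgrade", package],
    ]


def reinstall_plugin(**kwargs):
    """Run the commands in order, stopping at the first failure."""
    for command in reinstall_commands(**kwargs):
        subprocess.run(command, check=True)
```

Run `reinstall_plugin()` from the interpreter of the environment where TensorFlow was changed, so that `sys.executable` points at the right pip.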

@pietroorlandi
Author

pietroorlandi commented Dec 21, 2022

Thanks, but it still doesn't work.
Here is what I do:
Starting from conda, I create a new environment and follow this procedure to install TF 2.11.0 (see here).
Once everything is installed and the conda environment is activated, I try to install nvidia-dali and nvidia-dali-tf-plugin with these commands:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-cuda110
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-tf-plugin-cuda110
The first command, which installs nvidia-dali, works, while the second one gives me an error:

  ERROR: Failed building wheel for nvidia-dali-tf-plugin-cuda110
  Running setup.py clean for nvidia-dali-tf-plugin-cuda110
Failed to build nvidia-dali-tf-plugin-cuda110
Installing collected packages: nvidia-dali-tf-plugin-cuda110
  Running setup.py install for nvidia-dali-tf-plugin-cuda110 ... error
  error: subprocess-exited-with-error

@JanuszL
Contributor

JanuszL commented Dec 21, 2022

Hi @pietroorlandi,

When the DALI plugin doesn't have a prebuilt binary for the installed TensorFlow version, it tries to compile one on the fly, so it requires a host C++ compiler and the CUDA headers. We assume that CUDA is installed in /usr/local/cuda if CUDA_HOME is not set.
What you can do:

  • install the nightly build, as it already supports TF 2.11.0
  • install cudatoolkit-dev and set CUDA_HOME; in my case this works on a clean Ubuntu 22.04:
apt update && apt install wget libxml2 binutils g++ -y && \
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh && \
chmod a+x Miniconda3-py39_4.12.0-Linux-x86_64.sh && \
./Miniconda3-py39_4.12.0-Linux-x86_64.sh -b && \
export PATH=/root/miniconda3/condabin/:$PATH && \
conda create -n conda python=3.9 cudatoolkit=11.2 cudnn=8.1.0 cudatoolkit-dev=11.2 libxml2 -c conda-forge -y
source /root/miniconda3/bin/activate conda && \
export CUDA_HOME=/root/miniconda3/envs/conda && \
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/ && \
python3 -m pip install tensorflow && \
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-cuda110 && \
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-tf-plugin-cuda110
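
Before retrying the plugin install, the two prerequisites named above (a host C++ compiler and the CUDA headers) can be checked up front. This is a minimal sketch with hypothetical function names. It assumes that the headers sit in `<CUDA_HOME>/include`, that `cuda_runtime.h` is a reasonable header to probe for, and that the compiler is reachable as `g++` or `c++` on `PATH`. The actual build may look for something different.

```python
import os
import shutil
from pathlib import Path

DEFAULT_CUDA_HOME = "/usr/local/cuda"  # fallback location stated in this thread


def resolve_cuda_home(env=None):
    """Return CUDA_HOME if set (an empty value counts as unset), else the default."""
    env = os.environ if env is None else env
    return Path(env.get("CUDA_HOME") or DEFAULT_CUDA_HOME)


def preflight(env=None, compilers=("g++", "c++")):
    """Return a list of problems found; an empty list means the basics are present."""
    problems = []
    cuda_home = resolve_cuda_home(env)
    # Assumption: CUDA headers are under <CUDA_HOME>/include.
    header = cuda_home / "include" / "cuda_runtime.h"
    if not header.is_file():
        problems.append(f"CUDA header not found: {header}")
    # Assumption: any of the listed compiler names on PATH is good enough.
    if not any(shutil.which(name) for name in compilers):
        problems.append("no host C++ compiler found (tried: " + ", ".join(compilers) + ")")
    return problems
```

For example, `print(preflight())` in the activated conda environment lists whatever is missing. An empty list only means these two basic checks passed, not that the build is guaranteed to succeed.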

@klecki klecki added the TensorFlow TensorFlow compatibility label Apr 17, 2023
@JanuszL JanuszL closed this as completed Jan 24, 2024