tensorflow-gpu win64 import failure #6431

Open

seibert opened this issue Oct 5, 2017 · 27 comments

@seibert

seibert commented Oct 5, 2017

The tensorflow-gpu package for win64 doesn't seem to import:

$ conda list tensorflow-gpu
# packages in environment at C:\Users\super\Miniconda3\envs\tftest:
#
tensorflow-gpu            1.1.0               np112py36_0

Here is the import error:

$ python -c 'import tensorflow'
Traceback (most recent call last):
  File "C:\Users\super\Miniconda3\envs\tftest\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 18, in swt_helper
    return importlib.import_module(mname)
  File "C:\Users\super\Miniconda3\envs\tftest\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 648, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 560, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\super\Miniconda3\envs\tftest\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\super\Miniconda3\envs\tftest\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 21, in <m
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\super\Miniconda3\envs\tftest\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 20, in swt_helper
    return importlib.import_module('_pywrap_tensorflow_internal')
  File "C:\Users\super\Miniconda3\envs\tftest\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named '_pywrap_tensorflow_internal'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\super\Miniconda3\envs\tftest\lib\site-packages\tensorflow\__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "C:\Users\super\Miniconda3\envs\tftest\lib\site-packages\tensorflow\python\__init__.py", line 51, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "C:\Users\super\Miniconda3\envs\tftest\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "C:\Users\super\Miniconda3\envs\tftest\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 18, in swt_helper
    return importlib.import_module(mname)
  File "C:\Users\super\Miniconda3\envs\tftest\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 648, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 560, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\super\Miniconda3\envs\tftest\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\super\Miniconda3\envs\tftest\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 21, in <m
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\super\Miniconda3\envs\tftest\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 20, in swt_helper
    return importlib.import_module('_pywrap_tensorflow_internal')
  File "C:\Users\super\Miniconda3\envs\tftest\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named '_pywrap_tensorflow_internal'


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.
(tftest)
super@numba-win64-gpu MINGW64 ~/Downloads
$
@seibert
Author

seibert commented Oct 5, 2017

tagging @nehaljwani

@seibert
Author

seibert commented Oct 5, 2017

OK, this seems to be due to the cudatoolkit libraries being installed into $CONDA_PREFIX/DLLs, but I don't see that directory in $PATH when I activate the environment. If I add that directory to my path, the import succeeds.
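For anyone hitting this before the packaging is fixed, one workaround is to prepend that directory to PATH before importing TensorFlow. A minimal sketch (assuming a standard env layout like the one above):

import os
import sys

# Workaround sketch: put the environment's DLLs directory (where cudatoolkit
# currently drops the CUDA libraries) on the Windows DLL search path before
# the TensorFlow extension module is loaded.
dlls_dir = os.path.join(sys.prefix, "DLLs")
os.environ["PATH"] = dlls_dir + os.pathsep + os.environ.get("PATH", "")

import tensorflow as tf  # import only after PATH has been adjusted
print(tf.__version__)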

@nehaljwani

nehaljwani commented Oct 5, 2017

Yes, if you look at the recipe for this (peek at the tarball), that location is added to PATH manually, before running the tests.

@mingwandroid @jjhelmus Should the DLLs be added to PATH by the tensorflow-gpu package or the cudatoolkit package?

@mingwandroid

We already add far too many entries to PATH on Windows, the system with the shortest limits and biggest issues around PATH!

Can we move the DLLs to somewhere that is on PATH instead?

@seibert
Author

seibert commented Oct 5, 2017

Yes, I think we can move things to a different location. I don't know why our DLLs were going into the DLLs directory in the first place. That said, I'm seeing other things in that directory:

super@numba-win64-gpu MINGW64 ~/Miniconda3/envs/tftest
$ ls DLLs
_asyncio.pyd*          _testimportmultiple.pyd*    nvrtc64_80.dll*
_bz2.pyd*              _testmultiphase.pyd*        nvrtc-builtins64_80.dll*
_ctypes.pyd*           _tkinter.pyd*               nvvm64_31_0.dll*
_ctypes_test.pyd*      cublas64_80.dll*            py.ico
_decimal.pyd*          cudart64_80.dll*            pyc.ico
_elementtree.pyd*      cufft64_80.dll*             pyd.ico
_hashlib.pyd*          cupti64_80.dll*             pyexpat.pyd*
_lzma.pyd*             curand64_80.dll*            python_lib.cat
_msi.pyd*              cusolver64_80.dll*          python_tools.cat
_multiprocessing.pyd*  cusparse64_80.dll*          select.pyd*
_overlapped.pyd*       libdevice.compute_20.10.bc  sqlite3.dll*
_socket.pyd*           libdevice.compute_30.10.bc  tcl86t.dll*
_sqlite3.pyd*          libdevice.compute_35.10.bc  tk86t.dll*
_ssl.pyd*              libdevice.compute_50.10.bc  unicodedata.pyd*
_testbuffer.pyd*       nppc64_80.dll*              winsound.pyd*
_testcapi.pyd*         nppi64_80.dll*
_testconsole.pyd*      npps64_80.dll*

Is there some rpath thing that should be happening with the tensorflow .pyd files so that we don't need any of these directories on the path?

@mingwandroid

Ah yeah, does Python add that to PATH on Windows at runtime? @jjhelmus do you know? @msarahan?

Windows has no rpath thing at all unfortunately.

@seibert
Author

seibert commented Oct 5, 2017

Also, for my reference, what is the canonical installation directory for DLLs in a conda environment? Looking at the environment I just made, I see: lib, libs, library, dlls, and files directly in $CONDA_PREFIX.

@seibert
Author

seibert commented Oct 5, 2017

Looking at the Numba code, we specifically look for the CUDA libraries in the DLLs directory on Windows (that code is 4 years old!), so there might also be some drift in Windows packaging conventions over that time. :)

@jjhelmus
Contributor

jjhelmus commented Oct 5, 2017

Python puts libraries into the DLLs folder because that is the folder name used by the Python.org installer.

@mingwandroid

There are 5 entries that conda adds to PATH:

%CONDA_PREFIX%
%CONDA_PREFIX%\Library\mingw-w64\bin
%CONDA_PREFIX%\Library\usr\bin
%CONDA_PREFIX%\Library\bin
%CONDA_PREFIX%\Scripts

We do not add DLLs in conda's activation, though. Where does that happen, @jjhelmus?

@mingwandroid

For MSVC-compiled DLLs, I'd say Library\bin is the right place.

@seibert
Author

seibert commented Oct 5, 2017

Are symlinks a thing on Windows? Because Numba has been looking in DLLs for so long, we would need to add additional search paths to Numba, release it, and get them out there before we could push updates to the cudatoolkit package. However, if we could install to Library\bin and then symlink to DLLs, the migration could happen over some amount of time.

@jjhelmus
Contributor

jjhelmus commented Oct 5, 2017

$PREFIX\DLLs gets added to the Python search path by Python itself, which is why the various compiled extensions in the standard library are importable. The libraries in that directory are not on PATH otherwise, from what I can tell.
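A quick check makes the distinction visible (a throwaway snippet, nothing package-specific):

import os
import sys

dlls_dir = os.path.join(sys.prefix, "DLLs")

# On the Python module search path, so stdlib extension modules import fine...
print(any(os.path.normcase(p) == os.path.normcase(dlls_dir) for p in sys.path))

# ...but typically not on the Windows DLL search path, which is what the
# loader consults when resolving dependencies such as cudart64_80.dll.
path_dirs = [os.path.normcase(p) for p in os.environ.get("PATH", "").split(os.pathsep)]
print(os.path.normcase(dlls_dir) in path_dirs)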

@mingwandroid

Yeah, it ends up in sys.path from:
https://github.com/python/cpython/blob/fc1bf872e9d31f3e837f686210f94e57ad3d6582/PC/pyconfig.h#L71
via:
https://github.com/python/cpython/blob/36c1d1f1e52ba54007cbecb42c5599e5ff62aa52/Python/sysmodule.c#L2269
I guess sys.path gets fed into LoadLibrary eventually.

This reminds me, we need to stop applying this old patch: https://github.com/AnacondaRecipes/python-feedstock/blob/master/recipe/0005-Win32-Ensure-Library-bin-is-in-os.environ-PATH.patch

Maybe we can route it through sys.path / pyconfig.h instead?

We should not apply this patch because it prevents conda's activate / deactivate from working correctly: it adds an extra entry to PATH that conda is not expecting.

@jjhelmus
Contributor

jjhelmus commented Oct 5, 2017

It looks like the DLLs folder is added to sys.path via a registry key, specifically HKEY_LOCAL_MACHINE or HKEY_CURRENT_USER \SOFTWARE\Python\PythonCore\3.6\PythonPath.

@mingwandroid

mingwandroid commented Oct 5, 2017

We don't set that registry key, so I think we're falling back to:

If the Python Home cannot be located, no PYTHONPATH is specified in the environment, and no registry entries can be found, a default path with relative entries is used (e.g. .\Lib;.\plat-win, etc).

.. from pyconfig.h

@bencherian

bencherian commented Oct 6, 2017

Did this recently change? I want to make a package that has internal DLLs which in turn depend on CUDA DLLs, so adding them to the search path via sys.path doesn't work, and packaging my own copy of the CUDA DLLs seems pretty wasteful.

@jjhelmus
Contributor

jjhelmus commented Oct 6, 2017

AFAIK the DLLs directory has never been included on the PATH, although it is possible that a package is adding it via an activation script.

I don't think cudatoolkit should be putting files in that directory; Library\bin is a better location. Given that the current packages do place the files there, we will have to determine a good transition to the new location.

@seibert
Author

seibert commented Oct 6, 2017

The reason this issue never came up before was that the only users of the cudatoolkit packages in Anaconda were Numba and Accelerate (now Pyculib). Both dynamically loaded the CUDA DLLs in their Python code at runtime using an explicit search path (defined inside Numba), because the packages needed to work both with and without CUDA support being present.
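For illustration, that kind of explicit-search-path loading looks roughly like the sketch below (the candidate directories and the helper are hypothetical, not Numba's actual code):

import os
import sys
from ctypes import WinDLL  # Windows-only

# Hypothetical candidate directories; Numba's real search logic differs.
CANDIDATE_DIRS = [
    os.path.join(sys.prefix, "DLLs"),
    os.path.join(sys.prefix, "Library", "bin"),
]

def load_cuda_dll(name):
    # Load a CUDA DLL from an explicit list of directories, or fail clearly.
    for directory in CANDIDATE_DIRS:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return WinDLL(candidate)
    raise OSError("%s not found in any of %s" % (name, CANDIDATE_DIRS))

cudart = load_cuda_dll("cudart64_80.dll")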

It's clear now (4 years later) that the DLLs from cudatoolkit were put into $CONDA_PREFIX\DLLs\ erroneously, probably because that was where Python was putting DLLs. If there is a way to create symbolic links (or some kind of link) on Windows between two locations, we can update the cudatoolkit packages to install to Library\bin and add library symlinks for compatibility in the DLLs\ directory while we update Numba and Pyculib to look in the correct location for their libraries going forward.

Are links an option?

@bencherian

Symlinks are not a viable option on Windows because they often require administrative privileges for creation. Is there any reason hardlinks won't work? (Will they not be handled properly by conda build?)

@seibert
Author

seibert commented Oct 6, 2017

The tar format (and apparently Python's tar module) should handle hard links, but I don't know if there are some Windows nuances to worry about, or if hardlinks in a conda package tar file will cause trouble elsewhere in the system.
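A self-contained way to see how tarfile records a hard link (file names are made up; the behaviour may differ between Windows and Linux):

import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "libfoo.dll")
    lnk = os.path.join(workdir, "libfoo_link.dll")
    with open(src, "wb") as f:
        f.write(b"\0" * 1024)
    os.link(src, lnk)  # hard link; needs NTFS on Windows

    archive = os.path.join(workdir, "test.tar.bz2")
    with tarfile.open(archive, "w:bz2") as tar:
        tar.add(src, arcname="DLLs/libfoo.dll")
        tar.add(lnk, arcname="DLLs/libfoo_link.dll")

    # A member recorded as LNK costs no extra space; a second REG member
    # means the data was stored twice.
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            print(member.name, "LNK" if member.islnk() else "REG", member.size)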

@bencherian

bencherian commented Oct 6, 2017

It looks like hardlinks aren't handled properly on Windows.

According to the Python documentation:

If dereference is False, add symbolic and hard links to the archive. If it is True, add the content of the target files to the archive. This has no effect on systems that do not support symbolic links.

I'm guessing Windows is considered to be a system that does not support symbolic links.

I just tried creating a cudatoolkit package with hardlinks to the DLLs in $CONDA_PREFIX\DLLs\, created with mklink /h in bld.bat. Unfortunately, this results in two copies of the DLLs in the tarball, which becomes twice the size of the original package (and presumably takes up twice as much space upon extraction).

@seibert
Author

seibert commented Oct 9, 2017

OK, given that none of the straightforward solutions are available, is it worth considering a cudatoolkit package that has a post-link script to create the required hard links? (And a corresponding pre-unlink script to remove them.)

I know such scripts are discouraged, but this seems like the only option short of breaking things for existing Numba installations if the user happens to install a newer cudatoolkit.
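As a rough sketch of what the post-link step might do (the real script would presumably be a .bat using mklink /H; the directory layout and DLL list here are assumptions):

import os
import sys

# Post-link sketch: hard-link each CUDA DLL from Library\bin into DLLs so
# that existing Numba releases, which search %CONDA_PREFIX%\DLLs, keep working.
prefix = sys.prefix
src_dir = os.path.join(prefix, "Library", "bin")
dst_dir = os.path.join(prefix, "DLLs")

CUDA_DLLS = ["cudart64_80.dll", "cublas64_80.dll", "cufft64_80.dll"]  # illustrative subset

for name in CUDA_DLLS:
    src = os.path.join(src_dir, name)
    dst = os.path.join(dst_dir, name)
    if os.path.isfile(src) and not os.path.exists(dst):
        os.link(src, dst)  # hard link; no admin rights required, NTFS only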

@bencherian

bencherian commented Oct 9, 2017

This seems like a reasonable solution to me. At the same time, would it also make sense to make a cudatoolkit package that only contains this post-link script and depends on subpackages that each contain specific CUDA libraries? This would save a lot of space in situations where specific libraries (e.g., cufft and NPP) aren't needed, and would also let you avoid post-installation hardlinks unless you need them for Numba. Are the cudatoolkit build recipes internal? I'm not sure how straightforward it would be to split the package up...

@nehaljwani

The cuda recipes are present here: https://github.com/numba/conda-recipe-cudatoolkit/

@seibert
Author

seibert commented Oct 9, 2017

OK, I'll talk to Stuart about this (the splitting up was something I think he proposed at one point). We'll put something together and get back to you.

@Mahdi-Moalla

Hello,
I am getting this error with the new Anaconda update (5.1.0). Has anyone else experienced it?
Does anyone have a solution for it?

Thank you.
