[Python] pyarrow wheel: DLL load failed when importing on windows #22421

Closed
asfimport opened this issue Jul 23, 2019 · 53 comments

Comments

@asfimport

asfimport commented Jul 23, 2019

When installing pyarrow 0.14.1 on Windows 10 x64 with Python 3.7, importing it fails with:

import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python37\lib\site-packages\pyarrow\__init__.py", line 49, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: DLL load failed: The specified module could not be found.

On 0.14.0 everything works fine.

Reporter: Ruslan Kuprieiev / @efiop
Assignee: Krisztian Szucs / @kszucs

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-6015. Please see the migration documentation for further details.

@asfimport

Krisztian Szucs / @kszucs:
@efiop How did you install 0.14.1? Via conda or via pip? Seems like via pip.

@asfimport

Ruslan Kuprieiev / @efiop:
@kszucs Via pip, the one that ships with the vanilla official Python for Windows.

@asfimport

Krisztian Szucs / @kszucs:
I suggest using 0.14.0 or conda until this gets resolved.

@asfimport

Ruslan Kuprieiev / @efiop:
@kszucs Yep, did that already :) Thanks!

@asfimport

Krisztian Szucs / @kszucs:
I suspect that the missing DSO is zlib, which is no longer bundled with the wheels (https://github.com/apache/arrow/pull/4886/files#diff-8cf6167d58ce775a08acafcfe6f40966L388) and is now linked dynamically instead of statically as intended (https://github.com/apache/arrow/pull/4886/files#diff-647dde013daa22cab04c2707a6f611e5R57).
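One way to test this suspicion is to list the DLLs that `arrow.dll` imports directly and check whether `zlib.dll` is among them. The sketch below is a minimal, stdlib-only reader for the PE import table, written for illustration; the name `pe_imported_dlls` is hypothetical, and it only reports the regular import table (delay-loaded DLLs and transitive dependencies are not covered). If Visual Studio is installed, `dumpbin /dependents arrow.dll` reports the same list.

```python
import struct
from pathlib import Path
from typing import List, Tuple

# (virtual_address, virtual_size, raw_size, raw_pointer) for one section
Section = Tuple[int, int, int, int]


def _rva_to_offset(rva: int, sections: List[Section]) -> int:
    """Translate a relative virtual address into a file offset."""
    for va, vsize, raw_size, raw_ptr in sections:
        if va <= rva < va + max(vsize, raw_size):
            return raw_ptr + (rva - va)
    raise ValueError(f"RVA {rva:#x} is not inside any section")


def _read_cstr(data: bytes, offset: int) -> str:
    """Read a NUL-terminated ASCII string starting at offset."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("ascii")


def pe_imported_dlls(data: bytes) -> List[str]:
    """Return the DLL names in the import table of a PE (.dll/.pyd/.exe) image."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ header)")
    pe_off = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = pe_off + 4
    (n_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)
    opt = coff + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    # PE32 (0x10B) and PE32+ (0x20B, 64-bit) place the data directories differently.
    if magic == 0x20B:
        n_dirs_off, dirs_off = 108, 112
    elif magic == 0x10B:
        n_dirs_off, dirs_off = 92, 96
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    (n_dirs,) = struct.unpack_from("<I", data, opt + n_dirs_off)
    if n_dirs < 2:
        return []
    # Data directory 1 is the import table: (RVA, size).
    import_rva, _ = struct.unpack_from("<II", data, opt + dirs_off + 8)
    if import_rva == 0:
        return []
    sections: List[Section] = []
    for i in range(n_sections):
        # Skip the 8-byte section name; the next four fields are what we need.
        vsize, va, raw_size, raw_ptr = struct.unpack_from(
            "<IIII", data, opt + opt_size + 40 * i + 8
        )
        sections.append((va, vsize, raw_size, raw_ptr))
    names = []
    desc = _rva_to_offset(import_rva, sections)
    while True:
        # IMAGE_IMPORT_DESCRIPTOR: 5 DWORDs; the 4th is the RVA of the DLL name.
        fields = struct.unpack_from("<5I", data, desc)
        if fields == (0, 0, 0, 0, 0):  # an all-zero descriptor ends the table
            break
        names.append(_read_cstr(data, _rva_to_offset(fields[3], sections)))
        desc += 20
    return names


if __name__ == "__main__":
    import sys

    for dll in pe_imported_dlls(Path(sys.argv[1]).read_bytes()):
        print(dll)
```

Run against the installed `arrow.dll`, a `zlib.dll` entry in the output would confirm the dynamic linkage.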

@asfimport

Krisztian Szucs / @kszucs:
We could also rebuild the Windows wheels, but I don't know how to *force* static zlib linkage. cc @pitrou

@asfimport

Antoine Pitrou / @pitrou:
I don't know either. On manylinux we hack around this by removing the zlib.so.

@asfimport

Krisztian Szucs / @kszucs:
I'm trying to rebuild the Windows wheels with a bundled zlib.dll: https://github.com/ursa-labs/crossbow/branches/all?query=build-669

@asfimport

Krisztian Szucs / @kszucs:
The produced wheels are going to be available at the following links:

@asfimport

Wes McKinney / @wesm:
This is definitely sad. Do we need to remove the wheels from PyPI? I don't think we should do a 0.14.2 release to fix this.

@asfimport

Krisztian Szucs / @kszucs:
I agree with removing the 0.14.1 Windows wheels. I don't have access to do that, though.

@asfimport

Wes McKinney / @wesm:
You're listed as a maintainer on https://pypi.org/project/pyarrow/; you should be able to remove them in the web UI if you are logged in.

@asfimport

Wes McKinney / @wesm:
Please also remove the 0.14.0 Windows wheels

@asfimport

Krisztian Szucs / @kszucs:
Why should we remove the 0.14.0 wheels?

@asfimport

Wes McKinney / @wesm:
Surely they have the exact same problem, unless a patch was cherry-picked that altered the behavior?

@asfimport

Krisztian Szucs / @kszucs:
The patch was cherry-picked into 0.14.1 only, so this issue doesn't affect the 0.14.0 wheels.

@asfimport

Wes McKinney / @wesm:
I see. I'm surprised that befd7df caused the libraries to have a dependency on zlib.dll. In theory, -DZLIB_SOURCE=BUNDLED should result in a statically linked version.

@asfimport

Krisztian Szucs / @kszucs:
I thought the same. Apparently CMake picks up the dynamic library if it can locate it, regardless of ZLIB_SOURCE. We'd need a way to force static linkage.

@asfimport

Antoine Pitrou / @pitrou:
Right. CMake does not have a way of saying "prefer the static lib if present, but fall back on the dynamic lib otherwise".

@asfimport

Antoine Pitrou / @pitrou:
I confirm that the 3.6 wheel seems to load zlib.dll from the PyArrow install.

@asfimport

Krisztian Szucs / @kszucs:
But we don't really have any way to publish it, right?

@asfimport

Antoine Pitrou / @pitrou:
I don't think so?

@asfimport

Krisztian Szucs / @kszucs:
I'm afraid the same issue affects the OSX wheels:

libarrow.14.dylib:
        @rpath/libarrow.14.dylib (compatibility version 14.0.0, current version 14.1.0)
        /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.8)
        @rpath/libarrow_boost_system.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libarrow_boost_filesystem.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libarrow_boost_regex.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 307.5.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.50.2)

They will probably work in most cases, though, since the same problem was already present in the previous wheels; we linked the same way in 0.14.0:

libarrow.14.dylib:
        @rpath/libarrow.14.dylib (compatibility version 14.0.0, current version 14.0.0)
        /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/local/opt/openssl/lib/libssl.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.8)
        @rpath/libarrow_boost_system.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libarrow_boost_filesystem.dylib (compatibility version 0.0.0, current version 0.0.0)
        @rpath/libarrow_boost_regex.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 307.5.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.50.2)

This problem should have been caught automatically by https://github.com/apache/arrow/blob/master/dev/tasks/python-wheels/travis.osx.yml#L77
but Travis swallowed the following errors:

$ sudo find /usr -name libz.* -delete
find: -delete: unlink(/usr/lib/libz.1.1.3.dylib): Operation not permitted
find: -delete: unlink(/usr/lib/libz.1.2.5.dylib): Operation not permitted
find: -delete: unlink(/usr/lib/libz.1.2.8.dylib): Operation not permitted
find: -delete: unlink(/usr/lib/libz.1.dylib): Operation not permitted
find: -delete: unlink(/usr/lib/libz.dylib): Operation not permitted

These errors are only visible after loading the whole raw log: https://api.travis-ci.org/v3/job/559681560/log.txt
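Because the `find -delete` errors were swallowed, the intended check never failed the build. A more direct check, sketched below, inspects the built library's dependencies and exits non-zero if a system zlib shows up. It assumes `otool` is on PATH (it ships with the Xcode command-line tools); the helper names are hypothetical, and the banned substring `libz.` is an assumption chosen so that e.g. `libzstd` does not match.

```python
import subprocess
import sys
from typing import Iterable, List


def parse_otool_l(output: str) -> List[str]:
    """Extract install names from `otool -L` output."""
    deps = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "libfoo.dylib:" header line; dependency lines
        # end with "(compatibility version ..., current version ...)".
        if not line or line.endswith(":"):
            continue
        deps.append(line.split(" (", 1)[0])
    return deps


def forbidden_deps(deps: Iterable[str], banned: Iterable[str] = ("libz.",)) -> List[str]:
    """Return the dependencies whose path contains any banned substring."""
    banned = tuple(banned)
    return [d for d in deps if any(b in d for b in banned)]


def check_library(path: str) -> None:
    """Exit with a non-zero status if `path` links to a banned library."""
    result = subprocess.run(
        ["otool", "-L", path], capture_output=True, text=True, check=True
    )
    bad = forbidden_deps(parse_otool_l(result.stdout))
    if bad:
        sys.exit(f"{path} links dynamically to: {', '.join(bad)}")


if __name__ == "__main__":
    for lib in sys.argv[1:]:
        check_library(lib)
```

Fed the 0.14.1 `libarrow.14.dylib` listing above, `forbidden_deps` would flag `/usr/lib/libz.1.dylib`, and `check_library` would stop the CI job instead of silently continuing.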

@asfimport

Krisztian Szucs / @kszucs:
This is extremely annoying. I can revert the Windows and OSX parts of #4886 to bundle the zlib DSO.

@asfimport

Antoine Pitrou / @pitrou:
You mean the zlib isn't always available on macOS?

@asfimport

Krisztian Szucs / @kszucs:
Since OSX Mojave it is not shipped by default, though that might apply only to the headers.

@asfimport

Kazuaki Ishizaki / @kiszk:
I cannot reproduce this issue on my Windows 10 environment using two Pythons (conda and the python.org embeddable distribution) with this wheel.
Am I missing something needed to reproduce this failure?

$ wget https://www.python.org/ftp/python/3.7.4/python-3.7.4-embed-amd64.zip
$ unzip python-3.7.4-embed-amd64.zip
$ cd python-3.7.4-embed-amd64
$ wget https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py
$ wget pyarrow-0.14.1-cp37-cp37m-win_amd64.whl
$ python -m pip install pyarrow-0.14.1-cp37-cp37m-win_amd64.whl
...
Successfully installed numpy-1.17.1 pyarrow-0.14.1 six-1.12.0
$ python
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> print (pyarrow.cpu_count())
4
>>>
$ activate arrow-dev
$ wget pyarrow-0.14.1-cp37-cp37m-win_amd64.whl
$ pip install pyarrow-0.14.1-cp37-cp37m-win_amd64.whl
...
Installing collected packages: pyarrow
Successfully installed pyarrow-0.14.1
>python
Python 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 22:01:29) [MSC v.1900 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> print (pyarrow.cpu_count())
4
>>>

@asfimport

Antoine Pitrou / @pitrou:
This kind of issue depends on which DLLs are already installed on your system. So if the wheel is missing e.g. some compression libraries (such as zstd or brotli) but you already have them on your system, the wheel will work fine for you. This is also what makes it more difficult to ensure that Windows wheels are correctly generated...

@asfimport

Kazuaki Ishizaki / @kiszk:
I see. Thank you for your quick response. It looks more complex than I thought.

Have we already identified which libraries are missing when this failure occurs, or not yet?
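Since Windows only says "The specified module could not be found" without naming the module, one rough way to narrow it down is to try loading each bundled DLL on its own and record which ones fail. This is a minimal sketch using `ctypes`; the helper names are hypothetical. A failure only tells you that some dependency of that DLL could not be resolved, and which directories are searched depends on the Python version (Python 3.8 changed the DLL search rules on Windows, e.g. PATH is no longer used for dependencies).

```python
import ctypes
import importlib.util
import sys
from pathlib import Path
from typing import Dict, Iterable, Optional


def probe_dlls(paths: Iterable[str]) -> Dict[str, Optional[str]]:
    """Try to load each DLL; map its path to None on success or the error text."""
    results: Dict[str, Optional[str]] = {}
    for path in paths:
        path = str(path)
        try:
            ctypes.CDLL(path)
            results[path] = None
        except OSError as exc:
            results[path] = str(exc)
    return results


def pyarrow_dir() -> Optional[Path]:
    """Locate the installed pyarrow package without importing it."""
    spec = importlib.util.find_spec("pyarrow")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    pkg = pyarrow_dir()
    if pkg is None:
        sys.exit("pyarrow is not installed")
    for dll, error in probe_dlls(sorted(pkg.glob("*.dll"))).items():
        print("OK  " if error is None else "FAIL", dll, error or "")
```

Using `find_spec` rather than `import pyarrow` matters here, because the import itself is what fails.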

@asfimport

Kazuaki Ishizaki / @kiszk:
I believe that I have identified how to fix this issue.

Installing the Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019 from https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads avoids this error.
I think that this problem does not occur with conda; it occurs only with pip.

The following are my validation steps. I would appreciate it if someone could double-check them.

// Install Windows 10 Enterprise (no additional applications installed)
> mkdir c:\pyarrow
> cd c:\pyarrow
> bitsadmin /TRANSFER htmlget https://www.python.org/ftp/python/3.7.4/python-3.7.4-embed-amd64.zip c:\pyarrow\python-3.7.4-embed-amd64.zip
// Extract all of python-3.7.4-embed-amd64.zip to c:\pyarrow\python-3.7.4-embed-amd64 from Explorer
> cd python-3.7.4-embed-amd64
> notepad python37._pth
...
#import site <=== remove the # on this line
> type python37._pth
python37.zip
.
# Uncomment to run site.main() automatically
import site

> bitsadmin /TRANSFER htmlget https://bootstrap.pypa.io/get-pip.py c:\pyarrow\python-3.7.4-embed-amd64\get-pip.py
> python get-pip.py
...
Successfully installed pip-19.2.3 setuptools-41.2.0 wheel-0.33.6
> python -m pip install pyarrow
Collecting pyarrow
Downloading https://files.pythonhosted.org/packages/97/7c/0ea4554d64c6ed3d6d4f8da492df287d2496adbab2b35c01433cf1344521/pyarrow-0.14.0-cp37-cp37m-win_amd64.whl (17.4MB)
...
Collecting numpy>=1.14 (from pyarrow)
Downloading https://files.pythonhosted.org/packages/cb/41/05fbf6944b098eb9d53e8a29a9dbfa20a7448f3254fb71499746a29a1b2d/numpy-1.17.1-cp37-cp37m-win_amd64.whl (12.8MB)
...
Collecting six>=1.0.0 (from pyarrow)
Downloading https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
Installing collected packages: numpy, six, pyarrow
WARNING: The script f2py.exe is installed in 'C:\pyarrow\python-3.7.4-embed-amd64\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script plasma_store.exe is installed in 'C:\pyarrow\python-3.7.4-embed-amd64\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed numpy-1.17.1 pyarrow-0.14.0 six-1.12.0
> python -c "import pyarrow"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\pyarrow\python-3.7.4-embed-amd64\lib\site-packages\pyarrow\__init__.py", line 49, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: DLL load failed: The specified module could not be found.
> python -m pip freeze
numpy==1.17.1
pyarrow==0.14.0
six==1.12.0
> dir Lib\site-packages\pyarrow
Volume in drive C is OS
Volume Serial Number is 1234-5678

Directory of C:\pyarrow\python-3.7.4-embed-amd64\Lib\site-packages\pyarrow

08/31/2019 05:42 AM <DIR> .
08/31/2019 05:42 AM <DIR> ..
08/31/2019 05:42 AM 47,658 array.pxi
08/31/2019 05:42 AM 5,748,736 arrow.dll
08/31/2019 05:42 AM 1,653,120 arrow.lib
08/31/2019 05:42 AM 1,795,072 arrow_flight.dll
08/31/2019 05:42 AM 121,062 arrow_flight.lib
08/31/2019 05:42 AM 910,848 arrow_python.dll
08/31/2019 05:42 AM 119,994 arrow_python.lib
08/31/2019 05:42 AM 869 benchmark.pxi
08/31/2019 05:42 AM 895 benchmark.py
08/31/2019 05:42 AM 2,774 builder.pxi
08/31/2019 05:42 AM 81,920 cares.dll
08/31/2019 05:42 AM 3,691 compat.py
08/31/2019 05:42 AM 911 csv.py
08/31/2019 05:42 AM 1,126 cuda.py
08/31/2019 05:42 AM 3,161 error.pxi
08/31/2019 05:42 AM 4,026 feather.pxi
08/31/2019 05:42 AM 7,291 feather.py
08/31/2019 05:42 AM 12,472 filesystem.py
08/31/2019 05:42 AM 1,286 flight.py
08/31/2019 05:42 AM 186,880 gandiva.cp37-win_amd64.pyd
08/31/2019 05:42 AM 791,664 gandiva.cpp
08/31/2019 05:42 AM 22,094,848 gandiva.dll
08/31/2019 05:42 AM 305,626 gandiva.lib
08/31/2019 05:42 AM 16,553 gandiva.pyx
08/31/2019 05:42 AM 7,032 hdfs.py
08/31/2019 05:42 AM <DIR> include
08/31/2019 05:42 AM <DIR> includes
08/31/2019 05:42 AM 13,995 io-hdfs.pxi
08/31/2019 05:42 AM 48,879 io.pxi
08/31/2019 05:42 AM 15,981 ipc.pxi
08/31/2019 05:42 AM 6,178 ipc.py
08/31/2019 05:42 AM 897 json.py
08/31/2019 05:42 AM 8,623 jvm.py
08/31/2019 05:42 AM 1,553,408 lib.cp37-win_amd64.pyd
08/31/2019 05:42 AM 6,756,155 lib.cpp
08/31/2019 05:42 AM 10,652 lib.pxd
08/31/2019 05:42 AM 3,570 lib.pyx
08/31/2019 05:42 AM 3,243,008 libcrypto-1_1-x64.dll
08/31/2019 05:42 AM 2,613,248 libprotobuf.dll
08/31/2019 05:42 AM 650,240 libssl-1_1-x64.dll
08/31/2019 05:42 AM 13,435 lib_api.h
08/31/2019 05:42 AM 4,724 memory.pxi
08/31/2019 05:42 AM 4,912 orc.py
08/31/2019 05:42 AM 5,789 pandas-shim.pxi
08/31/2019 05:42 AM 33,456 pandas_compat.py
08/31/2019 05:42 AM 1,789,952 parquet.dll
08/31/2019 05:42 AM 346,864 parquet.lib
08/31/2019 05:42 AM 52,331 parquet.py
08/31/2019 05:42 AM 5,780 plasma.py
08/31/2019 05:42 AM 8,778 public-api.pxi
08/31/2019 05:42 AM 23,060 scalar.pxi
08/31/2019 05:42 AM 15,427 serialization.pxi
08/31/2019 05:42 AM 12,588 serialization.py
08/31/2019 05:42 AM 46,760 table.pxi
08/31/2019 05:42 AM <DIR> tensorflow
08/31/2019 05:42 AM <DIR> tests
08/31/2019 05:42 AM 48,149 types.pxi
08/31/2019 05:42 AM 6,609 types.py
08/31/2019 05:42 AM 3,549 util.py
08/31/2019 05:42 AM 89,600 zlib.dll
08/31/2019 05:42 AM 106,496 _csv.cp37-win_amd64.pyd
08/31/2019 05:42 AM 493,978 _csv.cpp
08/31/2019 05:42 AM 14,861 _csv.pyx
08/31/2019 05:42 AM 1,934 _cuda.pxd
08/31/2019 05:42 AM 34,567 _cuda.pyx
08/31/2019 05:42 AM 346,112 _flight.cp37-win_amd64.pyd
08/31/2019 05:42 AM 1,518,124 _flight.cpp
08/31/2019 05:42 AM 46,504 _flight.pyx
08/31/2019 05:42 AM 121 _generated_version.py
08/31/2019 05:42 AM 52,736 _json.cp37-win_amd64.pyd
08/31/2019 05:42 AM 311,759 _json.cpp
08/31/2019 05:42 AM 6,413 _json.pyx
08/31/2019 05:42 AM 2,156 _orc.pxd
08/31/2019 05:42 AM 3,670 _orc.pyx
08/31/2019 05:42 AM 281,600 _parquet.cp37-win_amd64.pyd
08/31/2019 05:42 AM 1,352,623 _parquet.cpp
08/31/2019 05:42 AM 17,061 _parquet.pxd
08/31/2019 05:42 AM 44,057 _parquet.pyx
08/31/2019 05:42 AM 27,524 _plasma.pyx
08/31/2019 05:42 AM 1,749 __init__.pxd
08/31/2019 05:42 AM 10,564 __init__.py
08/31/2019 05:42 AM <DIR> __pycache__
77 File(s) 56,030,721 bytes
7 Dir(s) 22,981,132,288 bytes free

// The following two steps install the Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019
> bitsadmin /TRANSFER htmlget https://aka.ms/vs/16/release/vc_redist.x64.exe c:\pyarrow\vc_redist.x64.exe
> ..\vc_redist.x64.exe

> python -c "import pyarrow"
> python -c "import pyarrow ; print (pyarrow.cpu_count())"
4
> python -m pip freeze
numpy==1.17.1
pyarrow==0.14.0
six==1.12.0
> dir Lib\site-packages\pyarrow
Volume in drive C is OS
Volume Serial Number is 1234-5678

Directory of C:\pyarrow\python-3.7.4-embed-amd64\Lib\site-packages\pyarrow

08/31/2019 05:42 AM <DIR> .
08/31/2019 05:42 AM <DIR> ..
08/31/2019 05:42 AM 47,658 array.pxi
08/31/2019 05:42 AM 5,748,736 arrow.dll
08/31/2019 05:42 AM 1,653,120 arrow.lib
08/31/2019 05:42 AM 1,795,072 arrow_flight.dll
08/31/2019 05:42 AM 121,062 arrow_flight.lib
08/31/2019 05:42 AM 910,848 arrow_python.dll
08/31/2019 05:42 AM 119,994 arrow_python.lib
08/31/2019 05:42 AM 869 benchmark.pxi
08/31/2019 05:42 AM 895 benchmark.py
08/31/2019 05:42 AM 2,774 builder.pxi
08/31/2019 05:42 AM 81,920 cares.dll
08/31/2019 05:42 AM 3,691 compat.py
08/31/2019 05:42 AM 911 csv.py
08/31/2019 05:42 AM 1,126 cuda.py
08/31/2019 05:42 AM 3,161 error.pxi
08/31/2019 05:42 AM 4,026 feather.pxi
08/31/2019 05:42 AM 7,291 feather.py
08/31/2019 05:42 AM 12,472 filesystem.py
08/31/2019 05:42 AM 1,286 flight.py
08/31/2019 05:42 AM 186,880 gandiva.cp37-win_amd64.pyd
08/31/2019 05:42 AM 791,664 gandiva.cpp
08/31/2019 05:42 AM 22,094,848 gandiva.dll
08/31/2019 05:42 AM 305,626 gandiva.lib
08/31/2019 05:42 AM 16,553 gandiva.pyx
08/31/2019 05:42 AM 7,032 hdfs.py
08/31/2019 05:42 AM <DIR> include
08/31/2019 05:42 AM <DIR> includes
08/31/2019 05:42 AM 13,995 io-hdfs.pxi
08/31/2019 05:42 AM 48,879 io.pxi
08/31/2019 05:42 AM 15,981 ipc.pxi
08/31/2019 05:42 AM 6,178 ipc.py
08/31/2019 05:42 AM 897 json.py
08/31/2019 05:42 AM 8,623 jvm.py
08/31/2019 05:42 AM 1,553,408 lib.cp37-win_amd64.pyd
08/31/2019 05:42 AM 6,756,155 lib.cpp
08/31/2019 05:42 AM 10,652 lib.pxd
08/31/2019 05:42 AM 3,570 lib.pyx
08/31/2019 05:42 AM 3,243,008 libcrypto-1_1-x64.dll
08/31/2019 05:42 AM 2,613,248 libprotobuf.dll
08/31/2019 05:42 AM 650,240 libssl-1_1-x64.dll
08/31/2019 05:42 AM 13,435 lib_api.h
08/31/2019 05:42 AM 4,724 memory.pxi
08/31/2019 05:42 AM 4,912 orc.py
08/31/2019 05:42 AM 5,789 pandas-shim.pxi
08/31/2019 05:42 AM 33,456 pandas_compat.py
08/31/2019 05:42 AM 1,789,952 parquet.dll
08/31/2019 05:42 AM 346,864 parquet.lib
08/31/2019 05:42 AM 52,331 parquet.py
08/31/2019 05:42 AM 5,780 plasma.py
08/31/2019 05:42 AM 8,778 public-api.pxi
08/31/2019 05:42 AM 23,060 scalar.pxi
08/31/2019 05:42 AM 15,427 serialization.pxi
08/31/2019 05:42 AM 12,588 serialization.py
08/31/2019 05:42 AM 46,760 table.pxi
08/31/2019 05:42 AM <DIR> tensorflow
08/31/2019 05:42 AM <DIR> tests
08/31/2019 05:42 AM 48,149 types.pxi
08/31/2019 05:42 AM 6,609 types.py
08/31/2019 05:42 AM 3,549 util.py
08/31/2019 05:42 AM 89,600 zlib.dll
08/31/2019 05:42 AM 106,496 _csv.cp37-win_amd64.pyd
08/31/2019 05:42 AM 493,978 _csv.cpp
08/31/2019 05:42 AM 14,861 _csv.pyx
08/31/2019 05:42 AM 1,934 _cuda.pxd
08/31/2019 05:42 AM 34,567 _cuda.pyx
08/31/2019 05:42 AM 346,112 _flight.cp37-win_amd64.pyd
08/31/2019 05:42 AM 1,518,124 _flight.cpp
08/31/2019 05:42 AM 46,504 _flight.pyx
08/31/2019 05:42 AM 121 _generated_version.py
08/31/2019 05:42 AM 52,736 _json.cp37-win_amd64.pyd
08/31/2019 05:42 AM 311,759 _json.cpp
08/31/2019 05:42 AM 6,413 _json.pyx
08/31/2019 05:42 AM 2,156 _orc.pxd
08/31/2019 05:42 AM 3,670 _orc.pyx
08/31/2019 05:42 AM 281,600 _parquet.cp37-win_amd64.pyd
08/31/2019 05:42 AM 1,352,623 _parquet.cpp
08/31/2019 05:42 AM 17,061 _parquet.pxd
08/31/2019 05:42 AM 44,057 _parquet.pyx
08/31/2019 05:42 AM 27,524 _plasma.pyx
08/31/2019 05:42 AM 1,749 __init__.pxd
08/31/2019 05:42 AM 10,564 __init__.py
08/31/2019 05:42 AM <DIR> __pycache__
77 File(s) 56,030,721 bytes
7 Dir(s) 22,936,629,248 bytes free
>

@asfimport

Kazuaki Ishizaki / @kiszk:
If other configurations need to be tested, I can test them.

@asfimport

Antoine Pitrou / @pitrou:
@kiszk How do you know this would fix the issue, if you didn't manage to reproduce it before?

@asfimport

Kazuaki Ishizaki / @kiszk:
At first, I could not reproduce this issue on my Windows 10 notebook, where I have installed many applications.

Then I prepared a fresh Windows 10 instance with no software installed, ran pyarrow on it, and was able to reproduce this issue.
After I installed some pip modules and utility software on the instance, I realized that the issue no longer occurred.

Next, I created another Windows 10 instance to find out which step avoids this issue, and checked step by step again. As a result, I found that installing the Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019 avoids this issue.

Is this an answer to your question?

@asfimport

Antoine Pitrou / @pitrou:

Is this an answer to your question?

Of course, thank you. So it seems the fix (if someone wants to fix the issue and produce reliable Windows wheels for PyArrow) should be to bundle the CRT DLLs with the wheel. It used to be simple, as there were just two such DLLs (msvcrt and msvcp). Nowadays, there are several of them.

Alternatively, just mention that people have to install the redistributables separately. They're useful for a ton of other software anyway.

@asfimport

Kazuaki Ishizaki / @kiszk:
I agree with you. There are two possible solutions:

  1. Bundle the CRT DLLs (probably msv*.dll)

  2. Ask users to install the redistributables, by documenting it

I think that Miniconda3 takes the first approach, since I can find msvcp140*.dll in multiple directories under the Miniconda directory.

If we take the first approach, I will be able to identify which DLLs are required.
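Option 1 could be implemented as a build step that copies the runtime DLLs next to the extension modules before the wheel is packed; NumPy's wheels, for instance, list `extra-dll\msvcp140.dll` under the MSVC redistributable license in their LICENSE.txt. The sketch below is hypothetical: `bundle_dlls`, the source directory, and the default `msvcp140*.dll` pattern are assumptions, and the exact set of required DLLs still has to be identified. Redistribution is subject to Microsoft's license terms.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def bundle_dlls(
    redist_dir: str,
    package_dir: str,
    patterns: Iterable[str] = ("msvcp140*.dll",),
) -> List[Path]:
    """Copy runtime DLLs matching `patterns` from redist_dir into package_dir.

    Raises FileNotFoundError if a pattern matches nothing, so a missing
    runtime fails the build instead of producing a broken wheel.
    """
    src_root, dest_root = Path(redist_dir), Path(package_dir)
    copied: List[Path] = []
    for pattern in patterns:
        matches = sorted(src_root.glob(pattern))
        if not matches:
            raise FileNotFoundError(f"no file matching {pattern!r} in {src_root}")
        for src in matches:
            dest = dest_root / src.name
            shutil.copy2(src, dest)  # copies contents and file metadata
            copied.append(dest)
    return copied
```

Failing loudly on an empty match is the main design choice here: a silently skipped DLL would reproduce exactly the kind of broken wheel discussed in this issue.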

@asfimport

Wes McKinney / @wesm:
I don't know that installing the redistributable runtime is an acceptable solution. Isn't the CRT distributed with Python 3.7 on Windows? I am curious how other Python wheels (e.g. PyTorch, TensorFlow) address this issue. It might be worth asking them.

Potentially there is a problem with our build environment that introduces a dependency on a newer version of the CRT than the one distributed with Python from python.org.

@asfimport

Antoine Pitrou / @pitrou:

Isn't the CRT distributed with Python 3.7 on Windows?

I don't know. Since the issue popped up on Windows 10 I suppose something is missing. I see two possible explanations:

  1. Python bundles the libraries that it uses itself, but we need more of them. For example, Python may not bundle the C++ parts (the Windows equivalent of libstdc++).
  2. Python bundles the libraries, but privately, so they are not automatically visible to other software, even Python libraries.

@asfimport

Wes McKinney / @wesm:
I see, it's plausible then that something is missing. I took a brief look at the TensorFlow wheels, and it looks like they statically link everything (probably including the CRT) into a single pyd file.

@asfimport

Antoine Pitrou / @pitrou:
I don't know. I thought we were bundling the CRT with Numba or llvmlite wheels, but I see that's not the case and I cannot find any trace of it in past versions. So perhaps I was mistaken.

The most annoying problem here is probably the obscure error message. If only Microsoft invested a tiny bit of their revenue to improve quality of life for end users and third-party developers...
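Since the error message is the main pain point, a package could catch the failing import and add a hint. The sketch below is hypothetical (neither `import_with_hint` nor `CRT_DLLS` exists in pyarrow); it assumes `msvcp140` and `vcruntime140` are the runtime DLLs worth checking, and relies on `ctypes.util.find_library`, which on Windows searches the directories on PATH, so it is a heuristic rather than an exact reproduction of the loader's search.

```python
import ctypes.util
import importlib
import sys
from types import ModuleType
from typing import Iterable, List

# Assumed set of Visual C++ runtime DLLs to look for (hypothetical).
CRT_DLLS = ("msvcp140", "vcruntime140")


def missing_crt_dlls(names: Iterable[str] = CRT_DLLS) -> List[str]:
    """Return the runtime DLL names that ctypes cannot find on the search path."""
    return [name for name in names if ctypes.util.find_library(name) is None]


def import_with_hint(module_name: str) -> ModuleType:
    """Import a module, adding a hint about the VC++ runtime if the import fails."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # The hint only makes sense on Windows; elsewhere re-raise unchanged.
        missing = missing_crt_dlls() if sys.platform == "win32" else []
        if not missing:
            raise
        raise ImportError(
            f"{exc}\nThe following Visual C++ runtime DLLs were not found: "
            f"{', '.join(missing)}. Installing the Microsoft Visual C++ "
            "Redistributable for Visual Studio 2015, 2017 and 2019 may fix this."
        ) from exc
```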

@asfimport

Antoine Pitrou / @pitrou:
Historical note: apparently I originally encountered the issue with Conda packages, not Python wheels. That wouldn't happen today anymore :) . See ContinuumIO/anaconda-issues#202

@asfimport

Wes McKinney / @wesm:
I see, it makes sense then to bundle the DLLs.

@asfimport

Kazuaki Ishizaki / @kiszk:
As far as I know, the CRT is not distributed with Python 3.7 on Windows.

> cd \python-3.7.4-embed-amd64
> \cygwin64\bin\find . -name "msv*"
./Lib/site-packages/numpy/distutils/msvc9compiler.py
./Lib/site-packages/numpy/distutils/msvccompiler.py
./Lib/site-packages/numpy/distutils/__pycache__/msvc9compiler.cpython-37.pyc
./Lib/site-packages/numpy/distutils/__pycache__/msvccompiler.cpython-37.pyc
./Lib/site-packages/setuptools/msvc.py
./Lib/site-packages/setuptools/__pycache__/msvc.cpython-37.pyc

> cd \ProgramData\Miniconda3
> \cygwin64\bin\find . -name "msv*"
./Lib/distutils/msvc9compiler.py
./Lib/distutils/msvccompiler.py
./Lib/distutils/__pycache__/msvc9compiler.cpython-37.pyc
./Lib/distutils/__pycache__/msvccompiler.cpython-37.pyc
./Lib/site-packages/setuptools/msvc.py
./Lib/site-packages/setuptools/__pycache__/msvc.cpython-37.pyc
./Library/bin/msvcp140.dll
./Library/bin/msvcp140_1.dll
./Library/bin/msvcp140_2.dll
./msvcp140.dll
./msvcp140_1.dll
./msvcp140_2.dll
./pkgs/python-3.7.3-h8c8aaf0_1/Lib/distutils/msvc9compiler.py
./pkgs/python-3.7.3-h8c8aaf0_1/Lib/distutils/msvccompiler.py
./pkgs/python-3.7.3-h8c8aaf0_1/Lib/distutils/__pycache__/msvc9compiler.cpython-37.pyc
./pkgs/python-3.7.3-h8c8aaf0_1/Lib/distutils/__pycache__/msvccompiler.cpython-37.pyc
./pkgs/setuptools-41.0.1-py37_0/Lib/site-packages/setuptools/msvc.py
./pkgs/setuptools-41.0.1-py37_0/Lib/site-packages/setuptools/__pycache__/msvc.cpython-37.pyc
./pkgs/vs2015_runtime-14.15.26706-h3a45250_4/Library/bin/msvcp140.dll
./pkgs/vs2015_runtime-14.15.26706-h3a45250_4/Library/bin/msvcp140_1.dll
./pkgs/vs2015_runtime-14.15.26706-h3a45250_4/Library/bin/msvcp140_2.dll
./pkgs/vs2015_runtime-14.15.26706-h3a45250_4/msvcp140.dll
./pkgs/vs2015_runtime-14.15.26706-h3a45250_4/msvcp140_1.dll
./pkgs/vs2015_runtime-14.15.26706-h3a45250_4/msvcp140_2.dll
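The Cygwin `find` runs above can also be reproduced with the Python standard library, which avoids needing Cygwin on a fresh machine. A minimal sketch; the helper name and the default `msvcp140*.dll` pattern are assumptions (matching is case-insensitive on Windows but case-sensitive elsewhere):

```python
import sys
from pathlib import Path
from typing import List


def find_runtime_dlls(root: str, pattern: str = "msvcp140*.dll") -> List[str]:
    """Recursively list files under root matching pattern, as relative POSIX paths."""
    base = Path(root)
    return sorted(p.relative_to(base).as_posix() for p in base.rglob(pattern))


if __name__ == "__main__":
    # e.g. python find_runtime_dlls.py C:\ProgramData\Miniconda3
    for rel in find_runtime_dlls(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("./" + rel)
```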

In addition, I remember that conda automatically installs the vs2015_runtime package too, although I am not sure which package depends on it.

> conda install pyarrow -c conda-forge
...
The following NEW packages will be INSTALLED:

  arrow-cpp          conda-forge/win-64::arrow-cpp-0.14.1-py37h1b0c03e_0
  boost-cpp          conda-forge/win-64::boost-cpp-1.70.0-h6a4c333_2
  brotli             conda-forge/win-64::brotli-1.0.7-he025d50_1000
  c-ares             conda-forge/win-64::c-ares-1.15.0-h2fa13f4_1001
  ca-certificates    conda-forge/win-64::ca-certificates-2019.6.16-hecc5488_0
  certifi            conda-forge/win-64::certifi-2019.6.16-py37_1
  double-conversion  conda-forge/win-64::double-conversion-3.1.5-h6538335_1
  gflags             conda-forge/win-64::gflags-2.2.2-he025d50_1001
  glog               conda-forge/win-64::glog-0.4.0-he025d50_1
  grpc-cpp           conda-forge/win-64::grpc-cpp-1.23.0-h4d7d3fa_0
  intel-openmp       pkgs/main/win-64::intel-openmp-2019.4-245
  libblas            conda-forge/win-64::libblas-3.8.0-12_mkl
  libcblas           conda-forge/win-64::libcblas-3.8.0-12_mkl
  liblapack          conda-forge/win-64::liblapack-3.8.0-12_mkl
  libprotobuf        conda-forge/win-64::libprotobuf-3.8.0-h1a1b453_0
  lz4-c              conda-forge/win-64::lz4-c-1.8.3-he025d50_1001
  mkl                pkgs/main/win-64::mkl-2019.4-245
  numpy              conda-forge/win-64::numpy-1.17.1-py37hc71023c_0
  openssl            conda-forge/win-64::openssl-1.1.1c-hfa6e2cd_0
  pandas             conda-forge/win-64::pandas-0.25.1-py37he350917_0
  parquet-cpp        conda-forge/noarch::parquet-cpp-1.5.1-2
  pip                conda-forge/win-64::pip-19.2.3-py37_0
  pyarrow            conda-forge/win-64::pyarrow-0.14.1-py37h803c963_0
  python             conda-forge/win-64::python-3.7.3-h510b542_1
  python-dateutil    conda-forge/noarch::python-dateutil-2.8.0-py_0
  pytz               conda-forge/noarch::pytz-2019.2-py_0
  re2                conda-forge/win-64::re2-2019.08.01-vc14h6538335_0
  setuptools         conda-forge/win-64::setuptools-41.2.0-py37_0
  six                conda-forge/win-64::six-1.12.0-py37_1000
  snappy             conda-forge/win-64::snappy-1.1.7-h6538335_1002
  sqlite             conda-forge/win-64::sqlite-3.29.0-hfa6e2cd_1
  thrift-cpp         conda-forge/win-64::thrift-cpp-0.12.0-hd042d19_1004
  uriparser          conda-forge/win-64::uriparser-0.9.3-he025d50_1
  vc                 pkgs/main/win-64::vc-14.1-h0510ff6_4
  vs2015_runtime     pkgs/main/win-64::vs2015_runtime-14.15.26706-h3a45250_4
  wheel              conda-forge/win-64::wheel-0.33.6-py37_0
  wincertstore       conda-forge/win-64::wincertstore-0.2-py37_1002
  xz                 conda-forge/win-64::xz-5.2.4-h2fa13f4_1001
  zlib               conda-forge/win-64::zlib-1.2.11-h2fa13f4_1005
  zstd               conda-forge/win-64::zstd-1.4.0-hd8a0e53_0
...

@asfimport

Kazuaki Ishizaki / @kiszk:
I think that numba (more precisely, llvmlite) has the same problem regarding the CRT DLLs.
A DLL load error occurs before running vc_redist.x64.exe, and the error disappears after running it.

> cd \numba\python-3.7.4-embed-amd64
> python -m pip install numba
Collecting numba
...
Installing collected packages: llvmlite, numba
Successfully installed llvmlite-0.29.0 numba-0.45.1

> \cygwin64\bin\find . -name "msv*"
./Lib/site-packages/numpy/distutils/msvc9compiler.py
./Lib/site-packages/numpy/distutils/msvccompiler.py
./Lib/site-packages/numpy/distutils/__pycache__/msvc9compiler.cpython-37.pyc
./Lib/site-packages/numpy/distutils/__pycache__/msvccompiler.cpython-37.pyc
./Lib/site-packages/setuptools/msvc.py
./Lib/site-packages/setuptools/__pycache__/msvc.cpython-37.pyc

> python -c "import numpy"
> \cygwin64\bin\find . -name "msv*"
./Lib/site-packages/numpy/distutils/msvc9compiler.py
./Lib/site-packages/numpy/distutils/msvccompiler.py
./Lib/site-packages/numpy/distutils/__pycache__/msvc9compiler.cpython-37.pyc
./Lib/site-packages/numpy/distutils/__pycache__/msvccompiler.cpython-37.pyc
./Lib/site-packages/setuptools/msvc.py
./Lib/site-packages/setuptools/__pycache__/msvc.cpython-37.pyc
> type Lib\site-packages\numpy\LICENSE.txt
...
Name: Microsoft Visual C++ Runtime Files
Files: extra-dll\msvcp140.dll
License: MSVC
  https://www.visualstudio.com/license-terms/distributable-code-microsoft-visual-studio-2015-rc-microsoft-visual-studio-2015-sdk-rc-includes-utilities-buildserver-files/#visual-c-runtime

  Subject to the License Terms for the software, you may copy and
  distribute with your program any of the files within the followng
  folder and its subfolders except as noted below. You may not modify
  these files.

    C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\redist

  You may not distribute the contents of the following folders:

    C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\redist\debug_nonredist
    C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\redist\onecore\debug_nonredist

  Subject to the License Terms for the software, you may copy and
  distribute the following files with your program in your program's
  application local folder or by deploying them into the Global
  Assembly Cache (GAC):

  VC\atlmfc\lib\mfcmifc80.dll
  VC\atlmfc\lib\amd64\mfcmifc80.dll


Name: Microsoft Visual C++ Runtime Files
Files: extra-dll\msvc*90.dll, extra-dll\Microsoft.VC90.CRT.manifest
License: MSVC
  For your convenience, we have provided the following folders for
  use when redistributing VC++ runtime files. Subject to the license
  terms for the software, you may redistribute the folder
  (unmodified) in the application local folder as a sub-folder with
  no change to the folder name. You may also redistribute all the
  files (*.dll and *.manifest) within a folder, listed below the
  folder for your convenience, as an entire set.

  \VC\redist\x86\Microsoft.VC90.ATL\
   atl90.dll
   Microsoft.VC90.ATL.manifest
  \VC\redist\ia64\Microsoft.VC90.ATL\
   atl90.dll
   Microsoft.VC90.ATL.manifest
  \VC\redist\amd64\Microsoft.VC90.ATL\
   atl90.dll
   Microsoft.VC90.ATL.manifest
  \VC\redist\x86\Microsoft.VC90.CRT\
   msvcm90.dll
   msvcp90.dll
   msvcr90.dll
   Microsoft.VC90.CRT.manifest
  \VC\redist\ia64\Microsoft.VC90.CRT\
   msvcm90.dll
   msvcp90.dll
   msvcr90.dll
   Microsoft.VC90.CRT.manifest
...

> python -c "import numba"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\numba\python-3.7.4-embed-amd64\lib\site-packages\numba\__init__.py", line 15, in <module>
    from . import config, errors, _runtests as runtests, types
  File "C:\numba\python-3.7.4-embed-amd64\lib\site-packages\numba\config.py", line 18, in <module>
    import llvmlite.binding as ll
  File "C:\numba\python-3.7.4-embed-amd64\lib\site-packages\llvmlite\binding\__init__.py", line 6, in <module>
    from .dylib import *
  File "C:\numba\python-3.7.4-embed-amd64\lib\site-packages\llvmlite\binding\dylib.py", line 4, in <module>
    from . import ffi
  File "C:\numba\python-3.7.4-embed-amd64\lib\site-packages\llvmlite\binding\ffi.py", line 154, in <module>
    raise OSError("Could not load shared object file: {}".format(_lib_name))
OSError: Could not load shared object file: llvmlite.dll
> python -c "import llvmlite.binding as ll"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\numba\python-3.7.4-embed-amd64\lib\site-packages\llvmlite\binding\__init__.py", line 6, in <module>
    from .dylib import *
  File "C:\numba\python-3.7.4-embed-amd64\lib\site-packages\llvmlite\binding\dylib.py", line 4, in <module>
    from . import ffi
  File "C:\numba\python-3.7.4-embed-amd64\lib\site-packages\llvmlite\binding\ffi.py", line 154, in <module>
    raise OSError("Could not load shared object file: {}".format(_lib_name))
OSError: Could not load shared object file: llvmlite.dll
> bitsadmin /TRANSFER htmlget https://aka.ms/vs/16/release/vc_redist.x64.exe c:\numba\vc_redist.x64.exe
> ../vc_redist.x64.exe
> python -c "import llvmlite.binding as ll"
> python -c "import numba"
> 
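The transcript shows `llvmlite.dll` failing to load until the Visual C++ Redistributable is installed. As a rough pre-check before importing such a package, a minimal sketch can try loading the VC++ runtime DLLs directly with `ctypes`. The DLL list is an assumption (the real dependency set depends on how each extension module was built), and a passing check does not guarantee that the import will succeed.

```python
import ctypes
import sys

# Assumption: VC++ 2015-2019 (14.x) runtime DLLs that MSVC-built extension
# modules commonly depend on. The actual set depends on the build.
VC_RUNTIME_DLLS = ("vcruntime140.dll", "msvcp140.dll")


def missing_vc_runtime_dlls(names=VC_RUNTIME_DLLS):
    """Return the DLL names from `names` that Windows fails to load."""
    if sys.platform != "win32":
        # ctypes.WinDLL only exists on Windows; nothing to check elsewhere.
        return []
    missing = []
    for name in names:
        try:
            ctypes.WinDLL(name)  # raises OSError if the DLL cannot be loaded
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_vc_runtime_dlls()
    if missing:
        print("Missing VC++ runtime DLLs:", ", ".join(missing))
        print("Try installing the Visual C++ Redistributable (vc_redist.x64.exe).")
    else:
        print("No missing VC++ runtime DLLs detected.")
```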

@asfimport
Author

Wes McKinney / @wesm:
OK, let's resolve this issue by adding documentation about installing the Visual C++ Redistributable?

@asfimport
Author

Kazuaki Ishizaki / @kiszk:
IIUC, there is already a paragraph here that suggests installing the Visual C++ Redistributable. Do we need to update the link and add an example of a failure?

If you encounter any importing issues of the pip wheels on Windows, you may need to install the Visual C++ Redistributable for Visual Studio 2015.

Finally (beyond 0.15?), is bundling the DLLs the best solution? I am studying how to package wheels on Windows for the release, and it is taking me some time to find a way to bundle the DLLs. Any documentation on this would help me.
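One common pattern for bundling DLLs (sketched here as an assumption, not as what pyarrow actually does) is to ship them in a directory inside the package and register that directory at import time. `os.add_dll_directory` only exists on Python 3.8+, so the hypothetical helper below falls back to prepending the directory to `PATH` on older interpreters such as the Python 3.7 used in this thread. The `libs` subdirectory name is hypothetical.

```python
import os
import sys


def register_bundled_dll_dir(package_dir, subdir="libs"):
    """Hypothetical helper: make DLLs shipped in `package_dir/subdir` loadable.

    Returns the handle from os.add_dll_directory (Python 3.8+), the directory
    path if PATH was modified instead, or None if nothing was done.
    """
    dll_dir = os.path.join(package_dir, subdir)
    if sys.platform != "win32" or not os.path.isdir(dll_dir):
        return None
    if hasattr(os, "add_dll_directory"):
        # Python 3.8+: PATH is no longer searched for extension-module dependencies.
        return os.add_dll_directory(dll_dir)
    # Python <= 3.7: dependent DLLs are still resolved via PATH.
    os.environ["PATH"] = dll_dir + os.pathsep + os.environ.get("PATH", "")
    return dll_dir


# Typical use at the top of a package's __init__.py, before importing the
# compiled extension module:
# register_bundled_dll_dir(os.path.dirname(os.path.abspath(__file__)))
```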

@asfimport
Author

Wes McKinney / @wesm:
In the short term, we need to add documentation here:

https://github.com/apache/arrow/blob/master/python/README.md

I'm submitting a PR

@asfimport
Author

Wes McKinney / @wesm:
I submitted a documentation PR and removed this from the release milestone for now

@asfimport
Author

Kazuaki Ishizaki / @kiszk:
I see. I will investigate how to bundle DLLs beyond 0.15.

@asfimport
Author

Krisztian Szucs / @kszucs:
This should be resolved by #5404
We can confirm that it works during the 0.15 release verification.

@asfimport
Author

Sascha Hofmann / @saschahofmann:
We were observing the same error with a conda installation of pyarrow 0.15.0. Previously (on 0.14.1) everything ran without errors. After updating, the "DLL load failed" error started appearing. We finally fixed it by installing the Visual C++ Redistributable for Visual Studio 2019; neither the 2015 nor the 2017 version fixed the issue!

Edit: After restarting the machine, we are now seeing the error again. We are on Windows 10 and are using conda to install pyarrow.
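The 2015, 2017, and 2019 redistributables all ship the same runtime DLL names (`vcruntime140.dll`, `msvcp140.dll`), so what matters is which version is actually installed. A minimal sketch for checking it, assuming the registry key Microsoft documents for the VC++ 2015+ (14.x) x64 runtime; verify the key layout against Microsoft's documentation before relying on it.

```python
import sys

# Assumption: registry key documented by Microsoft for the VC++ 14.x x64 runtime.
RUNTIME_KEY = r"SOFTWARE\Microsoft\VisualStudio\14.0\VC\Runtimes\x64"


def installed_vc_runtime_version(key_path=RUNTIME_KEY):
    """Return the registered VC++ 14.x runtime version string, or None."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            version, _value_type = winreg.QueryValueEx(key, "Version")
    except OSError:
        # Key or value missing, e.g. the x64 runtime is not installed.
        return None
    return version


if __name__ == "__main__":
    print("Registered VC++ 14.x x64 runtime:", installed_vc_runtime_version())
```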

@asfimport
Author

Wes McKinney / @wesm:
This was resolved in 0.15.0.

Any conda issues are something else and should be diagnosed separately. Can you open an issue on the pyarrow-feedstock on conda-forge?

I ran

conda create -n pyarrow-conda-test python=3.7 pyarrow -c conda-forge

and couldn't reproduce the issue locally.
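When filing such a report, it helps to include the interpreter, platform, and the exact import error. A minimal sketch of a helper that collects these (hypothetical, not part of pyarrow or conda):

```python
import importlib
import platform
import sys


def import_report(module_name="pyarrow"):
    """Collect environment details and the outcome of importing `module_name`."""
    report = {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "platform": platform.platform(),
        "architecture": platform.architecture()[0],
    }
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:  # "DLL load failed" is raised as an ImportError
        report["import_ok"] = False
        report["error"] = "{}: {}".format(type(exc).__name__, exc)
    else:
        report["import_ok"] = True
        report["version"] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for field, value in import_report().items():
        print("{}: {}".format(field, value))
```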

@asfimport
Author

Sascha Hofmann / @saschahofmann:
OK, great. I will do that. I just found an issue there that is probably related to (or the same as) ours. Thanks!
