ModuleNotFoundError: No module named 'pyarrow._parquet' #15417

asfimport · 2017-05-23T09:38:30Z

$ python
Python 3.6.1 |Continuum Analytics, Inc.| (default, Mar 22 2017, 20:11:04) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\bmachie\AppData\Local\Continuum\Miniconda3\envs\ml_recommender\lib\site-packages\pyarrow\__init__.py", line 28, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: DLL load failed: The specified procedure could not be found.
>>>

Environment: Windows 7 64-bit, conda environment, Python 3.6.1
pyarrow: 0.3.0.post-np112py36_vc14_1 conda-forge [vc14]
Reporter: Brecht Machiels / @brechtm
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-1064. Please see the migration documentation for further details.

asfimport · 2017-05-23T10:03:53Z

Brecht Machiels / @brechtm:
It seems I had an older version (0.3.pre-np112py36_vc14_0 conda-forge [vc14]) of arrow-cpp installed. After upgrading to the current version (0.3.0.post-np112py36_vc14_1 conda-forge [vc14]), "import pyarrow" works, but "import pyarrow.parquet" fails:

$ python -c "import pyarrow.parquet"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\bmachie\AppData\Local\Continuum\Miniconda3\envs\ml_recommender\lib\site-packages\pyarrow\parquet.py", line 23, in <module>
    from pyarrow._parquet import (ParquetReader, FileMetaData,  # noqa
ModuleNotFoundError: No module named 'pyarrow._parquet'

parquet-cpp is installed: 1.1.0 vc14_1 [vc14] conda-forge
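
Editor's sketch: the two tracebacks above are different failure modes. In the first, the package cannot be loaded at all; in the second, a single compiled submodule is missing from the install. A minimal way to tell them apart without triggering the full import is to ask the import system whether the submodule can be located. The helper name has_submodule is hypothetical and not part of pyarrow; only the standard library is used.

```python
import importlib.util


def has_submodule(package: str, submodule: str) -> bool:
    """Return True if `package.submodule` can be located.

    The parent package is imported as a side effect, but the
    submodule itself is only looked up, not executed.
    """
    try:
        spec = importlib.util.find_spec(f"{package}.{submodule}")
    except ImportError:
        # The parent package is missing or failed to import
        # (ModuleNotFoundError is a subclass of ImportError).
        return False
    return spec is not None


if __name__ == "__main__":
    # "lib" and "_parquet" are the two modules named in the tracebacks.
    for name in ("lib", "_parquet"):
        status = "found" if has_submodule("pyarrow", name) else "missing"
        print(f"pyarrow.{name}: {status}")
```

If pyarrow.lib is found but pyarrow._parquet is not, the installed build was most likely packaged without Parquet support, which matches the second traceback.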

asfimport · 2017-05-23T12:21:21Z

Wes McKinney / @wesm:
This should be resolved in the next 24 hours; we are in the process of making a release

asfimport · 2017-05-23T12:25:44Z

Brecht Machiels / @brechtm:
Great! And thank you for providing Windows conda packages!

asfimport · 2017-05-24T01:28:12Z

Wes McKinney / @wesm:
Can you check out the updated conda packages and let me know if all is working?

asfimport · 2017-05-24T08:00:08Z

Brecht Machiels / @brechtm:
Yes, 0.4.0 seems to be working. I can perform the import and parse parquet files now. Importing parquet datasets consisting of multiple files but with missing _metadata doesn't seem to be possible, but I don't suppose that is a bug.

asfimport · 2017-05-24T17:50:47Z

Wes McKinney / @wesm:
That sounds buggy to me. Could you open a new JIRA?

asfimport · 2017-05-30T07:14:35Z

Brecht Machiels / @brechtm:
I did eventually get it to open the set of parquet files with missing _metadata file by removing an empty directory ("_impala_insert_staging") that was in the same directory.

I am still not able to open a particular single-file parquet dataset though. It fails with "ArrowIOError: IOError: Invalid parquet file. Corrupt footer." It cannot be opened by fastparquet either. Trying to load it using PySpark fails with a similar error, so there must be something wrong with it:

py4j.protocol.Py4JJavaError: An error occurred while calling o76.parquet.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 4, itsusraedlp08.jnj.com): java.io.IOException: Could not read footer: java.lang.RuntimeException: hdfs://<snipped>/<parquet_dir>/e4a415679f64f34-7ac06c0506c56aab_1260025109_data.0. is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [50, 51, 56, 10]
        at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:248)
        ...
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: hdfs://<snipped>/<parquet_dir>/e4a415679f64f34-7ac06c0506c56aab_1260025109_data.0. is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [50, 51, 56, 10]
        at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:423)
        ...
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more

It is possible to run queries against it using Impala though.

asfimport · 2017-05-30T07:59:05Z

Brecht Machiels / @brechtm:
Never mind the comment about the parquet file with the corrupt footer. Turns out it is a CSV file :-)
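
Editor's sketch: the Spark error above already hints at this. The expected tail bytes [80, 65, 82, 49] spell "PAR1", the Parquet magic number, while the bytes actually found, [50, 51, 56, 10], are the text "238" followed by a newline, which is what the end of a CSV row looks like. A check along these lines can be run locally before handing a file to a Parquet reader. The function name looks_like_parquet is hypothetical, and the sketch assumes a regular Parquet file without an encrypted footer.

```python
PARQUET_MAGIC = b"PAR1"  # the bytes [80, 65, 82, 49] from the Spark error


def looks_like_parquet(path: str) -> bool:
    """Return True if the file starts and ends with the Parquet magic bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(0, 2)  # jump to the end to learn the file size
        if f.tell() < 8:
            # Too short to hold both a leading and a trailing magic number.
            return False
        f.seek(-4, 2)  # last four bytes of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

This only inspects the magic numbers; a file that passes can still have a damaged footer or damaged data pages.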

asfimport · 2017-05-31T22:04:17Z

Wes McKinney / @wesm:
I created ARROW-1079 about the empty directory issue
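
Editor's sketch: until the empty-directory issue is fixed, one possible workaround is to enumerate the data files yourself and skip entries such as "_impala_insert_staging". The snippet below assumes the common Hadoop convention that names starting with "_" or "." are not data; that convention and the helper name list_data_files are assumptions for illustration, not the behavior of pyarrow or the fix tracked in ARROW-1079.

```python
import os
from typing import List


def list_data_files(root: str) -> List[str]:
    """Collect file paths under `root`, skipping '_' and '.' prefixed entries."""
    hidden_prefixes = ("_", ".")
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune staging/hidden directories in place so os.walk
        # does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(hidden_prefixes)]
        for name in filenames:
            # Also skips sidecar files such as _metadata or _SUCCESS.
            if not name.startswith(hidden_prefixes):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

The resulting list can then be read file by file with whichever Parquet reader is available.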

asfimport closed this as completed May 24, 2017

asfimport assigned wesm Jan 10, 2023
