ModuleNotFoundError: No module named 'pyarrow._parquet' #15417
Comments
Brecht Machiels / @brechtm:
$ python -c "import pyarrow.parquet"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\bmachie\AppData\Local\Continuum\Miniconda3\envs\ml_recommender\lib\site-packages\pyarrow\parquet.py", line 23, in <module>
    from pyarrow._parquet import (ParquetReader, FileMetaData, # noqa
ModuleNotFoundError: No module named 'pyarrow._parquet'

parquet-cpp is installed: 1.1.0 vc14_1 [vc14] conda-forge
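For anyone hitting the same traceback, a minimal sketch of how one might check whether the compiled `pyarrow._parquet` extension is present in the active environment without triggering the full import. The helper name `module_is_importable` is hypothetical and not part of pyarrow; it only relies on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def module_is_importable(name):
    """Return True if Python can locate the module `name`.

    Hypothetical helper for illustration. For a dotted name such as
    'pyarrow._parquet', find_spec imports the parent package first, so a
    missing parent raises ImportError, which is treated as "not found".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    for name in ("pyarrow", "pyarrow._parquet"):
        status = "found" if module_is_importable(name) else "MISSING"
        print(name, status)
```

If `pyarrow` is found but `pyarrow._parquet` is missing, that would suggest the installed package was built without the Parquet extension, which matches the error above.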
Brecht Machiels / @brechtm: I am still unable to open one particular single-file Parquet dataset, though. It fails with:
py4j.protocol.Py4JJavaError: An error occurred while calling o76.parquet.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 4, itsusraedlp08.jnj.com): java.io.IOException: Could not read footer: java.lang.RuntimeException: hdfs://<snipped>/<parquet_dir>/e4a415679f64f34-7ac06c0506c56aab_1260025109_data.0. is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [50, 51, 56, 10]
    at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:248)
    ...
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: hdfs://<snipped>/<parquet_dir>/e4a415679f64f34-7ac06c0506c56aab_1260025109_data.0. is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [50, 51, 56, 10]
    at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:423)
    ...
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more

It is possible to run queries against it using Impala, though.
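The byte values in the error are ASCII codes: [80, 65, 82, 49] is `PAR1`, the magic number a Parquet file ends with, while [50, 51, 56, 10] decodes to `238` followed by a newline, so the file in question appears to end in text rather than a Parquet footer. A minimal sketch of how one might check a file's tail by hand, assuming the file has first been copied from HDFS to the local filesystem; the helper names `read_tail` and `ends_with_parquet_magic` are hypothetical.

```python
import os

PARQUET_MAGIC = b"PAR1"  # bytes [80, 65, 82, 49]


def read_tail(path, length=4):
    """Return the last `length` bytes of a local file (b'' if it is shorter)."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < length:
            return b""
        f.seek(-length, os.SEEK_END)
        return f.read(length)


def ends_with_parquet_magic(path):
    """True if the file's last four bytes are the Parquet magic number."""
    return read_tail(path) == PARQUET_MAGIC


# Decode the two byte lists quoted in the Spark error message.
expected = bytes([80, 65, 82, 49])
found = bytes([50, 51, 56, 10])
print(expected, found)
```

A check like this only inspects the trailing magic number; it does not validate the footer metadata itself.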
Environment: Windows 7 64-bit, conda environment, Python 3.6.1
pyarrow: 0.3.0.post-np112py36_vc14_1 conda-forge [vc14]
Reporter: Brecht Machiels / @brechtm
Assignee: Wes McKinney / @wesm
Note: This issue was originally created as ARROW-1064. Please see the migration documentation for further details.