[Python] Regression: segfault when reading hive table with v0.14 #16812

Closed
asfimport opened this issue Jul 17, 2019 · 7 comments

asfimport commented Jul 17, 2019

I'm working with pyarrow on a Cloudera cluster (CDH 6.1.1), with pyarrow installed in a conda env.

The data I'm reading is a partitioned Hive(-registered) table written as Parquet. With v0.13, reading this table does not cause any issues.

The code that worked before and now crashes with v0.14 is simply:

import pyarrow.parquet as pq
pq.ParquetDataset('hdfs:///data/raw/source/table').read()

Since it completely crashes my notebook (in the REPL, the process ends with "Killed"), I cannot report much more, but this is a pretty severe usability restriction. So far the only workaround is to pin pyarrow<0.14.

Reporter: H. Vetinari

Related issues:

Note: This issue was originally created as ARROW-5965. Please see the migration documentation for further details.

Neal Richardson / @nealrichardson:
Thanks for the report. A few questions:

  1. Is this reproducible if you try again with the same file? (I wonder if "Killed" means OOM and not segfault)
  2. Could you provide a (preferably as small as possible) Parquet file that triggers this behavior? I think we'll need that in order to identify and fix any issues.
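
A minimal sketch of how the two cases in question 1 might be told apart, assuming a POSIX system and that the failing code can be run as a standalone script (the `fail.py` name in the usage comment is a placeholder). It relies only on the fact that Python's `subprocess` reports death by signal as a negative return code. `SIGKILL` is what the kernel OOM killer sends, but other things can send it too, so this is a hint rather than proof. The helper names `describe_exit` and `run_and_describe` are made up for illustration.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a human-readable cause."""
    if returncode == 0:
        return "exited normally"
    if returncode > 0:
        return f"exited with status {returncode}"
    # On POSIX, subprocess reports "killed by signal N" as -N.
    try:
        sig = signal.Signals(-returncode)
    except ValueError:
        return f"killed by unknown signal {-returncode}"
    if sig == signal.SIGKILL:
        return "killed by SIGKILL (possibly the kernel OOM killer)"
    if sig == signal.SIGSEGV:
        return "killed by SIGSEGV (segmentation fault)"
    return f"killed by {sig.name}"


def run_and_describe(argv):
    """Run argv as a child process and describe how it ended."""
    completed = subprocess.run(argv)
    return describe_exit(completed.returncode)


# Hypothetical usage, with fail.py containing the failing read:
# print(run_and_describe([sys.executable, "fail.py"]))
```

If the result points at `SIGKILL`, the kernel log on the machine may also show whether the OOM killer was involved.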

H. Vetinari:
Hey Neal,

I tried a couple of times before filing the report, and all (~5) invocations on 0.14 crashed, and all invocations on 0.13 worked. The machine itself has lots of memory, so I don't think it's that. Not sure I'll be able to pare this down to a minimal reproducing parquet file. I'll try.
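
One way to put a number on the memory question, sketched under the assumption of a Unix system (the `resource` module is not available on Windows) and of a script that can be run in a child process. The helper name `peak_child_rss` is hypothetical. Note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS, and that `RUSAGE_CHILDREN` covers all children that have been waited for, so it is best to run one child per measuring process.

```python
import resource
import subprocess
import sys


def peak_child_rss(argv):
    """Run argv to completion and return (returncode, peak RSS of children).

    The second value is ru_maxrss for all waited-for children of this
    process: kilobytes on Linux, bytes on macOS.
    """
    completed = subprocess.run(argv)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


# Hypothetical usage, comparing the same script under two environments:
# rc, peak = peak_child_rss([sys.executable, "fail.py"])
# print(rc, peak)
```

Running this once in an environment with v0.13 and once with v0.14 would show whether peak memory differs substantially between the two versions, even when the child ends up being killed.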

Wes McKinney / @wesm:
A gdb backtrace would help us a lot. Do you know how to get one?

H. Vetinari:
@wesm
Would like to provide it, but would only be able to install through conda (which has a hole in the firewall).
Unfortunately,
# conda install pyarrow=0.14 gdb
Collecting package metadata (current_repodata.json): done
Solving environment: failed
Collecting package metadata (repodata.json): done
Solving environment: failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

- pip -> python[version='>=3.7,<3.8.0a0']

which, I believe, is because gdb has not yet been built for Python 3.7. (Just as I was preparing this message, though, I triggered a rerender there, which has caused some further action and the first passing 3.7 build; it is not yet merged because 2.7 is failing.)

In the meantime I tried downgrading my whole environment to 3.6, where the program also crashes or hangs on v0.14. However, I haven't yet been able to get a gdb output. Might need some more reading of the GDB manual...

Wes McKinney / @wesm:
Note: I linked this with ARROW-2652, since many users aren't familiar with producing gdb backtraces from Python programs.

H. Vetinari:
@wesm
Thanks for the tips. Unfortunately, I can't follow that example because the code does not generate a core dump; it only prints "Killed". I found some ways to run it in gdb that should work (as best I can tell), like gdb -ex r --args python fail.py or interactively:
gdb python
(gdb) run fail.py

but I always get:
[...]
warning: Could not trace the inferior process
Error:
warning: ptrace: Operation not permitted
During startup program exited with code 127.

Not sure if that's a mistake on my side or something in the setup/interplay of conda-gdb.
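
When ptrace is not permitted, one alternative that does not need a debugger is Python's standard-library `faulthandler`, which nobody in this thread suggested; it is offered here only as a sketch. It prints the Python-level traceback (not the C-level backtrace that gdb would give) when the process receives `SIGSEGV`, `SIGFPE`, `SIGABRT`, `SIGBUS` or `SIGILL`. It cannot intercept `SIGKILL`, so it would print nothing if the process is being killed for memory reasons. The helper name `run_with_faulthandler` is made up for illustration.

```python
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run a Python script in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). If the child dies from a fatal signal
    such as SIGSEGV, stderr holds the Python traceback dumped by
    faulthandler. Nothing is dumped for SIGKILL, which cannot be caught.
    """
    completed = subprocess.run(
        [sys.executable, "-X", "faulthandler", *script_args],
        stderr=subprocess.PIPE,
        text=True,
    )
    return completed.returncode, completed.stderr


# Hypothetical usage, with fail.py containing the failing read:
# rc, err = run_with_faulthandler(["fail.py"])
# print(rc)
# print(err)
```

An empty traceback combined with a return code of -9 would be consistent with the process being killed rather than segfaulting.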

Wes McKinney / @wesm:
I'm guessing this is a dup of the memory issue from ARROW-6060. If you obtain a repro or additional information suggesting it's not a memory problem, please reopen.
