Segmentation fault while loading a json file #11044
After digging through the 1.4M lines of data with various debugging scripts, I found that there was a JSON row with:
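A scan like the one described can be sketched with the standard library. This is a minimal sketch, not the reporter's actual debugging script; the list-typed "null" field checked here is taken from the root-cause comment below, and the function name is an assumption:

```python
import json

def find_bad_rows(path, list_field="null"):
    """Scan a JSONL file line by line and report rows that either fail
    to parse or have `list_field` present but not as a list (the shape
    that turned out to trigger the crash in this issue)."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                bad.append((lineno, f"parse error: {exc}"))
                continue
            value = row.get(list_field)
            if list_field in row and not isinstance(value, list):
                bad.append((lineno, f"{list_field!r} is {type(value).__name__}, not list"))
    return bad
```

For a 1.4M-line file this streams one line at a time, so memory stays flat regardless of file size.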
This sounds like something that should, at least, generate a much better error message. Is there any chance you can use that info to create a small file that reproduces the error?
I tried a quick test and it seems to be handled correctly, trying with:
If you could provide further details to help reproduce the issue, that would be great.
Thank you so much for looking into the problem.
Thank you very much for giving us a reproducible test case. I don't know that it would have been obvious at all without it. https://issues.apache.org/jira/browse/ARROW-13871 The root cause was that a column needed to be a list (e.g. "null": ["46.0"]) while the final chunk of the file did not contain that column at all. It didn't appear with small files because they were read as a single chunk. It actually had nothing to do with the word "null" after all.
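The chunk-dependent nature of the bug can be illustrated without pyarrow: split the JSONL lines into fixed-size chunks and compare the set of keys each chunk observes. This is a conceptual sketch only, not Arrow's actual block-splitting logic; the chunk size and field names are illustrative assumptions:

```python
import json

def keys_per_chunk(lines, chunk_size):
    """Group JSONL lines into fixed-size chunks and return the set of
    keys seen in each chunk, mimicking how a chunked reader sees a
    per-block view of the schema rather than the whole file at once."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    return [set().union(*(json.loads(line).keys() for line in chunk))
            for chunk in chunks]

# A file where the list-typed "null" column vanishes from the tail:
rows = ['{"id": 1, "null": ["46.0"]}',
        '{"id": 2, "null": ["47.0"]}',
        '{"id": 3}',
        '{"id": 4}']
# Read as one chunk, every chunk sees the "null" column; read as two
# chunks, the final chunk lacks it entirely -- the situation that
# exposed ARROW-13871.
```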
Hi,
I am trying to load a ~300 MB, 1.4M-line file in JSONL format (one JSON object per line). It generates a segmentation fault. I saw some past issues mentioning this, and there were fixes merged, but I still see this issue with pyarrow 3.0.0/4.0.0/5.0.0.
When I try to load a subset of the file it works fine; with the complete data it fails.
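Since a subset loads fine while the full file crashes, a standard way to narrow down the trigger is to bisect the line range. A minimal sketch, where `loads_ok` is a hypothetical check you would supply (for a crash like this one it should run the real loader in a subprocess, since a segfault would otherwise take the driver script down with it):

```python
def bisect_failure(lines, loads_ok):
    """Find the smallest prefix of `lines` that still fails, assuming
    the full input fails and the empty input succeeds.
    `loads_ok(subset)` must return True when the loader handles
    `subset` without crashing."""
    lo, hi = 0, len(lines)  # invariant: lines[:lo] ok, lines[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if loads_ok(lines[:mid]):
            lo = mid
        else:
            hi = mid
    return hi  # length of the shortest failing prefix
```

Note that for a chunk-boundary bug like the one found here, the isolated line may only fail in context, so keep the surrounding lines when building a reproduction file.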
Thank you for any help you can offer.
Here is a sample of the data (I cannot share the full file since it is private to my company):
I am using Python 3.7.10
pyarrow 4.0.1, installed using conda (4.9.2) on a Debian 10 machine
Here is the tiny Python script:
Here is the stack trace from gdb:
Here are the memory and ulimit details: