[Python] Error inferring List type in Array.from_pandas when inner values are all None #17309

Closed
asfimport opened this issue Aug 1, 2017 · 7 comments


@asfimport

I have an Avro file in HDFS that I am reading in using fastavro and converting to a pandas DataFrame. When I then try to create an Arrow table from it, I get an error:

table=pyarrow.Table.from_pandas(my_dataframe)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyarrow/table.pxi", line 746, in pyarrow.lib.Table.from_pandas (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:34089)
File "pyarrow/table.pxi", line 346, in pyarrow.lib._dataframe_to_arrays (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:30476)
File "pyarrow/array.pxi", line 182, in pyarrow.lib.Array.from_pandas (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:22110)
File "pyarrow/error.pxi", line 66, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:7702)
pyarrow.lib.ArrowNotImplementedError: NotImplemented: null

The Avro schema does allow null values in some fields. Is this not implemented? I am using pyarrow 0.5.0. Also, for what I am doing I am not really using pandas at all: I read in the Avro file, which gives me a list of dicts, and what I really want is to write them to disk in Parquet format. I am going through these steps (which isn't optimal, but may be necessary unless I write more code of my own).

thanks,
Luke

Environment: centos 7.3
Reporter: Luke Higgins
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-1309. Please see the migration documentation for further details.

@asfimport

Wes McKinney / @wesm:
Thanks [~virtualluke]. Any chance you can show the input data that triggered this error? There should be a single column in the data frame that is causing the problem (it's getting passed to pyarrow.Array.from_pandas).

If it's not possible to fix this immediately, we would definitely want to make the error message more informative than that.

@asfimport

Wes McKinney / @wesm:
Following up on this, [~virtualluke] – I would like to dig in and fix this before 0.6.0 goes out next week, let me know

@asfimport

Luke Higgins:
I'm sorry, I can't get back to this until the 15th. There were columns whose only value was None; is that the issue? Or is it the optional fields in Avro that are causing it? There were probably also columns that were mostly empty but had a few values filled in. More when I can on the 15th. Sorry for the delay.

@asfimport

Wes McKinney / @wesm:
OK, I'll poke around and try to repro on my own

@asfimport

Wes McKinney / @wesm:
Here's the repro; one of the columns is an array type and has all nulls

In [8]: pa.Array.from_pandas(np.array([[None, None], None], dtype=object))
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
<ipython-input-8-c3bdca54ea4c> in <module>()
----> 1 pa.Array.from_pandas(np.array([[None, None], None], dtype=object))

~/code/arrow/python/pyarrow/array.pxi in pyarrow.lib.Array.from_pandas (/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/lib.cxx:22323)()
    174                 c_type = type.sp_type
    175             with nogil:
--> 176                 check_status(PandasObjectsToArrow(
    177                     pool, values, mask, c_type, &chunked_out))
    178 

~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status (/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/lib.cxx:7711)()
     76             raise ArrowKeyError(message)
     77         elif status.IsNotImplemented():
---> 78             raise ArrowNotImplementedError(message)
     79         elif status.IsTypeError():
     80             raise ArrowTypeError(message)

ArrowNotImplementedError: NotImplemented: null

@asfimport

Wes McKinney / @wesm:
PR: #947

@asfimport

Wes McKinney / @wesm:
Issue resolved by pull request #947
