[Python] Data type conversion error #19329

Closed
asfimport opened this issue Aug 3, 2018 · 4 comments

@asfimport

I have a big pandas dataframe. When I try to convert it to a pyarrow table, it fails with a conversion error. I'm not sure whether this is a bug or expected behavior.

I realize the code below showing the error is pretty useless as is. What can I do to help identify the cause in my pandas dataframe?

Here's the error:

 

In [17]: pa.Table.from_pandas(df)
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
<ipython-input-17-6eac5d0eec08> in <module>()
----> 1 pa.Table.from_pandas(df)

table.pxi in pyarrow.lib.Table.from_pandas()

~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads)
375 arrays = list(executor.map(convert_column,
376 columns_to_convert,
--> 377 convert_types))
378 
379 types = [x.type for x in arrays]

~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result_iterator()
584 # Careful not to keep a reference to the popped future
585 if timeout is None:
--> 586 yield fs.pop().result()
587 else:
588 yield fs.pop().result(end_time - time.time())

~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
423 raise CancelledError()
424 elif self._state == FINISHED:
--> 425 return self.__get_result()
426 
427 self._condition.wait(timeout)

~/anaconda3/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result

~/anaconda3/lib/python3.6/concurrent/futures/thread.py in run(self)
54 
55 try:
---> 56 result = self.fn(*self.args, **self.kwargs)
57 except BaseException as exc:
58 self.future.set_exception(exc)

~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in convert_column(col, ty)
364 
365 def convert_column(col, ty):
--> 366 return pa.array(col, from_pandas=True, type=ty)
367 
368 if nthreads == 1:

array.pxi in pyarrow.lib.array()

error.pxi in pyarrow.lib.check_status()

error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Error converting from Python objects to Double: Got Python object of type str but can only handle these types: float

In [18]: pa.__version__
Out[18]: '0.9.0'

In [19]: pd.__version__
Out[19]: '0.23.3'

 

Environment: linux
Reporter: Christopher Brooks
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-2966. Please see the migration documentation for further details.

@asfimport

Uwe Korn / @xhochy:
You have a column that contains mixed Python objects of type float and str. The float objects are found first, so the column is inferred as float, but we cannot currently convert str -> float automatically.
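To narrow down which column is affected, one option is to count the Python types that appear in each column. The following is only a minimal sketch, not part of pyarrow: the helper names `type_counts` and `mixed_type_columns` and the `sample` data are made up for illustration. It takes any iterable of (name, values) pairs, so it should accept `df.items()` from a pandas dataframe as well as a plain dict.

```python
from collections import Counter


def type_counts(values):
    """Count the Python types of the values in one column."""
    return Counter(type(v).__name__ for v in values)


def mixed_type_columns(named_columns):
    """Return {column name: type counts} for columns holding more than one type.

    `named_columns` is any iterable of (name, values) pairs, for example
    `df.items()` for a pandas DataFrame or `some_dict.items()` for a dict.
    """
    report = {}
    for name, values in named_columns:
        counts = type_counts(values)
        if len(counts) > 1:
            report[name] = dict(counts)
    return report


# Hypothetical data standing in for the reporter's dataframe.
sample = {
    "grade": [3.5, 4.0, "n/a", 2.0],
    "student": ["a", "b", "c", "d"],
}
print(mixed_type_columns(sample.items()))  # {'grade': {'float': 3, 'str': 1}}
```

On a large dataframe it is probably enough to check only the object-dtype columns, for example by passing `df.select_dtypes(include="object").items()`, since iterating every value in Python is slow.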

@asfimport

Wes McKinney / @wesm:
I'm in the midst of refactoring this code path in ARROW-2814. I made a note to add more informative error output for this case, to show both the type and the string repr of the invalid value.

In the future it would be useful to treat unconvertible values as null: ARROW-2967
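Until something like that exists in the library, the same idea can be applied by hand before conversion. This is a rough sketch of a user-side workaround, not the behavior proposed in ARROW-2967; `to_float_or_none` and `coerce_column` are hypothetical helper names.

```python
def to_float_or_none(value):
    """Return value as a float, or None if it cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def coerce_column(values):
    """Convert every value to float, replacing unconvertible ones with None."""
    return [to_float_or_none(v) for v in values]


# Hypothetical column mixing floats, numeric strings, and junk.
print(coerce_column([3.5, "4.0", "n/a", None]))  # [3.5, 4.0, None, None]
```

With pandas, `pd.to_numeric(df[col], errors="coerce")` does roughly the same thing on a whole column and yields NaN for the values it cannot parse.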

@asfimport

Wes McKinney / @wesm:
[~brooksch] In the next version of pyarrow (this didn't quite make it into 0.10.0), the exception will show you the offending value and its data type.
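On versions without the improved message, similar information can be recovered by scanning the suspect column yourself. A minimal sketch, assuming the column is expected to hold only floats; `first_non_float` is a hypothetical helper and does not reproduce pyarrow's actual error output.

```python
def first_non_float(values):
    """Return (position, type name, repr) of the first non-float value, or None.

    None entries are skipped on the assumption that they are meant as nulls.
    """
    for position, value in enumerate(values):
        if value is None:
            continue
        if not isinstance(value, float):
            return position, type(value).__name__, repr(value)
    return None


# Hypothetical column with one stray string.
print(first_non_float([3.5, None, "n/a", 2.0]))  # (2, 'str', "'n/a'")
```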

@asfimport

Wes McKinney / @wesm:
Resolved in ARROW-2814
