[Python] Decimal conversion not working for NaN values #18112

Closed
asfimport opened this issue Feb 13, 2018 · 9 comments

asfimport commented Feb 13, 2018

import pyarrow as pa
import pandas as pd
import decimal

pa.Table.from_pandas(pd.DataFrame({'a': [decimal.Decimal('1.1'), decimal.Decimal('NaN')]}))

throws the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 875, in pyarrow.lib.Table.from_pandas (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:44927)
  File "/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 350, in dataframe_to_arrays
    convert_types)]
  File "/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 349, in <listcomp>
    for c, t in zip(columns_to_convert,
  File "/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 345, in convert_column
    return pa.array(col, from_pandas=True, type=ty)
  File "pyarrow/array.pxi", line 170, in pyarrow.lib.array (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:29224)
  File "pyarrow/array.pxi", line 70, in pyarrow.lib._ndarray_to_array (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:28465)
  File "pyarrow/error.pxi", line 98, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:9068)
pyarrow.lib.ArrowException: Unknown error: an integer is required (got type str)

The same problem occurs with other special decimal values such as infinity.
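
A possible explanation for the "an integer is required (got type str)" message, offered as an assumption rather than something confirmed in this thread: for special values, `Decimal.as_tuple()` reports the exponent as a string marker instead of an integer. The standard-library sketch below only inspects that tuple and does not call pyarrow.

```python
import decimal

# Print the exponent type for a finite value and for two special values.
for text in ("1.1", "NaN", "Infinity"):
    sign, digits, exponent = decimal.Decimal(text).as_tuple()
    print(text, type(exponent).__name__, repr(exponent))

# Keep the exponents around so they can be inspected directly.
finite_exponent = decimal.Decimal("1.1").as_tuple().exponent
nan_exponent = decimal.Decimal("NaN").as_tuple().exponent
inf_exponent = decimal.Decimal("Infinity").as_tuple().exponent
```

Here the finite value has the integer exponent -1, while NaN and infinity carry the string markers 'n' and 'F'. Any converter that treats the exponent as an integer would need to handle these cases separately.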

Reporter: Antony Mayi / @antonymayi
Assignee: Phillip Cloud / @cpcloud

Note: This issue was originally created as ARROW-2145. Please see the migration documentation for further details.

Phillip Cloud / @cpcloud:
Thanks for the report, taking a look now.

Phillip Cloud / @cpcloud:
@antonymayi Do you have a specific use case for this, or were you tinkering around and trying a few things?

Antony Mayi / @antonymayi:
I am trying to use it in a real system where an event with zero value (decimal.Decimal('0')) is distinct from no event (decimal.Decimal('nan')), and both cases need to be stored. Being able to store a decimal NaN in the same column spares me from introducing another column just to flag the no-event cases.

This is the same situation as with floats: nan/inf is a valid float value (and is supported by pyarrow/parquet), but here I need to use decimal because of the precision...

Wes McKinney / @wesm:
Can you use null instead?

Antony Mayi / @antonymayi:

Can you use null instead?

I guess the current implementation looks at the first cell of the column, finds an instance of Decimal, and then expects the whole column to contain only decimals. So the answer is no, at least based on this observation:

>>> pa.Table.from_pandas(pd.DataFrame({'a': [decimal.Decimal('1.1'), None]}))
...
pyarrow.lib.ArrowException: Unknown error: 'NoneType' object has no attribute 'as_tuple'

or for numpy.nan:

>>> pa.Table.from_pandas(pd.DataFrame({'a': [decimal.Decimal('1.1'), np.nan]}))
...
pyarrow.lib.ArrowException: Unknown error: 'float' object has no attribute 'as_tuple'
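
The two messages above match what plain Python reports when `as_tuple()` is called on an object that is not a `Decimal`, which is consistent with the guess that the converter calls it on every cell. The sketch below reproduces the messages with the standard library only; `as_tuple_error` is a hypothetical helper written for this illustration, and `float("nan")` stands in for `np.nan`, which is an ordinary Python float.

```python
import decimal


def as_tuple_error(value):
    """Return the AttributeError text raised by value.as_tuple(), or None."""
    try:
        value.as_tuple()
    except AttributeError as exc:
        return str(exc)
    return None


none_message = as_tuple_error(None)
float_message = as_tuple_error(float("nan"))
decimal_message = as_tuple_error(decimal.Decimal("1.1"))

print(none_message)
print(float_message)
print(decimal_message)
```

Only the real `Decimal` passes through without an error, so a converter would have to check for missing-value markers before calling `as_tuple()`.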

Wes McKinney / @wesm:
Those both look like bugs that should be fixed. If those worked (or if Decimal('NaN') became null), would that work?

Phillip Cloud / @cpcloud:
Both are definitely bugs, working on a fix.

Antony Mayi / @antonymayi:
Yes, if null were accepted that would solve my use case perfectly (in the same way Decimal('NaN') would, but I guess null is even better as arrow is sparse for nulls while NaN takes space...)
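
For anyone on a pyarrow version without the fix, one possible workaround is to replace every missing-value marker with None before building the column. This is a minimal sketch under stated assumptions: `normalize_missing` is a hypothetical helper, not part of pyarrow or pandas, and it also maps infinite decimals to None, which may not suit every use case. Whether the cleaned list then converts correctly depends on the pyarrow version, so the conversion itself is left out.

```python
import decimal
import math


def normalize_missing(values):
    """Return a new list where non-finite decimals and float NaN become None."""
    cleaned = []
    for value in values:
        if value is None:
            cleaned.append(None)
        elif isinstance(value, decimal.Decimal):
            # is_finite() is False for NaN, sNaN and +/-Infinity.
            cleaned.append(value if value.is_finite() else None)
        elif isinstance(value, float) and math.isnan(value):
            cleaned.append(None)
        else:
            cleaned.append(value)
    return cleaned


column = [
    decimal.Decimal("0"),    # an event with zero value
    decimal.Decimal("NaN"),  # no event, encoded as decimal NaN
    float("nan"),            # no event, encoded as float NaN
    None,                    # no event, already null
    decimal.Decimal("1.1"),
]
cleaned = normalize_missing(column)
print(cleaned)
```

The zero-valued event stays a real `Decimal('0')`, so it remains distinct from the no-event cases, which all collapse to None.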

Wes McKinney / @wesm:
Issue resolved by pull request 1651
#1651
