@mrtns
Contributor

@mrtns mrtns commented Sep 28, 2020

Restore the fix from #311, which appears to have been undone in b2e6e22.

Isolated (in plain pandas) repro:

import decimal
import pandas

x = pandas.DataFrame({
    'a_decimal_column': [decimal.Decimal(100.1)]
})
print(x.to_string())

                                    a_decimal_column
0  100.099999999999994315658113919198513031005859375
x.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   a_decimal_column  1 non-null      object
dtypes: object(1)
memory usage: 136.0+ bytes
x['a_decimal_column'].astype("string")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-231-9fb4add4f01e> in <module>
----> 1 x['a_decimal_column'].astype("string")

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5696         else:
   5697             # else, only a single dtype is given
-> 5698             new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors)
   5699             return self._constructor(new_data).__finalize__(self)
   5700 

/opt/conda/lib/python3.7/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    580 
    581     def astype(self, dtype, copy: bool = False, errors: str = "raise"):
--> 582         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    583 
    584     def convert(self, **kwargs):

/opt/conda/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, filter, **kwargs)
    440                 applied = b.apply(f, **kwargs)
    441             else:
--> 442                 applied = getattr(b, f)(**kwargs)
    443             result_blocks = _extend_blocks(applied, result_blocks)
    444 

/opt/conda/lib/python3.7/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    623             vals1d = values.ravel()
    624             try:
--> 625                 values = astype_nansafe(vals1d, dtype, copy=True)
    626             except (ValueError, TypeError):
    627                 # e.g. astype_nansafe can fail on object-dtype of strings

/opt/conda/lib/python3.7/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
    819     # dispatch on extension dtype if needed
    820     if is_extension_array_dtype(dtype):
--> 821         return dtype.construct_array_type()._from_sequence(arr, dtype=dtype, copy=copy)
    822 
    823     if not isinstance(dtype, np.dtype):

/opt/conda/lib/python3.7/site-packages/pandas/core/arrays/string_.py in _from_sequence(cls, scalars, dtype, copy)
    195             result[na_values] = StringDtype.na_value
    196 
--> 197         return cls(result)
    198 
    199     @classmethod

/opt/conda/lib/python3.7/site-packages/pandas/core/arrays/string_.py in __init__(self, values, copy)
    164         self._dtype = StringDtype()
    165         if not skip_validation:
--> 166             self._validate()
    167 
    168     def _validate(self):

/opt/conda/lib/python3.7/site-packages/pandas/core/arrays/string_.py in _validate(self)
    169         """Validate that we only store NA or strings."""
    170         if len(self._ndarray) and not lib.is_string_array(self._ndarray, skipna=True):
--> 171             raise ValueError("StringArray requires a sequence of strings or pandas.NA")
    172         if self._ndarray.dtype != "object":
    173             raise ValueError(

ValueError: StringArray requires a sequence of strings or pandas.NA
x['a_decimal_column'].astype("str")

0    100.099999999999994315658113919198513031005859375
Name: a_decimal_column, dtype: object
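
A side note on the long digit string in the output above: it comes from building the `Decimal` from a float literal, not from pandas. A minimal sketch using only the standard library (the variable names are illustrative):

```python
import decimal

# Built from a float: captures the exact binary value of the float 100.1.
from_float = decimal.Decimal(100.1)

# Built from a string: keeps the digits exactly as written.
from_text = decimal.Decimal("100.1")

print(from_float)  # 100.099999999999994315658113919198513031005859375
print(from_text)   # 100.1
```

Either form triggers the same `astype("string")` error on the pandas version used in this repro, since the column holds `Decimal` objects rather than strings in both cases.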

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@mrtns
Contributor Author

mrtns commented Sep 28, 2020

Not sure if this is necessary, but to get from dtype: object to dtype: string, one option could be:

x['a_decimal_column'].astype("str").astype("string")

0    100.099999999999994315658113919198513031005859375
Name: a_decimal_column, dtype: string
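
The two-step cast above could be wrapped in a small helper. This is only a sketch: `decimals_to_string_dtype` is a hypothetical name, not part of pandas or Wrangler, and it assumes the column has no missing values (going through `astype("str")` may turn them into literal text such as `nan`, depending on the pandas version).

```python
import decimal

import pandas as pd


def decimals_to_string_dtype(series: pd.Series) -> pd.Series:
    """Hypothetical helper: cast an object column of Decimals to the "string" dtype."""
    # astype("str") first converts every value to a Python str (dtype: object),
    # so the following astype("string") only has to validate strings.
    return series.astype("str").astype("string")


x = pd.DataFrame({"a_decimal_column": [decimal.Decimal("100.1")]})
converted = decimals_to_string_dtype(x["a_decimal_column"])
print(converted.dtype)
```

On pandas versions that already support the direct cast, the intermediate `astype("str")` should be redundant but harmless for columns without missing values.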

@igorborgest
Contributor

Hi @mrtns, thanks for reaching out!

The current Wrangler version was adapted to the current pandas version, which supports direct conversion from decimal to string.

Can you please check your pandas version?

igorborgest added a commit that referenced this pull request Sep 28, 2020
igorborgest added a commit that referenced this pull request Sep 30, 2020
@igorborgest igorborgest added this to the 1.9.6 milestone Oct 6, 2020
@igorborgest igorborgest self-assigned this Oct 6, 2020
@igorborgest igorborgest self-requested a review October 6, 2020 23:02
@igorborgest igorborgest added the "micro release" (will be addressed in the next micro release) and "ready to release" labels Oct 6, 2020
@igorborgest
Contributor

Hi @mrtns, thanks again for reporting it.

I've fixed this issue in the commit above.

Would you mind testing it from our development branch and checking whether it helps in your use case?
The idea is to publish this enhancement in version 1.9.6 next weekend.

pip install git+https://github.com/awslabs/aws-data-wrangler.git@dev

@igorborgest igorborgest closed this Oct 6, 2020
igorborgest added a commit that referenced this pull request Oct 10, 2020
@mrtns
Contributor Author

mrtns commented Oct 12, 2020

> Can you please check your pandas version?

Ah yes, I'm definitely on an older version:

pandas.__version__
'1.0.5'

> Would you mind testing it from our development branch and checking whether it helps in your use case?

Sorry for the delay on this. I've now tested with awswrangler-1.9.6 and can confirm that it pulls in pandas-1.1.3 and that I no longer encounter the exception. Thank you!
