[Python] write csv decimal cast error #33002

Closed
asfimport opened this issue Sep 19, 2022 · 4 comments

asfimport commented Sep 19, 2022

Hi, when I try to write a table that has any field of type Decimal128, Arrow raises this error:

In [136]: ds.write_dataset(table, "data", format="csv")
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In [136], line 1
----> 1 ds.write_dataset(table, "data", format="csv")

File c:\users\documents\projects\.venv\lib\site-packages\pyarrow\dataset.py:930, in write_dataset(data, base_dir, basename_template, format, partitioning, partitioning_flavor, schema, filesystem, file_options, use_threads, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, file_visitor, existing_data_behavior, create_dir)
    927         raise ValueError("Cannot specify a schema when writing a Scanner")
    928     scanner = data
--> 930 _filesystemdataset_write(
    931     scanner, base_dir, basename_template, filesystem, partitioning,
    932     file_options, max_partitions, file_visitor, existing_data_behavior,
    933     max_open_files, max_rows_per_file,
    934     min_rows_per_group, max_rows_per_group, create_dir
    935 )

File c:\users\documents\projects\.venv\lib\site-packages\pyarrow\_dataset.pyx:2737, in pyarrow._dataset._filesystemdataset_write()

File c:\users\documents\projects\.venv\lib\site-packages\pyarrow\error.pxi:121, in pyarrow.lib.check_status()

ArrowNotImplementedError: Unsupported cast from decimal128(21, 15) to utf8 using function cast_string

My data is:


In [137]: table
Out[137]: 
pyarrow.Table
col1: int64
col2: double
col3: decimal128(21, 15)
col4: string
----
col1: [[1,2,3,0]]
col2: [[2.7,0,3.24,3]]
col3: [[-304236.460000000000000,0.E-15,0.E-15,0.E-15]]
col4: [["primera","segunda","tercera","cuarta"]]

 

Thanks in advance.

Reporter: Alejandro Marco Ramos
Assignee: Miles Granger / @milesgranger

Note: This issue was originally created as ARROW-17774. Please see the migration documentation for further details.

Miles Granger / @milesgranger:
Indeed, and unfortunate. :(
This is basically a duplicate of ARROW-17458 (reported there for C++), but of course experienced here in Python.

Small reproducible example:

import pyarrow as pa
import pyarrow.dataset as ds

table = pa.table({'col1': pa.array([1, 2], pa.decimal128(21, 15))})
ds.write_dataset(table, "data.csv", format="csv")
...
ArrowNotImplementedError: Unsupported cast from decimal128(21, 15) to utf8 using function cast_string

Weston Pace / @westonpace:
The underlying issue appears to have been solved. I just checked that this does work with the CSV writer:


import decimal

import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.csv as csv

decimals = pa.array([decimal.Decimal(1), decimal.Decimal(0.3)])
small_decimals = pc.cast(decimals, pa.decimal256(12, 6), safe=False)
table = pa.Table.from_pydict({"rownum": [1, 2], "decimal": small_decimals})
csv.write_csv(table, '/tmp/foo.csv')

with open('/tmp/foo.csv') as f:
    print(f.read())
# "rownum","decimal"
# 1,1.000000
# 2,0.299999

Do we want to add any pyarrow tests, or just close this?
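
A side note on the `0.299999` in the output above, since it can look like a CSV writer bug: it most likely comes from building the value with `decimal.Decimal(0.3)`, which captures the binary float rather than the literal 0.3. The sketch below uses only the standard library and assumes the unsafe cast to scale 6 drops the extra digits rather than rounding them; `ROUND_DOWN` stands in for that behavior here and is an assumption, not a statement about how Arrow implements the cast.

```python
from decimal import Decimal, ROUND_DOWN

# Decimal(0.3) stores the exact value of the binary float, which is slightly below 0.3.
from_float = Decimal(0.3)
# Decimal("0.3") stores exactly 0.3.
from_text = Decimal("0.3")

# Reduce both to six fractional digits by truncation (assumed stand-in for the unsafe cast).
six_places = Decimal("0.000001")
truncated_float = from_float.quantize(six_places, rounding=ROUND_DOWN)
truncated_text = from_text.quantize(six_places, rounding=ROUND_DOWN)

print(truncated_float)  # 0.299999
print(truncated_text)   # 0.300000
```

Constructing the input from a string, `decimal.Decimal("0.3")`, should avoid the surprise if an exact 0.3 is wanted in the file.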

Miles Granger / @milesgranger:
Thanks @westonpace, good timing. :)
I added PR #14525 with some simple checks. Let me know if you have suggestions there; if you don't think it's worth it, I'm fine with just closing this and the PR. (y)

Joris Van den Bossche / @jorisvandenbossche:
Issue resolved by pull request 14525
#14525
