[C++] Unable to read date64 or date32 in specific format from CSV #28303
Comments
Joris Van den Bossche / @jorisvandenbossche:
In [3]: pa.array(["2012-01-01"]).cast(pa.timestamp('ms'))
Out[3]:
<pyarrow.lib.TimestampArray object at 0x7fae22d778e0>
[
2012-01-01 00:00:00.000
]
In [4]: pa.array(["2012-01-01"]).cast(pa.date32())
...
ArrowNotImplementedError: Unsupported cast from string to date32 using function cast_date32
Seems this has no traction - facing the same with another date format. Works if
@yan-hic this format seems a bit tricky since it requires knowledge about the language of the CSV -- English in this case.
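To illustrate the point about language: the `%b` directive matches abbreviated month names, so whether a string parses depends on which month names the parser knows. The sketch below uses Python's standard-library `datetime.strptime` rather than Arrow's parser, and the sample strings and the helper name `parse_day_month_year` are made up for illustration. It assumes the process runs with the default C/English time locale.

```python
from datetime import datetime


def parse_day_month_year(text, fmt="%d-%b-%Y"):
    """Parse text with strptime; return a date, or None if it does not match."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


# English month abbreviation with a four-digit year (%Y).
english = parse_day_month_year("01-Jan-2012")

# Same date with a two-digit year (%y).
two_digit = parse_day_month_year("01-Jan-12", "%d-%b-%y")

# German abbreviation for October: not recognised under the default locale.
german = parse_day_month_year("01-Okt-2012")

print(english, two_digit, german)
```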
@felipecrv I don't understand. The language is irrelevant. One passes the Until then, it's a 2 step process:
Which also means that it's impossible to read some dates
That issue was about inferring strings in default ISO date format as date type, not about being able to specify a custom string format (this issue).
Can you clarify this? What kind of date is impossible to read as a timestamp?
@felipecrv that's a general issue for parsing that kind of strings, i.e. the same locale-dependent behaviour applies to parsing it as a timestamp as well, and that's something we already support:
The issue here is about also allowing such a custom format to be passed for date32/date64 columns, and not only for timestamp columns.
Sorry, I was wrong - I assumed date32/date64 is stored as the number of days since the UNIX epoch
This is still an issue. It also extends to time32[s]:
CSV conversion error to time32[s]: invalid value '7:55:00'
Right now timestamp_parsers only converts custom string formats to timestamp[x] types. If a schema column is date32, date64 or time32, the CSV conversion fails. The solution is to allow the format lists in timestamp_parsers to apply to date32, date64 and time32 columns. I have a current hack I implemented to be able to parse DATEs out of CSV files.
Code to swap out date columns with timestamp columns in a schema for the dataset API:
Expression code to cast timestamp columns to dates when reading CSV files using dataset.to_table:
When importing CSV data with dates in the format "%d-%b-%y" or "%d-%b-%Y", an error is raised during conversion. Example:
Reporter: Stephen Bias
Related issues:
Note: This issue was originally created as ARROW-12539. Please see the migration documentation for further details.