[C++] Parse time32 from string and infer in CSV reader #27146
Comments
Neal Richardson / @nealrichardson: I'll split the date vs. datetime type inference into a separate issue: ARROW-11247.
Neal Richardson / @nealrichardson: In other news, I did make a fix for the date type detection and that was just merged, so it will make it into the 3.0.0 release we're about to do.
Neal Richardson / @nealrichardson: Separately, and IDK if there was another issue made for this, there should be a way to convert a string to Time32 (maybe strptime is the way now).
Weston Pace / @westonpace:
HH:MM:SS
HH:MM:SS.sss...
HH:MM
...and then there is another form where the colons are omitted but everything is prefixed with T:
THH
THHMM
THHMMSS
THHMMSSsss
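To make the two families of forms listed above concrete, here is a minimal sketch of a parser that accepts them, written in Python with only the standard library. It is not Arrow's implementation. The names `parse_time_of_day` and `seconds_since_midnight` are hypothetical, and tolerating an optional dot before the fractional digits in the T-prefixed form is an assumption.

```python
import re
from datetime import time

# Colon-separated form: HH:MM, HH:MM:SS, HH:MM:SS.sss...
_EXTENDED = re.compile(r"([0-9]{2}):([0-9]{2})(?::([0-9]{2})(?:\.([0-9]+))?)?")
# T-prefixed form without colons: THH, THHMM, THHMMSS, THHMMSSsss
# (assumption: a dot before the fractional digits is also accepted)
_BASIC = re.compile(r"T([0-9]{2})(?:([0-9]{2})(?:([0-9]{2})(?:\.?([0-9]+))?)?)?")


def parse_time_of_day(text):
    """Return a datetime.time for the forms above, or None if nothing matches."""
    match = _EXTENDED.fullmatch(text) or _BASIC.fullmatch(text)
    if match is None:
        return None
    hour, minute, second, fraction = match.groups()
    hour = int(hour)
    minute = int(minute or 0)
    second = int(second or 0)
    # Right-pad or truncate the fractional digits to microsecond precision.
    microsecond = int((fraction or "").ljust(6, "0")[:6])
    if hour > 23 or minute > 59 or second > 59:
        return None
    return time(hour, minute, second, microsecond)


def seconds_since_midnight(value):
    """Integer seconds since midnight, the quantity a second-resolution time32 stores."""
    return value.hour * 3600 + value.minute * 60 + value.second
```

For instance, `seconds_since_midnight(parse_time_of_day("00:05:00"))` evaluates to 300, and inputs outside the listed forms, such as `"25:00"`, return `None` so a caller could fall back to treating the column as strings.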
When reading a CSV that contains date and time columns with read_csv_arrow(), the dates are read as datetimes rather than dates, and the times are read as character strings rather than times.
The first problem can be fixed by supplying date32() to schema(), though better inference would be nice. However, supplying time32() to schema() causes an error.
Here is a sample dataset, also attached.
date,time,reading
2021-01-01,00:00:00,67.8
2021-01-01,00:00:00,72.4
2021-01-01,00:00:00,63.1
2021-01-01,00:05:00,67.8
Reading with readr::read_csv() results in a tibble with three columns: date, time, dbl, as expected.
Reading with arrow::read_csv_arrow() without providing schema() results in a tibble with three columns: dttm, chr, dbl.
Reading with arrow::read_csv_arrow() and providing date=date32() via schema() to col_types results in a tibble with three columns: date, chr, dbl.
Reading with arrow::read_csv_arrow() and providing time=time32() via schema() to col_types generates an error.
The same error occurs when using compact string notation.
This is something in the internals, so far beyond me to figure out a fix, but I saw it in action and wanted to report it.
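As an illustration of the inference the report asks for (date, time, double for the sample above), here is a rough sketch in Python using only the standard library. It is not how Arrow's CSV reader infers types. The function names and the type labels `date32`, `time32`, `double`, and `string` are used purely as illustrative tags, and the two `strptime` formats are an assumption based on the sample data.

```python
import csv
import io
from datetime import datetime

SAMPLE = """date,time,reading
2021-01-01,00:00:00,67.8
2021-01-01,00:00:00,72.4
2021-01-01,00:00:00,63.1
2021-01-01,00:05:00,67.8
"""


def _matches(value, fmt):
    """True if value parses with the given strptime format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def _is_float(value):
    """True if value parses as a floating-point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_column_type(values):
    """Try the narrowest candidates first: date, then time, then double."""
    if all(_matches(v, "%Y-%m-%d") for v in values):
        return "date32"
    if all(_matches(v, "%H:%M:%S") for v in values):
        return "time32"
    if all(_is_float(v) for v in values):
        return "double"
    return "string"


def infer_types(csv_text):
    """Map each column name in the CSV text to an inferred type label."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {name: infer_column_type([row[name] for row in rows]) for name in rows[0]}


print(infer_types(SAMPLE))
```

Trying the date format before the time format, and both before the numeric check, keeps a column such as `time` from falling through to a plain string when every value fits a stricter type.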
Environment: Ubuntu 18.04, R 4.0.3
Reporter: Jared Lander
Assignee: Antoine Pitrou / @pitrou
Note: This issue was originally created as ARROW-11243. Please see the migration documentation for further details.