[C++] Parse time32 from string and infer in CSV reader #27146

asfimport · 2021-01-13T22:01:07Z

When reading a CSV with read_csv_arrow() with date types and time types, the dates are read as datetimes rather than dates and times are read as characters rather than time.

The first problem can be fixed by supplying date32() to schema(), though better inference would be nice. However, supplying time32() to schema() causes an error.

Here is a sample dataset, also attached.

date,time,reading
2021-01-01,00:00:00,67.8
2021-01-01,00:00:00,72.4
2021-01-01,00:00:00,63.1
2021-01-01,00:05:00,67.8

Reading with readr::read_csv() results in a tibble with three columns: date, time, dbl, as expected.

samp_readr <- readr::read_csv('sampledata.csv')
samp_readr

# A tibble: 4 x 3
  date       time   reading
  <date>     <time>   <dbl>
1 2021-01-01 00'00"    67.8
2 2021-01-01 00'00"    72.4
3 2021-01-01 00'00"    63.1
4 2021-01-01 05'00"    67.8

Reading with arrow::read_csv_arrow() without providing schema() results in a tibble with three columns: dttm, chr, dbl.

samp_arrow_plain <- arrow::read_csv_arrow('sampledata.csv')
samp_arrow_plain

# A tibble: 4 x 3
  date                time     reading
  <dttm>              <chr>      <dbl>
1 2020-12-31 19:00:00 00:00:00    67.8
2 2020-12-31 19:00:00 00:00:00    72.4
3 2020-12-31 19:00:00 00:00:00    63.1
4 2020-12-31 19:00:00 00:05:00    67.8

Reading with arrow::read_csv_arrow() and providing date=date32() via schema() to col_types results in a tibble with three columns: date, chr, dbl.

samp_arrow_date <- arrow::read_csv_arrow('sampledata.csv', col_types=schema(date=date32()))
samp_arrow_date

# A tibble: 4 x 3
  date       time     reading
  <date>     <chr>      <dbl>
1 2021-01-01 00:00:00    67.8
2 2021-01-01 00:00:00    72.4
3 2021-01-01 00:00:00    63.1
4 2021-01-01 00:05:00    67.8

Reading with arrow::read_csv_arrow() and providing time=time32() via schema() to col_types generates an error.

samp_arrow_time <- arrow::read_csv_arrow('sampledata.csv', col_types=schema(time=time32()))

Error in csv___TableReader__Read(self) : 
  NotImplemented: CSV conversion to time32[ms] is not supported

The same error occurs when using compact string notation.

samp_arrow_string <- arrow::read_csv_arrow('sampledata.csv', col_types='DTc', col_names=c('date', 'time', 'reading'), skip=1)

Error in csv___TableReader__Read(self) : 
  NotImplemented: CSV conversion to time32[ms] is not supported

This is something in the internals, so far beyond me to figure out a fix, but I saw it in action and wanted to report it.

Environment: Ubuntu 18.04, R 4.0.3
Reporter: Jared Lander
Assignee: Antoine Pitrou / @pitrou

_{Note: This issue was originally created as ARROW-11243. Please see the migration documentation for further details.}

asfimport · 2021-01-13T22:40:01Z

Neal Richardson / @nealrichardson:
Thanks. The error message comes from https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/converter.cc#L621, and it's true, there is no case defined for Type::TIME32. There also is no cast method to convert/parse the string to time32 after the fact.

I'll split to a separate issue the date vs. datetime type inference: ARROW-11247.

asfimport · 2021-01-14T15:46:50Z

Jared Lander:
My work around was using dplyr::mutate() and {hms} to convert the character to time after collecting the data, but that seems less than ideal for real workflows.

asfimport · 2021-01-14T16:14:16Z

Neal Richardson / @nealrichardson:
Yeah you can fix it when you're pulling the data into R but that's not going to help for purely Arrow workflows.

In other news, I did make a fix for the date type detection and that was just merged, so it will make it into the 3.0.0 release we're about to do.

asfimport · 2021-01-14T16:20:35Z

Jared Lander:
I saw that fix. Glad it was easy. Guessing the time fix is harder.

asfimport · 2021-06-23T16:35:14Z

Antoine Pitrou / @pitrou:
Can someone explain in plain terms what the desired behaviour should be? @nealrichardson

asfimport · 2021-06-23T16:43:17Z

Neal Richardson / @nealrichardson:
If I recall, the idea would be to add a type detector where if the values matched the pattern [0-9]{2}:[0-9]{2}:[0-9]{2}, then we infer the column type as Time32.

Separately, and IDK if there was another issue made for this, there should be a way to convert a string to Time32 (maybe strptime is the way now).

asfimport · 2021-06-23T16:52:49Z

Antoine Pitrou / @pitrou:
The accepted format would be HH:MM:SS, right?

asfimport · 2021-06-23T16:56:39Z

Neal Richardson / @nealrichardson:
I suppose so, though IDK whether we would want to support fractional seconds too. (Maybe strptime is the way in the CSV reader too?)

asfimport · 2021-06-23T17:47:18Z

Weston Pace / @westonpace:
If we add a Time32 converter we should probably support the ISO-8601 definition for time. It has two forms, the obvious one...

HH:MM:SS

HH:MM:SS.sss...

HH:MM

...and then there is another form where the colons are omitted but everything is prefixed with T

THH

THHMM

THHMMSS

THHMMSSsss

asfimport · 2021-07-27T09:55:15Z

Antoine Pitrou / @pitrou:
Issue resolved by pull request 10782
#10782

asfimport closed this as completed Jul 27, 2021

asfimport assigned pitrou Jan 10, 2023

asfimport added this to the 6.0.0 milestone Jan 11, 2023

asfimport mentioned this issue Jan 11, 2023

[C++] Unable to read date64 or date32 in specific format from CSV #28303

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[C++] Parse time32 from string and infer in CSV reader #27146

[C++] Parse time32 from string and infer in CSV reader #27146

asfimport commented Jan 13, 2021 •

edited

asfimport commented Jan 13, 2021

asfimport commented Jan 14, 2021

asfimport commented Jan 14, 2021

asfimport commented Jan 14, 2021

asfimport commented Jun 23, 2021

asfimport commented Jun 23, 2021

asfimport commented Jun 23, 2021

asfimport commented Jun 23, 2021

asfimport commented Jun 23, 2021

asfimport commented Jul 27, 2021

[C++] Parse time32 from string and infer in CSV reader #27146

[C++] Parse time32 from string and infer in CSV reader #27146

Comments

asfimport commented Jan 13, 2021 • edited

Related issues:

Original Issue Attachments:

PRs and other links:

asfimport commented Jan 13, 2021

asfimport commented Jan 14, 2021

asfimport commented Jan 14, 2021

asfimport commented Jan 14, 2021

asfimport commented Jun 23, 2021

asfimport commented Jun 23, 2021

asfimport commented Jun 23, 2021

asfimport commented Jun 23, 2021

asfimport commented Jun 23, 2021

asfimport commented Jul 27, 2021

asfimport commented Jan 13, 2021 •

edited