Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROTOCOL RFC] Add TIME & INTERVAL as supported dtypes #2827

Open
ion-elgreco opened this issue Mar 30, 2024 · 6 comments
Open

[PROTOCOL RFC] Add TIME & INTERVAL as supported dtypes #2827

ion-elgreco opened this issue Mar 30, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@ion-elgreco
Copy link

ion-elgreco commented Mar 30, 2024

I am often getting the question, why doesn't delta-rs support time/interval dtypes.. Seems obvious to add since it's supported by Parquet and supported by most engines.

We can do the same thing as timestampNtz and gate these behind a reader & writer feature. I am proposing we add these primitive types:

Type Name Description
time Microsecond precision time of day (backed by parquet TimeType) https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#time
interval A fixed amount of time https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#interval

Time

This feature introduces a new data type to support time. For example: 00:00:00, or 24:00:00. The serialization method would be: {hour}:{minute}:{second}

Interval

This feature introduces a new data type to support day-time intervals. For example: 5d 10h 5m 10s. The serialization method would be: {day}d:{hour}h:{minute}m:{second}s

To support these feature:

To have a column of Time/Interval type in a table, the table must have Reader Version 3 and Writer Version 7. A feature name time/interval must exist in the table's readerFeatures and writerFeatures.

Engine support:

Engine Type Name Support equivalent engine type
Spark time No N/A
Spark interval Yes DayTimeIntervalType
Trino time Yes TIME(P)
Trino interval Yes INTERVAL DAY TO SECOND
Flink time Yes TIME(P)
Flink interval Yes INTERVAL DAY TO SECOND
Datafusion time Yes Time (in arrow: Time64)
Datafusion interval Yes Interval(DayTime)
@ion-elgreco ion-elgreco added the enhancement New feature or request label Mar 30, 2024
@ion-elgreco
Copy link
Author

Supersedes this issue: #2319

@tdas @ryan-johnson-databricks and @bart-samwel

@ion-elgreco ion-elgreco changed the title [Feature Request] Add TIME & INTERVAL as support dtypes in protocol [Feature Request] Add TIME & INTERVAL as supported dtypes in protocol Mar 30, 2024
@ion-elgreco ion-elgreco changed the title [Feature Request] Add TIME & INTERVAL as supported dtypes in protocol [Protocol RFC] Add TIME & INTERVAL as supported dtypes in protocol Mar 30, 2024
@ion-elgreco ion-elgreco changed the title [Protocol RFC] Add TIME & INTERVAL as supported dtypes in protocol [Protocol RFC] Add TIME & INTERVAL as supported dtypes Mar 30, 2024
@ion-elgreco ion-elgreco changed the title [Protocol RFC] Add TIME & INTERVAL as supported dtypes [PROTOCOL RFC] Add TIME & INTERVAL as supported dtypes Mar 30, 2024
@bart-samwel
Copy link
Collaborator

This makes sense to me if it's just like TimestampNTZ. It's unfortunate that Spark doesn't have a TIME type (yet). :/

@ion-elgreco
Copy link
Author

@bart-samwel what would be the next steps to make this happen?

@scovich
Copy link
Collaborator

scovich commented Apr 19, 2024

Seems like we need to add the datatype support to spark, as a starting point?
(seems quite doable, but I'm not the expert there)

@ion-elgreco
Copy link
Author

ion-elgreco commented May 24, 2024

Seems like we need to add the datatype support to spark, as a starting point?
(seems quite doable, but I'm not the expert there)

Can we maybe start with interval first then, since it's supported in all engines?

I can add the support in delta-kernel-rs and Delta-RS for both types

@ion-elgreco
Copy link
Author

@scovich @tdas @ryan-johnson-databricks and @bart-samwel

Can we move forward with this adding it to the protocol?

@scovich I don't see why we should at this to Spark first

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants