-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34605][SQL] Support java.time.Duration
as an external type of the day-time interval type
#31729
Conversation
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #135721 has finished for PR 31729 at commit
|
java.time.Duration
as an external type of the day-time interval typejava.time.Duration
as an external type of the day-time interval type
@cloud-fan @HyukjinKwon @dongjoon-hyun @viirya @maropu @yaooqinn Could you review this PR, please. |
I believe that |
@yaooqinn I think of java.time.Period (https://docs.oracle.com/javase/8/docs/api/java/time/Period.html) |
Does it mean that we only use the period with year and month filed and fail for the day part? BTW, what is the behavior of Dataframe.show(), will it be something like |
Yes, we will use only year and month fields.
Let's discuss this in https://issues.apache.org/jira/browse/SPARK-34615
No-no. It will be in the format that conforms to the ANSI SQL standard like:
BTW, the format |
Seq( | ||
"PT20.123456S", | ||
"P2DT3H4M", | ||
"P2D").foreach { intervalStr => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think many people are not familiar with this format. Can we add comments or use a different way to create Duration
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cam we add some negative cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan I created durations using another Duration
methods. Also added negative tests.
} | ||
|
||
test("SPARK-34605: construct literals from arrays of java.time.Duration") { | ||
val duration0 = Duration.parse("P2DT3H4M") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
It's better we update the |
@yaooqinn Let's update docs separately. I opened the sub-task SPARK-34619 for that. |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #135752 has finished for PR 31729 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
In the PR, I propose to extend Spark SQL API to accept
java.time.Duration
as an external type of recently added new Catalyst type -DayTimeIntervalType
(see #31614). The Java classjava.time.Duration
has similar semantic to ANSI SQL day-time interval type, and it is the most suitable to be an external type forDayTimeIntervalType
. In more details:DurationConverter
which convertsjava.time.Duration
instances to/from internal representation of the Catalyst typeDayTimeIntervalType
(toLong
type). TheDurationConverter
object uses new methods ofIntervalUtils
:durationToMicros()
converts the input duration to the total length in microseconds. If this duration is too large to fitLong
, the method throws the exceptionArithmeticException
. Note: the input duration has nanosecond precision, the method casts the nanos part to microseconds by dividing by 1000.microsToDuration()
obtains ajava.time.Duration
representing a number of microseconds.DayTimeIntervalType
inRowEncoder
via the methodscreateDeserializerForDuration()
andcreateSerializerForJavaDuration()
.java.time.Duration
instances.Why are the changes needed?
java.time.Duration
collections, and construct day-time interval columns. Also to collect such columns back to the driver side.Does this PR introduce any user-facing change?
The PR extends existing functionality. So, users can parallelize instances of the
java.time.Duration
class and collect them back:How was this patch tested?
CatalystTypeConvertersSuite
to check conversion from/tojava.time.Duration
.RowEncoderSuite
.DayTimeIntervalType
are tested inLiteralExpressionSuite
DatasetSuite
andJavaDatasetSuite
.