[SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177][SPARK-8178][SPARK-9115][SQL] date functions #6981

Closed
wants to merge 54 commits into from
Changes from 1 commit
54 commits
d0e2f99
date functions
tarekbecker Jun 24, 2015
5ebb235
resolved naming conflict
tarekbecker Jun 24, 2015
4d8049b
fixed tests and added type check
tarekbecker Jun 24, 2015
638596f
improved codegen
tarekbecker Jun 24, 2015
849fb41
fixed stupid test
tarekbecker Jun 24, 2015
c739788
added support for quarter SPARK-8178
tarekbecker Jun 24, 2015
b680db6
added codegeneration to all functions
tarekbecker Jun 24, 2015
a5ea120
added python api; changed test to be more meaningful
tarekbecker Jun 24, 2015
02efc5d
removed doubled code
tarekbecker Jun 26, 2015
356df78
rely on cast mechanism of Spark. Simplified implementation
tarekbecker Jun 29, 2015
3bfac90
fixed style
tarekbecker Jun 29, 2015
5fe74e1
fixed python style
tarekbecker Jun 29, 2015
a8edebd
use Calendar instead of SimpleDateFormat
tarekbecker Jun 29, 2015
f120415
improved runtime
tarekbecker Jun 30, 2015
eb6760d
Merge branch 'master' into SPARK-8199
tarekbecker Jul 4, 2015
5a105d9
[SPARK-8199] rebase after #6985 got merged
tarekbecker Jul 4, 2015
7bc9d93
Merge branch 'master' into SPARK-8199
tarekbecker Jul 9, 2015
d9f8ac3
[SPARK-8199] implement fast track
tarekbecker Jul 9, 2015
6f5d95c
[SPARK-8199] fixed year interval
tarekbecker Jul 9, 2015
f3e7a9f
[SPARK-8199] revert change in DataFrameFunctionsSuite
tarekbecker Jul 9, 2015
7d9f0eb
[SPARK-8199] git renaming issue
tarekbecker Jul 9, 2015
10e4ad1
Merge branch 'master' into date-functions-fast
tarekbecker Jul 9, 2015
ccb723c
[SPARK-8199] style and fixed merge issues
tarekbecker Jul 9, 2015
c42b444
Removed merge conflict file
tarekbecker Jul 9, 2015
ad17e96
improved implementation
tarekbecker Jul 10, 2015
f775f39
fixed return type
tarekbecker Jul 10, 2015
1a436c9
wip
tarekbecker Jul 13, 2015
4fb66da
WIP: date functions on calculation only
tarekbecker Jul 13, 2015
740af0e
implement date function using a calculation based on days
tarekbecker Jul 13, 2015
1358cdc
Merge remote-tracking branch 'origin/master' into SPARK-8199
tarekbecker Jul 16, 2015
ec87c69
[SPARK-8119] bug fixing and refactoring
tarekbecker Jul 16, 2015
0852655
[SPARK-8119] changed from ExpectsInputTypes to implicit casts
tarekbecker Jul 16, 2015
1b2e540
[SPARK-8119] style fix
tarekbecker Jul 16, 2015
b382267
[SPARK-8199] fixed bug in day calculation; removed set TimeZone in Hi…
tarekbecker Jul 17, 2015
d6aa14e
[SPARK-8199] fixed Hive compatibility
tarekbecker Jul 17, 2015
e223bc0
[SPARK-8199] refactoring
tarekbecker Jul 17, 2015
56c4a92
[SPARK-8199] update python docu
tarekbecker Jul 17, 2015
d01b977
[SPARK-8199] python underscore
tarekbecker Jul 17, 2015
2259299
[SPARK-8199] day_of_month alias
tarekbecker Jul 17, 2015
523542d
[SPARK-8199] address comments
tarekbecker Jul 17, 2015
0ad6db8
[SPARK-8199] minor fix
tarekbecker Jul 17, 2015
746b80a
[SPARK-8199] build fix
tarekbecker Jul 17, 2015
cdfae27
[SPARK-8199] cleanup & python docstring fix
tarekbecker Jul 17, 2015
fb98ba0
[SPARK-8199] python docstring fix
tarekbecker Jul 17, 2015
3c6ae2e
[SPARK-8199] removed binary search
tarekbecker Jul 18, 2015
70238e0
Merge branch 'master' into SPARK-8199
tarekbecker Jul 18, 2015
ea6c110
[SPARK-8199] fix after merging master
tarekbecker Jul 18, 2015
4afc09c
[SPARK-8199] concise leap year handling
tarekbecker Jul 18, 2015
6e0c78f
[SPARK-8199] removed setTimeZone in tests, according to cloud-fans co…
tarekbecker Jul 18, 2015
5983dcc
[SPARK-8199] whitespace fix
tarekbecker Jul 18, 2015
256c357
[SPARK-8199] code cleanup
tarekbecker Jul 18, 2015
3e095ba
[SPARK-8199] style and timezone fix
tarekbecker Jul 18, 2015
bb567b6
[SPARK-8199] fixed test
tarekbecker Jul 18, 2015
f7b4c8c
[SPARK-8199] fixed bug in tests
tarekbecker Jul 19, 2015
@@ -152,7 +152,17 @@ object FunctionRegistry {
expression[Substring]("substr"),
expression[Substring]("substring"),
expression[Upper]("ucase"),
expression[Upper]("upper")
expression[Upper]("upper"),

// datetime functions
expression[DateFormat]("dateformat"),
expression[Year]("year"),
expression[Month]("month"),
expression[Day]("day"),
Contributor Author

@rxin In Jira you mentioned there should be an alias. Can I just add expression[Day]("day_of_month")?

Contributor

yes

Contributor

btw please sort the expressions alphabetically

expression[Hour]("hour"),
expression[Minute]("minute"),
expression[Second]("second"),
expression[WeekOfYear]("weekofyear")
)

val builtin: FunctionRegistry = {
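
The alias and ordering points from the review thread above can be sketched in isolation. This is a minimal illustration, not Spark's `FunctionRegistry` API: the object `AliasRegistrySketch`, the toy `Expr` types, and the `lookup` helper are all hypothetical names introduced here. It only shows the idea of registering one builder under two names and keeping the entries sorted by name.

```scala
// Hypothetical, simplified stand-in for a function registry.
// None of these names come from Spark; they exist only for illustration.
object AliasRegistrySketch {
  sealed trait Expr
  final case class Day(child: String) extends Expr
  final case class Hour(child: String) extends Expr

  type Builder = String => Expr

  // "day" and "day_of_month" share the same builder, so they are aliases.
  // Entries are listed alphabetically by name, as requested in the review.
  val entries: Seq[(String, Builder)] = Seq(
    "day" -> ((c: String) => Day(c)),
    "day_of_month" -> ((c: String) => Day(c)),
    "hour" -> ((c: String) => Hour(c))
  )

  val registry: Map[String, Builder] = entries.toMap

  // Case-insensitive lookup that builds the expression if the name is known.
  def lookup(name: String, arg: String): Option[Expr] =
    registry.get(name.toLowerCase).map(builder => builder(arg))
}
```

With this shape, adding an alias is one extra entry, and an ordering check such as `entries.map(_._1) == entries.map(_._1).sorted` can guard the alphabetical convention.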
@@ -0,0 +1,229 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.sql.catalyst.expressions

import java.sql.Date
import java.text.SimpleDateFormat

import org.apache.spark.sql.catalyst.expressions.codegen.{GeneratedExpressionCode, CodeGenContext}
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String

case class DateFormat(left: Expression, right: Expression)
extends BinaryExpression with ExpectsInputTypes {

override def dataType: DataType = StringType

override def expectedChildTypes: Seq[DataType] = Seq(TimestampType, StringType)

override def foldable: Boolean = left.foldable && right.foldable

override def nullable: Boolean = true

override def eval(input: InternalRow): Any = {
val valueLeft = left.eval(input)
if (valueLeft == null) {
null
} else {
val valueRight = right.eval(input)
if (valueRight == null) {
null
} else {
val sdf = new SimpleDateFormat(valueRight.asInstanceOf[UTF8String].toString)
left.dataType match {
case TimestampType =>
UTF8String.fromString(sdf.format(new Date(valueLeft.asInstanceOf[Long] / 10000)))
case DateType =>
UTF8String.fromString(sdf.format(DateTimeUtils.toJavaDate(valueLeft.asInstanceOf[Int])))
case StringType =>
UTF8String.fromString(
sdf.format(DateTimeUtils.stringToTime(valueLeft.asInstanceOf[UTF8String].toString)))
Member

just valueLeft.toString

}
}
}
}

override def toString: String = s"DateFormat($left, $right)"

override protected def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
val sdf = "java.text.SimpleDateFormat"
Member

classOf[].getName

val utf8 = "org.apache.spark.unsafe.types.UTF8String"
Member

ctx.stringType

val dtUtils = "org.apache.spark.sql.catalyst.util.DateTimeUtils"

val eval1 = left.gen(ctx)
val eval2 = right.gen(ctx)

val calc = left.dataType match {
case TimestampType =>
s"""$utf8.fromString(sdf.format(new java.sql.Date(${eval1.primitive} / 10000)));"""
case DateType =>
s"""$utf8.fromString(
sdf.format($dtUtils.toJavaDate(${eval1.primitive})));"""
case StringType =>
s"""
$utf8.fromString(sdf.format(new java.sql.Date($dtUtils.stringToTime(${eval1.primitive}.toString()).getTime())));
"""
}

s"""
${eval1.code}
boolean ${ev.isNull} = ${eval1.isNull};
${ctx.javaType(dataType)} ${ev.primitive} = ${ctx.defaultValue(dataType)};
if (!${ev.isNull}) {
${eval2.code}
if (!${eval2.isNull}) {
$sdf sdf = new $sdf(${eval2.primitive}.toString());
${ev.primitive} = $calc
} else {
${ev.isNull} = true;
}
}
"""
}
}
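
The `TimestampType` branch above divides the stored `Long` by 10000 before building a `Date`. That suggests the value is held in 100-nanosecond units and is being converted to milliseconds; this is an inference from the code shown, not something the diff states. A minimal standalone sketch of that conversion, with the hypothetical names `TimestampFormatSketch` and `format`, and with the time zone pinned to UTC so the result does not depend on the JVM default:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}

// Hypothetical helper, not part of the PR.
object TimestampFormatSketch {
  // Assumption: `hundredNanos` counts 100 ns ticks since the Unix epoch,
  // so dividing by 10000 yields milliseconds.
  def format(hundredNanos: Long, pattern: String): String = {
    val sdf = new SimpleDateFormat(pattern, Locale.US)
    // Pin the zone; SimpleDateFormat otherwise uses the JVM default.
    sdf.setTimeZone(TimeZone.getTimeZone("UTC"))
    sdf.format(new Date(hundredNanos / 10000L))
  }
}
```

Note that a new `SimpleDateFormat` is built per call here, as in the diff; the class is not thread-safe, so sharing one instance across threads would need extra care.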

case class Year(child: Expression) extends UnaryExpression with ExpectsInputTypes {

override def dataType: DataType = IntegerType

override def expectedChildTypes: Seq[DataType] = Seq(DateType, StringType, TimestampType)

override def foldable: Boolean = child.foldable

override def nullable: Boolean = true

override def eval(input: InternalRow): Any = {
DateFormat(child, Literal("y")).eval(input) match {
Contributor Author

Is it okay to call DateFormat(child, Literal("y")).eval(input), or is there a more elegant way?

Is there a way to call genCode of DateFormat?

case null => null
case x: UTF8String => x.toString.toInt
}
}

}
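
On the question raised in the comment inside `Year.eval`: one alternative to formatting a string and parsing it back with `toInt` is to read the field directly from a calendar. The commit list mentions a later switch to `Calendar`, but the sketch below is not that implementation; `CalendarFieldSketch` and its methods are hypothetical names, and it works on plain epoch milliseconds rather than Catalyst values.

```scala
import java.util.{Calendar, GregorianCalendar, TimeZone}

// Hypothetical helper, not taken from the PR.
object CalendarFieldSketch {
  // Reads a single calendar field from epoch milliseconds in the given zone.
  private def field(epochMillis: Long, calendarField: Int, tz: TimeZone): Int = {
    val cal = new GregorianCalendar(tz)
    cal.setTimeInMillis(epochMillis)
    cal.get(calendarField)
  }

  def year(epochMillis: Long, tz: TimeZone): Int =
    field(epochMillis, Calendar.YEAR, tz)

  // Calendar.MONTH is zero-based, so add 1 to get the usual 1-12 range.
  def month(epochMillis: Long, tz: TimeZone): Int =
    field(epochMillis, Calendar.MONTH, tz) + 1

  def dayOfMonth(epochMillis: Long, tz: TimeZone): Int =
    field(epochMillis, Calendar.DAY_OF_MONTH, tz)

  // HOUR_OF_DAY uses the 24-hour clock, matching the "H" pattern letter.
  def hour(epochMillis: Long, tz: TimeZone): Int =
    field(epochMillis, Calendar.HOUR_OF_DAY, tz)
}
```

This avoids the intermediate string and the `toInt` parse; the zero-based month is the main pitfall when moving from pattern letters to `Calendar` fields.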

case class Month(child: Expression) extends UnaryExpression with ExpectsInputTypes {

override def dataType: DataType = IntegerType

override def expectedChildTypes: Seq[DataType] = Seq(DateType, StringType, TimestampType)

override def foldable: Boolean = child.foldable

override def nullable: Boolean = true

override def eval(input: InternalRow): Any = {
DateFormat(child, Literal("M")).eval(input) match {
case null => null
case x: UTF8String => x.toString.toInt
}
}
}

case class Day(child: Expression) extends UnaryExpression with ExpectsInputTypes {

override def dataType: DataType = IntegerType

override def expectedChildTypes: Seq[DataType] = Seq(DateType, StringType, TimestampType)

override def foldable: Boolean = child.foldable

override def nullable: Boolean = true

override def eval(input: InternalRow): Any = {
DateFormat(child, Literal("d")).eval(input) match {
case null => null
case x: UTF8String => x.toString.toInt
}
}

}

case class Hour(child: Expression) extends UnaryExpression with ExpectsInputTypes {

override def dataType: DataType = IntegerType

override def expectedChildTypes: Seq[DataType] = Seq(DateType, StringType, TimestampType)

override def foldable: Boolean = child.foldable

override def nullable: Boolean = true

override def eval(input: InternalRow): Any = {
DateFormat(child, Literal("H")).eval(input) match {
case null => null
case x: UTF8String => x.toString.toInt
}
}
}

case class Minute(child: Expression) extends UnaryExpression with ExpectsInputTypes {

override def dataType: DataType = IntegerType

override def expectedChildTypes: Seq[DataType] = Seq(DateType, StringType, TimestampType)

override def foldable: Boolean = child.foldable

override def nullable: Boolean = true

override def eval(input: InternalRow): Any = {
DateFormat(child, Literal("m")).eval(input) match {
case null => null
case x: UTF8String => x.toString.toInt
}
}
}

case class Second(child: Expression) extends UnaryExpression with ExpectsInputTypes {

override def dataType: DataType = IntegerType

override def expectedChildTypes: Seq[DataType] = Seq(DateType, StringType, TimestampType)

override def foldable: Boolean = child.foldable

override def nullable: Boolean = true

override def eval(input: InternalRow): Any = {
DateFormat(child, Literal("s")).eval(input) match {
case null => null
case x: UTF8String => x.toString.toInt
}
}
}

case class WeekOfYear(child: Expression) extends UnaryExpression with ExpectsInputTypes {

override def dataType: DataType = IntegerType

override def expectedChildTypes: Seq[DataType] = Seq(DateType, StringType, TimestampType)

override def foldable: Boolean = child.foldable

override def nullable: Boolean = true

override def eval(input: InternalRow): Any = {
DateFormat(child, Literal("w")).eval(input) match {
case null => null
case x: UTF8String => x.toString.toInt
}
}


}
@@ -0,0 +1,79 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.sql.catalyst.expressions

import java.sql.{Timestamp, Date}
import java.text.SimpleDateFormat

import org.apache.spark.SparkFunSuite

class DateTimeFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper {

val sdf = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss")
val d = new Date(sdf.parse("2015/04/08 13:10:15").getTime)
val ts = new Timestamp(sdf.parse("2013/04/08 13:10:15").getTime)

test("DateFormat") {
checkEvaluation(DateFormat(Literal(d), Literal("y")), "2015")
checkEvaluation(DateFormat(Literal(d.toString), Literal("y")), "2015")
checkEvaluation(DateFormat(Literal(ts), Literal("y")), "2013")
}

test("Year") {
checkEvaluation(Year(Literal(d)), 2015)
checkEvaluation(Year(Literal(d.toString)), 2015)
checkEvaluation(Year(Literal(ts)), 2013)
}

test("Month") {
checkEvaluation(Month(Literal(d)), 4)
checkEvaluation(Month(Literal(d.toString)), 4)
checkEvaluation(Month(Literal(ts)), 4)
}

test("Day") {
checkEvaluation(Day(Literal(d)), 8)
checkEvaluation(Day(Literal(d.toString)), 8)
checkEvaluation(Day(Literal(ts)), 8)
}

test("Hour") {
checkEvaluation(Hour(Literal(d)), 0)
checkEvaluation(Hour(Literal(d.toString)), 0)
checkEvaluation(Hour(Literal(ts)), 13)
}

test("Minute") {
checkEvaluation(Minute(Literal(d)), 0)
checkEvaluation(Minute(Literal(d.toString)), 0)
checkEvaluation(Minute(Literal(ts)), 10)
}

test("Seconds") {
checkEvaluation(Second(Literal(d)), 0)
checkEvaluation(Second(Literal(d.toString)), 0)
checkEvaluation(Second(Literal(ts)), 15)
}

test("WeekOfYear") {
checkEvaluation(WeekOfYear(Literal(d)), 15)
checkEvaluation(WeekOfYear(Literal(d.toString)), 15)
checkEvaluation(WeekOfYear(Literal(ts)), 15)
}

}
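
The `WeekOfYear` tests above get their week numbers through `SimpleDateFormat`'s `w` pattern, whose result follows the week rules of the JVM's default locale (first day of week, minimal days in the first week). A sketch of pinning those rules explicitly, here to the ISO 8601 convention, is shown below. Whether ISO weeks are the intended semantics for this PR is an assumption; `IsoWeekSketch` and `weekOfYear` are hypothetical names.

```scala
import java.util.{Calendar, GregorianCalendar, TimeZone}

// Hypothetical helper, not part of the PR.
object IsoWeekSketch {
  // `month` is 1-based (1 = January), unlike Calendar.MONTH.
  def weekOfYear(year: Int, month: Int, day: Int): Int = {
    val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    cal.clear()
    // ISO 8601: weeks start on Monday and week 1 is the first week
    // with at least four days in the new year.
    cal.setFirstDayOfWeek(Calendar.MONDAY)
    cal.setMinimalDaysInFirstWeek(4)
    cal.set(year, month - 1, day)
    cal.get(Calendar.WEEK_OF_YEAR)
  }
}
```

For the two dates used in the suite the explicit ISO rules and the US locale defaults agree, which is why the tests would pass under either; dates near a year boundary are where the conventions diverge.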