Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177][SPARK-8178][SPARK-9115][SQL] date functions #6981

Closed
wants to merge 54 commits into from

Conversation

override def nullable: Boolean = true

override def eval(input: InternalRow): Any = {
DateFormat(child, Literal("y")).eval(input) match {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay to call DateFormat(child, Literal("y")).eval(input) or is there a more elegant way.

Is there a way to call genCode of DataFormat?

@adrian-wang
Copy link
Contributor

I think this should wait #6782 , since we are creating a file with same usage.

@tarekbecker tarekbecker changed the title [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177] date functions [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177][SQL] date functions Jun 24, 2015
fixed test cases

dataframe test issues:
- dateformat with date string is null
- hours, minutes, seconds is zero for timestamp
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String

case class DateFormatClass(left: Expression, right: Expression)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If name is DateFormat build failes in line

def toString(days: Int): String = Cast.threadLocalDateFormat.get.format(toJavaDate(days))

@tarekbecker
Copy link
Contributor Author

@davies @rxin
Is it possbile to use ExpectsInputTypes with a a type similar to Any. If I dont know, how to specifyDateTypeOR StringType. If I useoverride def expectedChildTypes: Seq[DataType] = Seq(DateType, StringType` it looses the time information. (Hour, Minute, Second test will fail)

@tarekbecker tarekbecker changed the title [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177][SQL] date functions [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177][SPARK-8178][SQL] date functions Jun 24, 2015
@tarekbecker
Copy link
Contributor Author

@adrian-wang I guess we can implement the stuff already. A rebase or merge should be trivial.

@tarekbecker
Copy link
Contributor Author

Code generation is adopted from

UTF8String.fromString(sdf.format(DateTimeUtils.toJavaDate(valueLeft.asInstanceOf[Int])))
case StringType =>
UTF8String.fromString(
sdf.format(DateTimeUtils.stringToTime(valueLeft.asInstanceOf[UTF8String].toString)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just valueLeft.toString

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAVER - http://www.naver.com/

sujkh@naver.com 님께 보내신 메일 <Re: [spark] [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177][SPARK-8178][SQL] date functions (#6981)> 이 다음과 같은 이유로 전송 실패했습니다.


받는 사람이 회원님의 메일을 수신차단 하였습니다.


@tarekbecker
Copy link
Contributor Author

@YijieSHEN Thanks for all your feedback. I changed the things you mentioned. Have you an idea to improve DateFormatClass? The problem is I have to provide format and for that particular case I have to evaluate format first. Is there a way to combine these two?

@tarekbecker tarekbecker changed the title [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177][SPARK-8178][SQL] date functions [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177][SPARK-8178][SQL][WIP] date functions Jun 29, 2015
@tarekbecker tarekbecker changed the title [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177][SPARK-8178][SQL][WIP] date functions [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177][SPARK-8178][SQL] date functions Jun 29, 2015
@davies
Copy link
Contributor

davies commented Jun 29, 2015

ok to test

@SparkQA
Copy link

SparkQA commented Jun 29, 2015

Test build #35952 has finished for PR 6981 at commit 356df78.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class DateFormatExpression extends UnaryExpression with ExpectsInputTypes
    • case class DateFormatClass(left: Expression, right: Expression) extends BinaryExpression
    • case class Year(child: Expression) extends DateFormatExpression with ExpectsInputTypes
    • case class Quarter(child: Expression) extends DateFormatExpression
    • case class Month(child: Expression) extends DateFormatExpression with ExpectsInputTypes
    • case class Day(child: Expression) extends DateFormatExpression with ExpectsInputTypes
    • case class Hour(child: Expression) extends DateFormatExpression with ExpectsInputTypes
    • case class Minute(child: Expression) extends DateFormatExpression with ExpectsInputTypes
    • case class Second(child: Expression) extends DateFormatExpression with ExpectsInputTypes
    • case class WeekOfYear(child: Expression) extends DateFormatExpression with ExpectsInputTypes

@SparkQA
Copy link

SparkQA commented Jul 18, 2015

Test build #37680 has finished for PR 6981 at commit 4afc09c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Hour(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Minute(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Second(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DayInYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Year(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Quarter(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Month(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Day(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DateFormatClass(left: Expression, right: Expression) extends BinaryExpression

@SparkQA
Copy link

SparkQA commented Jul 18, 2015

Test build #37717 has finished for PR 6981 at commit 6e0c78f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Hour(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Minute(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Second(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DayInYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Year(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Quarter(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Month(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Day(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DateFormatClass(left: Expression, right: Expression) extends BinaryExpression

@SparkQA
Copy link

SparkQA commented Jul 18, 2015

Test build #37718 has finished for PR 6981 at commit 5983dcc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Hour(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Minute(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Second(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DayInYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Year(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Quarter(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Month(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Day(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DateFormatClass(left: Expression, right: Expression) extends BinaryExpression

@tarekbecker
Copy link
Contributor Author

@rxin / @davies First of all thanks for all your feedback so far. I removed the manual binary search. I guess we all agree that the if/else structure is much more readable.

@davies I removed testWithTimezone because of @cloud-fan comment in #7488.

@cloud-fan Could you have a look on this PR, if you have time? Would be great.

@SparkQA
Copy link

SparkQA commented Jul 18, 2015

Test build #37725 has finished for PR 6981 at commit 256c357.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Hour(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Minute(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Second(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DayInYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Year(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Quarter(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Month(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Day(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DateFormatClass(left: Expression, right: Expression) extends BinaryExpression

specialized implementation.

>>> df0 = sqlContext.createDataFrame([('2015-04-08',)], ['a'])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: Why do we have this blank line?

@davies
Copy link
Contributor

davies commented Jul 18, 2015

@tarekauel This is pretty close now, one thing left is that the int for DateType is in UTC, in order to get millisecond, we need to minus offset of timezone, see daysToMillis. But once you switch the timezone for Calendar to UTC, no offset anymore (it's zero).

@SparkQA
Copy link

SparkQA commented Jul 18, 2015

Test build #37729 has finished for PR 6981 at commit 3e095ba.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Hour(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Minute(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Second(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DayInYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Year(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Quarter(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Month(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Day(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DateFormatClass(left: Expression, right: Expression) extends BinaryExpression

@SparkQA
Copy link

SparkQA commented Jul 19, 2015

Test build #37742 has finished for PR 6981 at commit bb567b6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Hour(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Minute(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Second(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DayInYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Year(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Quarter(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Month(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Day(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DateFormatClass(left: Expression, right: Expression) extends BinaryExpression

@SparkQA
Copy link

SparkQA commented Jul 19, 2015

Test build #37746 has finished for PR 6981 at commit f7b4c8c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Hour(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Minute(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Second(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DayInYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Year(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Quarter(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Month(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class Day(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class DateFormatClass(left: Expression, right: Expression) extends BinaryExpression

(0 to 5).foreach { i =>
val c = Calendar.getInstance()
c.set(y, m, 28, 0, 0, 0)
c.add(Calendar.DATE, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 -> i

@rxin
Copy link
Contributor

rxin commented Jul 19, 2015

Thanks @tarekauel. I'm going to merge this and push a commit to fix that two problems @davies pointed out.

@asfgit asfgit closed this in 83b682b Jul 19, 2015
(0 to 5).foreach { i =>
val c = Calendar.getInstance()
c.set(y, m, 28, 0, 0, 0)
c.add(Calendar.DATE, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 -> i

@rxin
Copy link
Contributor

rxin commented Jul 19, 2015

@tarekauel congratulations. this is a big pr and a major step for date functions.

(0 to 5).foreach { i =>
val c = Calendar.getInstance()
c.set(y, m, 28, 0, 0, 0)
c.add(Calendar.DATE, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 -> i

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this when I looked for the last bug. I'm going to create a follow PR

asfgit pushed a commit that referenced this pull request Jul 19, 2015
…stems.

This pull request fixes some of the problems in #6981.

- Added date functions to `__all__` so they get exposed
- Rename day_of_month -> dayofmonth
- Rename day_in_year -> dayofyear
- Rename week_of_year -> weekofyear
- Removed "day" from Scala/Python API since it is ambiguous. Only leaving the alias in SQL.

Author: Reynold Xin <rxin@databricks.com>

This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>

Closes #7506 from rxin/datetime and squashes the following commits:

0cb24d9 [Reynold Xin] Export all functions in Python.
e44a4a0 [Reynold Xin] Removed day function from Scala and Python.
9c08fdc [Reynold Xin] [SQL] Make date/time functions more consistent with other database systems.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants