[SPARK-8186][SPARK-8187][SPARK-8194][SPARK-8198][SPARK-9133] [SPARK-9290] [SQL] functions: date_add, date_sub, add_months, months_between, time-interval calculation #7754

Closed
davies wants to merge 12 commits

davies
Contributor

@davies davies commented Jul 29, 2015

This PR is based on #7589, thanks to @adrian-wang.

Adds the SQL functions date_add, date_sub, add_months, and months_between, and adds a rule for
adding or subtracting an interval to or from a date/timestamp.
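
As a rough illustration of what the day-based and month-based functions are expected to compute, here is a minimal sketch built on `java.time`. The object and method names are hypothetical stand-ins, not the expressions added by this PR, and `plusMonths` clamps to the last valid day of the target month, which may not match the PR's own end-of-month handling.

```scala
import java.time.LocalDate

// Hypothetical stand-ins for the SQL functions; this is not the Spark implementation.
object DateFunctionSketch {
  // date_add(start, days): move forward by a number of days.
  def dateAdd(start: LocalDate, days: Int): LocalDate = start.plusDays(days.toLong)

  // date_sub(start, days): move backward by a number of days.
  def dateSub(start: LocalDate, days: Int): LocalDate = start.minusDays(days.toLong)

  // add_months(start, n): java.time clamps to the last valid day of the target month.
  // Assumption: the PR's end-of-month rule may differ from this clamping.
  def addMonths(start: LocalDate, numMonths: Int): LocalDate = start.plusMonths(numMonths.toLong)
}
```

For example, `DateFunctionSketch.addMonths(LocalDate.of(2015, 1, 31), 1)` lands on the last day of February 2015 under this clamping rule.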

Closes #7589

cc @rxin

adrian-wang and others added 8 commits July 22, 2015 00:04
Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
	sql/core/src/main/scala/org/apache/spark/sql/functions.scala
	sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala
@SparkQA

SparkQA commented Jul 29, 2015

Test build #38865 has finished for PR 7754 at commit 989b5b9.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(start: Expression, interval: Expression)
    • case class TimeSub(start: Expression, interval: Expression)
    • case class AddMonths(startDate: Expression, numMonths: Expression)
    • case class MonthsBetween(date1: Expression, date2: Expression)

@SparkQA

SparkQA commented Jul 29, 2015

Test build #38866 has finished for PR 7754 at commit e47ff2c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(start: Expression, interval: Expression)
    • case class TimeSub(start: Expression, interval: Expression)
    • case class AddMonths(startDate: Expression, numMonths: Expression)
    • case class MonthsBetween(date1: Expression, date2: Expression)

@rxin
Contributor

rxin commented Jul 29, 2015

I'm going to look through this. You will need to rebase because of #7745

Add(right, left) // switch the order

case Add(left, right) if right.dataType == IntervalType =>
Cast(TimeAdd(Cast(left, TimestampType), right), left.dataType)
Contributor

This implicitly casts every left-hand type to TimestampType. How about requiring left to be a string, date, or timestamp?
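
The restriction suggested above can be sketched with a guard on the left operand's type. Everything here is a simplified, hypothetical model: the `Expr` and `DataType` definitions are toy stand-ins, not Catalyst's classes, and the rewrite is only one way the rule might be written.

```scala
// Toy model of expression trees; these are NOT the Catalyst classes.
sealed trait DataType
case object StringType extends DataType
case object DateType extends DataType
case object TimestampType extends DataType
case object IntervalType extends DataType
case object IntegerType extends DataType

sealed trait Expr { def dataType: DataType }
case class Leaf(name: String, dataType: DataType) extends Expr
case class Add(left: Expr, right: Expr) extends Expr { def dataType: DataType = left.dataType }
case class Cast(child: Expr, dataType: DataType) extends Expr
case class TimeAdd(start: Expr, interval: Expr) extends Expr { def dataType: DataType = TimestampType }

object IntervalRuleSketch {
  // Only these left-hand types may be combined with an interval.
  private val acceptsInterval: Set[DataType] = Set(StringType, DateType, TimestampType)

  def rewrite(e: Expr): Expr = e match {
    // interval + x: switch the order, then fall into the case below.
    case Add(l, r) if l.dataType == IntervalType && r.dataType != IntervalType =>
      rewrite(Add(r, l))
    // x + interval: rewrite only when x is a string, date, or timestamp.
    case Add(l, r) if r.dataType == IntervalType && acceptsInterval(l.dataType) =>
      Cast(TimeAdd(Cast(l, TimestampType), r), l.dataType)
    // Anything else (for example integer + interval) is left untouched.
    case other => other
  }
}
```

With the guard in place, an integer plus an interval is no longer silently cast to a timestamp; it stays as an unresolved `Add` that a later check could reject.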

@davies changed the title from "[SPARK-8186][SPARK-8187][SPARK-8194][SPARK-8198][SPARK-9133] [SQL] functions: date_add, date_sub, add_months, months_between, time-interval calculation" to "[SPARK-8186][SPARK-8187][SPARK-8194][SPARK-8198][SPARK-9133] [SPARK-9290] [SQL] functions: date_add, date_sub, add_months, months_between, time-interval calculation" on Jul 29, 2015
* Turns Add/Subtract of DateType/TimestampType/StringType and CalendarIntervalType
* to TimeAdd/TimeSub
*/
object DateTimeOperations extends Rule[LogicalPlan] {
Contributor

Can you add a unit test for this rule?

Make sure there is one test case just for this rule, and another that tests DateTimeOperations and ImplicitTypeCasts in combination to make sure they still work together. I worry that if we change ImplicitTypeCasts in the future, DateTimeOperations might break (e.g. the interval gets converted to string type).

Contributor Author

Done. The combination of the two rules will be covered by the function tests.

@SparkQA

SparkQA commented Jul 29, 2015

Test build #38901 has finished for PR 7754 at commit bf0e9db.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jul 29, 2015

cc @yjshen to take a look at the date time function implementations too


/**
* Returns number of months between time1 and time2. time1 and time2 are expressed in
* microseconds since 1.1.1970
Contributor

We should document that this returns an integer when both dates fall on the same day of the month in two different months.
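
To make the whole-number case concrete, here is a date-only sketch of a months-between calculation. The names are hypothetical, the function ignores time of day, and the 31-day divisor for the fractional part is an assumption (a Hive-style convention) that this thread does not spell out.

```scala
import java.time.LocalDate

// Hypothetical, date-only sketch; not the DateTimeUtils implementation.
object MonthsBetweenSketch {
  private def isLastDayOfMonth(d: LocalDate): Boolean =
    d.getDayOfMonth == d.lengthOfMonth

  def monthsBetween(date1: LocalDate, date2: LocalDate): Double = {
    val wholeMonths =
      (date1.getYear - date2.getYear) * 12 + (date1.getMonthValue - date2.getMonthValue)
    val sameDayOfMonth = date1.getDayOfMonth == date2.getDayOfMonth
    if (sameDayOfMonth || (isLastDayOfMonth(date1) && isLastDayOfMonth(date2))) {
      // Same day of month (or both month ends): the result is a whole number.
      wholeMonths.toDouble
    } else {
      // Assumption: the fractional part is measured against a fixed 31-day month.
      wholeMonths + (date1.getDayOfMonth - date2.getDayOfMonth) / 31.0
    }
  }
}
```

Under these assumptions, two dates such as the 15th of July and the 15th of May in the same year differ by a whole number of months, while dates on different days of the month produce a fractional result.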

@SparkQA

SparkQA commented Jul 29, 2015

Test build #38902 has finished for PR 7754 at commit 061e012.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 29, 2015

Test build #38909 has finished for PR 7754 at commit 3efd86e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 30, 2015

Test build #38929 has finished for PR 7754 at commit 5b7af5b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(start: Expression, interval: Expression)
    • case class TimeSub(start: Expression, interval: Expression)
    • case class AddMonths(startDate: Expression, numMonths: Expression)
    • case class MonthsBetween(date1: Expression, date2: Expression)

} else {
dayOfMonth
}
firstDayOfMonth(absoluteMonth) + currentDayInMonth - 1
Member

getYear, getMonth, and getDayOfMonth each call getYearAndDayInYear first, so it runs 3 times in total. If performance is a consideration here, I think we could have a single function (date) => (year, month, dayOfMonth).

Contributor Author

Yes, we can do that later.
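
The single-pass helper suggested above might look like the following sketch. It assumes the date is stored as a count of days since 1970-01-01 and uses `java.time.LocalDate` in place of the PR's own calendar arithmetic; the object and method names are hypothetical.

```scala
import java.time.LocalDate

// Hypothetical helper; not part of DateTimeUtils.
object DateFieldsSketch {
  // Decodes the date once and returns (year, month, dayOfMonth) together,
  // instead of repeating the year/day-in-year computation per field.
  // Assumption: daysSinceEpoch counts days since 1970-01-01.
  def yearMonthDay(daysSinceEpoch: Int): (Int, Int, Int) = {
    val date = LocalDate.ofEpochDay(daysSinceEpoch.toLong)
    (date.getYear, date.getMonthValue, date.getDayOfMonth)
  }
}
```

A caller can then destructure the result, for example `val (year, month, day) = DateFieldsSketch.yearMonthDay(days)`, so the decoding work happens once per date.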

@SparkQA

SparkQA commented Jul 30, 2015

Test build #39001 has finished for PR 7754 at commit 1d07de1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(start: Expression, interval: Expression)
    • case class TimeSub(start: Expression, interval: Expression)
    • case class AddMonths(startDate: Expression, numMonths: Expression)
    • case class MonthsBetween(date1: Expression, date2: Expression)

@davies force-pushed the date_add branch 2 times, most recently from 127596c to 237452b, on July 30, 2015 06:45
@SparkQA

SparkQA commented Jul 30, 2015

Test build #39017 has finished for PR 7754 at commit 237452b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(start: Expression, interval: Expression)
    • case class TimeSub(start: Expression, interval: Expression)
    • case class AddMonths(startDate: Expression, numMonths: Expression)
    • case class MonthsBetween(date1: Expression, date2: Expression)

@SparkQA

SparkQA commented Jul 30, 2015

Test build #39011 has finished for PR 7754 at commit 127596c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(start: Expression, interval: Expression)
    • case class TimeSub(start: Expression, interval: Expression)
    • case class AddMonths(startDate: Expression, numMonths: Expression)
    • case class MonthsBetween(date1: Expression, date2: Expression)

@SparkQA

SparkQA commented Jul 30, 2015

Test build #39020 has finished for PR 7754 at commit b923157.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(start: Expression, interval: Expression)
    • case class TimeSub(start: Expression, interval: Expression)
    • case class AddMonths(startDate: Expression, numMonths: Expression)
    • case class MonthsBetween(date1: Expression, date2: Expression)

@SparkQA

SparkQA commented Jul 30, 2015

Test build #1239 has finished for PR 7754 at commit 1c4553d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 30, 2015

Test build #39070 has finished for PR 7754 at commit 6224ce4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(start: Expression, interval: Expression)
    • case class TimeSub(start: Expression, interval: Expression)
    • case class AddMonths(startDate: Expression, numMonths: Expression)
    • case class MonthsBetween(date1: Expression, date2: Expression)

Conflicts:
	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
@davies
Contributor Author

davies commented Jul 30, 2015

@rxin If there are no more objections, I will merge this once it passes the tests.

@rxin
Contributor

rxin commented Jul 30, 2015

LGTM

@SparkQA

SparkQA commented Jul 30, 2015

Test build #39083 has finished for PR 7754 at commit 446f762.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class UnixTimestamp(timeExp: Expression, format: Expression)
    • case class FromUnixTime(sec: Expression, format: Expression)
    • case class TimeAdd(start: Expression, interval: Expression)
    • case class TimeSub(start: Expression, interval: Expression)
    • case class AddMonths(startDate: Expression, numMonths: Expression)
    • case class MonthsBetween(date1: Expression, date2: Expression)
    • abstract class ArrayData extends SpecializedGetters with Serializable
    • class GenericArrayData(array: Array[Any]) extends ArrayData

@asfgit asfgit closed this in 1abf7dc Jul 30, 2015
@SparkQA

SparkQA commented Jul 30, 2015

Test build #39084 has finished for PR 7754 at commit 9e8e085.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class UnixTimestamp(timeExp: Expression, format: Expression)
    • case class FromUnixTime(sec: Expression, format: Expression)
    • case class TimeAdd(start: Expression, interval: Expression)
    • case class TimeSub(start: Expression, interval: Expression)
    • case class AddMonths(startDate: Expression, numMonths: Expression)
    • case class MonthsBetween(date1: Expression, date2: Expression)
