[SPARK-14321][SQL] Reduce date format cost and string-to-date cost in date functions #13522

Closed
wants to merge 2 commits into from


rajeshbalamohan

What changes were proposed in this pull request?

Here is the code currently generated when executing date functions. Constructing a SimpleDateFormat is fairly expensive, and doing so for every row can become a bottleneck when processing millions of records. It would be better to instantiate it once and reuse it.

    UTF8String primitive5 = null;
    if (!isNull4) {
      try {
        primitive5 = UTF8String.fromString(new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(
            new java.util.Date(primitive7 * 1000L)));
      } catch (java.lang.Throwable e) {
        isNull4 = true;
      }
    }

With the modified code, the generated code looks like this:

  private java.text.SimpleDateFormat sdf2;
  private UnsafeRow result13;
  private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder bufferHolder14;
  private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter15;

...
...
    boolean isNull0 = isNull3;
    UTF8String primitive1 = null;
    if (!isNull0) {
      try {
        if (sdf2 == null) {
          sdf2 = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        }
        primitive1 = UTF8String.fromString(sdf2.format(
            new java.util.Date(primitive4 * 1000L)));
      } catch (java.lang.Throwable e) {
        isNull0 = true;
      }
    }
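To make the effect of the change easier to see outside of Spark, here is a minimal standalone sketch of the same lazy-initialization pattern. The class name `CachedFormatterSketch`, the `formatterCreations` counter, and the fixed UTC time zone and `Locale.US` are assumptions made for illustration and reproducible output; they are not part of the generated code above.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical stand-in for a generated class; not Spark's actual codegen output.
class CachedFormatterSketch {
    // Created on first use, then reused for every later row.
    private SimpleDateFormat formatter;
    // Counts constructions so the reuse is observable (illustration only).
    int formatterCreations = 0;

    String fromUnixTime(long seconds) {
        if (formatter == null) {
            formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
            // Fixed zone so the output does not depend on the host (an assumption).
            formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
            formatterCreations++;
        }
        return formatter.format(new Date(seconds * 1000L));
    }

    public static void main(String[] args) {
        CachedFormatterSketch rows = new CachedFormatterSketch();
        for (long s = 0; s < 3; s++) {
            System.out.println(rows.fromUnixTime(s));
        }
        System.out.println("formatters created: " + rows.formatterCreations);
    }
}
```

Note that SimpleDateFormat is not thread-safe, so this pattern assumes each instance of the enclosing class is used by one thread at a time.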

Similarly, Calendar.getInstance was called in DateTimeUtils; it can be lazily initialized and reused instead.
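One possible way to avoid repeated Calendar.getInstance calls is a lazily initialized per-thread instance, sketched below. This is only an illustration of the idea and not necessarily what the patch does in DateTimeUtils; the class `LazyCalendarSketch`, the method `yearOf`, the creation counter, and the fixed UTC zone and `Locale.US` are all hypothetical.

```java
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; names are illustrative and not taken from DateTimeUtils.
class LazyCalendarSketch {
    // Counts how many Calendar instances were created (illustration only).
    static final AtomicInteger calendarCreations = new AtomicInteger();

    // Each thread creates its Calendar on first access and reuses it afterwards.
    private static final ThreadLocal<Calendar> CALENDAR = ThreadLocal.withInitial(() -> {
        calendarCreations.incrementAndGet();
        return Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.US);
    });

    static int yearOf(long epochMillis) {
        Calendar calendar = CALENDAR.get();
        // Resetting the time makes the shared instance safe to reuse per call.
        calendar.setTimeInMillis(epochMillis);
        return calendar.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        System.out.println(yearOf(0L));
        System.out.println(yearOf(System.currentTimeMillis()));
        System.out.println("calendars created: " + calendarCreations.get());
    }
}
```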

How was this patch tested?

org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite, org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite

@rajeshbalamohan rajeshbalamohan changed the title [SPARK-14321][SQL] Reduce date format cost and string-to-date cost in… [SPARK-14321][SQL] Reduce date format cost and string-to-date cost in date functions Jun 6, 2016
@rajeshbalamohan
Author

@cloud-fan - Sorry about the delay. Rebased SPARK-14321 onto master. #12105 had become stale and got a little messy on my system, so I ended up creating this PR. I will close the earlier one after review.

@@ -554,14 +561,19 @@ case class FromUnixTime(sec: Expression, format: Expression)
boolean ${ev.isNull} = true;
${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};""")
} else {
val sdfTerm = ctx.freshName("formatter")


This is trivial but why use a different variable name here from the above one? (which is called "formatter")

Contributor


formatter looks better

@SparkQA

SparkQA commented Jun 6, 2016

Test build #60042 has finished for PR 13522 at commit 602d4a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

cloud-fan commented Jun 6, 2016

LGTM except one naming comment, thanks for working on it!

@rajeshbalamohan
Author

Thank you. I have pushed the fixes in the recent commit.

@SparkQA

SparkQA commented Jun 7, 2016

Test build #60118 has finished for PR 13522 at commit 425aa7e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

cloud-fan commented Jun 7, 2016

Some more ideas: I think we should create a subclass of UnixTime to handle the left.dataType == StringType && right.foldable case. In the optimizer, we can replace it with a null literal if the formatter is null; then we don't need to do a null check for every record.
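The idea of deciding once, up front, whether a constant format is usable can be sketched in plain Java as follows. This is only an analogy to the optimizer rewrite suggested above, not Catalyst code: `ConstantFormatSketch`, `plan`, and `tryCreate` are hypothetical names, and the fixed UTC zone and `Locale.US` are assumptions for reproducible output.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;
import java.util.function.LongFunction;

// Plain-Java analogy of folding a constant format; not Spark's optimizer.
class ConstantFormatSketch {
    // Returns null when the pattern is missing or invalid, catching only the
    // specific exception SimpleDateFormat throws for a bad pattern.
    static SimpleDateFormat tryCreate(String pattern) {
        if (pattern == null) {
            return null;
        }
        try {
            SimpleDateFormat formatter = new SimpleDateFormat(pattern, Locale.US);
            formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
            return formatter;
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Chooses the per-row function once, so rows never re-check the formatter.
    static LongFunction<String> plan(String constantPattern) {
        final SimpleDateFormat formatter = tryCreate(constantPattern);
        if (formatter == null) {
            // Plays the role of the "null literal": every row yields null.
            return seconds -> null;
        }
        return seconds -> formatter.format(new Date(seconds * 1000L));
    }

    public static void main(String[] args) {
        System.out.println(plan("yyyy-MM-dd HH:mm:ss").apply(0L));
        System.out.println(plan("bogus").apply(0L));
    }
}
```

Here the validity check happens a single time in `plan`, and the returned function does no null check per record.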

asfgit pushed a commit that referenced this pull request Jun 9, 2016
… date functions

## What changes were proposed in this pull request?
The current implementations of `UnixTime` and `FromUnixTime` do not cache their parser/formatter as much as they could. This PR resolves this issue.

This PR is a takeover of #13522 and further optimizes the reuse of the parser/formatter. It also improves exception handling (catching the actual exception instead of `Throwable`). All credit for this work should go to rajeshbalamohan.

This PR closes #13522

## How was this patch tested?
Current tests.

Author: Herman van Hovell <hvanhovell@databricks.com>
Author: Rajesh Balamohan <rbalamohan@apache.org>

Closes #13581 from hvanhovell/SPARK-14321.

(cherry picked from commit b076853)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit asfgit closed this in b076853 Jun 9, 2016
zjffdu pushed a commit to zjffdu/spark that referenced this pull request Jun 10, 2016