Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22694][SQL] Like and RLike should not use global variables #19897

Closed
wants to merge 1 commit into from

Conversation

mgaido91
Copy link
Contributor

@mgaido91 mgaido91 commented Dec 5, 2017

What changes were proposed in this pull request?

Like and RLike are using a global variable which is not needed. This can generate some unneeded entries in the constant pool.

The PR removes the unnecessary mutable states and makes them local variables.

How was this patch tested?

added UTs

@SparkQA
Copy link

SparkQA commented Dec 5, 2017

Test build #84483 has finished for PR 19897 at commit 3220df1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor Author

mgaido91 commented Dec 6, 2017

@cloud-fan @kiszk @viirya may you please review this? Thanks

@@ -119,33 +119,34 @@ case class Like(left: Expression, right: Expression) extends StringRegexExpressi
if (rVal != null) {
val regexStr =
StringEscapeUtils.escapeJava(escape(rVal.asInstanceOf[UTF8String].toString()))
ctx.addMutableState(patternClass, pattern,
s"""$pattern = ${patternClass}.compile("$regexStr");""")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can avoid compiling the regex every time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we are adding a new one every time, so we are compiling it every time. Am I wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the code generating logic runs only once, the generated code runs many times, for each input row.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you are right, my bad mistake. I am closing this PR immediately, sorry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that's ok, Spark's codegen framework is pretty hard to understand, it takes time to get familiar with it :)

@mgaido91 mgaido91 closed this Dec 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants