Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARK-18368: Fix regexp_replace with task serialization. #15816

Closed
wants to merge 3 commits into from

Conversation

rdblue
Copy link
Contributor

@rdblue rdblue commented Nov 8, 2016

What changes were proposed in this pull request?

This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized then serialized, result: StringBuffer will be correctly initialized.

How was this patch tested?

  • Verified that this patch fixed the query that found the bug.
  • Added a test case that fails without the fix.

@SparkQA
Copy link

SparkQA commented Nov 8, 2016

Test build #68373 has finished for PR 15816 at commit 92980fc.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

Wow I am actually surprised that this wasn't caught before. Integration testing should have uncovered this. Could you add one?

@rdblue rdblue force-pushed the SPARK-18368-fix-regexp-replace branch from 92980fc to 3536f6a Compare November 8, 2016 23:50
@hvanhovell
Copy link
Contributor

Sorry I missed your update.

@@ -191,4 +192,17 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
checkEvaluation(StringSplit(s1, s2), null, row3)
}

test("RegExpReplace serialization") {
val serializer = new JavaSerializer(new SparkConf()).newInstance
Copy link
Contributor

@hvanhovell hvanhovell Nov 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add a similar test to ExpressionEvalHelper.

@rdblue
Copy link
Contributor Author

rdblue commented Nov 9, 2016

@hvanhovell, I updated the test so that all expressions passed through checkEvaluation are serialized and deserialized before the checks are run. I'll open an issue if it catches anything else. I'd rather not hold up this PR if it finds lots of other bugs.

@SparkQA
Copy link

SparkQA commented Nov 9, 2016

Test build #68369 has finished for PR 15816 at commit 8740a2c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 9, 2016

Test build #68377 has finished for PR 15816 at commit 1080505.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 9, 2016

Test build #68375 has finished for PR 15816 at commit 3536f6a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Nov 9, 2016
## What changes were proposed in this pull request?

This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized then serialized, `result: StringBuffer` will be correctly initialized.

## How was this patch tested?

* Verified that this patch fixed the query that found the bug.
* Added a test case that fails without the fix.

Author: Ryan Blue <blue@apache.org>

Closes #15816 from rdblue/SPARK-18368-fix-regexp-replace.

(cherry picked from commit b9192bb)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@rxin
Copy link
Contributor

rxin commented Nov 9, 2016

Merging in master/branch-2.1/branch-2.0.

asfgit pushed a commit that referenced this pull request Nov 9, 2016
## What changes were proposed in this pull request?

This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized then serialized, `result: StringBuffer` will be correctly initialized.

## How was this patch tested?

* Verified that this patch fixed the query that found the bug.
* Added a test case that fails without the fix.

Author: Ryan Blue <blue@apache.org>

Closes #15816 from rdblue/SPARK-18368-fix-regexp-replace.

(cherry picked from commit b9192bb)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@rxin
Copy link
Contributor

rxin commented Nov 9, 2016

I'm surprised too that we haven't caught this earlier ...

@asfgit asfgit closed this in b9192bb Nov 9, 2016
@rdblue
Copy link
Contributor Author

rdblue commented Nov 9, 2016

Thanks @rxin and @hvanhovell! I appreciate the quick reviews.

@yhuai
Copy link
Contributor

yhuai commented Nov 9, 2016

I am wondering if it breaks some tests?

org.apache.spark.sql.catalyst.expressions.MathExpressionsSuite.e
org.apache.spark.sql.catalyst.expressions.MathExpressionsSuite.pi
org.apache.spark.sql.catalyst.expressions.MathExpressionsSuite.binary log
org.apache.spark.sql.catalyst.expressions.StringExpressionsSuite.format_number / FormatNumber

http://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.catalyst.expressions.MathExpressionsSuite&test_name=e

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6/1952/

@yhuai
Copy link
Contributor

yhuai commented Nov 9, 2016

oh, seems the last commit did not pass build.

image

Sorry. I am going to revert this patch.

@yhuai
Copy link
Contributor

yhuai commented Nov 9, 2016

Reverted from master/branch-2.1/branch-2.0.

@yhuai
Copy link
Contributor

yhuai commented Nov 9, 2016

@rdblue Can you send a new pr with the fix? Thanks!

@zsxwing
Copy link
Member

zsxwing commented Nov 9, 2016

If result is called before serialized, it has the same issue. Right? I misunderstood. NVM.

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

This makes the result value both transient and lazy, so that if the RegExpReplace object is initialized then serialized, `result: StringBuffer` will be correctly initialized.

## How was this patch tested?

* Verified that this patch fixed the query that found the bug.
* Added a test case that fails without the fix.

Author: Ryan Blue <blue@apache.org>

Closes apache#15816 from rdblue/SPARK-18368-fix-regexp-replace.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants