Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20548][FLAKY-TEST] share one REPL instance among REPL test cases #17844

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

ReplSuite.newProductSeqEncoder with REPL defined class was flaky and throws OOM exception frequently. By analyzing the heap dump, we found the reason is that, in each test case of ReplSuite, we create a REPL instance, which creates a classloader and loads a lot of classes related to SparkContext. More details please see #17833 (comment).

In this PR, we create a new test suite, SingletonReplSuite, which shares one REPL instances among all the test cases. Then we move most of the tests from ReplSuite to SingletonReplSuite, to avoid creating a lot of REPL instances and reduce memory footprint.

How was this patch tested?

test only change

@cloud-fan
Copy link
Contributor Author

cc @zsxwing @sameeragarwal

@SparkQA
Copy link

SparkQA commented May 3, 2017

Test build #76423 has finished for PR 17844 at commit 1565a34.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sameeragarwal
Copy link
Member

test this please

@SparkQA
Copy link

SparkQA commented May 4, 2017

Test build #76461 has finished for PR 17844 at commit 1565a34.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 5, 2017

Test build #76471 has finished for PR 17844 at commit 9248a5e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@zsxwing zsxwing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Just some nits.

val currentOffset = out.getBuffer.length()
// append a `val _result = 1` statement to the end of the given code, so that we can know what's
// the final output of this code snippet and rely on it to wait until the output is ready.
in.write((input + s"\nval _result = 1\n").getBytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's better to use some more special string. E.g.,

val timestamp = System.currentTimeMillis()
in.write((input + s"\nval _result_$timestamp = 1\n").getBytes)
in.flush()
val stopMessage = s"_result_$timestamp: Int = 1"

}
}

def runInterpreter(input: String): String = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment mentioning that the codes in input should not throw an exception?

super.afterAll()
}

private def waitUntil(cond: () => Boolean): Unit = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use eventually

    eventually(timeout(50.seconds), interval(500.millis)) {
      assert(cond(), "current output: " + out.toString)
    }

assertDoesNotContain("Exception", output)
assertContains(": Int = 20", output)
}

test("line wrapper only initialized once when used as encoder outer scope") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you don't move this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test only works in local mode...

val classpath = paths.map(new File(_).getAbsolutePath).mkString(File.pathSeparator)

System.setProperty(CONF_EXECUTOR_CLASSPATH, classpath)
Main.conf.set("spark.master", "local-cluster[2,4,4096]")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it's safe to just use local-cluster[2,1,1024].

Main.sparkSession = null

// Starts a new thread to run the REPL interpreter, so that we won't block.
thread = new Thread(new Runnable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please call Thread.setDaemon(true) in case that something is wrong and it prevents the process from exiting.

@SparkQA
Copy link

SparkQA commented May 8, 2017

Test build #76569 has finished for PR 17844 at commit fd5b43b.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 8, 2017

Test build #76577 has finished for PR 17844 at commit fd5b43b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 8, 2017

Test build #76582 has finished for PR 17844 at commit fd5b43b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76600 has finished for PR 17844 at commit fd5b43b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76618 has started for PR 17844 at commit 7dde745.

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76649 has finished for PR 17844 at commit 7dde745.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76656 has finished for PR 17844 at commit 7dde745.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76675 has finished for PR 17844 at commit 7dde745.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request May 9, 2017
`ReplSuite.newProductSeqEncoder with REPL defined class` was flaky and throws OOM exception frequently. By analyzing the heap dump, we found the reason is that, in each test case of `ReplSuite`, we create a REPL instance, which creates a classloader and loads a lot of classes related to `SparkContext`. More details please see #17833 (comment).

In this PR, we create a new test suite, `SingletonReplSuite`, which shares one REPL instances among all the test cases. Then we move most of the tests from `ReplSuite` to `SingletonReplSuite`, to avoid creating a lot of REPL instances and reduce memory footprint.

test only change

Author: Wenchen Fan <wenchen@databricks.com>

Closes #17844 from cloud-fan/flaky-test.

(cherry picked from commit f561a76)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Copy link
Contributor Author

thanks for the review, merging to maser/2.2!

@asfgit asfgit closed this in f561a76 May 9, 2017
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
## What changes were proposed in this pull request?

`ReplSuite.newProductSeqEncoder with REPL defined class` was flaky and throws OOM exception frequently. By analyzing the heap dump, we found the reason is that, in each test case of `ReplSuite`, we create a REPL instance, which creates a classloader and loads a lot of classes related to `SparkContext`. More details please see apache#17833 (comment).

In this PR, we create a new test suite, `SingletonReplSuite`, which shares one REPL instances among all the test cases. Then we move most of the tests from `ReplSuite` to `SingletonReplSuite`, to avoid creating a lot of REPL instances and reduce memory footprint.

## How was this patch tested?

test only change

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#17844 from cloud-fan/flaky-test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants