
[SPARK-26180][CORE][TEST] Reuse withTempDir function to the SparkCore test case #23151

Closed
wants to merge 1 commit

Conversation

heary-cao
Contributor

@heary-cao heary-cao commented Nov 27, 2018

What changes were proposed in this pull request?

Currently, the common `withTempDir` function is used in Spark SQL test cases to handle the `val dir = Utils.createTempDir()` / `Utils.deleteRecursively(dir)` pattern. Unfortunately, the `withTempDir` function cannot be used in Spark Core test cases. This PR shares the `withTempDir` function between Spark SQL and Spark Core to clean up the Spark Core test cases. Thanks.
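For context, the loan-pattern helper being discussed can be sketched roughly as follows. This is a minimal standalone version, not Spark's exact code: `Files.createTempDirectory` stands in for `Utils.createTempDir`, and the cleanup is a simplified stand-in for `Utils.deleteRecursively`.

```scala
import java.io.File
import java.nio.file.Files

object TempDirExample {
  // Loan pattern: create a temp dir, lend it to the test body `f`,
  // and always clean up afterwards, even if `f` throws.
  def withTempDir(f: File => Unit): Unit = {
    val dir = Files.createTempDirectory("spark-test").toFile
    try f(dir) finally {
      // Stand-in for Utils.deleteRecursively (handles flat contents only).
      dir.listFiles().foreach(_.delete())
      dir.delete()
    }
  }

  def main(args: Array[String]): Unit = {
    var saved: File = null
    withTempDir { dir =>
      saved = dir
      assert(dir.isDirectory)
    }
    // Directory is gone once the block returns.
    assert(!saved.exists())
  }
}
```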

How was this patch tested?

N/A

@SparkQA

SparkQA commented Nov 27, 2018

Test build #99302 has finished for PR 23151 at commit d29d4ec.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 27, 2018

Test build #99303 has finished for PR 23151 at commit 6402d2d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 27, 2018

Test build #99315 has finished for PR 23151 at commit 6402d2d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao
Contributor Author

cc @cloud-fan, @kiszk, @dongjoon-hyun

@kiszk
Member

kiszk commented Nov 28, 2018

Good catch

*
* @todo Probably this method should be moved to a more general place
*/
protected def withCreateTempDir(f: File => Unit): Unit = {
Member

Is there any reason not to use withTempDir as a function name like other modules?

Contributor Author

trait SQLTestUtils extends SparkFunSuite with SQLTestUtilsBase with PlanTest
If SparkFunSuite and SQLTestUtilsBase both use the same name withTempDir,
could that cause a name clash? Thanks.

Contributor

If we have this function in SparkFunSuite, why do we need to define it again in SQLTestUtils?

Contributor Author

Currently, withTempDir and withCreateTempDir are somewhat different.
withTempDir

protected def withTempDir(f: File => Unit): Unit = {
    val dir = Utils.createTempDir().getCanonicalFile
    try f(dir) finally {
      // wait for all tasks to finish before deleting files
      waitForTasksToFinish()
      Utils.deleteRecursively(dir)
    }
  }

protected def waitForTasksToFinish(): Unit = {
    eventually(timeout(10.seconds)) {
      assert(spark.sparkContext.statusTracker
        .getExecutorInfos.map(_.numRunningTasks()).sum == 0)
    }
  }

withCreateTempDir

protected def withCreateTempDir(f: File => Unit): Unit = {
    val dir = Utils.createTempDir()
    try f(dir) finally {
      Utils.deleteRecursively(dir)
    }
  }

thanks.

Contributor

shall we also do waitForTasksToFinish in withCreateTempDir?

Contributor Author

I'm not sure we need `assert(spark.sparkContext.statusTracker.getExecutorInfos.map(_.numRunningTasks()).sum == 0)`; after all, `protected def spark: SparkSession` is defined in SQLTestData. Unless we construct a `waitForTasksToFinish` that looks like

protected def waitForTasksToFinish(): Unit = {
    eventually(timeout(10.seconds)) {
    }
  }

Contributor

OK, how about in SQLTestUtils we override withTempDir with this extra logic?

Contributor Author

Sorry, maybe I didn't make it clear. withTempDir is defined in SQLTestUtilsBase, not in SQLTestUtils. If we want to override withTempDir, we would first have to move it from SQLTestUtilsBase to SQLTestUtils and override it there, but that involves other functions as well.

Contributor

I'm not talking about details like which class to override, just the idea. Why wouldn't an override work?

Member

Yes, it seems like we should be able to use an override. The subclass that needs to inject an additional method call in the block can call the super method with a lambda that calls the user-supplied block, then this other method. It's probably worth whatever surgery is needed to make this clean and reduce duplication. We already have a lot of "create temp thing" methods all over.
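The override approach described above can be sketched outside Spark like this. The names `BaseSuite`, `SqlSuite`, and the `waited` flag are illustrative stand-ins for SparkFunSuite, SQLTestUtilsBase, and the real eventually-based wait, not Spark's actual code.

```scala
import java.io.File
import java.nio.file.Files

// Stand-in for the base suite that owns the generic helper.
trait BaseSuite {
  protected def withTempDir(f: File => Unit): Unit = {
    val dir = Files.createTempDirectory("test").toFile
    try f(dir) finally {
      dir.listFiles().foreach(_.delete())
      dir.delete()
    }
  }
}

// Stand-in for the SQL-side suite that needs an extra wait.
trait SqlSuite extends BaseSuite {
  var waited = false

  // Stub for the real eventually-based wait on running tasks.
  protected def waitForTasksToFinish(): Unit = { waited = true }

  // Reuse the base implementation and inject the extra step by
  // wrapping the user-supplied block in a lambda passed to super.
  protected override def withTempDir(f: File => Unit): Unit = {
    super.withTempDir { dir =>
      f(dir)
      waitForTasksToFinish()
    }
  }
}

object OverrideDemo extends SqlSuite {
  def main(args: Array[String]): Unit = {
    var seen: File = null
    withTempDir { dir =>
      seen = dir
      assert(dir.isDirectory)
    }
    assert(waited)          // the injected step ran after the block
    assert(!seen.exists())  // and the directory was still cleaned up
  }
}
```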

* @todo Probably this method should be moved to a more general place
*/
protected def withCreateTempDir(f: File => Unit): Unit = {
val dir = Utils.createTempDir()
Member

Is it better to call .getCanonicalFile, too?

Contributor Author
@heary-cao heary-cao Nov 28, 2018

I'm not sure we need to call .getCanonicalFile again; it feels a little redundant. Looking at Utils.createTempDir() --> createDirectory --> dir.getCanonicalFile,
.getCanonicalFile has already been called. Please correct me if I misunderstand. Thanks.

Member

Yes, it shouldn't be necessary here.

* Creates a temporary directory, which is then passed to `f` and will be deleted after `f`
* returns.
*
* @todo Probably this method should be moved to a more general place
Contributor

I think this is the most general place we can find...

@SparkQA

SparkQA commented Nov 29, 2018

Test build #99421 has finished for PR 23151 at commit d7482a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 29, 2018

Test build #99436 has finished for PR 23151 at commit 073f5f7.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 29, 2018

Test build #99440 has finished for PR 23151 at commit 0537eb6.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 29, 2018

Test build #99441 has finished for PR 23151 at commit 819dc6a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 29, 2018

Test build #99442 has finished for PR 23151 at commit c387cc7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

I like the cleanup. I think there are even more places where createTempDir() is called and could use this new pattern. If you have time, are you willing to search out some more? We don't need to get all of them, but maybe there are some more easy cases out there.

val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
val fileSystem = Utils.getHadoopFileSystem("/",
SparkHadoopUtil.get.newConfiguration(new SparkConf()))
try {
withTempDir { testDir =>
testDir.deleteOnExit()
Member

Although I think this is redundant for temp dirs, you can put this in the Utils.createTempDir method and take it out in places like this.

Contributor

deleteOnExit is redundant for directories created by Utils.createTempDir. That method already tracks directories to cleanup on exit.

(Also deleteOnExit does not work for non-empty directories.)
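The second point above can be checked directly: `java.io.File.delete` (which `deleteOnExit` relies on at JVM shutdown) refuses to remove a non-empty directory, so recursive cleanup such as `Utils.deleteRecursively` is still needed. A small illustration:

```scala
import java.io.File
import java.nio.file.Files

object DeleteNonEmptyDemo {
  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("demo").toFile
    val child = new File(dir, "child.txt")
    child.createNewFile()

    // delete() fails on a directory that still has entries...
    assert(!dir.delete())

    // ...and succeeds only once the directory is empty.
    assert(child.delete())
    assert(dir.delete())
  }
}
```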

*/
protected override def withTempDir(f: File => Unit): Unit = {
val dir = Utils.createTempDir().getCanonicalFile
try f(dir) finally {
Member

Why not call the super method with a function that calls f, then waitForTasksToFinish()?

/**
* Creates a temporary directory, which is then passed to `f` and will be deleted after `f`
* returns.
*
Contributor

nit: unnecessary blank line

*
*/
protected override def withTempDir(f: File => Unit): Unit = {
super.withTempDir { dir =>
Contributor

Yeah, this is what I expected, thanks for doing it!


// All the resources should still be remote paths, so that YARN client will not upload again.
conf.get("spark.yarn.dist.jars") should be (tmpJarPath)
Member

I wouldn't change those spaces alone, though. Let's leave them as they were.


conf.get(PY_FILES.key) should be (s"s3a://${pyFile.getAbsolutePath}")
conf.get("spark.submit.pyFiles") should (startWith("/"))
conf.get(PY_FILES.key) should be(s"s3a://${pyFile.getAbsolutePath}")
Member

Ditto. Technically it would be better to use assert and avoid infix notation, but I think we don't have to do it here.

Member

@HyukjinKwon HyukjinKwon left a comment

Looks good. PR description should also be updated (e.g., inheritance, same name, etc.)

@heary-cao heary-cao changed the title [SPARK-26180][CORE][TEST] Add a withCreateTempDir function to the SparkCore test case [SPARK-26180][CORE][TEST] Reuse withTempDir function to the SparkCore test case Nov 30, 2018
@SparkQA

SparkQA commented Nov 30, 2018

Test build #99489 has finished for PR 23151 at commit 482c4f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 30, 2018

Test build #99496 has finished for PR 23151 at commit beccd74.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Nov 30, 2018

Test build #99505 has finished for PR 23151 at commit beccd74.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

I think the comments to date have been incorporated... this is looking good.

@@ -245,8 +245,7 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
}

test("cannot call addFile with different paths that have the same filename") {
Member

Hi, @heary-cao. This PR is very useful. While touching this file, could you replace more occurrences of val dir = Utils.createTempDir() with the new withTempDir in this file? For example, the above test case at line 235.

Contributor Author

OK, I have updated it. Please review it. Thanks.

@SparkQA

SparkQA commented Dec 1, 2018

Test build #99532 has finished for PR 23151 at commit de0ef11.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 1, 2018

Test build #99533 has finished for PR 23151 at commit 6d2f2d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

@asfgit asfgit closed this in 327ac83 Dec 1, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
… test case

## What changes were proposed in this pull request?

Currently, the common `withTempDir` function is used in Spark SQL test cases to handle the `val dir = Utils.createTempDir()` / `Utils.deleteRecursively(dir)` pattern. Unfortunately, the `withTempDir` function cannot be used in the Spark Core test cases. This PR shares the `withTempDir` function between Spark SQL and Spark Core to clean up the Spark Core test cases. Thanks.

## How was this patch tested?

N/A

Closes apache#23151 from heary-cao/withCreateTempDir.

Authored-by: caoxuewen <cao.xuewen@zte.com.cn>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>