[SPARK-9026] Refactor SimpleFutureAction.onComplete to not launch separate thread for every callback #7385

Closed


JoshRosen
Contributor

This commit refactors SimpleFutureAction so that registering a callback with onComplete does not immediately tie up a thread in the provided execution context.

As @zsxwing noticed in #7276 (comment), the existing implementation of SimpleFutureAction.onComplete creates a separate thread to wrap the blocking awaitResult() callback:

  override def onComplete[U](func: (Try[T]) => U)(implicit executor: ExecutionContext) {
    executor.execute(new Runnable {
      override def run() {
        func(awaitResult())
      }
    })
  }

This PR addresses this issue by making JobWaiter into a Future and using that future in the implementation of SimpleFutureAction's future methods.
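As a rough illustration of the approach described above (not the code in this patch), the sketch below backs a waiter with a Promise so that registering a callback does not occupy a thread while the job runs. The class names PromiseBackedWaiter and SketchFutureAction, and the jobSucceeded/jobFailed methods, are hypothetical stand-ins rather than Spark's API.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical stand-in for JobWaiter: completion is signalled through a Promise
// instead of a monitor that callers have to block on.
final class PromiseBackedWaiter {
  private val promise = Promise[Unit]()

  // Called by a (hypothetical) scheduler when the job ends.
  def jobSucceeded(): Unit = { promise.trySuccess(()); () }
  def jobFailed(e: Exception): Unit = { promise.tryFailure(e); () }

  def toFuture: Future[Unit] = promise.future
}

// Hypothetical stand-in for SimpleFutureAction.
final class SketchFutureAction[T](waiter: PromiseBackedWaiter, resultFunc: => T) {
  // Nothing runs on `executor` while the job is in flight; the map step and
  // `func` are scheduled only after the promise has been completed.
  def onComplete[U](func: Try[T] => U)(implicit executor: ExecutionContext): Unit =
    waiter.toFuture.map(_ => resultFunc).onComplete(func)
}
```

The blocking version quoted above holds one executor thread per registered callback until the job finishes; in this sketch the only work submitted to the executor is the callback itself.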

This patch was pair-programmed with @tdas.

@JoshRosen
Contributor Author

@zsxwing, TD and I think that this patch's fix may eliminate the need to introduce a new submitAsyncJob API. I'm going to push another commit in a few minutes to add some code comments plus more tests, but I just wanted to submit this PR now to get your initial feedback on the approach.

  private var finishedTasks = 0

  // Is the job as a whole finished (succeeded or failed)?
  @volatile
  private var _jobFinished = totalTasks == 0

  if (_jobFinished) {
Contributor Author

@zsxwing, this if statement fixes a subtle bug that we found in your JobWaiter future: if a job has no tasks, then it is marked as finished immediately and taskSucceeded will not be called, so we need to complete the promise here. We noticed this because a test in AsyncRDDActionsSuite was hanging.
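To make the zero-task case concrete, here is a minimal sketch under simplified assumptions. CountingWaiter is a hypothetical class, not Spark's JobWaiter, and it omits result handling and failure paths.

```scala
import scala.concurrent.{Future, Promise}

// Hypothetical, simplified waiter; not Spark's actual JobWaiter.
final class CountingWaiter(totalTasks: Int) {
  private val promise = Promise[Unit]()
  private var finishedTasks = 0

  // A job with no tasks never triggers taskSucceeded(), so the promise has to be
  // completed up front; otherwise anything waiting on the future hangs forever.
  if (totalTasks == 0) {
    promise.success(())
  }

  def taskSucceeded(): Unit = synchronized {
    finishedTasks += 1
    if (finishedTasks == totalTasks) {
      promise.trySuccess(())
    }
  }

  def toFuture: Future[Unit] = promise.future
}
```

With this shape, `new CountingWaiter(0).toFuture` is already completed at construction time, while a waiter for two tasks completes only after the second call to `taskSucceeded()`.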

@JoshRosen JoshRosen force-pushed the simplefutureaction-refactoring branch from 7b25e6b to d779af8 Compare July 14, 2015 03:25
@JoshRosen
Contributor Author

@tdas, PTAL @ the regression test that I added.

@SparkQA

SparkQA commented Jul 14, 2015

Test build #37179 has finished for PR 7385 at commit 11c7450.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 14, 2015

Test build #37184 has finished for PR 7385 at commit 7b25e6b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Ah, interesting: it looks like this failed some of the JavaAsyncRDDActions tests. I'll investigate.

@SparkQA

SparkQA commented Jul 14, 2015

Test build #37188 has finished for PR 7385 at commit d779af8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Least(children: Expression*) extends Expression
    • case class Greatest(children: Expression*) extends Expression

  override def onComplete[U](func: (Try[T]) => U)(implicit executor: ExecutionContext): Unit = {
    jobWaiter.toFuture.onComplete { (jobWaiterResult: Try[Unit]) =>
      // If the job succeeded, then evaluate the result function; otherwise, preserve the exception.
      _value = jobWaiterResult.map(_ => resultFunc)
Contributor Author

@tdas, I think there's a bug here because we'll re-assign to _value if there are multiple onCompletes. There's also a race in allowing _value to be assigned here, since there's a lag between when the jobWaiter future completes and when this callback runs. Fixing this now...
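One possible way to sidestep both problems, sketched here under assumptions rather than taken from the eventual fix, is to derive the result future once and let every callback and every value lookup read from it, so no callback assigns shared state. DerivedResultAction and its compute parameter are hypothetical names.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical sketch; only the onComplete name mirrors the thread above.
final class DerivedResultAction[T](jobDone: Future[Unit], compute: () => T)(
    implicit ec: ExecutionContext) {
  // Derived exactly once, so `compute` runs at most once and every callback
  // observes the same outcome.
  private val result: Future[T] = jobDone.map(_ => compute())

  def onComplete[U](func: Try[T] => U): Unit = result.onComplete(func)

  // Read straight from the derived future instead of a field assigned inside a
  // callback; None until that future has completed.
  def value: Option[Try[T]] = result.value
}
```

Because `value` is a view over the future rather than a separately assigned variable, there is no window in which a callback has yet to write it.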

@zsxwing
Member

zsxwing commented Jul 14, 2015

@JoshRosen thank you for fixing this. Actually, I have another requirement for submitJob. The current submitJob signature is:

  def submitJob[T, U, R](
      rdd: RDD[T],
      processPartition: Iterator[T] => U,
      partitions: Seq[Int],
      resultHandler: (Int, U) => Unit,
      resultFunc: => R): SimpleFutureAction[R]

I cannot access TaskContext in this method. Could you change processPartition: Iterator[T] => U to processPartition: (TaskContext, Iterator[T]) => U, or add a new method for it?

@JoshRosen
Contributor Author

@zsxwing will TaskContext.get() work for you? AFAIK we're trying to avoid the introduction of new withContext methods when TaskContext.get() will suffice (see some of the old discussions regarding mapPartitionsWithContext, etc.).

@zsxwing
Member

zsxwing commented Jul 14, 2015

> @zsxwing will TaskContext.get() work for you? AFAIK we're trying to avoid the introduction of new withContext methods when TaskContext.get() will suffice (see some of the old discussions regarding mapPartitionsWithContext, etc.).

I forgot about that. It should work. Thanks.

    }
  })
  override def onComplete[U](func: (Try[T]) => U)(implicit executor: ExecutionContext): Unit = {
    jobWaiterFuture.map { _ => resultFunc }.onComplete(func)
Member

How about adding private lazy val _resultFunc = resultFunc and using _resultFunc in this class? This should avoid calling resultFunc multiple times.

Contributor Author

I think that @tdas and I considered this and ended up not doing it because we thought that resultFunc would only be computed once, but I suppose it doesn't hurt to be more explicit here. Even if the current code works, if it's confusing enough to merit a comment then I think we should just be explicit and use a lazy val. I'll update this now to do this.
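For readers less familiar with by-name parameters, the following self-contained sketch shows the general Scala behavior behind this suggestion. ByNameHolder is a hypothetical class for illustration only; it makes no claim about how often the patch's code actually referenced resultFunc.

```scala
// Hypothetical class illustrating by-name evaluation; not Spark code.
final class ByNameHolder[R](resultFunc: => R) {
  // Every reference to a by-name parameter re-runs the argument expression.
  def direct: R = resultFunc

  // A lazy val evaluates the expression at most once and caches the value.
  private lazy val _resultFunc: R = resultFunc
  def cached: R = _resultFunc
}
```

If the argument expression has side effects or is expensive, `direct` repeats them on each call, whereas `cached` runs the expression on first access and returns the stored value afterwards.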

@SparkQA

SparkQA commented Jul 14, 2015

Test build #37200 has finished for PR 7385 at commit 1e2db7f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

@zsxwing I've updated this to use a lazy val; please take another look.

@SparkQA

SparkQA commented Jul 14, 2015

Test build #37245 has finished for PR 7385 at commit 1346313.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    case JobSucceeded => scala.util.Success(resultFunc)
    case JobFailed(e: Exception) => scala.util.Failure(e)
  } else {
    jobWaiter.awaitResult() match {
Contributor

This part seems like a bad hack to use awaitResult to get the result. Rather, there should be a JobWaiter.jobResult (make it public) that returns Option[JobResult], and this code should use that.
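A minimal sketch of what such a non-blocking accessor could look like, assuming the waiter is backed by a Promise. The types SketchJobResult, SketchJobSucceeded, and SketchJobFailed, along with PollableWaiter and SketchResults, are hypothetical and only loosely modelled on the names used in this thread.

```scala
import scala.concurrent.Promise
import scala.util.{Failure, Success, Try}

// Hypothetical result types; not Spark's JobResult hierarchy.
sealed trait SketchJobResult
case object SketchJobSucceeded extends SketchJobResult
final case class SketchJobFailed(e: Exception) extends SketchJobResult

final class PollableWaiter {
  private val promise = Promise[SketchJobResult]()

  // Returns true only for the call that actually completed the waiter.
  def finish(r: SketchJobResult): Boolean = promise.trySuccess(r)

  // None while the job is still running; never blocks the caller.
  def jobResult: Option[SketchJobResult] = promise.future.value.flatMap(_.toOption)
}

object SketchResults {
  // Convert an already-known outcome into a Try, evaluating resultFunc only on success.
  def toTry[R](r: SketchJobResult, resultFunc: => R): Try[R] = r match {
    case SketchJobSucceeded => Success(resultFunc)
    case SketchJobFailed(e) => Failure(e)
  }
}
```

A caller can then write `waiter.jobResult.map(r => SketchResults.toTry(r, compute))` to obtain an optional outcome without calling a blocking method on a job that is known to be finished.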

@JoshRosen
Contributor Author

I just pushed a commit which changes JobWaiter to extend Future[Unit], which I think should resolve the most recent round of review feedback.

@SparkQA

SparkQA commented Jul 15, 2015

Test build #37319 has finished for PR 7385 at commit e08623a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    throw new UnsupportedOperationException("taskSucceeded() called on a finished JobWaiter")
  }
  resultHandler(index, result.asInstanceOf[T])
  finishedTasks += 1
  if (finishedTasks == totalTasks) {
    _jobFinished = true
    jobResult = JobSucceeded
    promise.trySuccess()
    this.notifyAll()
Member

This line can be removed. Right?

@SparkQA

SparkQA commented Jul 18, 2015

Test build #37677 has finished for PR 7385 at commit 1a19268.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jul 18, 2015

Test build #37706 has finished for PR 7385 at commit 1a19268.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jul 19, 2015

Test build #37745 has finished for PR 7385 at commit 1a19268.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  // There are certain situations where jobFailed can be called multiple times for the same
  // job. We guard against this by making this method idempotent.
  if (!isCompleted) {
    promise.failure(exception)
Member

Looks like tryFailure would be simpler.
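For comparison, a small sketch of the two styles using scala.concurrent.Promise. The IdempotentFailure object and its method names are hypothetical; the Promise calls themselves (isCompleted, failure, tryFailure) are standard library methods.

```scala
import scala.concurrent.Promise

object IdempotentFailure {
  // Check-then-act: the check and the completion are two separate steps, and
  // failure() throws IllegalStateException if another caller completes the
  // promise in between.
  def failChecked(p: Promise[Unit], e: Exception): Unit = {
    if (!p.isCompleted) {
      p.failure(e)
    }
  }

  // tryFailure performs the check and the completion as one atomic call and
  // reports whether this call was the one that completed the promise.
  def failOnce(p: Promise[Unit], e: Exception): Boolean = p.tryFailure(e)
}
```

Calling `failOnce` a second time simply returns false and leaves the first exception in place, which gives the idempotence the code comment asks for without the explicit guard.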

@zsxwing
Member

zsxwing commented Jul 22, 2015

retest this please

@zsxwing
Member

zsxwing commented Jul 22, 2015

@tdas could you take a look at this PR again? I think it's better to merge this one first.

@SparkQA

SparkQA commented Jul 22, 2015

Test build #61 has finished for PR 7385 at commit 1a19268.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 22, 2015

Test build #38083 has finished for PR 7385 at commit 1a19268.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 22, 2015

Test build #1162 has finished for PR 7385 at commit 1a19268.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jul 23, 2015

Test build #38119 has finished for PR 7385 at commit 1a19268.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 23, 2015

Test build #38258 has finished for PR 7385 at commit 692b3a4.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 24, 2015

Test build #38280 has finished for PR 7385 at commit 17edbcd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 24, 2015

Test build #1191 has finished for PR 7385 at commit 17edbcd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 24, 2015

Test build #1192 has finished for PR 7385 at commit 17edbcd.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jul 27, 2015

Test build #38573 has finished for PR 7385 at commit 17edbcd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 21, 2015

Test build #2093 has finished for PR 7385 at commit 17edbcd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Dec 11, 2015

@JoshRosen could you close this one, since #9264 fixes the issue as well? Thanks!

@JoshRosen JoshRosen closed this Dec 11, 2015
@JoshRosen JoshRosen deleted the simplefutureaction-refactoring branch December 11, 2015 00:52
@jaceklaskowski
Contributor

@rxin
Contributor

rxin commented Jan 3, 2016

That's a good find.

cc @zsxwing

@zsxwing
Member

zsxwing commented Jan 3, 2016

Thanks, @jaceklaskowski. I will submit a PR to remove submitJobThreadPool.
