
[SPARK-12155] [SPARK-12253] Fix executor OOM in unified memory management #10240

Closed
wants to merge 5 commits into master from andrewor14:fix-oom

Conversation

andrewor14
Contributor

Problem. In unified memory management, acquiring execution memory may lead to eviction of storage memory. However, the space freed by evicting cached blocks is distributed among all active tasks. Thus, an incorrect upper bound on the execution memory per task can cause the acquisition to fail, leading to OOMs and premature spills.

Example. Suppose total memory is 1000B, cached blocks occupy 900B, spark.memory.storageFraction is 0.4, and there are two active tasks. In this case, the cap on task execution memory is 100B / 2 = 50B. If task A tries to acquire 200B, it will evict 100B of storage but can only acquire 50B because of the incorrect cap. For another example, see this [regression test](https://github.com/andrewor14/spark/blob/fix-oom/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala#L233) that I stole from @JoshRosen.

Solution. Fix the cap on task execution memory. It should take into account the space that could have been freed by storage in addition to the current amount of memory available to execution. In the example above, the correct cap should have been 600B / 2 = 300B.
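
A small sketch of the arithmetic, using the numbers from the example above; the variable names are illustrative, not the actual MemoryManager fields:

```scala
// Illustrative walk-through of the per-task cap; names are descriptive
// placeholders, not the real MemoryManager fields.
val maxMemory = 1000L           // total unified memory
val storageMemoryUsed = 900L    // currently cached blocks
val storageRegionSize = 400L    // maxMemory * spark.memory.storageFraction (0.4)
val numActiveTasks = 2

// Buggy cap: only counts memory currently free for execution.
val oldCap = (maxMemory - storageMemoryUsed) / numActiveTasks  // 50B

// Fixed cap: execution may also evict storage down to the protected region.
val fixedCap =
  (maxMemory - math.min(storageMemoryUsed, storageRegionSize)) / numActiveTasks  // 300B

println(s"old cap = ${oldCap}B, fixed cap = ${fixedCap}B")
```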

This patch also guards against the race condition (SPARK-12253):
(1) Existing tasks collectively occupy all execution memory
(2) New task comes in and blocks while existing tasks spill
(3) After tasks finish spilling, another task jumps in and puts in a large block, stealing the freed memory
(4) New task still cannot acquire memory and goes back to sleep
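
A simplified sketch (not the actual ExecutionMemoryPool code) of the kind of retry loop that guards against this: each time a blocked task wakes up, it re-attempts to grow the pool by evicting storage and recomputes its per-task bounds, rather than assuming the memory freed before it slept is still available. The callback names mirror the `maybeGrowPool` / `computeMaxPoolSize` parameters discussed later in this thread; the bookkeeping is illustrative.

```scala
import scala.collection.mutable

// Simplified, illustrative sketch of the acquire/retry loop.
class SimpleExecutionPool(lock: Object, var poolSize: Long) {
  private val memoryForTask = mutable.HashMap[Long, Long]()

  private def memoryUsed: Long = memoryForTask.values.sum
  private def memoryFree: Long = poolSize - memoryUsed
  private def numActiveTasks: Int = math.max(memoryForTask.size, 1)

  def acquireMemory(
      numBytes: Long,
      taskId: Long,
      maybeGrowPool: Long => Unit,
      computeMaxPoolSize: () => Long): Long = lock.synchronized {
    memoryForTask.getOrElseUpdate(taskId, 0L)
    while (true) {
      val curMem = memoryForTask(taskId)
      // Re-attempt eviction on every wake-up: memory freed before this task
      // slept may have been claimed by another task caching a block (SPARK-12253).
      if (numBytes > memoryFree) {
        maybeGrowPool(numBytes - memoryFree)
      }
      // Recompute the bounds each iteration; the max pool size changes as
      // storage memory is cached and evicted.
      val maxPoolSize = computeMaxPoolSize()
      val maxMemoryPerTask = maxPoolSize / numActiveTasks
      val minMemoryPerTask = poolSize / (2 * numActiveTasks)
      val maxToGrant = math.min(numBytes, math.max(0L, maxMemoryPerTask - curMem))
      val toGrant = math.min(maxToGrant, memoryFree)
      if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {
        lock.wait()  // sleep until another task releases memory, then re-check
      } else {
        memoryForTask(taskId) = curMem + toGrant
        return toGrant
      }
    }
    0L  // unreachable
  }
}
```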

@andrewor14 andrewor14 changed the title [SPARK-12155] Fix executor OOM in unified memory management [SPARK-12155] [SPARK-12253] Fix executor OOM in unified memory management Dec 10, 2015
@andrewor14
Contributor Author

@JoshRosen @davies

@SparkQA

SparkQA commented Dec 10, 2015

Test build #47474 has finished for PR 10240 at commit cd0c680.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    * logInfo(s"HBase class not found $e")
    * logDebug("HBase class not found", e)

// We want to let each task get at least 1 / (2 * numActiveTasks) before blocking;
// if we can't give it this much now, wait for other tasks to free up memory
// (this happens if older tasks allocated lots of memory before N grew)
-    if (memoryFree >= math.min(maxToGrant, poolSize / (2 * numActiveTasks) - curMem)) {
+    if (memoryFree >= math.min(maxToGrant, poolSize / minMemoryPerTask)) {
Contributor

`poolSize / minMemoryPerTask` should be `minMemoryPerTask - curMem`

Contributor Author

oops good catch!!
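
For reference, a minimal runnable sketch of the check after applying that comment; the numeric values are illustrative placeholders, not the actual pool state:

```scala
// Sketch of the corrected check from the review comment above; values are
// illustrative placeholders, not real pool state.
val poolSize = 1000L
val numActiveTasks = 2
val minMemoryPerTask = poolSize / (2 * numActiveTasks)  // 250B guaranteed floor
val curMem = 100L       // memory this task already holds
val maxToGrant = 200L   // most we would grant this task right now
val memoryFree = 180L   // memory currently free in the pool

// Proceed if the pool can satisfy the grant, or at least bring this task
// up to its guaranteed minimum share; otherwise wait.
if (memoryFree >= math.min(maxToGrant, minMemoryPerTask - curMem)) {
  println("grant memory now")
} else {
  println("wait for other tasks to free up memory")
}
```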

@SparkQA

SparkQA commented Dec 10, 2015

Test build #47491 has finished for PR 10240 at commit d8be669.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 10, 2015

Test build #2195 has finished for PR 10240 at commit d8be669.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor Author

ok, retest this please

@andrewor14
Contributor Author

Latest commit actually passed tests last night.

@SparkQA

SparkQA commented Dec 10, 2015

Test build #47528 has finished for PR 10240 at commit d8be669.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Per suggestion from @davies. For a detailed proof of why these
two are the same, see this gist:
https://gist.github.com/andrewor14/aea58796dd25d2ec9f20
@andrewor14
Contributor Author

retest this please

@andrewor14
Contributor Author

@davies please look at the final changes.

 * @param maybeGrowPool a callback that potentially grows the size of this pool. It takes in
 *                      one parameter (Long) that represents the desired amount of memory by
 *                      which this pool should be expanded.
 * @param computeMaxPoolSize a callback that returns the maximum allowable size of this pool
Contributor

If you take into account the memory that can be freed then isn't this a fixed value?

Contributor

The memory actually used by storage can change, so it's not a fixed value.

Contributor

Nvm, I see now.

Contributor Author

No, because storage memory below a certain mark (by default 0.5 of max memory) cannot be evicted. The max pool size therefore depends on how much unevictable storage memory there is, which varies over time.
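
A hedged sketch of what such a callback could compute under that rule; the parameter names are illustrative, not necessarily the actual UnifiedMemoryManager fields:

```scala
// Sketch: maximum size the execution pool could reach if it evicted every
// evictable cached block. Storage inside the protected storageFraction region
// cannot be evicted, so the result varies with current storage usage.
def computeMaxExecutionPoolSize(
    maxMemory: Long,           // total unified memory
    storageMemoryUsed: Long,   // memory currently held by cached blocks
    storageRegionSize: Long    // maxMemory * spark.memory.storageFraction
  ): Long = {
  maxMemory - math.min(storageMemoryUsed, storageRegionSize)
}

// With the numbers from the PR description: 1000 - min(900, 400) = 600B.
println(computeMaxExecutionPoolSize(1000L, 900L, 400L))  // 600
```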

@davies
Contributor

davies commented Dec 10, 2015

LGTM

@SparkQA

SparkQA commented Dec 10, 2015

Test build #2198 has finished for PR 10240 at commit d8be669.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    * logInfo(s"HBase class not found $e")
    * logDebug("HBase class not found", e)

@SparkQA

SparkQA commented Dec 10, 2015

Test build #47539 has finished for PR 10240 at commit 2929640.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 10, 2015

Test build #2197 has finished for PR 10240 at commit d8be669.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 10, 2015

Test build #2196 has finished for PR 10240 at commit d8be669.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor Author

Thanks, merging into master and 1.6.

asfgit pushed a commit that referenced this pull request Dec 10, 2015

Author: Andrew Or <andrew@databricks.com>

Closes #10240 from andrewor14/fix-oom.

(cherry picked from commit 5030923)
Signed-off-by: Andrew Or <andrew@databricks.com>
@asfgit asfgit closed this in 5030923 Dec 10, 2015
@andrewor14 andrewor14 deleted the fix-oom branch December 10, 2015 23:35