[SPARK-12155] [SPARK-12253] Fix executor OOM in unified memory management #10240
Conversation
Test build #47474 has finished for PR 10240 at commit
```diff
  // We want to let each task get at least 1 / (2 * numActiveTasks) before blocking;
  // if we can't give it this much now, wait for other tasks to free up memory
  // (this happens if older tasks allocated lots of memory before N grew)
- if (memoryFree >= math.min(maxToGrant, poolSize / (2 * numActiveTasks) - curMem)) {
+ if (memoryFree >= math.min(maxToGrant, poolSize / minMemoryPerTask)) {
```
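The review comment below points out the bug in the added line: `poolSize / minMemoryPerTask` is a unitless ratio, not a byte count. A tiny sketch of the difference, modeled in Python with toy numbers (all values hypothetical, not taken from the PR):

```python
# Toy numbers: a 1000-byte execution pool shared by 2 active tasks.
pool_size = 1000
num_active_tasks = 2
min_memory_per_task = pool_size // (2 * num_active_tasks)  # fair minimum: 250
cur_mem = 0         # execution memory this task already holds
max_to_grant = 400  # most the pool could hand this task in total
memory_free = 100   # execution memory currently unclaimed

# Buggy operand: poolSize / minMemoryPerTask is just a ratio (here 4),
# so the check passes and the task proceeds instead of blocking.
buggy_proceeds = memory_free >= min(max_to_grant, pool_size // min_memory_per_task)

# Intended operand: the shortfall to this task's fair minimum (250 - 0),
# which correctly makes the task wait for other tasks to free memory.
fixed_proceeds = memory_free >= min(max_to_grant, min_memory_per_task - cur_mem)

print(buggy_proceeds, fixed_proceeds)  # True False
```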
`poolSize / minMemoryPerTask` should be `minMemoryPerTask - curMem`
oops good catch!!
Test build #47491 has finished for PR 10240 at commit
Test build #2195 has finished for PR 10240 at commit
ok, retest this please
Latest commit actually passed tests last night.
Test build #47528 has finished for PR 10240 at commit
Per suggestion from @davies. For a detailed proof of why these two are the same, see this gist: https://gist.github.com/andrewor14/aea58796dd25d2ec9f20
retest this please
@davies please look at the final changes.
```scala
 * @param maybeGrowPool a callback that potentially grows the size of this pool. It takes in
 *                      one parameter (Long) that represents the desired amount of memory by
 *                      which this pool should be expanded.
 * @param computeMaxPoolSize a callback that returns the maximum allowable size of this pool
```
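How the two callbacks might interact can be sketched as a simplified, single-threaded model (Python for brevity; names and return values are illustrative, and the real implementation loops and blocks under a lock rather than returning 0):

```python
def acquire_memory(num_bytes, cur_mem, memory_free, num_active_tasks,
                   maybe_grow_pool, compute_max_pool_size):
    """Sketch of the per-task fairness check around the two callbacks."""
    # First try to reclaim space (e.g. by evicting storage) for this request.
    maybe_grow_pool(num_bytes - memory_free)
    # The pool's ceiling may have changed if storage shrank, so recompute it.
    max_pool_size = compute_max_pool_size()
    max_memory_per_task = max_pool_size // num_active_tasks
    min_memory_per_task = max_pool_size // (2 * num_active_tasks)
    # Never let one task hold more than its 1/N share in total.
    max_to_grant = min(num_bytes, max(0, max_memory_per_task - cur_mem))
    to_grant = min(max_to_grant, memory_free)
    # Block unless granting would bring the task up to its 1/(2N) minimum.
    if to_grant < num_bytes and cur_mem + to_grant < min_memory_per_task:
        return 0  # the real code waits here for other tasks to free memory
    return to_grant

# A task holding 0B asks for 200B of a 1000B pool shared by 2 tasks:
# with only 100B free it must wait; with 300B free it gets the full 200B.
print(acquire_memory(200, 0, 100, 2, lambda n: None, lambda: 1000))  # 0
print(acquire_memory(200, 0, 300, 2, lambda n: None, lambda: 1000))  # 200
```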
If you take into account the memory that can be freed then isn't this a fixed value?
The amount of memory actually used by storage can change, so it's not a fixed value
Nvm, I see now.
No, because if the storage memory used is below a certain watermark (by default 0.5 of max memory), it cannot be evicted. In that case the max pool size depends on how much unevictable storage memory there is, which varies over time.
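A sketch of what such a callback might compute, under the assumption described above that storage below the `spark.memory.storageFraction` watermark is unevictable (function and variable names are mine, not the actual Spark API):

```python
def compute_max_pool_size(max_memory, storage_used, storage_region_size):
    """Illustrative sketch: execution can grow into everything except the
    storage sitting below the storageFraction watermark, which cannot be
    evicted. The result changes as storage_used changes over time."""
    unevictable_storage = min(storage_used, storage_region_size)
    return max_memory - unevictable_storage

# With 900B cached and a 400B watermark, only 400B is protected, so the
# execution ceiling is 600B; cache less and the ceiling rises to 700B.
print(compute_max_pool_size(1000, 900, 400))  # 600
print(compute_max_pool_size(1000, 300, 400))  # 700
```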
LGTM
Test build #2198 has finished for PR 10240 at commit
Test build #47539 has finished for PR 10240 at commit
Test build #2197 has finished for PR 10240 at commit
Test build #2196 has finished for PR 10240 at commit
Thanks, merging into master 1.6.
**Problem.** In unified memory management, acquiring execution memory may lead to eviction of storage memory. However, the space freed from evicting cached blocks is distributed among all active tasks. Thus, an incorrect upper bound on the execution memory per task can cause the acquisition to fail, leading to OOMs and premature spills.

**Example.** Suppose total memory is 1000B, cached blocks occupy 900B, `spark.memory.storageFraction` is 0.4, and there are two active tasks. In this case, the cap on task execution memory is 100B / 2 = 50B. If task A tries to acquire 200B, it will evict 100B of storage but can only acquire 50B because of the incorrect cap. For another example, see this [regression test](https://github.com/andrewor14/spark/blob/fix-oom/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala#L233) that I stole from JoshRosen.

**Solution.** Fix the cap on task execution memory. It should take into account the space that could have been freed by storage in addition to the current amount of memory available to execution. In the example above, the correct cap should have been 600B / 2 = 300B.

This patch also guards against the race condition (SPARK-12253):
(1) Existing tasks collectively occupy all execution memory
(2) New task comes in and blocks while existing tasks spill
(3) After tasks finish spilling, another task jumps in and puts in a large block, stealing the freed memory
(4) New task still cannot acquire memory and goes back to sleep

Author: Andrew Or <andrew@databricks.com>

Closes #10240 from andrewor14/fix-oom.

(cherry picked from commit 5030923)
Signed-off-by: Andrew Or <andrew@databricks.com>
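The arithmetic in the example above can be checked directly. A sketch in Python (integer bytes; variable names are mine):

```python
total = 1000            # total memory
storage_used = 900      # bytes occupied by cached blocks
storage_region = 400    # 0.4 * total, from spark.memory.storageFraction
num_tasks = 2           # active tasks

# Incorrect cap: count only the memory currently free for execution.
wrong_cap = (total - storage_used) // num_tasks       # 100 / 2 = 50

# Correct cap: also count storage that could be evicted, i.e. everything
# above the 400B watermark, giving a 600B potential execution pool.
max_pool = total - min(storage_used, storage_region)  # 600
right_cap = max_pool // num_tasks                     # 600 / 2 = 300

print(wrong_cap, right_cap)  # 50 300
```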