-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YARN-11191 Global Scheduler refreshQueue cause deadLock #4726
base: trunk
Are you sure you want to change the base?
Conversation
@yb12138 I want to ask two question:
1.Though you use tryLock and park, so refresh queue thread switch to block state, but this thread still hold PremmptionManager lock ,so scheduler thread still can't allocate new container. Is it right? 2.Does this issue related to global Scheduler or just the preemption function? Looking forward to your reply, thanks! |
@luoyuan3471 |
CapacityScheduler.refreshQueue: hold: PremmptionManager.writeLock CapacityScheduler.schedule: hold: csqueue.readLock other thread(completeContainer,release Resource,etc.): require: csqueue.writeLock @yb12138 Very sorry, I'm still a little confused on this point. Can you explain more about it? Thank you! |
@luoyuan3471 |
💔 -1 overall
This message was automatically generated. |
Do you mean readLock.tryLock() will make readLock place first ,though a write lock request is already in the head of waiting queue? @yb12138 |
@luoyuan3471 |
Thanks for your explanation. I checked the code. You're right! |
For 2, Why does Global Scheduler increase the chance of this dead lock case? |
Global Scheduler has muti-threads to assign containers normally. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@Override | ||
public List<CSQueue> getChildQueuesByTryLock() { | ||
try { | ||
while (!readLock.tryLock()){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just a regular lock()?
@@ -25,10 +25,7 @@ | |||
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueue; | |||
import org.apache.hadoop.yarn.util.resource.Resources; | |||
|
|||
import java.util.Collections; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid.
@@ -3026,4 +3030,69 @@ public void testReservedContainerLeakWhenMoveApplication() throws Exception { | |||
Assert.assertEquals(0, desQueue.getUsedResources().getMemorySize()); | |||
rm1.close(); | |||
} | |||
@Test | |||
public void testRefreshQueueWithOpenPreemption() throws Exception { | |||
CapacitySchedulerConfiguration csConf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line limit is 100 chars so this should fit.
mgr.init(conf); | ||
MockRM rm1 = new MockRM(csConf); | ||
CapacityScheduler scheduler=(CapacityScheduler) rm1.getResourceScheduler(); | ||
PreemptionManager preemptionManager = scheduler.getPreemptionManager();; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
;;
YarnConfiguration conf=new YarnConfiguration(csConf); | ||
conf.setClass(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class, | ||
ResourceScheduler.class); | ||
RMNodeLabelsManager mgr=new NullRMNodeLabelsManager(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spaces
csConf.setUserLimitFactor("root.a", 100); | ||
|
||
YarnConfiguration conf=new YarnConfiguration(csConf); | ||
conf.setClass(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 line
} catch (InterruptedException e) { | ||
e.printStackTrace(); | ||
} | ||
preemptionManager.getKillableContainers("a", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 line
Thread refreshQueueThread = new Thread(()->{ | ||
preemptionManager.getWriteLock().lock(); | ||
try { | ||
Thread.sleep(1000 * 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spaces
@@ -3026,4 +3030,69 @@ public void testReservedContainerLeakWhenMoveApplication() throws Exception { | |||
Assert.assertEquals(0, desQueue.getUsedResources().getMemorySize()); | |||
rm1.close(); | |||
} | |||
@Test | |||
public void testRefreshQueueWithOpenPreemption() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a description explaining the locking part.
Description of PR
How was this patch tested?
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?