YARN-11488. Handling CONTAINER_EXPIRED event will throw NEP if the reservation is removed from node #5627
In AbstractYarnScheduler::completeOustandingUpdatesWhichAreReserved(), the reserved container can be removed asynchronously through SchedulerNode::setReservedContainer() after getReservedContainer() has been called. When that happens, handling the event throws a NullPointerException (NPE) and the ResourceManager crashes, as shown in the trace log below.
2023-05-07 02:04:38,201 FATAL [SchedulerEventDispatcher:Event Processor] org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type CONTAINER_EXPIRED to the Event Dispatcher
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completeOustandingUpdatesWhichAreReserved(AbstractYarnScheduler.java:725)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:686)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1927)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:172)
at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:74)
at java.lang.Thread.run(Thread.java:748)
2023-05-07 02:04:38,201 INFO [SchedulerEventDispatcher:Event Processor] org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
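The race described above is a check-then-act problem: the reservation can be cleared between the moment it is read and the moment it is dereferenced. The following is a minimal sketch of one defensive pattern, not the actual patch. `Container`, `Node`, and `safeReservedId` are hypothetical stand-ins invented for illustration; they are not the Hadoop `RMContainer` or `SchedulerNode` classes. The idea is to read the reserved container once into a local variable and null-check that local before using it.

```java
public class ReservedContainerSketch {

    /** Hypothetical stand-in for a reserved container (not Hadoop's RMContainer). */
    static final class Container {
        private final String id;

        Container(String id) {
            this.id = id;
        }

        String getId() {
            return id;
        }
    }

    /** Hypothetical stand-in for a scheduler node (not Hadoop's SchedulerNode). */
    static final class Node {
        // volatile so a reservation cleared by another thread is visible here
        private volatile Container reserved;

        Container getReservedContainer() {
            return reserved;
        }

        void setReservedContainer(Container container) {
            this.reserved = container;
        }
    }

    /**
     * Reads the reservation exactly once. If another thread cleared it,
     * the local is null and we return null instead of dereferencing it.
     */
    static String safeReservedId(Node node) {
        Container reserved = node.getReservedContainer(); // single read
        if (reserved == null) {
            return null; // reservation already removed, nothing to do
        }
        return reserved.getId(); // local cannot become null after the check
    }

    public static void main(String[] args) {
        Node node = new Node();
        node.setReservedContainer(new Container("container_1"));
        System.out.println(safeReservedId(node)); // prints "container_1"

        node.setReservedContainer(null); // simulates the asynchronous removal
        System.out.println(safeReservedId(node)); // prints "null"
    }
}
```

Calling `getReservedContainer()` a second time after the null check would reopen the window for the race, which is why the sketch only ever dereferences the local copy.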