-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#1115] improvement: Unregister shuffle explicitly when application i… #1131
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1131 +/- ##
============================================
+ Coverage 54.27% 55.45% +1.17%
+ Complexity 2561 2560 -1
============================================
Files 388 368 -20
Lines 21985 19648 -2337
Branches 1825 1828 +3
============================================
- Hits 11933 10896 -1037
+ Misses 9348 8115 -1233
+ Partials 704 637 -67
... and 25 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Could we add UT for this pr? |
} | ||
|
||
private static final Logger LOG = LoggerFactory.getLogger(RssMRAppMaster.class); | ||
|
||
@Override | ||
protected void serviceStop() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be invoked explicitly by MR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! In general, this will be invoked explicitly.
Below is the steps:
(1) KillNewJobTransition/CommitSucceededTransition/InternalTerminationTransition => job.finished
(2) In job.finished, send JobFinishEvent
(3) JobFinishEventHandler handle the JobFinishEvent and called shutDownJob
(4) shutDownJob => MRAppMaster.this.stop() => serviceStop
Below is related logs:
2023-08-11 14:19:40,033 INFO [Thread-132] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the services
2023-08-11 14:19:40,033 INFO [Thread-132] org.apache.hadoop.mapreduce.v2.app.RssMRAppMaster: Unregister shuffle for app appattempt_1691730663832_0001_000001
2023-08-11 14:19:40,046 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e3969_1691730663832_0001_01_000035
@@ -109,6 +109,14 @@ public void start() throws Exception { | |||
|
|||
@Override | |||
public void shutdown() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto?
I am confused about UT. This PR make effect in main. For now, I did not find a good way to construct UT yet. |
OK. |
What changes were proposed in this pull request?
For MR/Tez, as applications is stopped, we should unregister shuffle explicitly. There are no need to wait timeout.
Why are the changes needed?
Fix: #1115
Does this PR introduce any user-facing change?
No.
How was this patch tested?
test in yarn cluster