Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1115] improvement: Unregister shuffle explicitly when application i… #1131

Merged
merged 1 commit into from
Aug 11, 2023

Conversation

zhengchenyu
Copy link
Collaborator

What changes were proposed in this pull request?

For MR/Tez, as applications is stopped, we should unregister shuffle explicitly. There are no need to wait timeout.

Why are the changes needed?

Fix: #1115

Does this PR introduce any user-facing change?

No.

How was this patch tested?

test in yarn cluster

@codecov-commenter
Copy link

codecov-commenter commented Aug 11, 2023

Codecov Report

Merging #1131 (7fb0f91) into master (0bde6ba) will increase coverage by 1.17%.
Report is 3 commits behind head on master.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##             master    #1131      +/-   ##
============================================
+ Coverage     54.27%   55.45%   +1.17%     
+ Complexity     2561     2560       -1     
============================================
  Files           388      368      -20     
  Lines         21985    19648    -2337     
  Branches       1825     1828       +3     
============================================
- Hits          11933    10896    -1037     
+ Misses         9348     8115    -1233     
+ Partials        704      637      -67     
Files Changed Coverage Δ
...apache/hadoop/mapreduce/v2/app/RssMRAppMaster.java 0.00% <0.00%> (ø)
...n/java/org/apache/tez/dag/app/RssDAGAppMaster.java 48.30% <0.00%> (+0.71%) ⬆️
...rg/apache/tez/dag/app/TezRemoteShuffleManager.java 66.92% <0.00%> (-4.39%) ⬇️

... and 25 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@jerqi
Copy link
Contributor

jerqi commented Aug 11, 2023

Could we add UT for this pr?

}

private static final Logger LOG = LoggerFactory.getLogger(RssMRAppMaster.class);

@Override
protected void serviceStop() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be invoked explicitly by MR ?

Copy link
Collaborator Author

@zhengchenyu zhengchenyu Aug 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! In general, this will be invoked explicitly.
Below is the steps:
(1) KillNewJobTransition/CommitSucceededTransition/InternalTerminationTransition => job.finished
(2) In job.finished, send JobFinishEvent
(3) JobFinishEventHandler handle the JobFinishEvent and called shutDownJob
(4) shutDownJob => MRAppMaster.this.stop() => serviceStop

Below is related logs:

2023-08-11 14:19:40,033 INFO [Thread-132] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the services
2023-08-11 14:19:40,033 INFO [Thread-132] org.apache.hadoop.mapreduce.v2.app.RssMRAppMaster: Unregister shuffle for app appattempt_1691730663832_0001_000001
2023-08-11 14:19:40,046 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e3969_1691730663832_0001_01_000035

@@ -109,6 +109,14 @@ public void start() throws Exception {

@Override
public void shutdown() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto?

@zhengchenyu
Copy link
Collaborator Author

Could we add UT for this pr?

I am confused about UT. This PR make effect in main. For now, I did not find a good way to construct UT yet.
I think if integration test pass, there will be no any problem.

@jerqi
Copy link
Contributor

jerqi commented Aug 11, 2023

Could we add UT for this pr?

I am confused about UT. This PR make effect in main. For now, I did not find a good way to construct UT yet. I think if integration test pass, there will be no any problem.

OK.

@zuston zuston merged commit 110531d into apache:master Aug 11, 2023
32 checks passed
@zhengchenyu zhengchenyu deleted the branch-1115 branch October 20, 2023 01:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] Unregister shuffle explicitly when application is stopped.
4 participants