HADOOP-19835 Make MapReduce Application Master class configurable in YARNRunner #8331
lewismc wants to merge 1 commit into apache:trunk
vargs.add(MRJobConfig.APPLICATION_MASTER_CLASS);
String amClass = jobConf.get("yarn.app.mapreduce.am",
Should it be "yarn.app.mapreduce.am.class"?
Adding a new config requires changing the constant, docs, default config file, etc.
Seems the Celeborn document presents another method: -Dyarn.app.mapreduce.am.command-opts=org.apache.celeborn.mapreduce.v2.app.MRAppMasterWithCeleborn, see https://github.com/apache/celeborn?tab=readme-ov-file#deploy-mapreduce-client
Hi @RexXiong thanks.
I wasn't able to get that working because of how Hadoop builds the AM container command and what yarn.app.mapreduce.am.command-opts is used for.
How the AM is launched
In YARNRunner, the command for the AM container is built in two separate steps:
- Main class – One place in code does:
  vargs.add(MRJobConfig.APPLICATION_MASTER_CLASS);
  So the main class is always org.apache.hadoop.mapreduce.v2.app.MRAppMaster. That value is hardcoded; no config key is read for it.
- Command opts – yarn.app.mapreduce.am.command-opts is used elsewhere for JVM options or extra arguments. Those are merged into the same command, but they are not used as the main class. So they end up either:
  - as JVM args (e.g. -Xmx...), or
  - as arguments passed to the main class (i.e. to MRAppMaster).
So the actual process looks like:
java [options from command-opts] org.apache.hadoop.mapreduce.v2.app.MRAppMaster [any extra args] 1>... 2>...
If you set:
-Dyarn.app.mapreduce.am.command-opts=org.apache.celeborn.mapreduce.v2.app.MRAppMasterWithCeleborn
then that string is treated as part of “options” or “extra args”. It does not replace the main class, so:
- The JVM still runs MRAppMaster as main.
- MRAppMasterWithCeleborn is at best an argument to MRAppMaster, not the entry point.
The JVM never executes MRAppMasterWithCeleborn as the main class; it always runs MRAppMaster. I wasn't able to get the method from the Celeborn doc running without this patch. The main class is fixed in code, and command-opts never controls it.
Unless I am mistaken, to actually run Celeborn’s AM, the main class in that launch command must be MRAppMasterWithCeleborn. The only way to do that with the current design is to change the code that builds the command so it takes the main class from config (e.g. yarn.app.mapreduce.am or, as proposed by @pan3793, yarn.app.mapreduce.am.class) instead of always using APPLICATION_MASTER_CLASS. That’s what this patch for YARNRunner does; command-opts alone can’t do it.
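A minimal sketch of the lookup described above, not the actual patch. A plain Map stands in for the job configuration so the snippet runs without Hadoop on the classpath. The key name yarn.app.mapreduce.am.class is the one suggested in this thread, and the class and method names here (AmCommandSketch, buildAmCommand) are made up for illustration. Log redirections and the other pieces of the real command are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AmCommandSketch {

    // Key name suggested in this thread; hypothetical until it is added to MRJobConfig.
    static final String AM_CLASS_KEY = "yarn.app.mapreduce.am.class";

    // Existing key for JVM options, as discussed above.
    static final String AM_COMMAND_OPTS_KEY = "yarn.app.mapreduce.am.command-opts";

    // Default main class, matching the value quoted in this thread.
    static final String DEFAULT_AM_CLASS =
        "org.apache.hadoop.mapreduce.v2.app.MRAppMaster";

    // Builds a simplified AM launch command: java [command-opts] <main class>.
    static List<String> buildAmCommand(Map<String, String> conf) {
        List<String> vargs = new ArrayList<>();
        vargs.add("java");

        // JVM options stay where they are today, ahead of the main class.
        String opts = conf.getOrDefault(AM_COMMAND_OPTS_KEY, "");
        if (!opts.trim().isEmpty()) {
            vargs.add(opts);
        }

        // The main class comes from config, falling back to the stock AM.
        vargs.add(conf.getOrDefault(AM_CLASS_KEY, DEFAULT_AM_CLASS));
        return vargs;
    }

    public static void main(String[] args) {
        // No override: the stock MRAppMaster is launched.
        System.out.println(String.join(" ", buildAmCommand(Map.of())));

        // Override: the configured class takes the main-class slot.
        System.out.println(String.join(" ", buildAmCommand(Map.of(
            AM_COMMAND_OPTS_KEY, "-Xmx1g",
            AM_CLASS_KEY, "org.apache.celeborn.mapreduce.v2.app.MRAppMasterWithCeleborn"))));
    }
}
```

In YARNRunner itself the same fallback would presumably go through jobConf.get(key, default), as the diff under review does; jobs that never set the key keep launching MRAppMaster.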
Thanks for any feedback.
A simple example command to reproduce and test
docker compose -f docker-compose.yml -f docker-compose.celeborn.yml exec -u hadoop namenode hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.3.jar pi 2 4
If you need access to the Docker composition to test, please let me know. Thank you.
$ export CLASSPATH=...
$ java -Xmx1g HackMain Main foo bar
@lewismc, I guess in the above command, HackMain will run as the entrypoint?
if so, this is a hack of the MR framework ... I think we should make the AM class configurable as you proposed
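The question above turns on how the java launcher splits its command line. Here is a rough model of the documented `java [options] mainclass [args...]` form, assuming every option starts with '-' and ignoring options that take a separate value (such as -cp followed by a path). The class and method names are made up for illustration; this is not launcher code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LauncherTokenSketch {

    // Index of the first token that does not start with '-', or -1 if there is none.
    // Simplification (assumption): options with a separate value, e.g. "-cp <path>",
    // are not handled here.
    static int mainClassIndex(List<String> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).startsWith("-")) {
                return i;
            }
        }
        return -1;
    }

    // The token the launcher would treat as the main class, or null if none.
    static String mainClassOf(List<String> tokens) {
        int i = mainClassIndex(tokens);
        return i < 0 ? null : tokens.get(i);
    }

    // Everything after the main class is handed to that class's main(String[]).
    static List<String> programArgsOf(List<String> tokens) {
        int i = mainClassIndex(tokens);
        return i < 0
            ? Collections.<String>emptyList()
            : tokens.subList(i + 1, tokens.size());
    }

    public static void main(String[] args) {
        // Tokens following "java" in the command from the comment above.
        List<String> tokens = Arrays.asList("-Xmx1g", "HackMain", "Main", "foo", "bar");
        System.out.println("main class: " + mainClassOf(tokens));
        System.out.println("program args: " + programArgsOf(tokens));
    }
}
```

Under this model, a bare class name placed in the options position becomes the entry point and the original main class becomes its first argument, which is the behaviour the question asks about.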
@lewismc I agree with your proposal; making the AM class configurable seems more reasonable.
@RexXiong thanks for the feedback. Can you please provide guidance on expanding this PR? Is anything needed in addition to
- changing yarn.app.mapreduce.am --> yarn.app.mapreduce.am.class, and
- changing the constant, docs, default config file, etc.? How do I do this?
Thank you
cc @RexXiong, how was MR on Celeborn used without this change?
💔 -1 overall
This message was automatically generated.
Description of PR
See HADOOP-19835
How was this patch tested?
Tested with Hadoop 3.4.3, Celeborn 0.6.2, and Nutch 1.23-SNAPSHOT. Nutch MapReduce smoke tests were run.
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?
AI Tooling
If an AI tool was used:
where <tool> is the name of the AI tool used.
https://www.apache.org/legal/generative-tooling.html