refactor generate_optional_value function & add Terasort testcase #312
carsonwang merged 5 commits into Intel-bigdata:6.0 from …
Conversation
refactor generate_optional_value function to probe the JAVA_OPTS in a right way, gain a clearer hint & comments, wipe out support for CDH4 and MR1, and grant more readability.
```python
# CDH release
elif HibenchConf['hibench.hadoop.release'].startswith('cdh'):
    HibenchConf["hibench.hadoop.examples.test.jar"] = OneAndOnlyOneFile(
        HibenchConf['hibench.hadoop.home'] + "/share/hadoop/mapreduce2/hadoop-mapreduce-client-jobclient*-tests.jar")
```
Similar to the example jar, is there another path for CDH here?
```diff
  # set hibench.sleep.job.jar
  if not HibenchConf.get('hibench.sleep.job.jar', ''):
-     if HibenchConf['hibench.hadoop.release'] == 'apache' and HibenchConf["hibench.hadoop.version"] == "hadoop1":
+     if HibenchConf['hibench.hadoop.release'] == 'apache':
```
According to the original condition, this only applies to hadoop1. For hadoop2, the path is different. We should remove this.
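A hedged sketch of the guard the reviewer is referring to (names mirror the diff, but this is an illustration, not the PR's actual code): the original condition checked both the release and the version, so the sleep-job jar path applied to hadoop1 only.

```python
# Illustration only: the original guard required BOTH checks, so the
# path below it was hadoop1-specific; dropping the version check would
# wrongly apply it to hadoop2 as well.
def needs_hadoop1_sleep_jar(conf):
    return (conf.get('hibench.hadoop.release') == 'apache'
            and conf.get('hibench.hadoop.version') == 'hadoop1')
```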
```python
# CDH release
elif HibenchConf["hibench.hadoop.release"].startswith("cdh"):
    HibenchConf["hibench.hadoop.configure.dir"] = join(HibenchConf["hibench.hadoop.home"], "etc", "hadoop")
    HibenchConfRef["hibench.hadoop.configure.dir"] = "Inferred by: & 'hibench.hadoop.release'"
```
Is there no difference between apache, hdp, and cdh anymore? Can we combine these and remove the if/else?
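A hedged sketch of that suggestion (the function name is hypothetical; `HibenchConf`/`HibenchConfRef` stand in for the PR's dicts): if the configure dir is identical for apache, hdp, and cdh, the per-release branches collapse into a single assignment.

```python
import os.path

# Illustration of collapsing the per-release if/else: one assignment
# covers apache, hdp, and cdh when the relative path is the same.
def infer_configure_dir(HibenchConf, HibenchConfRef):
    HibenchConf["hibench.hadoop.configure.dir"] = os.path.join(
        HibenchConf["hibench.hadoop.home"], "etc", "hadoop")
    HibenchConfRef["hibench.hadoop.configure.dir"] = (
        "Inferred by: 'hibench.hadoop.home'")
```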
```diff
  # determine running mode according to spark master configuration
- if not (HibenchConf.get("hibench.masters.hostnames", "") or HibenchConf.get("hibench.slaves.hostnames", "")):  # no pre-defined hostnames, let's probe
+ if not (HibenchConf.get("hibench.masters.hostnames", "") or HibenchConf.get("hibench.slaves.hostnames",
+                                                                             "")):  # no pre-defined hostnames, let's probe
```
Is this the right Python style for us to follow?
Probably not; the two lines have the same length, yet they follow different styles.
The newest version of these two lines has many line breaks. It looks odd to new Python developers such as me, but it conforms to PEP 8.
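For illustration, one PEP 8-friendly way to wrap a long condition is to extract it into a helper and rely on implicit continuation inside parentheses, avoiding both the very long line and the awkward mid-call break (the function name here is a stand-in, not the PR's code):

```python
# A PEP 8-compatible wrapping of the same check: continuation happens
# inside parentheses, with the 'or' leading the wrapped line.
def has_predefined_hostnames(conf):
    return bool(
        conf.get("hibench.masters.hostnames", "")
        or conf.get("hibench.slaves.hostnames", "")
    )
```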
```python
probe_java_opts()
# test_succeed()


def test_succeed():
```
We need to write some unit tests later to cover this.
```diff
  log(spark_master, HibenchConf['hibench.masters.hostnames'])
  with closing(urllib.urlopen('http://%s:8080' % HibenchConf['hibench.masters.hostnames'])) as page:
-     worker_hostnames=[re.findall("http:\/\/([a-zA-Z\-\._0-9]+):8081", x)[0] for x in page.readlines() if "8081" in x and "worker" in x]
+     worker_hostnames = [re.findall("http:\/\/([a-zA-Z\-\._0-9]+):8081", x)[0] for x in page.readlines()
```
We need to fix the hard-coded port number later.
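A hedged sketch of that fix: `worker_port` is a hypothetical parameter (not in the PR), and `parse_worker_hostnames` mirrors the list comprehension in the diff above so the port appears in exactly one place.

```python
import re

# Hypothetical refactor: the worker UI port (8081 in the original code)
# becomes a parameter instead of being hard-coded in three places.
def parse_worker_hostnames(lines, worker_port=8081):
    pattern = re.compile(r"http://([a-zA-Z\-._0-9]+):%d" % worker_port)
    return [pattern.findall(x)[0]
            for x in lines
            if str(worker_port) in x and "worker" in x]
```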
Change probe_java_opts function to deal with any weird xml style, tune the codes and remove some unuseful codes
Add bin/conf for terasort, already finished test for Hadoop, Spark Standalone and Spark on yarn
```python
bufsize=0,  # default value of 0 (unbuffered) is best
shell=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE
```
The style here seems inconsistent with the rest. Is a space needed before and after `=`? Is this caused by the auto-formatter?
Yes, it was modified by autopep8; you can install it with `pip install autopep8`.
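As a side note on the spacing question: PEP 8 itself says not to put spaces around `=` when it marks a keyword argument, so autopep8 leaves calls like the `Popen` in the diff above untouched. A minimal runnable sketch (the `echo` command is just a placeholder):

```python
import subprocess

# PEP 8: no spaces around '=' for keyword arguments, which is why
# autopep8 keeps `shell=True` rather than `shell = True`.
proc = subprocess.Popen(
    "echo hello",
    bufsize=0,
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, _ = proc.communicate()
```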
```shell
SIZE=`dir_size $INPUT_HDFS`
START_TIME=`timestamp`
run-hadoop-job ${HADOOP_EXAMPLES_JAR} terasort \
    -D ${REDUCER_CONFIG_NAME}=${NUM_REDS} \
```
Can you update this to `-D mapreduce.job.reduces=${NUM_REDS}`? REDUCER_CONFIG_NAME will be removed later because mapreduce.job.reduces is its only value.
```shell
START_TIME=`timestamp`
run-hadoop-job ${HADOOP_EXAMPLES_JAR} teragen \
    -D ${MAP_CONFIG_NAME}=${NUM_MAPS} \
    -D ${REDUCER_CONFIG_NAME}=${NUM_REDS} \
```
Do not use MAP_CONFIG_NAME and REDUCER_CONFIG_NAME here either.
|
Thanks @gczsjdy for the work!
refactor generate_optional_value function & add Terasort testcase (Intel-bigdata#312)

* refactor generate_optional_value function to probe the JAVA_OPTS in a right way, gain a clearer hint & comments, wipe out support for CDH4 and MR1, and grant more readability.
* Change probe_java_opts function to deal with any weird xml style, tune the codes and remove some unuseful codes
* Use autopep8 to standardize the code
* Add bin/conf for terasort, already finished test for Hadoop, Spark Standalone and Spark on yarn
* Use new config name instead of the old