Skip to content
This repository was archived by the owner on Dec 15, 2025. It is now read-only.

refactor generate_optional_value function & add Terasort testcase#312

Merged
carsonwang merged 5 commits intoIntel-bigdata:6.0from
gcz2022:6.0
Sep 23, 2016
Merged

refactor generate_optional_value function & add Terasort testcase#312
carsonwang merged 5 commits intoIntel-bigdata:6.0from
gcz2022:6.0

Conversation

@gcz2022
Copy link
Copy Markdown
Contributor

@gcz2022 gcz2022 commented Sep 20, 2016

refactor generate_optional_value function to probe the JAVA_OPTS in a right way, gain a clearer hint & comments, wipe out support for CDH4 and MR1, and grant more readability

… right way, gain a clearer hint & comments, wipe out support for CDH4 and MR1, and grant more readability.
@gcz2022 gcz2022 changed the title refactor generate_optional_value function to probe the JAVA_OPTS in a… refactor generate_optional_value function Sep 20, 2016
Comment thread bin/functions/load-config.py Outdated
# CDH release
elif HibenchConf['hibench.hadoop.release'].startswith('cdh'):
HibenchConf["hibench.hadoop.examples.test.jar"] = OneAndOnlyOneFile(HibenchConf[
'hibench.hadoop.home'] + "/share/hadoop/mapreduce2/hadoop-mapreduce-client-jobclient*-tests.jar")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the example jar, is there another path for CDH here?

Comment thread bin/functions/load-config.py Outdated
# set hibench.sleep.job.jar
if not HibenchConf.get('hibench.sleep.job.jar', ''):
if HibenchConf['hibench.hadoop.release'] == 'apache' and HibenchConf["hibench.hadoop.version"] == "hadoop1":
if HibenchConf['hibench.hadoop.release'] == 'apache':
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the original condition, this only applies to hadoop1. For hadoop2, the path is different. We should remove this.

Comment thread bin/functions/load-config.py Outdated
# CDH release
elif HibenchConf["hibench.hadoop.release"].startswith("cdh"):
HibenchConf["hibench.hadoop.configure.dir"] = join(HibenchConf["hibench.hadoop.home"], "etc", "hadoop")
HibenchConfRef["hibench.hadoop.configure.dir"] = "Inferred by: & 'hibench.hadoop.release'"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no difference for apache, hdp, cdh anymore? Can we combine these and removing the if else.

Comment thread bin/functions/load-config.py Outdated
# determine running mode according to spark master configuration
if not (HibenchConf.get("hibench.masters.hostnames", "") or HibenchConf.get("hibench.slaves.hostnames", "")): # no pre-defined hostnames, let's probe
if not (HibenchConf.get("hibench.masters.hostnames", "") or HibenchConf.get("hibench.slaves.hostnames",
"")): # no pre-defined hostnames, let's probe
Copy link
Copy Markdown
Collaborator

@carsonwang carsonwang Sep 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right style for python we are going to follow?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, the two rows have the same length while they follow different styles.
For the newest code of these two lines, there are many line feeds. It seems weird for new python developers such as me, but it accords with pep8

probe_java_opts()
#test_succeed()

def test_succeed():
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need write some unit tests later to test this.

Comment thread bin/functions/load-config.py Outdated
log(spark_master, HibenchConf['hibench.masters.hostnames'])
with closing(urllib.urlopen('http://%s:8080' % HibenchConf['hibench.masters.hostnames'])) as page:
worker_hostnames=[re.findall("http:\/\/([a-zA-Z\-\._0-9]+):8081", x)[0] for x in page.readlines() if "8081" in x and "worker" in x]
worker_hostnames = [re.findall("http:\/\/([a-zA-Z\-\._0-9]+):8081", x)[0] for x in page.readlines()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need fix the hard coded port number later.

@gcz2022 gcz2022 changed the title refactor generate_optional_value function refactor generate_optional_value function & add Terasort bin/conf Sep 22, 2016
@gcz2022 gcz2022 changed the title refactor generate_optional_value function & add Terasort bin/conf refactor generate_optional_value function & add Terasort testcase Sep 23, 2016
bufsize=0, # default value of 0 (unbuffered) is best
shell=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE
Copy link
Copy Markdown
Collaborator

@carsonwang carsonwang Sep 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the style here is not consistent with others. A space is need before and after =? Is this caused by the auto formatter?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's modified by autopep8, you can install it by pip install autopep8

SIZE=`dir_size $INPUT_HDFS`
START_TIME=`timestamp`
run-hadoop-job ${HADOOP_EXAMPLES_JAR} terasort \
-D ${REDUCER_CONFIG_NAME}=${NUM_REDS} \
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update this to -D mapreduce.job.reduces=${NUM_REDS}? REDUCER_CONFIG_NAME will be removed later because mapreduce.job.reduces is the only value of it.

START_TIME=`timestamp`
run-hadoop-job ${HADOOP_EXAMPLES_JAR} teragen \
-D ${MAP_CONFIG_NAME}=${NUM_MAPS} \
-D ${REDUCER_CONFIG_NAME}=${NUM_REDS} \
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use MAP_CONFIG_NAME and REDUCER_CONFIG_NAME here as well

@carsonwang
Copy link
Copy Markdown
Collaborator

Thanks @gczsjdy for the work!

@carsonwang carsonwang merged commit d8bd401 into Intel-bigdata:6.0 Sep 23, 2016
carsonwang pushed a commit to carsonwang/HiBench that referenced this pull request Nov 7, 2016
…tel-bigdata#312)

* refactor generate_optional_value function to probe the JAVA_OPTS in a right way, gain a clearer hint & comments, wipe out support for CDH4 and MR1, and grant more readability.

* Change probe_java_opts function to deal with any weird xml style, tune the codes and remove some unuseful codes

* Use autopep8 to standardize the code

* Add bin/conf for terasort, already finished test for Hadoop, Spark Standalone and Spark on yarn

* Use new config name  instead of the old
carsonwang pushed a commit that referenced this pull request Nov 8, 2016
* refactor generate_optional_value function to probe the JAVA_OPTS in a right way, gain a clearer hint & comments, wipe out support for CDH4 and MR1, and grant more readability.

* Change probe_java_opts function to deal with any weird xml style, tune the codes and remove some unuseful codes

* Use autopep8 to standardize the code

* Add bin/conf for terasort, already finished test for Hadoop, Spark Standalone and Spark on yarn

* Use new config name  instead of the old
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants