Encountered problems with Hibench and question about concurrency #77
Comments
Thanks for your feedback! For Hive concurrency mode, you need to configure something like [...]. Also, could you file your fixes for HiBench as pull requests? It would be great to see more contributors and a better HiBench. We will investigate your other issues a little later, since the Chinese New Year is on the way and we are going on holiday. Happy New Year, and thank you again!
Hello again! Happy New (or maybe Goat) Year! I am coming back now since I didn't get any answer on the other issues, and to let you know how I fixed the issue with hivebench running in parallel. Actually, making hivebench run in parallel was a little more difficult than I expected, so I am going to copy the links I found that helped me fix it. FYI, I have a remote metastore database and a local metastore server.
Problems I encountered:
I hope this will help others too! I am waiting for your answer about running the nutchindexing benchmark in parallel.
Sorry for leaving this for so long; I was working on something else these days. For the Mahout versions, as far as I know we are using the same version. You can even ignore the Mahout that HiBench provides and set your own MAHOUT_HOME to benchmark any compatible Mahout, unless it doesn't support the arguments we are using (we didn't test all Mahout versions, but I think most of them would work). For the nutchindexing problem, it may result from an unclean config. We switch between different configurations according to your Hadoop deployment, and this could cause some problems. For dfsioe, it is a good catch. Maybe we need to handle the case when the user gives us an empty configuration. If I missed anything, feel free to let me know. You are really helping us a lot and we do appreciate everything you did. Again, we'd like you to file your fixes as pull requests, so that we can review them in detail and hopefully merge them into trunk. It would be great to see more contributors and a better HiBench. I'll file some bugs separately for the issues we discussed here. Thanks a lot!
Oh, and for the nutchindexing temp file, can you specify which temp file we are using?
Hello,
I've been using Hadoop and HiBench for 2.5 months and have run into some problems along the way. Now everything looks OK and all the benchmarks run, BUT I still have problems with nutchindexing and hivebench when I run them more than once in parallel (concurrently). That's why I need your valuable help!
This is my bin/hibench-config.sh:
export JAVA_HOME=/home/hduser/jdk1.7.0_51
export HADOOP_HOME=/home/hduser/hadoop2.5.1
export HADOOP_EXECUTABLE=/home/hduser/hadoop2.5.1/bin/hadoop
export HADOOP_CONF_DIR=/home/hduser/hadoop2.5.1/etc/hadoop
export HADOOP_EXAMPLES_JAR=/home/hduser/hadoop2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar
export MAPRED_EXECUTABLE=/home/hduser/hadoop2.5.1/bin/mapred
# Set the variable below only in YARN mode
export HADOOP_JOBCLIENT_TESTS_JAR=/home/hduser/hadoop2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.1-tests.jar
export HADOOP_MAPRED_HOME=/home/hduser/hadoop2.5.1
export HADOOP_VERSION=hadoop2 # set it to hadoop1 to enable MR1, hadoop2 to enable MR2
Here are the possible bugs I noticed and (almost) fixed in order to run all the benchmarks:
MAP_JAVA_OPTS=`cat $HADOOP_CONF_DIR/mapred-site.xml | grep "mapreduce.map.java.opts" | awk -F\< '{print $5}' | awk -F\> '{print $NF}'`
RED_JAVA_OPTS=`cat $HADOOP_CONF_DIR/mapred-site.xml | grep "mapreduce.reduce.java.opts" | awk -F\< '{print $5}' | awk -F\> '{print $NF}'`
I don't know if I did something wrong, but when I fixed that, dfsioe worked! So, how else should this be fixed?
(Something that is not mentioned at https://github.com/intel-hadoop/HiBench, and that wasn't required to run Hadoop, is setting memory limits in mapred-site.xml and yarn-site.xml.)
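The grep/awk pipeline above only finds a value when the name and the value of the property sit on the same line of mapred-site.xml, and it yields an empty string otherwise. Here is a minimal sketch of a more layout-tolerant extraction with a fallback for the empty case. The helper name `get_hadoop_prop` and the fallback `-Xmx1024m` are my own assumptions for illustration, not something HiBench ships, and the sketch does not handle properties hidden inside XML comments.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HiBench): print the <value> that follows
# a given <name> in a Hadoop XML config file, regardless of line breaks.
get_hadoop_prop() {
  local file="$1" name="$2"
  # Escape dots so they match literally in the sed pattern.
  local pattern=${name//./\\.}
  # Flatten the file to a single line, then capture the value after the name.
  tr '\n' ' ' < "$file" |
    sed -n "s|.*<name>[[:space:]]*${pattern}[[:space:]]*</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}

# Usage sketch: the fallback path and the default heap size are assumptions.
conf="${HADOOP_CONF_DIR:-/etc/hadoop/conf}/mapred-site.xml"
MAP_JAVA_OPTS=""
if [ -r "$conf" ]; then
  MAP_JAVA_OPTS=$(get_hadoop_prop "$conf" mapreduce.map.java.opts)
fi
# Handle the empty-configuration case explicitly instead of passing "" along.
MAP_JAVA_OPTS=${MAP_JAVA_OPTS:--Xmx1024m}
echo "MAP_JAVA_OPTS=$MAP_JAVA_OPTS"
```

The same call with `mapreduce.reduce.java.opts` would cover RED_JAVA_OPTS.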
About the concurrent run:
I have noticed that nutchindexing uses a temp file which is erased at the end of each run, so I think this is one reason nutchindexing can't run more than once concurrently. Also, sometimes the benchmark "breaks" and can't run properly again, so I delete everything in common/hibench/nutchindexing/* and run mvn process-sources again. Is there any solution for this, please?
FATAL indexer.Indexer: Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:76)
at org.apache.nutch.indexer.Indexer.run(Indexer.java:97)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.Indexer.main(Indexer.java:106)
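If a shared temp file really is what makes concurrent runs collide, one generic way around it is to give every run its own scratch directory. The sketch below only shows that pattern; the function name `make_run_dir` and the `nutchindexing-run` prefix are made up for illustration, and I have not confirmed which temp file the nutchindexing scripts actually use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a private scratch directory for one run.
# mktemp guarantees a unique name, so parallel runs cannot share it.
make_run_dir() {
  local base="${1:-${TMPDIR:-/tmp}}"
  mktemp -d "$base/nutchindexing-run.XXXXXX"
}

RUN_DIR=$(make_run_dir)
# Remove only this run's directory on exit, never another run's files.
trap 'rm -rf "$RUN_DIR"' EXIT
echo "scratch directory for this run: $RUN_DIR"
```

Each concurrent invocation would then write its intermediate files under its own `$RUN_DIR` instead of a fixed path.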
About hivebench: I ran the suggested command "hive --service metastore", but it doesn't give any different results than without running it. How can hivebench run concurrently, as you mention in your paper?
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:444)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:626)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:570)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1453)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:63)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:73)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2664)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2683)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:425)
... 7 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1451)
... 12 more
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: --
java.sql.SQLException: Failed to start database 'metastore_db' with class loader sun.misc.Launcher$AppClassLoader@5947e54e, see the next exception for details.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
etc.
I am looking forward to your answer!
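The JDBC URL in the trace (jdbc:derby:;databaseName=metastore_db) points at an embedded Derby database, which only one JVM can open at a time, so a second Hive client started in parallel fails at exactly this point. A small sketch for spotting that situation from the URL alone follows; the function name `is_embedded_derby` is hypothetical, and it relies only on the fact that Derby's network client URLs start with jdbc:derby:// while embedded ones do not.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if a JDBC URL refers to an embedded Derby
# database (single-process access), fail for anything else.
is_embedded_derby() {
  case "$1" in
    jdbc:derby://*) return 1 ;;  # Derby network server: shareable
    jdbc:derby:*)   return 0 ;;  # embedded Derby: one JVM at a time
    *)              return 1 ;;  # some other database
  esac
}

url='jdbc:derby:;databaseName=metastore_db;create=true'
if is_embedded_derby "$url"; then
  echo "embedded Derby metastore: concurrent Hive clients will conflict"
else
  echo "metastore database can be shared between clients"
fi
```

When the check succeeds, the usual way out is to point every Hive client at a shared metastore (a remote database, or a metastore service that clients reach through hive.metastore.uris) rather than letting each client open metastore_db itself.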
Thanks.