mrjob cannot be run in company with pymongo on hadoop #913
Comments
How did you solve this issue? I encountered the same problem.
You'll see a Tracking URL in the output. That's the web page for your job; if you go to that link you'll be able to see the status of your job, as well as which map/reduce tasks are failing. If you click the little number (under the Failed/Killed section) you'll be able to see the STDOUT/STDERR logs for each individual task, and also the Python exception that's causing your job to fail. Hope this helps.
Without the stderr/stdout/syslog, it is not possible to determine the cause of failure. @whzhcahzxh -- did you install pip and `pip install pymongo` (and any yum dependencies) in your bootstrap?
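For reference, a bootstrap that installs pip and pymongo on every node could look roughly like this in mrjob.conf (the package names, sudo usage, and runner section are illustrative and depend on your AMI/distribution):

```yaml
runners:
  emr:
    bootstrap:
      # Illustrative: install pip first, then pymongo, on each node
      - sudo yum install -y python-pip
      - sudo pip install pymongo
```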
Thanks for letting me know, but I can't proceed without more information, so I'm closing this ticket for now.
Hi all... I have the same problem:
my log is:
So I don't know what is wrong. Thank you very much for helping.
This is hard work. I wrote a map-reduce script that runs well on my local PC. When I try to run it on Hadoop (with `-r hadoop`), it fails with:
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /tmp/step1.root.20140606.091711.815391
writing wrapper script to /tmp/step1.root.20140606.091711.815391/setup-wrapper.sh
reading from STDIN
Copying local files into hdfs:///user/root/tmp/mrjob/step1.root.20140606.091711.815391/files/
Using Hadoop version 2.0.0
HADOOP: packageJobJar: [] [/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/hadoop-streaming.jar] /tmp/streamjob8615643898520402804.jar tmpDir=null
HADOOP: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
HADOOP: Total input paths to process : 1
HADOOP: getLocalDirs(): [/tmp/hadoop-root/mapred/local]
HADOOP: Running job: job_201405161502_0059
HADOOP: To kill this job, run:
HADOOP: /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=v-lab-110:8021 -kill job_201405161502_0059
HADOOP: Tracking URL: http://v-lab-110:50030/jobdetails.jsp?jobid=job_201405161502_0059
HADOOP: map 0% reduce 0%
HADOOP: map 100% reduce 100%
HADOOP: To kill this job, run:
HADOOP: /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=v-lab-110:8021 -kill job_201405161502_0059
HADOOP: Tracking URL: http://v-lab-110:50030/jobdetails.jsp?jobid=job_201405161502_0059
HADOOP: Job not successful. Error: NA
HADOOP: killJob...
HADOOP: Streaming Command Failed!
Job failed with return code 256: ['/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop', 'jar', '/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/hadoop-streaming.jar', '-files', 'hdfs:///user/root/tmp/mrjob/step1.root.20140606.091711.815391/files/setup-wrapper.sh#setup-wrapper.sh,hdfs:///user/root/tmp/mrjob/step1.root.20140606.091711.815391/files/step1.py#step1.py', '-archives', 'hdfs:///user/root/tmp/mrjob/step1.root.20140606.091711.815391/files/mrjob.tar.gz#mrjob.tar.gz', '-input', 'hdfs:///user/root/tmp/mrjob/step1.root.20140606.091711.815391/files/STDIN', '-output', 'hdfs:///user/root/tmp/mrjob/step1.root.20140606.091711.815391/step-output/1', '-mapper', 'sh -e setup-wrapper.sh python step1.py --step-num=0 --mapper', '-combiner', 'sh -e setup-wrapper.sh python step1.py --step-num=0 --combiner', '-reducer', 'sh -e setup-wrapper.sh python step1.py --step-num=0 --reducer']
Scanning logs for probable cause of failure
Traceback (most recent call last):
File "step1.py", line 176, in
step.run()
File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 494, in run
mr_job.execute()
File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 512, in execute
super(MRJob, self).execute()
File "/usr/local/lib/python2.7/site-packages/mrjob/launch.py", line 147, in execute
self.run_job()
File "/usr/local/lib/python2.7/site-packages/mrjob/launch.py", line 208, in run_job
runner.run()
File "/usr/local/lib/python2.7/site-packages/mrjob/runner.py", line 458, in run
self._run()
File "/usr/local/lib/python2.7/site-packages/mrjob/hadoop.py", line 239, in _run
self._run_job_in_hadoop()
File "/usr/local/lib/python2.7/site-packages/mrjob/hadoop.py", line 358, in _run_job_in_hadoop
raise CalledProcessError(returncode, step_args)
subprocess.CalledProcessError: Command '['/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop', 'jar', '/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/hadoop-streaming.jar', '-files', 'hdfs:///user/root/tmp/mrjob/step1.root.20140606.091711.815391/files/setup-wrapper.sh#setup-wrapper.sh,hdfs:///user/root/tmp/mrjob/step1.root.20140606.091711.815391/files/step1.py#step1.py', '-archives', 'hdfs:///user/root/tmp/mrjob/step1.root.20140606.091711.815391/files/mrjob.tar.gz#mrjob.tar.gz', '-input', 'hdfs:///user/root/tmp/mrjob/step1.root.20140606.091711.815391/files/STDIN', '-output', 'hdfs:///user/root/tmp/mrjob/step1.root.20140606.091711.815391/step-output/1', '-mapper', 'sh -e setup-wrapper.sh python step1.py --step-num=0 --mapper', '-combiner', 'sh -e setup-wrapper.sh python step1.py --step-num=0 --combiner', '-reducer', 'sh -e setup-wrapper.sh python step1.py --step-num=0 --reducer']' returned non-zero exit status 256
I tried every combination and found that the error occurs whenever I import pymongo in step1.py. I think it is a bug.