--jobconf mapred.task.timeout does not appear to change timeout #257
Comments
I'm not sure what's wrong, but I think we've seen a similar issue at Yelp. I'll post here if/when we find out anything more.
Are you also sorting logs by IP to have 1 IP per line? :-p I'll let you know how redistributing the key space works.
Haha, no, but that sounds like something that would produce very long lines for a handful of heavily trafficked IPs. :)
I think I'm just trying to soak up way too much data in my [i for i in values] list-comprehension call for one specific key. I'll switch to a (slower but) safer way of doing that. If you're curious, here are some errors.

Redistributing keys has given a new error. I also found an exit status for the reduce task:

2011-11-08 21:20:32,987 INFO org.apache.hadoop.mapred.TaskInProgress (IPC Server handler 19 on 9001): Error from attempt_201111081851_0001_r_000060_2: java.io.IOException: Task process exit with nonzero status of 137.

Waiting 5.0s for S3 eventual consistency

The 1st try on this mapper finishes successfully. One issue that I don't know how to address: I'm not getting my status messages or counters out, and I don't know where to find them.

Also, the log files under "steps" show the following strange behavior:

2011-11-08 20:15:29,332 INFO org.apache.hadoop.streaming.StreamJob (main): map 100% reduce 98%

And the offending node appears to have run out of memory:

2011-11-08 20:15:07,820 WARN org.apache.hadoop.dfs.DataNode (DataNode: [/mnt/var/lib/hadoop/dfs]): java.io.IOException: Cannot run program "bash": java.io.IOException: error=12, Cannot allocate memory
2011-11-08 20:15:10,260 WARN org.apache.hadoop.dfs.DataNode (DataNode: [/mnt/var/lib/hadoop/dfs]): java.io.IOException: Cannot run program "du": java.io.IOException: error=12, Cannot allocate memory
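Since the out-of-memory pattern above comes from materializing every value for one hot key at once, the safer approach hinted at can be sketched like this (a hypothetical aggregate; the original job's reducer logic isn't shown here):

```python
# Sketch: consume a reducer's values iterator incrementally instead of
# building [i for i in values], which buffers every value for a hot key
# in memory at once.

def reduce_values(values):
    """Stream over an iterator of numbers, keeping O(1) memory."""
    count = 0
    total = 0
    for v in values:  # never materializes the full value list
        count += 1
        total += v
    return count, total

# Works the same on a generator as on a list, but without buffering:
count, total = reduce_values(x for x in range(5))
```

The same idea applies to any per-key aggregate: fold each value into a small running state as it arrives, rather than collecting them all first.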
Try the … You'll have to use a key-pair file if you aren't already; see http://packages.python.org/mrjob/amazon.html#ssh-tunneling.
Cool, I didn't know that about the "development" branch.
You might try using the configure-hadoop bootstrap action to have the option set at bootstrap time rather than at job run time.
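A sketch of what that suggestion might look like in mrjob.conf. The action path and the `-m` flag follow the old EMR configure-hadoop bootstrap action, so treat the exact names as assumptions to verify against your AMI version:

```yaml
# Hypothetical mrjob.conf fragment: set the timeout when the cluster
# boots, so it lands in mapred-site.xml rather than in per-job arguments.
runners:
  emr:
    bootstrap_actions:
      - s3://elasticmapreduce/bootstrap-actions/configure-hadoop -m mapred.task.timeout=1800000
```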
I'm pretty sure you have to do as I suggested for this option to work. Dave, please close if you agree, or someone provide evidence to the contrary.
No, we've definitely made this work in the past with just …
It definitely works with …
Huh. Maybe it only worked with Hadoop 0.18. That's interesting. Is this something we want to add to the documentation, or should we just close this issue?
Well, first of all, we need to fix this section: http://packages.python.org/mrjob/writing-and-running.html#increasing-the-task-timeout And then we can expand this section: http://packages.python.org/mrjob/writing-and-running.html#setting-parameters-in-mapred-site-xml
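For that expanded docs section, the config-file route could be illustrated with something like the following. This is a hedged sketch: the `jobconf` runner option exists in mrjob, but check the docs for your version before relying on the exact layout:

```yaml
# Hypothetical mrjob.conf fragment: set jobconf per runner so every job
# gets the longer timeout without remembering --jobconf on the command
# line each time.
runners:
  emr:
    jobconf:
      mapred.task.timeout: 1800000
```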
Ah, right. Cool, changing this to a "Docs" issue.
Hi guys,
I've been running a decently large job that I'm having trouble completing because one of the reducers times out on the file-upload part after it completes. I tried adding --jobconf mapred.task.timeout=1800000 to the command line, but the job appears to have terminated after 10 minutes of upload time anyway. I'm not sure why the upload should be having problems, since the file doesn't seem outrageously large, though it is significantly larger than the outputs of most of the other reducers (the key space is not well balanced; I'm going to try fixing this, but it shouldn't matter for this bug). In any case, it looks like mrjob is timing out the task in the upload state, and setting mapred.task.timeout to a larger number isn't being honored.
I'm on development-v0.2.x. As far as other flags go, I'm using

bootstrap_actions: &boots

and

base_tmp_dir: /mnt/tmp/
The bug was still happening without the memory-intensive bootstrap, though I didn't try changing mapred.task.timeout without it.
Some log file stuff:
2011-11-08 02:03:40,421 INFO org.apache.hadoop.mapred.Merger (main): Merging 9 sorted segments
2011-11-08 02:03:41,284 INFO org.apache.hadoop.mapred.Merger (main): Down to the last merge-pass, with 9 segments left of total size: 1648153512 bytes
2011-11-08 02:03:42,535 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem (main): Creating new file 's3://path/to/output/part-00050' in S3
2011-11-08 02:03:42,537 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem (main): Outputstream for key 'path/to/output/part-00050' writing to tempfile '/mnt/var/lib/hadoop/s3/output-5430952274196003800.tmp'
2011-11-08 02:03:42,578 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=1/0/0 in:0=1/92 [rec/s] out:0=0/92 [rec/s]
2011-11-08 02:03:42,579 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=10/0/0 in:0=10/92 [rec/s] out:0=0/92 [rec/s]
2011-11-08 02:03:42,579 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=100/0/0 in:1=100/92 [rec/s] out:0=0/92 [rec/s]
...
2011-11-08 02:11:14,944 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=9700000/210394/0 in:17798=9700000/545 [rec/s] out:386=210394/545 [rec/s]
2011-11-08 02:11:19,745 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=9800000/210394/0 in:17850=9800000/549 [rec/s] out:383=210394/549 [rec/s]
2011-11-08 02:11:24,922 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=9900000/210394/0 in:17870=9900000/554 [rec/s] out:379=210394/554 [rec/s]
2011-11-08 02:11:32,005 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=10000000/210394/0 in:17793=10000000/562 [rec/s] out:374=210394/562 [rec/s]
End Of File
Anything else I can look at?