Permission Denied error triggered by Python subprocess #40

Closed
tylere opened this Issue Nov 16, 2011 · 4 comments

tylere commented Nov 16, 2011

When trying to get the hadoopy wordcount test to work, I run into an "OSError: [Errno 13] Permission denied" error in the Python subprocess. Since the subprocess is launched from hadoopy/_main.c, my usual Python debugging workflow (pdb) doesn't work.

Any suggestions as to how to find out what file/directory hadoopy is trying to access (and as what user)?
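
One generic trick, sketched here under the assumption that hadoopy ultimately goes through Python's subprocess module (the traceback below shows it calling subprocess.call): monkey-patch subprocess.Popen at the top of wc.py, before hadoopy.run(), so the exact argv, working directory, and user are written to stderr just before the exec that fails. The shim name _logging_popen is mine, not anything hadoopy provides.

import os
import pwd
import sys
import subprocess

_real_popen = subprocess.Popen

def _logging_popen(*args, **kwargs):
    # Log the command line, cwd, and user before handing off to the real Popen.
    argv = args[0] if args else kwargs.get('args')
    sys.stderr.write('Popen argv=%r cwd=%r user=%s\n' % (
        argv, kwargs.get('cwd') or os.getcwd(),
        pwd.getpwuid(os.getuid()).pw_name))
    return _real_popen(*args, **kwargs)

subprocess.Popen = _logging_popen

At the OS level, running the task under strace -f -e trace=execve would likewise show the exact path that execve rejects with EACCES.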

I am able to run the Cloudera example grep job (standalone mode) and the hadoopy wordcount command line test, but the hadoopy python wordcount test fails.

Environment:

  • Ubuntu 10.04
  • Cloudera Hadoop Distribution CDH3 (stand-alone mode)

The Hadoop output containing the Permission denied error is listed below.

rm playground/out/ -fr; python -c "import hadoopy; out = hadoopy.launch('playground/wc-input-alice.txt', 'playground/out/', 'wc.py')"
/\----------Hadoop Output----------/\
hadoopy: Running[hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u2.jar -output playground/out/ -input playground/wc-input-alice.txt -mapper "python wc.py pipe map" -reducer "python wc.py pipe reduce" -file wc.py -jobconf mapred.job.name=wc -io typedbytes -outputformat org.apache.hadoop.mapred.SequenceFileOutputFormat -inputformat AutoInputFormat]
11/11/16 11:41:08 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
packageJobJar: [wc.py] [] /tmp/streamjob2130356558435301069.jar tmpDir=null
11/11/16 11:41:08 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/11/16 11:41:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/11/16 11:41:09 INFO mapred.FileInputFormat: Total input paths to process : 1
11/11/16 11:41:09 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir.
11/11/16 11:41:09 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /app/hadoop/tmp/mapred/staging/taericks-1131214868/.staging/job_local_0001/.job.split.crc <- /app/hadoop/tmp/mapred/local/localRunner/.job.split.crc
11/11/16 11:41:09 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /app/hadoop/tmp/mapred/staging/taericks-1131214868/.staging/job_local_0001/job.split <- /app/hadoop/tmp/mapred/local/localRunner/job.split
11/11/16 11:41:09 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /app/hadoop/tmp/mapred/staging/taericks-1131214868/.staging/job_local_0001/job.jar <- /app/hadoop/tmp/mapred/local/localRunner/job.jar
11/11/16 11:41:09 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /app/hadoop/tmp/mapred/staging/taericks-1131214868/.staging/job_local_0001/.job.xml.crc <- /app/hadoop/tmp/mapred/local/localRunner/.job.xml.crc
11/11/16 11:41:09 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /app/hadoop/tmp/mapred/staging/taericks-1131214868/.staging/job_local_0001/.job.splitmetainfo.crc <- /app/hadoop/tmp/mapred/local/localRunner/.job.splitmetainfo.crc
11/11/16 11:41:09 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /app/hadoop/tmp/mapred/staging/taericks-1131214868/.staging/job_local_0001/job.xml <- /app/hadoop/tmp/mapred/local/localRunner/job.xml
11/11/16 11:41:09 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /app/hadoop/tmp/mapred/staging/taericks-1131214868/.staging/job_local_0001/.job.jar.crc <- /app/hadoop/tmp/mapred/local/localRunner/.job.jar.crc
11/11/16 11:41:09 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /app/hadoop/tmp/mapred/staging/taericks-1131214868/.staging/job_local_0001/job.splitmetainfo <- /app/hadoop/tmp/mapred/local/localRunner/job.splitmetainfo
11/11/16 11:41:09 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
11/11/16 11:41:09 INFO streaming.StreamJob: Running job: job_local_0001
11/11/16 11:41:09 INFO streaming.StreamJob: Job running in-process (local Hadoop)
11/11/16 11:41:09 WARN snappy.LoadSnappy: Snappy native library is available
11/11/16 11:41:09 INFO snappy.LoadSnappy: Snappy native library loaded
11/11/16 11:41:09 INFO mapred.MapTask: numReduceTasks: 1
11/11/16 11:41:09 INFO mapred.MapTask: io.sort.mb = 100
11/11/16 11:41:09 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/16 11:41:09 INFO mapred.MapTask: record buffer = 262144/327680
11/11/16 11:41:09 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
11/11/16 11:41:09 INFO streaming.PipeMapRed: PipeMapRed exec [/home/terickson/.virtualenvs/hadoopy/bin/python, wc.py, pipe, map]
11/11/16 11:41:09 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
11/11/16 11:41:09 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
11/11/16 11:41:09 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
Traceback (most recent call last):
  File "wc.py", line 15, in <module>
    hadoopy.run(mapper, reducer, doc=__doc__)
  File "_main.pyx", line 445, in _main.run (hadoopy/_main.c:5554)
  File "/usr/lib/python2.6/subprocess.py", line 480, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.6/subprocess.py", line 633, in __init__
    errread, errwrite)
  File "/usr/lib/python2.6/subprocess.py", line 1138, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied
11/11/16 11:41:09 INFO streaming.PipeMapRed: MRErrorThread done
11/11/16 11:41:09 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
11/11/16 11:41:09 INFO streaming.PipeMapRed: log:null
R/W/S=2274/0/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=taericks
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |null|
Date: Wed Nov 16 11:41:09 PST 2011
java.io.IOException: Broken pipe
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:297)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.hadoop.io.WritableUtils.writeString(WritableUtils.java:100)
    at org.apache.hadoop.typedbytes.TypedBytesOutput.writeString(TypedBytesOutput.java:223)
    at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.writeText(TypedBytesWritableOutput.java:182)
    at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.write(TypedBytesWritableOutput.java:126)
    at org.apache.hadoop.streaming.io.TypedBytesInputWriter.writeTypedBytes(TypedBytesInputWriter.java:57)
    at org.apache.hadoop.streaming.io.TypedBytesInputWriter.writeValue(TypedBytesInputWriter.java:52)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:110)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)


11/11/16 11:41:09 WARN streaming.PipeMapRed: java.io.IOException: Broken pipe
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:297)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.DataOutputStream.flush(DataOutputStream.java:123)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:569)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:125)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)

11/11/16 11:41:09 INFO streaming.PipeMapRed: mapRedFinished
11/11/16 11:41:09 WARN streaming.PipeMapRed: java.io.IOException: Stream Closed
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:297)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.DataOutputStream.flush(DataOutputStream.java:123)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:569)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)

11/11/16 11:41:09 INFO streaming.PipeMapRed: mapRedFinished
11/11/16 11:41:09 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: log:null
R/W/S=2274/0/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=taericks
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |null|
Date: Wed Nov 16 11:41:09 PST 2011
java.io.IOException: Broken pipe
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:297)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.hadoop.io.WritableUtils.writeString(WritableUtils.java:100)
    at org.apache.hadoop.typedbytes.TypedBytesOutput.writeString(TypedBytesOutput.java:223)
    at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.writeText(TypedBytesWritableOutput.java:182)
    at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.write(TypedBytesWritableOutput.java:126)
    at org.apache.hadoop.streaming.io.TypedBytesInputWriter.writeTypedBytes(TypedBytesInputWriter.java:57)
    at org.apache.hadoop.streaming.io.TypedBytesInputWriter.writeValue(TypedBytesInputWriter.java:52)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:110)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)


    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:126)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
11/11/16 11:41:10 INFO streaming.StreamJob:  map 0%  reduce 0%
11/11/16 11:41:10 INFO streaming.StreamJob: Job running in-process (local Hadoop)
11/11/16 11:41:10 ERROR streaming.StreamJob: Job not successful. Error: NA
11/11/16 11:41:10 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/terickson/.virtualenvs/hadoopy/lib/python2.6/site-packages/hadoopy/_runner.py", line 214, in launch
    raise subprocess.CalledProcessError(process.returncode, ' '.join(cmd))
subprocess.CalledProcessError: Command 'hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u2.jar -output playground/out/ -input playground/wc-input-alice.txt -mapper "python wc.py pipe map" -reducer "python wc.py pipe reduce" -file wc.py -jobconf mapred.job.name=wc -io typedbytes -outputformat org.apache.hadoop.mapred.SequenceFileOutputFormat -inputformat AutoInputFormat' returned non-zero exit status 1
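
For reference, Errno 13 raised out of subprocess's _execute_child means the exec target (or its interpreter, or a directory on the path to it) exists but is not executable or searchable by the current user. A minimal, Hadoop-free reproduction (the file path is just an example):

import os
import stat
import subprocess

# Write a script without the execute bit, then try to exec it directly.
with open('/tmp/noexec.sh', 'w') as f:
    f.write('#!/bin/sh\necho hi\n')
os.chmod('/tmp/noexec.sh', stat.S_IRUSR | stat.S_IWUSR)  # rw-------, no x bit

try:
    subprocess.call(['/tmp/noexec.sh'])
except OSError as e:
    print(e)  # [Errno 13] Permission denied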
bwhite (Owner) commented Nov 18, 2011

I believe this is due to local execution mode; try pseudo-distributed instead. Hadoop's local mode is poor at simulating the real cluster environment, and I've found several bugs in it that I haven't had a chance to report. One I know of is that it doesn't handle relative paths, so that could be the issue here.
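
If the relative-path theory holds, a workaround sketch for local mode (the paths and the hadoopy.launch call are taken from the command above; in pseudo-distributed mode the input/output are HDFS paths and should be left as-is):

import os
import hadoopy

# Hand Hadoop absolute local paths so the LocalJobRunner's weak handling
# of relative paths is taken out of the equation.
hadoopy.launch(os.path.abspath('playground/wc-input-alice.txt'),
               os.path.abspath('playground/out/'),
               os.path.abspath('wc.py'))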

tylere commented Nov 18, 2011

Yes, I was using the standalone configuration for Hadoop. Once I changed over to pseudo-distributed mode, I received a different error (which I will describe in a separate issue).

If hadoopy does not work with Hadoop in standalone mode, this should probably be noted in the hadoopy documentation (perhaps in an installation section).
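
Until the docs cover it, a hypothetical pre-flight check could catch this early. Nothing below is hadoopy API; it relies on the fact that on 0.20-era Hadoop the LocalJobRunner is selected when mapred.job.tracker is "local", which is also the default when the property is unset:

import os
import xml.etree.ElementTree as ET

# Hypothetical check: read mapred-site.xml and warn when Hadoop is
# configured for the LocalJobRunner (standalone mode).
conf = os.path.join(os.environ.get('HADOOP_CONF_DIR', '/etc/hadoop/conf'),
                    'mapred-site.xml')
tracker = 'local'  # Hadoop's default when the property is unset
for prop in ET.parse(conf).getroot().findall('property'):
    if prop.findtext('name') == 'mapred.job.tracker':
        tracker = prop.findtext('value')
if tracker == 'local':
    print('mapred.job.tracker is "local" (standalone mode); '
          'hadoopy needs pseudo-distributed or distributed Hadoop.')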

tylere closed this Nov 18, 2011

bwhite (Owner) commented Jan 3, 2012

I looked into local mode a bit more, and the issue isn't on hadoopy's side: local mode doesn't properly emulate the distributed cache with respect to .tar file decompression. Eventually we can try to get this fixed in Hadoop, but it isn't a high priority.

bwhite (Owner) commented Jan 3, 2012

Updated the readme to point to this issue.
