
java.io.FileNotFoundException: File does not exist #137

Closed
geofferyzh opened this Issue October 05, 2012 · 15 comments

3 participants

geofferyzh Antonio Piccolboni Claudio Reggiani
geofferyzh

I installed RHadoop (the rmr2 and rhdfs packages) on my Cloudera CDH3 virtual machine yesterday. When I tried to run the second tutorial example, the job seemed to finish correctly, but a "FileNotFoundException" occurred.

I had the same error message when trying to run the kmeans.R example. What did I do wrong here?

Thanks,
Shaohua


library(rmr2)
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: itertools
Loading required package: iterators
Loading required package: digest
Loading required package: functional
library(rhdfs)
Loading required package: rJava

HADOOP_CMD=/usr/bin/hadoop-0.20

Be sure to run hdfs.init()

hdfs.init()

groups = rbinom(32, n = 50, prob = 0.4)
groups = to.dfs(groups)

12/10/05 12:08:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/05 12:08:46 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/10/05 12:08:46 INFO compress.CodecPool: Got brand-new compressor

from.dfs(mapreduce(input = groups, map = function(k,v) keyval(v, 1), reduce = function(k,vv) keyval(k, length(vv))))
packageJobJar: [/tmp/RtmpYyfscn/rmr-local-env163a6ff6d07e, /tmp/RtmpYyfscn/rmr-global-env163a5affd46d, /tmp/RtmpYyfscn/rmr-streaming-map163a474285bd, /tmp/RtmpYyfscn/rmr-streaming-reduce163a63a9bfda, /var/lib/hadoop-0.20/cache/training/hadoop-unjar6175181393484515689/] [] /tmp/streamjob5614144549414339994.jar tmpDir=null
12/10/05 12:09:00 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/05 12:09:00 INFO streaming.StreamJob: getLocalDirs(): [/var/lib/hadoop-0.20/cache/training/mapred/local]
12/10/05 12:09:00 INFO streaming.StreamJob: Running job: job_201210051159_0001
12/10/05 12:09:00 INFO streaming.StreamJob: To kill this job, run:
12/10/05 12:09:00 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201210051159_0001
12/10/05 12:09:00 INFO streaming.StreamJob: Tracking URL: http://localhost.localdomain:50030/jobdetails.jsp?jobid=job_201210051159_0001
12/10/05 12:09:01 INFO streaming.StreamJob: map 0% reduce 0%
12/10/05 12:09:09 INFO streaming.StreamJob: map 100% reduce 0%
12/10/05 12:09:18 INFO streaming.StreamJob: map 100% reduce 100%
12/10/05 12:09:20 INFO streaming.StreamJob: Job complete: job_201210051159_0001
12/10/05 12:09:20 INFO streaming.StreamJob: Output: /tmp/RtmpYyfscn/file163a71ea0500
Exception in thread "main" java.io.FileNotFoundException: File does not exist: 3
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:546)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
$key
[1] 0

$val
[1] 50
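
For what it's worth, the map emits each sampled value with count 1 and the reduce counts values per key, so the job is just a frequency count of the sample; the expected shape of the result can be checked in plain R (a sketch with a fresh sample, not the exact vector used above):

groups.raw <- rbinom(32, n = 50, prob = 0.4)
table(groups.raw)  # several distinct keys near 32 * 0.4 = 12.8, counts summing to 50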

Antonio Piccolboni
Collaborator
geofferyzh

Thanks for your quick reply.

Antonio Piccolboni piccolbo closed this October 05, 2012
Antonio Piccolboni piccolbo reopened this October 05, 2012
Antonio Piccolboni
Collaborator

Are you sure the result is correct? Maybe you should run it once more, but print groups before calling to.dfs so that you know what to expect. It appears your sample had 50 zeros in it.
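
Something along these lines (a sketch of the suggested check):

groups <- rbinom(32, n = 50, prob = 0.4)
print(groups)             # values should cluster around 32 * 0.4 = 12.8, not be all zeros
groups <- to.dfs(groups)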

geofferyzh

OK, so I ran the following code to print the output, but got something unreadable...

hadoop fs -cat /tmp/RtmpYyfscn/file163a403e27e2/part-00000

output:
Q/org2[training@server0 ~]$ .TypedBytesWritable/org.apache.hadoop.typedbytes.TypedBytesWritableͤ����-���M���

what could have gone wrong?

Antonio Piccolboni
Collaborator
geofferyzh

Thanks. I'm new to both Hadoop and R and have only worked with text files in Hadoop, so I wrongly assumed that "everything" is in text format...
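
I gather rmr2 writes its data to HDFS as sequence files of typed bytes rather than plain text, which is why hadoop fs -cat shows binary garbage; the way to inspect such a file is to read it back through rmr2 (a sketch, reusing the output path from my earlier run):

out <- from.dfs("/tmp/RtmpYyfscn/file163a403e27e2")
str(out)  # the decoded key/value structure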

The result I got was incorrect: I got all 50 zeros. "from.dfs(to.dfs(rbinom(32, n = 50, prob = 0.4)))" gives me all zeros.

$key
NULL

$val
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[39] 0 0 0 0 0 0 0 0 0 0 0 0
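
A minimal round-trip check for this (a sketch; on a working install the comparison should be TRUE):

x <- rbinom(32, n = 50, prob = 0.4)
out <- from.dfs(to.dfs(x))
all(out$val == x)  # TRUE if serialization is intact; here everything comes back as 0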

Any pointers?

Antonio Piccolboni
Collaborator
geofferyzh

Thanks. I'm using Cloudera's CDH3 training VM on Windows 7.

While waiting for the next release, is there a way I can install rmr 1 so that I can start learning RHadoop?
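
(I assume it would be a local install from a downloaded source tarball, something like the following, with the file name only a guess on my part:)

install.packages("rmr_1.3.1.tar.gz", repos = NULL, type = "source")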

Antonio Piccolboni
Collaborator
geofferyzh

Linux server0.training.local 2.6.18-238.9.1.el5 #1 SMP Tue Apr 12 18:10:56 EDT 2011 i686 i686 i386 GNU/Linux

This is the printout I got.

I will try to install CDH4, thanks

Claudio Reggiani

I have the same issue geofferyzh had above
#137 (comment)

I'm working on CentOS 5.8 32-bit, so I will wait for the patch.

Antonio Piccolboni
Collaborator
Claudio Reggiani
> from.dfs(to.dfs(1:10))
12/10/20 00:13:16 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/20 00:13:16 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/10/20 00:13:16 INFO compress.CodecPool: Got brand-new compressor
12/10/20 00:13:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/20 00:13:20 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/10/20 00:13:20 INFO compress.CodecPool: Got brand-new decompressor
$key
NULL

$val
 [1]  1  2  3  4  5  6  7  8  9 10

>

I think the patch is working, but there is another error: when calling the mapreduce function (any example from the tutorial) I get

12/10/20 00:12:01 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/20 00:12:01 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/10/20 00:12:01 INFO compress.CodecPool: Got brand-new compressor
Error in do.call(paste.options, backend.parameters) : 
  second argument must be a list
>
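
For reference, do.call requires its second argument to be a list of the arguments to pass, which is exactly what the message says; a minimal reproduction in plain R:

do.call(paste, NULL)            # Error: second argument must be a list
do.call(paste, list("a", "b"))  # fine: returns "a b"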

Finally, I agree with you about the architecture: I'm using RHadoop on my personal computer and am not deploying anything to production, so I need to study MapReduce, Hadoop, and R together.

Thanks for everything

Antonio Piccolboni
Collaborator

@Nophiq I may have missed one of your reports (the backend.parameters problem) in the midst of your message. I believe I fixed it based on a separate report, but please check again for me in the 2.0.1 branch. Also, it would really help me if we kept it to one problem per issue: an issue is either open or closed, and if there are two problems in one issue I can't mark one closed and the other open.

Antonio Piccolboni piccolbo closed this October 25, 2012