text output and combiner don't work together #113

Closed
piccolbo opened this issue Jun 29, 2012 · 4 comments
Comments

@piccolbo
Collaborator

This was reported by Saar on the RHadoop Google group: https://groups.google.com/d/topic/rhadoop/5yYKZZLSX8U/discussion. It seems to be a consequence of the mapper and the reducer writing their output in two different formats, with the reducer expecting the mapper's format. When the combiner is activated, it writes in the reducer's output format, which the reducer then can't read. This warning is telling:

52 WARN streaming.PipeMapRed: java.io.IOException: wrong key class: class org.apache.hadoop.io.Text is not class org.apache.hadoop.typedbytes.TypedBytesWritable

Investigating.

@piccolbo
Collaborator Author

Repro with:

from.dfs(mapreduce(to.dfs(1:10), combine = T, reduce = function(k,vv) keyval(NULL,sum(unlist(vv))), output.format="csv"), format="csv")

@piccolbo
Collaborator Author

I entered a comment on HADOOP-1722 (https://issues.apache.org/jira/browse/HADOOP-1722?focusedCommentId=13404201#comment-13404201), since that is where binary formats for Hadoop streaming were introduced. I suspect its authors did not foresee the use of streaming combiners, which were added much later with HADOOP-4842; I added a comment there too. I am trying to understand what the intent was when binary formats and streaming combiners were each added to Hadoop.

@piccolbo
Collaborator Author

piccolbo commented Dec 6, 2012

Updated test case:

from.dfs(mapreduce(to.dfs(1:10), combine = T, map = function(k,v) keyval(1,v), reduce = function(k,vv) keyval(1,sum(unlist(vv))), output.format="csv"), format="csv")

@piccolbo
Collaborator Author

piccolbo commented Mar 5, 2013

This is now tracked as RevolutionAnalytics/rmr2#16.

@piccolbo piccolbo closed this as completed Mar 5, 2013