support general partitioners and comparators #4

Closed
piccolbo opened this issue Sep 13, 2011 · 4 comments

@piccolbo
Collaborator

These are advanced but important Hadoop features, and there is no way to work around their absence. One possible approach is to write Java classes that start an R server and pass it an R expression to evaluate. See JRI.

@piccolbo
Collaborator Author

piccolbo commented Jan 2, 2013

The approach above is abandoned. I think we could take a different route, made possible by https://issues.apache.org/jira/browse/HADOOP-5528; see also #129.

The approach would be as follows. Use BinaryPartitioner, or a custom-written partitioner, to read a key and value of type TypedBytesWritable and convert the key into an integer. Make sure the integer is in the appropriate range, most likely by taking the remainder modulo the number of partitions. Return that remainder. Done. The simplifying assumption here is that the values carry all the data we are interested in and the key is just the partition number. Given that a variety of complex data structures are allowed for the value, this is unlikely to imply any loss of generality.
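The key-to-partition step could be sketched roughly as follows. This is a standalone illustration and not a Hadoop `Partitioner`: the class and method names are hypothetical, and it assumes the key arrives as a typed bytes integer (type code 3 followed by four big-endian bytes). A real implementation would wrap this logic in a partitioner class and read the bytes from the TypedBytesWritable key.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: map an integer key in typed bytes form to a partition number.
final class KeyPartitionSketch {
    // Assumed typed bytes type code for a 32-bit integer.
    static final int TYPE_INT = 3;

    // Decode a typed bytes integer: one type-code byte, then four big-endian bytes.
    static int decodeIntKey(byte[] typedBytes) {
        if (typedBytes.length < 5 || typedBytes[0] != TYPE_INT) {
            throw new IllegalArgumentException("key is not a typed bytes integer");
        }
        return ByteBuffer.wrap(typedBytes, 1, 4).getInt();
    }

    // Bring the key into [0, numPartitions) by taking the remainder.
    // floorMod keeps the result non-negative even for negative keys.
    static int partitionFor(int key, int numPartitions) {
        return Math.floorMod(key, numPartitions);
    }

    public static void main(String[] args) {
        byte[] key = {3, 0, 0, 0, 7};   // typed bytes encoding of the integer 7
        System.out.println(partitionFor(decodeIntKey(key), 4));   // prints 3
    }
}
```

Using `Math.floorMod` rather than `%` matters only if negative keys can occur; with `%` a negative key would yield a negative, invalid partition number.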

@piccolbo
Collaborator Author

piccolbo commented Jan 2, 2013

It should be possible to use BinaryPartitioner, for two reasons. First, it needs the key to be BinaryComparable, and TypedBytesWritable qualifies because it extends BytesWritable, which is a BinaryComparable. Second, the BinaryPartitioner patch was proposed by the author of Dumbo, so we can expect it to work for TypedBytesWritable keys. The advantage is that this class is already part of all major recent distros.
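To make the idea of partitioning on the raw bytes of a key concrete, here is a minimal standalone sketch. It is not the Hadoop BinaryPartitioner source: the class name, the offset convention (inclusive offsets, with negative values counting from the end of the key), and the hash function are all assumptions chosen for illustration.

```java
// Hypothetical sketch: choose a partition from a byte range of a serialized key.
final class ByteRangePartitionSketch {
    // leftOffset and rightOffset are inclusive; negative values count from the end.
    static int partitionFor(byte[] key, int leftOffset, int rightOffset, int numPartitions) {
        if (key.length == 0) {
            throw new IllegalArgumentException("empty key");
        }
        int left = Math.floorMod(leftOffset, key.length);
        int right = Math.floorMod(rightOffset, key.length);
        // Simple polynomial hash over the selected bytes (illustrative choice).
        int hash = 1;
        for (int i = left; i <= right; i++) {
            hash = 31 * hash + key[i];
        }
        // Clear the sign bit so the remainder is never negative.
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With this scheme, two keys that agree on the selected byte range always land in the same partition, whatever the remaining bytes contain.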

@piccolbo
Collaborator Author

This is now RevolutionAnalytics/rmr2#21
