support general partitioners and comparators #4
Comments
The above issue is abandoned; I think we could take a different route made possible by https://issues.apache.org/jira/browse/HADOOP-5528 (see also #129). The approach would be as follows: use BinaryPartitioner or a custom-written partitioner to read the key and value, both of type TypedBytesWritable, and convert the key into an integer. Make sure the integer is in the appropriate range, most likely by taking the remainder modulo the number of partitions, and return that remainder. Done. The simplifying assumption here is that the values carry any data we are interested in and the key is just the partition number. Given that a variety of complex data structures are allowed for the value, this is unlikely to imply any loss of generality.
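To make the idea concrete, here is a minimal sketch of the key-to-partition mapping in plain Java, with no Hadoop dependency. The class and method names are hypothetical, and the byte layout is an assumption: it treats the key as a typed-bytes integer, which as far as I understand is a one-byte type code 3 followed by a 4-byte big-endian int. In a real job this logic would live inside a partitioner's getPartition method.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Hadoop or rmr.
final class KeyPartitionSketch {
    // Assumed typed-bytes type code for a 32-bit integer.
    static final int TYPED_BYTES_INT = 3;

    // Decode the integer key, assuming: 1 type-code byte + 4 big-endian bytes.
    static int decodeIntKey(byte[] keyBytes) {
        if (keyBytes.length < 5 || keyBytes[0] != TYPED_BYTES_INT) {
            throw new IllegalArgumentException("expected a typed-bytes integer key");
        }
        // ByteBuffer reads big-endian by default.
        return ByteBuffer.wrap(keyBytes, 1, 4).getInt();
    }

    // Bring the key into [0, numPartitions). floorMod keeps the result
    // non-negative even for negative keys, unlike the % operator.
    static int partitionFor(int key, int numPartitions) {
        return Math.floorMod(key, numPartitions);
    }

    static int partitionFor(byte[] keyBytes, int numPartitions) {
        return partitionFor(decodeIntKey(keyBytes), numPartitions);
    }
}
```

Using floorMod rather than % matters here because a negative partition number would be rejected by the framework.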
It should be possible to use BinaryPartitioner. It needs the key to be BinaryComparable, and TypedBytesWritable qualifies because it extends BytesWritable, which is itself a BinaryComparable. The other reason is that the BinaryPartitioner patch was proposed by the author of Dumbo, so we can expect it to work for TypedBytesWritable keys. The advantage is that this class is already part of all major recent distros.
This is now RevolutionAnalytics/rmr2#21
These are advanced but important Hadoop features, and one can't work around their unavailability. A possible approach is to create Java classes that start an R server and pass it an R expression to evaluate. See JRI.