-
-
Notifications
You must be signed in to change notification settings - Fork 785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement cupy.partition #162
Comments
https://en.wikipedia.org/wiki/Selection_algorithm#Unordered_partial_sorting There are the following algorithms to solve such kind of problems:
The following papers mention on radixselect:
The first paper has its public source code and Tensorflow guys have some discussion on it recently. |
I compared the performance of:
(Celeron G3900 @ 2.80GHz and GeForce GTX 750 Ti) Just https://gist.github.com/sonots/7e11231af2f2cc6b1f519ad832632566
|
According to Fig. 7 in the paper,
sort & choose has O(N) complexity in time for vector length It seems reasonable to first implement |
cupy.partition
, the counterpart ofnumpy.partition
, creates a copy of an array with its elements rearranged in such a way that the value of the element in k-th position is in the position it would be in a sorted array. This function is equivalent tostd::nth_element
of C++ STL.numpy.partition
https://docs.scipy.org/doc/numpy/reference/generated/numpy.partition.html
std::nth_element
http://en.cppreference.com/w/cpp/algorithm/nth_element
Unfortunately, neither of the following parallel algorithm libraries implement this kind of parallel selection algorithm:
For now, the most likely is to implement radix select based on CUB's radix sort implementation.
The text was updated successfully, but these errors were encountered: