# Quick Select



## Quick Select Algorithm

The Quick Select algorithm is a selection algorithm in computer science, used to find the k-th smallest element in an unordered list. It was developed by Tony Hoare, who also created the Quick Sort algorithm. Quick Select is an efficient in-place algorithm with an average-case complexity of O(n), where n is the number of elements in the list. However, its worst-case complexity is O(n^2).

The algorithm works similarly to Quick Sort, using the partitioning method. Here's an overview of the steps:

* Choose a 'pivot' element from the list (there are various ways to select the pivot, such as choosing the first element, the last element, a random element, or the median of a few sampled elements).
* Partition the list such that all elements less than the pivot are to its left, and all elements greater than the pivot are to its right.
* Check the position of the pivot in the partitioned list:

1. If the pivot's position is equal to k, then the pivot is the k-th smallest element, and the algorithm terminates.
2. If the pivot's position is greater than k, repeat the process on the left partition.
3. If the pivot's position is less than k, repeat the process on the right partition. If positions are counted from the start of the sub-list being processed, reduce k by the number of elements discarded; if they are absolute indices into the original list, k stays the same.
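
The three cases can be sketched without in-place partitioning, which makes the adjustment of k explicit. This is only a minimal illustration under stated assumptions: `select_kth` is a hypothetical name, k is 1-based, and the function builds new lists on every pass, so it trades the in-place property for readability.

```python
import random

def select_kth(items, k):
    """Return the k-th smallest element of items (k is 1-based)."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    items = list(items)  # work on a copy; the caller's list is untouched
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        larger = [x for x in items if x > pivot]
        if k <= len(smaller):
            # Case 2: the answer lies in the left partition; k is unchanged.
            items = smaller
        elif k <= len(smaller) + len(equal):
            # Case 1: the pivot itself is the k-th smallest element.
            return pivot
        else:
            # Case 3: discard the left partition and the pivot, and shift k.
            k -= len(smaller) + len(equal)
            items = larger

print(select_kth([10, 4, 5, 8, 6, 11, 3], 4))  # prints 6
```

Grouping the elements equal to the pivot separately is a design choice of this sketch; it guarantees progress even when the list contains many duplicates.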

Asymptotically, Quick Select is more efficient than sorting the list and then selecting the k-th element: its expected running time is O(n) for any k, compared with O(n log n) for a comparison sort. In practice, constant factors decide which approach is faster. Note also that Quick Select reorders the list in place and is not a stable selection algorithm, meaning that it doesn't preserve the relative order of equal elements in the list.

![GIF](https://upload.wikimedia.org/wikipedia/commons/0/04/Selecting_quickselect_frames.gif)

## Finding Median in Linear Time - maybe

To find the median value of a list using the Quick Select algorithm, you need to determine the position of the median element, depending on whether the list has an odd or even number of elements.

If the list has an odd number of elements (2n + 1), the median is the element at position (n + 1). So, you would use Quick Select to find the (n + 1)-th smallest element in the list.

If the list has an even number of elements (2n), the median is the average of the two middle elements at positions n and (n + 1). In this case, you would run Quick Select twice: first, to find the n-th smallest element, and then to find the (n + 1)-th smallest element. Finally, you would compute the average of these two elements to get the median value.
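
The odd and even cases can be sketched as follows. This is an illustrative sketch rather than a reference implementation: `kth_smallest` and `median` are hypothetical names, positions are 0-based (so positions n and n + 1 in the text become indices `n // 2 - 1` and `n // 2` for a list of length n), and the helper is an iterative quickselect with a random pivot.

```python
import random

def kth_smallest(values, k):
    """Return the element at 0-based sorted position k; reorders values in place."""
    low, high = 0, len(values) - 1
    while low < high:
        # Lomuto partition around a randomly chosen pivot.
        p = random.randint(low, high)
        values[p], values[high] = values[high], values[p]
        pivot = values[high]
        i = low
        for j in range(low, high):
            if values[j] < pivot:
                values[i], values[j] = values[j], values[i]
                i += 1
        values[i], values[high] = values[high], values[i]
        if k == i:
            break
        if k < i:
            high = i - 1
        else:
            low = i + 1
    return values[k]

def median(values):
    """Median of a non-empty list; the input list itself is not modified."""
    work = list(values)
    n = len(work)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    if n % 2 == 1:
        return kth_smallest(work, n // 2)
    # Even length: average the two middle elements (two selection passes).
    lower = kth_smallest(work, n // 2 - 1)
    upper = kth_smallest(work, n // 2)
    return (lower + upper) / 2

print(median([10, 4, 5, 8, 6, 11, 3]))  # prints 6
print(median([4, 1, 3, 2]))             # prints 2.5
```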

Using Quick Select to find the median value can be very efficient, with an average-case complexity of O(n). However, in the worst-case scenario, the complexity can be O(n^2). To avoid the worst-case complexity, you can use an algorithm called the Median of Medians, which is a pivot selection strategy that guarantees linear complexity (O(n)) for finding the median value.

In [1]:
import random
random.seed(2023) # For reproducibility

def partition(arr, low, high):
    pivot_index = random.randint(low, high)
    pivot = arr[pivot_index]
    arr[pivot_index], arr[high] = arr[high], arr[pivot_index]

    i = low
    for j in range(low, high):
        if arr[j] <= pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1

    arr[i], arr[high] = arr[high], arr[i]
    return i

def quick_select(arr, k, low=None, high=None):
    if low is None:
        low = 0
    if high is None:
        high = len(arr) - 1

    if low == high:
        return arr[low]

    pivot_index = partition(arr, low, high)

    if k == pivot_index:
        return arr[k]
    elif k < pivot_index:
        return quick_select(arr, k, low, pivot_index - 1)
    else:
        return quick_select(arr, k, pivot_index + 1, high)

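# Example usage
arr = [10, 4, 5, 8, 6, 11, 3]
k = len(arr) // 2  # k = 3: ask for the 3rd smallest element
# quick_select takes a 0-based index, so the 3rd smallest is at index k - 1.
# (The median of these 7 values is the 4th smallest, at index len(arr) // 2.)
kth = quick_select(arr, k - 1)
print(f"The {k}-th smallest element is: {kth}")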

The 3-th smallest element is: 5


In [13]:
# Let's generate some random numbers and find the median
a_1M = [random.randint(0, 1_000_000) for _ in range(100_000+1)] # 100,001 values (despite the name); odd length keeps the median a single element

a_1M_backup = a_1M.copy() # to compare with sorted array
# Let's find the median
median = quick_select(a_1M, len(a_1M) // 2 )
print(f"The median of the array is: {median}")

# quick_select partitions in place, so check whether the array was reordered
if a_1M != a_1M_backup:
    print("The array is changed!")

# let's find median using sorted array
a_1M_sorted = sorted(a_1M)
print(f"The median of the array is: {a_1M_sorted[len(a_1M_sorted) // 2]}")

The median of the array is: 501732
The array is changed!
The median of the array is: 501732


## Timing the Algorithms

In [14]:
%%timeit
quick_select(a_1M, len(a_1M) // 2 )

36.7 ms ± 4.92 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [15]:
%%timeit
sorted(a_1M)[len(a_1M) // 2]

10.6 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [17]:
%%timeit
sorted(a_1M_backup)[len(a_1M_backup) // 2]

18.2 ms ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
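
One caveat when reading these timings: `quick_select` reorders its argument, so every `%%timeit` loop after the first runs on a list that has already been partitioned, and `sorted(a_1M)` is then working on partially ordered data. That is a plausible explanation for the gap between the 10.6 ms and 18.2 ms figures above. A sketch of how each call could instead be timed on a fresh copy, using the standard `timeit` module; `time_on_fresh_copy` is a hypothetical helper, and the cost of the copy is included in every measurement.

```python
import random
import timeit

def time_on_fresh_copy(func, data, repeat=5):
    """Best wall-clock time in seconds of func(copy_of_data) over `repeat` runs."""
    timings = timeit.repeat(
        stmt=lambda: func(list(data)),  # a new copy for every run
        repeat=repeat,
        number=1,
    )
    return min(timings)

data = [random.randint(0, 1_000_000) for _ in range(100_001)]
mid = len(data) // 2

sort_time = time_on_fresh_copy(lambda xs: sorted(xs)[mid], data)
print(f"sorted() on a fresh copy: {sort_time * 1e3:.1f} ms")
```

In the notebook, the same helper could be called as `time_on_fresh_copy(lambda xs: quick_select(xs, mid), data)` to time the selection on unpartitioned input.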


In [9]:
# Let's get 10 million random numbers
a_10M = [random.randint(0, 100_000_000) for _ in range(10_000_000+1)] # odd length keeps the median a single element


In [10]:
%%timeit
quick_select(a_10M, len(a_10M) // 2 )

6.04 s ± 1.73 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [11]:
%%timeit
sorted(a_10M)[len(a_10M) // 2]

5 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## Why is our quick select so slow? - MOM to the rescue

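We are competing against `sorted`, which runs Timsort as optimized C code, while our Quick Select is pure Python. The constant factors are stacked against us.

We are also using a naive partitioning algorithm. We can do better.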

From Jeff Erickson's *Algorithms* book, [ch. 1, p. 37](https://jeffe.cs.illinois.edu/teaching/algorithms/book/01-recursion.pdf):

> Specifically, we divide the input array into ⌈n/5⌉ blocks, each containing exactly 5 elements, except possibly the last. (If the last block isn't full, just throw in a few ∞s.) We compute the median of each block by brute force, collect those medians into a new array M[1 .. ⌈n/5⌉], and then recursively compute the median of this new array. Finally, we use the median of the block medians (called "mom" in the pseudocode below) as the quickselect pivot.

In [19]:
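def partition(arr, low, high, pivot_value):
    # Move one occurrence of the pivot value to the end of the range.
    # (If it is not found in arr[low:high], it is already at arr[high].)
    for i in range(low, high):
        if arr[i] == pivot_value:
            arr[i], arr[high] = arr[high], arr[i]
            break

    # Lomuto partition: elements <= pivot_value are moved to the front.
    i = low
    for j in range(low, high):
        if arr[j] <= pivot_value:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1

    # Put the pivot in its final position and return that index.
    arr[i], arr[high] = arr[high], arr[i]
    return i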

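def median_of_medians(arr, low, high):
    # Base case: at most 5 elements, so sort them and take the middle one.
    if high - low + 1 <= 5:
        block = sorted(arr[low:high+1])
        return block[len(block)//2]

    # Median of each block of 5 (the last block may be shorter).
    medians = []
    for i in range(low, high+1, 5):
        group = sorted(arr[i:min(i+5, high+1)])
        medians.append(group[len(group)//2])

    # Note: recursing on the medians like this gives an approximate median of
    # the block medians; the textbook scheme finds the exact one by selection.
    return median_of_medians(medians, 0, len(medians) - 1)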

def quick_select(arr, k, low=None, high=None):
    if low is None:
        low = 0
    if high is None:
        high = len(arr) - 1

    if low == high:
        return arr[low]

    pivot_value = median_of_medians(arr, low, high)
    pivot_index = partition(arr, low, high, pivot_value)

    if k == pivot_index:
        return arr[k]
    elif k < pivot_index:
        return quick_select(arr, k, low, pivot_index - 1)
    else:
        return quick_select(arr, k, pivot_index + 1, high)

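# Example usage
arr = [10, 4, 5, 8, 6, 11, 3]
k = len(arr) // 2  # k = 3: ask for the 3rd smallest element
# quick_select takes a 0-based index, so the 3rd smallest is at index k - 1.
# (The median of these 7 values is the 4th smallest, at index len(arr) // 2.)
kth = quick_select(arr, k - 1)
print(f"The {k}-th smallest element is: {kth}")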

The 3-th smallest element is: 5
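
The `median_of_medians` function in the cell above recurses on itself, so its pivot is only an approximate median of the block medians. In the scheme quoted from Erickson, the median of the block medians is found exactly, by a recursive call to the selection routine. A sketch of that variant follows, under stated assumptions: `mom_select` is a hypothetical name, k is a 0-based sorted position, the outer loop is iterative, and a three-way partition is used so that duplicate values cannot stall progress. It has not been benchmarked against the cells in this notebook.

```python
def mom_select(arr, k):
    """Return the element at 0-based sorted position k of arr (reorders arr)."""
    if not 0 <= k < len(arr):
        raise IndexError("k out of range")
    low, high = 0, len(arr) - 1
    while True:
        if high - low < 5:
            # Small range: sort it directly and read off the answer.
            arr[low:high + 1] = sorted(arr[low:high + 1])
            return arr[k]

        # Median of each block of (up to) 5 elements.
        medians = []
        for start in range(low, high + 1, 5):
            block = sorted(arr[start:min(start + 5, high + 1)])
            medians.append(block[len(block) // 2])

        # Exact median of the block medians, found by recursive selection.
        pivot = mom_select(medians, len(medians) // 2)

        # Three-way partition of arr[low..high]: < pivot, == pivot, > pivot.
        lt, i, gt = low, low, high
        while i <= gt:
            if arr[i] < pivot:
                arr[lt], arr[i] = arr[i], arr[lt]
                lt += 1
                i += 1
            elif arr[i] > pivot:
                arr[i], arr[gt] = arr[gt], arr[i]
                gt -= 1
            else:
                i += 1

        if k < lt:
            high = lt - 1
        elif k > gt:
            low = gt + 1
        else:
            return pivot  # k falls inside the run of elements equal to pivot

print(mom_select([10, 4, 5, 8, 6, 11, 3], 3))  # prints 6
```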


In [21]:
len(a_1M)

100001

In [22]:
quick_select(a_1M, len(a_1M) // 2 )

501732

In [23]:
a_1M[len(a_1M) // 2]  # quick_select left the median at its sorted position, so a plain lookup now returns it

501732

In [24]:
%%timeit
a_1M[len(a_1M) // 2]  # a plain index lookup is, of course, the fastest

103 ns ± 1.73 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [20]:
%%timeit
quick_select(a_1M, len(a_1M) // 2 )

5.93 s ± 1.76 s per loop (mean ± std. dev. of 7 runs, 1 loop each)




## References

* [Quick Select on Wikipedia](https://en.wikipedia.org/wiki/Quickselect)
* [Median of Medians on Wikipedia](https://en.wikipedia.org/wiki/Median_of_medians)
* [Quick Select Visualization](https://www.cs.usfca.edu/~galles/visualization/QuickSelect.html)