# Question: How to find the k-th smallest number in an unsorted array?

In [38]:
import random
from typing import Callable

data = [random.randint(1, 100) for _ in range(20)]
k = 5

print("data is", data)
print("The sorted data is", sorted(data))
print("The", f"{k}-th of the array is:", sorted(data)[k-1])

data is [85, 25, 20, 67, 23, 37, 88, 73, 70, 83, 63, 54, 43, 18, 82, 34, 86, 73, 13, 1]
The sorted data is [1, 13, 18, 20, 23, 25, 34, 37, 43, 54, 63, 67, 70, 73, 73, 82, 83, 85, 86, 88]
The 5-th of the array is: 23


## 1. Pre-sort
The most intuitive approach is to sort the array and pick the k-th element. Comparison-based sorting needs $\Omega(n\log n)$ time in the worst case; radix sort and counting sort can be faster, but they only apply to restricted kinds of keys, not to arbitrary arrays.

In [39]:
def presort_find_kth(arr: list[int], k: int) -> int:
    sorted_arr = sorted(arr)
    return sorted_arr[k - 1]  # k is 1-based, list indices are 0-based

In [40]:
#test the algorithm:
print("The", f"{k}-th of the array is:", presort_find_kth(data, k))

The 5-th of the array is: 23


## 2. Zone-select
As in **quicksort**, we choose a **pivot** $v$ and divide the array into three subarrays $(L, M, R)$: all elements in $L$ are smaller than $v$, all elements in $M$ are equal to $v$, and all elements in $R$ are greater than $v$. For example:

In [41]:
def divide(arr: list[int], v: int) -> tuple[list[int], list[int], list[int]]:
    L, M, R = [], [], []
    for e in arr:
        if e < v:
            L.append(e)
        elif e == v:
            M.append(e)
        else:
            R.append(e)
    return L, M, R

# e.g.
data_of_divide = [1,3,5,2,4,6,3,3,3]
L, M, R = divide(data_of_divide, 3)
print(f"The array {data_of_divide} is divided into : [L: {L}, M: {M}, R: {R}] by pivot 3")

The array [1, 3, 5, 2, 4, 6, 3, 3, 3] is divided into : [L: [1, 2], M: [3, 3, 3, 3], R: [5, 4, 6]] by pivot 3


Then there are three cases:
1. If $k\le|L|$, the target is in $L$, so we recursively call `find_kth(L, k)`.
2. If $k\in(|L|, |L|+|M|]$, the target is exactly $v$.
3. Otherwise, the target is in $R$, so we call `find_kth(R, k-len(L)-len(M))`.
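
As a small worked example of the three cases, the sketch below takes the array from the `divide` demo with pivot 3 and reports, for a few values of k, which part holds the target and which rank is passed on. The helper name `locate` and the variable names are hypothetical and used only for illustration.

```python
def locate(arr: list[int], v: int, k: int) -> tuple[str, int]:
    """Return which part (L, M or R) holds the k-th smallest and its rank inside that part."""
    n_less = sum(1 for e in arr if e < v)    # |L|
    n_equal = sum(1 for e in arr if e == v)  # |M|
    if k <= n_less:
        return "L", k
    if k <= n_less + n_equal:
        return "M", k - n_less
    return "R", k - n_less - n_equal

example = [1, 3, 5, 2, 4, 6, 3, 3, 3]  # for pivot 3: |L| = 2, |M| = 4, |R| = 3
for k_demo in (2, 5, 8):
    print(k_demo, locate(example, 3, k_demo))
```

For k = 8 the target is the 2nd smallest element of $R$, because $8 - |L| - |M| = 8 - 2 - 4 = 2$.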

In [42]:
def find_kth(
    arr: list[int],
    k: int,
    find_pivot: Callable[[list[int]], int] = random.choice) -> int:
    # k is 1-based and must satisfy 1 <= k <= len(arr)
    pivot = find_pivot(arr)
    L, M, R = divide(arr, pivot)
    if k <= len(L):
        # pass find_pivot on so the recursion keeps the same pivot rule
        return find_kth(L, k, find_pivot)
    elif k <= len(L) + len(M):
        return pivot
    return find_kth(R, k-len(L)-len(M), find_pivot)

print("The", f"{k}-th of the array is:", find_kth(data, k))

The 5-th of the array is: 23


Note that the choice of pivot determines the size of the subproblem. In the unluckiest case the pivot is the largest or smallest element, and the subproblem is then almost as large as the original array.
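
To illustrate the unlucky case, the sketch below counts how many partition rounds are needed when the pivot is always the smallest element, compared with always picking the median. The function `count_rounds` and its `pick` parameter are hypothetical names introduced only for this demonstration.

```python
def count_rounds(arr: list[int], k: int, pick) -> int:
    """Count partition rounds until the k-th smallest is found, using pivot rule `pick`."""
    rounds = 0
    while True:
        rounds += 1
        v = pick(arr)
        left = [e for e in arr if e < v]
        n_equal = sum(1 for e in arr if e == v)
        if k <= len(left):
            arr = left                      # target is among the smaller elements
        elif k <= len(left) + n_equal:
            return rounds                   # target is the pivot itself
        else:
            k -= len(left) + n_equal        # target is among the larger elements
            arr = [e for e in arr if e > v]

values = list(range(1, 101))                # 100 distinct values
print(count_rounds(values, 100, min))       # smallest element as pivot: one round per element
print(count_rounds(values, 100, lambda a: sorted(a)[len(a) // 2]))  # median as pivot
```

With the smallest element as pivot, each round removes only one element, so the rounds cost $n + (n-1) + \dots + 1 = O(n^2)$ in total. With the median as pivot, the array roughly halves each round.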

More generally, we analyse the total time complexity:
$$
T(n) = T_{\text{find pivot}}(n) + T(|L| \text{ or } |R|) + \underbrace{O(n)}_{\text{divide the array}}
$$

The **worst-case time** is therefore:
$$
T_{worst}(n) = T_{\text{find pivot}}(n) + T(\max\{|L|, |R|\}) + O(n)
$$

Note that if the pivot is guaranteed to lie in a fixed-ratio middle section of the sorted order (between the $w$ and $1-w$ quantiles, with $0 < w \le \frac{1}{2}$), then $\max\{|L|, |R|\}\le (1-w)n$. We then have:
$$
T_{worst}(n) \le T_{\text{find pivot}}(n) + T((1-w)n) + O(n)
$$

If we can also ensure that $T_{\text{find pivot}}(n)=O(n)$, then the subproblem sizes shrink geometrically and we have:
$$
T_{worst}(n) \le T((1-w)n) + O(n) = O(n)
$$
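
To see why a recurrence of this shape is linear, write $\alpha < 1$ for the fraction of the array that can survive one round and $c$ for the constant hidden in the $O(n)$ term (both symbols are introduced here only for the derivation). Unrolling the recurrence gives a geometric series:
$$
\begin{aligned}
T(n) &\le T(\alpha n) + cn \\
     &\le T(\alpha^2 n) + c\alpha n + cn \\
     &\le \dots \le cn\sum_{i \ge 0} \alpha^i + O(1) = \frac{cn}{1-\alpha} + O(1) = O(n)
\end{aligned}
$$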


There are two ways to find the pivot:

## 1. Random choice
We choose a pivot at random. After dividing the array into $(L,M,R)$:
+ If $\max(|L|, |R|) > \frac{3n}{4}$, choose a new pivot and divide again.
+ Otherwise, continue the algorithm.

The probability that the pivot lies between the $0.25$ and $0.75$ quantiles of the array is about $\frac{1}{2}$, so on average we need only two attempts to find an acceptable pivot. Each attempt (choosing and dividing) takes $O(n)$ time, so the expected total time satisfies $T(n) \le T(\frac{3n}{4}) + O(n) = O(n)$.
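
The "two attempts on average" claim can be checked with a quick simulation. The sketch below assumes $n$ distinct values, so a pivot is fully described by its rank, and it counts draws until the rank falls in the middle half. The name `draws_until_balanced` and the fixed seed are assumptions made for this illustration.

```python
import random

def draws_until_balanced(n: int, rng: random.Random) -> int:
    """Draw pivot ranks uniformly until one lands in the middle half of n distinct values."""
    draws = 0
    while True:
        draws += 1
        rank = rng.randrange(n)            # number of elements smaller than the pivot
        if n // 4 <= rank < n - n // 4:    # middle half of the sorted order
            return draws

rng = random.Random(0)  # fixed seed, only to make the run repeatable
trials = 10_000
average = sum(draws_until_balanced(1000, rng) for _ in range(trials)) / trials
print(f"average draws: {average:.2f}")  # theory: 1 / (1/2) = 2 draws in expectation
```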

In [43]:
def find_kth_random_choose(
    arr: list[int],
    k: int,
    find_pivot: Callable[[list[int]], int] = random.choice) -> int:

    # redraw the pivot until neither side holds more than 3/4 of the array
    while True:
        pivot = find_pivot(arr)
        L, M, R = divide(arr, pivot)
        if max(len(L), len(R)) <= len(arr) * 0.75:
            break

    # recurse into this function so every level gets a balanced pivot
    if k <= len(L):
        return find_kth_random_choose(L, k, find_pivot)
    elif k <= len(L) + len(M):
        return pivot
    return find_kth_random_choose(R, k-len(L)-len(M), find_pivot)

print("The", f"{k}-th of the array is:", find_kth_random_choose(data, k))

The 5-th of the array is: 23


## 2. BFPRT (Median of medians)
This algorithm finds a median of medians quickly, in $O(n)$ time, and uses it as the pivot. The precise analysis is in another note (Data structure-sort-quicksort); here we only show example code:

In [44]:
def median_of_medians(arr: list[int]) -> int:
    def brute_find_median(arr: list[int]) -> int:
        sorted_arr = sorted(arr)
        return sorted_arr[len(arr)//2]
    if len(arr) <= 5:
        return brute_find_median(arr)
    # median of each group of five (the last group may be shorter)
    medians = [brute_find_median(arr[i:i + 5])
               for i in range(0, len(arr), 5)]
    return median_of_medians(medians)

print("The", f"{k}-th of the array is:", find_kth(data, k, find_pivot = median_of_medians))


The 5-th of the array is: 23
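
The function above approximates the median of the group medians by recursing on itself. In the textbook form of BFPRT, the pivot is the exact median of the group medians, found by calling the selection routine on the list of medians. A minimal sketch of that variant follows; `bfprt_select` is a hypothetical name, not one used elsewhere in this note.

```python
import random

def bfprt_select(arr: list[int], k: int) -> int:
    """Return the k-th smallest element of arr (k is 1-based)."""
    if len(arr) <= 5:
        return sorted(arr)[k - 1]
    # Sort each group of at most five elements and take its median.
    groups = [sorted(arr[i:i + 5]) for i in range(0, len(arr), 5)]
    medians = [g[len(g) // 2] for g in groups]
    # Exact median of the medians, found by the selection routine itself.
    pivot = bfprt_select(medians, (len(medians) + 1) // 2)
    left = [e for e in arr if e < pivot]
    right = [e for e in arr if e > pivot]
    n_equal = len(arr) - len(left) - len(right)
    if k <= len(left):
        return bfprt_select(left, k)
    if k <= len(left) + n_equal:
        return pivot
    return bfprt_select(right, k - len(left) - n_equal)

sample = [random.randint(1, 100) for _ in range(50)]
print(bfprt_select(sample, 5) == sorted(sample)[4])
```

With this pivot, the standard analysis bounds both $|L|$ and $|R|$ by a constant fraction of $n$, which is the condition needed for the $O(n)$ worst-case bound derived earlier.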
