# Order Statistics

<!--<img src="OrderStatistics.png" width=600>-->
The order statistics problem is to return the $k$-th smallest element in a list of $n$ numbers: for example, the smallest, the 2nd smallest, or the median (the $\lfloor (n+1)/2\rfloor$-th smallest element, also called the lower median).

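Order statistics has a straightforward $O(n\log n)$-time solution: sort the list and return the element at position $k$. It can also be solved in linear time under certain assumptions about the input (for example, integers drawn from a bounded range), using a linear-time sorting algorithm such as counting sort.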
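
As one illustration of such an assumption, suppose the inputs are integers known to lie in a small range $[0, m]$. A counting pass can then find the $k$-th smallest element in $O(n + m)$ time without any comparison sort. The sketch below is a minimal example under that assumption; the name `kth_smallest_counting` and the parameter `max_value` are illustrative and not part of the original notes.

```python
from typing import List

def kth_smallest_counting(values: List[int], k: int, max_value: int) -> int:
    """Return the k-th smallest (1-indexed) of integers assumed to lie in [0, max_value]."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1              # tally each value in O(n)
    seen = 0
    for value, count in enumerate(counts):
        seen += count               # number of elements <= value
        if seen >= k:
            return value
    raise AssertionError("unreachable for valid input")

print(kth_smallest_counting([12, 3, 3, 5, 7, 4, 19, 26, 26, 25, 28, 30], 4, 30))
```

The running time depends on `max_value`, so this only counts as linear when the range is $O(n)$.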

Can order statistics be done in linear time for any inputs?

Yes, in expectation. Use divide-and-conquer in the same way as quicksort, but after each partition recurse into only the one subarray that contains the $k$-th smallest element, instead of into both subarrays as quicksort does.

<!--<img src="RandomizedSelect.png" width=400>-->

It can be shown that this algorithm has an expected linear running time. 

In [7]:
# This function returns the k-th smallest element (1-indexed)
# in A[p..r] using a quicksort-style partition. A is reordered in place.
# Duplicates are allowed: partition() keeps elements <= pivot on the left.
import sys
 
def kthSmallest(A, p, r, k):
    # If k is smaller than number of elements in array
    if (k > 0 and k <= r - p + 1):
 
        # Partition the array around last
        # element and get position of pivot
        # element in sorted array
        q = partition(A, p, r)
 
        # If position is same as k
        if (q - p == k - 1):
            return A[q]
        if (q - p > k - 1):  # If position is more,
                              # recur for left subarray
            return kthSmallest(A, p, q - 1, k)
 
        # Else recur for right subarray
        return kthSmallest(A, q + 1, r,
                           k - q + p - 1)
 
    # If k is greater than the number of elements in array
    return sys.maxsize

def kthLargest(A, p, r, k):
    # The k-th largest of the m = r - p + 1 elements in A[p..r]
    # is the (m - k + 1)-th smallest, so delegate to kthSmallest.
    # Using the subarray length (not len(A)) keeps this correct
    # when A[p..r] is a proper subarray of A.
    # An out-of-range k yields sys.maxsize, as in kthSmallest.
    return kthSmallest(A, p, r, (r - p + 1) - k + 1)
 
# Standard partition process of QuickSort().
# It considers the last element as pivot and
# moves all smaller element to left of it
# and greater elements to right
 
def partition(A, p, r):
 
    pivot = A[r]
    i = p
    for j in range(p, r):
        if (A[j] <= pivot):
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[r] = A[r], A[i]
    return i
 
# Driver's Code
if __name__ == "__main__":
    A = [12, 3, 3, 5, 7, 4, 19, 26, 26, 25, 28, 30]
    n = len(A)
    k = 4
    print("The %i-th smallest element is" %k, kthSmallest(A, 0, n - 1, k))
    print("The median element is", kthSmallest(A, 0, n - 1, (len(A)+1)//2))
    print("The %i-th largest element is" %k, kthLargest(A, 0, n - 1, k))

The 4-th smallest element is 5
The median element is 12
The 4-th largest element is 26
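
Because `partition` always uses the last element as the pivot, an already sorted input makes `kthSmallest` shrink the subarray by only one element per call, which takes quadratic time. A common remedy is to swap a uniformly random element into the last position before partitioning. The following is a minimal sketch of that idea, written as a loop rather than a recursion; `randomized_select` is a hypothetical name and not part of the code above.

```python
import random
from typing import List

def randomized_select(A: List[int], k: int) -> int:
    """Return the k-th smallest (1-indexed) element of A; A is reordered in place."""
    if not 1 <= k <= len(A):
        raise ValueError("k must be between 1 and len(A)")
    p, r = 0, len(A) - 1
    target = k - 1                      # index of the answer in sorted order
    while True:
        # Move a uniformly random element to the end, then partition around it.
        s = random.randint(p, r)
        A[s], A[r] = A[r], A[s]
        pivot, i = A[r], p
        for j in range(p, r):
            if A[j] <= pivot:
                A[i], A[j] = A[j], A[i]
                i += 1
        A[i], A[r] = A[r], A[i]         # pivot lands at its sorted position i
        if i == target:
            return A[i]
        if i > target:
            r = i - 1                   # answer lies left of the pivot
        else:
            p = i + 1                   # answer lies right of the pivot

A = [12, 3, 3, 5, 7, 4, 19, 26, 26, 25, 28, 30]
print(randomized_select(A, 4))
```

With a random pivot, the expected running time no longer depends on the order of the input, although an array with many equal elements can still be slow with this partition scheme.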


# Complexity Analysis of kthSmallest()

Let $n$ be the number of elements in $A$ and let $T(n)$ be the random variable denoting the running time of kthSmallest$(A, 0, n-1, k_0)$ for a given value $k_0$. For $k = 1, \ldots, n$, let $X_k$ be the indicator random variable with $X_k = 1$ if, after partition, the subarray $A[p..q]$ contains exactly $k$ elements (that is, the pivot is the $k$-th smallest element), and $X_k = 0$ otherwise. Assume the input numbers are in uniformly random order, so that every element is equally likely to be selected as the pivot. (If you prefer to enforce this, you may randomly permute $A$ first, but that is unnecessary under our assumption.) Then
$p(k) = p(X_k = 1) = 1/n$ for any $k$. Note that $\sum_{k=1}^n X_k = 1$, since exactly one $X_k$ equals 1.
The recursive call is made on a subarray of at most $\max\{k-1, n-k\}$ elements and partitioning takes $O(n)$ time, so (taking $T$ to be monotonically increasing)
\begin{eqnarray*}
T(n) &\leq& \sum_{k=1}^n X_k \cdot(T(\max\{k-1, n-k\})+O(n)) \\ 
&=& \sum_{k=1}^n X_k T(\max\{k-1, n-k\})+O(n)\sum_{k=1}^n X_k \\
&=& \sum_{k=1}^n X_k T(\max\{k-1, n-k\})+O(n).
\end{eqnarray*}
Note that $X_k$ and $T(\max\{k-1, n-k\})$ are independent because the running time on a subarray of a given size is the same regardless of which pivot was chosen in the current call. Thus, by linearity of expectation,
\begin{eqnarray*}
E[T(n)] &\leq& E\left[\sum_{k=1}^n X_k T(\max\{k-1, n-k\})+O(n)\right] \\
&=& \sum_{k=1}^n E[X_k]E[T(\max\{k-1, n-k\})] + O(n) \\
&=& \frac{1}{n}\sum_{k=1}^n E[T(\max\{k-1, n-k\})] + O(n).
\end{eqnarray*}
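Note that
$\max\{k-1,n-k\} = k-1$ if $k > \lceil n/2\rceil$ and $n-k$ otherwise.
Also note that, in the sum over $k$, each $T(k)$ for $k$ from $\lceil n/2\rceil$ to $n-1$ appears exactly twice; if $n$ is odd, $T(\lfloor n/2\rfloor)$ appears once in addition. Recall that $E[X_k] = p(k) = 1/n$. We now show $E[T(n)] \leq cn$ by substitution: assume $E[T(k)] \leq ck$ for all $k < n$, and let $an$ bound the $O(n)$ term. Thus,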
\begin{eqnarray*}
E[T(n)] &\leq& \frac{2}{n}\sum_{k=\lfloor n/2\rfloor}^{n-1} E[T(k)] +O(n) \\
&\leq& 2\cdot\sum_{k=\lfloor n/2\rfloor}^{n-1} ck\cdot p(k) +an \mbox{ for some positive $a$ and $c > 4a$} \\
&=& \frac{2c}{n}\left(\sum_{k=1}^{n-1} k - \sum_{k=1}^{\lfloor n/2\rfloor-1} k\right) + an \\
&=& \frac{2c}{n}\cdot \left(\frac{n(n-1)}{2} - \frac{(\lfloor n/2\rfloor-1)\lfloor n/2\rfloor}{2}\right) + an \\
&\leq& \frac{2c}{n}\cdot \left(\frac{n(n-1)}{2} - \frac{(n/2-2)(n/2-1)}{2}\right) + an  \\
&=& c \left(\frac{3n}{4}+\frac{1}{2} - \frac{2}{n}\right) + an \\
&<& \frac{3cn}{4}+\frac{c}{2} + an \\
&=& cn -\left(\frac{cn}{4} - \frac{c}{2} - an\right) \\
&\leq& cn~ \mbox{  (when $n \geq 2c/(c-4a)$)}.
\end{eqnarray*}
Thus, $E[T(n)] = O(n)$.
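
As a numerical sanity check on this bound, one can solve the recurrence $T(n) = \frac{2}{n}\sum_{k=\lfloor n/2\rfloor}^{n-1} T(k) + an$ exactly, treating the inequality as an equality, and look at the ratio $T(n)/n$. The sketch below does this with prefix sums. The choices $a = 1$ and $T(0) = 0$ are assumptions made only for illustration, and `recurrence_bound` is a hypothetical helper name.

```python
from typing import List

def recurrence_bound(n_max: int, a: float = 1.0) -> List[float]:
    """Solve T(n) = (2/n) * sum_{k=floor(n/2)}^{n-1} T(k) + a*n, with T(0) = 0."""
    T = [0.0] * (n_max + 1)
    prefix = [0.0] * (n_max + 1)        # prefix[m] = T[0] + ... + T[m-1]
    for n in range(1, n_max + 1):
        prefix[n] = prefix[n - 1] + T[n - 1]
        tail = prefix[n] - prefix[n // 2]   # T[floor(n/2)] + ... + T[n-1]
        T[n] = 2.0 * tail / n + a * n
    return T

T = recurrence_bound(10_000)
for n in (10, 100, 1_000, 10_000):
    print(n, round(T[n] / n, 3))
```

Under these assumptions the ratio $T(n)/n$ never exceeds $4a$, which matches the requirement $c > 4a$ in the derivation.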

In [1]:
"""
Find the kth smallest element in expected linear time using divide and conquer similar to quicksort.
Duplicates are allowed: elements equal to the pivot are counted rather than discarded.
"""
from random import choice
from typing import List

def random_pivot(A: List[int]) -> int:
    """
    Choose a random pivot from A. A more sophisticated rule such as the median of medians may be used here.
    """
    return choice(A)

def kth_order(A: List[int], k: int) -> int:
    # k is 1-indexed and must satisfy 1 <= k <= len(A).
    if not 1 <= k <= len(A):
        raise ValueError("k must be between 1 and len(A)")

    # Pick a pivot and split the list around it.
    pivot = random_pivot(A)

    # Partition based on the pivot; count ties so duplicates are not lost.
    small = [e for e in A if e < pivot]
    big = [e for e in A if e > pivot]
    equal = len(A) - len(small) - len(big)

    if k <= len(small):
        # The target is among the elements smaller than the pivot.
        return kth_order(small, k)
    elif k <= len(small) + equal:
        # If we get lucky, the pivot is the element we want.
        return pivot
    else:
        # The target is among the elements bigger than the pivot.
        return kth_order(big, k - len(small) - equal)
    
print(kth_order([2, 1, 3, 4, 5], 3))
print(kth_order([2, 1, 3, 4, 5], 5))
print(kth_order([25, 21, 98, 100, 76, 22, 43, 60, 89, 87], 4))
print(kth_order([2, 1, 3, 4, 5, 1], 1))
print(kth_order([3, 2, 5, 6, 7, 8], 2))

3
5
43
1
3
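
The expected linear running time can also be probed empirically by counting how many elements each partitioning pass scans. The sketch below is one way to do that for a list-based selection with a random pivot. It assumes distinct inputs, and `select_with_count`, the seed, and the input sizes are arbitrary choices for illustration.

```python
import random
from typing import List, Tuple

def select_with_count(A: List[int], k: int, rng: random.Random) -> Tuple[int, int]:
    """Return (k-th smallest of A, number of elements scanned); A must be distinct."""
    scanned = 0
    while True:
        pivot = rng.choice(A)
        scanned += len(A)               # one partitioning pass over A
        small = [e for e in A if e < pivot]
        if len(small) == k - 1:
            return pivot, scanned
        if len(small) >= k:
            A = small                   # target is among the smaller elements
        else:
            A = [e for e in A if e > pivot]
            k -= len(small) + 1         # skip the smaller elements and the pivot

rng = random.Random(0)
trials = 20
for n in (1_000, 4_000, 16_000):
    data = rng.sample(range(10 * n), n)     # n distinct integers
    total = sum(select_with_count(data, n // 2, rng)[1] for _ in range(trials))
    print(n, round(total / (trials * n), 2))
```

If the running time is linear in expectation, the printed ratio of scanned elements to $n$ should stay roughly constant as $n$ grows instead of increasing like $\log n$.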


# Order Statistics in Worst-Case Linear Time

We can go further and obtain an algorithm that computes order statistics in worst-case linear time by applying divide-and-conquer in a more sophisticated way. This is mainly of theoretical interest because, in practice, the algorithm based on the quicksort partition is sufficient, so we omit it here. If you would like the theoretical challenge of working it out, please refer to Section 9.3 for details.
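
For readers who want a rough starting point before consulting the reference, the usual approach is the "median of medians" rule: split the input into groups of five, take the median of each group, recursively select the median of those medians, and use it as the pivot. The sketch below follows that standard idea under the assumption that copying sublists is acceptable. `mom_select` is a hypothetical name, and the details may differ from the formulation in the referenced section.

```python
from typing import List

def mom_select(A: List[int], k: int) -> int:
    """Return the k-th smallest (1-indexed) element of A using a median-of-medians pivot."""
    if not 1 <= k <= len(A):
        raise ValueError("k must be between 1 and len(A)")
    if len(A) <= 5:
        return sorted(A)[k - 1]         # constant-size base case
    # Median of each group of at most 5 elements.
    groups = [sorted(A[i:i + 5]) for i in range(0, len(A), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Recursively pick the median of the medians as the pivot.
    pivot = mom_select(medians, (len(medians) + 1) // 2)
    small = [e for e in A if e < pivot]
    equal = sum(1 for e in A if e == pivot)
    if k <= len(small):
        return mom_select(small, k)
    if k <= len(small) + equal:
        return pivot
    big = [e for e in A if e > pivot]
    return mom_select(big, k - len(small) - equal)

print(mom_select([25, 21, 98, 100, 76, 22, 43, 60, 89, 87], 4))
```

The pivot chosen this way is guaranteed to discard a constant fraction of the elements in every call, which is what removes the dependence on luck; the price is a larger constant factor than the randomized version.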