# Chapter 09: Fast Sorting and Selection

It is possible to sort n elements in $O(n)$ time, provided the sort keys are integers drawn from a reasonably small range.

## Bucket Sort and Radix Sort

### Bucket Sort

Instead of comparisons, bucket sort uses keys as indices into a bucket array B. An item with key k is placed in B[k], which is itself a list of the items with key k. After each item of the input sequence S has been inserted into its bucket, the bucket contents are enumerated in order (B[0], B[1], ..., B[N-1]). This algorithm is as follows:

![bucket-sort](./res/09-bucket-sort.PNG)

Bucket sort is a **stable** algorithm, meaning that the relative order of items with equal keys is maintained. Bucket sort achieves stability by always removing the first element from S and from each list B[i] during the execution of the algorithm. Bucket sort has a run time of $O(n+N)$, with n being the number of items and N being the number of possible key values (keys are integers in the range $[0, N-1]$).
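
The description above can be sketched in Python as follows. This is a minimal illustration rather than the pseudocode from the figure: the function name `bucket_sort` and the representation of items as `(key, value)` pairs are assumptions made for the example.

```python
def bucket_sort(items, N):
    """Sort (key, value) pairs whose integer keys lie in [0, N-1].

    Hypothetical helper for illustration; runs in O(n + N) time.
    """
    buckets = [[] for _ in range(N)]       # B[0..N-1], one list per key
    for key, value in items:               # scan S from front to back
        buckets[key].append((key, value))  # append to the end of B[key]
    result = []
    for bucket in buckets:                 # enumerate B[0], B[1], ..., B[N-1]
        result.extend(bucket)              # items leave in arrival order
    return result
```

Because each bucket is filled and emptied in arrival order, items with equal keys keep their original relative order, which is the stability property described above.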

### Radix Sort

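Radix sort applies bucket sort twice on a sequence of pairs: first using the second component of the pair as the ordering key and then using the first component. Because bucket sort is stable, the second pass preserves the order that the first pass established among pairs with equal first components. The result follows the **lexicographical** convention, where $(k_1,l_1) < (k_2,l_2)$ if: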

- $k_1 < k_2$ or
- $k_1 = k_2$ and $l_1 < l_2$
  
More generally, let S be a sequence of n key-element items, each of which has a key $(k_1,k_2,...,k_d)$ where $k_i$ is an integer in the range $[0, N-1]$ for some integer $N \geq 2$. S may be sorted lexicographically using radix sort in time $O(d(n + N))$, by applying a stable bucket sort d times, from the last component to the first.
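
A minimal Python sketch of radix sort on d-tuple keys, assuming items are stored as `(key, value)` pairs whose `key` is a tuple of d integers in $[0, N-1]$. The function name `radix_sort` and this data layout are illustrative assumptions, not taken from the source.

```python
def radix_sort(items, d, N):
    """Sort (key, value) pairs lexicographically by key.

    Each key is assumed to be a d-tuple of integers in [0, N-1].
    Runs d stable bucket-sort passes, so the total time is O(d(n + N)).
    """
    for i in reversed(range(d)):            # last component first
        buckets = [[] for _ in range(N)]
        for item in items:
            buckets[item[0][i]].append(item)
        # Concatenating buckets in index order is a stable bucket sort
        # on component i.
        items = [item for bucket in buckets for item in bucket]
    return items
```

Sorting on the least significant component first matters: each later pass is stable, so it only reorders items whose more significant components differ.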

## Selection

**Median**: element such that half of the other elements are smaller and the remaining half are larger

**Order statistics**: queries that ask for an element with a given rank

**Selection problem**: selecting the kth smallest element from an unsorted collection of n comparable elements. It can be solved in $O(n)$ time using the **prune-and-search** methodology, which discards a fraction of the input at each step and recurses on what remains.

### Randomized Quick Select

Randomized quick select solves the selection problem in $O(n)$ expected time and $O(n^2)$ worst case time. An element x from S is picked at random and is used as a pivot to subdivide S into three sub-sequences: L (elements less than x), E (elements equal to x), and G (elements greater than x). The value of k then determines how to continue: if $k \leq |L|$, recurse on L; if $k \leq |L| + |E|$, the answer is x; otherwise, recurse on G with k reduced by $|L| + |E|$.

![quick-select](./res/09-quick-select.PNG)

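**Linearity of expectation** is used to justify quick select's $O(n)$ expected running time: a random pivot leaves at most $3n/4$ elements with probability at least 1/2, so each size reduction needs an expected two pivot choices, and the expected costs of the successive levels sum to a geometric series.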
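
A minimal Python sketch of randomized quick select. It uses a loop instead of recursion, which is a choice made for this example, and it assumes k is 1-based (k = 1 selects the minimum). The name `quick_select` is illustrative.

```python
import random


def quick_select(S, k):
    """Return the k-th smallest element of the list S (k = 1 is the minimum)."""
    if not 1 <= k <= len(S):
        raise ValueError("k must satisfy 1 <= k <= len(S)")
    while True:
        if len(S) == 1:
            return S[0]
        x = random.choice(S)             # random pivot
        L = [e for e in S if e < x]      # elements less than the pivot
        E = [e for e in S if e == x]     # elements equal to the pivot
        G = [e for e in S if e > x]      # elements greater than the pivot
        if k <= len(L):
            S = L                        # answer lies in L, same rank
        elif k <= len(L) + len(E):
            return x                     # the pivot itself has rank k
        else:
            k -= len(L) + len(E)         # skip everything in L and E
            S = G
```

Since E always contains the pivot, both L and G are strictly smaller than S, so the loop terminates regardless of which pivots are drawn.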

### Deterministic Selection

This is a modification of quick-select to make it deterministic yet still run in $O(n)$ time. It is based on the following approach:

- partition the set S into $\lceil n/5 \rceil$ groups of size 5 each (except possibly one smaller group)
- sort each group and identify its median element
- apply the algorithm recursively on these $\lceil n/5 \rceil$ "baby medians" to find their median
- use this element as the pivot and proceed as in quick select

This yields the following algorithm:

![deterministic-selection](./res/09-deterministic-selection.PNG)

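Unlike quick select, which only picks a good pivot with some probability, deterministic selection guarantees one: the median of the baby medians is greater than or equal to roughly $3n/10$ of the elements and less than or equal to roughly $3n/10$ of them, so the recursive call after partitioning receives at most about $7n/10$ elements. Ignoring rounding, the running time satisfies $T(n) \leq T(n/5) + T(7n/10) + O(n)$, which solves to $O(n)$ because $1/5 + 7/10 < 1$.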
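
The steps listed above can be sketched in Python as follows. This is an illustrative version, not a transcription of the figure: the name `deterministic_select`, the 1-based k, and the choice to sort directly once at most 5 elements remain are assumptions of the example.

```python
def deterministic_select(S, k):
    """Return the k-th smallest element of the list S (k = 1 is the minimum)."""
    if not 1 <= k <= len(S):
        raise ValueError("k must satisfy 1 <= k <= len(S)")
    if len(S) <= 5:                      # small input: sort directly
        return sorted(S)[k - 1]
    # Split into groups of 5 (the last group may be smaller).
    groups = [S[i:i + 5] for i in range(0, len(S), 5)]
    # Sorting a constant-size group takes constant time.
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Recursively find the median of the "baby medians" and use it as pivot.
    x = deterministic_select(medians, (len(medians) + 1) // 2)
    L = [e for e in S if e < x]
    E = [e for e in S if e == x]
    G = [e for e in S if e > x]
    if k <= len(L):
        return deterministic_select(L, k)
    if k <= len(L) + len(E):
        return x
    return deterministic_select(G, k - len(L) - len(E))
```

The only difference from quick select is how the pivot x is chosen; the partition into L, E, and G and the decision based on k are the same.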

## Weighted Medians

A weighted median is the median of a set of weighted elements taken with respect to their weights rather than their count: with W denoting the total weight, it is an element y that splits the total weight roughly in half. The prune-and-search technique is used to efficiently solve the weighted median problem. The algorithm uses deterministic selection to find the unweighted median, y. Then the total weight of the elements less than y and the total weight of the elements less than or equal to y are computed, counting the weight of any elements already discarded below the current range. If the first of these weights is greater than W/2, then y is too large, meaning that the elements less than y should be recursed over. If the second weight is less than W/2, then y is too small, meaning recursion on the elements greater than y. Otherwise, y is the weighted median. This is demonstrated in the following pseudocode:

![prune-median](./res/09-prune-median.PNG)

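The above algorithm runs in $O(n)$ time, since each round does work linear in the number of remaining elements and then discards at least half of them.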
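
A minimal Python sketch of the prune-and-search idea for weighted medians, under stated assumptions. Items are `(value, weight)` pairs with positive weights, and the function returns a value y such that the elements less than y weigh at most W/2 and the elements less than or equal to y weigh at least W/2. This definition and the name `weighted_median` are assumptions of the example. To keep the sketch short, the unweighted median is found by sorting, which is a stand-in for linear-time deterministic selection, so this version runs in $O(n \log n)$ rather than $O(n)$.

```python
def weighted_median(items):
    """Return a weighted median of (value, weight) pairs with positive weights."""
    if not items:
        raise ValueError("items must be non-empty")
    half = sum(w for _, w in items) / 2   # W/2 for the full input, never changes
    below = 0                             # weight already pruned from the low side
    S = list(items)
    while True:
        # Stand-in: sorting replaces linear-time deterministic selection.
        values = sorted(v for v, _ in S)
        y = values[(len(values) - 1) // 2]            # unweighted median of S
        w_less = below + sum(w for v, w in S if v < y)
        w_less_eq = below + sum(w for v, w in S if v <= y)
        if w_less > half:
            S = [(v, w) for v, w in S if v < y]       # y is too large
        elif w_less_eq < half:
            below = w_less_eq                         # remember pruned weight
            S = [(v, w) for v, w in S if v > y]       # y is too small
        else:
            return y
```

The `below` accumulator keeps the comparisons relative to the original total weight after the low side has been pruned; without it, later rounds would compare against the wrong threshold.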