## 14.7 Quicksort variants

This section presents two variations on quicksort to further reinforce
the divide-and-conquer approach and its relation to decrease and conquer.

### 14.7.1 Three-way quicksort

Divide and conquer doesn't have to be in halves.
We can partition the _unsorted_ sequence in three,
with the items smaller than, equal to and larger than the pivot.
Items with the same key as the pivot don't have to be further sorted.

The main quicksort algorithm stays the same, because it already
divides the input in three sequences and recurs into two of them.

1. if _n_ < 2:
   1. let _sorted_ be _unsorted_
1. otherwise:
   1. let (_smaller_, _pivot_, _larger_) be partition(_unsorted_, _key_)
   1. let _sorted_ be quicksort(_smaller_, _key_) concatenated with _pivot_ and quicksort(_larger_, _key_)

The partition function does change slightly:
the middle sequence is no longer one item (the pivot),
but is all items with the same key as the pivot.

I take the opportunity to choose a random pivot to reduce the chance of
quadratic complexity for already sorted inputs.

1. let _smaller_ be the empty sequence
1. let _equal_ be the empty sequence
1. let _larger_ be the empty sequence
1. let _pivot_ be a random element of _unsorted_
1. for each _item_ in _unsorted_:
   1. if _key_(_item_) < _key_(_pivot_):
      1. append _item_ to _smaller_
   1. otherwise if _key_(_item_) = _key_(_pivot_):
      1. append _item_ to _equal_
   1. otherwise:
      1. append _item_ to _larger_
1. let _output_ be (_smaller_, _equal_, _larger_)
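The steps above translate almost line by line into Python. The following is a minimal sketch, not a definitive implementation: the names `partition3` and `identity` are my own, and the function assumes a non-empty input and a `key` function that returns comparable values.

```python
import random


def partition3(unsorted: list, key) -> tuple:
    """Split a non-empty list around the key of a randomly chosen pivot.

    Return the items with smaller, equal and larger keys, in that order.
    """
    smaller = []
    equal = []
    larger = []
    pivot = random.choice(unsorted)  # assumes unsorted isn't empty
    pivot_key = key(pivot)           # compute the pivot's key only once
    for item in unsorted:
        item_key = key(item)
        if item_key < pivot_key:
            smaller.append(item)
        elif item_key == pivot_key:
            equal.append(item)
        else:
            larger.append(item)
    return (smaller, equal, larger)


def identity(item):
    """Use each item as its own key."""
    return item


# The result depends on which pivot is randomly chosen.
print(partition3([3, 1, 3, 2, 5, 3], identity))
```

Whatever pivot is chosen, the middle list is never empty, because it contains at least the pivot itself.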

Is this version stable?

___

Yes, it's stable. The partition function copies items to each of the three
sequences in their original order, so items with the same key always
end up in the same sequence without swapping their relative positions.
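Stability can also be checked empirically. Here is a compact sketch of three-way quicksort, assuming the input is a Python list. The names `quicksort3` and `first` are my own, and for brevity the partitioning uses three list comprehensions, which call `key` more often than a single loop would.

```python
import random


def quicksort3(unsorted: list, key) -> list:
    """Return the items of unsorted in ascending order of key."""
    if len(unsorted) < 2:
        return unsorted
    pivot_key = key(random.choice(unsorted))
    # each comprehension keeps the items in their original order
    smaller = [item for item in unsorted if key(item) < pivot_key]
    equal = [item for item in unsorted if key(item) == pivot_key]
    larger = [item for item in unsorted if key(item) > pivot_key]
    return quicksort3(smaller, key) + equal + quicksort3(larger, key)


def first(pair: tuple):
    """Sort pairs by their first element only."""
    return pair[0]


pairs = [(2, "a"), (1, "b"), (2, "c"), (1, "d"), (3, "e"), (2, "f")]
print(quicksort3(pairs, first))
# [(1, 'b'), (1, 'd'), (2, 'a'), (2, 'c'), (2, 'f'), (3, 'e')]
```

The letters of pairs with the same number appear in the output in the same order as in the input, whichever pivots are chosen.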

#### Exercise 14.7.1

If all items in the input sequence have the same key, what's the complexity of

- 'normal' quicksort, i.e. without steps 5.2 and 5.2.1?
- three-way quicksort?

_Write your answer here._

[Hint](../31_Hints/Hints_14_7_01.ipynb)
[Answer](../32_Answers/Answers_14_7_01.ipynb)

Three-way quicksort still has quadratic worst-case complexity
if each chosen pivot has the lowest or highest key. However, it's unlikely that
every recursive call will randomly choose the worst possible pivot.

A sorted and a reverse-sorted input are no longer worst-case scenarios:
both are sorted in log-linear time on average, due to the random pivot choice.

Three-way quicksort isn't adaptive either.
The partition sizes and therefore the number of recursive calls
depend on where the pivot is in the sorted output and
how many items have the same key as the pivot,
not on whether the input is partially sorted.

### 14.7.2 Quickselect

Next I'm going to show a decrease-and-conquer adaptation of quicksort to
solve a different problem.

Consider the **selection problem**: find the _k_-th smallest item in a non-empty
unsorted sequence, with 0 < _k_ ≤ _n_. For example,
if _k_ = 1 then we're looking for the minimum and
if _k_ = _n_ then we're looking for the maximum.

If we know that there will be many queries on the same sequence,
then it's best to sort it once and return the _k_-th item for each query.
Let's assume we don't know that and thus must solve the selection problem
without sorting.
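For reference, the sort-based approach is a one-liner in Python. This sketch uses my own name `select_by_sorting` and is only meant as a baseline against which to test other selection algorithms.

```python
def select_by_sorting(unsorted: list, key, k: int):
    """Return the k-th smallest item, assuming 0 < k <= len(unsorted)."""
    return sorted(unsorted, key=key)[k - 1]  # k counts from 1, indices from 0


def identity(item):
    """Use each item as its own key."""
    return item


numbers = [31, 4, 15, 9, 26]
print(select_by_sorting(numbers, identity, 1))  # 4, the minimum
print(select_by_sorting(numbers, identity, 3))  # 15
print(select_by_sorting(numbers, identity, 5))  # 31, the maximum
```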

The **quickselect** algorithm adapts two-way quicksort. It only recursively
searches the partition that includes the sought item,
discarding the other partition. How does it know where the item is?

Well, if partition _smaller_ has _k_ − 1 items, then the pivot, which is
the next larger item, is the _k_-th smallest item. This is a base case:
the algorithm returns the pivot without recurring into either partition.

If partition _smaller_ has _k_ or more items, then the _k_-th smallest
must be there, so the algorithm recurs into it and ignores partition _larger_.

Finally, if partition _smaller_ has fewer than _k_ − 1 items, the sought item is
in the other partition. But it's not the _k_-th smallest item of that partition.
Let's suppose we're looking for the 17th smallest item among 20 items and that
partition _smaller_ has 14 items. Together with the pivot, we can discard
15 items. The sought item is thus the second smallest in partition _larger_.
More generally, if _smaller_ has _s_ items,
we search for the (_k_ − _s_ − 1)-th smallest item in _larger_.
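The arithmetic of the example can be checked with a few lines of Python. This is only a check of the third case, not the quickselect algorithm: the 20 items and the pivot are made up so that 14 items are smaller than the pivot, and sorting is used just to confirm the result.

```python
items = list(range(119, 99, -1))  # 20 distinct items: 119, 118, ..., 100
pivot = 114                       # picked by hand: 100 to 113 are smaller
k = 17

smaller = [item for item in items if item < pivot]
larger = [item for item in items if item > pivot]
s = len(smaller)                  # 14 items

new_k = k - s - 1                 # discard the smaller items and the pivot
kth_overall = sorted(items)[k - 1]
kth_in_larger = sorted(larger)[new_k - 1]
print(s, new_k, kth_overall, kth_in_larger)  # 14 2 116 116
```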

It has been proven that on average quickselect has linear complexity.
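Here is a rough intuition, not a proof. Assume, optimistically, that every pivot splits the sequence into two halves. Each call then partitions about half as many items as the previous call, so the total partitioning work is proportional to

$$n + \frac{n}{2} + \frac{n}{4} + \cdots + 1 < 2n$$

which is linear in _n_. The actual average-case analysis must also account for uneven splits.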

#### Exercise 14.7.2

What kind of decrease and conquer is quickselect?

_Write your answer here._

[Hint](../31_Hints/Hints_14_7_02.ipynb)
[Answer](../32_Answers/Answers_14_7_02.ipynb)

#### Exercise 14.7.3

Here again is the quicksort algorithm.
The _pivot_ returned by the auxiliary function is a single-item sequence.

1. if _n_ < 2:
   1. let _sorted_ be _unsorted_
1. otherwise:
   1. let (_smaller_, _pivot_, _larger_) be partition(_unsorted_, _key_)
   1. let _sorted_ be quicksort(_smaller_, _key_) concatenated with _pivot_ and quicksort(_larger_, _key_)

Modify the above to become the quickselect algorithm.
You can assume the function call is quickselect(_unsorted_, _key_, _k_)
with a non-empty _unsorted_ sequence and 0 < _k_ ≤ _n_.

[Hint](../31_Hints/Hints_14_7_03.ipynb)
[Answer](../32_Answers/Answers_14_7_03.ipynb)
