## 14.6 Quicksort

Like merge sort, **quicksort** divides the input sequence into two partitions,
recursively sorts each partition and then puts them together.
Whereas merge sort divides the sequence by position, into two halves,
quicksort divides the items by key: those with smaller keys go into
one partition, those with larger keys go in the other.
After the partitions are sorted,
they just have to be concatenated, which is much simpler than merging.

Insertion sort did less work than selection sort when splitting the input but
more when combining the subsolution with the removed item.
Likewise, merge sort does less work than quicksort when splitting but
more work when combining the subsolutions.

<div class="alert alert-warning">
<strong>Note:</strong> You can design different decrease-and-conquer or divide-and-conquer algorithms
by making different phases of the approach simpler.
</div>

### 14.6.1 Algorithm

Quicksort starts by choosing one item as the **pivot**,
then splits the other items into those smaller and those larger than the pivot.
After each partition is sorted, they're put together:
first the smaller items, then the pivot and finally the larger items.

Here's how quicksort processes our example, with the pivot
being the first item:

![This diagram shows seven rows of sequences of letters.
When a sequence is split, it is connected with lines to
the two resulting unsorted sequences and the pivot in the next row.
When two sorted sequences and the pivot are concatenated,
they are connected with straight lines to
the resulting sequence in the next row.
The first row shows sequence SORTING.
The second row shows it has been split into ORING and T, with pivot S.
The third row shows that ORING is split into ING and R, with pivot O.
The fourth row shows that ING is split into G and N, with pivot I.
The fifth row concatenates G and N around pivot I to obtain GIN.
The sixth row concatenates GIN and R around pivot O to obtain GINOR.
The seventh row concatenates GINOR and T around pivot S to obtain GINORST.
](14_6_quicksort.png)

In the first step, the pivot is S: the letters that come before S are
ORING and the only letter that comes after it is T. After ORING is sorted into GINOR,
it is concatenated with S and T, in that order, resulting in GINORST.
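The first step of the example can be reproduced in a few lines of Python. This is only a sketch: it assumes each letter is its own key and, purely as a stand-in for the recursive calls, uses Python's built-in `sorted` to sort the partitions.

```python
word = "SORTING"
pivot = word[0]  # the first item is the pivot: 'S'
# letters whose key is smaller than the pivot's key go into the smaller partition
smaller = [letter for letter in word[1:] if letter < pivot]
# all other letters (here only T) go into the larger partition
larger = [letter for letter in word[1:] if not letter < pivot]
print("".join(smaller), pivot, "".join(larger))  # ORING S T

# stand-in for the recursive calls: sorted() replaces quicksort here
result = "".join(sorted(smaller)) + pivot + "".join(sorted(larger))
print(result)  # GINORST
```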

Here's a recursive quicksort algorithm.
It uses an auxiliary function that partitions the unsorted sequence into
three sequences: the smaller items, a sequence of length&nbsp;1 with the pivot and
the sequence of larger items.

1. if _n_ < 2:
   1. let _sorted_ be _unsorted_
1. otherwise:
   1. let (_smaller_, _pivot_, _larger_) be partition(_unsorted_, _key_)
   1. let _sorted_ be quicksort(_smaller_, _key_) concatenated with _pivot_ and quicksort(_larger_, _key_)

Step&nbsp;2.1 abuses the assignment notation, assigning three variables at once,
to make the algorithm more readable.

The partition function chooses the first item as the pivot and adds each of the
other items to one of the two partitions, depending on how its key compares to the pivot's key.

1. let _smaller_ be ()
2. let _larger_ be ()
3. let _pivot_ be _unsorted_[0]
4. for each _index_ from 1 to _n_ − 1:
   1. let _item_ be _unsorted_[*index*]
   1. if _key_(_item_) < _key_(_pivot_):
      1. append _item_ to _smaller_
   2. otherwise:
      1. append _item_ to _larger_
5. let _output_ be (_smaller_, (_pivot_), _larger_)

Note that the final step doesn't return the pivot itself but a sequence containing only the pivot,
so that the concatenation operation can be applied in step&nbsp;2.2 of quicksort.
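A direct Python transcription of the two algorithms might look as follows. It is a sketch, not the implementation given later in this section: the names `partition` and `quicksort` simply mirror the pseudocode, and passing _key_ to `partition` is an assumption, because the partition algorithm compares keys.

```python
from typing import Callable


def partition(unsorted: list, key: Callable) -> tuple:
    """Return (smaller, [pivot], larger), using the first item as the pivot."""
    smaller = []
    larger = []
    pivot = unsorted[0]
    for index in range(1, len(unsorted)):
        item = unsorted[index]
        if key(item) < key(pivot):
            smaller.append(item)
        else:
            larger.append(item)
    # the pivot is returned inside a list so that it can be concatenated
    return (smaller, [pivot], larger)


def quicksort(unsorted: list, key: Callable) -> list:
    """Return a list with the items of unsorted in non-decreasing key order."""
    if len(unsorted) < 2:
        return unsorted
    smaller, pivot, larger = partition(unsorted, key)
    return quicksort(smaller, key) + pivot + quicksort(larger, key)


print(quicksort(list("SORTING"), lambda letter: letter))
# ['G', 'I', 'N', 'O', 'R', 'S', 'T']
```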

#### Exercise 14.6.1

Is the algorithm stable?

_Write your answer here._

[Hint](../31_Hints/Hints_14_6_01.ipynb)
[Answer](../32_Answers/Answers_14_6_01.ipynb)

### 14.6.2 Complexity

Each recursive call goes through its input sequence twice:
first to partition it and then to concatenate the partitions and the pivot.
The recursive complexity definition is:

- if _n_ < 2: T(_n_) = Θ(1)
- if _n_ ≥ 2: T(_n_) = Θ(_n_) + T(│*smaller*│) + T(│*larger*│) + Θ(_n_)
  = T(│*smaller*│) + T(│*larger*│) + Θ(_n_).

If the input is sorted, then the pivot (the first item)
has the smallest key, so all other items are put in partition _larger_ and
partition _smaller_ is empty. The recurrence relation becomes:

T(_n_) = T(0) + T(_n_ − 1) + Θ(_n_) = Θ(1) + T(_n_ − 1) + Θ(_n_) = T(_n_ − 1) + Θ(_n_).

We've seen before that this resolves to T(_n_) = Θ(*n*²).
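To see the quadratic growth concretely, the recurrence can be unrolled numerically. This sketch assumes, for illustration only, that the Θ(_n_) term costs exactly _n_ steps and that T(1) = 1, so that T(_n_) = _n_ + (_n_ − 1) + … + 1 = _n_(_n_ + 1) / 2.

```python
def cost_sorted_input(n: int) -> int:
    """Unroll T(n) = T(n - 1) + n with T(1) = 1, iteratively."""
    total = 1  # T(1)
    for size in range(2, n + 1):
        total = total + size  # T(size) = T(size - 1) + size
    return total


for n in (100, 200, 400):
    # the unrolled cost and the closed formula n(n + 1) / 2 agree
    print(n, cost_sorted_input(n), n * (n + 1) // 2)
```

The costs are 5050, 20100 and 80200: each doubling of _n_ roughly quadruples the cost, as expected of a quadratic function.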

In the best case, the pivot's key is the median of the keys, so quicksort halves
the sequence, like merge sort. The recurrence becomes:

T(_n_) = T(_n_ / 2) + T(_n_ / 2) + Θ(_n_) = 2×T(_n_ / 2) + Θ(_n_).

This resolves to T(_n_) = Θ(_n_ log _n_).
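The best-case recurrence can be unrolled in the same way. This sketch assumes the Θ(_n_) term costs exactly _n_ steps, that T(1) = 0, and that _n_ is a power of 2, so that halving is always exact; under those assumptions T(_n_) = _n_ × log₂ _n_.

```python
def cost_balanced(n: int) -> int:
    """Unroll T(n) = 2 * T(n / 2) + n with T(1) = 0, for n a power of 2."""
    if n < 2:
        return 0
    return 2 * cost_balanced(n // 2) + n


for k in range(1, 6):
    n = 2**k
    print(n, cost_balanced(n), n * k)  # n * k is n × log2(n)
```

For _n_ = 2, 4, 8, 16 and 32 both columns show 2, 8, 24, 64 and 160.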

It has been proven that the average complexity of quicksort,
when items are in random order, is also log-linear.
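One way to observe this without timing is to count comparisons. The sketch below counts how many key comparisons the partition algorithm makes, one per non-pivot item in each call, when sorting an ascending sequence and a shuffled one. The function name and the seed are illustrative choices, and each item is its own key.

```python
import random


def count_comparisons(items: list) -> int:
    """Return how many key comparisons quicksort's partitioning makes on items.

    The first item is the pivot and each item is its own key. The list
    comprehensions below evaluate item < pivot twice per item, but only the
    algorithm's single comparison per non-pivot item is counted.
    """
    if len(items) < 2:
        return 0
    pivot = items[0]
    smaller = [item for item in items[1:] if item < pivot]
    larger = [item for item in items[1:] if not item < pivot]
    return (len(items) - 1) + count_comparisons(smaller) + count_comparisons(larger)


random.seed(269)  # fixed seed, so that the run is repeatable
n = 500
ascending = list(range(n))
shuffled = ascending[:]
random.shuffle(shuffled)
print(count_comparisons(ascending))  # n(n - 1) / 2 = 124750
print(count_comparisons(shuffled))   # far fewer for a random order
```

The ascending input has recursion depth _n_, so _n_ is kept small enough to stay within Python's default recursion limit.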

#### Exercise 14.6.2

What's the complexity of quicksort if the input is in reverse order?

_Write your answer here._

[Hint](../31_Hints/Hints_14_6_02.ipynb)
[Answer](../32_Answers/Answers_14_6_02.ipynb)

#### Exercise 14.6.3

Is quicksort adaptive?

_Write your answer here._

[Answer](../32_Answers/Answers_14_6_03.ipynb)

As the analysis shows, the choice of pivot is crucial to achieving
log-linear complexity. One common approach is to choose a random item as the pivot.
Another way is to pick the median of the first, middle and last items.
Unless we're unlucky and those three items have duplicate keys, this guarantees
the pivot has neither the lowest nor the highest key in the sequence.
Choosing the lowest or highest key at every level of the recursion, as happens
with sorted input when the first item is the pivot, leads to quadratic complexity.
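Both pivot choices can be sketched in Python. The function names below are hypothetical, and `quick_sorted_median3` is only an illustrative variant of the first-item version: it partitions around the median-of-three item instead. On sorted input, the middle item is the median, so each call halves the sequence. The sketch still degrades when many keys are equal, because items equal to the pivot all go into the larger partition.

```python
import random
from typing import Callable


def random_pivot_index(items: list) -> int:
    """Return the index of a randomly chosen item (items must be non-empty)."""
    return random.randrange(len(items))


def median_of_three_index(items: list, key: Callable) -> int:
    """Return the index of the median of the first, middle and last items."""
    first, middle, last = 0, len(items) // 2, len(items) - 1
    # order the three candidate indices by key and take the one in the middle
    candidates = sorted([first, middle, last], key=lambda index: key(items[index]))
    return candidates[1]


def quick_sorted_median3(unsorted: list, key: Callable) -> list:
    """Quicksort that partitions around the median-of-three item."""
    if len(unsorted) < 2:
        return unsorted
    pivot_index = median_of_three_index(unsorted, key)
    pivot = unsorted[pivot_index]
    pivot_key = key(pivot)
    smaller = []
    larger = []
    for index in range(len(unsorted)):
        if index != pivot_index:  # every item except the pivot is partitioned
            item = unsorted[index]
            if key(item) < pivot_key:
                smaller.append(item)
            else:
                larger.append(item)
    return quick_sorted_median3(smaller, key) + [pivot] + quick_sorted_median3(larger, key)
```

For example, `median_of_three_index([9, 1, 5], lambda x: x)` returns 2, the index of 5.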

### 14.6.3 Code and performance

The code mainly follows the algorithm with two small changes.
First, the pivot's key is computed only once.
Second, the partitioning algorithm isn't in a separate function.
This makes the code shorter and easier to follow, in my opinion.

In [1]:
%run -i ../m269_util
%run -i ../m269_sorting

def quick_sorted(unsorted: list, key: Callable) -> list:
    """Return a permutation with keys in non-decreasing order.

    Preconditions: for any indices i and j,
    key(unsorted[i]) and key(unsorted[j]) are comparable
    """
    # base case: sequences with 0 or 1 items are sorted
    if len(unsorted) < 2:
        return unsorted
    # divide the input: select the pivot and create the partitions
    smaller = []
    larger = []
    pivot = unsorted[0]
    pivot_key = key(pivot)
    for index in range(1, len(unsorted)):
        item = unsorted[index]
        if key(item) < pivot_key:
            smaller.append(item)
        else:
            larger.append(item)
    # recur into the partitions and combine the results
    return quick_sorted(smaller, key) + [pivot] + quick_sorted(larger, key)

test(quick_sorted, sorting_tests)

Tests finished.


Let's confirm that sorting an ascending sequence takes quadratic time:

In [2]:
for doubling in range(5):
    items = list(range(100 * 2**doubling))
    %timeit -r 5 quick_sorted(items, identity)

925 µs ± 31.6 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
3.92 ms ± 570 µs per loop (mean ± std. dev. of 5 runs, 100 loops each)
15.4 ms ± 354 µs per loop (mean ± std. dev. of 5 runs, 100 loops each)
63.1 ms ± 3.73 ms per loop (mean ± std. dev. of 5 runs, 10 loops each)
236 ms ± 32.5 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)


To observe the log-linear complexity, I use the `shuffle` function from
Python's `random` module to put the items in random order:

In [3]:
from random import shuffle

for doubling in range(5):
    items = list(range(100 * 2**doubling))
    shuffle(items)
    %timeit -r 5 quick_sorted(items, identity)

161 µs ± 8.05 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
410 µs ± 15.7 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
894 µs ± 97.7 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
1.8 ms ± 79.5 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
4.4 ms ± 557 µs per loop (mean ± std. dev. of 5 runs, 100 loops each)


### 14.6.4 In-place version

Quicksort is usually implemented in-place, swapping smaller and larger items
so that the smaller items end up in the left-hand part of the sequence and
the larger items in the right-hand part, with the pivot between them.
Once each part is sorted, no concatenation is necessary.
This [visualisation](https://learn2.open.ac.uk/mod/oucontent/view.php?id=1827810&extra=thumbnail_idm45069228656528) explains the in-place algorithm.
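Here is a minimal in-place sketch. It uses Lomuto's partition scheme, which is only one of several ways to partition in place; the linked visualisation may use a different scheme, such as Hoare's. The function name and the `start` and `end` parameters are illustrative, and the pivot is still the first item of the range being sorted.

```python
from typing import Callable, Optional


def quick_sort_in_place(items: list, key: Callable,
                        start: int = 0, end: Optional[int] = None) -> None:
    """Sort items[start:end] in place, using the range's first item as pivot."""
    if end is None:
        end = len(items)
    if end - start < 2:
        return
    pivot_key = key(items[start])
    # invariant: items[start + 1 : boundary] have keys smaller than pivot_key
    boundary = start + 1
    for index in range(start + 1, end):
        if key(items[index]) < pivot_key:
            items[index], items[boundary] = items[boundary], items[index]
            boundary = boundary + 1
    # move the pivot between the smaller items and the other items
    pivot_position = boundary - 1
    items[start], items[pivot_position] = items[pivot_position], items[start]
    # sort each part; no concatenation is needed afterwards
    quick_sort_in_place(items, key, start, pivot_position)
    quick_sort_in_place(items, key, pivot_position + 1, end)


letters = list("SORTING")
quick_sort_in_place(letters, lambda letter: letter)
print(letters)  # ['G', 'I', 'N', 'O', 'R', 'S', 'T']
```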

In-place quicksort isn't stable, but it uses less memory and is much faster
than the version above, because it doesn't create and concatenate new sequences.
Its best-, average- and worst-case complexities are nevertheless the same.

Python used in-place quicksort before the invention of Timsort.
