## Profiling Python's built-in `sorted()` function and `list.sort()` method

```python
import time
from typing import cast

import numpy as np

rng = np.random.default_rng()
TEST_RUNS = 10  # how many entries of `sizes` to benchmark
sizes = [10, 50, 100, 1000, 2500, 7500, 10000, 12500, 17500, 20000]
```

### Testing `sorted`

```python
for n in sizes[:TEST_RUNS]:
    # Fresh random list of n integers in [0, 100).
    arr = rng.integers(low=0, high=100, size=n).tolist()

    # Time a single call to sorted() with a high-resolution clock.
    start = time.perf_counter()
    sorted(arr)
    end = time.perf_counter()

    print(f"n = {n:<8,} time = {(end - start) * 1_000:.3f} ms")
```

```text
n = 10       time = 0.005 ms
n = 50       time = 0.006 ms
n = 100      time = 0.013 ms
n = 1,000    time = 0.183 ms
n = 2,500    time = 0.247 ms
n = 7,500    time = 1.142 ms
n = 10,000   time = 1.763 ms
n = 12,500   time = 1.931 ms
n = 17,500   time = 1.985 ms
n = 20,000   time = 2.391 ms
```
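Each size above is timed with a single call, so individual measurements are sensitive to whatever else the machine is doing. A common way to reduce that noise is to repeat the measurement and keep the fastest trial. The sketch below does this with the standard-library `timeit` module; `best_time_ms` is an illustrative helper (not part of the original notebook), and it uses the standard-library `random` module in place of NumPy so it has no third-party dependency.

```python
import random
import timeit

# Same input sizes as the experiment above.
sizes = [10, 50, 100, 1000, 2500, 7500, 10000, 12500, 17500, 20000]


def best_time_ms(data: list[int], repeats: int = 5, number: int = 10) -> float:
    """Fastest per-call time of sorted(data), in milliseconds.

    timeit.repeat runs sorted(data) `number` times per trial, for `repeats`
    trials. The minimum trial is the one least disturbed by other activity on
    the machine; dividing by `number` converts it to a per-call time.
    """
    trials = timeit.repeat(lambda: sorted(data), repeat=repeats, number=number)
    return min(trials) / number * 1_000


for n in sizes:
    # Integers in [0, 100), matching rng.integers(low=0, high=100) above.
    arr = [random.randrange(100) for _ in range(n)]
    print(f"n = {n:<8,} time = {best_time_ms(arr):.3f} ms")
```

Because `sorted` never modifies its argument, reusing the same `arr` across trials is safe here.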



### Results

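To estimate how the running time of `sorted` grows, we divide the measured time $T(n)$ by several common average-case growth functions. If $T(n) = \Theta(f(n))$, the ratio $T(n)/f(n)$ should level off at a roughly constant value as $n$ grows; ratios that keep rising or falling point to a function that grows too slowly or too quickly: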
* $\Theta(n)$
* $\Theta(\lg n)$
* $\Theta(n \lg n)$
* $\Theta(n^2)$
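A minimal sketch of how these ratios can be computed from the printed timings, assuming the measurements are pasted in by hand. `CANDIDATES` and `growth_ratios` are illustrative names, not part of the original notebook.

```python
import math

# Candidate growth functions f(n); lg is the base-2 logarithm.
CANDIDATES = {
    "n": lambda n: n,
    "lg n": lambda n: math.log2(n),
    "n lg n": lambda n: n * math.log2(n),
    "n^2": lambda n: n**2,
}


def growth_ratios(sizes: list[int], times_ms: list[float]) -> dict[str, list[float]]:
    """Return T(n) / f(n) for every candidate f, one list per candidate."""
    return {
        name: [t / f(n) for n, t in zip(sizes, times_ms)]
        for name, f in CANDIDATES.items()
    }


# Measurements for `sorted` copied from the output above (milliseconds).
sizes = [10, 50, 100, 1000, 2500, 7500, 10000, 12500, 17500, 20000]
times = [0.005, 0.006, 0.013, 0.183, 0.247, 1.142, 1.763, 1.931, 1.985, 2.391]

ratios = growth_ratios(sizes, times)
for name, values in ratios.items():
    print(f"T(n)/{name:<7}", "  ".join(f"{v:.2e}" for v in values))
```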

The results are as follows:

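| $n$    | $T(n)$ (ms) | $\dfrac{T(n)}{n}$     | $\dfrac{T(n)}{\lg n}$ | $\dfrac{T(n)}{n \lg n}$ | $\dfrac{T(n)}{n^2}$   |
| ------ | ----------- | --------------------- | --------------------- | ----------------------- | --------------------- |
| 10     | 0.005       | $5.00 \times 10^{-4}$ | $1.51 \times 10^{-3}$ | $1.51 \times 10^{-4}$   | $5.00 \times 10^{-5}$ |
| 50     | 0.006       | $1.20 \times 10^{-4}$ | $1.06 \times 10^{-3}$ | $2.13 \times 10^{-5}$   | $2.40 \times 10^{-6}$ |
| 100    | 0.013       | $1.30 \times 10^{-4}$ | $1.96 \times 10^{-3}$ | $1.96 \times 10^{-5}$   | $1.30 \times 10^{-6}$ |
| 1,000  | 0.183       | $1.83 \times 10^{-4}$ | $1.84 \times 10^{-2}$ | $1.84 \times 10^{-5}$   | $1.83 \times 10^{-7}$ |
| 2,500  | 0.247       | $9.88 \times 10^{-5}$ | $2.19 \times 10^{-2}$ | $8.75 \times 10^{-6}$   | $3.95 \times 10^{-8}$ |
| 7,500  | 1.142       | $1.52 \times 10^{-4}$ | $8.87 \times 10^{-2}$ | $1.18 \times 10^{-5}$   | $2.03 \times 10^{-8}$ |
| 10,000 | 1.763       | $1.76 \times 10^{-4}$ | $1.33 \times 10^{-1}$ | $1.33 \times 10^{-5}$   | $1.76 \times 10^{-8}$ |
| 12,500 | 1.931       | $1.54 \times 10^{-4}$ | $1.42 \times 10^{-1}$ | $1.14 \times 10^{-5}$   | $1.24 \times 10^{-8}$ |
| 17,500 | 1.985       | $1.13 \times 10^{-4}$ | $1.41 \times 10^{-1}$ | $8.05 \times 10^{-6}$   | $6.48 \times 10^{-9}$ |
| 20,000 | 2.391       | $1.20 \times 10^{-4}$ | $1.67 \times 10^{-1}$ | $8.37 \times 10^{-6}$   | $5.98 \times 10^{-9}$ |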

The $T(n)/\lg n$ ratios grow steadily with $n$, so $\Theta(\lg n)$ underestimates the growth, and the $T(n)/n^2$ ratios shrink
by several orders of magnitude, so $\Theta(n^2)$ overestimates it. The $T(n)/(n \lg n)$ ratios are not perfect, but for
$n \ge 1{,}000$ they stay in a narrow band, roughly between $8 \times 10^{-6}$ and $2 \times 10^{-5}$. The $T(n)/n$ ratios
vary over a similar relative range, because $\lg n$ only grows from about 10 to about 14 across these sizes, so a single
timing per size cannot cleanly separate $\Theta(n)$ from $\Theta(n \lg n)$. The measurements are consistent with `sorted`
running in $\Theta(n \lg n)$ time.
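As a complementary check (a different technique from the ratio table), one can fit a straight line to $\log T(n)$ versus $\log n$: if $T(n) \approx c \cdot n^k$, the slope estimates the exponent $k$. A $\Theta(n^2)$ algorithm would give a slope near 2, while $\Theta(n)$ and $\Theta(n \lg n)$ both give slopes near 1 over this range, so this cleanly rules out the quadratic candidate but, like the ratios, cannot sharply separate $n$ from $n \lg n$. A minimal sketch, where `loglog_slope` is an illustrative helper and only the $n \ge 1{,}000$ measurements are used because fixed per-call overhead dominates the smallest sizes:

```python
import math

# Measurements for `sorted` from the run above (milliseconds), n >= 1,000 only.
sizes = [1000, 2500, 7500, 10000, 12500, 17500, 20000]
times = [0.183, 0.247, 1.142, 1.763, 1.931, 1.985, 2.391]


def loglog_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


print(f"empirical exponent k = {loglog_slope(sizes, times):.2f}")
```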

### Testing `list.sort()`

```python
for n in sizes[:TEST_RUNS]:
    # Fresh random list of n integers in [0, 100); cast() only informs type checkers.
    arr = cast(list, rng.integers(low=0, high=100, size=n).tolist())

    # Time a single in-place sort of the fresh list.
    start = time.perf_counter()
    arr.sort()
    end = time.perf_counter()

    print(f"n = {n:<8,} time = {(end - start) * 1_000:.3f} ms")
```

```text
n = 10       time = 0.002 ms
n = 50       time = 0.004 ms
n = 100      time = 0.007 ms
n = 1,000    time = 0.090 ms
n = 2,500    time = 0.229 ms
n = 7,500    time = 0.904 ms
n = 10,000   time = 1.042 ms
n = 12,500   time = 1.171 ms
n = 17,500   time = 1.840 ms
n = 20,000   time = 2.367 ms
```
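The cell above builds a fresh list for every size, so each `arr.sort()` call sees unsorted data. Extending it to repeat the measurement needs care: because `list.sort` works in place, only the first call on a given list sorts random data, and every later call measures the much cheaper already-sorted case. A minimal sketch of the pitfall and one fix using the standard-library `timeit` module (the variable names are illustrative, and `random` stands in for NumPy):

```python
import random
import timeit

n = 20_000
# Integers in [0, 100), the same distribution as the experiment above.
data = [random.randrange(100) for _ in range(n)]

# Pitfall: repeatedly timing .sort() on the *same* list. Only the first call
# sees unsorted data; every later call sorts an already-sorted list.
arr = data.copy()
naive = timeit.repeat(arr.sort, repeat=5, number=1)

# Fix: the setup statement runs once per trial, before the timed call, so with
# number=1 each timed a.sort() works on a fresh, unsorted copy of `data`.
fresh = timeit.repeat(
    "a.sort()", setup="a = data.copy()", repeat=5, number=1, globals={"data": data}
)

print("naive (ms):", [round(t * 1_000, 3) for t in naive])
print("fresh (ms):", [round(t * 1_000, 3) for t in fresh])
```

In the naive list, the trials after the first typically come out much faster than the first, which is the already-sorted case leaking into the measurement.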



### Results

We analyze `list.sort` the same way we analyzed `sorted`: by dividing $T(n)$ by each of these common average-case growth functions and looking for the ratio that levels off:
* $\Theta(n)$
* $\Theta(\lg n)$
* $\Theta(n \lg n)$
* $\Theta(n^2)$

The results are as follows:

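| $n$    | $T(n)$ (ms) | $\dfrac{T(n)}{n}$     | $\dfrac{T(n)}{\lg n}$ | $\dfrac{T(n)}{n \lg n}$ | $\dfrac{T(n)}{n^2}$   |
| ------ | ----------- | --------------------- | --------------------- | ----------------------- | --------------------- |
| 10     | 0.002       | $2.00 \times 10^{-4}$ | $6.02 \times 10^{-4}$ | $6.02 \times 10^{-5}$   | $2.00 \times 10^{-5}$ |
| 50     | 0.004       | $8.00 \times 10^{-5}$ | $7.09 \times 10^{-4}$ | $1.42 \times 10^{-5}$   | $1.60 \times 10^{-6}$ |
| 100    | 0.007       | $7.00 \times 10^{-5}$ | $1.05 \times 10^{-3}$ | $1.05 \times 10^{-5}$   | $7.00 \times 10^{-7}$ |
| 1,000  | 0.090       | $9.00 \times 10^{-5}$ | $9.03 \times 10^{-3}$ | $9.03 \times 10^{-6}$   | $9.00 \times 10^{-8}$ |
| 2,500  | 0.229       | $9.16 \times 10^{-5}$ | $2.03 \times 10^{-2}$ | $8.12 \times 10^{-6}$   | $3.66 \times 10^{-8}$ |
| 7,500  | 0.904       | $1.21 \times 10^{-4}$ | $7.02 \times 10^{-2}$ | $9.36 \times 10^{-6}$   | $1.61 \times 10^{-8}$ |
| 10,000 | 1.042       | $1.04 \times 10^{-4}$ | $7.84 \times 10^{-2}$ | $7.84 \times 10^{-6}$   | $1.04 \times 10^{-8}$ |
| 12,500 | 1.171       | $9.37 \times 10^{-5}$ | $8.60 \times 10^{-2}$ | $6.88 \times 10^{-6}$   | $7.49 \times 10^{-9}$ |
| 17,500 | 1.840       | $1.05 \times 10^{-4}$ | $1.31 \times 10^{-1}$ | $7.46 \times 10^{-6}$   | $6.01 \times 10^{-9}$ |
| 20,000 | 2.367       | $1.18 \times 10^{-4}$ | $1.66 \times 10^{-1}$ | $8.28 \times 10^{-6}$   | $5.92 \times 10^{-9}$ |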

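As with `sorted`, the $T(n)/\lg n$ ratios keep growing and the $T(n)/n^2$ ratios keep shrinking, while the
$T(n)/(n \lg n)$ ratios are the most stable, staying roughly between $7 \times 10^{-6}$ and $1.1 \times 10^{-5}$ for
$n \ge 100$. The $T(n)/n$ column is nearly as flat, since $\lg n$ changes only modestly over these sizes, so these
measurements are consistent with `list.sort` also running in $\Theta(n \lg n)$ time.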

## Final Analysis

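In CPython, `sorted` and `list.sort` use the same sorting algorithm, _Timsort_: `sorted` copies its input into a new
list and then sorts that list the same way `list.sort` would. Timsort is a hybrid algorithm derived from merge sort and
insertion sort; it identifies already-sorted runs in the input, extends short runs with binary insertion sort, and then
merges the runs. Its worst-case running time is $\mathrm{O}(n \lg n)$ and its average case on randomly ordered input is
$\Theta(n \lg n)$, which is consistent with our experimental results. On input that is already sorted, or made of a few
long sorted runs, it can finish in $\Theta(n)$ time.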
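Wall-clock timings are noisy, but the number of comparisons a sort performs is not. The sketch below counts comparisons by wrapping each value in an illustrative `Counted` class; this works because Python's sort routines compare elements only with `<`. Unlike the timing experiments, it uses distinct keys in shuffled order (an assumption made so that repeated values do not affect the count). On shuffled input the count should stay close to $n \lg n$, while on already-sorted input Timsort finds a single run and needs only about $n$ comparisons, illustrating its $\Theta(n)$ best case.

```python
import math
import random


class Counted:
    """Illustrative wrapper that counts how many times `<` is evaluated."""

    comparisons = 0

    def __init__(self, value: int) -> None:
        self.value = value

    def __lt__(self, other: "Counted") -> bool:
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values: list[int]) -> int:
    """Sort wrapped copies of `values` and return how many `<` calls it took."""
    wrapped = [Counted(v) for v in values]
    Counted.comparisons = 0
    wrapped.sort()
    return Counted.comparisons


for n in [1_000, 10_000, 20_000]:
    shuffled = list(range(n))
    random.shuffle(shuffled)  # distinct keys in random order
    c_random = count_comparisons(shuffled)
    c_sorted = count_comparisons(sorted(shuffled))
    print(
        f"n = {n:<7,} shuffled: {c_random / (n * math.log2(n)):.2f} * n lg n   "
        f"already sorted: {c_sorted / n:.2f} * n"
    )
```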

The two are similar but differ in a few ways. `list.sort` is a method that only works on instances of the `list`
class, whereas `sorted` is a built-in function that accepts any iterable whose elements are comparable. `list.sort`
sorts the list _in place_, modifying it and returning `None`. `sorted`, on the other hand, builds and returns a new
sorted list, leaving its input untouched and using $\mathrm{O}(n)$ auxiliary space for the copy. (Timsort's merge step
may also allocate a temporary buffer of up to about $n/2$ references, so neither call is strictly constant-space.)
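A short illustration of these differences using only built-ins; the variable names are illustrative.

```python
# sorted() accepts any iterable and always returns a new list.
print(sorted((3, 1, 2)))                  # tuple -> [1, 2, 3]
print(sorted("cab"))                      # string -> ['a', 'b', 'c']
print(sorted({"b": 1, "a": 2}))           # a dict iterates over its keys -> ['a', 'b']
print(sorted(x * x for x in (3, -1, 2)))  # generator -> [1, 4, 9]

# list.sort() exists only on lists, sorts in place, and returns None.
nums = [3, 1, 2]
result = nums.sort()
print(result)  # None
print(nums)    # [1, 2, 3]

# sorted() leaves its input untouched.
original = [3, 1, 2]
new_list = sorted(original)
print(original, new_list)  # [3, 1, 2] [1, 2, 3]
```

A common bug follows from the return value: writing `nums = nums.sort()` replaces the list with `None`.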
