In [1]:
from IPython.display import HTML
with open('../style.css') as file:
    css = file.read()
HTML(css)

# An Iterative Implementation of Merge Sort

The function $\texttt{sort}(L)$ sorts the list $L$ in place using <em style="color:blue">merge sort</em>.
It takes advantage of the fact that, in *Python*, lists are stored internally as arrays, so every element can be read or overwritten in constant time via its index.
The function `sort` is a wrapper for the function `mergeSort`.  Its sole purpose is to allocate the auxiliary array `A`,
which has the same size as the array storing `L`.

In [2]:
def sort(L):
    A = L[:]  # A is a copy of L
    mergeSort(L, A)

The function `mergeSort` takes two arguments.
  - The first parameter $\texttt{L}$ is the list that is to be sorted.
  - The second parameter $\texttt{A}$ is used as an auxiliary array.  This array is needed
    as <em style="color:blue">temporary storage</em> and is required to have the same size as the list $\texttt{L}$.

The implementation uses two loops:
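* The outer while loop maintains the following invariant: at the start of each iteration, all sublists of the form
  `L[n*k:n*(k+1)]` are sorted.  Initially `n` is 1, and sublists of length 1 are trivially sorted.  During an iteration,
  adjacent pairs of these sublists are merged, so that afterwards all sublists of the form `L[2*n*k:2*n*(k+1)]` are sorted.
  Then `n` is doubled; hence `n` runs through the values $1, 2, 4, 8, \cdots$ and the loop stops as soon as $\texttt{n} \geq \texttt{len}(L)$.
* The inner while loop merges the sublists `L[n*k:n*(k+1)]` and `L[n*(k+1):n*(k+2)]` for even values of `k`.  If the
  second sublist extends past the end of `L`, it is cut off at `len(L)`; if it would be empty, the inner loop stops.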
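
A rough cost estimate follows from the loop structure just described.  This is a sketch, writing $N$ for $\texttt{len}(L)$ and assuming $N \geq 1$.
The outer loop runs for $n = 2^0, 2^1, \cdots, 2^{p-1}$, where $p$ is the smallest integer with $2^p \geq N$, so the number of passes is
$$ p = \lceil \log_2 N \rceil. $$
Within one pass, the calls to `merge` work on disjoint ranges `L[start:end]`, and each call copies every element of its range into `A` once and writes it back into `L` once, so a pass costs $\mathcal{O}(N)$.  Altogether,
$$ T(N) \in \mathcal{O}\bigl(N \cdot \lceil \log_2 N \rceil\bigr) = \mathcal{O}(N \cdot \log N). $$
The extra memory is $\mathcal{O}(N)$: the auxiliary array `A` plus the temporary lists created by the slice operations in `merge`.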

In [3]:
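def mergeSort(L, A):
    n = 1                          # length of the sorted runs that are merged in this pass
    while n < len(L):
        k = 0
        # merge L[n*k:n*k+n] with L[n*k+n:top] as long as the second run is nonempty
        while n * k + n < len(L):
            top = min(n * k + 2 * n, len(L))  # the last run may be shorter than n
            merge(L, n * k, n * k + n, top, A)
            k += 2
        n *= 2                     # now all runs of length 2*n are sorted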
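
To see which calls the two loops produce, the following sketch replays the loop structure of `mergeSort` without touching any data.  The helpers `merge_schedule` and `outer_passes` are hypothetical and exist only for this illustration.  The second one checks that the number of passes of the outer loop equals $\lceil \log_2 N \rceil$, computed exactly as the integer expression `(N - 1).bit_length()` to avoid floating-point logarithms.

```python
def merge_schedule(length):
    """Hypothetical helper: return the (start, middle, end) triples that
    mergeSort would pass to merge for a list of the given length."""
    calls = []
    n = 1
    while n < length:
        k = 0
        while n * k + n < length:
            top = min(n * k + 2 * n, length)
            calls.append((n * k, n * k + n, top))
            k += 2
        n *= 2
    return calls

def outer_passes(length):
    """Hypothetical helper: number of iterations of the outer loop."""
    n, passes = 1, 0
    while n < length:
        n *= 2
        passes += 1
    return passes

print(merge_schedule(5))  # [(0, 1, 2), (2, 3, 4), (0, 2, 4), (0, 4, 5)]

# For N >= 1, (N - 1).bit_length() is exactly ceil(log2(N)).
for N in range(1, 1000):
    assert outer_passes(N) == (N - 1).bit_length()
```

For a list of length 5, the element at index 4 takes part only in the final merge, because in the two earlier passes its run has no partner.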

The function `merge` takes five arguments.
  - `L`      is a list,
  - `start`  is an integer such that $\texttt{start}  \in \{0, \cdots, \texttt{len}(L)-1 \}$,
  - `middle` is an integer such that $\texttt{middle} \in \{0, \cdots, \texttt{len}(L)-1 \}$,
  - `end`    is an integer such that $\texttt{end}    \in \{0, \cdots, \texttt{len}(L) \}$,
  - `A`      is a list of the same length as `L`.
  
Furthermore, the indices `start`, `middle` and `end` have to satisfy the following inequalities:
$$ 0 \leq \texttt{start} < \texttt{middle} < \texttt{end} \leq \texttt{len}(L) $$
The function assumes that the sublists `L[start:middle]` and `L[middle:end]` are already sorted.
The function merges these sublists so that when the call returns the sublist `L[start:end]`
is sorted.  The last argument `A` is used as auxiliary memory.
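
The contract of `merge` can also be written down as executable checks.  The sketch below uses two hypothetical helpers: `merge_preconditions_hold` tests the index inequalities and the sortedness of both halves, and `expected_after_merge` computes the list that `merge` should leave behind, using the standard library function `heapq.merge` as an independent reference.

```python
import heapq

def merge_preconditions_hold(L, start, middle, end):
    """Hypothetical helper: True iff the index and sortedness
    preconditions of merge are satisfied."""
    if not (0 <= start < middle < end <= len(L)):
        return False
    left, right = L[start:middle], L[middle:end]
    return (all(a <= b for a, b in zip(left, left[1:])) and
            all(a <= b for a, b in zip(right, right[1:])))

def expected_after_merge(L, start, middle, end):
    """Hypothetical helper: the list that merge should produce,
    computed with heapq.merge; the parts outside L[start:end] are unchanged."""
    merged = list(heapq.merge(L[start:middle], L[middle:end]))
    return L[:start] + merged + L[end:]

L = [9, 3, 7, 9, 1, 4, 8, 0]
print(merge_preconditions_hold(L, 1, 4, 7))  # True: [3, 7, 9] and [1, 4, 8] are sorted
print(expected_after_merge(L, 1, 4, 7))      # [9, 1, 3, 4, 7, 8, 9, 0]
print(merge_preconditions_hold(L, 0, 4, 7))  # False: [9, 3, 7, 9] is not sorted
```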

In [4]:
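def merge(L, start, middle, end, A):
    A[start:end] = L[start:end]    # copy both runs into the auxiliary array
    idx1 = start                   # next element of the left run  A[start:middle]
    idx2 = middle                  # next element of the right run A[middle:end]
    i    = start                   # next position to fill in L
    while idx1 < middle and idx2 < end:
        if A[idx1] <= A[idx2]:     # '<=' prefers the left run on ties
            L[i]  = A[idx1]
            idx1 += 1
        else:
            L[i]  = A[idx2]
            idx2 += 1
        i += 1
    # at most one run has elements left; copy them to the end of L[start:end]
    if idx1 < middle:
        L[i:end] = A[idx1:middle]
    if idx2 < end:
        L[i:end] = A[idx2:end]     # these are already in place, so this copy is harmless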
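
Because `merge` compares with `<=`, it takes the element from the left run whenever two elements compare equal.  Consequently, equal elements keep their original relative order, i.e. this merge sort is *stable*.  Stability is only observable when elements are compared by a key, so the sketch below adds a `key` parameter to copies of `merge` and `mergeSort`.  The names `merge_by_key` and `sort_by_key` are hypothetical and not part of the code above.  The result is compared with `sorted`, which is documented to be stable.

```python
def merge_by_key(L, start, middle, end, A, key):
    """Hypothetical variant of merge that compares key(x) instead of x."""
    A[start:end] = L[start:end]
    idx1, idx2, i = start, middle, start
    while idx1 < middle and idx2 < end:
        if key(A[idx1]) <= key(A[idx2]):  # ties go to the left run: stability
            L[i] = A[idx1]
            idx1 += 1
        else:
            L[i] = A[idx2]
            idx2 += 1
        i += 1
    if idx1 < middle:
        L[i:end] = A[idx1:middle]
    if idx2 < end:
        L[i:end] = A[idx2:end]

def sort_by_key(L, key):
    """Hypothetical variant of sort/mergeSort using merge_by_key."""
    A = L[:]
    n = 1
    while n < len(L):
        k = 0
        while n * k + n < len(L):
            top = min(n * k + 2 * n, len(L))
            merge_by_key(L, n * k, n * k + n, top, A, key)
            k += 2
        n *= 2

records = [(3, 'a'), (1, 'b'), (3, 'c'), (2, 'd'), (1, 'e'), (3, 'f')]
S = records[:]
sort_by_key(S, key=lambda r: r[0])
print(S)  # [(1, 'b'), (1, 'e'), (2, 'd'), (3, 'a'), (3, 'c'), (3, 'f')]
print(S == sorted(records, key=lambda r: r[0]))  # True
```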

## Testing

In [5]:
import random as rnd
from collections import Counter

In [6]:
def demo():
    L = [ rnd.randrange(1, 100) for n in range(1, 20) ]
    print("L = ", L)
    S = L[:]
    sort(S)
    print("S = ", S)
    print(Counter(L))
    print(Counter(S))
    print(Counter(L) == Counter(S))

In [7]:
demo()

L =  [65, 50, 95, 3, 17, 1, 13, 51, 91, 50, 1, 36, 42, 61, 17, 43, 55, 42, 74]
S =  [1, 1, 3, 13, 17, 17, 36, 42, 42, 43, 50, 50, 51, 55, 61, 65, 74, 91, 95]
Counter({50: 2, 17: 2, 1: 2, 42: 2, 65: 1, 95: 1, 3: 1, 13: 1, 51: 1, 91: 1, 36: 1, 61: 1, 43: 1, 55: 1, 74: 1})
Counter({1: 2, 17: 2, 42: 2, 50: 2, 3: 1, 13: 1, 36: 1, 43: 1, 51: 1, 55: 1, 61: 1, 65: 1, 74: 1, 91: 1, 95: 1})
True


The function `isOrdered(L)` asserts that the list `L` is sorted in ascending order; if it is not, an `AssertionError` is raised.

In [8]:
def isOrdered(L):
    for i in range(len(L) - 1):
        assert L[i] <= L[i+1]
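
Since `isOrdered` relies on `assert`, it does nothing at all when *Python* runs with the `-O` option, which removes `assert` statements.  A Boolean variant, here called `is_ordered` (a hypothetical name, not used elsewhere in this notebook), avoids this and lets the caller decide how to react:

```python
def is_ordered(L):
    """Hypothetical Boolean counterpart of isOrdered:
    True iff every element is <= its successor."""
    return all(a <= b for a, b in zip(L, L[1:]))

print(is_ordered([1, 2, 2, 5]))  # True
print(is_ordered([3, 1]))        # False
print(is_ordered([]))            # True: an empty list is trivially sorted
```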

The function `sameElements(L, S)` asserts that the lists `L` and `S` contain the same elements and, furthermore, that each
element $x$ occurring in `L` occurs in `S` exactly as many times as it occurs in `L`.  Otherwise, an `AssertionError` is raised.
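
Comparing `Counter` objects rather than sets matters here: a set forgets how often an element occurs, so a faulty sort that duplicates one element and drops another could pass a set-based check.  A minimal illustration:

```python
from collections import Counter

L = [1, 1, 2]
S = [1, 2, 2]                    # one 1 was replaced by a second 2
print(set(L) == set(S))          # True: sets ignore multiplicities
print(Counter(L) == Counter(S))  # False: 1 occurs twice in L but only once in S
```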

In [9]:
def sameElements(L, S):
    assert Counter(L) == Counter(S)

The function $\texttt{testSort}(n, k)$ generates $n$ random lists of length $k$ with elements drawn from $\{0, \cdots, 2\cdot k-1\}$, sorts them, and checks that each result is sorted and contains the same elements as the corresponding input.

In [10]:
def testSort(n, k):
    for i in range(n):
        L = [ rnd.randrange(2*k) for x in range(k) ]
        oldL = L[:]
        sort(L)
        isOrdered(L)
        sameElements(oldL, L)
        print('.', end='')
    print()
    print("All tests successful!")
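
`testSort` only exercises lists of a single length.  The following self-contained sketch covers edge cases (empty and one-element lists, lengths that are not powers of two, duplicates, sorted and reversed input).  It repeats the three functions from above verbatim and compares the results with `sorted`; the list `edge_cases` and the fixed seed are assumptions chosen for illustration.

```python
import random

# Verbatim copies of sort, mergeSort and merge from above.
def sort(L):
    A = L[:]
    mergeSort(L, A)

def mergeSort(L, A):
    n = 1
    while n < len(L):
        k = 0
        while n * k + n < len(L):
            top = min(n * k + 2 * n, len(L))
            merge(L, n * k, n * k + n, top, A)
            k += 2
        n *= 2

def merge(L, start, middle, end, A):
    A[start:end] = L[start:end]
    idx1 = start
    idx2 = middle
    i    = start
    while idx1 < middle and idx2 < end:
        if A[idx1] <= A[idx2]:
            L[i]  = A[idx1]
            idx1 += 1
        else:
            L[i]  = A[idx2]
            idx2 += 1
        i += 1
    if idx1 < middle:
        L[i:end] = A[idx1:middle]
    if idx2 < end:
        L[i:end] = A[idx2:end]

edge_cases = [
    [],                 # empty list: the outer loop never runs
    [42],               # a single element
    [2, 1],             # the smallest list that needs a merge
    [5, 4, 3, 2, 1],    # reversed, length not a power of two
    [7, 7, 7, 7],       # all elements equal
    list(range(10)),    # already sorted
]
rng = random.Random(2024)  # fixed seed, so a failure can be reproduced
edge_cases += [[rng.randrange(5) for _ in range(length)] for length in range(1, 65)]

for case in edge_cases:
    S = case[:]
    sort(S)
    assert S == sorted(case), (case, S)
print("All edge cases passed!")
```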

In [11]:
%%time
testSort(100, 20000)

....................................................................................................
All tests successful!
CPU times: user 10.8 s, sys: 129 ms, total: 10.9 s
Wall time: 11 s


In [12]:
%%timeit
k = 1_000_000
L = [ rnd.randrange(2*k) for x in range(k) ]
sort(L)

7.21 s ± 133 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


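*Python* offers the built-in function `sorted`, which takes an iterable (for example, a list of numbers) and returns a new sorted list.
Let us see how it compares to our implementation.  As in the previous measurement, the timed cell also generates the random list, so both timings include that cost.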

In [None]:
%%timeit
k = 1_000_000
L = [ rnd.randrange(2*k) for x in range(k) ]
S = sorted(L)

In [None]:
help(sorted)
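
As the help text indicates, `sorted` returns a *new* list and accepts the keyword arguments `key` and `reverse`.  The in-place counterpart, analogous to `sort(L)` above, is the list method `list.sort`, which returns `None`.  Both are stable; in CPython they are implemented in C using a variant of Timsort.  A short sketch of the difference:

```python
data = [5, 2, 9, 1, 5]

result = sorted(data)       # new list; data itself is unchanged
print(result)               # [1, 2, 5, 5, 9]
print(data)                 # [5, 2, 9, 1, 5]

returned = data.sort()      # sorts data in place, like sort(L) above
print(returned)             # None
print(data)                 # [1, 2, 5, 5, 9]

# key and reverse; with key=len, "bb" stays before "dd" because the sort is stable
print(sorted(["bb", "a", "ccc", "dd"], key=len))  # ['a', 'bb', 'dd', 'ccc']
print(sorted(data, reverse=True))                 # [9, 5, 5, 2, 1]
```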