# Heapsort

Heapsort combines the better attributes of insertion sort and merge sort: like merge sort it runs in $O(n\log n)$ time, and like insertion sort it uses an insertion-style operation, but on a balanced binary structure called a heap. Choosing an appropriate data structure when designing an algorithm is itself an algorithm design technique.

A heap is a binary tree that is full at every level except possibly the bottom (leaf) level, which is filled from left to right. Each node holds a number such that, for each non-leaf node, its number is greater than or equal to the numbers of its children (this is a max-heap). The heap is also a good mechanism for implementing a priority queue.
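As a small illustration of the priority-queue use, the sketch below relies on Python's standard `heapq` module, which is an assumption of this example and not the heap code developed in these notes. `heapq` maintains a min-heap, so priorities are negated here to serve the largest first, matching the max-heap described above. The task names and priorities are made up.

```python
import heapq

# Hypothetical (priority, task) pairs; a larger number means more urgent.
tasks = [(2, "write report"), (5, "fix outage"), (1, "water plants")]

pq = []
for priority, name in tasks:
    # heapq is a min-heap, so store the negated priority.
    heapq.heappush(pq, (-priority, name))

served = []
while pq:
    neg_priority, name = heapq.heappop(pq)
    served.append((-neg_priority, name))

print(served)
```

Each push and pop costs time logarithmic in the queue size, which is the same bound that heapify gives for a single root-to-leaf pass.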

Given a list (an array) $A$ of $n$ numbers, heapsort first builds a heap for these numbers, then repeats the following until only one node is left: (1) remove the root and place it in an array, and (2) heapify the remaining tree.

In particular, we view $A$ as a binary tree as follows: $A[0]$ is the root, and $A[1]$ and $A[2]$ are the left child and the right child of the root, respectively. In general, for any $i \leq \lfloor n/2\rfloor-1$, $A[2i+1]$ and $A[2i+2]$ are the left child and the right child of $A[i]$, respectively, whenever those indices are less than $n$.

For example, let $A = [1, 2, 3, 4, 5, 6, 7]$ with $n = 7$ and $i = 2$. Then $A[5] = 6$ and $A[6] = 7$ are
the left and right children of $A[2] = 3$, respectively. If $A = [1, 2, 3, 4, 5, 6, 7, 8]$ with $n = 8$, then
$A[7] = 8$ is the left child of $A[3] = 4$, but $A[3]$ has no right child because index $8$ is out of range.
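The index arithmetic and the heap condition can be checked directly. The following is a minimal sketch; the helper names `children` and `is_max_heap` are introduced here for illustration only and are not used elsewhere in these notes.

```python
def children(A, i):
    """Return (left, right) child values of A[i]; None where a child is missing."""
    n = len(A)
    l, r = 2 * i + 1, 2 * i + 2
    return (A[l] if l < n else None, A[r] if r < n else None)

def is_max_heap(A):
    """True if every non-root node is <= its parent, located at index (j - 1) // 2."""
    return all(A[(j - 1) // 2] >= A[j] for j in range(1, len(A)))

print(children([1, 2, 3, 4, 5, 6, 7], 2))     # (6, 7)
print(children([1, 2, 3, 4, 5, 6, 7, 8], 3))  # (8, None)
print(is_max_heap([1, 2, 3, 4, 5, 6, 7]))     # False
print(is_max_heap([7, 5, 6, 4, 2, 1, 3]))     # True
```

The two `children` calls reproduce the two examples above; the sorted list $[1, \ldots, 7]$ is a valid tree layout but not a max-heap, since its root is the smallest element.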

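Next we need to move the largest number in a subtree to the root of that subtree through comparisons and swaps. We start from the subtrees rooted at the lowest non-leaf level and work upward, turning each subtree into a heap; whenever a root is swapped with a child, the same procedure is applied recursively to that child's subtree. This procedure is called heapify.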

In [1]:
# Heapify the subtree rooted at index i so that it becomes a max-heap,
# assuming the subtrees rooted at its children are already max-heaps.
# n is the size of the heap.
def heapify_max(A, n, i):
    largest = i # Initialize largest as root of the subtree
    l = 2 * i + 1 # left = 2*i + 1
    r = 2 * i + 2 # right = 2*i + 2

    # See if left child of root exists and is
    # greater than root
    if l < n and A[i] < A[l]:
        largest = l

    # See if right child of root exists and is
    # greater than the largest found so far
    if r < n and A[largest] < A[r]:
        largest = r

    # Change root, if needed
    if largest != i:
        A[i], A[largest] = A[largest], A[i] # swap
        # Heapify the subtree whose root was just changed.
        heapify_max(A, n, largest)

# Build a heap
def buildHeap_max(A):
    n = len(A)
    # Build a maxheap.
    # Since last parent will be at ((n//2)-1) we can start at that location.
    # Build a heap from bottom up
    for i in range(n // 2 - 1, -1, -1):
        heapify_max(A, n, i)

# The main function to sort an array of given size
def heapSort_max(A):
    n = len(A)
    buildHeap_max(A)
    # One by one extract elements
    print('Sorted array is:', end= " ")
    for i in range(n - 1, -1, -1):
        print(A[0], end=" ")
        A[i], A[0] = A[0], A[i] # swap
        heapify_max(A, i, 0)


In [2]:
# Driver code to test above
A = [10, 9, 12, 11, 13, 5, 6, 7, 2, 1]
heapSort_max(A)


Sorted array is: 13 12 11 10 9 7 6 5 2 1 

# Complexity analysis

For buildHeap($A$), we start from the subtrees at the lowest level and build up the tree by moving up. When we move up one level, each node at that level is the root of a subtree whose two child subtrees are already heaps. Thus, to make the new subtree a heap, the number of comparisons and swaps is proportional to the height $h$ of the subtree; namely, at most $ch$ for a constant $c > 0$. At the leaf level there are no more than $n/2$ nodes, each of height 0. One level up there are about $n/4$ nodes, each the root of a subtree of height 1, and in general there are about $n/2^{h+1}$ nodes of height $h$. Let $h_{\max} = \log n$ be the height of the tree. Using $\sum_{h=1}^{\infty} h/2^h = 2$, the number of operations is bounded as follows:
\begin{align*}
& c(0 \cdot n/2 + 1 \cdot n/2^2 + 2 \cdot n/2^3 + \cdots + h_{\max} \cdot n/2^{h_{\max}+1}) \\
& = c \cdot \sum_{h=1}^{\log n} h \cdot \frac{n}{2^{h+1}} \\
& = \frac{cn}{2} \sum_{h=1}^{\log n} \frac{h}{2^{h}} \\
& < \frac{cn}{2} \sum_{h=1}^{\infty} \frac{h}{2^{h}} = \frac{cn}{2} \cdot 2 = cn \\
& = O(n).
\end{align*}
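The linear bound can also be observed empirically. The sketch below is an illustration, not part of the original code: it rebuilds the heap with an iterative version of heapify and counts swaps. The function name `build_max_heap_counting`, the random inputs, and the seed are assumptions made for this example.

```python
import random

def build_max_heap_counting(A):
    """Build a max-heap in place; return the number of swaps performed."""
    n = len(A)
    swaps = 0
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while True:
            largest = i
            l, r = 2 * i + 1, 2 * i + 2
            if l < n and A[largest] < A[l]:
                largest = l
            if r < n and A[largest] < A[r]:
                largest = r
            if largest == i:
                break
            A[i], A[largest] = A[largest], A[i]
            swaps += 1
            i = largest  # continue sifting down from the child
    return swaps

random.seed(0)  # fixed seed so runs are repeatable
results = []
for n in (10, 100, 1000, 10000):
    A = [random.random() for _ in range(n)]
    results.append((n, build_max_heap_counting(A)))

for n, swaps in results:
    print(n, swaps, swaps / n)
```

Each node sinks by at most its own height, and the heights of all nodes in the tree sum to less than $n$, so the swap count stays below $n$ for every input size.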

For heapSort($A$), we repeatedly remove the number at the root and then heapify the remaining numbers, which takes $O(\log m)$ operations, where $m$ is the number of nodes remaining in the heap. Since $m \leq n$ and there are $n$ numbers to remove, the extraction phase takes $O(n \log n)$ time; together with the $O(n)$ build step, heapsort takes $O(n \log n)$ time.
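To see the $n \log n$ growth in practice, the following sketch counts key comparisons over a whole sort and prints them next to $n \log_2 n$. It is a self-contained variant written for this illustration: `heapsort_counting`, the input sizes, and the seed are assumptions, and it sorts silently where `heapSort_max` prints.

```python
import math
import random

def heapsort_counting(A):
    """Sort A in place (ascending) and return the number of key comparisons."""
    comparisons = 0

    def sift_down(n, i):
        nonlocal comparisons
        while True:
            largest = i
            l, r = 2 * i + 1, 2 * i + 2
            if l < n:
                comparisons += 1
                if A[largest] < A[l]:
                    largest = l
            if r < n:
                comparisons += 1
                if A[largest] < A[r]:
                    largest = r
            if largest == i:
                return
            A[i], A[largest] = A[largest], A[i]
            i = largest

    n = len(A)
    for i in range(n // 2 - 1, -1, -1):   # build phase
        sift_down(n, i)
    for end in range(n - 1, 0, -1):       # extraction phase
        A[0], A[end] = A[end], A[0]
        sift_down(end, 0)
    return comparisons

random.seed(1)  # fixed seed so runs are repeatable
rows = []
for n in (16, 256, 4096):
    A = [random.randint(0, 10**6) for _ in range(n)]
    expected = sorted(A)
    count = heapsort_counting(A)
    rows.append((n, count, n * math.log2(n), A == expected))

for n, count, nlogn, ok in rows:
    print(n, count, round(count / nlogn, 2), ok)
```

Each pass down the tree makes at most two comparisons per level, so the total is at most a constant multiple of $n \log_2 n$ plus a linear term, consistent with the analysis above.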

# Is Heapsort Stable?

No. The swaps performed while building the heap and extracting the root can change the relative order of equal keys.
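A three-element input is enough to show this. The sketch below is an illustration only: `heapsort_by_key` is a helper written for this example that compares records by a key function, and the letter in each record is a made-up tag that marks the original order.

```python
def heapsort_by_key(items, key):
    """In-place max-heap heapsort that compares only key(item)."""
    def sift_down(n, i):
        while True:
            largest = i
            l, r = 2 * i + 1, 2 * i + 2
            if l < n and key(items[largest]) < key(items[l]):
                largest = l
            if r < n and key(items[largest]) < key(items[r]):
                largest = r
            if largest == i:
                return
            items[i], items[largest] = items[largest], items[i]
            i = largest

    n = len(items)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(n, i)
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]
        sift_down(end, 0)
    return items

# Two records share the key 2; 'a' comes before 'b' in the input.
records = [(2, 'a'), (2, 'b'), (1, 'c')]
result = heapsort_by_key(list(records), key=lambda rec: rec[0])
print(result)  # [(1, 'c'), (2, 'b'), (2, 'a')]
```

The output is sorted by key, but the two records with key 2 come out as `'b'` then `'a'`, the reverse of their input order.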

Challenge question for you to think about: Can you make heapsort stable as an in-place algorithm?

We can carry extra information through heapsort and use it to restore the original order of tied items, as follows:

Let $A$ be the original list. Create a new array 
<code>B = [(A[i], i) for i in range(len(A))]</code>
Then heapsort $B$ on the first element of each pair. In the sorted array, still called $B$, sort each run of items that share the same first element by their second elements to restore the original order. Finally, output the first elements in order.
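One way to sketch this idea is shown next. It is a slight variation on the steps above: it heapsorts the pairs themselves, so ties on the first element are broken by the original index during the sort and no second pass is needed. The names `heapsort` and `stable_heapsort` are introduced for this illustration. It uses $O(n)$ extra space for $B$, so it does not answer the in-place challenge.

```python
def heapsort(B):
    """In-place ascending heapsort using the natural ordering of the elements."""
    def sift_down(n, i):
        while True:
            largest = i
            l, r = 2 * i + 1, 2 * i + 2
            if l < n and B[largest] < B[l]:
                largest = l
            if r < n and B[largest] < B[r]:
                largest = r
            if largest == i:
                return
            B[i], B[largest] = B[largest], B[i]
            i = largest

    n = len(B)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(n, i)
    for end in range(n - 1, 0, -1):
        B[0], B[end] = B[end], B[0]
        sift_down(end, 0)

def stable_heapsort(A, key=lambda x: x):
    """Return a new list with A sorted by key, keeping ties in input order."""
    # Pair each key with its original position; tuples compare by key
    # first and by index second, so no two pairs are ever equal.
    B = [(key(A[i]), i) for i in range(len(A))]
    heapsort(B)
    return [A[i] for _, i in B]

records = [(2, 'a'), (2, 'b'), (1, 'c')]
stable = stable_heapsort(records, key=lambda rec: rec[0])
print(stable)  # [(1, 'c'), (2, 'a'), (2, 'b')]
```

Because every pair in $B$ is distinct, any correct sort of $B$ yields the same order, which is why the instability of heapsort no longer matters here.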

In [19]:
import math
print(math.log(10,2)) # log base 2 of 10
print(math.log(10,2)//1) # floor of the log as a float (3.0); int(math.log(10,2)) gives the integer 3

3.0